Overview
In this lesson we review some of the fundamental data types in R.
Objectives
Readings
Lander, Ch 5.1, 5.2
Data frames are like matrices, in that they store data in a rectangular format. Unlike matrices though, each column can be a different type of data (numbers, characters, dates, etc.). So data frames are the default way to store simple datasets in R.
Data frames can be created much like matrices: the data.frame() function automatically cbinds vectors (of the same length) together into a data frame:
df1 <- data.frame(1:4,rep(3,4),c(2,4,1,5))
df1
X1.4 rep.3..4. c.2..4..1..5.
1 1 3 2
2 2 3 4
3 3 3 1
4 4 3 5
As you can see, it also gives column and row names. We can get the column names with:
colnames(df1)
[1] "X1.4" "rep.3..4." "c.2..4..1..5."
To change the column names, we assign a character vector to colnames():
colnames(df1) <- c("c1","c2","c3")
df1
c1 c2 c3
1 1 3 2
2 2 3 4
3 3 3 1
4 4 3 5
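Column names can also be supplied when the data frame is created, which avoids the automatically generated names altogether. The following is a minimal sketch; the names df2 and "threes" are illustrative only and are not part of the lesson's running example.

```r
# Name the columns at creation time instead of renaming them afterwards.
# 'df2' is an illustrative object, separate from the lesson's df1.
df2 <- data.frame(c1 = 1:4, c2 = rep(3, 4), c3 = c(2, 4, 1, 5))
colnames(df2)

# Rename a single column by position, leaving the others untouched.
colnames(df2)[2] <- "threes"
colnames(df2)
```

Indexing colnames() on the left-hand side of the assignment is the same trick used later in the lesson to name a newly added column.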
To work with one of the columns, we use the $ operator to pick it out of the data frame:
ourcol <- df1$c1
ourcol
[1] 1 2 3 4
ourcol is now a new vector with values taken from the first column of the data frame.
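Because the extracted column is a copy, changing it does not change the data frame. To change the data frame itself, assign through $ directly. A small sketch, using the illustrative names df2 and col_copy (assumptions made for this example only):

```r
# 'df2' and 'col_copy' are illustrative names, not part of the lesson's example.
df2 <- data.frame(c1 = 1:4, c2 = rep(3, 4))

col_copy <- df2$c1      # equivalent to df2[["c1"]]
col_copy[1] <- 100L     # modifies only the copy
df2$c1                  # the data frame column is unchanged

df2$c1[1] <- 100L       # assigning through $ modifies the data frame
df2$c1
```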
We can also pull out subsets of the data frame just as we did with matrices:
df1[2:3,2:3]
c2 c3
2 3 4
3 3 1
or we can pull out the same using column names:
df1[2:3,c("c2","c3")]
c2 c3
2 3 4
3 3 1
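What does df1[c(1,3),"c3"] return? The answer is 2 1.
And what does df1$c3[c(1,3)] return? Again 2 1. Both pick out the first and third elements of column c3.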
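When a single column is selected with the [row, column] form, R simplifies the result to a plain vector. If we want the result to stay a data frame, we can pass drop = FALSE. A brief sketch with an illustrative data frame df2 (not the lesson's df1):

```r
# 'df2', 'v' and 'd' are illustrative names for this sketch.
df2 <- data.frame(c1 = 1:4, c2 = rep(3, 4), c3 = c(2, 4, 1, 5))

v <- df2[c(1, 3), "c3"]                 # single column: simplified to a vector
d <- df2[c(1, 3), "c3", drop = FALSE]   # drop = FALSE keeps a data frame

is.data.frame(v)
is.data.frame(d)
rownames(d)   # the original row names are kept
```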
Most importantly, data frames allow us to combine vector types:
df1 <- cbind(df1,c("truck","car","lettuce","porkchop"))
colnames(df1)[4] <- "things"
df1
c1 c2 c3 things
1 1 3 2 truck
2 2 3 4 car
3 3 3 1 lettuce
4 4 3 5 porkchop
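To see why this matters, compare what happens when the same mix of numbers and strings is bound into a matrix instead. The sketch below uses the illustrative names m and d; a matrix can hold only one type, so the numbers are coerced to character, while the data frame keeps each column's own type.

```r
# 'm' and 'd' are illustrative names for this comparison.
m <- cbind(1:2, c("a", "b"))                   # matrix: one type for all cells
d <- data.frame(n = 1:2, s = c("a", "b"))      # data frame: one type per column

class(m[, 1])   # the numbers have become character strings
class(d$n)      # the numeric column is still integer
```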
To see the internal structure of our data frame, including what types the variables in it are, we can use the str function:
str(df1)
'data.frame': 4 obs. of 4 variables:
$ c1 : int 1 2 3 4
$ c2 : num 3 3 3 3
$ c3 : num 2 4 1 5
$ things: Factor w/ 4 levels "car","lettuce",..: 4 1 2 3
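Using str we saw that our final column has been stored as a factor rather than as a character vector. (This is the default in R versions before 4.0.0; from 4.0.0 onward, strings are kept as characters unless stringsAsFactors=TRUE is set.) To change it, we can use our earlier technique: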
df1$things <- as.character(df1$things)
str(df1)
'data.frame': 4 obs. of 4 variables:
$ c1 : int 1 2 3 4
$ c2 : num 3 3 3 3
$ c3 : num 2 4 1 5
$ things: chr "truck" "car" "lettuce" "porkchop"
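Whether strings become factors can also be controlled when the data frame is built, using the stringsAsFactors argument of data.frame(). A minimal sketch; the object names as_chr and as_fct are illustrative assumptions:

```r
# 'things', 'as_chr' and 'as_fct' are illustrative names for this sketch.
things <- c("truck", "car", "lettuce", "porkchop")

as_chr <- data.frame(id = 1:4, things = things, stringsAsFactors = FALSE)
as_fct <- data.frame(id = 1:4, things = things, stringsAsFactors = TRUE)

class(as_chr$things)          # strings kept as character
class(as_fct$things)          # strings converted to a factor
levels(as_fct$things)         # the distinct values, sorted
as.character(as_fct$things)   # converting back recovers the original strings
```

Setting the argument explicitly makes the code behave the same way regardless of which R version runs it.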
RStudio also allows you to view certain rectangular datasets in a graphical manner that might be more familiar from other statistics programs:
View(df1)
You can even directly edit some data objects using
edit(df1)
This is strongly discouraged, however, since it leaves no record of what you have done. It is much better to use textual commands to assign new values to your data.
Lists are the most general-purpose data containers in R that we will be using. Lists can hold collections of almost any type of R object: variables, vectors, data frames, other lists, etc.
list1 <- list(1:5,df1,3)
list1
[[1]]
[1] 1 2 3 4 5
[[2]]
c1 c2 c3 things
1 1 3 2 truck
2 2 3 4 car
3 3 3 1 lettuce
4 4 3 5 porkchop
[[3]]
[1] 3
Note that the objects in a list are indexed with the double bracket [[ ]] and can be picked out that way:
list1[[1]][3]
[1] 3
In the above example, the [[1]] picks out the first object in the list (the vector 1:5) and the [3] picks out the third element in that vector (3).
What does list1[[2]][2:4,3] return? The answer is 4 1 5. That is, the second through fourth items in the third column of the second element in list1.
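It helps to contrast double brackets with single brackets on a list: [[ ]] returns the object itself, while [ ] returns a shorter list that still wraps the object. A small sketch, with the illustrative names lst, inner and outer:

```r
# 'lst', 'inner' and 'outer' are illustrative names for this sketch.
lst <- list(1:5, c("a", "b"), 3)

inner <- lst[[1]]   # the element itself: an integer vector
outer <- lst[1]     # a list of length 1 that contains the vector

class(inner)
class(outer)

inner[3]            # third element of the vector
outer[[1]][3]       # the same value, after unwrapping the list first
```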
We haven’t given names to the three objects in list1, so currently they can only be picked out with numbers in [[ ]], just as a data frame without column names can only be identified with column numbers. The way to name objects in a list is much like that for a data frame:
names(list1) <- c("onetofive","df1","three")
list1$onetofive[3]
[1] 3
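As with data frames, names can also be given when the list is created, and a named element can be picked out either with $ or with a string inside [[ ]]. A minimal sketch; the list lst and the element name "letters" are illustrative assumptions:

```r
# 'lst' is an illustrative list, separate from the lesson's list1.
lst <- list(onetofive = 1:5, three = 3)
names(lst)

lst[["onetofive"]][3]   # same result as lst$onetofive[3]

# Assigning to a new name appends an element to the list.
lst$letters <- c("a", "b")
names(lst)
```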
Note that str() shows you the internal structures of lists just as it does for data frames or any other object.
str(list1)
List of 3
$ onetofive: int [1:5] 1 2 3 4 5
$ df1 :'data.frame': 4 obs. of 4 variables:
..$ c1 : int [1:4] 1 2 3 4
..$ c2 : num [1:4] 3 3 3 3
..$ c3 : num [1:4] 2 4 1 5
..$ things: chr [1:4] "truck" "car" "lettuce" "porkchop"
$ three : num 3
Many R objects, including the outputs of many functions, are lists. The objects currently in your R workspace can be listed by name with
ls()
[1] "df1" "list1" "ourcol"
This is also what the “Environment” pane in RStudio shows.
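Note that ls() returns a character vector of object names, not the objects themselves. The sketch below demonstrates this in a small scratch environment so that it does not depend on what happens to be in your workspace; the environment e and the objects x and y are illustrative assumptions.

```r
# 'e', 'x' and 'y' are illustrative; a scratch environment keeps the
# example independent of the current workspace contents.
e <- new.env()
assign("x", 1:3, envir = e)
assign("y", "hello", envir = e)

nm <- ls(e)                 # names of the objects in e
is.character(nm)            # ls() gives names, not the objects
exists("x", envir = e, inherits = FALSE)

objs <- mget(nm, envir = e) # fetch the objects themselves as a named list
str(objs)
```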
R can read many data formats, including those from SPSS, Stata, SAS, and others. We will discuss reading and manipulating external data in a later lesson, but it is useful to know up front about R’s native .RData format, which can contain multiple R objects including lists. For instance, here we save our simple data frame as an RData file, remove that object from R, and then reload it from our file.
save(df1,file="df1.RData")
rm(df1)
load(file="df1.RData",verbose=TRUE)
Loading objects:
df1
df1
c1 c2 c3 things
1 1 3 2 truck
2 2 3 4 car
3 3 3 1 lettuce
4 4 3 5 porkchop
Note that the .RData file stores the objects together with their names, so load() restores each object under its original name.
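Since a single file can hold several objects, it can be useful to check what load() actually restored. A minimal sketch, assuming two illustrative objects a and b and a temporary file so that nothing is left behind in the working directory:

```r
# 'a', 'b' and 'f' are illustrative names; tempfile() avoids cluttering
# the working directory.
a <- 1:3
b <- data.frame(x = c(1, 2), y = c("u", "v"))

f <- tempfile(fileext = ".RData")
save(a, b, file = f)    # several objects can go into one file
rm(a, b)

restored <- load(f)     # load() returns the names of the restored objects
restored
b

unlink(f)               # delete the temporary file
```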
Since an .RData file can hold any number of objects, it is easy to save the entirety of your workspace as a (nameless) .RData file, which applications like RStudio can do automatically:
save.image()
To reload this file (which is just a hidden file named .RData in your working directory), simply write:
load(".RData")
Finally, to erase everything in your workspace, you can write:
rm(list=ls())