Overview

In this lesson we review two of the fundamental data structures in R: data frames and lists.

Objectives

  1. Create and manipulate data frames and lists

Readings

Lander, Ch 5.1, 5.2

1 Data Frames

Data frames are like matrices in that they store data in a rectangular format. Unlike matrices, though, each column can hold a different type of data (numbers, characters, dates, etc.). This makes data frames the default way to store simple datasets in R.

Data frames can be created much as matrices are: the data.frame() function binds vectors of the same length together as columns, just as cbind() does:

df1 <- data.frame(1:4,rep(3,4),c(2,4,1,5))
df1
  X1.4 rep.3..4. c.2..4..1..5.
1    1         3             2
2    2         3             4
3    3         3             1
4    4         3             5

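As you can see, data.frame() also assigns column and row names. The column names are built from the code used for each column, with characters that are not allowed in names replaced by dots. We can get the column names with: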

colnames(df1)
[1] "X1.4"          "rep.3..4."     "c.2..4..1..5."
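Because a data frame is rectangular, the shape functions familiar from matrices also work on it. A short sketch (the name df_demo is just illustrative); it also shows that a single value such as 3 is recycled to fill its column, so rep(3, 4) is not strictly needed:

```r
# Rebuild the lesson's data frame, letting the single value 3 be recycled
df_demo <- data.frame(1:4, 3, c(2, 4, 1, 5))

dim(df_demo)    # rows, then columns: 4 3
nrow(df_demo)   # number of rows (observations)
ncol(df_demo)   # number of columns (variables)
df_demo[[2]]    # second column: the 3 was recycled to length 4
```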

1.1 Columns

To change the column names, we assign a character vector to colnames():

colnames(df1) <- c("c1","c2","c3")
df1
  c1 c2 c3
1  1  3  2
2  2  3  4
3  3  3  1
4  4  3  5
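Instead of renaming afterward, you can also supply the column names as argument names when you create the data frame. A minimal sketch that rebuilds the same df1; note that for a data frame, names() returns the same thing as colnames():

```r
# Name each column as the data frame is created
df1 <- data.frame(c1 = 1:4, c2 = rep(3, 4), c3 = c(2, 4, 1, 5))

colnames(df1)   # "c1" "c2" "c3"
names(df1)      # identical to colnames() for a data frame
```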

To work with one of the columns, we use the $ operator to pick it out of the data frame by name:

ourcol <- df1$c1
ourcol
[1] 1 2 3 4

ourcol is now a new vector with values taken from the first column of the data frame.
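The $ operator is one of several equivalent ways to extract a column. The sketch below (variable names a, b, d, and col_name are illustrative) compares them; double brackets are handy when the column name is stored in a variable, and single brackets with a single name return a one-column data frame rather than a vector:

```r
df1 <- data.frame(c1 = 1:4, c2 = rep(3, 4), c3 = c(2, 4, 1, 5))

a <- df1$c1          # by name with $
b <- df1[["c1"]]     # by name with [[ ]]
d <- df1[, 1]        # by position, matrix-style
identical(a, b) && identical(b, d)   # TRUE: all three give the same vector

col_name <- "c1"
df1[[col_name]]      # [[ ]] accepts a variable; df1$col_name would look for a column literally called col_name

df1["c1"]            # single brackets keep a one-column data frame
```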

1.2 Subsets

We can also pull out subsets of the data frame just as we did with matrices, using [rows, columns]:

df1[2:3,2:3]
  c2 c3
2  3  4
3  3  1

or we can pull out the same using column names:

df1[2:3,c("c2","c3")]
  c2 c3
2  3  4
3  3  1
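
What is df1[c(1,3),"c3"]?
Answer: 2 1, the values in rows 1 and 3 of column c3.
What is df1$c3[c(1,3)]?
Yup: 2 1 again. Here df1$c3 first extracts column c3 as a vector, and [c(1,3)] then picks its first and third elements.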
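Rows can also be selected with a logical vector, which is how you keep only the rows that meet a condition. A short sketch (keep is an illustrative name):

```r
df1 <- data.frame(c1 = 1:4, c2 = rep(3, 4), c3 = c(2, 4, 1, 5))

keep <- df1$c3 > 2      # logical vector: FALSE TRUE FALSE TRUE
df1[keep, ]             # rows 2 and 4, all columns (note the empty column slot)
df1[keep, "c1"]         # only column c1 for those rows: 2 4
```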

1.3 Mixed types

Most importantly, data frames allow us to combine columns of different types:

df1 <- cbind(df1,c("truck","car","lettuce","porkchop"))
colnames(df1)[4] <- "things"
df1
  c1 c2 c3   things
1  1  3  2    truck
2  2  3  4      car
3  3  3  1  lettuce
4  4  3  5 porkchop

To see the internal structure of our data frame, including what types the variables in it are, we can use the str function:

str(df1)
'data.frame':   4 obs. of  4 variables:
 $ c1    : int  1 2 3 4
 $ c2    : num  3 3 3 3
 $ c3    : num  2 4 1 5
 $ things: Factor w/ 4 levels "car","lettuce",..: 4 1 2 3
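A shorter route than cbind() followed by renaming is to assign a vector to a new column name with $, which creates and names the column in one step. A minimal sketch that rebuilds df1 so it runs on its own:

```r
df1 <- data.frame(c1 = 1:4, c2 = rep(3, 4), c3 = c(2, 4, 1, 5))

# Assigning to a name that does not exist yet adds a new column
df1$things <- c("truck", "car", "lettuce", "porkchop")
str(df1)
```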

1.4 Editing

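Using str() we saw that our final column has been stored as a factor rather than as a character vector. This is because R versions before 4.0.0 converted strings to factors by default when building data frames (stringsAsFactors = TRUE); in R 4.0.0 and later the column is already character, and str() shows chr. To convert a factor column to character, we can use our earlier technique: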

df1$things <- as.character(df1$things)
str(df1)
'data.frame':   4 obs. of  4 variables:
 $ c1    : int  1 2 3 4
 $ c2    : num  3 3 3 3
 $ c3    : num  2 4 1 5
 $ things: chr  "truck" "car" "lettuce" "porkchop"
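To see the factor behavior described above in a current version of R, you can create the factor explicitly, or ask data.frame() for the old default with stringsAsFactors = TRUE. The sketch below (things and df_old are illustrative names) also shows what a factor stores internally: sorted levels plus integer codes, which is where the 4 1 2 3 in the earlier str() output comes from:

```r
things <- factor(c("truck", "car", "lettuce", "porkchop"))
levels(things)         # sorted levels: "car" "lettuce" "porkchop" "truck"
as.integer(things)     # the stored codes: 4 1 2 3
as.character(things)   # back to the original strings

# Reproduce the pre-4.0.0 default explicitly
df_old <- data.frame(things = c("truck", "car"), stringsAsFactors = TRUE)
class(df_old$things)   # "factor"
```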

RStudio also allows you to view rectangular datasets such as data frames in a spreadsheet-like grid, which may be familiar from other statistics programs:

View(df1)

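You can even edit some data objects directly using

edit(df1)

But this is strongly discouraged, since it leaves no record of what you have done. (Note also that edit() returns an edited copy rather than changing df1, so your changes are lost unless you assign the result.) It is much better to use written commands, ideally saved in a script, to assign new values to your data.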
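Here is a sketch of what such recorded edits can look like, using the row/column indexing from earlier. The specific changes ("bus", setting c2 to 0) are just illustrations; stringsAsFactors = FALSE keeps things as character on older R versions too, so that a new value like "bus" can be assigned:

```r
df1 <- data.frame(c1 = 1:4, c2 = rep(3, 4), c3 = c(2, 4, 1, 5),
                  things = c("truck", "car", "lettuce", "porkchop"),
                  stringsAsFactors = FALSE)

df1[2, "things"] <- "bus"     # change one cell by row number and column name
df1$c2[df1$c1 > 2] <- 0       # change c2 in every row where c1 is greater than 2
df1
```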

2 Lists

Lists are the most general-purpose data containers in R that we will be using. A list can hold a collection of almost any kind of R object: single values, vectors, data frames, other lists, etc.

list1 <- list(1:5,df1,3)
list1
[[1]]
[1] 1 2 3 4 5

[[2]]
  c1 c2 c3   things
1  1  3  2    truck
2  2  3  4      car
3  3  3  1  lettuce
4  4  3  5 porkchop

[[3]]
[1] 3
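Because a list can contain other lists, indexing can be repeated to reach inside nested elements. A small sketch with an illustrative list called nested:

```r
df1 <- data.frame(c1 = 1:4, c3 = c(2, 4, 1, 5))

# The third element is itself a list
nested <- list(1:5, df1, list("a", TRUE))
length(nested)        # 3: the inner list counts as a single element
nested[[3]][[2]]      # TRUE: second element of the inner list
class(nested[[2]])    # "data.frame": each element keeps its own type
```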

2.1 List structure

Note that the objects in a list are indexed with the double bracket [[ ]] and can be picked out that way:

list1[[1]][3]
[1] 3

In the above example, the [[1]] picks out the first object in the list (the vector 1:5) and the [3] picks out the third element in that vector (3).

What is list1[[2]][2:4,3]?
Answer: 4 1 5, that is, the second through fourth items in the third column of the second element of list1.
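Single and double brackets do different things on a list: [[ ]] returns the element itself, while [ ] returns a smaller list containing the chosen elements. A sketch with an illustrative list:

```r
list1 <- list(1:5, "hello", 3)

a <- list1[1]        # single brackets: a list holding the first element
b <- list1[[1]]      # double brackets: the first element itself (the vector 1:5)
class(a)             # "list"
class(b)             # "integer"
length(list1[2:3])   # single brackets can select several elements: 2
```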

2.2 List naming

We haven’t given names to the three objects in list1, so currently they can only be picked out with numbers in [[ ]], just as a data frame without column names can only be identified with column numbers. Naming the objects in a list works much like naming data frame columns, using names() instead of colnames():

names(list1) <- c("onetofive","df1","three")
list1$onetofive[3]
[1] 3
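As with data frames, list elements can also be named when the list is created, and named elements can be reached with [["name"]] as well as $. A sketch that rebuilds a simplified list1:

```r
df1 <- data.frame(c1 = 1:4, c3 = c(2, 4, 1, 5))

# Name the elements as the list is created
list1 <- list(onetofive = 1:5, df1 = df1, three = 3)
names(list1)
list1[["onetofive"]][3]   # same as list1$onetofive[3]
list1$df1$c3              # $ can be chained to reach inside a nested object
```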

Note that str() shows you the internal structures of lists just as it does for data frames or any other object.

str(list1)
List of 3
 $ onetofive: int [1:5] 1 2 3 4 5
 $ df1      :'data.frame':  4 obs. of  4 variables:
  ..$ c1    : int [1:4] 1 2 3 4
  ..$ c2    : num [1:4] 3 3 3 3
  ..$ c3    : num [1:4] 2 4 1 5
  ..$ things: chr [1:4] "truck" "car" "lettuce" "porkchop"
 $ three    : num 3
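Named lists can also grow and shrink: assigning to a new name appends an element, and assigning NULL removes one. A minimal sketch (greeting is an illustrative name):

```r
list1 <- list(onetofive = 1:5, three = 3)

list1$greeting <- "hello"   # a new name appends an element at the end
list1[["three"]] <- NULL    # assigning NULL removes an element
names(list1)                # "onetofive" "greeting"
```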

2.3 R’s internal data

Many R objects, including data frames and the outputs of many functions, are lists. The objects in your workspace are stored in the global environment, which behaves much like a named list, and their names can be listed with

ls()
[1] "df1"    "list1"  "ourcol"

This is also what the “Environment” pane in RStudio shows.
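To check the claim that many objects are lists, is.list() can be applied to a data frame and to a fitted model. The sketch below uses lm() (a simple linear regression, chosen only as a familiar example of a function whose output is a list); fit is an illustrative name:

```r
df1 <- data.frame(c1 = 1:4, c3 = c(2, 4, 1, 5))

is.list(df1)                       # TRUE: a data frame is a list of equal-length columns
fit <- lm(c3 ~ c1, data = df1)     # regress c3 on c1
is.list(fit)                       # TRUE: the model object is a list too
names(fit)                         # its named components
fit$coefficients                   # pick one out with $, as with any named list
```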

2.4 RData file format

R can read many data formats, including those from SPSS, Stata, SAS, and others. We will discuss reading and manipulating external data in a later lesson, but it is useful to know up front about R’s native .RData format, which can store multiple R objects, including lists. For instance, here we save our simple data frame as an .RData file, remove that object from R, and then reload it from our file.

save(df1,file="df1.RData")
rm(df1)
load(file="df1.RData",verbose=TRUE)
Loading objects:
  df1
df1
  c1 c2 c3   things
1  1  3  2    truck
2  2  3  4      car
3  3  3  1  lettuce
4  4  3  5 porkchop

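Note that the .RData file stores the objects together with their names, so load() restores df1 under its original name without any assignment (and replaces any existing object called df1).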
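For a single object, the base functions saveRDS() and readRDS() (not covered in this lesson) are an alternative: they store the object without its name, so you choose the name when reading it back. The sketch writes to tempfile() paths in R's per-session temporary directory so nothing is left in your working directory; it also shows save() bundling two objects, and that load() invisibly returns the names it restored. Names such as path, df_copy, and loaded are illustrative:

```r
df1 <- data.frame(c1 = 1:4, c3 = c(2, 4, 1, 5))
path <- tempfile(fileext = ".rds")

saveRDS(df1, file = path)    # store the object only, not its name
df_copy <- readRDS(path)     # assign the result to any name you like
identical(df1, df_copy)      # TRUE

# save() can bundle several objects into one .RData file
rdata_path <- tempfile(fileext = ".RData")
list1 <- list(onetofive = 1:5)
save(df1, list1, file = rdata_path)
rm(df1, list1)
loaded <- load(rdata_path)   # names of the restored objects: df1 and list1
loaded
```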

Since an .RData file can hold any number of named objects, it is easy to save your entire workspace at once. By default save.image() writes everything to a file named .RData in the working directory, and both R and RStudio offer to do this for you when you quit:

save.image() 

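On macOS and Linux this file is hidden, because its name begins with a dot. R (and RStudio, with its default settings) reloads it automatically at startup when it is in the working directory; to reload it manually, write: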

load(".RData")

Finally, to erase everything in your workspace, you can write:

rm(list=ls())
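One caveat: ls() skips objects whose names begin with a dot, so rm(list=ls()) leaves them in place. A sketch with illustrative object names showing the difference, and how ls(all.names = TRUE) reveals the hidden ones:

```r
.hidden <- "kept"       # names starting with a dot are hidden from ls()
scratch <- 1:3
ls()                    # lists scratch but not .hidden
rm(list = ls())         # removes only the names ls() returned
exists(".hidden")       # TRUE: the hidden object survived
ls(all.names = TRUE)    # includes hidden names such as .hidden
```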