Overview
In this module we go over the basic mathematical tools and variable types built into R.
Objectives
After completing this module, students should be able to:
Reading
Lander, Chapter 4.1-4.4.
Now that we have installed R and RStudio, we can start playing with it!
Here is perhaps the simplest task you can give R:
1+1
[1] 2
The most direct approach is to simply type 1+1 into the console and hit return. The answer (surprise) is 2; we’ll explain what that [1] is doing there in a minute.
R knows the usual mathematical operations: + - * / ^
as well as many others.
Give it a try: what is 3.14^3+17/2? If you got 39.45914, congratulations: your R isn’t broken!
R can also do many other things:
sqrt(2)
[1] 1.414214
sqrt(-1)
Warning in sqrt(-1): NaNs produced
[1] NaN
Note that this produces a NaN (Not a Number) as a response. R can also handle imaginary numbers, but we won’t be needing that.
log(1000)
[1] 6.907755
Note that log is the natural log unless a base is specified, e.g.: log(1000,10).
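As a minimal sketch of how the base can be set: the named argument `base =` and the helpers `log10()`, `log2()`, and `exp()` are standard R functions, but they are not used elsewhere in this module and are shown here only for illustration.

```r
# Natural log (base e) is the default
log(1000)               # prints 6.907755

# The second argument sets the base; naming it makes the intent explicit
log(1000, base = 10)    # prints 3

# Shortcuts for two common bases
log10(1000)             # 3
log2(8)                 # 3

# exp() undoes the natural log (up to floating-point rounding)
exp(log(1000))          # prints 1000
```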
R knows some fundamental constants:
pi
[1] 3.141593
And it can round:
round(5.75,1)
[1] 5.8
Again, the second number specifies the option – in this case, the number of decimal places to round to. We’ll see more of such functions in a minute.
And of course order of operations matters. Which is larger, 3.14^(3+17)/2 or 3.14^(3+17/2)?
The first is about 4340731928 (R handles big numbers well), while the second is only about 518431. Watch out for those parentheses!
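To make the precedence rules concrete, here is a small sketch. The variable names `a` and `b` are hypothetical and chosen only for this illustration.

```r
# ^ is evaluated before / and *, which are evaluated before + and -
a <- 3.14^3 + 17/2         # same as (3.14^3) + (17/2)
b <- (3.14^3 + 17) / 2     # parentheses force the sum to happen first
c(a, b)                    # two different answers

a == (3.14^3) + (17/2)     # TRUE

# Exponentiation also binds tighter than a leading minus sign
-2^2                       # -4, i.e. -(2^2)
(-2)^2                     # 4
```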
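Unlike some programming languages, R lets almost any string of letters, digits, periods, and underscores serve as a variable name, as long as it starts with a letter and is not a reserved word such as TRUE or if.
To assign a variable, rather than use the = sign (though that also works), we use <-, which reminds us that the thing on the right is being saved in the thing on the left.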
i <- 3
porkchop <- log(6)
To evaluate a variable, we can just type it and hit enter.
porkchop
[1] 1.791759
And to remove it:
rm(i)
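The life cycle of a variable can be sketched as follows. The name `my_temp_value` is hypothetical, and `exists()` and `ls()` are standard R functions that this module does not otherwise cover.

```r
my_temp_value <- 3           # assign with <-
my_temp_value = 3            # = has the same effect at the top level

exists("my_temp_value")      # TRUE: the name is defined
ls()                         # lists the variables currently in the workspace

rm(my_temp_value)            # remove it
exists("my_temp_value")      # FALSE: the name is gone
```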
As you may have learned in a statistics class, data comes in many varieties: integers (ordinal), continuous numbers (cardinal), factors (categorical), logical (binary), characters (strings), dates, etc.
class() will tell us what type something already is:
class(2.8)
[1] "numeric"
We can also directly inquire about specific types with is.TYPE, and often coerce a variable of one type into another with as.TYPE:
is.character("2014-08-13")
[1] TRUE
properdate <- as.Date("2014-08-13")
Note the default date format (year-month-day); see ?as.Date for more formats.
Is it this type? | Make it this type |
---|---|
is.numeric() | as.numeric() |
is.character() | as.character() |
is.factor() | as.factor() |
NA | as.Date() |
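The coercion functions in the table can be sketched as follows. The variable names `x`, `y`, and `d` are hypothetical, and the `format =` string is one example of the alternative layouts described in ?as.Date.

```r
x <- "3.14"                  # a character string that looks like a number
class(x)                     # "character"
is.numeric(x)                # FALSE

y <- as.numeric(x)           # coerce it to a number
y + 1                        # 4.14

# Dates that are not in the default layout need an explicit format string
d <- as.Date("08/13/2014", format = "%m/%d/%Y")
class(d)                     # "Date"
d == as.Date("2014-08-13")   # TRUE: the same day, written two ways
```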
Logical (TRUE/FALSE) is another type, and is the output of many basic tests: equality ==, inequality !=, greater than >, less-than-or-equal-to <=, etc.
1 == 2
[1] FALSE
1 <= porkchop
[1] TRUE
A vector is an ordered sequence of values of a single type, such as numbers. One way to create a vector is with c():
v <- c(1,5,9,7,2)
v
[1] 1 5 9 7 2
There are shortcuts for creating sequences and repeated values:
1:10
[1] 1 2 3 4 5 6 7 8 9 10
rep(3,5)
[1] 3 3 3 3 3
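A few more ways to build sequences, as a sketch. `seq()` and the `times =` and `each =` arguments of `rep()` are standard R, though this module does not otherwise use them.

```r
seq(2, 10, by = 2)            # 2 4 6 8 10
seq(0, 1, length.out = 5)     # 0.00 0.25 0.50 0.75 1.00

rep(c(1, 2), times = 3)       # 1 2 1 2 1 2
rep(c(1, 2), each = 3)        # 1 1 1 2 2 2

10:1                          # the colon also counts down: 10 9 8 ... 1
```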
You can get the length of a vector:
length(v)
[1] 5
And you can easily pick out specific elements from the vector:
v[3]
[1] 9
v[3:5]
[1] 9 7 2
v[length(v)]
[1] 2
Note how length(v), which is a function that gives the length of v, is evaluated inside the brackets, and thus v[length(v)] gives you the 5th element of v. You can put quite elaborate expressions inside those brackets if you want, but for anything complicated it is generally better to define it first and then put it inside the brackets. For instance,
myelements <- c(1,2,length(v))
myelements
[1] 1 2 5
v[myelements]
[1] 1 5 2
is more readable than trying to cram it all into one line with v[c(1,2,length(v))].
In addition to being able to use anything that produces a vector of numbers, you can also pick out elements with anything that produces a vector of TRUE/FALSE values. The TRUE values will be selected, and the FALSE values will be ignored. So for instance, v > 3 will produce as its output a vector of TRUE and FALSE values by testing whether each element of v is greater than 3. You can then put that inside the brackets, so that the vector of TRUE/FALSE values selects only those elements of v that get TRUE.
myvals <- v > 3
myvals
[1] FALSE TRUE TRUE TRUE FALSE
v[myvals]
[1] 5 9 7
# or equivalently
v[v > 3]
[1] 5 9 7
This is a very powerful approach to selecting subsets of vectors and other objects in R. Another related way to do this is using which(), which instead of giving a bunch of TRUE/FALSE values, gives you the element numbers associated with the TRUE elements. This can also be put inside the brackets, for the same result.
which(v > 3)
[1] 2 3 4
v[which(v > 3)]
[1] 5 9 7
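The same selection ideas on a fresh vector, as a sketch. The data in `temps` are made up, and combining two tests with `&` is standard R that goes slightly beyond what this module covers.

```r
temps <- c(12, 25, 31, 18, 27)     # hypothetical data
hot <- temps > 20                  # FALSE TRUE TRUE FALSE TRUE

temps[hot]                         # 25 31 27
which(hot)                         # 2 3 5
sum(hot)                           # 3, because each TRUE counts as 1

# Two conditions can be combined element by element with &
temps[temps > 20 & temps < 30]     # 25 27
```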
You can also reassign values easily using the same notation. Basically to replace or modify anything in R, you just write over it.
v[3] <- 100
v
[1] 1 5 100 7 2
v[3:5] <- c(17,0,11)
v
[1] 1 5 17 0 11
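What is v[c(2,4)]?
5 17 0: not quite, that is v[2:4].
5 7: not quite, that is the old v! Remember v has been changed.
5 0: correct, those are the second and fourth elements of the current v.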
Using the TRUE/FALSE approach, you can selectively replace elements in your R object:
z <- v
z[z > 3] <- 100
z
[1] 1 100 100 0 100
# or equivalently
z <- v
z[which(z > 3)] <- 100
z
[1] 1 100 100 0 100
Standard mathematical operations on vectors affect each element individually:
3+v
[1] 4 8 20 3 14
Vectors of the same length can be added, subtracted, multiplied, and so on, which again works element by element:
w <- 1:5
w+v
[1] 2 7 20 4 16
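The text above covers vectors of the same length. As a hedged aside on what R does otherwise: when lengths differ, it reuses ("recycles") the shorter vector, which is also why adding a single number to a vector works. The names `long` and `short` are hypothetical.

```r
long  <- c(1, 2, 3, 4)
short <- c(10, 20)

long + short    # 11 22 13 24: short is reused as 10 20 10 20
long * 2        # 2 4 6 8: the single number 2 is reused for every element

# If the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning.
```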
As you may recall from linear algebra, a vector V with n elements can be considered as a point in n-dimensional space, often represented by an arrow pointing from the origin to that point.
The above image shows the addition of two vectors: (4,1)+(3,4)=(7,5).
The inner or dot product is a measure of how similar two vectors are, and is the sum of the pairwise products of their elements:
\(w \cdot v = \sum_{i=1}^n w_i*v_i = w_1*v_1+w_2*v_2+ ... +w_n*v_n\)
This can be calculated with R, via the %*% operator (to distinguish it from *):
w %*% v
[,1]
[1,] 117
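The sum-of-products definition can be checked directly, as in this sketch. The vectors `a` and `b` are hypothetical.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

a * b                  # element-wise products: 4 10 18
sum(a * b)             # 32, the dot product as a plain number

a %*% b                # the same value, returned as a 1x1 matrix
as.numeric(a %*% b)    # 32, with the matrix wrapper dropped
```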
Matrices are rectangular or tabular arrays of data. Although they look like data sets, our data will usually be stored in the data.frame format, which allows for mixes of data types. But matrices are a good introduction to how data can be organized in rows and columns.
One way to make a matrix is to bind two vector columns together:
m <- cbind(w,v)
m
w v
[1,] 1 1
[2,] 2 5
[3,] 3 17
[4,] 4 0
[5,] 5 11
This is a 5x2 matrix.
You can also bind two vectors as rows together:
m <- rbind(w,v)
m
[,1] [,2] [,3] [,4] [,5]
w 1 2 3 4 5
v 1 5 17 0 11
Vectors are flexible, in that they can act as either rows or columns here.
But we can do the same with matrices, and then it matters: cbind() glues them side-by-side, and rbind() glues them above and below.
rbind(m,m)
[,1] [,2] [,3] [,4] [,5]
w 1 2 3 4 5
v 1 5 17 0 11
w 1 2 3 4 5
v 1 5 17 0 11
Note that R retains the vector names (w, v) as row names in the new matrices.
We can also make a matrix directly, via matrix():
u <- matrix(1:6,nrow=2)
u
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
(Note the order in which the numbers fill the matrix: down each column first, then across.)
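The fill order can be changed, as this sketch shows. `byrow` is a standard argument of matrix() that this module does not otherwise use, and the names `by_col` and `by_row` are hypothetical.

```r
by_col <- matrix(1:6, nrow = 2)                  # default: fill down the columns
by_col[1, ]                                      # first row is 1 3 5

by_row <- matrix(1:6, nrow = 2, byrow = TRUE)    # fill across the rows instead
by_row[1, ]                                      # first row is 1 2 3
```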
We can specify the number of rows and columns and fill it up with NAs, say:
matrix(NA,nrow=3,ncol=2)
[,1] [,2]
[1,] NA NA
[2,] NA NA
[3,] NA NA
NA is “not available,” and is R’s standard way of denoting missing data. We might create a matrix of NAs to later replace with pieces of data.
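A minimal sketch of that fill-in-later pattern. The matrix `results` and its values are hypothetical, and `is.na()` is a standard R function for testing for missing values.

```r
results <- matrix(NA, nrow = 3, ncol = 2)   # placeholder matrix, all missing

results[1, ] <- c(10, 20)                   # fill the whole first row
results[2, 1] <- 30                         # fill a single cell

is.na(results)                              # TRUE wherever data is still missing
sum(is.na(results))                         # 3 cells remain empty
```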
We can refer to individual elements via:
u[2,3]
[1] 6
This is the element from the second row, third column.
And as with vectors, we can select subsets:
u[2,1:2]
[1] 2 4
u[,1:2]
[,1] [,2]
[1,] 1 3
[2,] 2 4
Note that in the second case, the blank before the comma is a shortcut for asking for everything – in this case, all the rows.
And also as with vectors, we can apply logical operators:
u == 3
[,1] [,2] [,3]
[1,] FALSE TRUE FALSE
[2,] FALSE FALSE FALSE
which(u == 3)
[1] 3
Note that the elements in a matrix are ordered as we saw them assigned: down first, then across.
u[which(u == 3)] <- 5
u
[,1] [,2] [,3]
[1,] 1 5 5
[2,] 2 4 6
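The down-first, then-across ordering means a matrix cell can be addressed either by a single position or by row and column. In this sketch, the matrix `g` is hypothetical, and `arr.ind = TRUE` is a standard option of which() that this module does not otherwise use.

```r
g <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 2)   # columns are (7,8), (9,10), (11,12)

g[3]                                  # 9: the third element, counting down the columns
g[1, 2]                               # 9: the same cell by row and column

which(g == 9)                         # 3, the single position
pos <- which(g == 9, arr.ind = TRUE)  # the same cell as a row/column pair
pos                                   # row 1, col 2
```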
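How might we make the first and third columns of u all 1s?
u[1,3] <- c(1,1,1,1,1): no, u[1,3] refers only to the single element in the first row, third column.
u[,c(1,3)] <- rep(1,4): correct.
u[,1:3] <- 1: no, this fills all of u with 1s, but note that R does know to fill it all with 1 without our having specified so.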
If we want to know the dimensions of a matrix, that is easily done:
dim(u)
[1] 2 3
ncol(u)
[1] 3
nrow(u)
[1] 2
Matrices can also be transposed (flipped around the diagonal).
t(u)
[,1] [,2]
[1,] 1 2
[2,] 5 4
[3,] 5 6
As with vectors, basic addition happens element-by-element:
u_plus_3 <- u + 3
u_plus_3
[,1] [,2] [,3]
[1,] 4 8 8
[2,] 5 7 9
Matrices, like vectors, have a spatial interpretation. A matrix is a “linear operator” that transforms one vector into another vector.
We don’t have the time to go into the foundations of linear algebra, but just as we can take the inner or dot product of two vectors, we can also take the product of two matrices. If we multiply a matrix \(M_1\) times a vector \(v_1\), we get a new vector \(v_2\); if we multiply that vector \(v_2\) by another matrix \(M_2\), we get \(v_3\). But we can also get \(v_3\) if we first multiply \(M_2\) and \(M_1\) to get \(M_{21}\), and then multiply \(v_1\) by \(M_{21}\), which directly yields \(v_3\).
That is, if \(M_1 v_1 = v_2\) and \(M_2 v_2 = v_3\), then if \(M_{21} = M_2 M_1\), we also can say \(M_{21} v_1 = v_3\). \(M_{21}\) is just a new matrix that combines the operations of \(M_1\) and \(M_2\) into a single matrix. Thanks to the rules of matrix multiplication, we can easily generate \(M_{21}\) by multiplying \(M_2\) and \(M_1\), but like so much of math, these mathematical tools are useful for many different tasks.
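The claim that \(M_{21} v_1 = v_3\) can be checked numerically, as in this sketch. The matrices `M1` and `M2` and the vector `v1` are hypothetical values chosen only for illustration.

```r
M1 <- matrix(c(1, 0, 2, 1), nrow = 2)   # rows are (1, 2) and (0, 1)
M2 <- matrix(c(0, 1, 1, 0), nrow = 2)   # swaps the two coordinates
v1 <- c(3, 4)

v2 <- M1 %*% v1        # apply M1 first: (11, 4)
v3 <- M2 %*% v2        # then M2: (4, 11)

M21 <- M2 %*% M1       # combine both steps into a single matrix
M21 %*% v1             # (4, 11) again, in one step

all.equal(M21 %*% v1, v3)   # TRUE
```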
A matrix \(M\) with x rows and y columns times a vector \(v\) with y elements, \(M v\), yields a new vector with x elements, where the ith element is the dot product of the vector \(v\) and the ith row of the matrix.
An \(i \times j\) matrix A times a \(j \times k\) matrix B is a new \(i \times k\) matrix N, where the element of N in row r and column c is the dot product of row r of matrix A with column c of matrix B.
In this image, A is 4x2, B is 2x3, and their product, C, is 4x3. \(C = AB\). Note that \(AB\) is not the same as \(BA\): for matrices, the order of multiplication matters. In fact, in this case \(BA\) is not even defined, because their sizes don’t match. (Flip A and B on the diagram and try to take the inner product of the rows of B with the columns of A; it can’t be done because they aren’t the same length.)
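The size rules can be checked in R, as in this sketch. The matrices below only match the 4x2 and 2x3 shapes described. Their values are hypothetical, and `try()` is used simply to capture the error that R raises for the undefined product.

```r
A <- matrix(1:8, nrow = 4)    # 4x2
B <- matrix(1:6, nrow = 2)    # 2x3

C <- A %*% B
dim(C)                        # 4 3

# Each element of C is a row of A dotted with a column of B
C[1, 1] == sum(A[1, ] * B[, 1])    # TRUE

# B %*% A is not defined: B has 3 columns but A has 4 rows
result <- try(B %*% A, silent = TRUE)
inherits(result, "try-error")      # TRUE: R refused to multiply them
```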
For example, here is a matrix times a vector in R. Note the use of the special %*% notation in R, to distinguish matrix multiplication from regular multiplication.
u
[,1] [,2] [,3]
[1,] 1 5 5
[2,] 2 4 6
v <- 1:3
v
[1] 1 2 3
u %*% v
[,1]
[1,] 26
[2,] 28
We can think of the vector v as being equivalent to a matrix with just one column, so the rule for multiplying a matrix times a vector is just a special case of multiplying two matrices.
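That equivalence can be seen directly in this sketch. The names `M`, `x`, and `col_x` are hypothetical.

```r
M <- matrix(1:6, nrow = 2)         # a 2x3 matrix
x <- c(1, 0, 2)                    # a plain vector with 3 elements

col_x <- matrix(x, ncol = 1)       # the same numbers as a 3x1 matrix
dim(col_x)                         # 3 1

M %*% x                            # a 2x1 result: 11 and 14
all.equal(M %*% x, M %*% col_x)    # TRUE: both give the same product
```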
Try to do this matrix multiplication by hand before putting it into R. (If needed, use the image from the “Matrix Multiplication” slide to remind yourself how to multiply.)
Say we define w as:
w <- t(u)
w
[,1] [,2]
[1,] 1 2
[2,] 5 4
[3,] 5 6
What is u %*% w?
Be careful not to compute w %*% u instead. The order of multiplication matters for matrices!
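Once you have an answer by hand, one way to sanity-check it without giving the numbers away is to compare shapes, as in this sketch. It rebuilds u with the values shown above, and `isSymmetric()` is a standard R function not otherwise used in this module.

```r
u <- matrix(c(1, 2, 5, 4, 5, 6), nrow = 2)   # the u from this section
w <- t(u)

dim(u %*% w)            # 2 2
dim(w %*% u)            # 3 3: a different product altogether

# A matrix times its own transpose is always symmetric
isSymmetric(u %*% w)    # TRUE
```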