In this module we go over the basic mathematical tools and variable types built into R.
After completing this module, students should be able to:
Lander, Chapter 4.1-4.4.
Now that we have installed R and RStudio, we can start playing with it!
Here is perhaps the simplest task you can give R:
1+1
[1] 2
The most direct approach is to simply type 1+1 into the console and hit return. The answer (surprise) is 2; we’ll explain what that [1] is doing there in a minute.
R knows the usual mathematical operations (+ - * / ^) as well as many others.
Give it a try: what is 3.14^3+17/2?
If you got 39.45914, congratulations: your R isn’t broken!
R can also do many other things:
sqrt(2)
[1] 1.414
sqrt(-1)
Warning: NaNs produced
[1] NaN
Note that this produces a NaN (Not a Number) as a response. R can also handle imaginary numbers, but we won’t be needing that.
log(1000)
[1] 6.908
Note that log is the natural log unless otherwise specified; a second number sets the base, e.g. log(1000,10) for the base-10 log.
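As a small sketch of the options for logarithms, the lines below use the named base argument along with the built-in shortcuts log10() and log2(). The variable names are only illustrative.

```r
# Natural log (base e) is the default
natural <- log(1000)

# The base can be given by position, or more readably by name
base10 <- log(1000, base = 10)

# Built-in shortcuts for two common bases
also_base10 <- log10(1000)
base2 <- log2(8)

# exp() undoes the natural log
round_trip <- exp(natural)

base10
base2
round_trip
```

Naming the argument (base = 10) makes the code easier to read later than relying on its position.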
R knows some fundamental constants:
pi
[1] 3.142
And it can round numbers:
round(5.75,1)
[1] 5.8
Again, the second number specifies the option – in this case, the number of decimal places to round to. We’ll see more of such functions in a minute.
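A few related rounding functions are sketched below. The value x is a made-up example, chosen so that none of the results falls exactly halfway between two candidates.

```r
x <- 1234.5678                        # hypothetical example value

one_decimal <- round(x, digits = 1)   # 1234.6
hundreds    <- round(x, digits = -2)  # negative digits round to the left of the decimal point: 1200
two_sig     <- signif(x, 2)           # keep 2 significant digits: 1200
down        <- floor(x)               # largest whole number not above x: 1234
up          <- ceiling(x)             # smallest whole number not below x: 1235

c(one_decimal, hundreds, two_sig, down, up)
```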
And of course order of operations matters. Which is larger, 3.14^(3+17)/2 or 3.14^(3+17/2)?
The first is 4340731928 (R handles big numbers well).
The second is 518431. Watch out for those parentheses!
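To make the precedence rules concrete, here is a short sketch with made-up numbers. Exponents are applied first, then multiplication and division, then addition and subtraction, and parentheses override all of these.

```r
a <- 2 + 3 * 4     # multiplication before addition: 14
b <- (2 + 3) * 4   # parentheses first: 20
neg1 <- -2^2       # the exponent is applied before the minus sign: -4
neg2 <- (-2)^2     # 4
tower <- 2^3^2     # exponents group right to left, i.e. 2^(3^2): 512

c(a, b, neg1, neg2, tower)
```

When in doubt, adding parentheses costs nothing and documents the intended order.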
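Unlike some programming languages, R lets almost any string of letters, digits, periods, and underscores be a variable name, as long as it starts with a letter (or with a period that is not followed by a digit). Names are case-sensitive.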
To assign a variable, rather than use the = sign (though that also works), we use <-, which reminds us that the thing on the right is being saved in the thing on the left.
i <- 3
porkchop <- log(6)
To evaluate a variable, we can just type it and hit enter.
porkchop
[1] 1.792
And to remove it:
rm(i)
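The life cycle of a variable can be sketched as follows. The name temp_value is hypothetical, and exists() and ls() are used here only to inspect what is currently defined.

```r
temp_value <- 10                        # create a variable (hypothetical name)
temp_value <- temp_value * 2            # overwrite it using its own current value

defined_before <- exists("temp_value")  # TRUE: the name is defined
ls()                                    # lists the names currently defined in the workspace

rm(temp_value)                          # remove it
defined_after <- exists("temp_value")   # FALSE: the name is gone

c(defined_before, defined_after)
```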
As you may have learned in a statistics class, data come in many varieties: integers (ordinal), continuous numbers (cardinal), factors (categorical), logical (binary), characters (strings), dates, etc.
class() will tell us what type something already is:
class(2.8)
[1] "numeric"
We can also directly inquire about specific types with is.TYPE, and often coerce a variable of one type into another with as.TYPE:
is.character("2014-08-13")
[1] TRUE
properdate <- as.Date("2014-08-13")
Note the default date format (year-month-day); see ?as.Date for more formats.
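A minimal sketch of checking and coercing types, including a date that is not in the default format. The input strings are made up for illustration; the format codes %m, %d and %Y stand for month, day and four-digit year.

```r
class(3L)        # "integer": the L suffix marks a whole number
class("2.8")     # "character": quotes make it text, not a number

num <- as.numeric("2.8")                    # character -> numeric
bad <- suppressWarnings(as.numeric("abc"))  # cannot be converted, so the result is NA

# A date written month/day/year needs an explicit format
d <- as.Date("08/13/2014", format = "%m/%d/%Y")
class(d)         # "Date"
later <- d + 30  # dates support arithmetic in days

num
bad
later
```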
| Is it this type? | Make it this type |
|---|---|
| is.numeric() | as.numeric() |
| is.character() | as.character() |
| is.factor() | as.factor() |
| NA | as.Date() |
Logical (TRUE/FALSE) is another type, and is the output of many basic tests: equality (==), inequality (!=), greater than (>), less-than-or-equal-to (<=), etc.
1 == 2
[1] FALSE
1 <= porkchop
[1] TRUE
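Logical values can also be combined and counted. The sketch below uses a made-up value a together with the operators & (and), | (or) and ! (not).

```r
a <- 5                               # hypothetical example value

in_range <- (a > 1) & (a <= 10)      # both conditions hold: TRUE
either   <- (a < 0) | (a == 5)       # at least one holds: TRUE
negated  <- !(a == 5)                # flips TRUE to FALSE
as_count <- TRUE + TRUE              # TRUE counts as 1 in arithmetic: 2

class(in_range)                      # "logical"
c(in_range, either, negated)
as_count
```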
A vector is an ordered sequence of values of the same type, most often numbers. One way to create a vector is with c():
v <- c(1,5,9,7,2)
v
[1] 1 5 9 7 2
There are shortcuts for creating sequences and repeated values:
1:10
[1] 1 2 3 4 5 6 7 8 9 10
rep(3,5)
[1] 3 3 3 3 3
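For sequences that do not step by 1, seq() offers more control, and rep() can repeat whole vectors. A short sketch with made-up values:

```r
s1 <- seq(0, 1, by = 0.25)         # 0.00 0.25 0.50 0.75 1.00
s2 <- seq(10, 1, by = -3)          # counting down: 10 7 4 1
s3 <- seq(0, 100, length.out = 5)  # 5 evenly spaced values: 0 25 50 75 100

r1 <- rep(c(1, 2), times = 3)      # repeat the whole vector: 1 2 1 2 1 2
r2 <- rep(c(1, 2), each = 3)       # repeat each element: 1 1 1 2 2 2

s1
s2
s3
r1
r2
```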
You can get the length of a vector:
length(v)
[1] 5
And you can easily pick out specific elements from the vector:
v[3]
[1] 9
v[3:5]
[1] 9 7 2
v[length(v)]
[1] 2
You can also reassign values easily using the same notation:
v[3] <- 100
v
[1] 1 5 100 7 2
v[3:5] <- c(17,0,11)
v
[1] 1 5 17 0 11
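What is v[c(2,4)]?
5 17 0: No, that would be v[2:4].
5 7: No, that comes from the original v! Remember v has been changed.
5 0: Yes, these are the second and fourth elements of the current v.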
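A few more indexing patterns, sketched on a separate made-up vector x so that v is left untouched. Negative indices drop elements rather than select them.

```r
x <- c(10, 20, 30, 40, 50)   # hypothetical vector, separate from v

picked   <- x[c(1, 5)]       # first and fifth elements: 10 50
no_first <- x[-1]            # everything except the first: 20 30 40 50
no_two   <- x[-(1:2)]        # everything except the first two: 30 40 50
reorder  <- x[c(5, 1, 1)]    # indices can repeat and come in any order: 50 10 10

picked
no_first
no_two
reorder
```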
Standard mathematical operations on vectors affect each element individually:
3+v
[1] 4 8 20 3 14
Vectors of the same length can be added, etc, which again works by element:
w <- 1:5
w+v
[1] 2 7 20 4 16
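When the two vectors differ in length, R repeats ("recycles") the shorter one to match the longer. The sketch below uses made-up vectors; if the longer length is not a multiple of the shorter, R still returns a result but issues a warning.

```r
long_vec  <- c(1, 2, 3, 4)          # hypothetical values
short_vec <- c(10, 20)

# short_vec is reused as 10 20 10 20
recycled <- long_vec + short_vec    # 11 22 13 24

recycled
```

Adding a single number to a vector, as in 3+v, is the simplest case of this rule.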
You can also do logical checks on vectors, and get the element numbers as well:
v < 3
[1] TRUE FALSE FALSE TRUE FALSE
which(v < 3)
[1] 1 4
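The TRUE/FALSE result of such a check can itself be used as an index, or counted. A sketch, using a separate vector x with the same values that v holds at this point:

```r
x <- c(1, 5, 17, 0, 11)       # same values as v at this point in the text

small     <- x[x < 3]         # keep only elements where the test is TRUE: 1 0
n_small   <- sum(x < 3)       # TRUE counts as 1, so this counts the matches: 2
any_big   <- any(x > 15)      # is at least one element above 15? TRUE
all_nonneg <- all(x >= 0)     # are all elements non-negative? TRUE

small
n_small
```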
As you may recall from linear algebra, a vector V with n elements can be considered as a point in n-dimensional space, often represented by an arrow pointing from the origin to that point.
The above image shows the addition of two vectors: (4,1)+(3,4)=(7,5).
The inner or dot product is a measure of how similar two vectors are, and is the sum of the pairwise products of their elements:
\(w \cdot v = \sum_{i=1}^n w_i v_i = w_1 v_1 + w_2 v_2 + \dots + w_n v_n\)
This can be calculated in R via the %*% operator (to distinguish it from *, which multiplies element by element):
w %*% v
[,1]
[1,] 117
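To connect the formula to the operator, the sketch below computes a dot product both ways on two made-up vectors, a and b. Note that %*% returns a 1x1 matrix rather than a plain number.

```r
a <- c(1, 2, 3)                  # hypothetical vectors
b <- c(4, 5, 6)

dot_sum <- sum(a * b)            # sum of pairwise products: 4 + 10 + 18 = 32
dot_mat <- a %*% b               # same value, stored as a 1x1 matrix
dot_num <- as.numeric(dot_mat)   # convert back to a plain number

dot_sum
dim(dot_mat)
dot_num
```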
Matrices are rectangular or tabular arrays of data. Although they look like data sets, our data will usually be stored in the data.frame format, which allows for mixes of data types. But matrices are a good introduction to how data can be organized in rows and columns.
One way to make a matrix is to bind two vector columns together:
m <- cbind(w,v)
m
w v
[1,] 1 1
[2,] 2 5
[3,] 3 17
[4,] 4 0
[5,] 5 11
This is a 5x2 matrix.
You can also bind two vectors as rows together:
m <- rbind(w,v)
m
[,1] [,2] [,3] [,4] [,5]
w 1 2 3 4 5
v 1 5 17 0 11
Vectors are flexible, in that they can act as either rows or columns here.
But we can do the same with matrices, and then it matters: cbind() glues them side-by-side, and rbind() glues them above and below.
rbind(m,m)
[,1] [,2] [,3] [,4] [,5]
w 1 2 3 4 5
v 1 5 17 0 11
w 1 2 3 4 5
v 1 5 17 0 11
Note that R retains the vector names (w, v) as row names in the new matrices.
We can also make a matrix directly, via matrix():
u <- matrix(1:6,nrow=2)
u
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
Note the order in which the numbers fill the matrix: down the first column, then down the second, and so on.
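If row-by-row filling is wanted instead, matrix() has a byrow argument. A small sketch (the names by_col and by_row are only illustrative):

```r
by_col <- matrix(1:6, nrow = 2)                 # default: fill down each column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill across each row instead

by_col
by_row
```

Here by_row has 1 2 3 in its first row and 4 5 6 in its second.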
We can specify the number of rows and columns and fill it up with NAs, say:
matrix(NA,nrow=3,ncol=2)
[,1] [,2]
[1,] NA NA
[2,] NA NA
[3,] NA NA
NA is “not available,” and is R’s standard way of denoting missing data. We might create a matrix of NAs to later replace with pieces of data.
We can refer to individual elements via:
u[2,3]
[1] 6
This is the element from the second row, third column.
And as with vectors, we can select subsets:
u[2,1:2]
[1] 2 4
u[,1:2]
[,1] [,2]
[1,] 1 3
[2,] 2 4
Note that in the second case, the blank before the comma is a shortcut for asking for everything – in this case, all the rows.
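One detail worth seeing in code: selecting a single row or column returns a plain vector, not a matrix. The sketch below uses a separate matrix m2 (a hypothetical name) and the drop = FALSE argument, which keeps the matrix shape.

```r
m2 <- matrix(1:6, nrow = 2)          # hypothetical matrix, same values as the original u

row2     <- m2[2, ]                  # plain vector: 2 4 6
row2_mat <- m2[2, , drop = FALSE]    # still a matrix, with 1 row and 3 columns
cols13   <- m2[, c(1, 3)]            # columns need not be adjacent

dim(row2)        # NULL, because a plain vector has no dimensions
dim(row2_mat)    # 1 3
cols13
```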
And also as with vectors, we can apply logical tests:
u == 3
[,1] [,2] [,3]
[1,] FALSE TRUE FALSE
[2,] FALSE FALSE FALSE
which(u == 3)
[1] 3
Note that the elements in a matrix are ordered as we saw them assigned: down first, then across.
u[which(u == 3)] <- 5
u
[,1] [,2] [,3]
[1,] 1 5 5
[2,] 2 4 6
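How might we make the first and third columns of u all 1s?
u[1,3] <- c(1,1,1,1,1): No. u[1,3] refers to a single element (row 1, column 3), not to two columns.
u[,c(1,3)] <- rep(1,4): Yes. This selects every row of columns 1 and 3 and fills those four elements with 1s.
u[,1:3] <- 1: No. This fills all of u with 1s, but note that R knows to repeat the single 1 across every selected element without being told to.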
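The column assignment can be tried safely on a copy. The sketch below builds u_copy (a hypothetical name) with the same values u holds at this point, so that u itself stays unchanged for the examples that follow.

```r
# Same values as u at this point: rows are (1, 5, 5) and (2, 4, 6)
u_copy <- matrix(c(1, 2, 5, 4, 5, 6), nrow = 2)

# A single value is repeated across every selected element
u_copy[, c(1, 3)] <- 1

u_copy   # columns 1 and 3 are now all 1s; column 2 is still 5, 4
```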
If we want to know the dimensions of a matrix, that is easily done:
dim(u)
[1] 2 3
ncol(u)
[1] 3
nrow(u)
[1] 2
Matrices can also be transposed (flipped around the diagonal).
t(u)
[,1] [,2]
[1,] 1 2
[2,] 5 4
[3,] 5 6
As with vectors, basic addition happens element-by-element:
t <- u + 3
t
[,1] [,2] [,3]
[1,] 4 8 8
[2,] 5 7 9
Matrices, like vectors, have a spatial interpretation. A matrix is a “linear operator” that transforms one vector into another vector.
We don’t have the time to go into the foundations of linear algebra, but just as we can take the inner or dot product of two vectors, we can also take the product of two matrices. If we multiply a matrix \(M_1\) times a vector \(v_1\), we get a new vector \(v_2\); if we multiply that vector \(v_2\) by another matrix \(M_2\), we get \(v_3\). But we can also get \(v_3\) if we first multiply \(M_2\) and \(M_1\) to get \(M_{21}\), and then multiply \(v_1\) by \(M_{21}\), which directly yields \(v_3\).
That is, if \(M_1 v_1 = v_2\) and \(M_2 v_2 = v_3\), then if \(M_{21} = M_2 M_1\), we also can say \(M_{21} v_1 = v_3\). \(M_{21}\) is just a new matrix that combines the operations of \(M_1\) and \(M_2\) into a single matrix. Thanks to the rules of matrix multiplication, we can easily generate \(M_{21}\) by multiplying \(M_2\) and \(M_1\), but like so much of math, these mathematical tools are useful for many different tasks.
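This claim can be checked numerically. The sketch below uses small made-up matrices M1 and M2 and a made-up vector v1; only the names come from the text above.

```r
M1 <- matrix(c(1, 0, 2, 1, 0, 3), nrow = 2)   # hypothetical 2x3 matrix
M2 <- matrix(c(2, 1, 0, 1, 1, 1), nrow = 3)   # hypothetical 3x2 matrix
v1 <- c(1, 2, 3)

v2 <- M1 %*% v1          # first transformation
v3 <- M2 %*% v2          # second transformation

M21 <- M2 %*% M1         # combine both transformations into one matrix
direct <- M21 %*% v1     # apply the combined matrix in a single step

all.equal(direct, v3)    # TRUE: both routes give the same vector
```

Note the order: M1 is applied first, so it sits on the right in M2 %*% M1.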
A matrix \(M\) with x rows and y columns times a vector \(v\) with y elements, \(M v\), yields a new vector with x elements, where the ith element is the dot product of the vector \(v\) and the ith row of the matrix.
An \(i \times j\) matrix A times a \(j \times k\) matrix B is a new \(i \times k\) matrix N where each element \(n_{i,k}\) of N is the dot product of the ith row of matrix A with the kth column of matrix B.
In this image, A is 4x2, B is 2x3, and their product, C, is 4x3. \(C = AB\). Note that \(AB\) is not the same as \(BA\): for matrices, the order of multiplication matters. In fact, in this case \(BA\) is not even defined, because their sizes don’t match. (Flip A and B on the diagram and try to take the inner product of the rows of B with the columns of A; it can’t be done because they aren’t the same length.)
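The size rule can be confirmed in R. The sketch below uses made-up entries for a 4x2 matrix A and a 2x3 matrix B; try() is used only so that the failing product does not stop the script.

```r
A <- matrix(1:8, nrow = 4)    # 4x2, hypothetical values
B <- matrix(1:6, nrow = 2)    # 2x3, hypothetical values

AB <- A %*% B
dim(AB)                       # 4 3

# B %*% A is not defined: B has 3 columns but A has 4 rows
BA <- try(B %*% A, silent = TRUE)
failed <- inherits(BA, "try-error")
failed                        # TRUE
```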
For example, here is a matrix times a vector in R. Note the use of the special %*% notation in R, to distinguish matrix multiplication from regular multiplication.
u
[,1] [,2] [,3]
[1,] 1 5 5
[2,] 2 4 6
v <- 1:3
v
[1] 1 2 3
u %*% v
[,1]
[1,] 26
[2,] 28
We can think of the vector v as being equivalent to a matrix with just one column, so the rule for multiplying a matrix times a vector is just a special case of multiplying two matrices.
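As a quick check of this equivalence, the sketch below multiplies the same matrix by a plain vector and by an explicit one-column matrix. The names mat, vec and col_mat are hypothetical; the values match u and v above.

```r
mat <- matrix(c(1, 2, 5, 4, 5, 6), nrow = 2)   # same values as u
vec <- 1:3                                     # same values as v

col_mat <- matrix(vec, ncol = 1)               # the vector as a 3x1 matrix

from_vec <- mat %*% vec
from_mat <- mat %*% col_mat

all.equal(from_vec, from_mat)                  # TRUE: both are the 2x1 matrix (26, 28)
```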
Try to do this matrix multiplication by hand before putting it into R. (If needed, use the image from the “Matrix Multiplication” slide to remind yourself how to multiply.)
Say we define w as:
w <- t(u)
w
[,1] [,2]
[1,] 1 2
[2,] 5 4
[3,] 5 6
What is u %*% w?
Be careful not to compute w %*% u instead. The order of multiplication matters for matrices!
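After working it out by hand, the result can be checked with a sketch like the one below. It rebuilds the two matrices under the hypothetical names mat and mat_t so that it runs on its own.

```r
mat   <- matrix(c(1, 2, 5, 4, 5, 6), nrow = 2)   # same values as u (2x3)
mat_t <- t(mat)                                  # same values as w (3x2)

small <- mat %*% mat_t    # (2x3) times (3x2) gives a 2x2 matrix
big   <- mat_t %*% mat    # (3x2) times (2x3) gives a 3x3 matrix

small                     # rows are (51, 52) and (52, 56)
dim(big)                  # 3 3
```

The two products do not even have the same dimensions, which is the clearest sign that the order matters.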