Overview

In this module we go over the basic mathematical tools and variable types built into R.

Objectives

After completing this module, students should be able to:

Perform mathematical calculations with R.
Recognize and use fundamental data types.
Understand and manipulate vectors and matrices.

Reading

Lander, Chapter 4.1-4.4.

1 Basic math

Now that we have installed R and RStudio, we can start playing with it!

Here is perhaps the simplest task you can give R:

1+1

[1] 2

The most direct approach is to simply type 1+1 into the console and hit return. The answer (surprise) is 2 – we’ll explain what that [1] is doing there in a minute.

R knows the usual mathematical operations: + - * / ^ as well as many others.

Give it a try:

What is 3.14^3+17/2?: If you got 39.45914, congratulations – your R isn’t broken!

1.1 Math 2

R can also do many other things:

sqrt(2)

[1] 1.414214

sqrt(-1)

Warning in sqrt(-1): NaNs produced

[1] NaN

Note that this produces a NaN – Not a Number – as a response. R can also handle imaginary numbers, but we won’t be needing that.

log(1000)

[1] 6.907755

Note that log is the natural log unless a base is specified, e.g. log(1000,10).
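As a quick illustration of the base argument, here is a small sketch comparing a few ways to take logarithms. The variable names are made up for this example, and log10(), log2() and exp() are base R functions that are not otherwise used in this module.

```r
# Natural log (base e) is the default
natural <- log(1000)

# The second argument sets the base; naming it makes the intent clearer
base10 <- log(1000, base = 10)

# Shortcuts exist for the most common bases
also_base10 <- log10(1000)
base2 <- log2(8)

# exp() undoes the natural log
back_again <- exp(natural)

print(c(natural, base10, also_base10, base2, back_again))
```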

1.2 Math 3

R knows some fundamental constants:

pi

[1] 3.141593

And it can round:

round(5.75,1)

[1] 5.8

Again, the second number specifies the option – in this case, the number of decimal places to round to. We’ll see more of such functions in a minute.

And of course the order of operations matters. Which is larger?

3.14^(3+17)/2: 4340731928. R handles big numbers well.
3.14^(3+17/2): 518431. Watch out for those parentheses!
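The three expressions seen so far can be compared side by side. The sketch below also adds one case that is not in the text above, a leading minus sign, because it follows the same precedence rule and often surprises people. The variable names are only for illustration.

```r
# Exponentiation is done before division and addition
first  <- 3.14^3 + 17/2      # same as (3.14^3) + (17/2)
second <- 3.14^(3 + 17)/2    # same as (3.14^20) / 2
third  <- 3.14^(3 + 17/2)    # same as 3.14^11.5

# ^ also binds tighter than a leading minus sign
neg_outside <- -2^2          # read as -(2^2)
neg_inside  <- (-2)^2        # parentheses force the square of -2

print(c(first, second, third))
print(c(neg_outside, neg_inside))
```

When in doubt, extra parentheses cost nothing and make the intended order explicit.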

2 Variables

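Unlike in some programming languages, almost any string of characters can be a variable name. A name may contain letters, digits, periods and underscores, but it must begin with a letter (or with a period that is not followed by a digit).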

To assign a variable, rather than use the = sign (though that also works), we use <-, which reminds us that the thing on the right is being saved in the thing on the left.

i <- 3

porkchop <- log(6)

To evaluate a variable, we can just type it and hit enter.

porkchop

[1] 1.791759

And to remove it:

rm(i)
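To see assignment and removal working together, here is a minimal sketch. The names my_value and my.value2 are invented for the example, and exists() is a base R function (not introduced above) that reports whether an object with a given name is currently defined.

```r
# Names may contain letters, digits, periods and underscores
my_value <- 3
my.value2 <- my_value * 2

# exists() checks by name whether an object is currently defined
before <- exists("my_value")

# rm() deletes the object; asking for it afterwards would be an error
rm(my_value)
after <- exists("my_value")

print(c(before, after))
```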

3 Data types

As you may have learned in a statistics class, data comes in many varieties: integers (ordinal), continuous numbers (cardinal), factors (categorical), logical (binary), characters (strings), dates, etc.

class will tell us what type something already is:

class(2.8)

[1] "numeric"

We can also directly inquire about specific types with is.TYPE, and often coerce a variable of one type into another with as.TYPE:

is.character("2014-08-13")

[1] TRUE

properdate <- as.Date("2014-08-13")

Note the default date format; see ?as.Date for more formats.

3.1 Changing types

Is it this type?	Make it this type
`is.numeric()`	`as.numeric()`
`is.character()`	`as.character()`
`is.factor()`	`as.factor()`
(none in base R)	`as.Date()`
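A few of the conversions in the table can be tried directly. This is only a sketch: the values and variable names are arbitrary, and the date layout "%m/%d/%Y" is one assumed example of a non-default format.

```r
# Text that looks like a number can be coerced to numeric
x <- as.numeric("3.5")
class(x)

# Numbers can be turned into text
y <- as.character(42)
class(y)

# Text that does not look like a number becomes NA, with a warning
# (suppressed here to keep the output tidy)
z <- suppressWarnings(as.numeric("porkchop"))
is.na(z)

# Dates in a non-default layout need a format string
otherdate <- as.Date("08/13/2014", format = "%m/%d/%Y")
class(otherdate)
format(otherdate)
```

Not every coercion succeeds, so it is worth checking for NA after converting text to numbers.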

Logical (TRUE/FALSE) is another type, and is the output of many basic tests – equality ==, inequality !=, greater than >, less-than-or-equal-to <=, etc.

1 == 2

[1] FALSE

1 <= porkchop

[1] TRUE
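One caution about equality tests goes slightly beyond the text above: decimal numbers are stored approximately, so == can give an unexpected FALSE. The sketch below shows this together with all.equal(), a base R function that compares within a small tolerance. The variable names are illustrative.

```r
# Comparisons return logical values
class(1 == 2)

# Logical values count as 1 (TRUE) and 0 (FALSE) in arithmetic
TRUE + TRUE

# Decimal numbers are stored approximately, so == can surprise
exact <- (0.1 + 0.2) == 0.3

# all.equal() compares with a small tolerance instead
close_enough <- isTRUE(all.equal(0.1 + 0.2, 0.3))

print(c(exact, close_enough))
```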

4 Vectors

A vector is an ordered sequence of values of a single type, most often numbers. One way to create a vector is with c():

v <- c(1,5,9,7,2)
v

[1] 1 5 9 7 2

There are shortcuts for creating sequences and repeated values:

1:10

 [1]  1  2  3  4  5  6  7  8  9 10

rep(3,5)

[1] 3 3 3 3 3
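Both shortcuts have more general forms. As a sketch, seq() (not shown above) takes a step size or a target length, and rep() accepts the times and each arguments. The variable names are made up for the example.

```r
# seq() generalizes the colon shortcut with a step size
by_quarter <- seq(0, 1, by = 0.25)

# It can also produce a given number of evenly spaced values
five_points <- seq(0, 10, length.out = 5)

# rep() can repeat a whole vector, or each element in turn
whole <- rep(c(1, 2), times = 3)
each_one <- rep(c(1, 2), each = 3)

print(by_quarter)
print(five_points)
print(whole)
print(each_one)
```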

4.1 Vector properties

You can get the length of a vector:

length(v)

[1] 5

And you can easily pick out specific elements from the vector:

v[3]

[1] 9

v[3:5]

[1] 9 7 2

v[length(v)]

[1] 2

Note how length(v), which is a function that gives the length of v, is evaluated inside the brackets and thus v[length(v)] gives you the 5th element of v. You can put quite elaborate expressions inside those brackets if you want, but it’s generally better for anything complicated to define it first and then put it inside the brackets. For instance,

myelements <- c(1,2,length(v))
myelements

[1] 1 2 5

v[myelements]

[1] 1 5 2

is more readable than trying to cram it all into one line with v[c(1,2,length(v))].

In addition to being able to use anything that produces a vector of numbers, you can also pick out elements with anything that produces a vector of TRUE/FALSE values. The TRUE values will be selected, and the FALSE values will be ignored. So for instance, v > 3 will produce as its output a vector of TRUE and FALSE values by testing whether each element of v is greater than 3. You can then put that inside the brackets, so that the vector of TRUE/FALSE values selects only those elements of v that get TRUE.

myvals <- v > 3
myvals

[1] FALSE  TRUE  TRUE  TRUE FALSE

v[myvals]

[1] 5 9 7

# or equivalently
v[v > 3]

[1] 5 9 7

This is a very powerful approach to selecting subsets of vectors and other objects in R. Another related way to do this is using which(), which instead of giving a bunch of TRUE/FALSE values, gives you the element numbers associated with the TRUE elements. This can also be put inside the brackets, for the same result.

which(v > 3)

[1] 2 3 4

v[which(v > 3)]

[1] 5 9 7
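Tests can also be combined before they go inside the brackets. The sketch below uses the operators & (and), | (or) and ! (not), plus negative indices, none of which are covered above. The result names are only for illustration.

```r
v <- c(1, 5, 9, 7, 2)

# & means "and": both conditions must hold
middle <- v[v > 3 & v < 9]

# | means "or": either condition is enough
extremes <- v[v < 2 | v > 8]

# ! flips TRUE and FALSE
not_big <- v[!(v > 3)]

# A negative index drops that element rather than selecting it
without_third <- v[-3]

print(middle)
print(extremes)
print(not_big)
print(without_third)
```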

4.2 Modifying vectors

You can also reassign values easily using the same notation. Basically to replace or modify anything in R, you just write over it.

v[3] <- 100
v

[1]   1   5 100   7   2

v[3:5] <- c(17,0,11)
v

[1]  1  5 17  0 11

What is v[c(2,4)]?

5 17 0: Nope. That’s v[2:4].
5 7: That’s the old v! Remember v has been changed.
5 0: Right. The result is a new vector consisting of the second and fourth elements from v.

Using the TRUE/FALSE approach, you can selectively replace elements in your R object:

z <- v
z[z > 3] <- 100
z

[1]   1 100 100   0 100

# or equivalently
z <- v
z[which(z > 3)] <- 100
z

[1]   1 100 100   0 100
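A third route to the same result, offered here only as an alternative sketch, is the base R function ifelse(). It builds a new vector instead of overwriting part of an existing one, which can be handy when the original should stay untouched.

```r
v <- c(1, 5, 17, 0, 11)

# ifelse(test, yes, no) takes yes where test is TRUE, otherwise no
z <- ifelse(v > 3, 100, v)

print(z)
print(v)   # v itself is unchanged
```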

5 Vector math

Standard mathematical operations on vectors affect each element individually:

3+v

[1]  4  8 20  3 14

Vectors of the same length can be added, etc, which again works by element:

w <- 1:5
w+v

[1]  2  7 20  4 16
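Adding a single number to a vector, as in 3+v, is a special case of a more general rule that the text does not spell out: when lengths differ, R reuses ("recycles") the shorter vector. The sketch below uses made-up vectors named long and short to show this.

```r
long <- c(1, 2, 3, 4)
short <- c(10, 20)

# short is reused as 10, 20, 10, 20 to match the length of long
recycled <- long + short

# A single number is just the shortest case of the same rule
shifted <- 3 + long

print(recycled)
print(shifted)
```

If the longer length is not a multiple of the shorter one, R still computes a result but issues a warning, which usually signals a mistake.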

5.1 Vector multiplication

As you may recall from linear algebra, a vector V with n elements can be considered as a point in n-dimensional space, often represented by an arrow pointing from the origin to that point.

The above image shows the addition of two vectors: (4,1)+(3,4)=(7,5).

The inner or dot product is a measure of how similar two vectors are, and is the sum of the pairwise products of their elements:

\(w \cdot v = \sum_{i=1}^n w_i*v_i = w_1*v_1+w_2*v_2+ ... +w_n*v_n\)

This can be calculated with R, via the %*% operator (to distinguish it from *):

w %*% v

     [,1]
[1,]  117
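The formula can be checked by spelling out the two steps, pairwise products and then their sum. This sketch uses the same w and v as above; sum() and drop() are base R functions not introduced in the text.

```r
w <- 1:5
v <- c(1, 5, 17, 0, 11)

# Element-by-element products, then their sum
pairwise <- w * v
by_hand <- sum(pairwise)

# %*% returns a 1x1 matrix; drop() reduces it to a plain number
by_operator <- drop(w %*% v)

print(pairwise)
print(c(by_hand, by_operator))
```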

6 Matrices

Matrices are rectangular or tabular arrays of data. Although they look like data sets, our data will usually be stored in the data.frame format, which allows for mixes of data types. But matrices are a good introduction to how data can be organized in rows and columns.

One way to make a matrix is to bind two vector columns together:

m <- cbind(w,v)
m

     w  v
[1,] 1  1
[2,] 2  5
[3,] 3 17
[4,] 4  0
[5,] 5 11

This is a 5x2 matrix.

6.1 Building matrices 1

You can also bind two vectors as rows together:

m <- rbind(w,v)
m

  [,1] [,2] [,3] [,4] [,5]
w    1    2    3    4    5
v    1    5   17    0   11

Vectors are flexible, in that they can act as either rows or columns here.

But we can do the same with matrices, and then it matters: cbind() glues them side-by-side, and rbind() glues them above and below.

rbind(m,m)

  [,1] [,2] [,3] [,4] [,5]
w    1    2    3    4    5
v    1    5   17    0   11
w    1    2    3    4    5
v    1    5   17    0   11

Note that R retains the vector names (w, v) as row names in the new matrices.

6.2 Building matrices 2

We can also make a matrix directly, via matrix():

u <- matrix(1:6,nrow=2)
u

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

(Note the order in which the numbers fill the matrix: down each column first.)
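If filling across the rows is preferred, matrix() has a byrow argument for that. A minimal sketch, with illustrative variable names:

```r
# Default: fill down each column in turn
by_column <- matrix(1:6, nrow = 2)

# byrow = TRUE: fill across each row in turn
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

print(by_column)
print(by_row)
```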

We can specify the number of rows and columns and fill it up with NAs, say:

matrix(NA,nrow=3,ncol=2)

     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
[3,]   NA   NA

NA is “not available,” and is R’s standard way of denoting missing data. We might create a matrix of NAs to later replace with pieces of data.

6.3 Matrix properties 1

We can refer to individual elements via:

u[2,3]

[1] 6

This is the element from the second row, third column.

And as with vectors, we can select subsets:

u[2,1:2]

[1] 2 4

u[,1:2]

     [,1] [,2]
[1,]    1    3
[2,]    2    4

Note that in the second case, the blank before the comma is a shortcut for asking for everything – in this case, all the rows.
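Notice that u[2,1:2] came back as a plain vector while u[,1:2] stayed a matrix. As a sketch of how to control this, the drop argument of the bracket operator (not discussed above) keeps the matrix shape even for a single row.

```r
u <- matrix(1:6, nrow = 2)

# Selecting part of one row returns a plain vector by default
as_vector <- u[2, 1:2]
dim(as_vector)            # NULL: no longer a matrix

# drop = FALSE keeps the result as a 1x2 matrix
as_matrix <- u[2, 1:2, drop = FALSE]
dim(as_matrix)
```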

6.4 Matrix properties 2

And also as with vectors, we can apply logical operators:

u == 3

      [,1]  [,2]  [,3]
[1,] FALSE  TRUE FALSE
[2,] FALSE FALSE FALSE

which(u == 3)

[1] 3

Note that the elements in a matrix are ordered as we saw them assigned: down first, then across.

u[which(u == 3)] <- 5
u

     [,1] [,2] [,3]
[1,]    1    5    5
[2,]    2    4    6
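When the row and column of a match are more useful than its single position, which() can report them through its arr.ind argument. The sketch below works on a fresh matrix, here called example_matrix, so that u is not disturbed.

```r
example_matrix <- matrix(1:6, nrow = 2)

# Default: a single position, counting down the columns
single_position <- which(example_matrix == 3)

# arr.ind = TRUE: the row and column of each match
pos <- which(example_matrix == 3, arr.ind = TRUE)

print(single_position)
print(pos)
```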

6.5 Matrix properties 3

How might we make the first and third columns of u all 1s?

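u[1,3] <- c(1,1,1,1,1): No. u[1,3] is a single element (first row, third column), so this is the wrong target and too many 1s.
u[,c(1,3)] <- rep(1,4): That does the job!
u[,1:3] <- 1: Close. That fills all of u with 1s. Note, though, that R knows to reuse the single 1 for every selected element without being told how many are needed.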
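The working answer can be tried on a copy, so that u itself stays as it was for the sections that follow. In this sketch, u_copy is a made-up name, and a single 1 is used in place of rep(1,4) to show that the shorter form gives the same result.

```r
# A copy with the same values that u has at this point
u_copy <- matrix(c(1, 2, 5, 4, 5, 6), nrow = 2)

# All rows, columns 1 and 3; the single 1 is reused for each element
u_copy[, c(1, 3)] <- 1

print(u_copy)
```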

If we want to know the dimensions of a matrix, that is easily done:

dim(u)

[1] 2 3

ncol(u)

[1] 3

nrow(u)

[1] 2

6.6 Transposition and addition

Matrices can also be transposed (flipped around the diagonal).

t(u)

     [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]    5    6

As with vectors, basic addition happens element-by-element:

# named u3 rather than t, since t is also the name of the transpose function
u3 <- u + 3
u3

     [,1] [,2] [,3]
[1,]    4    8    8
[2,]    5    7    9

7 Matrix algebra

Matrices, like vectors, have a spatial interpretation. A matrix is a “linear operator” that transforms one vector into another vector.

We don’t have the time to go into the foundations of linear algebra, but just as we can take the inner or dot product of two vectors, we can also take the product of two matrices. If we multiply a matrix \(M_1\) times a vector \(v_1\), we get a new vector \(v_2\); if we multiply that vector \(v_2\) by another matrix \(M_2\), we get \(v_3\). But we can also get \(v_3\) if we first multiply \(M_2\) and \(M_1\) to get \(M_{21}\), and then multiply \(v_1\) by \(M_{21}\), which directly yields \(v_3\).

That is, if \(M_1 v_1 = v_2\) and \(M_2 v_2 = v_3\), then if \(M_{21} = M_2 M_1\), we also can say \(M_{21} v_1 = v_3\). \(M_{21}\) is just a new matrix that combines the operations of \(M_1\) and \(M_2\) into a single matrix. Thanks to the rules of matrix multiplication, we can easily generate \(M_{21}\) by multiplying \(M_2\) and \(M_1\). Like so much of math, these tools turn out to be useful for many different tasks.
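The claim that the two routes agree can be checked numerically. In this sketch the matrices M1 and M2 and the vector v1 are arbitrary values chosen for illustration; %*% is R's matrix multiplication operator.

```r
# Two small operators and a starting vector, chosen arbitrarily
M1 <- matrix(c(1, 0, 2, 1), nrow = 2)
M2 <- matrix(c(0, 1, 1, 0), nrow = 2)
v1 <- c(3, 4)

# Route 1: apply M1, then M2
v2 <- M1 %*% v1
v3 <- M2 %*% v2

# Route 2: combine the operators first, then apply once
M21 <- M2 %*% M1
v3_direct <- M21 %*% v1

print(v3)
print(v3_direct)
all.equal(v3, v3_direct)
```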

7.1 Matrix multiplication

A matrix \(M\) with x rows and y columns times a vector \(v\) with y elements, \(M v\), yields a new vector with x elements, where each element is the dot product of the vector \(v\) and the corresponding row of the matrix.

An \(i \times j\) matrix A times a \(j \times k\) matrix B is a new \(i \times k\) matrix N, where the element \(n_{r,c}\) of N is the dot product of row r of matrix A with column c of matrix B.

In this image, A is 4x2, B is 2x3, and their product, C, is 4x3. \(C = AB\). Note that \(AB\) is not the same as \(BA\): for matrices, the order of multiplication matters. In fact, in this case \(BA\) is not even defined, because their sizes don’t match. (Flip A and B on the diagram and try to take the inner product of the rows of B with the columns of A; it can’t be done because they aren’t the same length.)
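The size rule can be tested in R with matrices of the same shapes. The entries below are arbitrary placeholders, not the values from the image, and tryCatch() is a base R function used here only so that the failed product does not stop the script.

```r
A <- matrix(1:8, nrow = 4)   # 4x2
B <- matrix(1:6, nrow = 2)   # 2x3

# AB is defined because ncol(A) equals nrow(B)
C <- A %*% B
dim(C)

# BA is not: ncol(B) is 3 but nrow(A) is 4, so R reports an error
result <- tryCatch(B %*% A, error = function(e) "not defined")
print(result)
```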

7.2 Matrix times vector

For example, here is a matrix times a vector in R. Note the use of the special %*% notation in R, to distinguish matrix multiplication from regular multiplication.

u

     [,1] [,2] [,3]
[1,]    1    5    5
[2,]    2    4    6

v <- 1:3
v

[1] 1 2 3

u %*% v

     [,1]
[1,]   26
[2,]   28

We can think of the vector v as being equivalent to a matrix with just one column, so the rule for multiplying a matrix times a vector is just a special case of multiplying two matrices.
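That equivalence can be made explicit by turning v into a one-column matrix first. In this sketch, v_column is a made-up name, and u is rebuilt with the same values it has above.

```r
u <- matrix(c(1, 2, 5, 4, 5, 6), nrow = 2)
v <- 1:3

# The same three numbers, stored as a 3x1 matrix
v_column <- matrix(v, ncol = 1)
dim(v_column)

# Both products give the same 2x1 result
same <- all.equal(u %*% v, u %*% v_column)
print(same)
```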

7.3 Matrix times matrix

Try to do this matrix multiplication by hand before putting it into R. (If needed, use the image from the “Matrix Multiplication” slide to remind yourself how to multiply.)

Say we define w as:

w <- t(u)
w

     [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]    5    6

What is u %*% w?

5 13 17 13 41 49 17 49 61: No – that’s w %*% u. The order of multiplication matters for matrices!
51 52 52 56: Yup!
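To check the hand calculation, the product can be built one element at a time, following the rule of row-times-column dot products. This is only a sketch for verification; in practice %*% does all of this in one step. The name by_hand is illustrative.

```r
u <- matrix(c(1, 2, 5, 4, 5, 6), nrow = 2)
w <- t(u)

# Build u %*% w one element at a time
by_hand <- matrix(NA, nrow = nrow(u), ncol = ncol(w))
for (i in 1:nrow(u)) {
  for (k in 1:ncol(w)) {
    # dot product of row i of u with column k of w
    by_hand[i, k] <- sum(u[i, ] * w[, k])
  }
}
print(by_hand)

# Same answer as the built-in operator
all.equal(by_hand, u %*% w)

# Reversing the order gives a 3x3 matrix instead of a 2x2 one
dim(w %*% u)
```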

Computational Statistics 1.2: Variables and Math