Overview
In this lesson we learn how to install R and understand its basic philosophy.
Objectives
After completing this lesson, students should be able to:
Reading
Lander, Chapters 1-2.
R is what’s known as a statistical programming language. As the name suggests, it can do a lot of things: programming, statistics, data storage and manipulation, graphics, etc. It can even scrape websites or Twitter, interact with databases, run other languages within it or be run by other languages, and produce polished documents in PDF or HTML formats – such as these very slides, which were written (in part) in R.
Fundamentally, R is a command-based language – that is, it takes in lines of text as input and returns formatted text as output. While it can also do many other things (e.g., reading in data or producing graphs), the text interface is central. Data analysis in R therefore takes place in a “console”: you write commands to R, and it prints replies.
Generally, you use a graphical user interface (GUI) of some sort to handle this interaction with R and to organize your workspace. The GUI we will be using in this course is RStudio, which is a program separate from R itself, but which runs R in the background and helps you interact with R, with data sets, and with R’s outputs.
Before going into further detail about how R and RStudio work, it is best to install both. It will be essential throughout these lectures to have a copy of RStudio running at all times, so that you can try out for yourself the various bits of code we will be exploring.
First we install R, and then RStudio.
To install R, go here and follow the instructions:
For Windows you want the base install; for Mac the first pkg file should be fine.
To install RStudio, go here and follow the instructions to install RStudio Desktop (the free version).
http://www.rstudio.com/products/rstudio/download/
In both cases, installation should work just like installing any other application. Please stick to the default installation locations and settings to avoid mishaps.
Once R and RStudio are installed, you shouldn’t need to worry about running R directly – RStudio will take care of that. So we just need to launch RStudio, and familiarize ourselves with it.
When you launch RStudio, you will see a large window with four “panes”. Each of these panes has tabs at the top, each with a different view. Note that the default arrangement of panes may be different from the arrangement seen here; be sure to check the name of each tab within each pane (such as “Console” or “Source”) to know which one is which.
The most important pane is the “Console” – this is where the main interaction with R occurs. If you type 2+3 at the > prompt and hit return, R should immediately return 5.
>2+3
[1] 5
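A few more one-line commands illustrate the same pattern of typing an expression and getting a printed reply. This is only a minimal sketch; the particular expressions are chosen for illustration and are not part of the original lesson.

```r
# Each line is an expression; the Console prints its value after you hit return.
2 * 3      # multiplication
10 / 4     # division gives a decimal result
sqrt(16)   # calling a built-in function
```

Each reply is prefixed with [1], just as in the 2+3 example.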
You could do all of your R work just in the Console window, but this is a bad idea because it is difficult to retain a record of what you’ve done.
Instead, it is better to write and save all your commands in a text file, and execute lines of commands as needed. The Source pane is just a text editor. To start a new file, go to File -> New File -> R Script. This R script is just a plain text file where you can write whatever you want.
To run the line that the cursor is currently on (i.e., send it to the Console), you can hit the “Run” button in the upper right of the Source pane. You can also go to Code -> Run Line(s). As a shortcut, you can hit command-return (Mac) or Ctrl-Enter (Windows and Linux) to execute that line. To execute multiple lines, highlight those lines first before doing any of the above.
As you can see in the Code -> Run Region menu, you can also run everything up until the current line, or after it, or other variants.
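Running a whole saved script can also be done with a command rather than a menu. The sketch below is a hedged illustration: it assumes base R's source() function, and the file name first_script.R and its contents are hypothetical, written to a temporary folder so the example runs anywhere.

```r
# Write a tiny script to a temporary file (hypothetical name and contents).
script_path <- file.path(tempdir(), "first_script.R")
writeLines(c("# What is 2+3?", "answer <- 2 + 3"), script_path)

# Run every line of the file in order; echo = TRUE prints each command as it runs.
source(script_path, echo = TRUE)

# Objects created by the script are now available in the Console.
answer
```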
To add a comment line that doesn’t get run in R (such as notes to yourself), just preface each line you don’t want to run with a #.
# What is 2+3?
2+3
To save an R script or to open a data file, you need to tell R where to save or open it. You can do this with the GUI in RStudio each time, or you can run an R command to set this “Working Directory” location. Being able to do this from within your R script is important because you might want to save or open files in various locations.
To set the working directory, you issue something like the following command:
setwd("~/Desktop/comp_stats_NB")
If you are unfamiliar with how Windows or Mac machines encode directory information, you can use RStudio to set the working directory via Session -> Set Working Directory -> Choose Directory. This will let you pick your folder in the usual graphical way, but will also print the R command in the Console, which you can then copy to your R script and paste in at the top so that every time that script is run, it first sets R to the correct working directory.
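As a rough sketch of how the working directory behaves, the commands below check the current directory, change it, and show that file names are then resolved relative to it. The folder name comp_stats_demo and the file notes.txt are hypothetical, and tempdir() is used only so that the example does not depend on any folder existing on your machine.

```r
# Remember where R is currently pointed.
old_dir <- getwd()

# Hypothetical project folder, created inside R's temporary directory.
project_dir <- file.path(tempdir(), "comp_stats_demo")
dir.create(project_dir, showWarnings = FALSE)

setwd(project_dir)   # change the working directory
getwd()              # confirm the change

# File names without a path now refer to files inside project_dir.
writeLines("hello", "notes.txt")
list.files()         # the listing should include "notes.txt"

setwd(old_dir)       # restore the original working directory
```

In your own scripts you would replace project_dir with the path to your course folder, as in the setwd() command shown earlier.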
(Note that RStudio does almost everything via sending commands to the Console. The RStudio GUI is basically just a set of shortcuts for sending text commands to the R Console.)
Other important RStudio panes include “History,” “Plot,” “Files,” “Help,” “Environment,” and “Packages.”
History shows a list of every command you’ve issued to R via the Console. To reissue a command, you can double-click it and hit return. You can also paste it into your Source file using the button in the upper left of the History pane.
Plot shows the results of graphs and other plots in R. We will return to this in Module 2.
Files shows all the files in the current Working Directory.
Help shows the outputs from issuing help commands. For instance, if you want the documentation on a function like mean you would type help(mean) or ?mean and the documentation appears in the Help window. We will return to this in Module 2.
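The help commands can be tried directly in the Console. The sketch below uses only base R functions; apropos() and example() are not mentioned in the lesson itself and are included here as optional extras.

```r
help(mean)       # open the documentation page for the mean function
?mean            # shorthand for the same request

apropos("mean")  # list available objects whose names contain "mean"
example(mean)    # run the examples given at the bottom of mean's help page
```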
Environment shows all the data currently loaded into R: both their names and (where possible) their values. We’ll return to variables in Module 1.3.
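As a small preview, the sketch below creates two objects so that they appear in the Environment pane, and then removes one. The names exam_score and greeting are made up for illustration, and the <- assignment arrow is covered properly in Module 1.3.

```r
# Create two objects; each should appear in the Environment pane.
exam_score <- 87
greeting <- "hello"

ls()             # list the names of the objects currently loaded

rm(exam_score)   # remove one object; it disappears from the pane
ls()
```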
Packages shows all the packages you’ve installed. Packages are collections of specialized R functions; a few come bundled with R, but most are installed by the user. For instance, if you want to import Stata data files in .dta format, you would load the foreign package (installing it first if it is not already present), which provides the functions needed to read these files. We will return to Packages in Module 2.
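In code, the workflow looks roughly like the sketch below. It assumes the foreign package and its read.dta() function; the file name survey.dta is hypothetical, so the read is guarded and only runs if such a file exists in the working directory.

```r
# install.packages("foreign")   # one-time download; uncomment if the package is missing
library(foreign)                # load the package for the current session

# "survey.dta" is a hypothetical file name used only for illustration.
if (file.exists("survey.dta")) {
  survey <- read.dta("survey.dta")   # read the Stata file into a data frame
  head(survey)                       # show the first few rows
}
```

Installation is done once per machine, whereas library() has to be run in every new R session that uses the package.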
RStudio has various preferences that allow you to customize the layout and appearance of these panes.
In particular, the “Appearance” tab in the Preferences allows you to set how code is colored in the Source window; these colors help the eye pick out names, functions, commands, etc., but different people have different color preferences.
The “Panes” tab lets you arrange the panes in RStudio. I personally prefer the Source to be in the upper left and the Console in the upper right. The window icons in the upper-right-most corner of each pane allow you to shrink that pane down to a mere title-bar; thus most of the time I have the bottom left and bottom right panes shrunk, and the RStudio screen is thus mostly just the Source on the left and the Console on the right. But that’s just personal preference.