Overview
In this lesson we introduce a number of other more complex types of graphics that are possible with R and various packages.
Objectives
Readings
External source: https://www.r-graph-gallery.com/index.html
R and ggplot2 are powerful tools for making all kinds of visualizations that go far beyond the standard boxplots and scatter plots familiar from basic data analysis. A few more are discussed below, but almost any type of chart you have seen can be made elegantly in R if you put in the time to tweak each aspect of it. The R Graph Gallery has quite a lot of nice, simple examples of this.
One increasingly common form of visualization is a network graph, where nodes (points) are connected by edges (lines). Nodes can be individuals, organizations, species, ideas, or anything else, and the edges can reflect any kind of social, physical, or conceptual connection between them. One great package for network analysis and visualization is igraph; its companion package igraphdata provides some simple datasets, one of which we use to illustrate a network plot. “Karate” is a famous social network of friendships among a set of karate students. The network can be viewed as a matrix in which a nonzero entry means that person i is connected to person j, and a 0 (here denoted with a dot) means they are not. As the output below shows, the entries in this dataset are edge weights rather than plain 1’s. The plot draws connected nodes closer together while pushing unconnected nodes apart, producing a picture that immediately shows how the community is divided in two. Indeed, during this study there was a schism, and the nodes are colored by which side of it the individuals ended up on.
library(igraph)
library(igraphdata)
data(karate)
karate[1:10,1:10]
## 10 x 10 sparse Matrix of class "dgCMatrix"
##
## Mr Hi . 4 5 3 3 3 3 2 2 .
## Actor 2 4 . 6 3 . . . 4 . .
## Actor 3 5 6 . 3 . . . 4 5 1
## Actor 4 3 3 3 . . . . 3 . .
## Actor 5 3 . . . . . 2 . . .
## Actor 6 3 . . . . . 5 . . .
## Actor 7 3 . . . 2 5 . . . .
## Actor 8 2 4 4 3 . . . . . .
## Actor 9 2 . 5 . . . . . . .
## Actor 10 . . 1 . . . . . . .
plot(karate, vertex.frame.color="white",vertex.label.cex=.5)
There are myriad possible settings here, many of which can be found in the igraph documentation.
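To make the link between a connection matrix and the resulting picture concrete, here is a minimal sketch that builds a tiny network from a 0/1 adjacency matrix and plots it with a force-directed layout. The six nodes "A" to "F", their connections, and the colors are made-up illustration data, not part of the karate dataset.

```r
library(igraph)

# Hypothetical toy network: two triangles (A-B-C and D-E-F) joined by one edge
nodes <- LETTERS[1:6]
m <- matrix(0, 6, 6, dimnames = list(nodes, nodes))
pairs <- rbind(c("A", "B"), c("A", "C"), c("B", "C"),
               c("D", "E"), c("D", "F"), c("E", "F"),
               c("C", "D"))
m[pairs] <- 1           # mark each connection ...
m[pairs[, 2:1]] <- 1    # ... in both directions, so the matrix is symmetric

# Turn the adjacency matrix into an undirected graph
g <- graph_from_adjacency_matrix(m, mode = "undirected")

# Color the nodes by (made-up) group membership
V(g)$color <- ifelse(V(g)$name %in% c("A", "B", "C"), "orange", "skyblue")

# Fruchterman-Reingold is a force-directed layout: connected nodes attract,
# all nodes repel. The seed only makes the picture reproducible.
set.seed(1)
coords <- layout_with_fr(g)
plot(g, layout = coords, vertex.frame.color = "white")
```

Computing the layout separately and passing it to plot() keeps the node positions fixed if you redraw the graph with different styling.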
R is also capable of sophisticated mapping, drawing dynamically on external sources of map data. As always, a package makes this easy, in this case leaflet, though there are numerous other mapping packages.
library(leaflet)
leaflet() %>%
  addTiles() %>%
  setView(lng = -71, lat = 42, zoom = 8)
Note that you can interact with this map dynamically. Also note the use of the %>% pipe operator, which is the idiom leaflet is designed around: each function takes the map as its first argument and returns the modified map. We can also layer data onto this map; e.g., in the following example we plot the locations of a small collection of geo-located tweets:
load(file="boston1k.Rdata",verbose=TRUE)
## Loading objects:
## bostwi
leaflet(bostwi) %>%
  addTiles() %>%
  setView(lng = -71.06, lat = 42.36, zoom = 14) %>%
  addCircles(~as.numeric(longitude), ~as.numeric(latitude),
             weight = 3, radius = 20, color = "blue",
             stroke = TRUE, fillOpacity = 0.5)
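Since boston1k.Rdata may not be at hand, the same layering idea can be sketched with a small made-up data frame. The `places` table, its column names, and the approximate coordinates are assumptions for illustration only.

```r
library(leaflet)

# Hypothetical points of interest; coordinates are approximate
places <- data.frame(
  name = c("Boston Common", "Fenway Park"),
  lng  = c(-71.0655, -71.0972),
  lat  = c(42.3550, 42.3467)
)

# The ~ formulas refer to columns of the data frame passed to leaflet()
m <- leaflet(places) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lng, lat = ~lat, popup = ~name,
                   radius = 6, color = "red")

# Type m at the console (or leave it as the last line of an R Markdown chunk)
# to display the interactive map.
```

One design choice to be aware of: addCircles() takes its radius in meters, so the circles grow and shrink as you zoom, while addCircleMarkers() takes its radius in pixels and stays the same size on screen.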
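We can do something similar with the ggmap package, which draws map tiles as a ggplot2 layer so that ordinary geoms can be added on top. Depending on the ggmap version and the map source, get_map() may require registering an API key with the tile provider first.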
library(ggmap)
boston_mp <- get_map(location = c(left = -71.15, bottom = 42.3, right = -70.985746, top = 42.4))
ggmap(boston_mp) + geom_point(data=bostwi, aes(x=as.numeric(longitude), y=as.numeric(latitude)) )
Finally, there is a whole world of interactive and dynamic visualizations available in R. Some of these approaches, like the shiny package, require back-end servers to run, but others can be rendered straight into HTML, as here.
One nice example is the plotly package, which shows information about individual data points when you hover over them, and much more; see the plotly website for details.
library(plotly)
# airquality holds daily New York measurements from May to September 1973
airdate <- as.Date(paste(1973, airquality$Month, airquality$Day, sep="-"))
airquality2 <- cbind(airquality,airdate)
gg <- ggplot(data=airquality2,aes(x=airdate,y=Temp)) + geom_point() + xlab("Date") + ylab("Temperature") + theme_bw()
ggplotly(gg)
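ggplotly() converts an existing ggplot, but plotly also has its own interface. As a rough sketch of the same chart built directly with plot_ly(), the following adds the ozone reading to each point's hover text; the `aq` and `date` names are introduced here for illustration, and the year 1973 is taken from the airquality help page.

```r
library(plotly)

# Work on a copy of the built-in dataset and add a proper date column
aq <- airquality
aq$date <- as.Date(paste(1973, aq$Month, aq$Day, sep = "-"))

# The ~ formulas refer to columns of aq; text is shown when hovering over a point
p <- plot_ly(data = aq, x = ~date, y = ~Temp,
             type = "scatter", mode = "markers",
             text = ~paste("Ozone:", Ozone))

# plotly::layout is spelled out to avoid confusion with graphics::layout
p <- plotly::layout(p,
                    xaxis = list(title = "Date"),
                    yaxis = list(title = "Temperature"))

# Type p at the console to display the interactive chart.
```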
Another simple way to animate data is via the gganimate package, which with the simple addition of transition_reveal() produces a nice animated gif of your data.
library(gganimate)
ggplot(data=airquality2,aes(x=airdate,y=Temp)) + geom_line() + xlab("Date") + ylab("Temperature") + theme_bw() + transition_reveal(airdate)
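If you want to control the rendering or keep the result as a file, the animation can be rendered and saved explicitly. This is a sketch only: the frame count, frame rate, image size, and the file name "temperature.gif" are arbitrary choices, and the gif renderer assumes the gifski package is installed.

```r
library(ggplot2)
library(gganimate)

# Rebuild the dated data so this block runs on its own
# (airquality covers May to September 1973)
aq <- airquality
aq$date <- as.Date(paste(1973, aq$Month, aq$Day, sep = "-"))

# transition_reveal() draws the line progressively along the date axis
p <- ggplot(aq, aes(x = date, y = Temp)) +
  geom_line() +
  xlab("Date") + ylab("Temperature") +
  theme_bw() +
  transition_reveal(date)

# Render 60 frames at 10 frames per second, then write the gif to disk
anim <- animate(p, nframes = 60, fps = 10, width = 600, height = 400,
                renderer = gifski_renderer())
anim_save("temperature.gif", animation = anim)
```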
Another common but non-traditional plot is the word cloud, a colorful way of giving a basic overview of the contents of a document, where words are scaled according to their frequency. Once again we use a package, this time wordcloud2. It takes as its input the simple table of word counts we constructed earlier, after dropping the 12 most frequent words, which are boring words like “the”.
singleString <- paste(readLines("i_have_a_dream.txt"), collapse=" ")
splitstring <- strsplit(singleString,"[^a-zA-Z]")[[1]]
lowerstring <- tolower(splitstring)
lowerstring_noblanks <- lowerstring[lowerstring != ""]
wordcounts <- as.data.frame(table(lowerstring_noblanks))
wordcounts2 <- wordcounts[order(-wordcounts$Freq),]
library(wordcloud2)
wordcloud2(data=wordcounts2[13:nrow(wordcounts2),], size=0.5, shape="circle")
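Dropping a fixed number of top rows works for this speech, but it can also remove a meaningful word that happens to be frequent. A hedged alternative is to filter against an explicit stop-word list. The sketch below uses a short inline sentence and a hand-made `stop_words` vector, both illustrative assumptions rather than the lesson's data.

```r
# Short sample text standing in for the speech file
text <- "I have a dream that one day this nation will rise up. I have a dream today."

# Split on runs of non-letters, lower-case, and drop empty strings
words <- tolower(strsplit(text, "[^a-zA-Z]+")[[1]])
words <- words[words != ""]

# Hypothetical hand-made stop-word list; extend as needed
stop_words <- c("i", "a", "the", "that", "this", "have", "will", "up", "one")
kept <- words[!(words %in% stop_words)]

# Count the remaining words and sort by descending frequency
counts <- as.data.frame(table(kept), stringsAsFactors = FALSE)
names(counts) <- c("word", "freq")
counts <- counts[order(-counts$freq), ]

# Build the cloud only if wordcloud2 is installed; type wc to display it
if (requireNamespace("wordcloud2", quietly = TRUE)) {
  wc <- wordcloud2::wordcloud2(data = counts, size = 0.5, shape = "circle")
}
```

wordcloud2() reads the words from the first column and the frequencies from the second, which is why the table is reduced to exactly those two columns.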