Exercise

Install ggplot2

Grammar of graphics is found in package ggplot2.

## # once:
## install.packages("ggplot2")
## 
## # everytime:
library(ggplot2)

Using ggplot2

The function is called ggplot (note no number 2 on the end of the function). You want to store this output in a name because all you are doing below is telling ggplot how to “think”" about the data, i.e., what represents x and y in this case. If you run it directly, you’ll get an error “No layers in plot” because you’ve not added a layer to draw.

You will need to refer frequently to the ggplot2 documentation found at http://docs.ggplot2.org/current/. Note that the functions often have “aesthetics” listed, that control the way the data is displayed in addition to the usual “arguments” that all R functions have.
Aesthetics are placed within a function called aes within your function call as an argument. Watch for that pattern below as we work to recreate a version of our optic nerve head figure:

library(adm)
myplot <- ggplot(onhlong, aes(x = age, y = mean))

Now we can add a layer to plot the points:

myplot + geom_point()

More aesthetics

But we want the different points to have colors depending on the groups right? This is an aesthetic (see the documentation for geom_point to see what else you can control).

myplot + geom_point(aes(color = case))

Yet we really want three groups right? To do that we need a little trick to combine the two variables into a new one. Just paste them together and save it as a new variable to the dataset:

# the trick - see what it did?  "1-Yes", "0-No", and "1-No" are three groups
(onhlong$group <- factor(with(onhlong, paste(case, clinhypo, sep = "-"))))
##   [1] 1-Yes 1-Yes 1-Yes 1-No  1-No  1-Yes 1-Yes 1-No  1-No  1-Yes 1-Yes
##  [12] 1-No  1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 0-No  1-Yes 1-Yes
##  [23] 1-Yes 1-Yes 1-No  1-Yes 1-Yes 0-No  0-No  0-No  0-No  0-No  0-No 
##  [34] 0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No 
##  [45] 0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No 
##  [56] 0-No  0-No  1-No  1-No  1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes
##  [67] 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 0-No 
##  [78] 1-Yes 1-Yes 1-Yes 1-No  1-Yes 1-Yes 1-Yes 0-No  0-No  0-No  0-No 
##  [89] 0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No 
## [100] 0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No 
## [111] 0-No  0-No  0-No  0-No 
## Levels: 0-No 1-No 1-Yes
# need to resave myplot because it "stores" a memory of the dataset before we made
# the change
myplot <- ggplot(onhlong, aes(x = age, y = mean))
myplot + geom_point(aes(color = group))

Regression Lines

Good start. Now let’s add a regression line with a 95% confidence limit based on the data itself:

myplot + 
    geom_point(aes(color = group)) + 
    stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"))

But you wanted the prediction interval in this case. There is no automatic way to create the prediction band with ggplot. So we’ll use our onhfit dataset.

myplot +
  geom_point(aes(color = group)) +
  # se = FALSE just for the line (no standard error)
  stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
  # now the "ribbon"
  geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95))

Ooops… order of layers and transparency matters:

myplot +
  geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95)) +
  geom_point(aes(color = group)) +
  stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE)

or

myplot +
  geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
  geom_point(aes(color = group)) +
  stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) 

Legends

Control the legends with guides():

myplot +
  geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
  geom_point(aes(color = group)) +
  stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
  # just get rid of them for now; use the term that created the legend to get rid of it
  guides(color = "none", alpha = "none")

Axes

Control the axes with scale_... functions:

myplot + 
geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
geom_point(aes(color = group)) +
stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
# the specify the axis of interest and the variable type
scale_y_continuous(name = "Mean optic nerve size, mm") + 
# change the title
guides(color = "none", alpha = "none") 

Add back the color legend only and change the labels of the color “axis”:

myplot +
geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
geom_point(aes(color = group)) +
stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
scale_x_continuous(name = "Age, years") +
scale_y_continuous(name = "Mean optic nerve size, mm") +     
# the labels are part of the "scale" think of the color as a third axis
scale_color_discrete(labels = c("Control", "Case, Non-ONH eye", "Case, ONH eye")) + 
# change the title
guides(color = guide_legend(aes(title = "Eye type")), alpha = "none") 

Themes

Make it “cleaner” for publications, etc:

myplot +
geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
geom_point(aes(color = group)) +
stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
scale_x_continuous(name = "Age, years") +
scale_y_continuous(name = "Mean optic nerve size, mm") +     
scale_color_discrete(labels = c("Control", "Case, Non-ONH eye", "Case, ONH eye")) + 
guides(color = guide_legend(aes(title = "Eye type")), alpha = "none") +
# a very "clean" theme
theme_bw()

Pure exploration

Non-parametric smoother with 95% CI. Big for the green band because it is uncertain due to a lack of data:

myplot + 
    geom_point(aes(color = group)) + 
    geom_smooth(aes(color = group, fill = group))
## `geom_smooth()` using method = 'loess'

Separate plots but on same axis:

myplot + 
    geom_point(aes(color = group)) + 
    geom_smooth(aes(color = group, fill = group)) + 
    facet_grid(~ group)
## `geom_smooth()` using method = 'loess'

Explore and Extend

The rop dataset contains data on where three types of physicians (variable type) thought the macula position was (variables x and y) on 30 photographs (variable picture).

# library(devtools); install_github("advdatamgmt/adm")
library(adm)
head(rop)
##   picture time person type  x   y
## 1       1    1      1 comp 56 -21
## 2       1    1      1 peds 55 -17
## 3       1    1      1  ret 50 -24
## 4       1    1      2 comp 53 -22
## 5       1    1      2 peds 55 -22
## 6       1    1      2  ret 55 -19

Can you create the following picture? (It only uses commands and aesthetics above, you’ll get some warnings because there are some missing data points — you can safely ignore those):

Evaluate

Modify the figure to display only the data from the retina physicians (type = "ret") and add a regression line (even though it doesn’t make much sense here) to each graph within your figure.