Grammar of graphics is found in package ggplot2.
## # once:
## install.packages("ggplot2")
## 
## # everytime:
library(ggplot2)The function is called ggplot (note no number 2 on the end of the function). You want to store this output in a name because all you are doing below is telling ggplot how to “think”" about the data, i.e., what represents x and y in this case. If you run it directly, you’ll get an error “No layers in plot” because you’ve not added a layer to draw.
You will need to refer frequently to the ggplot2 documentation found at http://docs.ggplot2.org/current/. Note that the functions often have “aesthetics” listed, that control the way the data is displayed in addition to the usual “arguments” that all R functions have.
Aesthetics are placed within a function called aes within your function call as an argument. Watch for that pattern below as we work to recreate a version of our optic nerve head figure:
library(adm)
myplot <- ggplot(onhlong, aes(x = age, y = mean))Now we can add a layer to plot the points:
myplot + geom_point()But we want the different points to have colors depending on the groups right? This is an aesthetic (see the documentation for geom_point to see what else you can control).
myplot + geom_point(aes(color = case))Yet we really want three groups right? To do that we need a little trick to combine the two variables into a new one. Just paste them together and save it as a new variable to the dataset:
# the trick - see what it did?  "1-Yes", "0-No", and "1-No" are three groups
(onhlong$group <- factor(with(onhlong, paste(case, clinhypo, sep = "-"))))##   [1] 1-Yes 1-Yes 1-Yes 1-No  1-No  1-Yes 1-Yes 1-No  1-No  1-Yes 1-Yes
##  [12] 1-No  1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 0-No  1-Yes 1-Yes
##  [23] 1-Yes 1-Yes 1-No  1-Yes 1-Yes 0-No  0-No  0-No  0-No  0-No  0-No 
##  [34] 0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No 
##  [45] 0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No 
##  [56] 0-No  0-No  1-No  1-No  1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes
##  [67] 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 0-No 
##  [78] 1-Yes 1-Yes 1-Yes 1-No  1-Yes 1-Yes 1-Yes 0-No  0-No  0-No  0-No 
##  [89] 0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No 
## [100] 0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No  0-No 
## [111] 0-No  0-No  0-No  0-No 
## Levels: 0-No 1-No 1-Yes# need to resave myplot because it "stores" a memory of the dataset before we made
# the change
myplot <- ggplot(onhlong, aes(x = age, y = mean))
myplot + geom_point(aes(color = group))Good start. Now let’s add a regression line with a 95% confidence limit based on the data itself:
myplot + 
    geom_point(aes(color = group)) + 
    stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"))But you wanted the prediction interval in this case. There is no automatic way to create the prediction band with ggplot. So we’ll use our onhfit dataset.
myplot +
  geom_point(aes(color = group)) +
  # se = FALSE just for the line (no standard error)
  stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
  # now the "ribbon"
  geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95))Ooops… order of layers and transparency matters:
myplot +
  geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95)) +
  geom_point(aes(color = group)) +
  stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE)or
myplot +
  geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
  geom_point(aes(color = group)) +
  stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) Control the legends with guides():
myplot +
  geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
  geom_point(aes(color = group)) +
  stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
  # just get rid of them for now; use the term that created the legend to get rid of it
  guides(color = "none", alpha = "none")Control the axes with scale_... functions:
myplot + 
geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
geom_point(aes(color = group)) +
stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
# the specify the axis of interest and the variable type
scale_y_continuous(name = "Mean optic nerve size, mm") + 
# change the title
guides(color = "none", alpha = "none") Add back the color legend only and change the labels of the color “axis”:
myplot +
geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
geom_point(aes(color = group)) +
stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
scale_x_continuous(name = "Age, years") +
scale_y_continuous(name = "Mean optic nerve size, mm") +     
# the labels are part of the "scale" think of the color as a third axis
scale_color_discrete(labels = c("Control", "Case, Non-ONH eye", "Case, ONH eye")) + 
# change the title
guides(color = guide_legend(aes(title = "Eye type")), alpha = "none") Make it “cleaner” for publications, etc:
myplot +
geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
geom_point(aes(color = group)) +
stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
scale_x_continuous(name = "Age, years") +
scale_y_continuous(name = "Mean optic nerve size, mm") +     
scale_color_discrete(labels = c("Control", "Case, Non-ONH eye", "Case, ONH eye")) + 
guides(color = guide_legend(aes(title = "Eye type")), alpha = "none") +
# a very "clean" theme
theme_bw()Non-parametric smoother with 95% CI. Big for the green band because it is uncertain due to a lack of data:
myplot + 
    geom_point(aes(color = group)) + 
    geom_smooth(aes(color = group, fill = group))## `geom_smooth()` using method = 'loess'Separate plots but on same axis:
myplot + 
    geom_point(aes(color = group)) + 
    geom_smooth(aes(color = group, fill = group)) + 
    facet_grid(~ group)## `geom_smooth()` using method = 'loess'The rop dataset contains data on where three types of physicians (variable type) thought the macula position was (variables x and y) on 30 photographs (variable picture).
# library(devtools); install_github("advdatamgmt/adm")
library(adm)
head(rop)##   picture time person type  x   y
## 1       1    1      1 comp 56 -21
## 2       1    1      1 peds 55 -17
## 3       1    1      1  ret 50 -24
## 4       1    1      2 comp 53 -22
## 5       1    1      2 peds 55 -22
## 6       1    1      2  ret 55 -19Can you create the following picture? (It only uses commands and aesthetics above, you’ll get some warnings because there are some missing data points — you can safely ignore those):
Modify the figure to display only the data from the retina physicians (type = "ret") and add a regression line (even though it doesn’t make much sense here) to each graph within your figure.