Grammar of graphics is found in package ggplot2
.
## # once:
## install.packages("ggplot2")
##
## # everytime:
library(ggplot2)
The function is called ggplot
(note no number 2 on the end of the function). You want to store this output in a name because all you are doing below is telling ggplot how to “think”" about the data, i.e., what represents x and y in this case. If you run it directly, you’ll get an error “No layers in plot” because you’ve not added a layer to draw.
You will need to refer frequently to the ggplot2
documentation found at http://docs.ggplot2.org/current/. Note that the functions often have “aesthetics” listed, that control the way the data is displayed in addition to the usual “arguments” that all R functions have.
Aesthetics are placed within a function called aes
within your function call as an argument. Watch for that pattern below as we work to recreate a version of our optic nerve head figure:
library(adm)
myplot <- ggplot(onhlong, aes(x = age, y = mean))
Now we can add a layer to plot the points:
myplot + geom_point()
But we want the different points to have colors depending on the groups right? This is an aesthetic (see the documentation for geom_point
to see what else you can control).
myplot + geom_point(aes(color = case))
Yet we really want three groups right? To do that we need a little trick to combine the two variables into a new one. Just paste them together and save it as a new variable to the dataset:
# the trick - see what it did? "1-Yes", "0-No", and "1-No" are three groups
(onhlong$group <- factor(with(onhlong, paste(case, clinhypo, sep = "-"))))
## [1] 1-Yes 1-Yes 1-Yes 1-No 1-No 1-Yes 1-Yes 1-No 1-No 1-Yes 1-Yes
## [12] 1-No 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 0-No 1-Yes 1-Yes
## [23] 1-Yes 1-Yes 1-No 1-Yes 1-Yes 0-No 0-No 0-No 0-No 0-No 0-No
## [34] 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No
## [45] 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No
## [56] 0-No 0-No 1-No 1-No 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes
## [67] 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 1-Yes 0-No
## [78] 1-Yes 1-Yes 1-Yes 1-No 1-Yes 1-Yes 1-Yes 0-No 0-No 0-No 0-No
## [89] 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No
## [100] 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No 0-No
## [111] 0-No 0-No 0-No 0-No
## Levels: 0-No 1-No 1-Yes
# need to resave myplot because it "stores" a memory of the dataset before we made
# the change
myplot <- ggplot(onhlong, aes(x = age, y = mean))
myplot + geom_point(aes(color = group))
Good start. Now let’s add a regression line with a 95% confidence limit based on the data itself:
myplot +
geom_point(aes(color = group)) +
stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"))
But you wanted the prediction interval in this case. There is no automatic way to create the prediction band with ggplot. So we’ll use our onhfit
dataset.
myplot +
geom_point(aes(color = group)) +
# se = FALSE just for the line (no standard error)
stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
# now the "ribbon"
geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95))
Ooops… order of layers and transparency matters:
myplot +
geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95)) +
geom_point(aes(color = group)) +
stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE)
or
myplot +
geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
geom_point(aes(color = group)) +
stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE)
Control the legends with guides()
:
myplot +
geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
geom_point(aes(color = group)) +
stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
# just get rid of them for now; use the term that created the legend to get rid of it
guides(color = "none", alpha = "none")
Control the axes with scale_...
functions:
myplot +
geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
geom_point(aes(color = group)) +
stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
# the specify the axis of interest and the variable type
scale_y_continuous(name = "Mean optic nerve size, mm") +
# change the title
guides(color = "none", alpha = "none")
Add back the color legend only and change the labels of the color “axis”:
myplot +
geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
geom_point(aes(color = group)) +
stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
scale_x_continuous(name = "Age, years") +
scale_y_continuous(name = "Mean optic nerve size, mm") +
# the labels are part of the "scale" think of the color as a third axis
scale_color_discrete(labels = c("Control", "Case, Non-ONH eye", "Case, ONH eye")) +
# change the title
guides(color = guide_legend(aes(title = "Eye type")), alpha = "none")
Make it “cleaner” for publications, etc:
myplot +
geom_ribbon(data = onhfit, aes(x = age, ymax = upr95, ymin = lwr95, alpha = 0.01)) +
geom_point(aes(color = group)) +
stat_smooth(method = "lm", data = subset(onhlong, group == "0-No"), se = FALSE) +
scale_x_continuous(name = "Age, years") +
scale_y_continuous(name = "Mean optic nerve size, mm") +
scale_color_discrete(labels = c("Control", "Case, Non-ONH eye", "Case, ONH eye")) +
guides(color = guide_legend(aes(title = "Eye type")), alpha = "none") +
# a very "clean" theme
theme_bw()
Non-parametric smoother with 95% CI. Big for the green band because it is uncertain due to a lack of data:
myplot +
geom_point(aes(color = group)) +
geom_smooth(aes(color = group, fill = group))
## `geom_smooth()` using method = 'loess'
Separate plots but on same axis:
myplot +
geom_point(aes(color = group)) +
geom_smooth(aes(color = group, fill = group)) +
facet_grid(~ group)
## `geom_smooth()` using method = 'loess'
The rop
dataset contains data on where three types of physicians (variable type
) thought the macula position was (variables x
and y
) on 30 photographs (variable picture
).
# library(devtools); install_github("advdatamgmt/adm")
library(adm)
head(rop)
## picture time person type x y
## 1 1 1 1 comp 56 -21
## 2 1 1 1 peds 55 -17
## 3 1 1 1 ret 50 -24
## 4 1 1 2 comp 53 -22
## 5 1 1 2 peds 55 -22
## 6 1 1 2 ret 55 -19
Can you create the following picture? (It only uses commands and aesthetics above, you’ll get some warnings because there are some missing data points — you can safely ignore those):
Modify the figure to display only the data from the retina physicians (type = "ret"
) and add a regression line (even though it doesn’t make much sense here) to each graph within your figure.