Once you learn R well, you will likely find that you do not tend to use spreadsheets like Microsoft Excel as much for common tasks that you might have previously done in spreadsheet software. We will see a spreadsheet-like structure develop as we work with the composite data types of vector
, factor
, list
, and data.frame
. Type the following at the R console and verify you get the same results, but also try to be sure you understand what is happening. This first part will cover only vector
and factor
.
A vector
is constructed using a function called simply c
. Note that c
is lower case which matters because R is a case-sensitive language. If you type a capital, or upper case, C
instead of lower case c
then R will not know what you mean.
c(1,2,3)
## [1] 1 2 3
A convenient way to make a vector of consecutive numbers is to use the :
between them:
1:5
## [1] 1 2 3 4 5
Vectors can only be made of values from the same atomic type:
c("a","b","c")
## [1] "a" "b" "c"
c(TRUE,FALSE,FALSE,TRUE)
## [1] TRUE FALSE FALSE TRUE
The is.numeric
, is.character
, and is.logical
functions that you learned about in Atomic Data Types work on vectors too:
is.numeric(1:3)
## [1] TRUE
is.logical(1:3)
## [1] FALSE
is.character(c("a","b"))
## [1] TRUE
is.logical(c(TRUE,TRUE,TRUE,FALSE,TRUE))
## [1] TRUE
By now you are probably pretty tired of typing things like c(TRUE,TRUE,TRUE,FALSE,TRUE)
and wonder if there is a way you could save things like that for later. There is! It is called assignment. We can assign a name to one of our vectors (or any R object). We do this using the <-
operator (that is made of two characters, the less than symbol, <
, and a hyphen, -
). Think of it as an arrow that directs the value on the right hand side of <-
into the name on the left hand side.
x <- c(TRUE, TRUE, TRUE, FALSE, TRUE) # nothing appears to happen, but everything is ok...
x # ...see? I told you so! :)
## [1] TRUE TRUE TRUE FALSE TRUE
is.logical(x)
## [1] TRUE
The name x
is just one of a nearly infinite number of names, but R does have some rules for valid names:
.
), and_
),When you name something the same thing, it is replaced:
x
## [1] TRUE TRUE TRUE FALSE TRUE
x <- c(1, 2, 3)
x
## [1] 1 2 3
The name can stand in for the object that is assigned to it anywhere that the object can be used, even in assignment:
y <- x
y
## [1] 1 2 3
Finally, you can not use one of the reserved words as names. You have seen a few of these already, e.g., TRUE
, NaN
. It could really wreck havoc on R if you could change the value of those names.
A factor
is similar to a vector
, but is used to represent a nominal or ordinal variable. This allows R to automatically give you back the right statistics for something that is not numeric. For now, we will use the summarize
function to show the difference.
x <- c(1,1,2,3,1)
y <- factor(x)
summary(x)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 1.0 1.0 1.6 2.0 3.0
summary(y)
## 1 2 3
## 3 1 1
That last output may be a little confusing at first, but notice that it is a type of table with the different values in the first row and the number of elements in the factor that take that value. When you give the values labels, as you often will, it can be much easier to understand the summary of a factor.
state <- factor(x,labels=c('GA','FL','AL'))
state
## [1] GA GA FL AL GA
## Levels: GA FL AL
summary(state)
## GA FL AL
## 3 1 1
The labels=c(...)
is a named argument to the factor
function. The name is labels
and the argument is the c(...)
itself. Notice the =
sign that connects the name to the argument. We will discuss this a lot more in the future. For now, just use this exact syntax if you need to label a numeric vector
.
Note that the labels match the order of the values if you use numbers, not the order they appear in the vector
you convert to a factor
:
x <- c(2, 3, 1, 1, 1)
state <- factor(x, labels = c('GA', 'FL', 'AL'))
state
## [1] FL AL GA GA GA
## Levels: GA FL AL
summary(state)
## GA FL AL
## 3 1 1
You can also use a character vector
to create a factor
:
state <- factor(c("GA", "GA", "FL", "AL", "GA"))
state
## [1] GA GA FL AL GA
## Levels: AL FL GA
summary(state) # notice it is the same data, but alphabetically sorted
## AL FL GA
## 1 1 3
When you want to extract a specific element from a vector
or factor
, you use square brackets (i.e.,[ ]
) and the index (the number) of the element within the data structure:
x # same x as above
## [1] 2 3 1 1 1
x[5]
## [1] 1
x[2]
## [1] 3
x[2] + x[5]
## [1] 4
x[c(2, 5)] # create a new vector with the second and fifth element of x
## [1] 3 1
state[3] # works with vectors too
## [1] FL
## Levels: AL FL GA
Most operators in R operate elementwise on vectors
, e.g.:
x * 2
## [1] 4 6 2 2 2
2 ^ x
## [1] 4 8 2 2 2
x ^ 2
## [1] 4 9 1 1 1
5 - x
## [1] 3 2 4 4 4
x < 2
## [1] FALSE FALSE TRUE TRUE TRUE
You can also apply operators to two vectors
of the same length in which case the resulting vector
will be the result of the operator applied to the first element of each vector
, then the second element of each vector
, and so on:
x - c(1, 0, 1, 0, 1)
## [1] 1 3 0 1 0
You can even use a vector
that is shorter than the other. It will be recycled.
c(1, 0, 1, 0, 1, 0) + c(0, 1)
## [1] 1 1 1 1 1 1
However, you will get a warning
if the shorter vector
is not a multiple of the longer vector
because this usually unintentional and most often indicates a bug in your program. Here it works the way we expect:
c(1, 0, 1, 0, 1) + c(0, 1)
## Warning in c(1, 0, 1, 0, 1) + c(0, 1): longer object length is not a
## multiple of shorter object length
## [1] 1 1 1 1 1
A warning
is issued by a program when it can continue executing, but is not sure that how it is going to continue to execute is what you were expecting it to do. A warning
does not rise to the level of an error
which is something that the program cannot recover from.
Make a really long vector
by typing this: 5:123
. Now that you know how to extract elements from a vector
, explain why R has been printing [1]
at the beginning of the output all this time and why R is now printing other numbers between the square brackets ([ ]
).
Try and assign something to the reserved word TRUE
. What is the error? Can you make sense of it?
Can you guess what function tells you whether something is a factor
or not?
Is a factor
a vector
? Justify your answer.
Remember that I said that a vector can only be made from the same atomic type? Try the following (and additional similar examples) and try to understand what is going on. What is R doing? Why does this always “work”?
c(TRUE, 0, FALSE, 3)
## [1] 1 0 0 3
c("a", 3, TRUE)
## [1] "a" "3" "TRUE"
You know the drill. The stem of your R file should be rcel1_
.
Before continuing run the first two lines from your console, and add the third line to your R file. It downloads and installs two packages and then loads the a package that gives you access to the data for this exercise. We will discuss all of this a lot more later. For now, just do it mindlessly…
install.packages("devtools") # you only need to do this one time
devtools::install_github("advdatamgmt/adm") # this too, but I may update as course goes along
library(adm) # this you need every time you use the package 'adm'
rm(list = ls()) # this will make clean your working area
# so no naming conflicts arise
Is 2dogs
a valid name in R? Why or why not?
What is the sum of the 11th and 82nd element of x
after you’ve completed the steps above?
x[11] + x[82]
## [1] 101
y1 & y2
?y3 <- y1 & y2
y3[5]
## [1] FALSE
(y1 & y2)[5] # without creating a new variable
## [1] FALSE
z
?z[62]
## [1] "c"
z
?summary(factor(z))[3] # did you know the output is a vector? is.vector()?
## c
## 342