Exercise

Once you learn R well, you will likely find that you do not tend to use spreadsheets like Microsoft Excel as much for common tasks that you might have previously done in spreadsheet software. We will see a spreadsheet-like structure develop as we work with the composite data types of vector, factor, list, and data.frame. Type the following at the R console and verify you get the same results, but also try to be sure you understand what is happening. This first part will cover only vector and factor.

Vectors

A vector is constructed using a function called simply c. Note that c is lower case which matters because R is a case-sensitive language. If you type a capital, or upper case, C instead of lower case c then R will not know what you mean.

c(1,2,3)
## [1] 1 2 3

A convenient way to make a vector of consecutive numbers is to use the : between them:

1:5
## [1] 1 2 3 4 5

Vectors can only be made of values from the same atomic type:

c("a","b","c")
## [1] "a" "b" "c"
c(TRUE,FALSE,FALSE,TRUE)
## [1]  TRUE FALSE FALSE  TRUE

The is.numeric, is.character, and is.logical functions that you learned about in Atomic Data Types work on vectors too:

is.numeric(1:3)
## [1] TRUE
is.logical(1:3)
## [1] FALSE
is.character(c("a","b"))
## [1] TRUE
is.logical(c(TRUE,TRUE,TRUE,FALSE,TRUE))
## [1] TRUE

Assignment

By now you are probably pretty tired of typing things like c(TRUE,TRUE,TRUE,FALSE,TRUE) and wonder if there is a way you could save things like that for later. There is! It is called assignment. We can assign a name to one of our vectors (or any R object). We do this using the <- operator (that is made of two characters, the less than symbol, <, and a hyphen, -). Think of it as an arrow that directs the value on the right hand side of <- into the name on the left hand side.

x <- c(TRUE, TRUE, TRUE, FALSE, TRUE)  # nothing appears to happen, but everything is ok...
x                                  # ...see? I told you so! :)
## [1]  TRUE  TRUE  TRUE FALSE  TRUE
is.logical(x)
## [1] TRUE

The name x is just one of a nearly infinite number of names, but R does have some rules for valid names:

  • A valid name can only consist of:
    • letters,
    • numbers,
    • the dot or period character (.), and
    • the underscore or underline character (_),
  • But it can only start with:
    • a letter, or
    • the dot not followed by a number.

When you name something the same thing, it is replaced:

x
## [1]  TRUE  TRUE  TRUE FALSE  TRUE
x <- c(1, 2, 3)
x
## [1] 1 2 3

The name can stand in for the object that is assigned to it anywhere that the object can be used, even in assignment:

y <- x
y
## [1] 1 2 3

Finally, you can not use one of the reserved words as names. You have seen a few of these already, e.g., TRUE, NaN. It could really wreck havoc on R if you could change the value of those names.

Factors

A factor is similar to a vector, but is used to represent a nominal or ordinal variable. This allows R to automatically give you back the right statistics for something that is not numeric. For now, we will use the summarize function to show the difference.

x <- c(1,1,2,3,1)
y <- factor(x)
summary(x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     1.0     1.0     1.6     2.0     3.0
summary(y)
## 1 2 3 
## 3 1 1

That last output may be a little confusing at first, but notice that it is a type of table with the different values in the first row and the number of elements in the factor that take that value. When you give the values labels, as you often will, it can be much easier to understand the summary of a factor.

state <- factor(x,labels=c('GA','FL','AL'))
state
## [1] GA GA FL AL GA
## Levels: GA FL AL
summary(state)
## GA FL AL 
##  3  1  1

The labels=c(...) is a named argument to the factor function. The name is labels and the argument is the c(...) itself. Notice the = sign that connects the name to the argument. We will discuss this a lot more in the future. For now, just use this exact syntax if you need to label a numeric vector.

Note that the labels match the order of the values if you use numbers, not the order they appear in the vector you convert to a factor:

x <- c(2, 3, 1, 1, 1)
state <- factor(x, labels = c('GA', 'FL', 'AL'))
state
## [1] FL AL GA GA GA
## Levels: GA FL AL
summary(state)
## GA FL AL 
##  3  1  1

You can also use a character vector to create a factor:

state <- factor(c("GA", "GA", "FL", "AL", "GA"))
state
## [1] GA GA FL AL GA
## Levels: AL FL GA
summary(state)  # notice it is the same data, but alphabetically sorted
## AL FL GA 
##  1  1  3

Extract elements from vectors and factors

When you want to extract a specific element from a vector or factor, you use square brackets (i.e.,[ ]) and the index (the number) of the element within the data structure:

x               # same x as above
## [1] 2 3 1 1 1
x[5]
## [1] 1
x[2]
## [1] 3
x[2] + x[5]
## [1] 4
x[c(2, 5)]       # create a new vector with the second and fifth element of x
## [1] 3 1
state[3]        # works with vectors too
## [1] FL
## Levels: AL FL GA

Operating on vectors and factors

Most operators in R operate elementwise on vectors, e.g.:

x * 2
## [1] 4 6 2 2 2
2 ^ x
## [1] 4 8 2 2 2
x ^ 2
## [1] 4 9 1 1 1
5 - x
## [1] 3 2 4 4 4
x < 2
## [1] FALSE FALSE  TRUE  TRUE  TRUE

You can also apply operators to two vectors of the same length in which case the resulting vector will be the result of the operator applied to the first element of each vector, then the second element of each vector, and so on:

x - c(1, 0, 1, 0, 1)
## [1] 1 3 0 1 0

You can even use a vector that is shorter than the other. It will be recycled.

c(1, 0, 1, 0, 1, 0) + c(0, 1)
## [1] 1 1 1 1 1 1

However, you will get a warning if the shorter vector is not a multiple of the longer vector because this usually unintentional and most often indicates a bug in your program. Here it works the way we expect:

c(1, 0, 1, 0, 1) + c(0, 1)
## Warning in c(1, 0, 1, 0, 1) + c(0, 1): longer object length is not a
## multiple of shorter object length
## [1] 1 1 1 1 1

A warning is issued by a program when it can continue executing, but is not sure that how it is going to continue to execute is what you were expecting it to do. A warning does not rise to the level of an error which is something that the program cannot recover from.

Explore and Extend

c(TRUE, 0, FALSE, 3)
## [1] 1 0 0 3
c("a", 3, TRUE)
## [1] "a"    "3"    "TRUE"

Evaluate

You know the drill. The stem of your R file should be rcel1_.

Before continuing run the first two lines from your console, and add the third line to your R file. It downloads and installs two packages and then loads the a package that gives you access to the data for this exercise. We will discuss all of this a lot more later. For now, just do it mindlessly…

install.packages("devtools")                  # you only need to do this one time
devtools::install_github("advdatamgmt/adm")   # this too, but I may update as course goes along
library(adm)                                  # this you need every time you use the package 'adm'
rm(list = ls())                               # this will make clean your working area
                                              # so no naming conflicts arise
  1. Is 2dogs a valid name in R? Why or why not?

  2. What is the sum of the 11th and 82nd element of x after you’ve completed the steps above?

  3. What is the 5th element of y1 & y2?

  4. What is the 62nd element of z?

  5. How many letter c’s are in z?