This is not a typical module. It is only a brief tutorial on getting datasets into and out of R, along with a few file system commands.
Next, install the readxl package, which allows you to read Excel files:
install.packages('readxl') # only need to do this once per installed version of R
To read an Excel file into R, you load the readxl library once per session and use the read_excel function.
library(readxl) # only once per session
Because I’ve saved the files we are going to use in the adm package, you need to ask R where the package directory is (your answer will differ depending on your operating system, your version of R, and so on). These steps are specific to accessing a raw data file contained in a package. You will NOT generally do this, but it is a convenient way to distribute the file to you.
Once you know where the file is, however, the rest of the steps are the same.
system.file(package = "adm")
## [1] "/Library/Frameworks/R.framework/Versions/3.3/Resources/library/adm"
The Excel file we are going to use is a version of the esoph dataset that is in the extdata directory of the package. If you get an empty string (""), you have typed something wrong or your adm package is not up to date.
system.file("extdata/esoph.xlsx", package = "adm")
## [1] "/Library/Frameworks/R.framework/Versions/3.3/Resources/library/adm/extdata/esoph.xlsx"
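To see what a failed lookup looks like without touching the adm package, here is a minimal sketch using the stats package, which ships with R. The file name no-such-file.txt is made up for illustration, and nzchar is a base R function that tests whether a string is non-empty.

```r
# Look up a file that every installed package has (stats ships with R)
found_path <- system.file("DESCRIPTION", package = "stats")

# Look up a file that does not exist; this name is hypothetical
missing_path <- system.file("no-such-file.txt", package = "stats")

# system.file() returns "" rather than raising an error when nothing matches,
# so test the result before passing it to a read function
nzchar(found_path)
nzchar(missing_path)
```

Checking nzchar on the result before reading the file gives you a clearer failure than the error a read function would raise on an empty path.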
With our filename in hand, we can now call the read_excel function. The 1 tells it to read the first sheet of the file:
read_excel(system.file("extdata/esoph.xlsx", package = "adm"), 1)
## agegp alcgp tobgp ncases ncontrols
## 1 25-34 0-39g/day 0-9g/day 0 40
## 2 25-34 0-39g/day 10-19 0 10
## 3 25-34 0-39g/day 20-29 0 6
## 4 25-34 0-39g/day 30+ 0 5
## 5 25-34 40-79 0-9g/day 0 27
## 6 25-34 40-79 10-19 0 7
## 7 25-34 40-79 20-29 0 4
## 8 25-34 40-79 30+ 0 7
## 9 25-34 80-119 0-9g/day 0 2
## 10 25-34 80-119 10-19 0 1
## 11 25-34 80-119 30+ 0 2
## 12 25-34 120+ 0-9g/day 0 1
## 13 25-34 120+ 10-19 1 1
## 14 25-34 120+ 20-29 0 1
## 15 25-34 120+ 30+ 0 2
## 16 35-44 0-39g/day 0-9g/day 0 60
## 17 35-44 0-39g/day 10-19 1 14
## 18 35-44 0-39g/day 20-29 0 7
## 19 35-44 0-39g/day 30+ 0 8
## 20 35-44 40-79 0-9g/day 0 35
## 21 35-44 40-79 10-19 3 23
## 22 35-44 40-79 20-29 1 14
## 23 35-44 40-79 30+ 0 8
## 24 35-44 80-119 0-9g/day 0 11
## 25 35-44 80-119 10-19 0 6
## 26 35-44 80-119 20-29 0 2
## 27 35-44 80-119 30+ 0 1
## 28 35-44 120+ 0-9g/day 2 3
## 29 35-44 120+ 10-19 0 3
## 30 35-44 120+ 20-29 2 4
## 31 45-54 0-39g/day 0-9g/day 1 46
## 32 45-54 0-39g/day 10-19 0 18
## 33 45-54 0-39g/day 20-29 0 10
## 34 45-54 0-39g/day 30+ 0 4
## 35 45-54 40-79 0-9g/day 6 38
## 36 45-54 40-79 10-19 4 21
## 37 45-54 40-79 20-29 5 15
## 38 45-54 40-79 30+ 5 7
## 39 45-54 80-119 0-9g/day 3 16
## 40 45-54 80-119 10-19 6 14
## 41 45-54 80-119 20-29 1 5
## 42 45-54 80-119 30+ 2 4
## 43 45-54 120+ 0-9g/day 4 4
## 44 45-54 120+ 10-19 3 4
## 45 45-54 120+ 20-29 2 3
## 46 45-54 120+ 30+ 4 4
## 47 55-64 0-39g/day 0-9g/day 2 49
## 48 55-64 0-39g/day 10-19 3 22
## 49 55-64 0-39g/day 20-29 3 12
## 50 55-64 0-39g/day 30+ 4 6
## 51 55-64 40-79 0-9g/day 9 40
## 52 55-64 40-79 10-19 6 21
## 53 55-64 40-79 20-29 4 17
## 54 55-64 40-79 30+ 3 6
## 55 55-64 80-119 0-9g/day 9 18
## 56 55-64 80-119 10-19 8 15
## 57 55-64 80-119 20-29 3 6
## 58 55-64 80-119 30+ 4 4
## 59 55-64 120+ 0-9g/day 5 10
## 60 55-64 120+ 10-19 6 7
## 61 55-64 120+ 20-29 2 3
## 62 55-64 120+ 30+ 5 6
## 63 65-74 0-39g/day 0-9g/day 5 48
## 64 65-74 0-39g/day 10-19 4 14
## 65 65-74 0-39g/day 20-29 2 7
## 66 65-74 0-39g/day 30+ 0 2
## 67 65-74 40-79 0-9g/day 17 34
## 68 65-74 40-79 10-19 3 10
## 69 65-74 40-79 20-29 5 9
## 70 65-74 80-119 0-9g/day 6 13
## 71 65-74 80-119 10-19 4 12
## 72 65-74 80-119 20-29 2 3
## 73 65-74 80-119 30+ 1 1
## 74 65-74 120+ 0-9g/day 3 4
## 75 65-74 120+ 10-19 1 2
## 76 65-74 120+ 20-29 1 1
## 77 65-74 120+ 30+ 1 1
## 78 75+ 0-39g/day 0-9g/day 1 18
## 79 75+ 0-39g/day 10-19 2 6
## 80 75+ 0-39g/day 30+ 1 3
## 81 75+ 40-79 0-9g/day 2 5
## 82 75+ 40-79 10-19 1 3
## 83 75+ 40-79 20-29 0 3
## 84 75+ 40-79 30+ 1 1
## 85 75+ 80-119 0-9g/day 1 1
## 86 75+ 80-119 10-19 1 1
## 87 75+ 120+ 0-9g/day 2 2
## 88 75+ 120+ 10-19 1 1
Hopefully, it looks familiar. Of course you can save it for later:
xesoph <- read_excel(system.file("extdata/esoph.xlsx", package = "adm"), 1)
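The sheet argument of read_excel accepts either a position or a sheet name. The sketch below assumes readxl is installed and uses datasets.xlsx, an example workbook bundled with readxl, rather than the adm file. The names example_path, sheet_names, by_position, and by_name are only for illustration.

```r
# Guarded so the sketch is skipped quietly if readxl is not installed
if (requireNamespace("readxl", quietly = TRUE)) {
  library(readxl)

  # Example workbook that ships with readxl, standing in for esoph.xlsx
  example_path <- readxl_example("datasets.xlsx")

  # List the sheets so you know which positions and names are available
  sheet_names <- excel_sheets(example_path)

  # Read the first sheet by position, then the same sheet by name
  by_position <- read_excel(example_path, 1)
  by_name <- read_excel(example_path, sheet = sheet_names[1])
}
```

Referring to a sheet by name is usually more robust than by position, since the code keeps working if someone reorders the sheets.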
read_excel automatically reads text columns in as character vectors rather than factors, as though you had called data.frame with stringsAsFactors = FALSE. In general this is the best way to import any data, because of the problems you have seen when you start with a factor and want to do additional data manipulations.
We will soon learn ways to convert variables to factors nearly automatically anyway.
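In the meantime, base R's factor function does the conversion explicitly. This is a minimal sketch on a made-up three-row data frame (toy is a hypothetical name, with values borrowed from the agegp column), and not necessarily the approach we will use later.

```r
# A tiny stand-in data frame with a character column
toy <- data.frame(
  agegp = c("25-34", "35-44", "25-34"),
  ncases = c(0, 1, 0),
  stringsAsFactors = FALSE
)

class(toy$agegp)                # starts out as character

# Convert the column to a factor only once you actually need one
toy$agegp <- factor(toy$agegp)

class(toy$agegp)                # now a factor
levels(toy$agegp)               # the distinct values, sorted
```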
Comma-delimited text, or comma-separated values (CSV), is a common text file format for representing data. You can use the built-in read.csv function to read CSV files:
iih <- read.csv(system.file("extdata/iih.csv", package = "adm"), stringsAsFactors = FALSE)
Excel can open and write CSV files too.
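If you want to try read.csv without the adm package, this sketch writes a tiny CSV file to a temporary location and reads it back both ways. The file contents and the names demo_csv, as_text, and as_factor are invented for illustration.

```r
# Write a small CSV file to a temporary location
demo_csv <- tempfile(fileext = ".csv")
writeLines(c("id,sex,age",
             "1,female,34",
             "2,male,41",
             "3,female,29"), demo_csv)

# Read it with text kept as character, and again with text turned into factors
as_text <- read.csv(demo_csv, stringsAsFactors = FALSE)
as_factor <- read.csv(demo_csv, stringsAsFactors = TRUE)

# Compare the column types that each call produced
sapply(as_text, class)
sapply(as_factor, class)
```

Passing stringsAsFactors explicitly keeps the result the same across R versions, since the default changed in R 4.0.0.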
R is always working in some directory of your file system. You can find out which one using getwd, which means “get working directory”:
getwd()
You can change this directory using setwd:
setwd("/Users/bbbruce/Documents") # you need to modify this for your system and preferences
Since you may not be familiar with your file system, I recommend using RStudio’s menu. Go to Session -> Set Working Directory -> Choose Directory… so that you can browse to where you want to work. When a file is in the working directory, you do not need any of the fancy system.file commands we were using earlier to figure out where it is.
For example, if you had a file called “iih.xlsx” in your working directory you would read it in like this:
read_excel("iih.xlsx", 1) # read_excel never creates factors, so no stringsAsFactors argument is needed
Likewise, if you write anything out, it will go into the working directory unless you tell R otherwise. So if you cannot find something you need, make sure to check what directory you were working in!
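A minimal sketch of that behaviour, using R's temporary directory as a stand-in working directory so that nothing in your own folders is touched. The file name demo.txt is hypothetical. setwd returns the directory you just left, which makes it easy to switch back.

```r
# Move into a scratch directory; setwd() invisibly returns the previous one
old_wd <- setwd(tempdir())

# A bare file name is resolved against the working directory
writeLines("hello", "demo.txt")
landed_in_wd <- file.exists(file.path(getwd(), "demo.txt"))

# Switch back to where you started
setwd(old_wd)

landed_in_wd
```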
There are write functions for CSV (which you can open in Excel easily):
write.csv(iih, "iih2.csv")
These will be written into your working directory. Be careful: if a file with the same name already exists, it will be overwritten without warning.
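One way to avoid clobbering a file is to check for it first. This is a minimal sketch, assuming a hypothetical helper write_csv_safely (not part of base R) and made-up data frames. row.names = FALSE is a real write.csv argument that leaves out the row-number column.

```r
# Hypothetical helper: write a CSV only if the target file does not exist yet
write_csv_safely <- function(data, path) {
  if (file.exists(path)) {
    message("Not overwriting existing file: ", path)
    return(invisible(FALSE))
  }
  write.csv(data, path, row.names = FALSE)
  invisible(TRUE)
}

out_path <- file.path(tempdir(), "safe-demo.csv")
unlink(out_path)  # start from a clean slate for the demonstration

first_try <- write_csv_safely(data.frame(x = 1:3), out_path)   # writes the file
second_try <- write_csv_safely(data.frame(x = 4:6), out_path)  # refuses to overwrite

read.csv(out_path)$x  # still the values from the first call
```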
You can also directly save R objects:
saveRDS(xesoph, file = "xesoph.rds")
saveRDS(iih, file = "iih.rds")
rm(xesoph)
rm(iih)
xesoph <- readRDS("xesoph.rds")
iih <- readRDS("iih.rds")
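Unlike a CSV file, an .rds file keeps R-specific details such as factor levels and their order. A minimal sketch with a made-up data frame (original, rds_path, and csv_path are illustrative names):

```r
# A small data frame with a factor whose levels are in a deliberate order
original <- data.frame(
  dose = factor(c("low", "high", "low"), levels = c("low", "high")),
  n = c(5L, 7L, 9L)
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

# Save the same object both ways
saveRDS(original, file = rds_path)
write.csv(original, csv_path, row.names = FALSE)

# Read both copies back
from_rds <- readRDS(rds_path)
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

identical(original, from_rds)  # compares the RDS copy with the original object
class(from_csv$dose)           # shows what the CSV copy's dose column became
```

This is why RDS is convenient for objects you will only reopen in R, while CSV remains the better choice for sharing with other programs.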
I often use these functions in conjunction with file.exists to cache long data operations so that I don’t have to wait for them to finish every time I run my code.
if (!file.exists("iih.rds")) {
  # ... do time consuming task to create iih
  saveRDS(iih, file = "iih.rds")
} else {
  # the cached copy exists, so read it instead of recomputing
  iih <- readRDS("iih.rds")
}
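The same pattern can be wrapped in a small helper. This is a sketch only: cached_rds and slow_summary are hypothetical names, Sys.sleep stands in for a long computation, and the cache lives in R's temporary directory.

```r
# Hypothetical helper: return the cached object if present, otherwise compute and cache it
cached_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, file = path)
  result
}

# Stand-in for a time consuming task
slow_summary <- function() {
  Sys.sleep(0.2)
  data.frame(n = 1:5, square = (1:5)^2)
}

cache_path <- file.path(tempdir(), "slow-summary.rds")
unlink(cache_path)  # make sure the first call really computes

first_run <- cached_rds(cache_path, slow_summary)   # computes and saves
second_run <- cached_rds(cache_path, slow_summary)  # reads the saved copy

identical(first_run, second_run)
```

Remember to delete the .rds file whenever the inputs change, otherwise you will keep loading the stale copy.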