Input and Output Files
Introduction
This is not a typical module; it is a brief tutorial on getting datasets into and out of R, along with some file-system commands. There is no grading: study each code chunk, then run the chunks in order.
Excel files
To read Excel files, you need to install the readxl package.
To read an Excel file into R, you load the readxl library once per R session, usually at the top of your R script.
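A minimal sketch of these two steps follows. The requireNamespace guard is an addition for illustration, not something the tutorial prescribes; it simply avoids reinstalling the package when it is already present.

```r
# One-time step: install readxl only if it is not already available.
if (!requireNamespace("readxl", quietly = TRUE)) {
  install.packages("readxl")
}

# Once per R session: attach the package so its functions can be called.
library(readxl)
```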
If the library loads without error, you will not see any response and are now ready to read Excel files. The Excel file we are going to use is a version of the esoph dataset named esoph.xlsx that lives in the data directory on this website.
To open that file, we use the read_excel function from the readxl package:
Now we take a look at it:
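The read-and-inspect steps above might look like the sketch below. It assumes esoph.xlsx sits at data/esoph.xlsx relative to the working directory. So that the chunk still runs when that file is absent, it falls back to datasets.xlsx, an example workbook bundled with readxl; the fallback and the names xl_path and xl_data are illustrative additions.

```r
library(readxl)

# Assumed location of the tutorial's file, relative to the working directory.
xl_path <- file.path("data", "esoph.xlsx")

# Fallback for illustration only: a workbook that ships with readxl.
if (!file.exists(xl_path)) {
  xl_path <- readxl_example("datasets.xlsx")
}

# read_excel() reads the first sheet by default and returns a tibble.
xl_data <- read_excel(xl_path)

head(xl_data)  # first few rows
str(xl_data)   # column names and types
```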
Comma Delimited Text Files
Comma-delimited text, or comma-separated values (CSV), is a common text file format for representing data. You can use the built-in read.csv function to read CSV files. Let’s read in the iih.csv file from the same data directory:
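As a hedged sketch of read.csv in action, the chunk below writes a tiny stand-in CSV to a temporary file and reads it back. The column names and values are made up for illustration and are not the contents of iih.csv.

```r
# Stand-in for data/iih.csv: a tiny CSV written to a temporary file.
# (Columns and values are invented for this example.)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,group,score",
             "1,a,2.5",
             "2,b,3.1"), csv_path)

# read.csv() treats the first line as the header by default.
demo_csv <- read.csv(csv_path)
str(demo_csv)

# With the tutorial's file in place, the equivalent call would be:
# iih <- read.csv("data/iih.csv")
```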
Your Excel program can open and write CSV too.
File system commands
R is always working in a specific directory of your file system.
You can figure out which one using getwd, which means “get working directory”:
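For example (the variable name wd is only illustrative, and the path printed will differ from machine to machine):

```r
# getwd() returns the current working directory as a character string.
wd <- getwd()
print(wd)
```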
You can change this directory using setwd. Let’s move into the data directory for a moment.
R will not reply if it works, but we can check that we are in the data directory by looking at getwd again:
Now if we want to read in the iih.csv file, we would do it like this:
Note that we did not need to specify the data/ part of the path because we were already in that directory. Now, let’s go back to the parent directory of data:
.. means the parent directory of the current directory, while . means the current directory. This is standard across many operating systems.
Now, look at getwd one more time:
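The whole round trip above can be sketched in one chunk. To keep it safe to run anywhere, this version works inside a throwaway temporary directory with its own data subdirectory rather than the course folder; sandbox, demo.csv, and the other names are hypothetical.

```r
# Remember where we started so we can restore it at the end.
start_dir <- getwd()

# Build a throwaway directory containing data/demo.csv (stand-in files).
sandbox <- tempfile("wd_demo_")
dir.create(file.path(sandbox, "data"), recursive = TRUE)
write.csv(data.frame(x = 1:3),
          file.path(sandbox, "data", "demo.csv"),
          row.names = FALSE)

setwd(sandbox)
setwd("data")                  # move into data/
in_data <- getwd()             # path now ends in "data"

demo <- read.csv("demo.csv")   # no "data/" prefix needed from here

setwd("..")                    # back up to the parent directory
in_parent <- getwd()

setwd(start_dir)               # restore the original working directory
```

Restoring the starting directory at the end is a deliberate choice: it keeps the rest of the session unaffected by the demonstration.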
Ideally, we do not use these commands in our scripts because they make the scripts less portable. Instead, we use projects in RStudio so that our working directory is always the project directory. We will discuss that further in class, and you can read about it here too:
https://r4ds.hadley.nz/workflow-scripts.html#projects
However, when R cannot find something, it is often because you are not in the directory you think you are in. So, if you get an error reading a file, make sure to check which directory you are working in using getwd.
Likewise, if you write anything out, it will be in the working directory unless you direct R otherwise.
Writing data files
There are write functions for CSV files (which you can open easily in Excel). Let’s try writing out the iih dataset we just read in and put it in the data directory:
Be careful: if a file with that name already exists, it will be overwritten.
Well, is it there?
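A sketch of the write-then-check steps, assuming the built-in write.csv is the write function in question. It uses a small made-up data frame in place of iih and writes into a data folder under R's temporary directory, so nothing in the real data directory is overwritten; iih_demo, out_dir, and out_path are illustrative names.

```r
# Stand-in for the iih data frame (values invented for this example).
iih_demo <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

# Write into a data/ folder under the session's temporary directory.
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE)
out_path <- file.path(out_dir, "iih_demo.csv")

# row.names = FALSE stops write.csv() from adding a column of row names.
write.csv(iih_demo, out_path, row.names = FALSE)

# file.exists() returns TRUE if the file was written.
file.exists(out_path)
```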
You can also save R objects directly so that they can be read back into R later. This is done using the saveRDS function.
You can see what objects are in your R session with the ls function:
You can remove objects from your R session using rm:
Check that they are gone with ls:
And then read them back in using readRDS:
Ok, are they back?
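The save, remove, and restore sequence above can be sketched as follows. The scores data frame and the temporary .rds path are stand-ins for whatever objects and file names you actually use.

```r
# A small object to save (made-up values).
scores <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

# saveRDS() writes a single R object to a file.
rds_path <- tempfile(fileext = ".rds")
saveRDS(scores, rds_path)

ls()                           # scores is listed
rm(scores)                     # remove it from the session
gone <- !("scores" %in% ls())  # TRUE once it has been removed

# readRDS() returns the object; assign it to whatever name you like.
scores <- readRDS(rds_path)
"scores" %in% ls()             # it is back
```

Note that readRDS does not restore the original name by itself; the object gets whatever name you assign the result to.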
I often use these functions in conjunction with file.exists to cache long operations so that I don’t have to wait for them to finish every time I run my code. Here’s an example of that pattern:
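One way the pattern might look is sketched below. slow_computation is a hypothetical placeholder for any long-running step, and the cache location under tempdir() is an assumption; in a real project you would likely cache inside the project directory.

```r
# Where the cached result lives (assumed location for this sketch).
cache_path <- file.path(tempdir(), "slow_result.rds")

# Hypothetical stand-in for an expensive operation.
slow_computation <- function() {
  Sys.sleep(1)        # pretend this takes a long time
  sum(sqrt(1:1e6))
}

if (file.exists(cache_path)) {
  # Cache hit: load the saved result instead of recomputing.
  result <- readRDS(cache_path)
} else {
  # Cache miss: compute once, then save for next time.
  result <- slow_computation()
  saveRDS(result, cache_path)
}
```

The first run pays the full cost and writes the cache; later runs load the file instead. Delete the cache file whenever the inputs or the computation change, or the stale result will be reused.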