David Robinson
1/27/14
In these slides, we show blocks of R code, which are immediately followed by their output:
print("hello world")
[1] "hello world"
The gray box shows the original R code, which you can copy and paste into your own R console to try yourself. The white box shows the code's output: you can compare it to your own results (or just trust us that that's the output).
You store a value in a variable using the = operator:
x = 42
This gives the variable x a value of 42. You can show the value of x with:
print(x)
[1] 42
You can also assign a variable with <-: this is equivalent.
x <- 42
Variable names consist of letters, digits, periods and underscores (_), and cannot start with a digit or an underscore. Convention is to use periods in place of spaces.
Legal variable names include:
Illegal names include:
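As a sketch of these naming rules, the names below are illustrative choices made for this example, not taken from the slides. The base function make.names returns a syntactically valid version of each candidate, so a candidate that comes back unchanged is a legal name.

```r
# Illustrative legal names (chosen for this example)
my.variable = 1
my_variable2 = 2
x3 = 3

# Candidate names to check; the last three break the rules above
candidates = c("my.variable", "2things", "_start", "has space")

# make.names() rewrites invalid names, so "unchanged" means "legal"
is.legal = make.names(candidates) == candidates
print(is.legal)
```

Only the first candidate comes back unchanged, so only it is legal.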
You can perform mathematical operations using +, -, *, and /:
x = 6 + 4
print(x)
[1] 10
x / 2
[1] 5
y = 4
x / y
[1] 2.5
You can use exponentiation with ^, or calculate the natural log:
x^2
[1] 100
y^3
[1] 64
log(x)
[1] 2.303
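Since log gives the natural log by default, a short sketch of the related base R functions may help. The variable names here are only for illustration.

```r
x = 10

natural = log(x)          # natural log (base e)
base10 = log10(x)         # base-10 log
base2 = log(8, base = 2)  # log with an explicit base
back = exp(natural)       # exp() undoes the natural log

print(c(natural, base10, base2, back))
```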
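What is the difference between <- and =?
For assigning a variable, as shown earlier, they are equivalent.
When do you need print(x) to display a variable, and when is x alone enough?
At the interactive console, print is unnecessary. When you source a .R file, you need print(x) in the line or it won't display.
Why is there a [1] before each result?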
You may have noticed the [1] at the start of each result. That's because a single number in R is actually represented as a vector of length 1. The [1] gives the position of the first element shown on that line of output.
For example, you can use : to create a long vector of consecutive integers:
1:60
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
[18] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
[35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
[52] 52 53 54 55 56 57 58 59 60
The [1], [18], … [52] at the start of each row help keep track of the position within the vector.
You can also create vectors yourself using c:
v1 = c(1, 2, 5, 7)
v2 = c(8, 6, 3, 2)
You can also use c to combine existing vectors together:
v3 = c(v1, v2)
print(v3)
[1] 1 2 5 7 8 6 3 2
Use square brackets to retrieve a value from a vector, or multiple values:
v3
[1] 1 2 5 7 8 6 3 2
v3[4]
[1] 7
v3[4:7]
[1] 7 8 6 3
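Square brackets accept more than a single position or a range. As a small sketch (the result names are made up for this example), a vector of positions picks out several elements, and a negative position drops an element:

```r
v3 = c(1, 2, 5, 7, 8, 6, 3, 2)

picked = v3[c(1, 3)]    # the first and third elements
without.first = v3[-1]  # everything except the first element

print(picked)
print(without.first)
```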
Mathematical operations on a vector apply to all elements:
v1 = c(1, 2, 5, 7)
v1 + 2
[1] 3 4 7 9
v1 / 2
[1] 0.5 1.0 2.5 3.5
sin(v1)
[1] 0.8415 0.9093 -0.9589 0.6570
Similarly, you can perform operations between two vectors:
v1
[1] 1 2 5 7
v2 = c(8, 6, 3, 2)
v1 + v2
[1] 9 8 8 9
v1 / v2
[1] 0.1250 0.3333 1.6667 3.5000
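The two vectors above have the same length. When the lengths differ, R recycles the shorter vector, which is worth knowing about because it happens silently when one length is a multiple of the other. A minimal sketch, with vector names chosen for illustration:

```r
long = c(1, 2, 3, 4)
short = c(10, 20)

# short is reused as 10, 20, 10, 20 to match the length of long
recycled = long + short
print(recycled)
```

This is also why adding a single number to a vector works: the single number is a vector of length 1 that gets recycled across every element.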
You can also easily summarize a vector by calculating the sum, mean, or length:
sum(v3)
[1] 34
mean(v3)
[1] 4.25
length(v3)
[1] 8
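Other base R summary functions work the same way. As a brief sketch on the same vector (the result names are illustrative):

```r
v3 = c(1, 2, 5, 7, 8, 6, 3, 2)

smallest = min(v3)    # smallest value
largest = max(v3)     # largest value
middle = median(v3)   # median: average of the two middle values here
both.ends = range(v3) # smallest and largest together

print(c(smallest, largest, middle))
print(both.ends)
```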
Not all values you could want to store in R are numeric. Some are text, which we represent as a series of characters (letters, digits, punctuation, etc.).
Character values are surrounded by either single or double quotation marks.
chv = "hello"
chv2 = 'hi'
chv3 = c("hello", "world")
Like numeric values, they are always vectors, though sometimes they are of length 1.
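Because character values are vectors, functions apply to each element just as with numbers. A short sketch using a few base R string functions (the result names are illustrative):

```r
chv3 = c("hello", "world")

n.elements = length(chv3)              # number of strings in the vector
n.chars = nchar(chv3)                  # number of characters in each string
shouted = toupper(chv3)                # upper-case version of each string
joined = paste(chv3, collapse = " ")   # glue the elements into one string

print(n.chars)
print(joined)
```

Note the difference between length, which counts elements, and nchar, which counts characters within each element.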
Matrices are like two-dimensional vectors, organizing values into rows and columns:
m = matrix(1:9, ncol=3)
m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
You can get the number of rows, the number of columns, or both:
NROW(m)
[1] 3
NCOL(m)
[1] 3
dim(m)
[1] 3 3
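The matrix m above was filled column by column, which is the default for matrix. As a sketch, the byrow argument fills row by row instead (m.byrow is a name chosen for this example):

```r
m = matrix(1:9, ncol = 3)                      # filled down the columns
m.byrow = matrix(1:9, ncol = 3, byrow = TRUE)  # filled across the rows

print(m.byrow)

# The first row now holds 1, 2, 3 rather than 1, 4, 7
first.row = m.byrow[1, ]
```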
To extract one value from a matrix, use the structure matrix[row, column].
m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
m[1, 3]
[1] 7
Leaving the “row” spot or the “column” spot empty will extract, respectively, an entire column or an entire row.
m[1, ]
[1] 1 4 7
m[, 2]
[1] 4 5 6
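The same bracket notation accepts ranges, which extracts a smaller matrix. Note also that extracting a single row returns a plain vector; the drop = FALSE argument keeps it as a one-row matrix. A minimal sketch, with result names chosen for illustration:

```r
m = matrix(1:9, ncol = 3)

# Rows 1 to 2 and columns 2 to 3: a 2-by-2 matrix
sub = m[1:2, 2:3]
print(sub)

# Keep the first row as a 1-by-3 matrix instead of a vector
row1 = m[1, , drop = FALSE]
print(dim(row1))
```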
You can add a single value to a matrix, or multiply a matrix by a single value:
m + 3
     [,1] [,2] [,3]
[1,]    4    7   10
[2,]    5    8   11
[3,]    6    9   12
m * 2
     [,1] [,2] [,3]
[1,]    2    8   14
[2,]    4   10   16
[3,]    6   12   18
Use the t function to transpose a matrix:
t(m)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
Use diag to extract the diagonal:
diag(m)
[1] 1 5 9
You can also perform traditional matrix multiplication with the %*% operator:
m2 = matrix(21:32, nrow=3)
m %*% m2
     [,1] [,2] [,3] [,4]
[1,]  270  306  342  378
[2,]  336  381  426  471
[3,]  402  456  510  564
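Matrix multiplication with %*% differs from *, which multiplies element by element, and it only works when the column count of the left matrix matches the row count of the right one. The sketch below illustrates both points. The use of tryCatch to capture the error message is just one way to show the failure without stopping the script.

```r
m = matrix(1:9, ncol = 3)       # 3 rows, 3 columns
m2 = matrix(21:32, nrow = 3)    # 3 rows, 4 columns

# 3x3 times 3x4 gives a 3x4 result
product.dim = dim(m %*% m2)

# * multiplies matching elements, so both matrices need the same shape
elementwise = m * m

# 3x4 times 3x3 does not line up, so R raises an error
reversed = tryCatch(m2 %*% m, error = function(e) conditionMessage(e))

print(product.dim)
print(reversed)
```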
Another type of variable is a logical value: TRUE or FALSE. Like numbers, logical values are always stored in vectors (sometimes of length 1).
x = TRUE
y = c(TRUE, FALSE, TRUE)
Logical vectors are useful because they are the result of logical operators, such as
>: greater than
<: less than
==: equal to
!=: not equal to
&: and
|: or
x = 2 # assignment
x > 0
[1] TRUE
x < 1
[1] FALSE
x != 10
[1] TRUE
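The & and | operators combine two tests, and ! negates one. As a brief sketch (result names are illustrative), including the handy fact that TRUE counts as 1 when summed:

```r
x = 2

both = x > 0 & x < 1     # TRUE only if both tests are TRUE
either = x > 0 | x < 1   # TRUE if at least one test is TRUE
negated = !(x > 0)       # flips TRUE to FALSE

y = c(TRUE, FALSE, TRUE)
n.true = sum(y)          # counts the TRUE values

print(c(both, either, negated))
print(n.true)
```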
Why == and not =?
= is already reserved for assignment.
Data frames store multiple columns of information together. Unlike a matrix, different columns in a data frame can store different kinds of information (numbers, factors, character vectors, etc.).
R comes with built-in datasets that can be retrieved by name. You can access one with the data function.
data(mtcars)
mtcars contains statistics about 32 cars from 1974, including miles per gallon, weight, number of cylinders, etc. Each row is one car, and each column one piece of information.
View(mtcars)
See details and documentation about the data with:
?mtcars
or
help(mtcars)
One of the most useful functions is head, which shows the first 6 rows of a data frame (a good way to get an idea of its contents):
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02
Datsun 710        22.8   4  108  93 3.85 2.320 18.61
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02
Valiant           18.1   6  225 105 2.76 3.460 20.22
                  vs am gear carb
Mazda RX4          0  1    4    4
Mazda RX4 Wag      0  1    4    4
Datsun 710         1  1    4    1
Hornet 4 Drive     1  0    3    1
Hornet Sportabout  0  0    3    2
Valiant            1  0    3    1
Get the number of rows, columns or both:
nrow(mtcars)
[1] 32
ncol(mtcars)
[1] 11
dim(mtcars)
[1] 32 11
Use $ to access one column by name:
mtcars$mpg
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2
[11] 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9
[21] 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
Each column is a vector once it is extracted.
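Because an extracted column is an ordinary vector, the vector functions shown earlier apply directly. A short sketch (the result names are illustrative), which also shows the double-bracket form as an alternative to $:

```r
mpg = mtcars$mpg

n.cars = length(mpg)   # one value per car
avg.mpg = mean(mpg)    # average miles per gallon
best.mpg = max(mpg)    # highest miles per gallon

# [["mpg"]] extracts the same column as $mpg
same.column = identical(mtcars[["mpg"]], mpg)

print(c(n.cars, avg.mpg, best.mpg))
```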
You can use square brackets with a comma to access a single row of a data frame:
mtcars[1, ]
          mpg cyl disp  hp drat   wt  qsec vs am gear
Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4
          carb
Mazda RX4    4
Or you can give row, column to get a single value at a particular position:
mtcars[3, 2]
[1] 4
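Positions can be hard to read, so it may help to know that the same brackets also accept row and column names. A minimal sketch (result names are illustrative):

```r
# Same value as mtcars[3, 2], but using the column name
by.col.name = mtcars[3, "cyl"]

# Row names work too: the third row of mtcars is the Datsun 710
by.both.names = mtcars["Datsun 710", "cyl"]

# Several columns at once gives a smaller data frame
small = mtcars[1:3, c("mpg", "cyl")]
print(small)
```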
One common operation on data is to filter out rows based on some criterion.
You can get a set of rows using their indices:
mtcars[1:2, ]
              mpg cyl disp  hp drat    wt  qsec vs am
Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1
Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1
              gear carb
Mazda RX4        4    4
Mazda RX4 Wag    4    4
However, what if you want “all automatic cars” or “all cars with mpg > 20”?
Just like arithmetic operations, logical operators on a vector apply the test to each element individually:
v = c(1, 3, 12, 5, 2, 20)
v > 4
[1] FALSE FALSE TRUE TRUE FALSE TRUE
You can combine them using & (and) or | (or):
v > 4 & v < 15
[1] FALSE FALSE TRUE TRUE FALSE FALSE
v < 6 | v > 15
[1] TRUE TRUE FALSE TRUE TRUE TRUE
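A logical vector like these can be counted, located, or used to pick out values. As a sketch on the same vector v (the result names are made up for this example):

```r
v = c(1, 3, 12, 5, 2, 20)
big = v > 4

n.big = sum(big)         # how many elements pass the test
where.big = which(big)   # positions of the elements that pass
values.big = v[big]      # the elements themselves

print(n.big)
print(where.big)
print(values.big)
```

Putting a logical vector inside square brackets keeps the elements where it is TRUE, which is the same idea used to filter rows of a data frame.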
This can equally easily be applied to a column of mtcars:
mtcars$mpg > 20
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
[9] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
[25] FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
This logical vector can be used to subset rows of the data frame: TRUE means “keep the row”, FALSE means “drop it”. Place it before the comma in the square brackets:
v = mtcars$mpg > 20
efficient.cars = mtcars[v, ]
or just:
efficient.cars = mtcars[mtcars$mpg > 20, ]
You can combine multiple conditions using & (and) or | (or), such as looking for automatic gearshift cars with mpg > 20:
efficient.auto = mtcars[mtcars$mpg > 20 & mtcars$am == 0, ]
head(efficient.auto, 3)
                mpg cyl  disp  hp drat    wt  qsec vs
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1
               am gear carb
Hornet 4 Drive  0    3    1
Merc 240D       0    4    2
Merc 230        0    4    2
data.table is a third-party package that improves in many ways on the built-in data.frame.
We'll go over some of its advantages on Wednesday and Friday, but today we will focus on one: how it makes filtering more convenient.
Since data.table is a third-party package, you need to install it first. Once it is installed, you still have to load it into R:
library("data.table")
(You'll have to redo that line each time you reopen R.) Then convert your data.frame to a data.table:
mtcars.dt = as.data.table(mtcars)
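The installation step mentioned above is a one-time operation, normally done with install.packages. As a hedged sketch, the helper below (ensure.installed is a hypothetical name, not part of R or data.table) installs a package only when it is missing, so it is safe to keep at the top of a script:

```r
# Hypothetical helper: install a package only if it is not already available
ensure.installed = function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from CRAN; needs a network connection
  }
  # Report whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

Calling ensure.installed("data.table") once before library("data.table") would cover the installation step.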
A data.table looks identical in many ways to a data.frame, but has some useful features. One is that when you're filtering, you don't need to say mtcars$ each time inside the brackets; you can just refer to the column names:
mtcars.dt[mpg > 20 & am == 0, ]
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
2: 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
3: 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
4: 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
This doesn't mean that variables named mpg and am exist in your workspace: the names refer to columns only within those square brackets.
These slides were created as an RStudio presentation, which integrates R code using knitr.
sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
other attached packages:
[1] ggplot2_0.9.3.1 data.table_1.8.10
[3] knitr_1.5
loaded via a namespace (and not attached):
[1] colorspace_1.2-4 dichromat_2.0-0
[3] digest_0.6.4 evaluate_0.5.1
[5] formatR_0.10 grid_3.0.2
[7] gtable_0.1.2 labeling_0.2
[9] MASS_7.3-29 munsell_0.4.2
[11] plyr_1.8 proto_0.3-10
[13] RColorBrewer_1.0-5 reshape2_1.2.2
[15] scales_0.2.3 stringr_0.6.2
[17] tools_3.0.2