David Robinson

1/27/14

In these slides, we show blocks of R code, which are immediately followed by their output:

```
print("hello world")
```

```
[1] "hello world"
```

The gray box shows the original R code, which you can copy and paste into your own R console to try yourself. The white box shows the code's output: you can compare it to your own results (or just trust us that that's the output).

You store a value in a variable using the `=` operator:

```
x = 42
```

This gives the variable `x` a value of `42`. You can show the value of `x` with:

```
print(x)
```

```
[1] 42
```

You can also assign a variable with `<-`: this is equivalent.

```
x <- 42
```

Variable names consist of letters, digits, periods and underscores (`_`), and cannot start with a digit or an underscore. A common convention is to use periods in place of spaces.

Legal variable names include:

- my.variable
- my_variable

Illegal names include:

- my-variable
- dave's.variable
- 2ndvariable
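One way to check a name yourself is sketched below. It leans on base R's `make.names`, which rewrites a string into a syntactically valid name: a name that comes back unchanged was already legal. The helper `is.legal.name` is a hypothetical name introduced only for this illustration.

```r
# Hypothetical helper: a name is legal if make.names() leaves it unchanged
is.legal.name = function(name) {
  name == make.names(name)
}

candidates = c("my.variable", "my_variable", "my-variable",
               "dave's.variable", "2ndvariable")
is.legal.name(candidates)
```

The first two names are reported as legal and the last three are not, matching the lists above.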

You can perform mathematical operations using `+`, `-`, `*`, and `/`:

```
x = 6 + 4
print(x)
```

```
[1] 10
```

```
x / 2
```

```
[1] 5
```

```
y = 4
x / y
```

```
[1] 2.5
```

You can use exponentiation with `^`, or calculate the natural log:

```
x^2
```

```
[1] 100
```

```
y^3
```

```
[1] 64
```

```
log(x)
```

```
[1] 2.303
```
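`log` computes the natural logarithm by default. As a small sketch of the alternatives in base R, a different base can be requested with the `base` argument or with the shortcuts `log10` and `log2`, and `exp` reverses the natural log:

```r
x = 10
log(x)             # natural log (base e)
log(x, base = 10)  # log base 10
log10(1000)        # shortcut for base 10
log2(8)            # shortcut for base 2
exp(log(x))        # exp() undoes log(), giving back x
```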

- What is the difference between `<-` and `=`?
  - In 99% of cases, they act exactly the same, so it's personal preference. See here for a description of the rare cases where they differ.

- When do you need `print(x)` to display a variable, and when just `x`?
  - When working in the R interactive terminal, the result of each line is displayed after being evaluated, so `print` is unnecessary. When you *source* a .R file, you need `print(x)` in the line or it won't display.

- Why is there a `[1]` before each result?
  - You'll find out in the next section!

You may have noticed the `[1]` at the start of each result. That's because a single number in R is actually represented as a *vector* of length 1. The `[1]` shows the position within the vector of the first value printed on that row.

For example, you can use `:` to create a long vector of consecutive integers:

```
1:60
```

```
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
[18] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
[35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
[52] 52 53 54 55 56 57 58 59 60
```

The `[1]`, `[18]` … `[52]` at the start of each row help keep track of the position within the vector.

You can also create vectors yourself using `c`:

```
v1 = c(1, 2, 5, 7)
v2 = c(8, 6, 3, 2)
```

You can also use `c` to combine existing vectors together:

```
v3 = c(v1, v2)
print(v3)
```

```
[1] 1 2 5 7 8 6 3 2
```

Use square brackets to retrieve a value from a vector, or multiple values:

```
v3
```

```
[1] 1 2 5 7 8 6 3 2
```

```
v3[4]
```

```
[1] 7
```

```
v3[4:7]
```

```
[1] 7 8 6 3
```
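The index inside the square brackets is itself a vector, so positions that are not consecutive can be picked out with `c`. A short sketch using the same `v3` (redefined here so the example runs on its own):

```r
v3 = c(1, 2, 5, 7, 8, 6, 3, 2)

v3[c(1, 3, 8)]   # first, third and eighth values
v3[length(v3)]   # last value, whatever the length of the vector
```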

Mathematical operations on a vector apply to all elements:

```
v1 = c(1, 2, 5, 7)
v1 + 2
```

```
[1] 3 4 7 9
```

```
v1 / 2
```

```
[1] 0.5 1.0 2.5 3.5
```

```
sin(v1)
```

```
[1] 0.8415 0.9093 -0.9589 0.6570
```

Similarly, you can perform operations between two vectors:

```
v1
```

```
[1] 1 2 5 7
```

```
v2 = c(8, 6, 3, 2)
v1 + v2
```

```
[1] 9 8 8 9
```

```
v1 / v2
```

```
[1] 0.1250 0.3333 1.6667 3.5000
```
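Both kinds of operation follow the same rule: R lines the two vectors up element by element, and if one is shorter it is repeated ("recycled") to match the longer one. A single number is just a vector of length 1, which is why `v1 + 2` adds 2 to every element. A small sketch:

```r
v1 = c(1, 2, 5, 7)

v1 + 2                 # 2 is recycled to c(2, 2, 2, 2)
v1 + c(2, 2, 2, 2)     # same result written out in full
v1 * c(10, 100)        # c(10, 100) is recycled to c(10, 100, 10, 100)
```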

You can also easily summarize a vector by calculating the sum, mean, or length:

```
sum(v3)
```

```
[1] 34
```

```
mean(v3)
```

```
[1] 4.25
```

```
length(v3)
```

```
[1] 8
```

Not all values you could want to store in R are numeric. You could store:

- subject names
- gene sequences
- text for analysis

We represent these as a series of characters (letters, digits, punctuation, etc.).

Character values are surrounded by either single or double quotation marks.

```
chv = "hello"
chv2 = 'hi'
chv3 = c("hello", "world")
```

Like numeric values, they are always vectors, though sometimes they are of length 1.
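It is easy to confuse the length of a character vector with the number of characters in a string. As a brief sketch, `length` counts the strings in the vector, while base R's `nchar` counts the characters in each one:

```r
chv = "hello"
chv3 = c("hello", "world")

length(chv)    # 1: one string in the vector
nchar(chv)     # 5: number of characters in that string
length(chv3)   # 2: two strings
nchar(chv3)    # 5 5: characters in each string
```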

Matrices are like two-dimensional vectors, organizing values into rows and columns:

```
m = matrix(1:9, ncol=3)
m
```

```
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
```
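Note that `matrix` fills in the values column by column. If you would rather fill row by row, `matrix` accepts a `byrow` argument. A brief sketch (the names `m.bycol` and `m.byrow` are chosen just for this example):

```r
m.bycol = matrix(1:9, ncol = 3)                 # default: fill down each column
m.byrow = matrix(1:9, ncol = 3, byrow = TRUE)   # fill across each row instead

m.byrow
m.byrow[1, ]   # first row is now 1 2 3
```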

You can get the number of rows, the number of columns, or both:

```
NROW(m)
```

```
[1] 3
```

```
NCOL(m)
```

```
[1] 3
```

```
dim(m)
```

```
[1] 3 3
```

To extract one value from a matrix, use the structure `matrix[`**row**`,`**column**`]`.

```
m
```

```
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
```

```
m[1, 3]
```

```
[1] 7
```

Leaving the “row” spot or the “column” spot empty will extract, respectively, an entire column or an entire row.

```
m[1, ]
```

```
[1] 1 4 7
```

```
m[, 2]
```

```
[1] 4 5 6
```
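The row and column spots also accept vectors of positions, so several rows or columns can be extracted at once. A short sketch with the same `m`:

```r
m = matrix(1:9, ncol = 3)

m[1:2, ]          # first two rows, all columns
m[, c(1, 3)]      # all rows, first and third columns
m[1:2, c(1, 3)]   # a 2-by-2 block
```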

You can add a single value to a matrix, or multiply a matrix by a single value. The operation applies to every element:

```
m + 3
```

```
     [,1] [,2] [,3]
[1,]    4    7   10
[2,]    5    8   11
[3,]    6    9   12
```

```
m * 2
```

```
     [,1] [,2] [,3]
[1,]    2    8   14
[2,]    4   10   16
[3,]    6   12   18
```

Use the `t` function to transpose a matrix:

```
t(m)
```

```
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
```

Use `diag` to extract the diagonal:

```
diag(m)
```

```
[1] 1 5 9
```

You can also perform traditional matrix multiplication with the `%*%` operator:

```
m2 = matrix(21:32, nrow=3)
m %*% m2
```

```
     [,1] [,2] [,3] [,4]
[1,]  270  306  342  378
[2,]  336  381  426  471
[3,]  402  456  510  564
```
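Matrix multiplication requires the number of columns of the first matrix to equal the number of rows of the second, and the result has the rows of the first and the columns of the second. It is also distinct from `*`, which multiplies element by element. A quick sketch to check both points:

```r
m = matrix(1:9, ncol = 3)        # 3 rows, 3 columns
m2 = matrix(21:32, nrow = 3)     # 3 rows, 4 columns

dim(m %*% m2)    # 3 4: rows of m, columns of m2

m * m            # element-wise: each entry is squared
m %*% m          # matrix product: generally a different result
```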

Another type of variable is a logical value: `TRUE` or `FALSE`. Like numbers, logical values are always stored in vectors (sometimes of length 1).

```
x = TRUE
y = c(TRUE, FALSE, TRUE)
```

Logical vectors are useful because they are the result of logical operators, such as:

- `>`: greater than
- `<`: less than
- `==`: equal to
- `!=`: not equal to
- `&`: and
- `|`: or

```
x = 2 # assignment
x > 0
```

```
[1] TRUE
```

```
x < 1
```

```
[1] FALSE
```

```
x != 10
```

```
[1] TRUE
```

- Why is the logical operator for equals `==` and not `=`?
  - Because `=` is already reserved for assignment.

Data frames store multiple columns of information together. Unlike a matrix, different columns in a data frame can store different kinds of information (numbers, factors, character vectors, etc.).
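To make the contrast with a matrix concrete, here is a minimal sketch that builds a small data frame by hand with `data.frame`. The variable name `subjects`, the column names and the values are all made up for illustration:

```r
# Hypothetical example data: three subjects
subjects = data.frame(
  name = c("A", "B", "C"),          # character column
  age = c(31, 45, 27),              # numeric column
  treated = c(TRUE, FALSE, TRUE),   # logical column
  stringsAsFactors = FALSE          # keep names as character rather than factor
)

subjects
str(subjects)   # lists each column together with its type
```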

R comes with built-in datasets that can be retrieved by name. You can access one with the `data` function.

```
data(mtcars)
```

`mtcars` contains statistics about 32 cars in 1974, including miles per gallon, weight, number of cylinders, etc. Each row is one car, and each column one piece of information.

```
View(mtcars)
```

See details and documentation about the data with:

```
?mtcars
```

or

```
help(mtcars)
```

One of the most useful functions is `head`, which shows the first 6 rows of a data frame (a good way to get an idea of its contents):

```
head(mtcars)
```

```
                   mpg cyl disp  hp drat    wt  qsec
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02
Datsun 710        22.8   4  108  93 3.85 2.320 18.61
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02
Valiant           18.1   6  225 105 2.76 3.460 20.22
                  vs am gear carb
Mazda RX4          0  1    4    4
Mazda RX4 Wag      0  1    4    4
Datsun 710         1  1    4    1
Hornet 4 Drive     1  0    3    1
Hornet Sportabout  0  0    3    2
Valiant            1  0    3    1
```

Get the number of rows, columns or both:

```
nrow(mtcars)
```

```
[1] 32
```

```
ncol(mtcars)
```

```
[1] 11
```

```
dim(mtcars)
```

```
[1] 32 11
```

Use `$` to access one column by name:

```
mtcars$mpg
```

```
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2
[11] 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9
[21] 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
```

Each column is a vector once it is extracted.

You can use square brackets with a comma to access a single row of a data frame:

```
mtcars[1, ]
```

```
          mpg cyl disp  hp drat   wt  qsec vs am gear
Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4
          carb
Mazda RX4    4
```

Or you can give `row, column` to get a single value at a particular position:

```
mtcars[3, 2]
```

```
[1] 4
```
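Rows and columns of a data frame can also be referred to by name rather than by position. As a sketch, the value above (row 3 is the Datsun 710, column 2 is `cyl`) can be retrieved as:

```r
data(mtcars)

mtcars["Datsun 710", "cyl"]   # same value as mtcars[3, 2]
mtcars[3, "cyl"]              # position for the row, name for the column
```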

One common operation on data is to filter out rows based on some criterion.

You can get a set of rows using their indices:

```
mtcars[1:2, ]
```

```
              mpg cyl disp  hp drat    wt  qsec vs am
Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1
Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1
              gear carb
Mazda RX4        4    4
Mazda RX4 Wag    4    4
```

However, what if you want “all automatic cars” or “all cars with mpg > 20”?

Just like arithmetic operations, logical operators on a vector apply the test to each element individually:

```
v = c(1, 3, 12, 5, 2, 20)
v > 4
```

```
[1] FALSE FALSE TRUE TRUE FALSE TRUE
```

You can combine them using `&` (and) or `|` (or):

```
v > 4 & v < 15
```

```
[1] FALSE FALSE TRUE TRUE FALSE FALSE
```

```
v < 6 | v > 15
```

```
[1] TRUE TRUE FALSE TRUE TRUE TRUE
```
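Two things follow from this that come in handy when filtering. A logical vector can be placed in square brackets to keep only the elements where it is `TRUE`, and because `TRUE` counts as 1 and `FALSE` as 0, `sum` counts how many elements pass the test. A short sketch with the same `v`:

```r
v = c(1, 3, 12, 5, 2, 20)

v[v > 4]             # keep only the values greater than 4
sum(v > 4)           # how many values are greater than 4
v[v > 4 & v < 15]    # combine conditions inside the brackets
```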

This can equally easily be applied to a column of `mtcars`:

```
mtcars$mpg > 20
```

```
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
[9] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
[25] FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
```

This logical vector can be used to subset rows of the data frame: `TRUE` means “keep the row”, `FALSE` means “drop it”. Place it before the comma in the square brackets:

```
v = mtcars$mpg > 20
efficient.cars = mtcars[v, ]
```

or just:

```
efficient.cars = mtcars[mtcars$mpg > 20, ]
```

You can combine multiple conditions using `&` (and) or `|` (or), for example to find cars with an automatic transmission (`am == 0`) and mpg > 20:

```
efficient.auto = mtcars[mtcars$mpg > 20 & mtcars$am == 0, ]
head(efficient.auto, 3)
```

```
                mpg cyl  disp  hp drat    wt  qsec vs
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1
               am gear carb
Hornet 4 Drive  0    3    1
Merc 240D       0    4    2
Merc 230        0    4    2
```
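A quick way to check a filter is to count the rows that pass it. As a sketch, `sum` on the logical vector and `nrow` on the filtered data frame should agree:

```r
data(mtcars)

efficient.cars = mtcars[mtcars$mpg > 20, ]
efficient.auto = mtcars[mtcars$mpg > 20 & mtcars$am == 0, ]

sum(mtcars$mpg > 20)    # number of TRUE values in the condition
nrow(efficient.cars)    # same number: one row kept per TRUE
nrow(efficient.auto)    # fewer rows once the second condition is added
```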

`data.table` is a third-party package that improves in many ways on the built-in `data.frame`.

We'll go over some of its advantages on Wednesday and Friday, but today we will focus on one: how it makes filtering more convenient.

Since `data.table` is a third-party package, you need to install it first. Once it is installed, you still have to load it into R:

```
library("data.table")
```
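If the `library` call fails with an error saying there is no package called `data.table`, the package has not been installed yet. Installation is done once per computer with `install.packages`, which downloads the package from CRAN (so it needs an internet connection). In the sketch below the install line is commented out so the block can be run safely; remove the `#` the first time. `requireNamespace` is a base R way of checking whether a package is already installed, and `is.installed` is just a name chosen for this example.

```r
# Run once (uncomment the line below); downloads the package from CRAN
# install.packages("data.table")

# TRUE if data.table is already installed on this computer, FALSE otherwise
is.installed = requireNamespace("data.table", quietly = TRUE)
is.installed
```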

(You'll have to rerun the `library` line each time you reopen R.) Then convert your `data.frame` to a `data.table`:

```
mtcars.dt = as.data.table(mtcars)
```

A `data.table` looks identical in many ways to a `data.frame`, but has some useful features. One is that when you're filtering, you don't need to write `mtcars$` each time inside the brackets; you can refer to the column names directly:

```
mtcars.dt[mpg > 20 & am == 0, ]
```

```
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
2: 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
3: 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
4: 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
```

This doesn't mean that `mpg` and `am` exist as variables of their own: the names are recognized only within those square brackets.
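You can check this for yourself. The sketch below uses base R's `subset` rather than `data.table` (a substitution made so the example runs without any extra packages); `subset` looks up column names inside the data frame in a similar way. Assuming you have not created a variable called `mpg` yourself, `exists` reports that there is no such variable outside the filtering call:

```r
data(mtcars)

# Column names are looked up inside mtcars, much as in the data.table brackets
efficient.auto = subset(mtcars, mpg > 20 & am == 0)
nrow(efficient.auto)

# Outside the call there is still no variable named mpg
# (FALSE in a fresh session where you have not defined one)
exists("mpg")
```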