Introduction to R Workshop
1 Introduction
In this lab, we will explore how to use R. We will work on generating and accessing
elements/components of R objects including vectors, matrices, lists, factors, data
frames and functions (both built-in and user-defined). We will also explore R's basic
graphics utilities, including plot, hist and boxplot. Finally, we'll introduce you to
R's control structures: ifelse, for and while loops. Have fun!
2 Vectors
Create a numerical vector named num containing all the integers from 11 to 20 using the
sequence-generating operator :. Use this vector to generate 6 logical vectors named
lg1...lg6 by applying conditions with the comparison operators >, >=, <, <=, == and !=.
Generate a character vector named char using the concatenate function c(...). Use this
vector to create 2 logical vectors, lg7 and lg8, using the comparison operators == and
!=. View the elements of all these vectors by typing their names and hitting "enter" on
your keyboard. Create a mixed vector named mix1 that contains values with a decimal
point and integers using the c(...) function. What type of vector is produced? Check by
typing mix1 and hitting "enter" on your keyboard as well as by using the mode(...)
function. Create a mixed vector named mix2 that contains values with a decimal point,
integers and characters with the c(...) function. What type of vector is produced?
Again, check by typing mix2 and hitting "enter" as well as by using the mode function.
Extract subsets of elements from num using the : operator, c(...) and each of the 6
logical vectors lg1...lg6. Extract the elements of char by using lg7 and lg8. Extract
subsets of mix1 and mix2 using negative indexes together with the : operator and the
c(...) function.
Perform the following mathematical operations on num: num/num, num*num, num**2,
num + num, 2*num and num - num. Are these standard matrix operations?
> num = 11:20
> num # components of num
[1] 11 12 13 14 15 16 17 18 19 20
> lg1 = num > 15
> lg1 # components of lg1
[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
> lg2 = num < 12
> lg3 = num >= 16
> lg4 = num <= 10
> lg5 = num == 20
> lg6 = num != 11
> char = c("R", "Perl", "stats", "bioconductor", "ChIPSeq")
> lg7 = char == "R"
> lg8 = char != "Perl"
> mix1 = c(1, 2, 3.3)
> mix1 # doubles
[1] 1.0 2.0 3.3
> mode(mix1)
[1] "numeric"
> mix2 = c(1, 2, 3.3, "R")
> mix2 # character
[1] "1" "2" "3.3" "R"
> mode(mix2)
[1] "character"
> num[2:6]
[1] 12 13 14 15 16
> num[c(1,3,5)]
[1] 11 13 15
> num[lg1]
[1] 16 17 18 19 20
> num[lg2]
[1] 11
> num[lg3]
[1] 16 17 18 19 20
> num[lg4]
integer(0)
> num[lg5]
[1] 20
> num[lg6]
[1] 12 13 14 15 16 17 18 19 20
> char[lg7]
[1] "R"
> char[lg8]
[1] "R" "stats" "bioconductor" "ChIPSeq"
> mix1[-(3:4)]
[1] 1 2
> mix2[-c(3,4)]
[1] "1" "2"
> num/num
[1] 1 1 1 1 1 1 1 1 1 1
> num*num
[1] 121 144 169 196 225 256 289 324 361 400
> num**2
[1] 121 144 169 196 225 256 289 324 361 400
> num+num
[1] 22 24 26 28 30 32 34 36 38 40
> 2*num
[1] 22 24 26 28 30 32 34 36 38 40
> num-num
[1] 0 0 0 0 0 0 0 0 0 0
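The question above about matrix operations can be explored with a small sketch. It assumes only base R; the names elementwise and inner are illustrative and not part of the lab. It contrasts the element-wise * operator with the matrix-multiplication operator %*%, and shows how R recycles a shorter vector.

```r
num <- 11:20

# Element-wise product: one result per element, same length as num
elementwise <- num * num

# Inner (dot) product via the matrix-multiplication operator %*%:
# a 1 x 1 matrix holding the sum of the element-wise products
inner <- num %*% num

length(elementwise)                     # 10
dim(inner)                              # 1 1
sum(elementwise) == as.numeric(inner)   # TRUE

# Recycling: the shorter vector c(0, 100) is reused along num
num + c(0, 100)
```

So the arithmetic operators used in this section work element by element; they are not the matrix operations of linear algebra.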
3 Matrices
Create a 5-column matrix named mat from num using the matrix() function, filling in
the values by row first. What are the dimensions of mat? Type mat at the prompt, then
"enter", and use the dim() function to find out. Extract the element in the second row
and third column of mat. Extract the full first row and, separately, the full fourth
column of mat. Extract all rows and the 4th and 5th columns of mat using the : operator
and the c() command. Create a logical vector lg9 by checking which elements in the
first row of mat are <= 14. Apply lg9 to the columns of mat. Perform the following
mathematical operations on mat: mat/mat, mat*mat, mat**2, mat + mat, 2*mat and mat - mat.
> mat = matrix(num, ncol=5, byrow=T)
> mat
> dim(mat)
[1] 2 5
> mat[2,3]
[1] 18
> mat[1,]
[1] 11 12 13 14 15
> mat[,4]
[1] 14 19
> mat[,4:5]
> mat[,c(4,5)]
> lg9 = mat[1,] <= 14
> lg9
[1] TRUE TRUE TRUE TRUE FALSE
> mat[,lg9]
> mat/mat
> mat*mat
> mat**2
> mat + mat
> 2*mat
> mat - mat
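As with vectors, the operators above act element by element on a matrix. A minimal sketch of the difference from a true matrix product follows; the names prod22 and result are illustrative, and only base R is assumed.

```r
mat <- matrix(11:20, ncol = 5, byrow = TRUE)   # 2 x 5, as in the exercise

# Element-wise product keeps the 2 x 5 shape
dim(mat * mat)

# A matrix product needs conformable shapes: (2 x 5) %*% (5 x 2) gives 2 x 2
prod22 <- mat %*% t(mat)
dim(prod22)

# mat %*% mat fails because a 2 x 5 matrix cannot multiply a 2 x 5 matrix;
# try() captures the error instead of stopping the script
result <- try(mat %*% mat, silent = TRUE)
inherits(result, "try-error")   # TRUE
```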
4 Lists and Data Frames
Generate a list named ExpList with three components: ExpLevel (3 numeric elements),
Exp (3 logical elements with at least one TRUE) and GeneName (3 character elements).
Type ExpList and hit "enter". Extract the GeneName component using the $ operator,
double brackets, [[]], and single brackets, [], after ExpList. Do you notice any
differences in the outputs? Extract the third element of the GeneName component.
Extract the ExpLevel and GeneName components in one view using single brackets, [],
after ExpList. Generate a character vector of length 3 named ids. Type
help(as.data.frame) and read the help page. Apply the function as.data.frame to the
list ExpList to generate a data frame named ExpData with row names ids (setting
stringsAsFactors=F). Type ExpData and hit "enter". Extract the first row and then the
third column (two separate operations) of ExpData using indexes. Use the $ operator to
extract the Exp column. Extract the rows that are TRUE in the Exp column. Check the
attributes of ExpData by applying the dim() and mode() functions.
> ExpList = list(ExpLevel=c(1,2,3), Exp=c(F,T,T), GeneName=c("p53", "cMyc", "Sp1"))
> ExpList
$ExpLevel
[1] 1 2 3
$Exp
[1] FALSE TRUE TRUE
$GeneName
[1] "p53" "cMyc" "Sp1"
> ExpList$GeneName
[1] "p53" "cMyc" "Sp1"
> ExpList[[3]]
[1] "p53" "cMyc" "Sp1"
> ExpList[3]
$GeneName
[1] "p53" "cMyc" "Sp1"
> ExpList$GeneName[3]
[1] "Sp1"
> ExpList[c(1,3)]
$ExpLevel
[1] 1 2 3
$GeneName
[1] "p53" "cMyc" "Sp1"
> ids = c("id1", "id2", "id3")
> ExpData = as.data.frame(ExpList, row.names=ids, stringsAsFactors=F)
> ExpData
> ExpData[1,]
> ExpData[,3]
[1] "p53" "cMyc" "Sp1"
> ExpData$Exp
[1] FALSE TRUE TRUE
> ExpData[ExpData$Exp,]
> dim(ExpData)
[1] 3 3
> mode(ExpData)
[1] "list"
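The difference between single and double brackets, and the reason mode() reports "list" for a data frame, can be checked directly. This is a minimal sketch that rebuilds the same objects and inspects them with class(), is.list() and sapply(), all base R functions.

```r
ExpList <- list(ExpLevel = c(1, 2, 3),
                Exp      = c(FALSE, TRUE, TRUE),
                GeneName = c("p53", "cMyc", "Sp1"))

class(ExpList["GeneName"])     # "list": single brackets return a sub-list
class(ExpList[["GeneName"]])   # "character": double brackets return the component
identical(ExpList$GeneName, ExpList[["GeneName"]])   # TRUE

ExpData <- as.data.frame(ExpList, row.names = c("id1", "id2", "id3"),
                         stringsAsFactors = FALSE)
class(ExpData)           # "data.frame"
is.list(ExpData)         # TRUE: a data frame is stored as a list of columns
sapply(ExpData, class)   # the class of each column
```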
5 Reading and Writing Data
Now we're going to learn to read data into R and write data out of it. We'll start
with writing so that we have files to read in. First, we're going to write the matrix
mat to a file named "matrix.txt". We'll use the write() function, which writes a vector
or matrix to a file. Type help(write). You'll see that write requires you to transpose
your matrix (i.e., switch rows and columns). So try the following:
> t(mat) #transpose mat matrix
> write(t(mat), file="matrix.txt", ncol=5, sep="\t")
Check whether the file "matrix.txt" is in the directory from which you called R by
typing system("ls"). If it is, view its contents using the command
system("less matrix.txt"). Was it written correctly? What if we had omitted the t()
function? Try it.
Next, we'll write our data frame ExpData to a file named "ExpData.txt" using the
write.table() function:
> write.table(ExpData,file="ExpData.txt",quote=F,sep="\t",row.names=T,col.names=T)
Let's use system("ls") to see if the file was written and system("less ExpData.txt")
to view the contents. Is the output what you expected? Note that I normally don't
include row names in my output files (i.e., I set row.names=F).
Now we'll try to read in our matrix mat and data frame ExpData. There are two major
functions that allow you to read text files into R: scan(), which returns a vector, and
read.table(), which returns a data frame. If we want to read our file "matrix.txt" in as
a matrix using scan, we also have to use the matrix function.
> mat2 = scan("matrix.txt")
> mat2 # This is a vector, not a matrix!
[1] 11 12 13 14 15 16 17 18 19 20
> mat2 = matrix(scan("matrix.txt"), byrow=T, ncol=5)
> mat2 # This is correct.
Now let's read our file "ExpData.txt" into a data frame called ExpData2 using read.table.
> ExpData2 = read.table("ExpData.txt", header=T, sep="\t")
> ExpData2 # This is correct.
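The write-then-read round trip for the matrix can also be checked from inside R. The sketch below is an illustration rather than part of the lab: it writes to a temporary file created with tempfile() (the variable name path is an assumption), and uses file.exists() and readLines() as alternatives to system("ls") and system("less ...") for systems without those commands.

```r
mat <- matrix(11:20, ncol = 5, byrow = TRUE)

# tempfile() is used only so the sketch does not clutter the working directory
path <- tempfile(fileext = ".txt")

# write() emits the elements in column-major order, ncolumns per line,
# so the matrix is transposed first to keep its rows as rows in the file
write(t(mat), file = path, ncolumns = 5, sep = "\t")

file.exists(path)   # was the file written?
readLines(path)     # view its contents, one string per line

mat2 <- matrix(scan(path, quiet = TRUE), ncol = 5, byrow = TRUE)
all(mat2 == mat)    # TRUE if the round trip preserved every value
```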
6 Graphics
Now we'll explore some of R's graphics functions. The function plot is R's basic
plotting function. Type help(plot). If you look at all the parameters available to plot
by typing help(par), you'll see that we could spend hours learning all the details of
plot alone. Instead, I'll just take you through a few examples of generating a scatter
plot and a line:
> x = seq(0,1,by=0.01) # a vector of values from 0 to 1 in increments of 0.01.
> y = x + rnorm(length(x), mean=0, sd=0.1) # add a little Gaussian noise to x.
> plot(x,y,xlab="x",ylab="y",main="L",xlim=c(0,1),ylim=c(0,1),pch=18,col="red")
> lines(x,x,col="blue")
Redraw the above plot using the type="l" option in plot and the points command
instead of lines below plot.
> plot(x,y,type="l",xlab="x",ylab="y",main="L",xlim=c(0,1),ylim=c(0,1),col="red")
> points(x,x,col="blue")
Make a plot with two lines and two sets of corresponding scatter points (similar to
the first plot; use 4 colors), one line with slope equal to one and the other with
slope equal to two, using the plot, seq, points, lines and rnorm functions.
> z = 2*x + rnorm(length(x), mean=0, sd=0.5)
> plot(x,y,main="2 Lines",xlim=c(0,1),ylim=c(0,1),pch=18,col="red")
> points(x,z,pch=18,col="green")
> lines(x,x,col="blue")
> lines(x,2*x,col="purple")
Can we see all the "green" data points? If not, how would you get them all in the
plot? Try it.
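One possible approach, sketched here under a few assumptions, is to compute the y-axis limits from everything that will be drawn instead of fixing them at c(0,1). The seed value, the name ylim_all and the use of pdf(NULL) (a null device, so the code also runs without a display) are choices made for this illustration only.

```r
set.seed(1)   # arbitrary seed, only to make the sketch reproducible
x <- seq(0, 1, by = 0.01)
y <- x + rnorm(length(x), mean = 0, sd = 0.1)
z <- 2 * x + rnorm(length(x), mean = 0, sd = 0.5)

# Span of everything that will be drawn on the y axis
ylim_all <- range(c(y, z, 2 * x))

pdf(NULL)   # null graphics device; omit this line when working interactively
plot(x, y, main = "2 Lines", xlim = c(0, 1), ylim = ylim_all, pch = 18, col = "red")
points(x, z, pch = 18, col = "green")
lines(x, x, col = "blue")
lines(x, 2 * x, col = "purple")
invisible(dev.off())
```

Because z has a larger slope and a larger standard deviation than y, many of its values fall outside the interval from 0 to 1, which is why the fixed limits hide them.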
Now let's generate a plot of the histogram (using the function hist), smoothed
density
(using the function density in plot) and boxplot (using the function boxplot) of
a random
vector r which is normally distributed with a mean of 2 and standard deviation
of 1. First
we have to generate the random vector (using rnorm) and then the plots:
> r = rnorm(1000,mean=2, sd=1)
> hist(r, main="Hist of r")
> plot(density(r), main="Density of r")
> boxplot(r, main="Boxplot of r")
7 Control Structures
R's control structures are very similar to those of other
programming languages. We will
return to our numerical vector num to illustrate the use of the if statement,
for loop and
while loop:
> if (length(num) > 2) {
+ long = TRUE
+ variance = var(num)
+ } else {
+ long = FALSE
+ variance = NA
+ }
> long
[1] TRUE
> variance
[1] 9.166667
What does the chunk of code written above do?
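The introduction also mentions ifelse. As a minimal sketch (the labels "high" and "low" and the name size are illustrative, not from the lab), the vectorised ifelse function applies a condition to every element at once, whereas if looks at a single TRUE/FALSE value:

```r
num <- 11:20

# ifelse() evaluates the condition element by element and returns
# a vector of the same length as the condition
size <- ifelse(num > 15, "high", "low")
size
table(size)   # counts of each label
```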
> squareRoot = numeric()
> for (i in 1:length(num)) {
+ squareRoot = c(squareRoot, sqrt(num[i]))
+ }
> squareRoot
[1] 3.316625 3.464102 3.605551 3.741657 3.872983 4.000000 4.123106 4.242641
[9] 4.358899 4.472136
Why did I declare squareRoot as a numeric vector before the loop? Remove the vector
squareRoot by typing rm(squareRoot) and try the loop again without declaring the
variable. Did you get an error message? What was the problem? Could we have done this
another, much simpler, way?
> i = 1
> sumSqrt = 0
> while (squareRoot[i] <= 4) {sumSqrt = sumSqrt + squareRoot[i]; i=i+1}
> sumSqrt
[1] 22.00092
What does the chunk of code written above do? Why did I set the variable i
before the
while loop?
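For comparison, here is a sketch of how both loops could be written with R's vectorised functions. The name squareRoot2 is introduced only for this illustration; sqrt(), seq_along() and sum() are base R.

```r
num <- 11:20

# sqrt() is vectorised, so no loop or pre-declared result vector is needed
squareRoot <- sqrt(num)

# If a loop is wanted, preallocating avoids growing the vector at each iteration
squareRoot2 <- numeric(length(num))
for (i in seq_along(num)) {
  squareRoot2[i] <- sqrt(num[i])
}
identical(squareRoot, squareRoot2)   # TRUE

# The while loop sums the leading run of values <= 4; because the values here
# are increasing, that equals the sum of every value <= 4
sumSqrt <- sum(squareRoot[squareRoot <= 4])
round(sumSqrt, 5)   # 22.00092
```

The logical-index version only matches the while loop because squareRoot is sorted; for unsorted data the while loop would stop at the first value above 4.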
8 Functions
R's strength is the thousands of powerful functions that allow you to apply the latest
computational statistics algorithms to your data. In our case, the Bioconductor suite
of tools is extremely powerful for array analysis and more. So, take a little time and
explore some of the basic functions that I listed on the "R Functions and Packages"
slide of the "Introduction to R" lecture. Use the help function to understand proper
usage/input requirements and apply some of these basic functions to your R objects.
Next, read the "Calling Conventions for Functions" slide to get a feel for applying a
t-test, and then type help(t.test) and read the help page. Generate two vectors named x
and y of length 10 whose elements are normally distributed with zero mean and standard
deviation equal to one using the function rnorm. Next, create a vector of length 10
named z with mean two and standard deviation one. Apply a t.test between (1) x and y
and (2) x and z using the "greater" alternative option. Given what you know about how
you created x, y and z, order the vectors in t.test to yield the lowest possible
p-value.
> x = rnorm(10)
> y = rnorm(10)
> z = rnorm(10, mean=2)
> t.test(x,y,alternative="greater") # ordering doesn't matter
Welch Two Sample t-test
data: x and y
t = -0.8286, df = 14.305, p-value = 0.7895
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
-1.363578 Inf
sample estimates:
mean of x mean of y
-0.3222004 0.1145024
> t.test(z,y,alternative="greater") # correct ordering
Welch Two Sample t-test
data: z and y
t = 4.1771, df = 16.909, p-value = 0.0003194
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
1.042854 Inf
sample estimates:
mean of x mean of y
1.9020314 0.1145024
> t.test(y,z,alternative="greater") # incorrect ordering
Welch Two Sample t-test
data: y and z
t = -4.1771, df = 16.909, p-value = 0.9997
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
-2.532204 Inf
sample estimates:
mean of x mean of y
0.1145024 1.9020314
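The effect of argument order can also be checked programmatically. In this sketch the seed and the names p_zx and p_xz are assumptions made for illustration; t.test() returns a list whose p.value component holds the p-value.

```r
set.seed(42)   # arbitrary seed so the sketch is reproducible
x <- rnorm(10)
z <- rnorm(10, mean = 2)

# alternative = "greater" tests whether the mean of the first argument
# exceeds the mean of the second
p_zx <- t.test(z, x, alternative = "greater")$p.value
p_xz <- t.test(x, z, alternative = "greater")$p.value

p_zx
p_xz

# Swapping the arguments flips the sign of t, so the two one-sided
# p-values are complementary
all.equal(p_zx + p_xz, 1)
```

This matches the transcript above, where the two orderings of y and z give p-values of 0.0003194 and 0.9997.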
We'll end by learning how to write our own functions. We're going to write a function
called medmean that calculates the mean of a vector if its length is above a
user-defined value n and the median otherwise. We'll apply it to two vectors of
different lengths, each of which includes a bad outlier.
> medmean = function(x, n) {if (length(x) > n) {mean(x)} else {median(x)}}
> fewdata = c(rnorm(3),100)
> manydata = c(rnorm(1000),100)
> medmean(fewdata,10) # case 1
[1] 1.411065
> medmean(fewdata,3) # case 2
[1] 25.53877
> medmean(manydata,10) # case 3
[1] 0.006908776
> medmean(manydata,1001) # case 4
[1] 0.1188903
For each of the four cases, which branch of the if
statement did we execute? Can you
draw any conclusions about applying the mean or median to data with outliers?
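To make the branches easy to follow, here is a sketch with fixed data instead of random draws. The vector small is made up for this illustration; medmean has the same logic as above, written over several lines.

```r
# Same logic as medmean above: mean if the vector is longer than n, median otherwise
medmean <- function(x, n) {
  if (length(x) > n) {
    mean(x)
  } else {
    median(x)
  }
}

small <- c(1, 2, 3, 100)   # three ordinary values and one outlier

medmean(small, 10)   # length 4 is not > 10, so the median is returned: 2.5
medmean(small, 3)    # length 4 > 3, so the mean is returned: 26.5
```

The median stays close to the ordinary values while the mean is pulled toward the outlier.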
We'll continue next with more R and Bioconductor. Hope you had some fun learning
R.