Reading data

So far, all the code we’ve run has been somewhat static. Variables are assigned a value in the script, and every time we run it the output is the same. We will see a lot more of this throughout the workshop, but for now we will introduce one more function that R provides, read.csv.

The print and cat functions are one way to get information out of our program, and the read.csv function is a way to get data into it. The function opens and reads a comma-separated values (CSV) file, parses it, and returns the data, which we assign to the variable on the left-hand side of the <-.
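As a minimal sketch of what read.csv does, the example below parses a few lines of CSV held in a string rather than a file. The data, the column names and the variable names csv_text and toy are made up for illustration; the text argument is used here only so the example runs without any file on disk.

```r
# A tiny, made-up CSV kept in a string (hypothetical data, not from the workshop).
csv_text <- "gene,sample_a,sample_b
G1,10.5,12.0
G2,3.2,4.8"

# read.csv parses the text and returns a data frame.
# By default the first line is used as the column names.
toy <- read.csv(text = csv_text)

print(toy)            # the whole table
print(colnames(toy))  # the names taken from the first line
print(toy$sample_a)   # one column, read as numbers
```

Reading from a file or a URL works the same way, with the path or address given as the first argument in place of text.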

We will load a dataset from a study of gene expression in 42 ER- and ER+ breast cancer patients. The file is tab-separated and the table header sits on line 100, so we pass sep="\t" and skip=99.

read_csv.r
breast <- read.csv("https://raw.githubusercontent.com/Bristol-Training/intro-r-1/refs/heads/main/data/GDS3716.soft",
            sep="\t",
            skip=99)

print(breast[1:10,1:5])
      ID_REF IDENTIFIER GSM512539 GSM512540 GSM512541
1  1007_s_at    MIR4640    2461.4    3435.7    1932.5
2    1053_at       RFC2      26.7     159.0      31.2
3     117_at      HSPA6      82.6     243.4     150.2
4     121_at       PAX8     942.3     897.5     840.8
5  1255_g_at     GUCA1A      71.8      87.9      75.4
6    1294_at    MIR5193     630.2     571.4     346.3
7    1316_at       THRA     186.0     208.7     141.0
8    1320_at     PTPN21      15.8      18.0      10.6
9  1405_i_at       CCL5      71.3      26.1      26.3
10   1431_at     CYP2E1      58.7      45.0      98.3
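The sep and skip arguments used above can be tried on a small scale. The sketch below is an illustration only: the text in soft_like is invented to resemble a file with leading lines before a tab-separated table, and the names soft_like and toy are hypothetical.

```r
# Made-up text: two leading lines, then a tab-separated table with a header.
# "\t" inside an R string is a tab character.
soft_like <- "!dataset_title = toy example
!dataset_table_begin
ID_REF\tIDENTIFIER\tS1
p1\tTP53\t300.5
p2\tRFC2\t20.1"

# skip = 2 ignores the first two lines, so the third line becomes the header.
# sep = "\t" tells read.csv that fields are separated by tabs, not commas.
toy <- read.csv(text = soft_like, sep = "\t", skip = 2)

print(toy)
print(colnames(toy))
```

If skip is one too large, the first data row is consumed as the header and the column names no longer match, which is a useful thing to check with colnames after loading.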
Exercise

Using the breast cancer dataset, modify read_csv.r to print the average expression of the gene TP53.

read_csv.r

breast <- read.csv("https://raw.githubusercontent.com/Bristol-Training/intro-r-1/refs/heads/main/data/GDS3716.soft",
            sep="\t",
            skip=99)

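# Find the rows whose IDENTIFIER is TP53 (two rows match)
which(breast$IDENTIFIER == "TP53") # returns 1274 10723

# Drop the first two columns (ID_REF, IDENTIFIER) and average over the samples
v1 <- breast[1274,-c(1,2)]
mean(as.numeric(v1)) # 310.8952

v2 <- breast[10723,-c(1,2)]
mean(as.numeric(v2)) # 17.15714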
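The same calculation can be written without hard-coding the row numbers. This is only a sketch on a made-up data frame (the values and the names toy, tp53 and probe_means are hypothetical), and it assumes every sample column is already numeric; on the real dataset the as.numeric approach above may be the safer choice.

```r
# Made-up table with the same layout as the dataset: two label columns,
# then one numeric column per sample.
toy <- data.frame(
  ID_REF     = c("p1", "p2", "p3"),
  IDENTIFIER = c("TP53", "RFC2", "TP53"),
  S1         = c(300, 20, 10),
  S2         = c(320, 30, 20)
)

# Keep every row whose IDENTIFIER is TP53, however many there are.
tp53 <- toy[toy$IDENTIFIER == "TP53", ]

# rowMeans averages across the sample columns of each remaining row.
probe_means <- rowMeans(tp53[, -c(1, 2)])
names(probe_means) <- tp53$ID_REF

print(probe_means)
```

Selecting rows with a logical condition keeps the script working even if the row order of the file changes.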