Reading data

So far, all the code we’ve run has been somewhat static. Variables are assigned a value in the script, and every time we run it the output is the same. We will see a lot more of this throughout this workshop, but for now we will introduce one more function that R provides, read.csv.

The print and cat functions are one way of getting information out of our program, and the read.csv function is a way of getting data into it. The function opens a comma-separated values (CSV) file, parses it, and returns the data, which we assign to the variable on the left-hand side of the <-.
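As a minimal sketch of this behaviour, the example below parses a few lines of CSV supplied inline through the text argument of read.csv instead of a file, so it runs without downloading anything. The column names and values are made up purely for illustration.

```r
# Made-up CSV content; the first line holds the column names.
csv_text <- "gene,sample_a,sample_b
G1,10.5,12.0
G2,3.2,4.8"

# read.csv parses the text and returns a data frame,
# which we assign to a variable with <-
toy <- read.csv(text = csv_text)

print(toy)
cat("Rows:", nrow(toy), "Columns:", ncol(toy), "\n")
```

The result is a data frame: a table in which each column has a name and every row holds one record.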

We will load a dataset from a study of gene expression in 42 ER- and ER+ breast cancer patients:

read_csv.r
breast <- read.csv("https://raw.githubusercontent.com/Bristol-Training/beginning-r/refs/heads/main/data/GDS3716.soft",
            sep="\t",
            skip=99)

print(breast[1:10,1:5])
      ID_REF IDENTIFIER GSM512539 GSM512540 GSM512541
1  1007_s_at    MIR4640    2461.4    3435.7    1932.5
2    1053_at       RFC2      26.7     159.0      31.2
3     117_at      HSPA6      82.6     243.4     150.2
4     121_at       PAX8     942.3     897.5     840.8
5  1255_g_at     GUCA1A      71.8      87.9      75.4
6    1294_at    MIR5193     630.2     571.4     346.3
7    1316_at       THRA     186.0     208.7     141.0
8    1320_at     PTPN21      15.8      18.0      10.6
9  1405_i_at       CCL5      71.3      26.1      26.3
10   1431_at     CYP2E1      58.7      45.0      98.3
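
The two extra arguments in the call above describe how the file is laid out: sep="\t" says that fields are separated by tabs rather than commas, and skip=99 tells read.csv to ignore the first 99 lines before it looks for the column names. Those lines presumably hold descriptive text rather than data, which is an assumption here. The sketch below shows the same idea on a small made-up table with two lines of notes at the top; all names and values are invented.

```r
# Made-up tab-separated content with two lines of notes before the table.
tsv_text <- paste(
  "note: made-up example",
  "note: values are invented",
  "ID_REF\tIDENTIFIER\tS1\tS2",
  "x_at\tGENE1\t1.5\t2.5",
  "y_at\tGENE2\t10\t20",
  sep = "\n"
)

toy <- read.csv(text = tsv_text,
                sep = "\t",   # fields are separated by tabs
                skip = 2)     # ignore the two note lines

print(toy)
```

If skip were one larger, the line with the column names would be skipped as well and the first data row would be used as the header instead.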
Exercise

Using the breast cancer dataset, modify read_csv.r to print the average expression of the gene TP53.

read_csv.r
breast <- read.csv("https://raw.githubusercontent.com/Bristol-Training/beginning-r/refs/heads/main/data/GDS3716.soft",
            sep="\t",
            skip=99)
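
As a hint, here is one possible approach sketched on a small made-up data frame rather than the real dataset. It assumes the same layout as the output shown earlier: an IDENTIFIER column with gene names, followed by one numeric column per sample from column 3 onwards. The gene names and values are invented.

```r
# Made-up data with the same layout as the breast data frame.
toy <- data.frame(
  ID_REF     = c("x_at", "y_at", "z_at"),
  IDENTIFIER = c("GENE1", "GENE2", "GENE3"),
  S1 = c(1.5, 10, 7),
  S2 = c(2.5, 20, 9)
)

# Keep only the rows for the gene of interest and the sample columns.
rows <- toy$IDENTIFIER == "GENE2"
values <- toy[rows, 3:ncol(toy)]

# unlist() turns the selected cells into a plain numeric vector;
# na.rm = TRUE ignores missing values, if there are any.
average <- mean(unlist(values), na.rm = TRUE)
cat("Average expression of GENE2:", average, "\n")
```

If a gene appears in more than one row, this averages over all of its rows together.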