In this workshop, we went all the way from experimental design, through bioinformatic processing of .fastq files, to differential gene expression analysis in DESeq2. We hope that you are now ready to design and analyze your own transcriptomics experiments. Remember that we only covered the very basics of transcriptome analysis; expect to learn a lot more when you start analyzing your own data.
If you want more practice first, or don't have your own data yet, you can analyze publicly available RNA-seq datasets. Papers often deposit their raw sequencing reads at NCBI or ENA, from where you can download them for reanalysis. Pre-processed count tables are available from repositories such as EMBL-EBI. We collected four count tables, from three different domains of the life sciences, for you to practice DESeq2 analysis. They are available in the practice/ folder of the GitHub repository of this workshop.
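Whichever dataset you pick, you will load a count table and a sample (metadata) table, for example with read.csv() or read.delim() depending on the file format. DESeq2 expects the columns of the count matrix to correspond to the rows of the sample table, in the same order, so it is worth checking this before building the dds object. The sketch below is a minimal, hypothetical helper (align_metadata() is not part of DESeq2) and assumes the sample IDs are stored in a metadata column called sample; the toy counts and sample names are made up and do not come from the practice files.

```r
# Hypothetical helper (not part of DESeq2): reorder a sample table so its rows
# follow the column order of a count matrix. Assumes sample IDs live in the
# metadata column named by `sample_col`. Metadata rows without a matching
# count column are dropped; count columns without metadata raise an error.
align_metadata <- function(counts, metadata, sample_col = "sample") {
  ids <- as.character(metadata[[sample_col]])
  missing_ids <- setdiff(colnames(counts), ids)
  if (length(missing_ids) > 0) {
    stop("No metadata for samples: ", paste(missing_ids, collapse = ", "))
  }
  # match() gives, for each count column, the index of its metadata row
  aligned <- metadata[match(colnames(counts), ids), , drop = FALSE]
  rownames(aligned) <- colnames(counts)
  aligned
}

# Toy example with made-up genes, samples and counts
counts <- matrix(c(10L, 0L, 5L, 3L, 8L, 1L), nrow = 2,
                 dimnames = list(c("gene1", "gene2"), c("s2", "s1", "s3")))
metadata <- data.frame(sample = c("s1", "s2", "s3"),
                       treatment = c("none", "copper", "none"))
aligned <- align_metadata(counts, metadata)
print(aligned)  # rows now in the order s2, s1, s3
```

The aligned table can then be passed as colData to DESeqDataSetFromMatrix() together with the count matrix.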
Practice dataset #1 is the simplest: like the dataset we have worked with so far, it has a single experimental factor. The other practice datasets (#2, #3, and #4) include two or more experimental factors. This makes the analysis more complex but also more informative, especially since your own experiments will probably involve multiple factors as well.
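DESeq2 design formulas use R's standard model formula syntax, so base R's model.matrix() is a quick way to see which coefficients a multi-factor design produces. The sketch below uses a made-up sample table (not one of the practice datasets) to compare an additive design, ~ treatment + timepoint, with an interaction design, ~ treatment * timepoint (shorthand for ~ treatment + timepoint + treatment:timepoint). It also shows why a time point read in as numbers should usually be converted to a factor: otherwise it is fitted as a single continuous slope rather than as separate groups.

```r
# Made-up sample table: 2 treatments x 2 time points, 2 replicates each
samples <- data.frame(
  treatment = factor(rep(c("none", "infected"), each = 4),
                     levels = c("none", "infected")),
  timepoint = rep(c(1, 2), times = 4)
)

# Numeric timepoint: fitted as one continuous covariate (a single slope)
continuous <- model.matrix(~ treatment + timepoint, data = samples)
print(colnames(continuous))  # (Intercept), treatmentinfected, timepoint

# Factor timepoint: one coefficient per non-reference time point
samples$timepoint <- factor(samples$timepoint)
additive <- model.matrix(~ treatment + timepoint, data = samples)
print(colnames(additive))  # (Intercept), treatmentinfected, timepoint2

# Interaction: the treatment effect is allowed to differ between time points
with_interaction <- model.matrix(~ treatment * timepoint, data = samples)
print(colnames(with_interaction))
# (Intercept), treatmentinfected, timepoint2, treatmentinfected:timepoint2
```

DESeq2 gives its coefficients its own names; after running DESeq(), resultsNames(dds) lists the ones available for a given design.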
The Plant Scientist #1: Arabidopsis seedlings that experienced spaceflight
- Dataset: GLDS38
- Experimental factor:
  - condition, with two levels: space_flight and ground_control

The Plant Scientist #2: maize plants infected with the fungus Ustilago maydis
- Dataset: E-CURD-40
- Experimental factors:
  - treatment, with levels: Ustilago maydis and none
  - timepoint, with levels: 0.5, 1, 2, 4, 6, 8, 12 days after infection

The Neuroscientist: different brain regions from patients with different psychiatric disorders
- Dataset: E-GEOD-78936
- Experimental factors:
  - disease, with levels: normal, schizophrenia, bipolar disorder
  - brain_region, with levels: area 9, area 11, area 24

The Microbiologist: Pseudomonas aeruginosa bacteria exposed to different concentrations of copper
- Dataset: GSE160187
- Experimental factors:
  - genotype, with levels: POA1 (wild-type bacteria) and XEN41 (tetracycline-resistant bacteria)
  - treatment, with copper levels: none, MIC/10, MIC
  - timepoint, with levels: 0, 24, 48, 72 hours of incubation

You can follow the DESeq2 episode with these new datasets. Broadly, these are the steps you can take:
- Create a dds object by combining the counts table with the experimental design table.
- When plotting the samples with ggplot2, use the extra factors: for example, map the treatment to the shape of the points, and the timepoint to color.

For studies with multiple experimental factors, it's often useful to combine the factors into a single new variable. Your design formula then becomes as simple as ~ condition.
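R
library(DESeq2)
library(dplyr)
# Combine treatment and timepoint into a single factor; '_' is used because
# DESeq2 recommends only letters, numbers, '_' and '.' in factor levels
metadata <- metadata %>%
  mutate(condition = factor(paste(treatment, timepoint, sep = "_")))
# raw_counts: genes in rows, samples in columns, in the same order as metadata rows
dds <- DESeqDataSetFromMatrix(countData = raw_counts,
                              colData = metadata,
                              design = ~ condition)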
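With ~ condition as the design, every comparison between two groups becomes a contrast between two levels of condition. The sketch below runs end to end on simulated data from DESeq2's makeExampleDESeqDataSet() instead of a practice file; the treatment and timepoint columns and their levels are hypothetical. It extracts one contrast with results() and also covers the plotting step, mapping treatment to point shape and timepoint to color. On a full-size dataset vst() is the usual, faster choice; here varianceStabilizingTransformation() is called directly because vst() subsamples 1000 genes by default and this simulated set is small.

```r
library(DESeq2)
library(ggplot2)

# Simulated counts stand in for a practice dataset: 1000 genes x 12 samples
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical factors: 2 treatments x 3 time points, 2 replicates per group
dds$treatment <- factor(rep(c("none", "copper"), each = 6),
                        levels = c("none", "copper"))
dds$timepoint <- factor(rep(c(0, 24, 48), times = 4))
# Combined grouping variable; overwrites the simulated condition column
dds$condition <- factor(paste(dds$treatment, dds$timepoint, sep = "_"))
design(dds) <- ~ condition

dds <- DESeq(dds)

# Copper vs none, both at 48 h: numerator level first, then denominator
res_48h <- results(dds, contrast = c("condition", "copper_48", "none_48"))
summary(res_48h)

# PCA on variance-stabilized counts; returnData = TRUE returns the coordinates
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
pca_data <- plotPCA(vsd, intgroup = c("treatment", "timepoint"), returnData = TRUE)
percent_var <- round(100 * attr(pca_data, "percentVar"))

# Treatment mapped to point shape, timepoint to color
pca_plot <- ggplot(pca_data, aes(x = PC1, y = PC2, shape = treatment, color = timepoint)) +
  geom_point(size = 3) +
  xlab(paste0("PC1: ", percent_var[1], "% variance")) +
  ylab(paste0("PC2: ", percent_var[2], "% variance"))
print(pca_plot)
```

Other comparisons, such as copper vs none at 0 h, only need different level names in the contrast argument; no new model has to be fitted.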