Practice on your own

In this workshop, we went all the way from experimental design, through bioinformatic processing of .fastq files, to differential gene expression analysis in DESeq2. We hope that you are now ready to design and analyze your own transcriptomics experiments. Remember that we only covered the very basics of transcriptome analysis, so expect to learn a lot more when you start analyzing your own data.

If you want to practice more first, or don’t have your own data yet, you can analyze publicly available RNA-seq datasets. Papers often deposit their raw sequencing reads at NCBI or ENA, where you can download them for reanalysis. Pre-processed count tables are available from EMBL-EBI, for example through its Expression Atlas. We collected four count tables for practicing DESeq2 analysis, from three different domains of the life sciences. They are available in the practice/ folder of the GitHub repository of this workshop.

Practice dataset #1 is very simple: it has one experimental factor, just like the dataset we have worked with so far. All other practice datasets (#2, #3, and #4) include two or more experimental factors. This makes the analysis more complex but also more informative, especially since your own experiments will probably involve multiple factors as well.

The datasets

🌱 🚀 The Plant Scientist #1: Arabidopsis seedlings that experienced spaceflight
- Experiment ID: GLDS38
- Experimental factors:
  - condition, with two levels: space_flight and ground_control
🌽 🌱 The Plant Scientist #2: maize plants infected with the fungus Ustilago maydis
- Experiment ID: E-CURD-40
- Experimental factors:
  - treatment, with levels: Ustilago maydis and none
  - timepoint, with levels: 0.5, 1, 2, 4, 6, 8, 12 days after infection
🧠 ⚡ The Neuroscientist: different brain regions from patients with different psychiatric problems
- Experiment ID: E-GEOD-78936
- Experimental factors:
  - disease, with levels: normal, schizophrenia, bipolar disorder
  - brain_region, with levels: area 9, area 11, area 24
🦠 🔬 The Microbiologist: Pseudomonas aeruginosa bacteria exposed to different concentrations of copper
- Experiment ID: GSE160187
- Experimental factors:
  - genotype, with levels: POA1 (wild-type bacteria) and XEN41 (tetracycline-resistant bacteria)
  - treatment, with levels: none, MIC/10, MIC levels of copper
  - timepoint, with levels: 0, 24, 48, 72 hours of incubation

Where to start?

You can follow the DESeq2 episode with these new datasets. Broadly, these are the steps you can take:

Create a dds object by combining the counts table with the experimental design table.
Perform a PCA to see how the samples cluster. Try mapping different experimental factors to different aesthetics in ggplot2: e.g., map treatment to the shape of the points, and timepoint to their color.
Perform differential gene expression analysis between different conditions.
Visualize DEG results in a volcano plot or a heatmap.
Select a group of DEGs to dive into the biological interpretation via GO term enrichment.
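
The first four steps could look roughly like this in R. This is a minimal sketch on simulated counts from DESeq2's makeExampleDESeqDataSet(), not on one of the practice datasets. The timepoint column and its levels are made up for illustration, and the simulated condition factor (levels A and B) stands in for a real treatment. With a practice dataset, you would instead build dds with DESeqDataSetFromMatrix() from the counts table and the experimental design table.

```r
library(DESeq2)
library(ggplot2)

# Step 1 (stand-in): simulated dds object, NOT real data.
# 1000 genes x 12 samples with a built-in 'condition' factor (A vs B).
set.seed(42)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)

# Hypothetical second factor, so there is something to map to color.
dds$timepoint <- factor(rep(c("0h", "24h", "48h"), times = 4))

# Step 2: variance-stabilize the counts and get the PCA coordinates
# as a data frame, so the plot can be built by hand in ggplot2.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
pca_data <- plotPCA(vsd, intgroup = c("condition", "timepoint"),
                    returnData = TRUE)
percent_var <- round(100 * attr(pca_data, "percentVar"))

# One factor mapped to shape, the other to color.
pca_plot <- ggplot(pca_data,
                   aes(x = PC1, y = PC2,
                       shape = condition, color = timepoint)) +
  geom_point(size = 3) +
  labs(x = paste0("PC1 (", percent_var[1], "% variance)"),
       y = paste0("PC2 (", percent_var[2], "% variance)"))

# Step 3: differential expression, B compared to A.
dds <- DESeq(dds)
res <- results(dds, contrast = c("condition", "B", "A"), alpha = 0.05)

# Step 4: volcano plot; genes without an adjusted p-value count as
# not significant.
res_df <- as.data.frame(res)
res_df$significant <- !is.na(res_df$padj) & res_df$padj < 0.05

volcano_plot <- ggplot(res_df,
                       aes(x = log2FoldChange, y = -log10(pvalue),
                           color = significant)) +
  geom_point(alpha = 0.5)
```

Print pca_plot and volcano_plot to display the figures. Swapping which factor goes to shape and which to color is often worth trying, since a pattern that is hidden in one mapping can be obvious in the other.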

Note

For studies with multiple experimental factors, it’s often useful to combine the factors into a single new variable, so that each combination of factor levels becomes one level of that variable. Then, your design formula becomes as simple as ~ condition.

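library(dplyr)
library(DESeq2)

# Combine treatment and timepoint into one factor with one level per combination
metadata <- metadata %>%
    mutate(condition = factor(paste(treatment, timepoint, sep = "-")))

# Build the dds object with the combined factor as the only design variable
dds <- DESeqDataSetFromMatrix(countData = raw_counts,
                              colData = metadata,
                              design = ~ condition)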
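
Once the factors are combined, any pair of groups can be compared by naming the two levels in the contrast argument of results(). The sketch below illustrates this on simulated counts, not on a practice dataset. The treatment and timepoint levels ("none", "infected", "1d", "4d") are hypothetical, and it joins them with an underscore because DESeq2 recommends factor levels made up of letters, numbers, "_" and "." only.

```r
library(DESeq2)

# Simulated counts standing in for a practice count table (NOT real data).
set.seed(1)
sim <- makeExampleDESeqDataSet(n = 500, m = 12)
raw_counts <- counts(sim)

# Hypothetical design: 2 treatments x 2 timepoints x 3 replicates.
metadata <- data.frame(
  treatment = rep(c("none", "infected"), each = 6),
  timepoint = rep(rep(c("1d", "4d"), each = 3), times = 2),
  row.names = colnames(raw_counts)
)

# Combine both factors into one variable with four levels.
metadata$condition <- factor(
  paste(metadata$treatment, metadata$timepoint, sep = "_")
)

dds <- DESeqDataSetFromMatrix(countData = raw_counts,
                              colData = metadata,
                              design = ~ condition)
dds <- DESeq(dds)

# List the available groups, then compare two of them:
# infected vs. untreated samples at the 4d timepoint.
levels(dds$condition)
res <- results(dds, contrast = c("condition", "infected_4d", "none_4d"))
```

In the contrast vector, the second element is the group of interest and the third is the reference, so a positive log2 fold change means higher expression in infected_4d than in none_4d.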