Practice on your own
In this workshop, we went all the way from experimental design, to bioinformatic processing of .fastq files, to differential gene expression analysis in DESeq2. We hope that you are now ready to design and analyze your own transcriptomics experiments. Remember that we only covered the very basics of transcriptome analysis, so expect to learn a lot more when you start analyzing your own data.
If you first want to practice more, or don’t have your own data yet, you can analyze publicly available RNA-seq datasets. Papers often make their raw sequencing reads available on NCBI or ENA, which you can download to reanalyze. A repository where pre-processed count tables are available is EMBL-EBI. We collected four count tables to practice DESeq2 analysis, from three different domains of the life sciences. They are available in the practice/ folder of the GitHub repository of this workshop.
Practice dataset #2 has one experimental factor, just like the dataset we have worked with so far. However, that factor has five different levels, so it’s a bit more complex. All other practice datasets (#1, #3, and #4) include two or more experimental factors. This makes the analysis more complex but also more informative, especially since your own experiments will probably involve multiple factors as well.
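To get a feel for what a second experimental factor adds, it can help to look at the model matrix that a design formula expands to. The sketch below uses only base R and a made-up sample table: the level names (none, infected, 1, 4) and the replicate count are assumptions for illustration, not values taken from any of the practice datasets.

```r
# Hypothetical sample table: 2 treatments x 2 timepoints x 2 replicates.
# expand.grid() turns the character vectors into factors, keeping the
# order given here, so "none" and "1" become the reference levels.
samples <- expand.grid(
  replicate = 1:2,
  treatment = c("none", "infected"),
  timepoint = c("1", "4")
)

# Additive design: one coefficient per non-reference level of each factor.
additive <- model.matrix(~ timepoint + treatment, data = samples)

# Design with interaction: adds a term for a treatment effect that
# differs between timepoints.
with_interaction <- model.matrix(~ timepoint * treatment, data = samples)

colnames(additive)
colnames(with_interaction)
```

The additive design has three columns (intercept, timepoint, treatment), while the interaction design has a fourth column for the timepoint-specific treatment effect. The same formulas can be passed as the design when creating a DESeq2 object.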
The datasets
🌽 🌱 The Plant Scientist #1: maize plants infected with the fungus Ustilago maydis
- Experiment ID: E-CURD-40
- Experimental factors:
  - treatment, with levels: Ustilago maydis and none
  - timepoint, with levels: 0.5, 1, 2, 4, 6, 8, 12 days after infection
🌱 🧪 The Plant Scientist #2: Arabidopsis roots treated with different plant hormones
- Experiment ID: Deforges_2019
- Experimental factors:
  - condition, with levels: control, auxin, aba, meja, or acc
🧠 ⚡ The Neuroscientist: different brain regions from patients with different psychiatric problems
- Experiment ID: E-GEOD-78936
- Experimental factors:
  - disease, with levels: normal, schizophrenia, bipolar disorder
  - brain_region, with levels: area 9, area 11, area 24
🦠 🔬 The Microbiologist: Pseudomonas aeruginosa bacteria exposed to different concentrations of copper
- Experiment ID: GSE160187
- Experimental factors:
  - genotype, with levels: POA1 (wild-type bacteria) and XEN41 (tetracycline-resistant bacteria)
  - treatment, with levels: none, MIC/10, MIC levels of copper
  - timepoint, with levels: 0, 24, 48, 72 hours of incubation
Where to start?
You can follow the DESeq2 episode with these new datasets. Broadly, these are the steps you can take:
- Create a dds object by combining the counts table with the experimental design table.
- Perform a PCA to see how the samples are clustering. Try mapping different experimental factors to different aesthetics in ggplot2: e.g., map treatment to the shape of the points and timepoint to color.
- Perform differential gene expression analysis between different conditions.
- Visualize DEG results in a volcano plot or a heatmap.
- Select a group of DEGs to dive into the biological interpretation via GO term enrichment.
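The first four steps can be sketched in R roughly as follows. This is a minimal sketch under stated assumptions, not the workshop's exact code: the counts and the design table are simulated so that the example runs on its own, and the factor levels (none, infected, 1, 4), the number of genes, and the padj cutoff of 0.05 are all made up for illustration. With a practice dataset you would instead read the count table and the design table from the practice/ folder. GO term enrichment is left out because it needs organism-specific annotation.

```r
library(DESeq2)
library(ggplot2)

set.seed(1)

# --- Simulated stand-in data (assumption: replace with a practice dataset) ---
n_genes <- 2000
design_table <- data.frame(
  treatment = factor(rep(c("none", "infected"), each = 6),
                     levels = c("none", "infected")),
  timepoint = factor(rep(rep(c("1", "4"), each = 3), times = 2)),
  row.names = paste0("sample", 1:12)
)

# Mean expression per gene; the first 100 genes are 4x higher when infected.
mu <- matrix(2^runif(n_genes, 2, 12), nrow = n_genes, ncol = 12)
infected <- design_table$treatment == "infected"
mu[1:100, infected] <- mu[1:100, infected] * 4

# Negative binomial counts with a dispersion that shrinks as the mean grows.
counts <- matrix(
  rnbinom(n_genes * 12, mu = mu, size = 1 / (0.1 + 4 / mu)),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(design_table))
)

# --- Step 1: create the dds object from counts + experimental design ---
dds <- DESeqDataSetFromMatrix(
  countData = counts,
  colData = design_table,
  design = ~ timepoint + treatment
)

# --- Step 2: PCA, mapping treatment to shape and timepoint to color ---
vsd <- vst(dds, blind = TRUE)
pca_data <- plotPCA(vsd, intgroup = c("treatment", "timepoint"),
                    returnData = TRUE)
pca_plot <- ggplot(pca_data,
                   aes(x = PC1, y = PC2, shape = treatment, color = timepoint)) +
  geom_point(size = 3)

# --- Step 3: differential expression, infected vs. none ---
dds <- DESeq(dds)
res <- results(dds, contrast = c("treatment", "infected", "none"))
res_df <- as.data.frame(res)
res_df$significant <- !is.na(res_df$padj) & res_df$padj < 0.05

# --- Step 4: volcano plot of the results ---
volcano_plot <- ggplot(res_df,
                       aes(x = log2FoldChange, y = -log10(pvalue),
                           color = significant)) +
  geom_point(alpha = 0.5)
```

Printing pca_plot or volcano_plot draws the figures. For datasets with more than two factors, the design formula and the contrast need to be adapted to the comparison you are interested in.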