Data visualization and reproducible science

Topics or episodes to cover:

R wrangling:

  • Based on the Carpentries lessons and other tutorials
  • Importing data into R
  • Data frame manipulation using dplyr
  • Making graphs using ggplot2

Introduction to the Unix shell:

  • Based on the Carpentries lessons and other tutorials
  • Basic set of Unix commands (e.g. cd, ls)
  • Continue with basic bioinformatics file types (.fasta, .fastq)
  • Introduce the concept of a sanity check
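One possible way to illustrate the sanity check idea on the file types above is to count records and compare the result with what is expected. This is only a minimal sketch: the toy files, the names example.fasta and example.fastq, and the assumption that each FASTQ record spans exactly four lines are illustrative, not part of the course material.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy files, created here only so the sketch is self-contained.
workdir=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$workdir/example.fasta"
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > "$workdir/example.fastq"

# FASTA: every record starts with a header line beginning with ">".
fasta_records=$(grep -c '^>' "$workdir/example.fasta")

# FASTQ: assuming the common layout of exactly four lines per record.
fastq_lines=$(wc -l < "$workdir/example.fastq")
fastq_records=$((fastq_lines / 4))

echo "FASTA records: $fasta_records"
echo "FASTQ records: $fastq_records"

# Sanity check: a line count that is not a multiple of 4 suggests a
# truncated or malformed FASTQ file (under the four-line assumption).
if (( fastq_lines % 4 != 0 )); then
    echo "WARNING: FASTQ line count is not a multiple of 4" >&2
fi

rm -rf "$workdir"
```

In a session, participants could run the same two counts on their own files and compare the numbers with what the data provider reported.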

Reproducible science using GitHub:

  • Introduce the GitHub website
  • Introduce the concept of version control
  • Have participants create a GitHub account, make a repository, and push some changes to it
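The create, commit, and push steps could be practised along the following lines. This is a hedged sketch rather than the course's actual exercise: to keep it runnable offline, a local bare repository stands in for the GitHub remote, and the directory names, user details, and commit message are placeholders. With a real GitHub repository, the remote URL would be the one shown on the repository page.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Assumption: a local bare repository plays the role of the GitHub remote.
git init --quiet --bare "$workdir/remote.git"

# Create a working repository and set a placeholder identity for commits.
git init --quiet "$workdir/my-project"
cd "$workdir/my-project"
git config user.name "Example Student"
git config user.email "student@example.com"

# Make a change, stage it, and record it as a commit.
echo "# My project" > README.md
git add README.md
git commit --quiet -m "Add README"

# Register the remote and push the current commit to its main branch.
git remote add origin "$workdir/remote.git"
git push --quiet origin HEAD:refs/heads/main

# Check that the commit arrived on the remote side.
pushed_commits=$(git --git-dir="$workdir/remote.git" rev-list --count refs/heads/main)
echo "Commits on remote main: $pushed_commits"

cd /
rm -rf "$workdir"
```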

Data visualization in R:

  • Some data visualization theory
  • Continue working with ggplot2 at a slightly more advanced level
  • Assignment: take a dataset and create one very bad plot and one very good plot from it, then explain why the bad plot is bad and the good plot is good. Open questions: present in class, hand in via Canvas, or push to GitHub (possibly better)? How should it be graded?