Data Analysis Class


For me, the Data Analysis course represents a sort of culmination of my online courses. Ever since the first one (Artificial Intelligence), it has been obvious that statistics and data analysis play a big part in many of the things I am interested in.

I do not consider myself particularly good at math (this is one of the reasons I abandoned university years ago), so I was also wary that the further I went in studying statistics, the sooner I would hit a math wall.

Luckily for me, the present course is very pragmatic and hands-on. It gives you a reasonable set of tools (both technical and mental) that allow you to at least make some inroads into exploring diverse data sets in search of interesting patterns.

The whole course takes 8 weeks. Every week you get a quiz (which contributes to the final score) to reinforce the main themes presented in the lessons. You also have to submit two short essays, each based on your analysis of a data set provided to you along with the main hypothesis you have to investigate.

These two "homework" projects are evaluated by randomly picked students (you are in turn asked to grade four other people's work) and contribute another 50% of the final score.

Everything is done in "R". While the course provides resources and links to tutorials, it should not be considered a "gentle introduction to R". In fact, some students complained in the forums about feeling a bit overwhelmed by it. I managed to cope thanks to my programming background and to having already used R in a previous course.

Final Result: 89.8%. I am a bit saddened because I had hoped to get the "distinction" certificate (for which you need at least 90%). I worked hard at this course (probably more than at any other so far, with the possible exception of the AI one).
I am still happy, though, because I got a real sense of accomplishment from it, and I feel confident enough to start looking for opportunities to practice what I learned, and also to learn more.


Syllabus:

  • The structure of a data analysis (steps in the process, knowing when to quit, etc.)
  • Types of data (census, designed studies, randomized trials)
  • Types of data analysis questions (exploratory, inferential, predictive, etc.)
  • How to write up a data analysis (compositional style, reproducibility, etc.)
  • Obtaining data from the web (through downloads mostly)
  • Loading data into R from different file types
  • Plotting data for exploratory purposes (boxplots, scatterplots, etc.)
  • Exploratory statistical models (clustering)
  • Statistical models for inference (linear models, basic confidence intervals/hypothesis testing)
  • Basic model checking (primarily visually)
  • The prediction process
  • Study design for prediction
  • Cross-validation
  • A couple of simple prediction models
  • Basics of simulation for evaluating models
  • Ways you can fool yourself and how to avoid them (confounding, multiple testing, etc.)