MAS61004 The Statistician's Toolkit

Both semesters, 2020/21
30 Credits
Lecturer: Prof Jeremy Oakley

This module has three themes. The first is linear and generalised linear modelling, in which students will study both the underlying theory and its practical application using R. The second is data handling and exploratory data analysis, in particular using the ‘tidyverse’ suite of packages in R. The third covers general skills that help prepare statisticians for the workplace: working collaboratively, engaging with ‘client-driven’ problems, communicating with non-specialists, and reproducible reporting.

There are no prerequisites for this module.
No other modules have this module as a prerequisite.


Outline syllabus

  • Linear models: theory and implementation in R;
  • Generalised linear models: theory and implementation in R;
  • Incorporating mixed effects in linear and generalised linear models;
  • Exploratory data analysis with R and the tidyverse: importing data, working with data sets, making plots;
  • Use of R Markdown and shiny.



Aims

  • introduce the theory and application of linear and generalised linear models;
  • introduce the programming language R for data analysis;
  • give experience of both individual and group work on practical statistical problems;
  • enhance students’ broader understanding of statistical methodology and develop their professional skills as applied statisticians.

Learning outcomes

  • describe the mathematical framework that underpins linear and generalised linear modelling;
  • identify circumstances in which linear and generalised linear models can be used for data analysis, incorporating random effects as necessary;
  • fit linear and generalised linear models using R, and interpret the output;
  • carry out an analysis using generalised linear modelling in a substantial case study;
  • import data from other file formats into R, clean and reshape the data as necessary, and perform an exploratory data analysis;
  • choose the most appropriate type of plot to illustrate features of a dataset, and produce well-formatted and informative plots using R;
  • implement a structured, reproducible data-analysis workflow in R, with R Markdown used to prepare a report;
  • communicate the results of a statistical analysis to a non-expert ‘client’;
  • work effectively in groups to achieve goals defined by other professionals.

Teaching methods

There will be thirty formal lectures covering the content on linear and generalised linear models. These will explain theoretical concepts and apply them to worked examples. The motivation, rationale, advantages and disadvantages of the methods taught will be discussed as appropriate, with examples of how to communicate issues to a lay audience. Detailed lecture notes will be provided, which students will be expected to study in their own time to assimilate the material.

In parallel, there will be ten two-hour computer lab sessions covering the content on R. These sessions will present case studies in statistical consultancy, which give context to the skills taught. Students will also work on a group project in a consultancy setting, drawing on the modelling and computing skills learned throughout the module. Staff will mentor the groups and provide guidance on consultancy, team-working and report-writing.


30 lectures, no tutorials, 10 practicals

Assessment

One three-hour examination (70%) and one group project (30%).

Full syllabus

1. Linear models
  • Brief revision of simple linear regression.
  • General form of linear model; assumptions; design matrices; examples including multiple regression and polynomial regression; factors and their representation in design matrices.
  • Likelihood and least squares estimators; properties of estimators and distribution theory.
  • Implementation in R.
  • Over-fitting.
  • Residuals and testing assumptions; standardised residuals; residual plots in R.
  • Confidence and prediction intervals.
  • Hypothesis tests for nested models; anova in R.
  • Interactions.
2. Exploratory data analysis with R
  • Introduction to R and RStudio; the RStudio interface.
  • Vectors, data frames, lists.
  • RStudio projects.
  • Importing data into R.
  • Working with data frames using dplyr.
  • Making plots with ggplot2.
  • Writing reports with R Markdown.
  • Making web apps with shiny.
3. Generalised linear models
  • Introduction to GLMs: link functions, GLM distributions and assumptions, distributional properties.
  • Fitting GLMs: the likelihood, implementation in R.
  • Model fit and variable selection: deviance, Pearson X², comparing nested models, residuals, estimation of the dispersion parameter.
  • Binomial, ordinal and multinomial logistic regression: link functions, model building, odds ratios, examples.
  • Poisson regression: modelling of counts; contingency tables as Poisson regression.
  • Over-dispersion: quasi-likelihood, quasibinomial, quasi-Poisson. Examples where the dispersion parameter is estimated.
  • Penalised likelihood and AIC for both GLMs and normal theory linear models; stepwise selection.
  • Mixed effects for correlated observations; incorporating mixed effects in linear and generalised linear models; implementation in R.

Reading list

  • B: Faraway, Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models (second edition). Library: available online.
  • B: Faraway, Linear Models with R. Library: 519.538(F).
  • B: Wickham and Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. Library: available online.
  • B: Wilke, Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. Library: available online.

(A = essential, B = recommended, C = background.)

Most books on reading lists should also be available from the Blackwells shop at Jessop West.

Timetable (semester 1)

Mon 10:00 - 10:50 lab session   Blackboard Online
Mon 11:00 - 11:50 lab session   Blackboard Online
Tue 13:00 - 13:50 lecture   Blackboard Online
Fri 14:00 - 14:50 lecture   Blackboard Online