MAS6019 Machine Learning and Time Series

Both semesters, 2019/20; 20 credits
Lecturer: Dr Frazer Jarvis. This module uses MOLE.

The unit develops concepts and techniques for the analysis of data having the complex structure typical of many real applications. The two main themes are the analysis of high-dimensional data, and the analysis of dependent observations made over a period of time on a single variable. Machine learning lies at the interface between computer science and statistics; its aim is to develop a set of tools for modelling and understanding complex data sets. A review of repeated-measures problems provides a link to the ideas of time series analysis. General techniques for the study of time series are developed, including structural descriptions, Box-Jenkins and state-space models and their fitting, techniques for forecasting, and an introduction to spectral methods.

There are no prerequisites for this module.
No other modules have this module as a prerequisite.


Outline syllabus

  • Semester 1: Machine Learning
    • The main problems of data science and machine learning
    • Data sets and data visualisation
    • Dimensionality reduction: principal components analysis and an introduction to other methods
    • The multivariate normal distribution and decision boundaries
    • Supervised learning: the classification problem and discriminant analysis
    • Model performance: cross-validation; the variance-bias trade-off
    • Regression and classification trees
    • Ensemble methods and random forests; boosting
    • Support vector machines
    • Logistic regression, neural networks and deep learning
  • Semester 2: Time Series
    • Preliminary material
      examples; purposes of analysis; components (trend, cycle, seasonal, irregular); stationarity, autocorrelation; approaches to time series analysis.
    • Simple descriptive methods
      smoothing; decomposition; differencing; autocorrelation.
    • Probability models for stationary series
      autoregressive models; moving average models; partial autocorrelation; invertibility; ARMA processes; ARIMA models for non-stationary series.
    • Inference
      identification and fitting; diagnostics; Ljung-Box statistic; choice of models; AIC.
    • Introduction to forecasting
      updating and errors; linear predictions; heuristic forecasting methods.
    • State space models
      formulation; filtering, prediction and smoothing; Kalman recursions; Bayesian inference; Bayesian forecasting; local level model; linear trend model; seasonal model.



Aims

  • Introduce students to the main problems in machine learning
  • Introduce students to some of the techniques used for solving problems in data science
  • Introduce students to the principal computer packages used in machine learning and time series analysis
  • Teach students some extensions of univariate statistical techniques to higher-dimensional situations
  • Introduce some of the distinctive statistical methodologies which arise only in the analysis of dependent data
  • Illustrate how models for dependent data may be constructed and studied.

Learning outcomes

  • Be able to perform dimensionality reduction using principal component analysis and interpret the results;
  • Understand and implement several data classification methods (linear and quadratic discriminant analysis, trees, support vector machines, neural networks etc.);
  • Be able to work with high-dimensional data in R and/or Python;
  • Understand some of the statistical background behind the techniques;
  • Understand general terms used in time series analysis, such as stationarity, autocorrelation function (ACF), and partial autocorrelation function (PACF);
  • Understand the structure of ARMA and ARIMA models and be able to derive the ACF and PACF for simple ARMA models;
  • Understand the approximate sampling behaviour of estimators of the ACF and PACF;
  • Be able to fit models to time series data, assess the fit, modify the model if necessary, and use the fitted models for forecasting;
  • Have carried out an extended analysis of a problem using a variety of multivariate methods, and of a practical time series problem.

Teaching methods

Lectures, with printed notes, plus lab and exercise sheets and computer demonstrations. Some outside reading is also expected.


40 lectures, no tutorials

Assessment

Semester 1: Two projects of equal weight (total 50%)
Semester 2: One project (15%) and a two-hour restricted open-book examination (35%).

Full syllabus

Multivariate data summary

  • Basic notation, sample estimates of mean, covariance and variance (1 session)
  • Mathematical background, and short review of multivariate normal distribution (1 session)
Graphical displays
  • Scatterplots, augmented plots, Andrews plots, special techniques (1 session)
Unsupervised learning: dimensionality reduction
  • Principal component analysis (3 sessions)
Supervised learning
  • Problems of machine learning in the linear regression setting (2 sessions)
  • Classification and the problems of machine learning (2 sessions)
  • Linear and quadratic discriminant analysis (2 sessions)
  • Decision trees (2 sessions)
  • Support Vector Machines (2 sessions)
  • Neural networks (2 sessions)
More unsupervised learning
  • Cluster analysis and other data mining techniques (1 session)
Time series: preliminary material
  • examples; purposes of analysis; components (trend, cycle, seasonal, irregular); stationarity, autocorrelation; approaches to time series analysis. (2 sessions)
Simple descriptive methods
  • smoothing; decomposition; differencing; autocorrelation. (1 session)
Probability models for stationary series
  • autoregressive models; (1 session)
  • moving average models; partial autocorrelation; invertibility; (1 session)
  • ARMA processes; (2 sessions)
  • ARIMA models for non-stationary series. (1 session)
Inference
  • identification and fitting; (1 session)
  • diagnostics; Ljung-Box statistic; choice of models; AIC. (1 session)
Introduction to forecasting
  • updating and errors; (2 sessions)
  • linear predictions; heuristic forecasting methods. (2 sessions)
State space models
  • formulation; filtering, prediction and smoothing; Kalman recursions. (2 sessions)
  • Bayesian inference; Bayesian forecasting. (1 session)
  • local level model; linear trend model; seasonal model. (3 sessions)

Timetable (semester 1)

Mon 14:00 - 14:50 lecture   Hicks Lecture Theatre 1
Tue 16:00 - 16:50 lecture   Hicks Lecture Theatre 2