MAS6019 Machine Learning and Time Series
Both semesters, 2021/22 | 20 Credits
Lecturer: Dr Kostas Triantafyllopoulos
The unit develops concepts and techniques for the analysis of data having the complex structure typical of many real applications. The two main themes are the analysis of high-dimensional data and the analysis of dependent observations made over a period of time on a single variable. Machine learning lies at the interface between computer science and statistics; its aim is to develop a set of tools for modelling and understanding complex data sets. A review of repeated measures problems links to ideas of time series analysis. General techniques for the study of time series are developed, including structural descriptions, Box-Jenkins and state-space models and their fitting, techniques for forecasting, and an introduction to spectral methods.
There are no prerequisites for this module.
No other modules have this module as a prerequisite.
- Semester 1: Machine Learning
- The main problems of data science and machine learning
- Data sets and data visualisation
- Dimensionality reduction - principal components analysis and introduction to other methods
- The multivariate normal distribution and decision boundaries
- Supervised learning: the classification problem and discriminant analysis
- Model performance: cross-validation; the bias-variance trade-off
- Regression and classification trees
- Ensemble methods and random forests; boosting
- Support vector machines
- Logistic regression, neural networks and deep learning
- Semester 2: Time Series
- Preliminary material
examples; purposes of analysis; components (trend, cycle, seasonal, irregular); stationarity, autocorrelation; approaches to time series analysis.
- Simple descriptive methods
smoothing; decomposition; differencing; autocorrelation.
- Probability models for stationary series
autoregressive models; moving average models; partial autocorrelation; invertibility; ARMA processes; ARIMA models for non-stationary series.
- Inference
identification and fitting; diagnostics; Ljung-Box statistic; choice of models; AIC.
- Introduction to forecasting
updating and errors; linear predictions; heuristic forecasting methods.
- State space models
formulation; filtering, prediction and smoothing; Kalman recursions; Bayesian inference; Bayesian forecasting; local level model; linear trend model; seasonal model.
- Introduce students to the main problems in machine learning
- Introduce students to some of the techniques used for solving problems in data science
- Introduce students to the principal computer packages involved in machine learning and time series analyses
- Teach students some extensions of univariate statistical techniques to higher-dimensional situations
- Introduce some of the distinctive statistical methodologies which arise only in the analysis of dependent data
- Illustrate how models for dependent data may be constructed and studied.
- Be able to carry out dimensionality reduction using principal component analysis and interpret the results;
- Understand and implement several data classification methods (linear and quadratic discriminant analysis, trees, support vector machines, neural networks etc.);
- Be able to work with high-dimensional data within the computer programs R and/or Python;
- Understand some of the statistical background behind the techniques;
- Understand general terms used in time series analysis, such as stationarity, autocorrelation function (ACF), and partial autocorrelation function (PACF);
- Understand the structure of ARMA and ARIMA models and be able to derive the ACF and PACF for simple ARMA models;
- Understand the approximate sampling behaviour of estimators of the ACF and PACF;
- Be able to fit models to time series data, assess the fit, and if necessary modify the model, and be able to use the fitted models for forecasting;
- Have undertaken an extended analysis of a problem involving a variety of multivariate methods and of a practical time series problem.
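As an illustration of the first outcome above, the following is a minimal Python sketch of principal component analysis computed from the singular value decomposition of the centred data matrix. It is not taken from the module materials: the function name `pca`, the synthetic data set and the choice of NumPy are assumptions made purely for illustration.

```python
import numpy as np

def pca(X, n_components):
    """PCA of an (n x p) data matrix via the SVD of the centred data.

    Returns the scores (n x k), the loadings (p x k) and the proportion
    of total variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)                      # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)            # eigenvalues of the sample covariance
    ratio = eigvals / eigvals.sum()              # proportion of variance per component
    loadings = Vt[:n_components].T               # columns are principal directions
    scores = Xc @ loadings                       # data projected onto those directions
    return scores, loadings, ratio[:n_components]

# Synthetic data, made up for illustration: 200 observations on 5 variables
# driven by 2 latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))

scores, loadings, ratio = pca(X, n_components=2)
print("Proportion of variance explained:", ratio)
```

Because the data are generated from two latent factors, the first two components should account for most of the variance; interpreting the loadings on real data is the part that needs statistical judgement.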
Lectures, with printed notes, plus lab and exercise sheets and computer demonstrations. Some outside reading is also expected.
40 lectures, no tutorials
Semester 1: Two projects of equal weight (total 50%)
Semester 2: One project (15%) and a time-restricted open-book examination (35%).
Multivariate data summary
- Basic notation, sample estimates of mean, covariance and variance (1 session)
- Mathematical background, and short review of multivariate normal distribution (1 session)
- Scatterplots, augmented plots, Andrews plots, special techniques (1 session)
- Principal component analysis (3 sessions)
- Problems of machine learning in the linear regression setting (2 sessions)
- Classification and the problems of machine learning (2 sessions)
- Linear and quadratic discriminant analysis (2 sessions)
- Decision trees (2 sessions)
- Support Vector Machines (2 sessions)
- Neural networks (2 sessions)
- Cluster analysis and other data mining techniques (1 session)
- examples; purposes of analysis; components (trend, cycle, seasonal, irregular); stationarity, autocorrelation; approaches to time series analysis. (2 sessions)
- smoothing; decomposition; differencing; autocorrelation. (1 session)
- Probability models for stationary series: autoregressive models; (1 session)
- moving average models; partial autocorrelation; invertibility; (1 session)
- ARMA processes; (2 sessions)
- ARIMA models for non-stationary series. (1 session)
- identification and fitting; (1 session)
- diagnostics; Ljung-Box statistic; choice of models; AIC. (1 session)
- updating and errors; (2 sessions)
- linear predictions; heuristic forecasting methods. (2 sessions)
- formulation; filtering, prediction and smoothing; Kalman recursions. (2 sessions)
- Bayesian inference; Bayesian forecasting. (1 session)
- local level model; linear trend model; seasonal model. (3 sessions)
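The state space topics that close the syllabus (Kalman recursions and the local level model) can be sketched in a few lines of Python. This is a hedged illustration rather than the module's own code: it assumes the local level model y_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t with known variances, and the names `local_level_filter`, `sigma2_eps`, `sigma2_eta`, `m0` and `c0` are hypothetical.

```python
import numpy as np

def local_level_filter(y, sigma2_eps, sigma2_eta, m0=0.0, c0=1e6):
    """Kalman filter for the local level model (variances assumed known).

    Observation: y_t  = mu_t + eps_t,      eps_t ~ N(0, sigma2_eps)
    State:       mu_t = mu_{t-1} + eta_t,  eta_t ~ N(0, sigma2_eta)
    The prior for the initial level is N(m0, c0); a large c0 is vague.
    """
    m, c = m0, c0
    means, variances, forecasts = [], [], []
    for obs in y:
        # Prediction: distribution of mu_t given observations up to t-1
        a = m
        r = c + sigma2_eta
        # One-step-ahead forecast of y_t and its variance
        f = a
        q = r + sigma2_eps
        # Update: combine the forecast with the new observation
        k = r / q                  # Kalman gain
        m = a + k * (obs - f)      # filtered mean of mu_t
        c = r * (1.0 - k)          # filtered variance of mu_t
        means.append(m)
        variances.append(c)
        forecasts.append(f)
    return np.array(means), np.array(variances), np.array(forecasts)

# Simulated series, made up for illustration.
rng = np.random.default_rng(1)
level = np.cumsum(rng.normal(scale=0.5, size=100))
y = level + rng.normal(scale=1.0, size=100)

means, variances, forecasts = local_level_filter(y, sigma2_eps=1.0, sigma2_eta=0.25)
print("Final filtered level:", means[-1])
print("Final filtered variance:", variances[-1])
```

In practice the variances would be estimated, for example by maximum likelihood or within a Bayesian analysis, rather than fixed in advance as they are here.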