## MAS469 Machine Learning

Semester 1, 2019/20 | 10 Credits | Lecturer: Dr Frazer Jarvis | Uses MOLE

Machine learning lies at the interface between computer science and statistics. Its aim is to develop a set of tools for modelling and understanding complex data sets. Tools of statistical machine learning have become important in many fields, such as marketing, finance and business, as well as in science. The module focuses on the problem of classification: training models to learn from existing data in order to classify new examples. Although other aspects of machine learning will be mentioned, further topics are covered by modules in Computer Science.

Prerequisites: MAS223 (Statistical Inference and Modelling)
No other modules have this module as a prerequisite.

## Outline syllabus

• The main problems of data science and machine learning
• Data sets and data visualisation
• Dimensionality reduction - principal components analysis and introduction to other methods
• The multivariate normal distribution and decision boundaries
• Supervised learning: the classification problem and discriminant analysis
• Model performance: cross-validation; the bias-variance trade-off
• Regression and classification trees
• Ensemble methods and random forests; boosting
• Support vector machines
• Logistic regression, neural networks and deep learning

## Aims

• Introduce students to the main problems in machine learning
• Introduce students to some of the techniques used for solving problems in data science
• Introduce students to neural networks and the main ideas behind "deep learning"
• Introduce students to the principal computer packages involved in machine learning
• Teach students some extensions of univariate statistical techniques to higher-dimensional situations

## Teaching methods

Lectures, lab sheets

20 lectures, no tutorials

## Assessment

Two projects of equal weight

## Full syllabus

1. Introduction to machine learning
The main ideas of machine learning. Computing. Books and other resources.

2. Notation
Basic notation - data frames, variance and correlation matrices.
3. Background mathematics
(Mostly not lectured) Linear algebra. Eigenvalues and eigenvectors. Differentiating with respect to vectors. Constrained optimisation and Lagrange multipliers. Gradient descent. The multivariate normal distribution.
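Of the background topics listed, gradient descent can be sketched in a few lines of Python; the quadratic objective, matrix and step size below are invented purely for illustration (numpy only).

```python
import numpy as np

# Minimise f(x) = x'Ax/2 - b'x, whose gradient is Ax - b;
# the minimiser therefore solves the linear system Ax = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])

x = np.zeros(2)
lr = 0.1                                  # fixed step size
for _ in range(500):
    x = x - lr * (A @ x - b)              # step against the gradient

exact = np.linalg.solve(A, b)             # closed-form answer to compare
```

With a step size below 2 divided by the largest eigenvalue of A, the iterates converge to the exact solution.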
4. Data visualisation
Scatterplots and other visualisation techniques. Esoteric methods: Andrews plots, star plots, Chernoff faces.
5. Principal Components Analysis
Principal components analysis (PCA). Variance or correlation? How many components? Examples and applications.
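As an illustration of the technique, a minimal numpy sketch of PCA via the eigendecomposition of the sample covariance matrix; the simulated data are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 points with two strongly correlated variables
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])

Xc = X - X.mean(axis=0)               # centre the data
S = np.cov(Xc, rowvar=False)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]     # reorder: largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                 # principal component scores
explained = eigvals / eigvals.sum()   # proportion of variance explained
```

Working from the correlation matrix instead amounts to standardising each column of `Xc` before the decomposition.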
6. The problems of machine learning in the setting of linear regression
Brief recall of linear models. Model performance. Over- and underfitting. Regularisation. Cross validation.
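Cross-validation for choosing a regularisation parameter can be sketched as follows, using ridge regression with its closed-form estimate; the data, fold count and candidate penalties are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 60, 5
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.0, 0.5, 0.0])      # true coefficients
y = X @ beta + rng.normal(scale=0.5, size=n)

def ridge_fit(X, y, lam):
    # Closed-form ridge estimate (no intercept, for simplicity)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, k=5):
    # k-fold cross-validated mean squared prediction error
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for fold in folds:
        mask = np.ones(len(y), dtype=bool)
        mask[fold] = False                        # hold this fold out
        b = ridge_fit(X[mask], y[mask], lam)
        errs.append(np.mean((y[fold] - X[fold] @ b) ** 2))
    return np.mean(errs)

lams = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_error(X, y, lam) for lam in lams}
best = min(scores, key=scores.get)                # penalty with lowest CV error
```

Large penalties underfit and tiny ones overfit; the cross-validated error picks out a compromise between the two.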
7. The problems of machine learning and classification
Discriminant rules. Nearest neighbours. Logistic regression. Multiclass extensions. Model performance.
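The nearest-neighbour rule is short enough to sketch in full: a plain k-nearest-neighbour majority vote, on toy data invented for the example.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Classify x by majority vote among its k nearest training points
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = np.argsort(d)[:k]               # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Two well-separated groups of training points
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [3.0, 3.0], [3.1, 2.9], [2.9, 3.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

pred0 = knn_predict(X_train, y_train, np.array([0.05, 0.05]))
pred1 = knn_predict(X_train, y_train, np.array([3.05, 3.00]))
```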
8. Discriminant analysis
Decision boundary between two multivariate normal distributions. Linear discriminant analysis. Quadratic discriminant analysis.
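With a common covariance matrix and equal priors, the linear discriminant rule reduces to thresholding w'x with w = Σ⁻¹(μ₁ − μ₀); a sketch with illustrative parameters chosen for the example.

```python
import numpy as np

# Two classes sharing covariance Sigma, with means mu0 and mu1
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])

# LDA direction w = Sigma^{-1} (mu1 - mu0); the decision boundary
# is the hyperplane w'x = c through the midpoint of the means
w = np.linalg.solve(Sigma, mu1 - mu0)
c = w @ (mu0 + mu1) / 2          # threshold for equal prior probabilities

def lda_classify(x):
    return int(w @ x > c)        # allocate to class 1 beyond the boundary
```

Allowing each class its own covariance matrix turns the boundary quadratic, giving quadratic discriminant analysis.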
9. Decision trees and related methods
Regression trees. Classification trees. Ensemble methods. Random forests. Boosting.
10. Support Vector Machines
Separating hyperplanes. Dual formalism. Nonseparable sets. Kernels. Support vector regression.
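The dual formalism listed above calls for a quadratic-programming solver, so as a rough substitute this sketch trains a linear SVM by sub-gradient descent on the primal hinge-loss objective; the toy data and tuning constants are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
# Separable toy data with labels in {-1, +1}
n = 100
X = rng.normal(size=(n, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
X = X + 0.5 * y[:, None]               # push the two classes apart

# Minimise (lam/2)||w||^2 + mean hinge loss by sub-gradient descent
lam, lr = 0.01, 0.1
w, b = np.zeros(2), 0.0
for _ in range(300):
    margins = y * (X @ w + b)
    active = margins < 1               # points violating the margin
    grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
    grad_b = -y[active].sum() / n
    w -= lr * grad_w
    b -= lr * grad_b

train_acc = np.mean(np.sign(X @ w + b) == y)
```

Only the margin-violating points contribute to the sub-gradient, mirroring the way only support vectors carry weight in the dual solution.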
11. Neural networks
Introduction. Notation. Backpropagation. Variants. Regularisation. Convolutional neural networks.
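The backpropagation step can be illustrated on the classic XOR problem with a single hidden layer; the architecture, learning rate and iteration count below are arbitrary choices made for the sketch.

```python
import numpy as np

# XOR is not linearly separable, so a hidden layer is needed
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

rng = np.random.default_rng(4)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # hidden layer, 8 units
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(out, y):
    out = np.clip(out, 1e-12, 1 - 1e-12)        # guard the logs
    return -np.mean(y * np.log(out) + (1 - y) * np.log(1 - out))

lr, losses = 1.0, []
for _ in range(5000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(cross_entropy(out, y))
    # Backward pass: chain rule, layer by layer
    d_out = (out - y) / len(X)                  # grad at output pre-activation
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1 - h ** 2)         # back through tanh
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

pred = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(float)
```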
12. Cluster analysis
Introduction to cluster analysis. Hierarchical methods. Nonhierarchical methods. k-means. DBSCAN.
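k-means (Lloyd's algorithm) alternates an assignment step and an update step; a minimal sketch on two simulated clusters, with the initial centres chosen by hand for the illustration.

```python
import numpy as np

def kmeans(X, centres, iters=50):
    # Lloyd's algorithm: alternate assignment and update steps
    centres = centres.astype(float).copy()
    for _ in range(iters):
        # Assignment: each point joins its nearest centre
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update: each centre moves to the mean of its cluster
        for j in range(len(centres)):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres, labels

rng = np.random.default_rng(5)
# Two well-separated Gaussian blobs of 50 points each
X = np.vstack([rng.normal(loc=0.0, size=(50, 2)),
               rng.normal(loc=6.0, size=(50, 2))])
# Initial centres: one point taken from each visible blob
centres, labels = kmeans(X, X[[0, -1]])
```

k-means needs the number of clusters fixed in advance and favours round clusters; density-based methods such as DBSCAN relax both restrictions.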