MAS469 Machine Learning
Note: Information for future academic years is provisional. Timetable information and teaching staff are especially likely to change, but other details may also be altered, some courses may not run at all, and other courses may be added.
|Semester 1, 2022/23||10 Credits|
|Lecturer:||Dr Miguel Juarez||Timetable||Reading List|
|Aims||Outcomes||Teaching Methods||Assessment||Full Syllabus|
Machine learning lies at the interface between computer science and statistics. It aims to develop a set of tools for modelling and understanding complex data sets, and these tools have become important in many fields, such as marketing, finance and business, as well as in science. The module focuses on the problem of classification: training models on existing data so that they can classify new examples. Other aspects of machine learning will be mentioned briefly; further topics in machine learning are covered by modules in Computer Science.
Prerequisites: MAS223 (Statistical Inference and Modelling)
No other modules have this module as a prerequisite.
- The main problems of data science and machine learning
- Data sets and data visualisation
- Dimensionality reduction - principal components analysis and introduction to other methods
- The multivariate normal distribution and decision boundaries
- Supervised learning: the classification problem and discriminant analysis
- Model performance: cross-validation; the variance-bias trade-off
- Regression and classification trees
- Ensemble methods and random forests; boosting
- Support vector machines
- Logistic regression, neural networks and deep learning
- Introduce students to the main problems in machine learning
- Introduce students to some of the techniques used for solving problems in data science
- Introduce students to neural networks and the main ideas behind "deep learning"
- Introduce students to the principal computer packages involved in machine learning
- Teach students some extensions of univariate statistical techniques to higher-dimensional situations
Learning outcomes
By the end of the unit, a candidate will...
1. Be able to perform some dimensionality reduction techniques (especially principal components analysis) and interpret the results
2. Understand and implement several data classification methods (linear and quadratic discriminant analysis, trees, support vector machines, neural networks, etc.)
3. Be able to work with high-dimensional data within the computer programs R and/or Python
4. Understand some of the statistical background behind the techniques
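The model-performance ideas mentioned in the outcomes (classification and cross-validation) can be sketched together in a few lines of Python. This is a hand-rolled k-fold cross-validation of a one-nearest-neighbour classifier; the function names and toy data are illustrative, not taken from the module.

```python
# k-fold cross-validation of a 1-nearest-neighbour classifier.
# Illustrative sketch: names and toy data are made up.
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    # distance from every test point to every training point
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[d.argmin(axis=1)]        # label of the nearest neighbour

def k_fold_accuracy(X, y, k=5, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))           # shuffle before splitting
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        preds = one_nn_predict(X[train], y[train], X[test])
        accs.append(np.mean(preds == y[test]))
    return float(np.mean(accs))

# two well-separated classes, so 1-NN should classify almost perfectly
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(6, 1, (40, 2))])
y = np.repeat([0, 1], 40)
print(k_fold_accuracy(X, y))
```

Holding out each fold in turn estimates performance on unseen data, which is what distinguishes genuine generalisation from overfitting.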
Lectures, lab sheets
20 lectures, no tutorials
Two projects (35% and 65%)
1. Introduction to machine learning
The main ideas of machine learning. Computing. Books and other resources.
2. Notation
Basic notation - data frames, variance and correlation matrices.
3. Background mathematics
(Mostly not lectured) Linear algebra. Eigenvalues and eigenvectors. Differentiating with respect to vectors. Constrained optimisation and Lagrange multipliers. Gradient descent. The multivariate normal distribution.
4. Data visualisation
Techniques of scatterplots and other visualisation methods. Esoteric methods: Andrews plots, star plots, Chernoff faces.
5. Principal Components Analysis
Principal components analysis (PCA). Variance or correlation? How many components? Examples and applications.
6. The problems of machine learning in the setting of linear regression
Brief recall of linear models. Model performance. Over- and underfitting. Regularisation. Cross-validation.
7. The problems of machine learning and classification
Discriminant rules. Nearest neighbours. Logistic regression. Multiclass extensions. Model performance.
8. Discriminant analysis
Decision boundary between two multivariate normal distributions. Linear discriminant analysis. Quadratic discriminant analysis.
9. Decision trees and related methods
Regression trees. Classification trees. Ensemble methods. Random forests. Boosting.
10. Support Vector Machines
Separating hyperplanes. Dual formalism. Nonseparable sets. Kernels. Support vector regression.
11. Neural networks
Introduction. Notation. Backpropagation. Variants. Regularisation. Convolutional neural networks.
12. Cluster analysis
Introduction to cluster analysis. Hierarchical methods. Nonhierarchical methods. k-means. DBSCAN.
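The k-means method from the cluster analysis section can be sketched as a bare-bones version of Lloyd's algorithm: alternately assign points to the nearest centre and move each centre to the mean of its points. The initialisation, names and toy data below are illustrative, not module code.

```python
# Bare-bones Lloyd's algorithm for k-means clustering.
# Illustrative sketch: naive initialisation and made-up data.
import numpy as np

def kmeans(X, init_centres, n_iter=50):
    centres = init_centres.astype(float).copy()
    for _ in range(n_iter):
        # assign each point to its nearest centre
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centre to the mean of the points assigned to it
        for j in range(len(centres)):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return labels, centres

# two well-separated blobs; initialise with one point from each for clarity
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centres = kmeans(X, X[[0, 50]])
print(centres)
```

In practice the result depends on the initial centres; restarting from several random initialisations and keeping the best within-cluster sum of squares is the usual remedy.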
|C||Everitt||An R and S-PLUS Companion to Multivariate Analysis|
|C||Hastie, Tibshirani and Friedman||The Elements of Statistical Learning|
|C||James, Witten, Hastie and Tibshirani||An Introduction to Statistical Learning|
(A = essential, B = recommended, C = background.)
Most books on reading lists should also be available from the Blackwell's shop at Jessop West.