## MAS6011 Dependent Data

Both semesters, 2017/18 | 20 Credits

Lecturer: Dr Kostas Triantafyllopoulos | uses MOLE


The unit develops concepts and techniques for the analysis of data with the complex structure typical of many real applications. The two main themes are the analysis of observations on several dependent variables, and the analysis of dependent observations made over a period of time on a single variable. The unit begins with a practical introduction to multivariate analysis, covering data mining techniques such as summarising and displaying high-dimensional data and dimensionality reduction, as well as principal components, multidimensional scaling, multivariate analysis of variance and discrimination. A review of repeated measures problems links to ideas of time series analysis. General techniques for the study of time series are then developed, including structural descriptions, Box-Jenkins and state-space models and their fitting, and forecasting techniques covering local level, trend and seasonal models. Emphasis is given to the practical implementation of the techniques using appropriate computer packages.

There are no prerequisites for this module.

No other modules have this module as a prerequisite.

## Outline syllabus

- Semester 1: Multivariate Data Analysis
  - Multivariate data summary: sample estimates of mean, covariance and variance.
  - Graphical displays: scatterplots, augmented plots, Andrews' plots, special techniques.
  - Exploratory analysis and dimensionality reduction: principal component analysis, principal component and crimcoord displays, implementation in **R**.
  - Multidimensional scaling: visualisation of similarity data.
  - Linear discriminant analysis: visualisation of grouped data, linear discriminant analysis in **R**.
  - Multivariate normal distribution: basic properties, confidence regions, simple hypothesis tests, statistical discriminant analysis.
  - Single and two sample methods: Hotelling's $T^2$ test, practical implementation in **R**.
  - Construction of statistical hypothesis tests: the likelihood ratio method and the union-intersection principle, MANOVA, implementation in **R**.
- Semester 2: Time Series
  - Preliminary material: examples; purposes of analysis; components (trend, cycle, seasonal, irregular); stationarity, autocorrelation; approaches to time series analysis.
  - Simple descriptive methods: smoothing; decomposition; differencing; autocorrelation. Probability models for stationary series: autoregressive models; moving average models; partial autocorrelation; invertibility; ARMA processes; ARIMA models for non-stationary series. Inference: identification and fitting; diagnostics; Ljung-Box statistic; choice of models; AIC.
  - Introduction to forecasting: updating and errors; linear predictions; heuristic forecasting methods.
  - State space models: formulation; filtering, prediction and smoothing; Kalman recursions; Bayesian inference; Bayesian forecasting; local level model; linear trend model; seasonal model.

## Aims

- To illustrate extensions of univariate statistical methodology to dependent data.
- To introduce some of the distinctive statistical methodologies which arise only in the analysis of dependent data.
- To illustrate how models for dependent data may be constructed and studied.
- To introduce students to some of the computational techniques required for multivariate and time series analyses.

## Learning outcomes

By the end of the unit, a candidate will:

- have some understanding of techniques of multivariate data summary and graphical display, and of the principles of multivariate exploratory data analysis and dimensionality reduction;
- have some understanding of the construction of multivariate likelihood ratio tests and of the union-intersection principle in multivariate testing;
- be able to perform and interpret principal component analysis and linear discriminant analysis using a computer package;
- be able to interpret the results of computer-based multivariate one- and two-sample tests;
- be familiar with the facilities offered by computer packages for multivariate analysis;
- understand general terms used in time series analysis, such as stationarity, the autocorrelation function (ACF) and the partial autocorrelation function (PACF);
- understand the structure of ARMA and ARIMA models, and be able to derive the ACF and PACF for simple ARMA models;
- understand the approximate sampling behaviour of estimators of the ACF and PACF;
- be able to fit models to time series data, assess the fit, modify the model if necessary, and use the fitted models for forecasting;
- have undertaken an extended analysis of a problem involving a variety of multivariate methods, and of a practical time series problem.

## Teaching methods

Lectures, with printed notes, plus task and exercise sheets and computer demonstrations. Some outside reading is also expected.

40 lectures, no tutorials

## Assessment

Two projects (30%) and a three-hour restricted open-book examination (70%).

## Full syllabus

**Multivariate data summary**

- basic notation, sample estimates of mean, covariance and variance (1 session)
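
As a minimal illustration in plain Python (the unit itself works in **R**; the function names here are hypothetical), the sample mean vector $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i$ and the sample covariance matrix $S = (n-1)^{-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^\top$ can be computed from an $n \times p$ data matrix stored as a list of rows. The divisor $n-1$ is an assumption here (some texts use $n$); the diagonal of $S$ holds the sample variances.

```python
from typing import List

Matrix = List[List[float]]

def sample_mean(X: Matrix) -> List[float]:
    """Column means of an n x p data matrix given as a list of rows."""
    n, p = len(X), len(X[0])
    return [sum(row[j] for row in X) / n for j in range(p)]

def sample_covariance(X: Matrix) -> Matrix:
    """Sample covariance matrix, assuming the unbiased divisor n - 1."""
    n, p = len(X), len(X[0])
    xbar = sample_mean(X)
    # Centre every observation at the sample mean.
    D = [[row[j] - xbar[j] for j in range(p)] for row in X]
    # Entry (i, j) is the averaged cross-product of centred columns i and j.
    return [[sum(d[i] * d[j] for d in D) / (n - 1) for j in range(p)]
            for i in range(p)]

X = [[2.0, 1.0], [0.0, 1.0], [1.0, 2.0], [1.0, 0.0]]
print(sample_mean(X))        # [1.0, 1.0]
print(sample_covariance(X))  # variances 2/3 on the diagonal, covariance 0
```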

**Graphical displays**

- Scatterplots, augmented plots, Andrews' plots, special techniques (1 session)

**Exploratory analysis and dimensionality reduction**

- principal component analysis (2 sessions)
- principal component and crimcoord displays (1 session)
- Multidimensional scaling (2 sessions)
- Linear discriminant analysis (2 sessions)
- Cluster analysis and other data mining techniques (1 session)
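
Principal component analysis rests on the eigen-decomposition of the sample covariance (or correlation) matrix: the first principal component is the unit vector $a$ maximising $a^\top S a$, and that maximum is the largest eigenvalue. In **R** this is what `prcomp` computes. As a hedged plain-Python sketch only, the code below finds the leading eigenpair by power iteration, a simple technique swapped in here for brevity rather than the method any package uses. It assumes $S$ is symmetric positive semi-definite with a strictly dominant eigenvalue, and that the arbitrary starting vector is not orthogonal to the leading eigenvector; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def mat_vec(S: Matrix, v: List[float]) -> List[float]:
    """Matrix-vector product S v."""
    return [sum(S[i][j] * v[j] for j in range(len(v))) for i in range(len(S))]

def leading_component(S: Matrix, max_iter: int = 1000,
                      tol: float = 1e-12) -> Tuple[float, List[float]]:
    """Largest eigenvalue of S and its unit eigenvector, by power iteration."""
    p = len(S)
    # Arbitrary, deliberately unbalanced start vector (assumption: it is not
    # orthogonal to the leading eigenvector).
    v = [1.0 / (j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(max_iter):
        w = mat_vec(S, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("S v is zero; S may be the zero matrix")
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient v'Sv: the variance of the first principal component.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(S, v)))
    return eigenvalue, v

S = [[2.0, 1.0], [1.0, 2.0]]
lam, a = leading_component(S)
# Proportion of total variance (trace of S) explained by the first component.
explained = lam / sum(S[i][i] for i in range(len(S)))
print(lam, a, explained)  # roughly 3.0, (0.7071, 0.7071), 0.75
```

Further components can be obtained by deflation ($S - \lambda a a^\top$) or, more robustly, by a full symmetric eigen-solver.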

**Construction of statistical hypothesis tests**

- multivariate normal distribution and confidence regions (2 sessions)
- Hotelling's $T^2$ test (1 session)
- statistical discriminant analysis (1 session)
- practical implementation in **R** (1 session)
- the likelihood ratio method in multivariate data (2 sessions)
- the union-intersection principle (1 session)
- MANOVA (1 session)
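
For the one-sample case, Hotelling's statistic for testing $H_0: \mu = \mu_0$ is $T^2 = n(\bar{x} - \mu_0)^\top S^{-1}(\bar{x} - \mu_0)$, and under multivariate normality, when $H_0$ holds, $\frac{n-p}{p(n-1)}T^2 \sim F_{p,\,n-p}$. The plain-Python sketch below computes both quantities, solving $S y = \bar{x} - \mu_0$ by Gaussian elimination instead of forming $S^{-1}$. It needs $n > p$ so that $S$ is non-singular, and it stops short of a p-value, which requires an $F$ distribution function (in **R**, `pf`). The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    p = len(A)
    M = [list(A[i]) + [b[i]] for i in range(p)]  # augmented copy [A | b]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:  # crude, scale-dependent singularity check
            raise ValueError("matrix is (numerically) singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * p
    for i in range(p - 1, -1, -1):  # back substitution
        y[i] = (M[i][p] - sum(M[i][j] * y[j] for j in range(i + 1, p))) / M[i][i]
    return y

def hotelling_one_sample(X: Matrix, mu0: List[float]) -> Tuple[float, float, int, int]:
    """Return (T2, F, df1, df2) for H0: mu = mu0."""
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    D = [[row[j] - xbar[j] for j in range(p)] for row in X]
    S = [[sum(d[i] * d[j] for d in D) / (n - 1) for j in range(p)] for i in range(p)]
    diff = [xbar[j] - mu0[j] for j in range(p)]
    y = solve(S, diff)  # y = S^{-1} (xbar - mu0)
    T2 = n * sum(a * b for a, b in zip(diff, y))
    F = (n - p) / (p * (n - 1)) * T2
    return T2, F, p, n - p

X = [[2.0, 1.0], [0.0, 1.0], [1.0, 2.0], [1.0, 0.0]]
T2, F, df1, df2 = hotelling_one_sample(X, [0.0, 0.0])
print(T2, F, df1, df2)  # T2 close to 12, F close to 4 on (2, 2) df
```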

**Time Series preliminary material**

- examples; purposes of analysis; components (trend, cycle, seasonal, irregular); stationarity, autocorrelation; approaches to time series analysis. (2 sessions)

**Simple descriptive methods**

- smoothing; decomposition; differencing; autocorrelation. (1 session)
- Probability models for stationary series: autoregressive models; (1 session)
- moving average models; partial autocorrelation; invertibility; (1 session)
- ARMA processes; (2 sessions)
- ARIMA models for non-stationary series. (1 session)
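
Two of the descriptive tools above are easy to state concretely. The first difference $\nabla x_t = x_t - x_{t-1}$ removes a linear trend, and the sample autocorrelation at lag $k$ is $r_k = \sum_{t=1}^{n-k}(x_t-\bar{x})(x_{t+k}-\bar{x}) \big/ \sum_{t=1}^{n}(x_t-\bar{x})^2$. A minimal plain-Python sketch follows; in **R** these correspond to `diff` and `acf`, and the names below are hypothetical.

```python
from typing import List

def difference(x: List[float], lag: int = 1) -> List[float]:
    """Differences x[t] - x[t - lag]; lag=1 removes a linear trend."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

def sample_acf(x: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_0, ..., r_max_lag.

    The denominator sums over all n centred terms, the usual convention.
    """
    n = len(x)
    xbar = sum(x) / n
    d = [v - xbar for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

trend = [2.0 * t + 1.0 for t in range(6)]          # 1, 3, 5, 7, 9, 11
print(difference(trend))                           # [2.0, 2.0, 2.0, 2.0, 2.0]
print(sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], 2))    # [1.0, 0.4, -0.1]
```

For a white-noise series, $r_k$ for $k \ge 1$ is approximately $N(0, 1/n)$, which is the origin of the usual $\pm 2/\sqrt{n}$ bands drawn on a sample ACF plot.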

**Inference**

- identification and fitting; (1 session)
- diagnostics; Ljung-Box statistic; choice of models; AIC. (1 session)
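
The Ljung-Box portmanteau statistic checks whether the first $h$ residual autocorrelations are jointly small: $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, referred to a $\chi^2_{h-m}$ distribution, where $m$ is the number of fitted ARMA parameters ($m = p + q$). The plain-Python sketch below computes $Q$ and its degrees of freedom, with hypothetical function names. It omits the p-value, which needs a $\chi^2$ distribution function; in **R**, `Box.test(x, lag = h, type = "Ljung-Box", fitdf = m)` reports it.

```python
from typing import List, Tuple

def sample_acf(x: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(x)
    xbar = sum(x) / n
    d = [v - xbar for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def ljung_box(residuals: List[float], h: int, fitted_params: int = 0) -> Tuple[float, int]:
    """Return (Q, degrees of freedom) using lags 1..h."""
    n = len(residuals)
    r = sample_acf(residuals, h)
    Q = n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, h + 1))
    return Q, h - fitted_params

# Toy series for illustration only; in practice pass a fitted model's residuals.
Q, df = ljung_box([1.0, 2.0, 3.0, 4.0, 5.0], h=2)
print(Q, df)  # Q = 91/60, about 1.517, on 2 degrees of freedom
```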

**Introduction to forecasting**

- updating and errors; (2 sessions)
- linear predictions; heuristic forecasting methods. (2 sessions)
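
The syllabus does not name specific heuristic methods; simple exponential smoothing is assumed here as one common example because it illustrates "updating and errors" directly. In error-correction form, with one-step forecast error $e_t = x_t - \ell_{t-1}$, the level updates as $\ell_t = \ell_{t-1} + \alpha e_t$, and every future forecast equals the current level. A minimal sketch, initialising the level at the first observation (one convention among several); the function name is hypothetical.

```python
from typing import List, Tuple

def exponential_smoothing(x: List[float], alpha: float) -> Tuple[float, List[float]]:
    """Simple exponential smoothing in error-correction form.

    Returns the final level (the forecast for every future horizon) and the
    one-step-ahead forecast errors e_t = x_t - level_{t-1} for t = 2..n.
    Initialising the level at x[0] is an assumption.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    level = x[0]
    errors = []
    for obs in x[1:]:
        e = obs - level        # one-step forecast error
        level += alpha * e     # same as alpha * obs + (1 - alpha) * level
        errors.append(e)
    return level, errors

level, errors = exponential_smoothing([10.0, 12.0, 11.0], alpha=0.5)
print(level, errors)  # 11.0 [2.0, 0.0]
```

Choosing $\alpha$ to minimise the sum of squared one-step errors is a natural data-driven criterion.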

**State space models**

- formulation; filtering, prediction and smoothing; Kalman recursions. (2 sessions)
- Bayesian inference; Bayesian forecasting. (1 session)
- local level model; linear trend model; seasonal model. (3 sessions)
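
For the local level model $y_t = \mu_t + \epsilon_t$, $\mu_t = \mu_{t-1} + \omega_t$, with independent $\epsilon_t \sim N(0, V)$ and $\omega_t \sim N(0, W)$ and both variances known, the Kalman recursions reduce to scalar updates. Writing the posterior at time $t-1$ as $\mu_{t-1} \mid y_{1:t-1} \sim N(m_{t-1}, C_{t-1})$ (notation assumed here, following a common Bayesian forecasting convention): prior variance $R_t = C_{t-1} + W$; one-step forecast $y_t \mid y_{1:t-1} \sim N(m_{t-1}, Q_t)$ with $Q_t = R_t + V$; adaptive coefficient $A_t = R_t/Q_t$; posterior $m_t = m_{t-1} + A_t(y_t - m_{t-1})$ and $C_t = A_t V$. A plain-Python sketch with a hypothetical function name:

```python
from typing import List, Tuple

def local_level_filter(y: List[float], m0: float, C0: float, V: float,
                       W: float) -> List[Tuple[float, float, float, float]]:
    """Kalman filter for the local level model with known V and W.

    For each t returns (f_t, Q_t, m_t, C_t): the one-step forecast mean and
    variance, then the filtered (posterior) mean and variance of mu_t.
    """
    m, C = m0, C0
    out = []
    for obs in y:
        R = C + W             # prior variance of mu_t given y_1..y_{t-1}
        f, Q = m, R + V       # one-step forecast distribution N(f, Q)
        A = R / Q             # adaptive coefficient (Kalman gain)
        m = m + A * (obs - f)
        C = A * V             # equals R - A**2 * Q
        out.append((f, Q, m, C))
    return out

res = local_level_filter([2.0, 2.0], m0=0.0, C0=1.0, V=1.0, W=0.0)
for f, Q, m, C in res:
    print(f, Q, m, C)
```

With $W = 0$ the level is constant, so $m_t$ is the precision-weighted average of $m_0$ and the observations: here $m_2 = (0 + 2 + 2)/3 = 4/3$ with $C_2 = 1/3$.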

## Timetable (semester 2)

| Day | Time | Activity | Room |
| --- | --- | --- | --- |
| Mon | 14:00 - 14:50 | lecture | Hicks Lecture Theatre 1 |
| Tue | 09:00 - 09:50 | lecture | Hicks Lecture Theatre 1 |