10 min read

Introduction to Time Series

Classical decomposition model

\[ X_t = m_t + s_t + y_y \] where - \(m : \mathbb{Z} \rightarrow \mathbb{R}\) is a slowly changing function, the trend component. - \(s : \mathbb Z \rightarrow \mathbb R\) is a function with a known period \(d\), i.e., \(s_{t + d} = s_t\) for all \(t \in \mathbb Z\) and \(\sum_{j = 1}^d s_j = 0\), is called the seasonal component. - \((y_t, t \in \mathbb Z)\) is a stationary stochastic process.

Time Series Analysis

  • Always plot the data first
    • If there are clear sections in the data, it might be good to analyze each section separately

Best Linear Predictor

Let \((X_t, t \in \Z)\) be a time series with \(Var(X_t) < \infty\) for \(t \in \Z\) and \(X^n := (X_{t_1},\ldots, X_{t_n})\) be a collection of random variables of the time series at \(n\) different times. Then the _best linear predictor of \(X_t\) is given by \[ b_t^l(X^n) = a_0 + a_1 X_{t_n} + \cdots + a_n X_{t_1} \] where the coefficients are determined by the linear equations

  1. \(E(X_t - b_t^l(X^n)) = 0\)
  2. \(E(X_{t_j}(X_t - b_t^l(X^n))) = 0\) for all \(j = 1,\ldots,n\).

If X is stationary, with mean \(\mu\) and autocovariance function \(\gamma\), the coefficients are determined by \[ a_0 = \mu (1 - \sum_{i = 1}^n a_i) \] and \[ (\gamma(t_{n + 1 - j} - t_{n + 1 - i}))_{i,j = 1}^n (a_1,\ldots,a_n)^T = (\gamma(t - t_n),\ldots, \gamma(t - t_1))^T \]

The mean square error is:

\[ MSE(b_t^l(X^n), X_t) = E[(b_t^l(X^n) - X_t)^2] = \gamma(0) - (a_1, \ldots, a_n)(\gamma(t - t_n), \ldots, \gamma(t - t_1))^T \]

Note: when \(X^n := (X_1, \ldots, X_n)\) then the coefficients \((a_0,\ldots,a_n)\) for prediction of \(X_{n + h}\) and \[ (\gamma(i - j)_{i,j = 1}^n) (a_1,\ldots, a_n)^T = (\gamma(h), \ldots, \gamma(h + n - 1))^T \]

Example

The notation \(t_1,\ldots, t_n\) was confusing to me at first. A good example that shown how it works is if we have an \(AR(1)\) process, \[ X_t - \phi_1 X_{t - 1} = Z_t \] Assume that we observe values at \(X_1\) and \(X_3\), but are missing a value for \(X_2\). We then have \(t_1 = 1\), \(t_2 = 3\), with \(n = 2\). We want to find: \[ b_2^l(X^1,X^3) = a_0 + a_1 X_3 + a_2 X_1 \] We have that \(a_0 = 0\), since the process has mean zero. It then follows that,

\[ \begin{aligned} \begin{pmatrix} \gamma(t_{3 + 1 - 1} - t_{3 + 1 - 1}) & \gamma(t_{3 + 1 - 2} - t_{3 + 1 - 1})\\ \gamma(t_{3 + 1 - 1} - t_{3 + 1 - 2}) &\gamma(t_{3 + 1 - 1} - t_{3 + 1 - 1}) \end{pmatrix} \begin{pmatrix} a_1\\ a_2 \end{pmatrix} &= \begin{pmatrix} \gamma(2 - t_{2})\\ \gamma(2 - t_{1}) \end{pmatrix}\\ \begin{pmatrix} \gamma(0) & \gamma(3 - 1)\\ \gamma(3 - 1) &\gamma(0) \end{pmatrix} \begin{pmatrix} a_1\\ a_2 \end{pmatrix} &= \begin{pmatrix} \gamma(2 - 3)\\ \gamma(2 - 1) \end{pmatrix}\\ \begin{pmatrix} 1 & \phi_1^2\\ \phi_1^2 & 1 \end{pmatrix} \begin{pmatrix} a_1\\ a_2 \end{pmatrix} &= \begin{pmatrix} \phi_1\\ \phi_1 \end{pmatrix} \end{aligned} \]

Which leads to a solution \[ a_1 = a_2 = \frac{\phi_1}{1 + \phi_1^2} \]

ARMA(p,q)

\(\{X_t\}\) is an ARMA(p,q) process if \(\{X_t\}\) is stationary and if for every \(t\), \[ X_t - \sum_{i = 1}^p \phi_i X_{t-i} = Z_t + \sum_{j = 1}^q \theta_j Z_{t - j} \] where \(\{Z_t\} \sim WN(0, \sigma^2)\) and the polynomials \((1 - \sum_{i = 1}^p \phi_i z^i)\) and \((1 + \sum_{j = 1}^q \theta_j z^j)\) have no common factors.

Causality

An ARMA(p,q) process \(\{X_t\}\) is causal, if there exists constants \(\{\psi_t\}\) such that \[ \sum_{j = 0}^\infty | \psi_{j} | < \infty \] and \[ X_t = \sum_{j = 0}^\infty \psi_{j} Z_{t - j} \] for all \(t\). That is, if we can represent the ARMA(p,q) process \(\{X_t\}\) as a \(MA(\infty)\) process.

How to actually check? Use the equivalent condition: \[ \phi(z) = 1 - \sum_{i = 1}^p \phi_i z^i \not = 0 \quad \text{for all}~|z| \leq 1 \] That is, that \(\phi(z)\) has no roots inside (or that all root are outside the unit circle).

Finding Causal Representation

To represent our (causal) ARMA(p,q) process in the form: \[ X_t = \sum_{j = 0}^\infty \psi_j Z_{t - j} \] The sequence of \(\{\psi_j\}\) is determined by the relation \(\psi(z) = \sum_{j = 0}^\infty \psi_j z^j = \theta(z) / \phi(z)\), or equivalently: \[ \psi_j - \sum_{k = 1}^p \phi_k \psi_{j - k} = \theta_j, ~j = 0,1,\ldots \] with \(\theta_0 := 1, \theta_j := 0\) for j > q and \(\psi_j := 0\) for \(j < 0\).

Finding invertible (AR(\(\infty\))) representation

To represent our invertible ARMA(p,q) process in the form: \[ Z_t = \sum_{j = 0}^\infty \pi_j X_{t - j} \] we find \(\{\pi_j\}\) by the equations \[ \pi_j + \sum_{k = 1}^q \theta_k \pi_{j - k} = -\phi_j, ~j = 0,1,\ldots, \] where \(\phi_0 := -1,~ \phi_j := 0\) for \(j > p\), and \(\pi_j := 0\) for \(j < 0\).

Model Building for ARMA processes

  • Remove trend and seasonality until you believe the dta can be modeled as a stationary time series.
  • Identify the order of the ARMA process for the time series
    • Either look at ACF/PACF
    • or by fitting (using maximum likelihood or Hannan-Rissanen estimation) sucessively higher order ARMA(p,q) to the data and choosing p, q to minimize either the AICC or BIC.
  • Estimate the final model using maximum likelihood
  • Compute the residuals \(\hat{R}_t\) and check that they are consistent with the specified distribution and temporal covariance structure for \(Z_t\).
    • If they are, then the model is considered adequate for the data.

Autoregressive Processes - AR(p)

\[ X_t - \sum_{j = 1}^p \phi_j X_{t - j} = Z_t \]

  • AR(p) is not necessarily stationary!

AR(1) Example

AR(2)

Can simulate using the function astsa::arima.sim function \[ X_t = Z_t + 0.7 \cdot X_{t - 1} + 0.2 \cdot X_{t - 2} \]

Moving Average Processes - MA(q)

\[ X_t = Z_t + \sum_{j = 1}^q \theta_j Z_{t - j} \] with \(Z_t \sim WN(0, \sigma^2)\). At lag greater than or equal to q, the ACF should be zero.

MA(2) example

\[ X_t = Z_t + 0.8 \cdot Z_{t - 1} + 0.2 \cdot Z_{t - 2} \]

The autocorrelation function

ARMA(p,q)

  • The partial autocorrelation function PACF of an ARMA(p,q) process X can be thought of as the correlation between \(X_t\) and \(X_{t + h}\) when adjusting for the intervening observations \(X_{t + 1}, \ldots, X_{t + h - 1}\).
    • Can be used to detect seasonality: If you have monthly data and you see a big spike at \(\hat{\alpha}(12)\), it is likely that you see a periodic effect corresponding to a calendar year, so you should remove the seasonality.

ACF and PACF

Conditional Mean Model ACF PACF
AR(p) Tails off gradually Cuts off after p lags
MA(q) Cuts off after q lags Tails off gradually
ARMA(p,q) Tails off gradually Tails off gradually

Source: https://se.mathworks.com/help/econ/autocorrelation-and-partial-autocorrelation.html

Innovations Algorithm

Used to compute the best linear predictor, \(b_{n + 1}^l(X^n)\) more computationally efficiently than solving a system of \(n\) linear equations. Can be applied to all time series with finite second moments. Assume we have a time series \((X_t, t \in \mathbb Z)\) with zero mean, and finite second moment \(E[X_t^2] < + \infty\) for all \(t \in \mathbb Z\) and covariance \[ Cov(X_i, X_j) = \kappa(i,j) \] We denote the best linear one-step predictors as \[ \hat{X}_n := \begin{cases} 0 & \text{for n = 1}\\ b_{n}^l(X^{n - 1}) & \text{for n > 1} \end{cases} \] and the mean squared errors as \[ v_n := MSE(\hat{X}_{n+1}, X_{n + 1}) = E[(\hat{X}_{n + 1} - X_{n + 1})^2] \]

There exists coefficients \((\theta_{ij}, 1 \leq j \leq i \leq n)\) such that the best linear predictors satisfy \[ \hat{X}_{n + 1} = \begin{cases} 0 & \text{for } n = 0\\ \sum_{j = 1}^n \theta_{nj}(X_{n + 1 - j} - \hat{X}_{n+1-j}) & \text{for } n \geq 1 \end{cases} \] We compute the coefficients \(\theta_{n1},\ldots, \theta_{nn}\) recursively from the equations \[ v_0 := \kappa(1,1) \] and \[ \theta_{n (n - k)} := v_k^{-1} \left( \kappa(n + 1, k + 1) - \sum_{j = 0}^{k - 1} \theta_{k (k - j)} \theta_{n (n - j)} v_j \right) \]

GARCH/ARCH

Conditional Maximum Likelihood

The MLEs \((\hat{\alpha}_0, \ldots, \hat{\alpha}_p, \hat{\beta}_1, \ldots, \hat{\alpha}_q, \hat{\theta}_Z)\) are obtained by maximizing the conditional likelihood function \[ L(\alpha_0, \ldots, \alpha_p, \beta_1, \ldots, \alpha_q, \theta_Z) = \prod_{t = p + 1} \frac{1}{\sigma_t} f_Z \left( \frac{x_t}{\sigma_t}\right) \] where \(f_Z\) is the density of the white noise Z and \(\theta_Z\) is any other parameter Z depends on (such as degrees of freedom if Z is t-distributed).

It \(Z \sim IIDN(0,1)\) then, \[ \begin{aligned} f_Z \left( \frac{x_t}{\sigma_t}\right) &= \frac{1}{\sqrt{2\pi}} \exp\left( - \frac{X_t^2}{2 \sigma_t^2}\right)\\ L(\alpha_0, \ldots, \alpha_p, \beta_1, \ldots, \alpha_q, \theta_Z) &= \prod_{t = p + 1}^n \frac{1}{\sigma_t} \frac{1}{\sqrt{2\pi}} \exp\left( - \frac{X_t^2}{2 \sigma_t^2}\right)\\ - \ln L(\alpha_0, \ldots, \alpha_p, \beta_1, \ldots, \alpha_q, \theta_Z) &= \sum_{t = p + 1}^n \ln{\sigma_t} + \frac{1}{2} \ln(2\pi)as + \frac{1}{2} \frac{X_t^2}{\sigma_t^2}\\ &= \frac{1}{2} \sum_{t = p + 1}^n 2 \ln{\sigma_t} + \ln(2\pi) + \frac{X_t^2}{\sigma_t^2}\\ &= \frac{1}{2} \sum_{t = p + 1}^n \ln{\sigma_t^2} + \ln(2\pi) + \frac{X_t^2}{\sigma_t^2} \end{aligned} \]

Acknowledgements

This blog post was made possible thanks to:

References

[1] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.

[2] G. Csárdi, R. core, H. Wickham, W. Chang, et al. sessioninfo: R Session Information. R package version 1.1.1. 2018. URL: https://CRAN.R-project.org/package=sessioninfo.

[3] Y. Xie, A. P. Hill, and A. Thomas. blogdown: Creating Websites with R Markdown. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: https://github.com/rstudio/blogdown.

Reproducibility

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.3 (2020-02-29)
 os       macOS Mojave 10.14.6        
 system   x86_64, darwin15.6.0        
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Europe/Stockholm            
 date     2020-05-11                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package       * version date       lib source        
 assertthat      0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
 bibtex          0.4.2   2017-06-30 [1] CRAN (R 3.6.0)
 blogdown        0.13    2019-06-11 [1] CRAN (R 3.6.0)
 bookdown        0.11    2019-05-28 [1] CRAN (R 3.6.0)
 cli             2.0.2   2020-02-28 [1] CRAN (R 3.6.0)
 colorspace      1.4-1   2019-03-18 [1] CRAN (R 3.6.0)
 crayon          1.3.4   2017-09-16 [1] CRAN (R 3.6.0)
 digest          0.6.25  2020-02-23 [1] CRAN (R 3.6.0)
 dplyr           0.8.5   2020-03-07 [1] CRAN (R 3.6.0)
 evaluate        0.14    2019-05-28 [1] CRAN (R 3.6.0)
 fansi           0.4.1   2020-01-08 [1] CRAN (R 3.6.0)
 ggplot2       * 3.3.0   2020-03-05 [1] CRAN (R 3.6.0)
 glue            1.3.2   2020-03-12 [1] CRAN (R 3.6.0)
 gtable          0.3.0   2019-03-25 [1] CRAN (R 3.6.0)
 htmltools       0.4.0   2019-10-04 [1] CRAN (R 3.6.0)
 httr            1.4.1   2019-08-05 [1] CRAN (R 3.6.0)
 jsonlite        1.6.1   2020-02-02 [1] CRAN (R 3.6.0)
 knitcitations * 1.0.8   2017-07-04 [1] CRAN (R 3.6.0)
 knitr           1.28    2020-02-06 [1] CRAN (R 3.6.0)
 lifecycle       0.2.0   2020-03-06 [1] CRAN (R 3.6.0)
 lubridate       1.7.4   2018-04-11 [1] CRAN (R 3.6.0)
 magrittr        1.5     2014-11-22 [1] CRAN (R 3.6.0)
 munsell         0.5.0   2018-06-12 [1] CRAN (R 3.6.0)
 pillar          1.4.3   2019-12-20 [1] CRAN (R 3.6.0)
 pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 3.6.0)
 plyr            1.8.6   2020-03-03 [1] CRAN (R 3.6.0)
 purrr           0.3.3   2019-10-18 [1] CRAN (R 3.6.0)
 R6              2.4.1   2019-11-12 [1] CRAN (R 3.6.0)
 Rcpp            1.0.3   2019-11-08 [1] CRAN (R 3.6.0)
 RefManageR      1.2.12  2019-04-03 [1] CRAN (R 3.6.0)
 rlang           0.4.5   2020-03-01 [1] CRAN (R 3.6.0)
 rmarkdown       2.1     2020-01-20 [1] CRAN (R 3.6.0)
 scales          1.1.0   2019-11-18 [1] CRAN (R 3.6.0)
 sessioninfo   * 1.1.1   2018-11-05 [1] CRAN (R 3.6.0)
 stringi         1.4.6   2020-02-17 [1] CRAN (R 3.6.0)
 stringr         1.4.0   2019-02-10 [1] CRAN (R 3.6.0)
 tibble          2.1.3   2019-06-06 [1] CRAN (R 3.6.0)
 tidyselect      1.0.0   2020-01-27 [1] CRAN (R 3.6.0)
 withr           2.1.2   2018-03-15 [1] CRAN (R 3.6.0)
 xfun            0.12    2020-01-13 [1] CRAN (R 3.6.0)
 xml2            1.2.5   2020-03-11 [1] CRAN (R 3.6.0)
 yaml            2.2.1   2020-02-01 [1] CRAN (R 3.6.0)

[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library