Univariate Time Series - 2
Methods of Economic Investigation
Lecture 19

Last Time
Concepts that are useful
  Stationarity
  Ergodicity
  Ergodic Theorem
  Autocovariance Generating Function

Lag Operators
Time Series Processes
  AR(p)
  MA(q)

Today's Class
Building up to estimation
  Wold Decomposition
  Estimating with exogenous, serially correlated errors
  Testing for Lag Length

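Refresher
Stationarity: some persistence, but not too much
Ergodicity: persistence dies out as observations get far enough apart in time
Square summability (assumption for MA process): parameters $\theta_j$ such that $\sum_{j=0}^{\infty} \theta_j^2 < \infty$
Invertibility (assumption for AR process): lag polynomial $a(L) = (1 - \lambda_1 L) \cdots (1 - \lambda_p L)$ with $|\lambda_i| < 1$, i.e. the roots of $a(z) = 0$ lie outside the unit circle

ARMA Process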

In general, we can have a process with both AR and MA components
A general ARMA(p, q) process in our lag function notation looks like: $a(L) x_t = b(L) \epsilon_t$
For example, we may have an ARMA(2, 1):
  $x_t - (\phi_1 x_{t-1} + \phi_2 x_{t-2}) = \epsilon_t + \theta_1 \epsilon_{t-1}$
  $(1 - \phi_1 L - \phi_2 L^2) x_t = (1 + \theta_1 L) \epsilon_t$
If the process is invertible, then we can rewrite this as: $x_t = a(L)^{-1} b(L) \epsilon_t$

Why Focus on ARMA Processes
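To make the inversion $x_t = a(L)^{-1} b(L) \epsilon_t$ concrete, here is a minimal sketch that expands an ARMA(2, 1) into its moving-average weights. The coefficient values (0.5, 0.2, 0.4) and the helper name `arma_to_ma` are assumptions chosen for illustration; they do not come from the lecture.

```python
import numpy as np

def arma_to_ma(phi, theta, n_terms=50):
    """Truncated MA(infinity) weights psi_j of a(L)^{-1} b(L) (illustrative helper)."""
    phi = np.asarray(phi, dtype=float)      # AR coefficients phi_1..phi_p
    theta = np.asarray(theta, dtype=float)  # MA coefficients theta_1..theta_q
    psi = np.zeros(n_terms)
    psi[0] = 1.0
    for j in range(1, n_terms):
        # Matching powers of L in a(L) psi(L) = b(L) gives
        # psi_j = theta_j + sum_i phi_i * psi_{j-i}, with theta_j = 0 for j > q.
        acc = theta[j - 1] if j <= len(theta) else 0.0
        for i in range(1, min(j, len(phi)) + 1):
            acc += phi[i - 1] * psi[j - i]
        psi[j] = acc
    return psi

# Assumed example: x_t = 0.5 x_{t-1} + 0.2 x_{t-2} + e_t + 0.4 e_{t-1}
phi, theta = [0.5, 0.2], [0.4]
psi = arma_to_ma(phi, theta)

# Roots of a(z) = 1 - 0.5 z - 0.2 z^2 (np.roots wants the highest power first)
roots = np.roots([-phi[1], -phi[0], 1.0])
print("moduli of AR roots:", np.abs(roots))
print("first MA weights:", psi[:4])
print("sum of squared weights (truncated):", np.sum(psi ** 2))
```

Because both roots lie outside the unit circle, the weights decay geometrically, so the truncated sum of squares settles down quickly. That is the square summability the convergence theorems need.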

ARMA processes with invertible AR lag polynomials and square summable MA lag polynomials define the range of processes for which we can rely on convergence theorems
Any time series that is covariance stationary has a linear ARMA representation (possibly of infinite order)

Information Sets
At time t-n:
  Everything for time t-n and before is known

Everything at time t is unknown
Information set $I_{t-n}$
Define $E_{t-n}(\epsilon_t) = E[\epsilon_t \mid I_{t-n}]$
Distinct from $E[\epsilon_t]$ because we know previous values of the $\epsilon$'s up until t-n
For example, suppose n = 1 and $\epsilon_t = \epsilon_{t-1} + u_t$, with $E(u_t) = 0$ for all t, so the shock is a mean zero process
  Then $E_{t-1}(\epsilon_t) = \epsilon_{t-1}$

Recalling the CEF

Define the linear conditional expectation function CEF(a | b), which is the linear projection, i.e. the fitted values of a regression of a on b: $\hat{a} = \beta b$
This is distinct from the general expectations operator in that it imposes a linear form on the conditional expectation function

Wold Decomposition Theorem - 1
Formally, the Wold Decomposition Theorem says that any mean zero weakly stationary process $\{x_t\}$ can be represented in the form

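$x_t = \sum_{j=0}^{\infty} \theta_j \epsilon_{t-j} + \eta_t$
This comes with some properties for each term

Wold Decomposition Theorem - 2
Where $\epsilon_t \equiv x_t - \mathrm{CEF}(x_t \mid x_{t-1}, x_{t-2}, \ldots, x_0)$
Properties of $\epsilon_t$:
  $\mathrm{CEF}(\epsilon_t \mid x_{t-1}, x_{t-2}, \ldots, x_0) = 0$, $E(\epsilon_t x_{t-j}) = 0$, $E(\epsilon_t) = 0$,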

$E(\epsilon_t^2) = \sigma^2$ for all t, and $E(\epsilon_t \epsilon_s) = 0$ for all $t \neq s$
The MA polynomial is invertible
The parameters $\theta_j$ are square summable
$\{\theta_j\}$ and $\{\epsilon_s\}$ are unique
$\eta_t$ is linearly deterministic, i.e. $\eta_t = \mathrm{CEF}(\eta_t \mid x_{t-1}, \ldots)$

A Note on the Wold Decomposition
Many of the properties come directly from our assumption that the process is weakly stationary

While it says mean zero process, remember that we can de-mean our data, so most processes can be represented in this format

Uses of Wold Form
This theorem is extremely useful because it returns time-series processes back to our standard OLS model
The Wold MA($\infty$) representation is unique

Notice that we've relaxed some of the conditions for the Gauss-Markov theorem to hold
If two time series have the same Wold representation, then they are the same time series
  This is true only up to second moments in linear forecasting

Emphasis on Linearity
Although $\mathrm{CEF}(\epsilon_t \mid x_{t-j}) = 0$, we can have $E(\epsilon_t \mid x_{t-j}) \neq 0$ with nonlinear projections
If the true $x_t$ is not generated by linear combinations of past $x_t$ plus a shock, then the

Wold shocks (the $\epsilon$'s) will be different from the true shocks
The uniqueness result only states that the Wold representation is the unique linear representation where the shocks are linear forecast errors

Estimating with Serially Correlated Errors
Suppose that we have:
  $Y_t = \beta X_t + \epsilon_t$
  $E[\epsilon_t \mid x_t] = 0$, $E[\epsilon_t^2 \mid x_t] = \sigma^2$
  $E[\epsilon_t \epsilon_{t-k}] = \sigma_k$ for $k \neq 0$, and so define $E[\epsilon_t \epsilon_k] = \sigma^2 \Omega_{tk}$

We could consistently estimate $\beta$, but our standard errors would be incorrect, making it difficult to do inference
This is just a heteroskedasticity-type problem (non-spherical errors), which we have already seen with random effects
Use feasible GLS: estimate the weights, then re-estimate by OLS on the reweighted data to obtain efficient estimates and valid standard errors

Endogenous Lagged Regressors
It may be the case that either the dependent variable or the regressor should enter the

estimating equation in lag values too
Suppose we were estimating $Y_t = \beta_0 X_t + \beta_1 X_{t-1} + \cdots + \beta_k X_{t-k} + \epsilon_t$
We think that these X's are correlated with Y up to some lag length k
We think these X's are correlated with each other (e.g. some underlying AR process), but we're not sure how many lags to include

Naive Test

Include lags going very far back, $r \gg k$
Test the longest lag coefficient, $\beta_r = 0$, and see if that is significant. If not, drop it and keep going.
Problems:
  Practically, the longer the lags you take, the more data you make unusable, because it doesn't have enough time periods to construct the lags
  It doesn't allow you to include lag t-6 but exclude lag t-3
  The theoretical issue is that we will reject the null 5 percent of the time (or whatever the significance level of the test is), even if it's true
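The naive procedure can be sketched in a few lines, which also makes the lost-observations problem visible. Everything here is an assumption for illustration: the helper names (`lag_matrix`, `ols_t_stats`, `naive_lag_search`), the simulated data with a true lag length of 2, the homoskedastic OLS standard errors, and the 1.96 critical value for a 5 percent two-sided test.

```python
import numpy as np

def lag_matrix(x, n_lags):
    """Columns x_t, x_{t-1}, ..., x_{t-n_lags}; the first n_lags rows are lost."""
    T = len(x)
    return np.column_stack([x[n_lags - j : T - j] for j in range(n_lags + 1)])

def ols_t_stats(y, X):
    """OLS coefficients and conventional (homoskedastic) t-statistics."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    return beta, beta / se

def naive_lag_search(y, x, r, crit=1.96):
    """Start with r lags; drop the longest lag while it is insignificant."""
    for p in range(r, 0, -1):
        _, t = ols_t_stats(y[p:], lag_matrix(x, p))
        if abs(t[-1]) > crit:
            return p, len(y) - p      # chosen lag, usable observations
    return 0, len(y)

# Simulated data (assumed): X follows an AR(1), Y depends on X up to lag 2.
rng = np.random.default_rng(0)
T = 500
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
y = x + rng.standard_normal(T)
y[1:] += 0.5 * x[:-1]
y[2:] += 0.5 * x[:-2]

p_hat, usable = naive_lag_search(y, x, r=8)
print("chosen lag:", p_hat, "usable observations:", usable)
```

Each step is a separate 5 percent test, so with r well above the true lag length the search has several chances to stop too early on a spurious rejection.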

More Sophisticated Testing
We can be a bit more sophisticated by comparing restricted and unrestricted models
Define $p_{max}$ as some long lag length greater than the expected relevant lag length
In general, we do not test our $p_{max}$, but as before, as $p \to p_{max}$ the sample size decreases
Define the residuals $\epsilon_j = Y_t - (\beta_0 X_t + \beta_1 X_{t-1} + \cdots + \beta_j X_{t-j})$ and let N be the sample size
We therefore could imagine trying to minimize the sum of squared residuals:

$\min_{0 \le j \le p_{max}} \; \log\left( \frac{\epsilon_j' \epsilon_j}{N} \right) + (j + 1) \frac{c(n)}{n}$

Cost Functions
Intuition: c(.) is a penalty for adding additional parameters; thus we try to pick the best specification, using that cost function to penalize inclusion of extra

but irrelevant lags
Akaike (AIC): c(n) = 2
  The AIC does not consistently select the true lag length and will be biased in finite samples
  The bias will tend to overstate the true lag length
Bayesian (BIC): c(n) = log(n)
  The BIC will converge to the true p

Return to Likelihood Ratio Tests
The minimization problem is just a likelihood ratio test
To see this, compare lag length j to lag length k. We can write:
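One way to write out the comparison is sketched here. It assumes k > j, that both models are fit on the same N observations, and that n = N in the penalty; $SSR_j$ is shorthand introduced here for the sum of squared residuals with j lags.

```latex
% Assumptions: k > j, common sample of size N, SSR_j = \epsilon_j' \epsilon_j
\begin{align*}
\text{prefer } k \text{ over } j
&\iff \log\frac{SSR_k}{N} + (k+1)\frac{c(N)}{N}
      < \log\frac{SSR_j}{N} + (j+1)\frac{c(N)}{N} \\
&\iff \underbrace{N \log\frac{SSR_j}{SSR_k}}_{\text{LR statistic}}
      > \underbrace{(k-j)\, c(N)}_{\text{threshold}}
\end{align*}
```

With Gaussian errors, the left side is the likelihood ratio statistic for the restriction that the extra k - j lag coefficients are zero. Per observation, the threshold $(k-j)\,c(N)/N$ declines in N for both c(N) = 2 and c(N) = log(N).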

Define constant
LR test
Constant: declining in N

Next Time
Multivariate Time Series
Testing for Unit Roots
Cointegration
Returning to Causal Effects

Impulse Response Functions
Forecasting
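Looking back at the Cost Functions slide, the penalized criterion log(SSR_j / N) + (j + 1) c(N) / N can be sketched as follows. This is an illustration under stated assumptions, not the lecture's own implementation: the function name `select_lag` is hypothetical, the data are simulated with a true lag length of 2, and every candidate model is fit on the same sample (the first p_max observations are dropped) so that N is fixed across j.

```python
import numpy as np

def select_lag(y, x, p_max, cost):
    """Return the j in 0..p_max minimising log(SSR_j / N) + (j + 1) * cost(N) / N."""
    T = len(y)
    N = T - p_max                  # common sample for every candidate j
    y_s = y[p_max:]
    scores = []
    for j in range(p_max + 1):
        # Regressors x_t, x_{t-1}, ..., x_{t-j} on the common sample
        X = np.column_stack([x[p_max - i : T - i] for i in range(j + 1)])
        beta, *_ = np.linalg.lstsq(X, y_s, rcond=None)
        resid = y_s - X @ beta
        scores.append(np.log(resid @ resid / N) + (j + 1) * cost(N) / N)
    return int(np.argmin(scores)), scores

# Simulated data (assumed): X follows an AR(1), Y depends on X up to lag 2.
rng = np.random.default_rng(1)
T = 500
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
y = x + rng.standard_normal(T)
y[1:] += 0.5 * x[:-1]
y[2:] += 0.5 * x[:-2]

aic_lag, _ = select_lag(y, x, p_max=6, cost=lambda n: 2.0)        # c(n) = 2
bic_lag, _ = select_lag(y, x, p_max=6, cost=lambda n: np.log(n))  # c(n) = log(n)
print("AIC picks", aic_lag, "lags; BIC picks", bic_lag, "lags")
```

Because log(N) exceeds 2 for any N above 7, the BIC penalty is the heavier one here, so its chosen lag can never be longer than the AIC's.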