Forecasting with Bayesian Techniques

Contents

Slide 2

Lecture Objectives

Introduce the idea of and rationale for the Bayesian perspective and Bayesian VARs
Understand the idea of the prior distribution of parameters, the Bayesian update, and the posterior distribution
Become familiar with prior distributions for VAR parameters that allow for an analytical representation of the moments of the posterior distribution of VAR parameters
Understand the idea and implementation of the DSGE-VAR approach
Slide 3

Introduction: Two Perspectives in Econometrics

Let θ be a vector of parameters to be estimated using data
For example, if y_t ~ i.i.d. N(μ, σ²), then θ = [μ, σ²] is to be estimated from a sample {y_t}
Classical perspective:
there is an unknown true value of θ
we obtain a point estimator as a function of the data: θ̂ = θ̂({y_t})
Bayesian perspective:
θ is an unknown random variable, for which we have initial uncertain beliefs: a prior probability distribution
we describe (changing) beliefs about θ in terms of a probability distribution (not as a point estimator!)
Slide 4

Outline

Why a Bayesian Approach to VARs?
Brief Introduction to Bayesian Econometrics
Analytical Examples
Estimating a distribution mean
Linear Regression
Analytical priors and posteriors for BVARs
Prior selection in applications (incl. DSGE-VARs)

This training material is the property of the International Monetary Fund (IMF) and is intended for use in IMF’s Institute for Capacity Development (ICD) courses. Any reuse requires the permission of ICD.

Slide 5

Why a Bayesian Approach to VAR?

Dimensionality problem with VARs:
y_t contains n variables, and the VAR includes p lags
The number of parameters in c and A is n(1+np), and the number of parameters in Σ is n(n+1)/2
Assume n=4, p=4: then we are estimating 78 parameters; with n=8, p=4, we have 300 parameters
A tension: better in-sample fit often comes with worse out-of-sample forecasting performance
Sims (Econometrica, 1980) acknowledged the problem:
“Even with a small system like those here, forecasting, especially over relatively long horizons, would probably benefit substantially from use of Bayesian methods or other mean-square-error shrinking devices…”
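To double-check this parameter arithmetic, a minimal sketch (the function name is ours):

```python
def var_param_count(n: int, p: int) -> int:
    """Free parameters in an n-variable VAR(p): intercepts and lag
    coefficients n*(1 + n*p), plus the n*(n+1)/2 distinct elements
    of the error covariance matrix."""
    return n * (1 + n * p) + n * (n + 1) // 2

print(var_param_count(4, 4))  # 78
print(var_param_count(8, 4))  # 300
```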
Slide 6

Why a Bayesian Approach to VAR? (2)

Usually, only a fraction of the estimated coefficients is statistically significant, so parsimonious modeling should be favored
What could we do?
Estimate a VAR with classical methods and use standard tests to exclude variables (e.g., reduce the number of lags)
Use a Bayesian approach to the VAR, which allows for:
interaction between variables
a flexible specification of the likelihood of such interaction


Slide 7

Combining information: prior and posterior

Bayesian coefficient estimates combine information in the prior with evidence from the data
Bayesian estimation captures changes in beliefs about model parameters
Prior: initial beliefs (e.g., before we saw data)
Posterior: new beliefs = evidence from data + initial beliefs
Slide 8

Shrinkage

There are many approaches to reducing over-parameterization in VARs
A common idea is shrinkage
Incorporating prior information is a way of introducing shrinkage
The prior information can be reduced to a few parameters, i.e. hyperparameters
Slide 9

Forecasting Performance of BVAR vs. alternatives

Source: Litterman, 1986

BVAR provides better forecasts of Real GNP and Inflation
Slide 10

Introduction to Bayesian Econometrics: Objects of Interest

Objects of interest:
Prior distribution: p(θ)
Likelihood function: f(y|θ) – the likelihood of the data at a given value of θ
Joint distribution (of unknown parameters and observables/data): f(y, θ) = f(y|θ) p(θ)
Marginal likelihood: f(y) = ∫ f(y|θ) p(θ) dθ
Posterior distribution: p(θ|y) = f(y|θ) p(θ) / f(y)
i.e., what we have learned about the parameters from (1) the prior and (2) observing the data
Slide 11

Bayesian Econometrics: Objects of Interest (2)

The marginal likelihood…
…is independent of the parameters of the model
Therefore, we can write the posterior as proportional to the product of likelihood and prior:
p(θ|y) ∝ f(y|θ) p(θ)

We combine data & prior to get the posterior

Slide 12

Bayesian Econometrics: maximizing criterion

For practical purposes, it is useful to focus on the (log-posterior) criterion:
C(θ) = log f(y|θ) + log p(θ)
Traditionally, priors that let us obtain analytical expressions for the posterior were needed
Today, with increased computer power, we can use any prior and likelihood distribution, as long as we can evaluate them numerically
Then we can use Markov Chain Monte-Carlo (MCMC) methods to simulate the posterior distribution (not covered in this lecture)
Slide 13

Bayesian Econometrics: maximizing criterion (2)

Maximizing C(θ) gives the Bayes mode.

In some cases (e.g., Normal distributions) this is also the mean and the median
The criterion can be generalized to: C(θ) = log f(y|θ) + λ log p(θ)
λ controls the relative importance of prior information vs. data
Slide 14

Analytical Examples

Let’s work on some analytical examples:
Sample mean
Linear regression model

Slide 15

Estimating a Sample Mean

Let y_t ~ i.i.d. N(μ, σ²); then the data density function is:
f(y|μ) = (2πσ²)^(−T/2) exp( −(1/(2σ²)) Σ_t (y_t − μ)² )
where y = {y₁,…,y_T}
For now: assume the variance σ² is known (certain)
Assume the prior distribution of the mean μ is normal, μ ~ N(m, σ²/ν):
p(μ) ∝ exp( −(ν/(2σ²)) (μ − m)² )
where the key parameters of the prior distribution are m and ν
Slide 16

Estimating a Sample Mean

The posterior of μ: p(μ|y) ∝ f(y|μ) p(μ)
…has the following analytical form:
p(μ|y) ∝ exp( −((ν+T)/(2σ²)) (μ − m*)² )
with m* = (νm + Tȳ)/(ν + T), where ȳ is the sample average
So, we “mix” the prior m and the sample average (data)
Note:
The posterior distribution of μ is also normal: μ|y ~ N(m*, σ²/(ν+T))
Diffuse prior: ν→0 (the prior is not informative; everything comes from the data)
Tight prior: ν→∞ (the data are not important; the prior is rather informative)
Slide 17

Estimating a Sample Mean: Example

Assume the true distribution is Normal: y_t ~ N(3,1)
So, μ=3 is known only to… God
A researcher (one of us) does not know μ
for him/her it is a normally distributed random variable μ ~ N(m, 1/ν)
The researcher initially believes that m=1 and ν=1, so his/her prior is μ ~ N(1,1)
Slide 18

Compute the posterior distribution as sample size increases

Posterior with prior N(1,1)

Already after 10 draws we get closer to μ=3
After 50 and 100:
the mean of the distribution gets closer to 3
the dispersion is smaller
Slide 19

Then, we look at a more informative (tight) prior and set ν=50 (higher precision)

Posterior with Prior N(1,1/50)

The picture is different here
After 10 and 50 draws we are still quite far from μ=3 … although we get closer
Why?
Our prior mean was m=1, but this time the prior is tighter (ν=50 instead of ν=1)
i.e., it is harder to change it based on observed data
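To make the updating concrete, a minimal sketch of this experiment (the seed and the sample sizes are illustrative; the N(3,1) data-generating process and the two priors are those of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior(m, nu, y, sigma2=1.0):
    """Posterior of mu under the prior mu ~ N(m, sigma2/nu) and
    data y_t ~ i.i.d. N(mu, sigma2): returns (m_star, variance)."""
    T = len(y)
    m_star = (nu * m + T * y.mean()) / (nu + T)
    return m_star, sigma2 / (nu + T)

y = rng.normal(3.0, 1.0, size=100)   # true mu = 3, known sigma2 = 1
for T in (10, 50, 100):
    for nu in (1, 50):               # loose vs. tight prior around m = 1
        m_star, v_star = posterior(1.0, nu, y[:T])
        print(f"T={T:3d}, nu={nu:2d}: posterior N({m_star:.2f}, {v_star:.4f})")
```

With ν=1 the posterior mean moves quickly toward 3; with ν=50 it stays pulled toward the prior mean of 1 — exactly the pattern in the two figures.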

Slide 20

Examples: Regression Model I

Linear Regression model: y_t = x_t′β + u_t
where u_t ~ i.i.d. N(0, σ²)
Assume:
β is random and unknown
but σ² is fixed and known
Convenient matrix representation: y = Xβ + u
where y and u stack the T observations of y_t and u_t, and X stacks the rows x_t′
The density function for the data is:
f(y|β) = (2πσ²)^(−T/2) exp( −(1/(2σ²)) (y − Xβ)′(y − Xβ) )
Slide 21

Examples: Regression Model I (2)

Assume that the prior distribution of β is multivariate Normal, β ~ N(m, σ²M):
p(β) ∝ exp( −(1/(2σ²)) (β − m)′M⁻¹(β − m) )
where the key parameters of the prior distribution are m and M
Bayes' rule states: p(β|y) ∝ f(y|β) p(β)
i.e., the posterior of β is proportional to the product of the data density and the prior


Slide 22

Examples: Regression Model I (3)

We mix information – the densities of the data and the prior – to get the posterior distribution!
Result: the density function of β is
p(β|y) ∝ exp( −(1/(2σ²)) (β − m*)′(M*)⁻¹(β − m*) )
… which means that the posterior distribution is again (!) normal: β|y ~ N(m*, σ²M*)
with mean m* = (M⁻¹ + X′X)⁻¹(M⁻¹m + X′y) and variance σ²M* = σ²(M⁻¹ + X′X)⁻¹
Slide 23

Examples: Regression Model I (4)

Since we do not like black boxes… there are 2 ways to get m* and M* (the 2 parameters that characterize the posterior)
The long way: manipulate the product of density functions (see Hamilton's textbook, p. 367)
The smart way: use a GLS regression…
We have 2 ingredients:
the prior distribution β ~ N(m, σ²M), which implies m = β + υ_β, υ_β ~ N(0, σ²M)
and our regression model y = Xβ + u, which “catches” the impact of the data on the estimate of β

Slide 24

Regression Model: Posterior Distribution

Define a “new” regression model: stack the prior, m = β + υ_β, on top of the data, y = Xβ + u:
[m; y] = [I; X]β + [υ_β; u], with error covariance σ²·diag(M, I_T)
We simply stack our “ingredients” together to mix the information (prior and data), so that now β takes both into account!
The GLS estimator of β
m* = (M⁻¹ + X′X)⁻¹ (M⁻¹m + X′y)
…is exactly our posterior mean
And the posterior variance of β is σ²M* = σ²(M⁻¹ + X′X)⁻¹
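A minimal numerical sketch of this stacked-GLS shortcut (the simulated data, prior values, and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: y = X beta + u, with known sigma2
T, k, sigma2 = 50, 2, 1.0
X = rng.normal(size=(T, k))
beta_true = np.array([0.5, -1.0])
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=T)

# Prior: beta ~ N(m, sigma2 * M)
m = np.zeros(k)
M = np.eye(k)

# Posterior via the GLS formulas on this slide
Minv = np.linalg.inv(M)
M_star = np.linalg.inv(Minv + X.T @ X)
m_star = M_star @ (Minv @ m + X.T @ y)

print("posterior mean:", m_star)           # shrunk toward m
print("posterior var :", sigma2 * M_star)
```

Running GLS on the stacked system produces the same numbers, which is why the “smart” route works.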

Slide 25

Examples: Regression Model II

So far life was easy(-ier): in the linear regression model
β was random and unknown, but σ² was fixed and known
What if σ² is random and unknown?..
Bayes' rule states: p(β, σ²|y) ∝ f(y|β, σ²) p(β|σ²) p(σ²)
i.e., the posterior of β and σ² is proportional to the product of the density of the data, the prior of β (given σ²), and the prior of σ²
Slide 26

Examples: Regression Model II (2)

To manipulate the product f(y|β, σ²) p(β|σ²) p(σ²)
…we assume the following distributions:
Normal for the data
Normal for the prior for β (conditional on σ²): β|σ² ~ N(m, σ²M)
and Inverse-Gamma for the prior for σ²: σ² ~ IG(λ, l)
Note: the inverse-gamma is handy! It guarantees that random draws satisfy σ² > 0!
Slide 27

Examples: Regression Model II (3)

By manipulating the product (see more details in Appendix B)
…we get the following result:
the mean and variance of the posterior for β: β|σ², y ~ N(m*, σ²M*)
and the parameters of the posterior for σ²: σ²|y ~ IG(λ*, l*)

Posterior normal density of β

Posterior inverse-gamma density of σ²
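For reference, a sketch of these posterior parameters in the usual conjugate-update form (written under one common IG(λ, l) convention; the slides' exact convention for λ and l may differ, e.g., by a factor of 2):

```latex
\begin{aligned}
M^{*} &= \left(M^{-1} + X'X\right)^{-1}, \qquad
m^{*} = M^{*}\left(M^{-1}m + X'y\right), \\
\lambda^{*} &= \lambda + \tfrac{T}{2}, \qquad
l^{*} = l + \tfrac{1}{2}\left(y'y + m'M^{-1}m - m^{*\prime}(M^{*})^{-1}m^{*}\right).
\end{aligned}
```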

Slide 28

Priors: summary

In the above examples we dealt with 2 types of prior distributions for our parameters:
Case 1 prior
assumes β is unknown and normally distributed (Gaussian)
σ² is a known parameter
the assumption of Gaussian errors delivers a normal posterior distribution for β
Case 2 (conjugate) priors
assume β and σ² are both unknown
β and σ² have Normal and Inverse-Gamma prior distributions, respectively
with Gaussian errors, this delivers posterior distributions for β and σ² of the same family as the priors
Slide 29

Bayesian VARs

Linear Regression examples will help us to deal with our main object – Bayesian VARs
A VAR is typically written as
y_t = c + A₁y_{t−1} + … + A_p y_{t−p} + e_t, e_t ~ i.i.d. N(0, Σ_e)
where y_t contains n variables, the VAR includes p lags, and the data sample size is T
We have seen that it is convenient to work with a matrix representation for a regression
Can we get it for our VAR? Yes!
…and it will help to get posteriors for our parameters
Slide 30

VAR in a matrix form: example

Consider, as an example, a VAR for n variables and p=2
Stack the variables and coefficients: x_t = [1, y′_{t−1}, y′_{t−2}]′, A = [c, A₁, A₂]′, and let Y and X collect the rows y′_t and x′_t
Then, the VAR is Y = XA + E
Let α = vec(A), y = vec(Y), e = vec(E) and rewrite: y = (I_n ⊗ X)α + e, e ~ N(0, Σ_e ⊗ I_T)
where ⊗ is the Kronecker product
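A minimal sketch of building these matrices from a data array (the simulated data are purely illustrative):

```python
import numpy as np

def var_design(data: np.ndarray, p: int):
    """Given a (T_total x n) data array, return (Y, X) with
    Y the (T x n) stacked y_t' rows and X the (T x (1+n*p)) stacked
    [1, y_{t-1}', ..., y_{t-p}'] rows, where T = T_total - p."""
    T_total, n = data.shape
    Y = data[p:]
    lags = [np.ones((T_total - p, 1))]          # intercept column
    for k in range(1, p + 1):
        lags.append(data[p - k:T_total - k])    # k-th lag block
    X = np.hstack(lags)
    return Y, X

data = np.random.default_rng(2).normal(size=(100, 3))
Y, X = var_design(data, p=2)
print(Y.shape, X.shape)  # (98, 3) (98, 7)
```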
Slide 31

How to Estimate a BVAR: Case 1 Prior

Consider Case 1 prior for a VAR:
the coefficients in A are unknown, with a multivariate Normal prior distribution: α ~ N(α₀, Σ_α)
and Σ_e is known
“Old trick” to get the posterior: use the GLS estimator (Appendix C for details)
Result: α|y ~ N(α*, Σ_α*)
So the posterior distribution is multivariate normal
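For reference, a sketch of the resulting GLS moments in the vectorized notation of slide 30 (this mirrors the univariate stacked-GLS formulas; the prior notation α₀, Σ_α is as written above):

```latex
\begin{aligned}
\Sigma_{\alpha}^{*} &= \left(\Sigma_{\alpha}^{-1} + \Sigma_e^{-1} \otimes X'X\right)^{-1},\\
\alpha^{*} &= \Sigma_{\alpha}^{*}\left(\Sigma_{\alpha}^{-1}\alpha_0 + \left(\Sigma_e^{-1} \otimes X'\right)y\right).
\end{aligned}
```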
Slide 32

How to Estimate a BVAR: Case 2 (conjugate) Priors

Before we see the case of an unknown Σ_e, we need to introduce a multivariate distribution to characterize the unknown random error covariance matrix Σ_e
Consider an l×n matrix Z
Each row is a draw from N(0, S)
The n×n matrix (Z′Z)⁻¹
has an Inverse Wishart distribution with l degrees of freedom: Σ_e ~ IW_n(S, l)
If Σ_e ~ IW_n(S, l), then Σ_e⁻¹ follows a Wishart distribution: Σ_e⁻¹ ~ W_n(S⁻¹, l)
The Wishart distribution might be more convenient:
Σ_e⁻¹ is a measure of precision (since Σ_e is a measure of dispersion)


Slide 33

How to Estimate a BVAR: Conjugate Priors

Assume Conjugate priors:
The VAR parameters A and Σ_e are both unknown
the prior for A is multivariate Normal (conditional on Σ_e): α|Σ_e ~ N(α₀, Σ_e ⊗ M)
and the prior for Σ_e is Inverse Wishart: Σ_e ~ IW_n(S, l)
Follow the analogy with the univariate regression examples to write down the moments of the posterior distributions
Recall the matrix representation of our VAR: Y = XA + E
The posterior for A is multivariate normal: α|Σ_e, y ~ N(α*, Σ_e ⊗ M*), with M* = (M⁻¹ + X′X)⁻¹
The posterior for Σ_e is Inverse Wishart: Σ_e|y ~ IW_n(S*, l + T)
See Appendix D for details
Slide 34

BVARs: Minnesota Prior Implementation

The Minnesota prior – a particular case of the “Case 1 prior” (unknown model coefficients, but known error variance):
Assume a random walk is a reasonable model for every y_it in the VAR
Hence, for every y_it:
the coefficient on the first own lag y_{i,t−1} has a prior mean of 1
the coefficients on all other lags y_{i,t−k}, y_{j,t−1}, y_{j,t−k} have a prior mean of 0
So, our prior for the coefficients of the VAR(2) example would be: E[A₁] = I_n, E[A₂] = 0, E[c] = 0
Slide 35

BVARs: Minnesota Prior Implementation

The Minnesota prior
The prior variance for the coefficient on lag k of variable j in equation i is:
V_ijk = (γ/k^q)² if i = j, and V_ijk = (wγ/k^q)² (σ_i²/σ_j²) if i ≠ j
… and depends only on three hyperparameters:
the tightness parameter γ (typically the same in all equations)
the relative weight parameter w: 1 for own lags and <1 for other variables
the parameter q, which governs the tightness of the prior depending on the lag (often set to 1)
σ_i²/σ_j² is a “scale correction”:
the ratio of residual variances of OLS-estimated AR models for variables i and j
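A minimal sketch of building the Minnesota prior moments for the VAR(2) example (the hyperparameter values, the AR-residual variances sigma2, and the flat prior on the intercepts are illustrative assumptions):

```python
import numpy as np

def minnesota_prior(n, p, sigma2, gamma=0.2, w=0.5, q=1.0):
    """Prior mean (K x n) and prior variances (K x n) for A = [c, A1..Ap]',
    K = 1 + n*p; column i holds the coefficients of equation i."""
    K = 1 + n * p
    mean = np.zeros((K, n))
    var = np.zeros((K, n))
    var[0, :] = 1e6                   # essentially flat prior on intercepts (assumption)
    for i in range(n):                # equation i
        mean[1 + i, i] = 1.0          # own first lag centered at 1 (random walk)
        for k in range(1, p + 1):     # lag k
            for j in range(n):        # variable j
                row = 1 + (k - 1) * n + j
                if i == j:
                    var[row, i] = (gamma / k**q) ** 2
                else:
                    var[row, i] = (w * gamma / k**q) ** 2 * sigma2[i] / sigma2[j]
    return mean, var

mean, var = minnesota_prior(n=3, p=2, sigma2=np.array([1.0, 0.5, 2.0]))
print(mean.round(2)); print(var.round(4))
```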

Slide 36

BVARs: Minnesota Prior Implementation

The Minnesota prior
Interpretation:
the prior on the first own lag is N(1, γ²)
the prior on own lag k is N(0, (γ/k^q)²):
the prior std. dev. declines at rate k, i.e., coefficients on longer lags are more likely to be close to 0
the prior on the first lag of another variable is N(0, (wγ)² σ_i²/σ_j²):
the prior std. dev. is reduced by a factor w, i.e., it is more likely that the first lags of other variables are irrelevant
the prior std. dev. on other variables' longer lags also declines at rate k

Slide 37

BVARs: Minnesota Prior Implementation

Slide 38

BVARs: Prior Selection

Minnesota and conjugate priors are useful (e.g., to obtain closed-form solutions), but can be too restrictive:
independence across equations
symmetry in the prior can sometimes be a problem
Increased computer power allows us to simulate more general prior distributions using numerical methods
Three examples:
DSGE-VAR approach: Del Negro and Schorfheide (IER, 2004)
Explore different prior distributions and hyperparameters: Kadiyala and Karlsson (1997)
Choosing the hyperparameters to maximize the marginal likelihood: Giannone, Lenza and Primiceri (2011)
Slide 39

Del Negro and Schorfheide (2004): DSGE-VAR Approach

We want to estimate a BVAR model
We also have a DSGE model for the same variables
It can be solved and linearized: approximated with a reduced-form (RF) VAR
Then, we can use coefficients from the DSGE-based VAR as prior means to estimate the BVAR
Several advantages:
DSGE-VAR may improve forecasts by restricting parameter values
At the same time, it can improve the empirical performance of the DSGE by relaxing its restrictions
Our priors (from DSGE) are based on deep structural parameters consistent with economic theory
Slide 40

Del Negro and Schorfheide (2004)

We estimate the following BVAR: Y = XA + E
The solution of the DSGE model has a reduced-form VAR representation: Y* = X*A(θ) + E*
where θ are the deep structural parameters
Idea:
combine the T actual observations (Y, X) with “artificial” observations to get the posterior distribution
T* = λT “artificial” observations (Y*, X*) are generated from the DSGE model


Slide 41

Del Negro and Schorfheide (2004)

The parameter λ is a “weight” on the “artificial” (prior) data from the DSGE
λ=0 delivers the OLS-estimated VAR: i.e., the DSGE is not important
Large λ shrinks the coefficients towards the DSGE solution: i.e., the data are not important
To find an “optimal” λ, the marginal likelihood is maximized (Appendix E)
We can implement the procedure analytically… let's see
Slide 42

Likelihood of the VAR of a DSGE Model

Recall the likelihood function for an unconstrained VAR:
f(Y|A, Σ_e) ∝ |Σ_e|^(−T/2) exp( −½ tr[Σ_e⁻¹ (Y − XA)′(Y − XA)] )
Similarly, the (quasi-) likelihood for the “artificial” data is
f(Y*|A, Σ_e) ∝ |Σ_e|^(−λT/2) exp( −½ tr[Σ_e⁻¹ (Y* − X*A)′(Y* − X*A)] )
which serves as a prior density for the BVAR parameters
Rewrite the likelihood for the “artificial” data (open the brackets): the exponent then involves only the sample moments Y*′Y*, X*′Y*, and X*′X*

Slide 43

Likelihood of the VAR of a DSGE Model

Next step: we simulate s→∞ artificial observations (Y*, X*) from the DSGE
…and replace sample moments like (1/λT) X*′X* with population moments consistent with the DSGE model, e.g., Γ*_xx(θ) = E_θ[x_t x_t′]
The prior density is then
p(A, Σ_e|θ) = c(θ)⁻¹ |Σ_e|^(−λT/2) exp( −(λT/2) tr[Σ_e⁻¹ (Γ*_yy(θ) − A′Γ*_xy(θ) − Γ*_yx(θ)A + A′Γ*_xx(θ)A)] )
where c(θ) is chosen to ensure that the probability distribution integrates to one (a proper prior)

Slide 44


DSGE-VAR prior

Conditional on the parameters θ, the DSGE model provides conjugate priors for the BVAR
For the conjugate priors we can obtain posteriors for A and Σ_e (conditional on θ) of the same distribution family

Slide 45


DSGE-VAR posterior

Posterior, conditional on θ: A and Σ_e have Normal and Inverse-Wishart posteriors, whose moments mix prior information, weighted by λT, with information from the data (see the sketch below)
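For reference, a sketch of the standard posterior-mean expressions in Del Negro and Schorfheide (2004), using the Γ* notation from the likelihood slides:

```latex
\begin{aligned}
\tilde{A}(\theta) &= \left(\lambda T\,\Gamma^{*}_{xx}(\theta) + X'X\right)^{-1}
                    \left(\lambda T\,\Gamma^{*}_{xy}(\theta) + X'Y\right),\\
\tilde{\Sigma}_e(\theta) &= \frac{1}{(\lambda+1)T}
  \left[\left(\lambda T\,\Gamma^{*}_{yy}(\theta) + Y'Y\right)
   - \left(\lambda T\,\Gamma^{*}_{yx}(\theta) + Y'X\right)\tilde{A}(\theta)\right].
\end{aligned}
```

The λT-weighted terms carry the prior (DSGE) information; the X′X, X′Y, and Y′Y terms carry the information from the data.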
Slide 46

Results

BVARs (under different λ's) have an advantage in forecasting performance (RMSE) vis-à-vis the unrestricted VAR
The “optimal” λ is about 0.6. It also delivers the best ex-post forecasting performance at the 1-quarter horizon

Slide 47

Results

The BVAR with the DSGE prior under the “optimal” λ has better forecasting performance than:
the unrestricted VAR, for all variables
the BVAR with the Minnesota prior (except the FF rate at the shorter forecasting horizon)

Slide 48

Kadiyala and Karlsson (1997)

Small Model: a bivariate VAR with unemployment and industrial production
Sample period: 1964:1 to 1990:4
Estimate the model through 1978:4
Criterion to choose hyperparameters: forecasting performance over 1979:1-1982:3
Use the remaining sub-sample 1982:4-1990:4 for forecasting
Large “Litterman” Model: a VAR with 7 variables (real GNP, inflation, unemployment, money, investment, interest rate and inventories)
Sample period: 1948:1 to 1986:4
Estimate the model through 1980:1
Use the remaining sub-sample 1980:2-1986:4 for forecasting
Slide 49

Kadiyala and Karlsson (1997)

Compare different priors based on the VAR forecasting performance (RMSE)
A standard VAR(p)…
… can be rewritten (see slide 29) as Y = XA + E
… and, in vectorized form, y = (I_n ⊗ X)α + e
where α = vec(A) and e ~ N(0, Σ_e ⊗ I_T)
Slide 50

Prior distributions in K&K

K&K use a number of competing prior distributions…
Minnesota, Normal-Wishart, Normal-Diffuse, Extended Natural Conjugate (see Appendix E)
… for α and Σ_e
Parameters of the prior distribution for α:
each y_it is a random walk (just as in the Minnesota priors above)
The variance of each coefficient depends on two hyper-parameters, w and γ
Slide 51

Prior distributions in K&K

In the Small Model:
For prior distributions, the hyper-parameters π₁ = γ, π₂ = wγ are selected based on the forecast RMSEs over 1979:1-1982:3
(π1,π2) are fixed at the selected values and used in the forecasting exercise over 1982:4-1990:4
Slide 52

Forecast Comparison in K&K: Small Model, unemployment

Forecasting performance is markedly different across priors
Normal-Wishart, Diffuse and OLS do well (their RMSEs are about half those of the other priors)
Slide 53

Forecast Comparison in K&K: Large Model

OLS and Diffuse priors produce the worst forecasts in all cases
Normal-Wishart, Normal-Diffuse and Minnesota do better (RMSEs are substantially lower)
In the Large Model, the hyper-parameters are fixed as in Litterman (1986)
Lessons:
It does make sense to move from an OLS-estimated (over-parametrized) VAR to a BVAR in the “Larger” model
Some prior distributions may lead to dominant forecasting performance

Slide 54

Giannone, Lenza and Primiceri (2011)

Use three VARs to compare forecasting performance
Small VAR: GDP, GDP deflator, Federal Funds rate for the U.S.
Medium VAR: the small VAR plus consumption, investment, hours worked and wages
Large VAR: expands the medium VAR with up to 22 variables
The prior distributions of the VAR parameters θ = {α, Σ_α, Σ_e} depend on a small number of hyperparameters
The hyperparameters are themselves uncertain and follow either gamma or inverse-gamma distributions
This is in contrast to the Minnesota priors, where the hyperparameters are fixed!
Slide 55

Giannone, Lenza and Primiceri (2011)

The marginal likelihood is obtained by integrating out the parameters of the model:
p(y) = ∫ p(y|θ) p(θ) dθ
But the prior distribution of θ is itself a function of the hyperparameters of the model, i.e., p(θ) = p(θ|γ)
Slide 56

Giannone, Lenza and Primiceri (2011)

We interpret the model as a hierarchical model by replacing p_γ(θ) = p(θ|γ) and evaluate the marginal likelihood:
p(y|γ) = ∫ p(y|θ) p(θ|γ) dθ
The hyperparameters γ are uncertain
The informativeness of their prior distribution is chosen by maximizing the posterior distribution of γ
Maximizing the posterior of γ corresponds to maximizing the one-step-ahead forecasting accuracy of the model
Slide 57

Giannone, Lenza and Primiceri (2011)

Slide 58

Giannone, Lenza and Primiceri (2011)

In all cases BVARs demonstrate better forecasting performance vis-à-vis the unrestricted VARs
BVARs are roughly on par with factor models, which are known to be good forecasting devices

Slide 59

Conclusions

BVARs are a useful tool to improve forecasts
This is not a “black box”:
posterior distribution parameters are typically functions of prior parameters and the data
The choice of priors can range:
from a simple Minnesota prior (convenient for analytical results)
…to a full-fledged DSGE model that incorporates theory-consistent structural parameters
The choice of hyperparameters for the prior depends on the nature of the time series we want to forecast
No “one size fits all” approach
Slide 60

Thank You!

Slide 61

Appendix A: Remarks about the marginal likelihood

Remarks about the marginal likelihood:
If we have M₁,…,M_N competing models, the marginal likelihood of model M_j, f({y_t}|M_j), can be seen as:
the update on the weight of model M_j after observing the data
the out-of-sample prediction record of model j
Model comparison between two models is performed with the posterior odds ratio:
P(M_i|{y_t}) / P(M_j|{y_t}) = [f({y_t}|M_i) P(M_i)] / [f({y_t}|M_j) P(M_j)]
It favors parsimonious modeling: a built-in “Occam's Razor”
Slide 62

Appendix A: Remarks about the marginal likelihood

Remarks about the marginal likelihood:
Predict the first observation using the prior: f(y_1) = ∫ f(y_1|θ) p(θ) dθ
Record the first observable and its probability, f(y_1^o). Update your beliefs: p(θ|y_1^o)
Predict the second observation: f(y_2|y_1^o) = ∫ f(y_2|θ, y_1^o) p(θ|y_1^o) dθ
Record f(y_2^o|y_1^o)
Eventually, you get f({y^o}) = f(y_1^o) f(y_2^o|y_1^o) … f(y_T^o|y_1^o, y_2^o,…, y_{T−1}^o)
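A minimal sketch of this sequential computation for the known-variance mean model of slides 15-16 (the data, prior values, and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(3.0, 1.0, size=20)   # observed data
m, nu, sigma2 = 1.0, 1.0, 1.0       # prior mu ~ N(m, sigma2/nu)

log_ml = 0.0
for yt in y:
    # One-step-ahead predictive: y_t | past ~ N(m, sigma2 + sigma2/nu)
    s2_pred = sigma2 * (1 + 1 / nu)
    log_ml += -0.5 * np.log(2 * np.pi * s2_pred) - (yt - m) ** 2 / (2 * s2_pred)
    # Bayesian update of the beliefs about mu after seeing y_t
    m = (nu * m + yt) / (nu + 1)
    nu += 1

print("log marginal likelihood:", log_ml)
```

Each pass through the loop records one factor f(y_t^o|y_1^o,…) and then updates the beliefs, exactly as in the recursion above.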
Slide 63

Appendix B: Linear Regression with conjugate priors

To calculate the posterior distribution for the parameters
…we assume the following distributions:
Normal for the data
Normal for the prior for β (conditional on σ²): β|σ² ~ N(m, σ²M)
and Inverse-Gamma for the prior for σ²: σ² ~ IG(λ, k)
Next consider the product f(y|β, σ²) p(β|σ²) p(σ²)
Slide 64

Appendix B: Linear Regression with conjugate priors

Rearranging the expressions under the exponents we have the following:
(y − Xβ)′(y − Xβ) + (β − m)′M⁻¹(β − m) = (β − m*)′(M*)⁻¹(β − m*) + Q
where M* = (M⁻¹ + X′X)⁻¹ and m* = M*(M⁻¹m + X′y)
Further denote Q = y′y + m′M⁻¹m − m*′(M*)⁻¹m*
… and rewrite the product in terms of (β − m*) and Q

Slide 65

Appendix B: Linear Regression with conjugate priors

Therefore we have a Normal posterior distribution for β:
β|σ², y ~ N(m*, σ²M*)
And an Inverse-Gamma posterior for the error variance:
σ²|y ~ IG(λ*, k*)

Slide 66

Appendix C: How to Estimate a BVAR, Case 1 prior

Use the GLS estimator for the stacked regression that combines the prior α ~ N(α₀, Σ_α) with the data y = (I_n ⊗ X)α + e
Continue (next slide)
Slide 67

Appendix C: How to Estimate a BVAR, Case 1 Prior

Continue
So, the moments of the posterior distribution are:
Σ_α* = (Σ_α⁻¹ + Σ_e⁻¹ ⊗ X′X)⁻¹, α* = Σ_α*(Σ_α⁻¹α₀ + (Σ_e⁻¹ ⊗ X′)y)
The posterior distribution is then multivariate normal: α|y ~ N(α*, Σ_α*)
Slide 68

Appendix D: How to Estimate a BVAR: Conjugate Priors

Note that in the case of the Conjugate priors we rely on the VAR representation Y = XA + E
… while in the Minnesota priors case we employed y = (I_n ⊗ X)α + e
Though, if we have priors for the vectorized coefficients in the form α ~ N(α₀, Σ_α)
we can also get priors for the coefficients in the matrix form
For the mean we simply need to convert α₀ back to the matrix form A₀
The variance matrix for A can be obtained from the variance for α (e.g., when Σ_α has the Kronecker form Σ_e ⊗ M)
Slide 69

Appendix E: Prior and Posterior distributions in Kadiyala and Karlsson (1997)

Slide 70

Appendix E: Posterior distributions of forecasts for unemployment and industrial production in K&K (1997), h=4, T₀=1985:4
Slide 71

Appendix E: Posterior distribution of the unemployment rate forecast in K&K (1997)