Article

Approximated Information Analysis in Bayesian Inference

1 Department of Statistics, Yeungnam University, Gyeongsan 712-749, Korea
2 Department of Statistics, Kyungpook National University, Daegu 702-701, Korea
* Author to whom correspondence should be addressed.
Entropy 2015, 17(3), 1441-1451; https://0-doi-org.brum.beds.ac.uk/10.3390/e17031441
Submission received: 30 December 2014 / Revised: 15 March 2015 / Accepted: 19 March 2015 / Published: 20 March 2015
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

In models with nuisance parameters, Bayesian procedures based on Markov Chain Monte Carlo (MCMC) methods have been developed to approximate the posterior distribution of the parameter of interest. Because these procedures require burdensome computations related to the use of MCMC, approximation and convergence in these procedures are important issues. In this paper, we explore Gibbs sensitivity by using an alternative to the full conditional distribution of the nuisance parameter. The approximate sensitivity of the posterior distribution of interest is studied in terms of an information measure, including Kullback–Leibler divergence. As an illustration, we then apply these results to simple spatial model settings.

1. Introduction

Let d denote the data, which can be scalar- or vector-valued, and suppose that d ~ p(d|θ, β), where θ ∈ Θ is the parameter of interest and β ∈ B is a nuisance parameter. Realizations from the joint posterior distribution π(θ, β|d) can be produced by independent sampling based on π(θ, β|d) = π(θ|β, d) × π(β|d) or, if π(β|d) is intractable, by Gibbs sampling based on the full conditional distributions π(θ|β, d) and π(β|θ, d). Since β is a nuisance parameter, our primary interest is in the marginal posterior distribution π(θ|d) = ∫ π(θ, β|d)dβ. In general, it may often be feasible to integrate out some nuisance parameters either analytically or numerically. Missing data problems are brought into this framework by augmenting the observed data d with latent data β.
The latent variable/nuisance parameter scenario is commonly studied in the literature. One issue in nuisance parameter problems is the relationship between π(β|d) and π(θ|d). Our main interest lies in the sensitivity of inferences based on the target marginal posterior distribution π(θ|d) when π*(β|d), an alternative to the posterior distribution of the nuisance parameter, is used in place of π(β|d). The point is that π*(β|d) is a flexible and manageable approximation to an unmanageable π(β|d). For application to simple spatial model settings, we can consider the Gaussian approximation or the Laplace approximation to π(β|d). For example, we can use
$$\pi(\beta \mid d) \propto \left. \frac{\pi(\theta, \beta \mid d)}{\tilde{\pi}_G(\theta \mid \beta, d)} \right|_{\theta = \hat{\theta}(\beta)},$$
where π̃_G(θ|β, d) is the Gaussian approximation to π(θ|β, d). Under “standard conditions,” the Laplace approximation of a marginal posterior density has an error rate of O(n^{−1}) [1].
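To make this setting concrete, the following Python sketch illustrates a Gibbs-type sampler in which the exact full conditional π(θ|β, d) is used for the parameter of interest while the nuisance parameter is updated from a Gaussian (Laplace-type) approximation to π(β|θ, d). The additive normal model, priors, and numerical settings are illustrative assumptions only, not the models analyzed in this paper.

```python
# A minimal sketch (not the authors' code) of the setting above: Gibbs-type
# sampling that alternates the exact full conditional pi(theta | beta, d)
# with a Gaussian (Laplace-type) approximation pi*(beta | theta, d) built
# from the conditional mode and curvature.  The additive normal model and
# N(0, 1) priors below are toy assumptions used to keep the code self-contained.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
d = rng.normal(1.0, 1.0, size=50)            # toy data, d_i ~ N(theta + beta, 1)

def log_joint(theta, beta):
    """log of pi(d | theta, beta) pi(theta) pi(beta) for the toy model."""
    return (-0.5 * np.sum((d - theta - beta) ** 2)
            - 0.5 * theta ** 2 - 0.5 * beta ** 2)

def sample_theta(beta):
    # Exact full conditional pi(theta | beta, d): Gaussian in this toy model.
    prec = len(d) + 1.0
    mean = np.sum(d - beta) / prec
    return rng.normal(mean, 1.0 / np.sqrt(prec))

def sample_beta_approx(theta):
    # Approximate full conditional pi*(beta | theta, d): Gaussian centred at the
    # conditional mode, with variance from a finite-difference curvature, so the
    # same recipe applies when the exact conditional is intractable.
    obj = lambda b: -log_joint(theta, b)
    mode = minimize_scalar(obj).x
    h = 1e-4
    curv = (obj(mode + h) - 2 * obj(mode) + obj(mode - h)) / h ** 2
    return rng.normal(mode, 1.0 / np.sqrt(curv))

theta, beta, draws = 0.0, 0.0, []
for _ in range(2000):
    theta = sample_theta(beta)
    beta = sample_beta_approx(theta)
    draws.append(theta)
print("posterior mean of theta under the approximate Gibbs sampler:",
      np.mean(draws[500:]))
```

Comparing the draws of θ obtained in this way with those from an exact sampler is precisely the kind of sensitivity question studied in the remainder of the paper.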
State-of-the-art Markov Chain Monte Carlo (MCMC) approaches to posterior inference typically revolve around reparameterizations. Yu and Meng introduce an alternative strategy for boosting MCMC efficiency by simply interweaving—but not alternating—two parameterizations, namely the centered parameterization and the non-centered parameterization, to ensure effective MCMC implementation [2]. Filippone and Girolami present a pseudo-marginal MCMC approach to account for uncertainty in the model parameters when making model-based predictions on out-of-sample data [3]. Attias presents the variational Bayes framework, which provides a solution for the structure of models with latent variables [4]. Here, Kullback–Leibler divergence is minimized between the posterior and a typically exponential-family approximation [4]. Expectation propagation is similar in nature [5]. The integrated nested Laplace approximation has been considered for approximate Bayesian inference in latent Gaussian models [6].
In this paper, we explore Gibbs sensitivity by using an alternative to the full conditional distribution of the nuisance parameter. The approximate sensitivity of the posterior distribution of interest is studied in terms of an information measure, including Kullback–Leibler divergence. As an illustration, we apply the proposed approach to the PRUDENCE (Prediction of Regional scenarios and Uncertainties for Defining EuropeaN Climate change risks and Effects; http://prudence.dmi.dk/) ensemble of regional climate models over Central Europe (about 8000 grid points), which involves the analysis of large quantities of data. Furrer and Sain combine two techniques, namely tapering and backfitting, to model and analyze these spatial datasets [7]. Kim and Kim propose an approximate likelihood function of the spatial correlation parameter based on PRUDENCE data [8]. This paper thus provides some information analysis for their approaches.
The rest of the paper is organized as follows. In Section 2, we describe the approximation setting in the Bayesian computation and discuss some of its sensitivity issues. Various theoretical results on the information analysis are provided in the subsequent section. As an illustration, the results from Section 3 are applied to simple spatial model settings in Section 4, and Section 5 concludes the study.

2. Sensitivity Issues

Let π(θ|β, d) and π(β|θ, d) be the full conditional distributions for the joint posterior distribution π(θ, β|d), which can be expressed in terms of these full conditional distributions. Consistency conditions on the full conditional distributions are required to reconstruct the joint posterior distribution (e.g., see the Hammersley–Clifford theorem in [9]). Let π*(β|θ, d) be an approximated full conditional distribution of π(β|θ, d). Under the regularity condition with reference points β0 and θ0, the joint posterior distributions can be written in the form
$$\pi(\theta, \beta \mid d) \propto \frac{\pi(\beta \mid \theta = \theta_0, d)}{\pi(\beta = \beta_0 \mid \theta = \theta_0, d)}\, \frac{\pi(\theta \mid \beta, d)}{\pi(\theta = \theta_0 \mid \beta, d)} \propto g(\beta)\, \pi(\theta \mid \beta, d)$$
and
$$\pi^*(\theta, \beta \mid d) \propto \frac{\pi^*(\beta \mid \theta = \theta_0, d)}{\pi^*(\beta = \beta_0 \mid \theta = \theta_0, d)}\, \frac{\pi(\theta \mid \beta, d)}{\pi(\theta = \theta_0 \mid \beta, d)} \propto g^*(\beta)\, \pi(\theta \mid \beta, d),$$
where
$$g(\beta) = \frac{\pi(\beta \mid \theta = \theta_0, d)}{\pi(\theta = \theta_0 \mid \beta, d)} \quad\text{and}\quad g^*(\beta) = \frac{\pi^*(\beta \mid \theta = \theta_0, d)}{\pi(\theta = \theta_0 \mid \beta, d)}.$$
Therefore,
$$\pi(\beta \mid d) = C\, g(\beta) \quad\text{and}\quad \pi^*(\beta \mid d) = C^*\, g^*(\beta),$$
where C and C* are normalizing constants, that is,
$$C = \left[ \int g(\beta)\, d\beta \right]^{-1} \quad\text{and}\quad C^* = \left[ \int g^*(\beta)\, d\beta \right]^{-1}.$$
Note that C and C* can be expressed as
$$C = \frac{\pi(\beta = \beta_0, \theta = \theta_0 \mid d)}{\pi(\beta = \beta_0 \mid \theta = \theta_0, d)} \quad\text{and}\quad C^* = \frac{\pi^*(\beta = \beta_0, \theta = \theta_0 \mid d)}{\pi^*(\beta = \beta_0 \mid \theta = \theta_0, d)},$$
where π(β = β0, θ = θ0|d) and π*(β = β0, θ = θ0|d) can be obtained from a Monte Carlo computation.
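As a rough illustration of this Monte Carlo step, C can be estimated by evaluating a density estimate of the joint posterior at the reference point and dividing by the full conditional evaluated there. The bivariate normal "posterior" and its closed-form conditional below are placeholders chosen only to keep the sketch self-contained; they are not the models used later in the paper.

```python
# A hedged sketch of the Monte Carlo step above: estimate the joint posterior
# density at the reference point (beta0, theta0) with a kernel density estimate
# over posterior draws, and divide by the full conditional evaluated there to
# obtain C.  The correlated bivariate normal "posterior" and its closed-form
# conditional are placeholders, not the paper's model.
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(1)
cov = [[1.0, 0.6], [0.6, 1.0]]
draws = rng.multivariate_normal([0.0, 0.0], cov, size=5000)   # columns: theta, beta

theta0, beta0 = 0.0, 0.0                                      # reference point
joint_at_ref = gaussian_kde(draws.T)([theta0, beta0])[0]      # pi(beta0, theta0 | d)

# For this placeholder posterior, beta | theta is N(0.6 * theta, 1 - 0.6**2).
cond_at_ref = norm.pdf(beta0, loc=0.6 * theta0, scale=np.sqrt(1.0 - 0.36))

C = joint_at_ref / cond_at_ref
print("Monte Carlo estimate of the normalizing constant C:", C)
```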
Using an approximating density π*(β|d) instead of the true density π(β|d) leads to an approximate joint posterior distribution, π*(θ, β|d) = π(θ|β, d)π*(β|d). Define a simple pointwise difference measure between the corresponding marginal posterior densities π(θ|d) and π*(θ|d) as
$$d_\theta = \left| \pi(\theta \mid d) - \pi^*(\theta \mid d) \right|,$$
where π(θ|d) = ∫ π(θ, β|d)dβ and π*(θ|d) = ∫ π*(θ, β|d)dβ.
Alternatively, the bounds for the (log) ratio of π(θ|d) and π*(θ|d) can be obtained in terms of the full conditional distribution of the nuisance parameter π(β|θ, d) (see [10]).
Theorem 1. Suppose that for some set B0 ⊆ B and α ∈ ℝ, $1 \le \frac{\pi(\beta \mid d)}{\pi^*(\beta \mid d)} \le 1 + \alpha$ for all β ∈ B0. Then, for all θ ∈ Θ,
$$P^*(\beta \in B_0 \mid \theta, d) \le \frac{\pi(\theta \mid d)}{\pi^*(\theta \mid d)} \le \frac{1 + \alpha}{P(\beta \in B_0 \mid \theta, d)},$$
where P and P* are the probability measures for the conditional distributions of β given θ and d, based on π(β|d) and π*(β|d), respectively.
Proof. Since
$$\pi(\theta \mid d) \ge \int_{B_0} \pi(\theta \mid \beta, d)\, \pi(\beta \mid d)\, d\beta \ge \int_{B_0} \pi(\theta \mid \beta, d)\, \pi^*(\beta \mid d)\, d\beta = \pi^*(\theta \mid d)\, P^*(\beta \in B_0 \mid \theta, d),$$
and
$$\pi^*(\theta \mid d) \ge \int_{B_0} \pi(\theta \mid \beta, d)\, \pi^*(\beta \mid d)\, d\beta \ge \frac{1}{1+\alpha} \int_{B_0} \pi(\theta \mid \beta, d)\, \pi(\beta \mid d)\, d\beta = \frac{P(\beta \in B_0 \mid \theta, d)}{1+\alpha}\, \pi(\theta \mid d),$$
thus,
$$P^*(\beta \in B_0 \mid \theta, d) \le \frac{\pi(\theta \mid d)}{\pi^*(\theta \mid d)} \quad\text{and}\quad \frac{\pi(\theta \mid d)}{\pi^*(\theta \mid d)} \le \frac{1 + \alpha}{P(\beta \in B_0 \mid \theta, d)}.$$
Suppose that there exists γ ∈ ℝ such that $\frac{\pi(\beta \mid d)}{\pi^*(\beta \mid d)} \le \gamma$ for all β ∈ B. Then, for all θ ∈ Θ,
$$\frac{P(\beta \in B \setminus B_0 \mid \theta, d)}{P(\beta \in B_0 \mid \theta, d)} \le \gamma\, \frac{P^*(\beta \in B \setminus B_0 \mid \theta, d)}{P^*(\beta \in B_0 \mid \theta, d)}$$
and
$$\pi(\beta \mid \theta, d) - \pi^*(\beta \mid \theta, d) \le \pi(\beta \mid \theta, d) - \pi^*(\beta \mid d)\, P^*(\beta \in B_0 \mid \theta, d).$$
This result can be extended to the Kullback–Leibler divergence of π*(θ|d) from π(θ|d).
In practice, the marginal posterior distribution of the parameter of interest is hard to calculate analytically. Suppose π(θ|β, d) is a smooth (positive) function of β, and write
$$\pi(\theta \mid d) = \int \pi(\theta \mid \beta, d)\, \exp\left[ n\, h(\beta) \right] d\beta, \quad\text{where } h(\beta) = \frac{1}{n}\log\pi(\beta \mid d).$$
Assume β̂ maximizes log π(β|d), and define $\Sigma = \left( -\frac{\partial^2}{\partial \beta^2} h(\beta) \right)^{-1}_{\beta = \hat{\beta}}$. Then, π(θ|d) can be well approximated by Laplace's method. That is,
$$\pi(\theta \mid d) \approx \pi(\theta \mid \hat{\beta}, d)\, \pi(\hat{\beta} \mid d) \left( \frac{2\pi}{n} \right)^{m/2} |\Sigma|^{1/2} \left( 1 + \left. \frac{\partial^2 \pi(\theta \mid \beta, d)/\partial \beta^2}{2\, n\, \pi(\theta \mid \beta, d)} \right|_{\beta = \hat{\beta}} \right),$$
where m is the dimension of β. More generally, we have
$$\pi(\theta \mid d) = \hat{\pi}(\theta \mid d) \left[ 1 + O_\theta\!\left( \frac{1}{n} \right) \right], \quad\text{where } \hat{\pi}(\theta \mid d) = \pi(\theta \mid \hat{\beta}, d)\, \pi(\hat{\beta} \mid d) \left( \frac{2\pi}{n} \right)^{m/2} |\Sigma|^{1/2}.$$
Laplace’s method requires three conditions, referred to as Laplace regularity: (1) the integrals in the equation must exist and be finite; (2) the determinants of the Hessians must be bounded away from zero at the optimizers; and (3) the log-likelihood must be differentiable with respect to the parameters, with all partial derivatives bounded in a neighborhood of the optimizers. These conditions imply, under mild assumptions, the asymptotic normality of the posterior.
Based on the above results, the bounds for the differences between the marginal posterior distributions can be approximated. Suppose that |π(β|d) − π*(β|d)| has a unique maximum at β̂. Then,
$$d_\theta = \left| \pi(\theta \mid d) - \pi^*(\theta \mid d) \right| \le \left( \frac{2\pi}{n} \right)^{m/2} |\tilde{\Sigma}|^{1/2}\, \left| \pi(\hat{\beta} \mid d) - \pi^*(\hat{\beta} \mid d) \right| \left[ 1 + O\!\left( \frac{1}{n} \right) \right],$$
where O(n−1) does not depend on θ.
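The following sketch evaluates the first-order Laplace approximation π̂(θ|d) = π(θ|β̂, d)π(β̂|d)(2π/n)^{m/2}|Σ|^{1/2} and compares it with direct numerical integration of π(θ|β, d)π(β|d). The two densities are simple placeholders, not those from the spatial application, chosen so that the exact integral is easy to compute.

```python
# A numerical sketch of the first-order Laplace approximation above:
# pi_hat(theta | d) = pi(theta | beta_hat, d) pi(beta_hat | d) (2*pi/n)^{m/2} |Sigma|^{1/2},
# compared with direct numerical integration of pi(theta | beta, d) pi(beta | d).
# The two densities are placeholders chosen only so the exact integral is easy.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import norm

n, m = 200, 1                                                       # "sample size", dim(beta)
pi_beta = lambda b: norm.pdf(b, loc=0.5, scale=1.0 / np.sqrt(n))    # placeholder pi(beta | d)
pi_theta_given = lambda t, b: norm.pdf(t, loc=b, scale=0.2)         # placeholder pi(theta | beta, d)

# Mode and curvature of h(beta) = (1/n) log pi(beta | d).
h = lambda b: np.log(pi_beta(b)) / n
beta_hat = minimize_scalar(lambda b: -h(b), bounds=(-5, 5), method="bounded").x
eps = 1e-4
Sigma = -1.0 / ((h(beta_hat + eps) - 2 * h(beta_hat) + h(beta_hat - eps)) / eps ** 2)

def pi_theta_laplace(theta):
    return (pi_theta_given(theta, beta_hat) * pi_beta(beta_hat)
            * (2 * np.pi / n) ** (m / 2) * np.sqrt(Sigma))

def pi_theta_exact(theta):
    return quad(lambda b: pi_theta_given(theta, b) * pi_beta(b), -5, 5)[0]

for theta in (0.3, 0.5, 0.7):
    print(theta, pi_theta_laplace(theta), pi_theta_exact(theta))
```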

3. Information Analysis

3.1. Approximation to Kullback–Leibler Divergence

Kullback–Leibler divergence, or relative entropy, is a quantity that measures the difference between two probability distributions [11]. The Kullback–Leibler divergence between π(θ, β|d) and π*(θ, β|d) is the same as that between π(β|d) and π*(β|d). Theorem 1 provides the upper bound for the Kullback–Leibler divergence between π(θ|d) and π*(θ|d), denoted by I(π(θ|d), π*(θ|d)):
$$I\big(\pi(\theta \mid d), \pi^*(\theta \mid d)\big) \le \log(1 + \alpha) - E_{\theta \mid d}\log P(\beta \in B_0 \mid \theta, d).$$
For sufficiently large P(β ∈ B0|θ, d) and small α,
$$\log(1 + \alpha) - E_{\theta \mid d}\log P(\beta \in B_0 \mid \theta, d) \approx \int\!\!\int_{\beta \notin B_0} \pi(\beta \mid \theta, d)\, \pi(\theta \mid d)\, d\beta\, d\theta.$$
Thus, the upper bound can be approximated by 1 − P(β ∈ B0|d) = P(β ∉ B0|d). However, the analytic calculation of the Kullback–Leibler divergence between π(θ|d) and π*(θ|d) is usually difficult. Here, we introduce an approximation to Kullback–Leibler divergence based on Laplace’s method along with its convergence properties.
Theorem 2. Suppose that β̂ and β̂* maximize π(β|d) and π*(β|d), respectively. Then, we have
$$I(\pi, \pi^*) \equiv I\big(\pi(\theta \mid d), \pi^*(\theta \mid d)\big) = \hat{I}_1(\pi, \pi^*)\left[ 1 + O\!\left( \frac{1}{n} \right) \right],$$
where
$$\hat{I}_1(\pi, \pi^*) = \left( \frac{2\pi}{n} \right)^{m/2} |\Sigma|^{1/2} \int \pi(\theta \mid \hat{\beta}, d)\, \pi(\hat{\beta} \mid d)\, \log\frac{|\Sigma|^{1/2}\, \pi(\theta \mid \hat{\beta}, d)\, \pi(\hat{\beta} \mid d)}{|\Sigma^*|^{1/2}\, \pi(\theta \mid \hat{\beta}^*, d)\, \pi^*(\hat{\beta}^* \mid d)}\, d\theta$$
and
$$\Sigma = \left( -\frac{\partial^2}{\partial \beta^2} \frac{1}{n}\log\pi(\beta \mid d) \right)^{-1}_{\beta = \hat{\beta}} \quad\text{and}\quad \Sigma^* = \left( -\frac{\partial^2}{\partial \beta^2} \frac{1}{n}\log\pi^*(\beta \mid d) \right)^{-1}_{\beta = \hat{\beta}^*}.$$
It is noted that Î₁(π, π*) ≥ 0 when |Σ| ≥ |Σ*|. Furthermore, if β̂ = β̂* and |Σ|^{1/2} = |Σ*|^{1/2}, then
$$\hat{I}_1(\pi, \pi^*) = \left( \frac{2\pi}{n} \right)^{m/2} |\Sigma|^{1/2}\, \pi(\hat{\beta} \mid d)\, \log\frac{\pi(\hat{\beta} \mid d)}{\pi^*(\hat{\beta} \mid d)} \ge 0.$$
Proof. Since
$$\begin{aligned} I(\pi, \pi^*) &\equiv \int \pi(\theta \mid d)\, \log\frac{\pi(\theta \mid d)}{\pi^*(\theta \mid d)}\, d\theta \\ &= \int \hat{\pi}(\theta \mid d)\left[ 1 + O_\theta\!\left( \tfrac{1}{n} \right) \right] \log\frac{\hat{\pi}(\theta \mid d)\left[ 1 + O_\theta\!\left( \tfrac{1}{n} \right) \right]}{\hat{\pi}^*(\theta \mid d)\left[ 1 + O_\theta\!\left( \tfrac{1}{n} \right) \right]}\, d\theta \\ &= \int \hat{\pi}(\theta \mid d)\, \log\frac{\hat{\pi}(\theta \mid d)}{\hat{\pi}^*(\theta \mid d)}\left[ 1 + O_\theta\!\left( \tfrac{1}{n} \right) \right] d\theta \\ &= \int \hat{\pi}(\theta \mid d)\, \log\frac{\hat{\pi}(\theta \mid d)}{\hat{\pi}^*(\theta \mid d)}\, d\theta\, \left[ 1 + O\!\left( \tfrac{1}{n} \right) \right] \end{aligned}$$
and
$$\int \hat{\pi}(\theta \mid d)\, \log\frac{\hat{\pi}(\theta \mid d)}{\hat{\pi}^*(\theta \mid d)}\, d\theta = \int \pi(\theta \mid \hat{\beta}, d)\, \pi(\hat{\beta} \mid d)\left( \frac{2\pi}{n} \right)^{m/2} |\Sigma|^{1/2} \times \log\frac{\pi(\theta \mid \hat{\beta}, d)\, \pi(\hat{\beta} \mid d)\left( \frac{2\pi}{n} \right)^{m/2} |\Sigma|^{1/2}}{\pi(\theta \mid \hat{\beta}^*, d)\, \pi^*(\hat{\beta}^* \mid d)\left( \frac{2\pi}{n} \right)^{m/2} |\Sigma^*|^{1/2}}\, d\theta,$$
where
$$\Sigma = \left( -\frac{\partial^2}{\partial \beta^2} \frac{1}{n}\log\pi(\beta \mid d) \right)^{-1}_{\beta = \hat{\beta}} \quad\text{and}\quad \Sigma^* = \left( -\frac{\partial^2}{\partial \beta^2} \frac{1}{n}\log\pi^*(\beta \mid d) \right)^{-1}_{\beta = \hat{\beta}^*},$$
we have
$$I(\pi, \pi^*) = \hat{I}_1(\pi, \pi^*)\left[ 1 + O\!\left( \frac{1}{n} \right) \right],$$
where
$$\hat{I}_1(\pi, \pi^*) = \left( \frac{2\pi}{n} \right)^{m/2} |\Sigma|^{1/2} \int \pi(\theta \mid \hat{\beta}, d)\, \pi(\hat{\beta} \mid d)\, \log\frac{|\Sigma|^{1/2}\, \pi(\hat{\beta} \mid d)}{|\Sigma^*|^{1/2}\, \pi^*(\hat{\beta} \mid d)}\, d\theta = \left( \frac{2\pi}{n} \right)^{m/2} |\Sigma|^{1/2}\, \pi(\hat{\beta} \mid d)\, \log\frac{|\Sigma|^{1/2}\, \pi(\hat{\beta} \mid d)}{|\Sigma^*|^{1/2}\, \pi^*(\hat{\beta} \mid d)}.$$
Under weak conditions allowing the exchange of integral and limit, it can be shown that Î₁(π, π*) converges to I(π, π*). Suppose that both π(β|d) and π*(β|d) are maximized at β̂, a posterior mode. Then,
$$\left| \hat{I}_1(\pi, \pi^*) - I(\pi, \pi^*) \right| \to 0 \quad\text{as } n \to \infty.$$
Since sup_θ |π̂(θ|d) − π(θ|d)| → 0 and sup_θ |π̂*(θ|d) − π*(θ|d)| → 0 as n → ∞, continuity implies that
$$\left| \int \hat{\pi}(\theta \mid d)\, \log\frac{\hat{\pi}(\theta \mid d)}{\hat{\pi}^*(\theta \mid d)}\, d\theta - \int \pi(\theta \mid d)\, \log\frac{\pi(\theta \mid d)}{\pi^*(\theta \mid d)}\, d\theta \right| \le \int \left| \hat{\pi}(\theta \mid d)\, \log\frac{\hat{\pi}(\theta \mid d)}{\hat{\pi}^*(\theta \mid d)} - \pi(\theta \mid d)\, \log\frac{\pi(\theta \mid d)}{\pi^*(\theta \mid d)} \right| d\theta \to 0 \quad\text{as } n \to \infty.$$
Therefore,
$$\left| \left( \frac{2\pi}{n} \right)^{m/2} |\Sigma|^{1/2}\, \pi(\hat{\beta} \mid d)\, \log\frac{|\Sigma|^{1/2}\, \pi(\hat{\beta} \mid d)}{|\Sigma^*|^{1/2}\, \pi^*(\hat{\beta} \mid d)} - I(\pi, \pi^*) \right| \to 0 \quad\text{as } n \to \infty.$$
Note that the first-order approximation Î₁(π, π*) depends only on the marginal posterior distributions of the nuisance parameter, π(β|d) and π*(β|d), and not on the full conditional distribution of the parameter of interest, π(θ|β, d). Based on the asymptotic properties of the posterior distributions (which can be achieved easily under fairly general conditions when the true value of the parameter is in the support of the prior), a Gaussian distribution with mean β̂, the generalized MLE of β, and variance $\left( -\frac{\partial^2}{\partial \beta^2}\log\pi(\beta \mid d) \right)^{-1}_{\beta = \hat{\beta}}$ can be considered as π*(β|d). That is,
$$\pi^*(\beta \mid d) \propto \exp\!\left( -\frac{1}{2}(\beta - \hat{\beta})'\, H_{\hat{\beta}}\, (\beta - \hat{\beta}) \right), \quad\text{where } H_{\hat{\beta}} \equiv -\left. \frac{\partial^2}{\partial \beta^2}\log\pi(\beta \mid d) \right|_{\beta = \hat{\beta}}.$$
For a sufficiently large n, π*(β|d) is maximized at β̂, the posterior mode of π(β|d), and $\left. \frac{\partial^2}{\partial \beta^2}\log\pi(\beta \mid d) \right|_{\beta = \hat{\beta}} \approx \left. \frac{\partial^2}{\partial \beta^2}\log\pi^*(\beta \mid d) \right|_{\beta = \hat{\beta}}$.
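A small numerical sketch of the first-order approximation Î₁(π, π*) in Theorem 2 is given below. It uses only the ingredients the theorem requires: the common mode β̂, the scaled inverse curvatures Σ and Σ*, and the two marginal densities evaluated at the mode. The Student-t marginal and its Gaussian approximation are illustrative stand-ins for π(β|d) and π*(β|d).

```python
# A hedged sketch of the first-order approximation I_hat_1 from Theorem 2.
# Only the ingredients of the theorem are needed: the common mode beta_hat,
# the scaled inverse curvatures Sigma and Sigma*, and the two marginal densities
# at the mode.  The Student-t marginal and its Gaussian approximation below are
# illustrative stand-ins for pi(beta | d) and pi*(beta | d).
import numpy as np
from scipy.stats import norm, t

n, m = 200, 1
pi_b      = lambda b: t.pdf(b, 10, loc=0.0, scale=1.0 / np.sqrt(n))   # "exact" marginal
pi_b_star = lambda b: norm.pdf(b, loc=0.0, scale=1.0 / np.sqrt(n))    # Gaussian approximation

def scaled_inv_curvature(logdens, mode, eps=1e-4):
    """Sigma = ( -(d^2/d beta^2) (1/n) log pi(beta | d) )^{-1} at the mode."""
    f = lambda b: logdens(b) / n
    second = (f(mode + eps) - 2 * f(mode) + f(mode - eps)) / eps ** 2
    return -1.0 / second

beta_hat   = 0.0                                   # both densities peak at 0 here
Sigma      = scaled_inv_curvature(lambda b: np.log(pi_b(b)), beta_hat)
Sigma_star = scaled_inv_curvature(lambda b: np.log(pi_b_star(b)), beta_hat)

I_hat_1 = ((2 * np.pi / n) ** (m / 2) * np.sqrt(Sigma) * pi_b(beta_hat)
           * np.log(np.sqrt(Sigma) * pi_b(beta_hat)
                    / (np.sqrt(Sigma_star) * pi_b_star(beta_hat))))
print("first-order approximated Kullback-Leibler divergence:", I_hat_1)
```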

3.2. Higher-Order Approximations to Kullback–Leibler Divergence

To improve accuracy, second-order approximations can be considered in Laplace’s method [1]. Let β̂₁, β̂₁*, Σ₁, and Σ₁* be defined as above in the case of the first-order approximations. Let β̂₂ and β̂₂* maximize log[π(θ|β, d)π(β|d)] and log[π(θ|β, d)π*(β|d)], respectively. While the second-order approximations to π(θ|d) and π*(θ|d) are more accurate, the second-order approximation to I(π, π*) is quite difficult to calculate since both β̂₂ and β̂₂* are functions of θ, that is, β̂₂ = β̂₂(θ) and β̂₂* = β̂₂*(θ).
Assume that both β̂₂ and β̂₂* maximize π(θ|β, d)π(β|d) and π(θ|β, d)π*(β|d), respectively, regardless of θ. Then, it can be shown that
$$I(\pi, \pi^*) = \hat{I}_2(\pi, \pi^*)\left[ 1 + O\!\left( \frac{1}{n^2} \right) \right]$$
or
$$\left| \hat{I}_2(\pi, \pi^*) - I(\pi, \pi^*) \right| \to 0 \quad\text{as } n \to \infty,$$
where
$$\hat{I}_2(\pi, \pi^*) \equiv \int \frac{|\Sigma_2|^{1/2}}{|\Sigma_1|^{1/2}}\, \pi(\theta \mid \hat{\beta}_2, d)\, \frac{\pi(\hat{\beta}_2 \mid d)}{\pi(\hat{\beta}_1 \mid d)}\, \log\frac{\dfrac{|\Sigma_2|^{1/2}}{|\Sigma_1|^{1/2}}\, \pi(\theta \mid \hat{\beta}_2, d)\, \dfrac{\pi(\hat{\beta}_2 \mid d)}{\pi(\hat{\beta}_1 \mid d)}}{\dfrac{|\Sigma_2^*|^{1/2}}{|\Sigma_1^*|^{1/2}}\, \pi(\theta \mid \hat{\beta}_2^*, d)\, \dfrac{\pi^*(\hat{\beta}_2^* \mid d)}{\pi^*(\hat{\beta}_1^* \mid d)}}\, d\theta, \qquad \Sigma_2 = \left( -\frac{\partial^2}{\partial \beta^2} \frac{1}{n}\log\left[ \pi(\theta \mid \beta, d)\, \pi(\beta \mid d) \right] \right)^{-1}_{\beta = \hat{\beta}_2},$$
and
$$\Sigma_2^* = \left( -\frac{\partial^2}{\partial \beta^2} \frac{1}{n}\log\left[ \pi(\theta \mid \beta, d)\, \pi^*(\beta \mid d) \right] \right)^{-1}_{\beta = \hat{\beta}_2^*}.$$
If both log[π(θ|β, d)π(β|d)] and log[π(θ|β, d)π*(β|d)] are maximized at β̂₂, then the second-order approximation to I(π, π*) can be simplified to
$$\hat{I}_2(\pi, \pi^*) = \frac{|\Sigma_2|^{1/2}}{|\Sigma_1|^{1/2}}\, \frac{\pi(\hat{\beta}_2 \mid d)}{\pi(\hat{\beta}_1 \mid d)}\, \log\frac{\dfrac{|\Sigma_2|^{1/2}}{|\Sigma_1|^{1/2}}\, \dfrac{\pi(\hat{\beta}_2 \mid d)}{\pi(\hat{\beta}_1 \mid d)}}{\dfrac{|\Sigma_2^*|^{1/2}}{|\Sigma_1^*|^{1/2}}\, \dfrac{\pi^*(\hat{\beta}_2^* \mid d)}{\pi^*(\hat{\beta}_1^* \mid d)}}.$$
$\hat{I}_2(\pi, \pi^*) \ge 0$ when $\frac{|\Sigma_2|}{|\Sigma_1|} \ge \frac{|\Sigma_2^*|}{|\Sigma_1^*|}$ and $\pi(\hat{\beta}_1 \mid d) \le \pi^*(\hat{\beta}_1^* \mid d)$.

3.3. Approximation to Other Information Measures

Instead of Kullback–Leibler divergence, other useful information measures based on uncertainty functions or the entropy function can be used. One information measure based on uncertainty functions [12] is
$$I = \frac{E\,\mathrm{Var}(\beta \mid d)}{E\,\mathrm{Var}^*(\beta \mid d)} \quad\text{or}\quad E\log\frac{\mathrm{Var}(\beta \mid d)}{\mathrm{Var}^*(\beta \mid d)}.$$
Another information measure is based on Renyi’s entropy function [13]:
$$I_\alpha\big(\pi(\theta \mid d), \pi^*(\theta \mid d)\big) = \frac{1}{1 - \alpha}\log\int \left[ \pi(\theta \mid d) \right]^\alpha \left[ \pi^*(\theta \mid d) \right]^{1 - \alpha} d\theta.$$
Theorem 3. Suppose that β̂ maximizes both π(β|d) and π*(β|d). Then,
$$I_\alpha\big(\pi(\theta \mid d), \pi^*(\theta \mid d)\big) = \frac{1}{1 - \alpha}\log J\, \left( 1 + O\!\left( \frac{1}{n} \right) \right),$$
where
$$J \equiv \left( \frac{2\pi}{n} \right)^{m/2} |\Sigma|^{1/2} \left[ \pi(\hat{\beta} \mid d) \right]^\alpha \left[ \pi^*(\hat{\beta} \mid d) \right]^{1 - \alpha}.$$
Proof. Let
$$\delta_\alpha\big(\pi(\theta \mid d), \pi^*(\theta \mid d)\big) = \exp\left[ (1 - \alpha)\, I_\alpha\big(\pi(\theta \mid d), \pi^*(\theta \mid d)\big) \right].$$
Then, it suffices to show that
$$\begin{aligned} \delta_\alpha\big(\pi(\theta \mid d), \pi^*(\theta \mid d)\big) &= \int \left[ \pi(\theta \mid d) \right]^\alpha \left[ \pi^*(\theta \mid d) \right]^{1 - \alpha} d\theta \\ &= \int \left[ \int \pi(\theta \mid \beta, d)\, \pi(\beta \mid d)\, d\beta \right]^\alpha \left[ \int \pi(\theta \mid \beta, d)\, \pi^*(\beta \mid d)\, d\beta \right]^{1 - \alpha} d\theta \\ &= \int \pi(\theta \mid \hat{\beta}, d)\left( \frac{2\pi}{n} \right)^{m/2} |\Sigma|^{1/2}\left( 1 + O_\theta\!\left( \tfrac{1}{n} \right) \right)\left[ \pi(\hat{\beta} \mid d) \right]^\alpha \left[ \pi^*(\hat{\beta} \mid d) \right]^{1 - \alpha} d\theta \\ &= \left( \frac{2\pi}{n} \right)^{m/2} |\Sigma|^{1/2}\left[ \pi(\hat{\beta} \mid d) \right]^\alpha \left[ \pi^*(\hat{\beta} \mid d) \right]^{1 - \alpha}\left( 1 + O\!\left( \tfrac{1}{n} \right) \right). \end{aligned}$$
Under similar conditions to those in Theorem 2, it can be shown that
$$\left| \frac{1}{1 - \alpha}\log J - I_\alpha\big(\pi(\theta \mid d), \pi^*(\theta \mid d)\big) \right| \to 0 \quad\text{as } n \to \infty.$$
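The approximation in Theorem 3 is straightforward to evaluate once β̂, Σ, and the two densities at the mode are available, as the following sketch shows. The numerical inputs are placeholders (for example, the quantities from the Î₁ sketch above), and because the approximation carries an O(1/n) error the returned value need not be exactly nonnegative.

```python
# A brief sketch of the approximation in Theorem 3: I_alpha is approximated by
# (1/(1-alpha)) log J with J = (2*pi/n)^{m/2} |Sigma|^{1/2}
# pi(beta_hat | d)^alpha pi*(beta_hat | d)^{1-alpha}.  The inputs below are
# placeholders (e.g., the quantities computed in the I_hat_1 sketch above);
# because of the O(1/n) error the returned value need not be exactly nonnegative.
import numpy as np

def renyi_approx(alpha, n, m, Sigma, pi_at_mode, pi_star_at_mode):
    """First-order Laplace approximation (1/(1-alpha)) * log J of I_alpha."""
    J = ((2 * np.pi / n) ** (m / 2) * np.sqrt(Sigma)
         * pi_at_mode ** alpha * pi_star_at_mode ** (1.0 - alpha))
    return np.log(J) / (1.0 - alpha)

# Illustrative call with placeholder values of the mode densities and Sigma.
print(renyi_approx(alpha=0.5, n=200, m=1, Sigma=0.91,
                   pi_at_mode=5.50, pi_star_at_mode=5.64))
```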

4. Illustrative Example

We consider an approach to approximating the likelihood function of the spatial correlation parameters in the Gaussian random field. Consider the simple Gaussian random field Z ~ MVN(0, Σ(θ)), where Σ(θ) is parameterized by a variance term and a correlation function. That is, Σ(θ) = σ2R(θ). Then, the likelihood function of (σ2, θ) is
$$L(\sigma^2, \theta) \propto \left| \sigma^2 R(\theta) \right|^{-1/2} \exp\!\left( -\frac{1}{2\sigma^2}\, Z' R^{-1}(\theta)\, Z \right). \quad (1)$$
Note that in the problem with a large spatial domain, it is not computationally feasible to compute the likelihood function of the spatial correlation parameters because of R^{−1}(θ).
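The following sketch makes the computational issue explicit: it evaluates the log-likelihood in (1) on a small simulated grid via a Cholesky factorization rather than an explicit inverse. The O(n³) cost of this factorization is what becomes prohibitive at the roughly 8000 PRUDENCE grid points; the grid, the jitter, and the correlation form used below are assumptions for illustration.

```python
# A hedged sketch of evaluating the log-likelihood (1) on a small simulated
# grid.  A Cholesky factorization replaces the explicit inverse R^{-1}(theta);
# its O(n^3) cost is exactly what becomes infeasible for the ~8000 PRUDENCE
# grid points.  The grid, the jitter, and the correlation form exp(-d^2/theta)
# (with theta = 2*xi^2, matching the covariance given later in this section)
# are assumptions for illustration.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(2)
coords = rng.uniform(0.0, 10.0, size=(400, 2))        # small illustrative grid
dist = cdist(coords, coords)

def neg_log_likelihood(sigma2, theta, Z, dist):
    """-log L(sigma^2, theta), up to a constant, for Z ~ MVN(0, sigma^2 R(theta))."""
    R = np.exp(-dist ** 2 / theta)
    c, low = cho_factor(sigma2 * R + 1e-8 * np.eye(len(Z)))   # jitter for stability
    quad = Z @ cho_solve((c, low), Z)                          # Z' (sigma^2 R)^{-1} Z
    logdet = 2.0 * np.sum(np.log(np.diag(c)))                  # log |sigma^2 R(theta)|
    return 0.5 * (logdet + quad)

# Simulate one field and profile the likelihood over a few values of theta.
true_theta, sigma2 = 4.0, 1.0
L = np.linalg.cholesky(sigma2 * np.exp(-dist ** 2 / true_theta) + 1e-8 * np.eye(400))
Z = L @ rng.normal(size=400)
for theta in (2.0, 4.0, 8.0):
    print(theta, neg_log_likelihood(sigma2, theta, Z, dist))
```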
The proposed approaches are also illustrated with regional climate models, which are used to model the evolution of a climate system over a limited area. These models address smaller spatial regions than global climate models. However, the higher resolution of regional climate models better captures the impact of local features such as lakes and mountains as well as subgrid-scale atmospheric processes. The PRUDENCE project involves regional models over Europe from various climate research centers (http://prudence.dmi.dk/) and provides a major archive of data at a 25-km resolution covering the 1951–2100 transient periods. In the analysis, spatial parameters are estimated based on the approximated likelihood approach using PRUDENCE data (about 8000 grid points). Here, the mean trend in the surface temperature change is modeled as follows:
$$T(s) = \beta_0 + \beta_1 I_{land/sea}(s) + \beta_2 P(s) + \beta_3\, lon(s) + \beta_4\, lat(s) + \beta_5\, elev(s), \quad (2)$$
where I_{land/sea}(s) is an indicator function for sea and land, P(s) is the amount of seasonal precipitation, lon(s) is the longitude, lat(s) is the latitude, and elev(s) is the elevation at location s. For the detrended surface temperature field, we consider a stationary Gaussian spatial process with an exponential covariance function, $\sigma^2 \exp\!\left( -\frac{d^2}{2\xi^2} \right)$. For simplicity, we also assume that σ² is known and θ = 2ξ².
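As a brief illustration of the detrending step, the sketch below builds the design matrix for the mean-trend model (2) and removes the fitted trend by least squares before any spatial fitting; all covariates and the response are simulated placeholders rather than PRUDENCE fields.

```python
# A short sketch of the detrending step: build the design matrix for the
# mean-trend model (2) and remove the fitted trend by ordinary least squares
# before any spatial fitting.  All covariates and the response are simulated
# placeholders rather than PRUDENCE fields.
import numpy as np

rng = np.random.default_rng(3)
n = 400
land = rng.integers(0, 2, n).astype(float)      # I_land/sea(s)
prec = rng.gamma(2.0, 1.0, n)                   # P(s), seasonal precipitation
lon  = rng.uniform(0.0, 20.0, n)                # lon(s)
lat  = rng.uniform(40.0, 60.0, n)               # lat(s)
elev = rng.uniform(0.0, 2000.0, n)              # elev(s)
temp = 1.5 + 0.3 * land + 0.01 * prec + rng.normal(0.0, 0.5, n)   # toy T(s) field

X = np.column_stack([np.ones(n), land, prec, lon, lat, elev])
beta_hat, *_ = np.linalg.lstsq(X, temp, rcond=None)
residual_field = temp - X @ beta_hat            # detrended field for the spatial model
print("estimated trend coefficients:", np.round(beta_hat, 3))
```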
Now we consider an approximated likelihood function for θ. Applying a log transformation to the correlation function ρ leads to
$$\hat{\theta} \overset{\text{approx}}{\sim} N\!\left( \theta,\; \theta^4 \left( \sum_{d \in D} \frac{d^2}{\sigma_d^2} \right)^{-1} \right), \quad (3)$$
where θ̂ is the MLE of the spatial parameter θ,
$$\hat{\theta} = \frac{\sum_{d \in D} d^2/\sigma_d^2}{\sum_{d \in D} (d/\sigma_d)\log\rho(d)}. \quad (4)$$
Note that the model coefficients (β0, β1, β2, β3, β4) are the parameters of interest and that the correlation parameter θ is the nuisance parameter in this example. Here, the full conditional distribution of the parameter of interest, π(β0, β1, β2, β3, β4|θ, d), can be obtained based on the model of the mean trend in the surface temperature change, while the true marginal density of the nuisance parameter, π(β|d), and approximate marginal density of the nuisance parameter, π*(β|d), can be computed by using likelihood functions (1) and (3), respectively. Therefore, the full conditional distribution of the parameter of interest, (β0, β1, β2, β3, β4), can be expressed as
$$[\beta_0, \beta_1, \beta_2, \beta_3, \beta_4 \mid D, \theta] \propto \left| \sigma^2 R(\theta) \right|^{-1/2} \exp\!\left( -\frac{1}{2\sigma^2} (D - T)'\, R^{-1}(\theta)\, (D - T) \right), \quad (5)$$
where D = {d(1), d(2),…, d(n)} is the observation vector and T = {T(1), T(2),…, T(n)} is the mean trend vector in (2). The approximated full conditional distribution of the correlation parameter θ is of the form
$$[\theta \mid D, \beta_0, \beta_1, \beta_2, \beta_3, \beta_4] \propto \exp\!\left( -\frac{1}{2}\, \frac{(\hat{\theta} - \theta)^2}{\theta^4} \sum_{d \in D} \frac{d^2}{\sigma_d^2} \right), \quad (6)$$
whereas the exact full conditional distribution is quite similar to (5). Further, θ̂ is the MLE of the spatial parameter θ in (4). The reference points for (β0, β1, β2, β3, β4) and θ are randomly chosen in the neighborhood of the MLEs of the parameters. For more details, see [8].
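The two updates described above can be sketched as follows: the regression coefficients are drawn from the Gaussian full conditional implied by (5), with a flat prior assumed in the sketch, while θ is drawn from the approximate normal full conditional (6). The design matrix, data, correlation matrix, and plug-in moments are placeholders standing in for the PRUDENCE quantities.

```python
# A hedged sketch of the two Gibbs updates just described: the regression
# coefficients are drawn from the Gaussian full conditional implied by (5)
# (a flat prior on the coefficients is assumed here), and theta is drawn from
# the approximate normal full conditional (6).  The design matrix, data,
# correlation matrix, and plug-in moments are placeholders for the PRUDENCE
# quantities.
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # placeholder covariates
R_inv = np.eye(n)                                                 # placeholder R^{-1}(theta)
sigma2 = 1.0
D = X @ np.array([1.5, 0.3, 0.01, 0.0, 0.0]) + rng.normal(0.0, 1.0, n)  # placeholder data

def draw_coefficients(X, D, R_inv, sigma2):
    """One draw of (beta0, ..., beta4) from the Gaussian full conditional."""
    A = X.T @ R_inv @ X / sigma2              # posterior precision
    b = X.T @ R_inv @ D / sigma2
    cov = np.linalg.inv(A)
    return rng.multivariate_normal(cov @ b, cov)

def draw_theta_approx(theta_hat, var_theta_hat):
    """One draw of theta from the approximate normal full conditional (6)."""
    return rng.normal(theta_hat, np.sqrt(var_theta_hat))

beta_draw = draw_coefficients(X, D, R_inv, sigma2)
theta_draw = draw_theta_approx(theta_hat=4.0, var_theta_hat=0.25)  # placeholder moments
print(np.round(beta_draw, 3), round(theta_draw, 3))
```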
In the next step, various numbers of grid points (n = 400, 900, 1600) are randomly chosen and then eliminated as in the simulation study. Table 1 provides the first-order-approximated Kullback–Leibler divergence along with the exact Kullback–Leibler divergence under various settings. The estimated information measures are quite efficient and competitive because the seasonal mean surface temperature fields from the global climate model are already smoothed. Further, the approximated likelihood function of the correlation parameter is not well estimated, particularly when the number of observations is less than 100.

5. Summary

We introduced various ways of checking the sensitivity of the target posterior distribution of the parameter of interest when an alternative to the full conditional distribution of the nuisance parameter is used. By using Laplace’s method, the approximated Kullback–Leibler divergence between π(θ|d) and π*(θ|d) was also calculated in terms of the entropy of π(β|d) and π*(β|d) at the generalized MLE β̂. Other information measures provided similar results. However, it is still difficult to check analytically the robustness of the marginal posterior distribution of interest, π(θ|d), with respect to the choice of the full conditional distribution of the nuisance parameter, π(β|θ, d). Nonetheless, for the general class of available marginal posterior distributions of the nuisance parameter, the sensitivity and robustness of the marginal posterior distribution of the parameter of interest can be checked approximately. In addition, we can find a reasonable and flexible substitute for a complicated full conditional distribution of the nuisance parameter under sensitivity and robustness criteria with respect to the target distribution and then perform inference based on this substitute. Our approach can be applied to future sensitivity analyses of the posterior predictive distribution based on an approximated posterior distribution, of the Bayes factor or marginal density with respect to the choice of prior distribution, and of expected loss or utility based on an approximated posterior distribution.

Author Contributions

Jung In Seo and Yongku Kim conceived the idea and developed the method presented in this paper. Both authors performed the data analysis and wrote the paper. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Tierney, L.; Kadane, J.B. Accurate approximations for posterior moments and marginal densities. J. Am. Stat. Assoc. 1986, 81, 82–86.
2. Yu, Y.; Meng, X.L. To center or not to center: That is not the question—An Ancillarity-Sufficiency Interweaving Strategy (ASIS) for boosting MCMC efficiency. J. Comput. Graph. Stat. 2011, 20, 531–570.
3. Filippone, M.; Girolami, M. Pseudo-marginal Bayesian inference for Gaussian processes. IEEE Trans. Pattern Anal. Mach. Intell. 2014, doi:10.1109/TPAMI.2014.2316530.
4. Attias, H. Inferring parameters and structure of latent variable models by variational Bayes. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI’99), Stockholm, Sweden, 30 July–1 August 1999; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1999; pp. 21–30.
5. Minka, T.P. Expectation propagation for approximate Bayesian inference. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI’01), Seattle, WA, USA, 2–5 August 2001; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2001; pp. 362–369.
6. Rue, H.; Martino, S.; Chopin, N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B 2009, 71, 319–392.
7. Furrer, R.; Sain, S.R. Spatial model fitting for large datasets with applications to climate and microarray problems. Stat. Comput. 2008, 19, 113–128.
8. Kim, Y.; Kim, D.H. An approximate likelihood function of spatial correlation parameters. 2015; submitted.
9. Cressie, N. Statistics for Spatial Data, Revised ed.; Wiley: New York, NY, USA, 1993.
10. Dickey, J.M. Approximate posterior distributions. J. Am. Stat. Assoc. 1976, 71, 680–689.
11. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
12. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
13. Schneier, B. Applied Cryptography, 2nd ed.; John Wiley and Sons: New York, NY, USA, 1996.
Table 1. First-order-approximated Kullback–Leibler divergence and the exact Kullback–Leibler divergence under the various settings of the grid points.
Sampled Grid Size    First-Order-Approximated Kullback–Leibler Distance    Exact Kullback–Leibler Distance
400                  0.482                                                 0.425
900                  0.279                                                 0.251
1600                 0.121                                                 0.108
