An Upper Bound on the Error Induced by Saddlepoint Approximations—Applications to Information Theory

Anade, Dadja; Gorce, Jean-Marie; Mary, Philippe; Perlaza, Samir M.

doi:10.3390/e22060690

Open AccessArticle

An Upper Bound on the Error Induced by Saddlepoint Approximations—Applications to Information Theory^†

¹

Laboratoire CITI, a Joint Laboratory between INRIA, the Université de Lyon and the Institut National de Sciences Appliquées (INSA) de Lyon. 6 Av. des Arts, 69621 Villeurbanne, France

²

IETR and the Institut National de Sciences Appliquées (INSA) de Rennes, 20 Avenue des Buttes de Coësmes, CS 70839, 35708 Rennes, France

³

INRIA, Centre de Recherche de Sophia Antipolis—Méditerranée, 2004 Route des Lucioles, 06902 Sophia Antipolis, France

⁴

Princeton University, Electrical Engineering Department, Princeton, NJ 08544, USA

^*

Author to whom correspondence should be addressed.

^†

Parts of this paper appear in the proceedings of the IEEE international Conference on Communications (ICC), Dublin, Ireland, June 2020, and in the INRIA technical report RR-9329.

^‡

Authors are listed in alphabetical order.

Entropy 2020, 22(6), 690; https://0-doi-org.brum.beds.ac.uk/10.3390/e22060690

Submission received: 12 May 2020 / Revised: 10 June 2020 / Accepted: 11 June 2020 / Published: 20 June 2020

(This article belongs to the Special Issue Wireless Networks: Information Theoretic Perspectives)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

This paper introduces an upper bound on the absolute difference between:

(a)

the cumulative distribution function (CDF) of the sum of a finite number of independent and identically distributed random variables with finite absolute third moment; and

(b)

a saddlepoint approximation of such CDF. This upper bound, which is particularly precise in the regime of large deviations, is used to study the dependence testing (DT) bound and the meta converse (MC) bound on the decoding error probability (DEP) in point-to-point memoryless channels. Often, these bounds cannot be analytically calculated and thus lower and upper bounds become particularly useful. Within this context, the main results include, respectively, new upper and lower bounds on the DT and MC bounds. A numerical experimentation of these bounds is presented in the case of the binary symmetric channel, the additive white Gaussian noise channel, and the additive symmetric

α

-stable noise channel.

Keywords:

sums of independent and identically random variables; saddlepoint approximation; memoryless channels

1. Introduction

This paper focuses on approximating the cumulative distribution function (CDF) of sums of a finite number of real-valued independent and identically distributed (i.i.d.) random variables with finite absolute third moment. More specifically, let

Y_{1}

,

Y_{2}

, ⋯,

Y_{n}

, with n an integer and

2 ⩽ n < \infty

, be real-valued random variables with probability distribution

P_{Y}

. Denote by

F_{Y}

the CDF associated with

P_{Y}

, and, if it exists, denote by

f_{Y}

the corresponding probability density function (PDF). Let also

X_{n} = \sum_{t = 1}^{n} Y_{t}

(1)

be a random variable with distribution

P_{X_{n}}

. Denote by

F_{X_{n}}

the CDF and if it exists, denote by

f_{X_{n}}

the PDF associated with

P_{X_{n}}

. The objective is to provide a positive function that approximates

F_{X_{n}}

and an upper bound on the resulting approximation error. In the following, a positive function

g : R \to R_{+}

is said to approximate

F_{X_{n}}

with an approximation error that is upper bounded by a function

ϵ : R \to R_{+}

, if, for all

x \in R

,

| F_{X_{n}} (x) - g (x) | ⩽ ϵ (x) .

(2)

The case in which

Y_{1}

,

Y_{2}

, ⋯,

Y_{n}

in (1) are stable random variables with

F_{Y}

analytically expressible is trivial. This is essentially because the sum

X_{n}

follows the same distribution of a random variable

a_{n} Y + b_{n}

, where

(a_{n}, b_{n}) \in R^{2}

and Y is a random variable whose CDF is

F_{Y}

. Examples of this case are random variables following the Gaussian, Cauchy, or Levy distributions [1].

In general, the problem of calculating the CDF of

X_{n}

boils down to calculating

n - 1

convolutions. More specifically, it holds that

f_{X_{n}} (x) = \int_{- \infty}^{\infty} f_{X_{n - 1}} (x - t) f_{Y} (t) d t,

(3)

where

f_{X_{1}} = f_{Y}

. Even for discrete random variables and small values of n, the integral in (3) often requires excessive computation resources [2].

When the PDF of the random variable

X_{n}

cannot be conveniently obtained but only the r first moments are known, with

r \in N

, an approximation of the PDF can be obtained by using an Edgeworth expansion. Nonetheless, the resulting relative error in the large deviation regime makes these approximations inaccurate [3].

When the cumulant generating function (CGF) associated with

F_{Y}

, denoted by

K_{Y} : R \to R

, is known, the PDF

f_{X_{n}}

can be obtained via the Laplace inversion lemma [2]. That is, given two reals

α_{-} < 0

and

α_{+} > 0

, if

K_{Y}

is analytic for all

z \in {a + i b \in C : (a, b) \in R^{2} and α_{-} ⩽ a ⩽ α_{+}} \subset C

, then,

f_{X_{n}} (x) = \frac{1}{2 π i} \int_{γ - i \infty}^{γ + i \infty} exp (n K_{Y} (z) - z x) d z,

(4)

with

i = \sqrt{- 1}

and

γ \in (α_{-}, α_{+})

. Note that the domain of

K_{Y}

in (4) has been extended to the complex plane and thus it is often referred to as the complex CGF. With an abuse of notation, both the CGF and the complex CGF are identically denoted.

In the case in which n is sufficiently large, an approximation to the Bromwich integral in (4) can be obtained by choosing the contour to include the unique saddlepoint of the integrand as suggested in [4]. The intuition behind this lies on the following observations:

(i): the saddlepoint, denoted by $z_{0}$ , is unique, real and $z_{0} \in (α_{-}, α_{+})$ ;
(ii): within a neighborhood around the saddlepoint of the form $|z - z_{0}| < ϵ$ , with $z \in C$ and $ϵ > 0$ sufficiently small, $Im [n K_{Y} (z) - z x] = 0$ and $Re [n K_{Y} (z) - z x]$ can be assumed constant; and
(iii): outside such neighborhood, the integrand is negligible.

From

(i)

, it follows that the derivative of

n K_{Y} (t) - t x

with respect to t, with

t \in R

, is equal to zero when it is evaluated at the saddlepoint

z_{0}

. More specifically, for all

t \in R

,

\frac{d}{d t} K_{Y} (t) = E_{P_{Y}} [Y exp (t Y - K_{Y} (t))],

(5)

and thus

E_{P_{Y}} [Y exp (z_{0} Y - K_{Y} (z_{0}))] = \frac{x}{n},

(6)

which shows the dependence of

z_{0}

on both x and n.

A Taylor series expansion of the exponent

n K_{Y} (z) - z x

in the neighborhood of

z_{0}

, leads to the following asymptotic expansion in powers of

\frac{1}{n}

of the Bromwich integral in (4):

\begin{matrix} f_{X_{n}} (x) & = & {\hat{f}}_{X_{n}} (x) (1 + \frac{1}{n} (\frac{1}{8} \frac{K_{Y}^{(4)} (z_{0})}{{(K_{Y}^{(2)} (z_{0}))}^{2}} - \frac{5}{24} \frac{{(K_{Y}^{(3)} (z_{0}))}^{2}}{{(K_{Y}^{(2)} (z_{0}))}^{3}}) + O (\frac{1}{n^{2}})), \end{matrix}

(7)

where

{\hat{f}}_{X_{n}} : R \to R_{+}

is

\begin{matrix} {\hat{f}}_{X_{n}} (x) & = & \sqrt{\frac{1}{2 π n K_{Y}^{(2)} (z_{0})}} exp (n K_{Y} (z_{0}) - z_{0} x), \end{matrix}

(8)

and for all

k \in N

and

t \in R

, the notation

K_{Y}^{(k)} (t)

represents the k-th real derivative of the CGF

K_{Y}

evaluated at t. The first two derivatives

K_{Y}^{(1)}

and

K_{Y}^{(2)}

play a central role, and thus it is worth providing explicit expressions. That is,

\begin{matrix} K_{Y}^{(1)} (t) & ≜ & E_{P_{Y}} [Y exp (t Y - K_{Y} (t))], and \end{matrix}

(9)

\begin{matrix} K_{Y}^{(2)} (t) & ≜ & E_{P_{Y}} [{|Y - K_{Y}^{(1)} (t)|}^{2} exp (t Y - K_{Y} (t))] . \end{matrix}

(10)

The function

{\hat{f}}_{X_{n}}

in (8) is referred to as the saddlepoint approximation of the PDF

f_{X_{n}}

and was first introduced in [4]. Nonetheless,

{\hat{f}}_{X_{n}}

is not necessarily a PDF as often its integral on

R

is not equal to one. A particular exception is observed only in three cases [5]. First, when

f_{Y}

is the PDF of a Gaussian random variable, the saddlepoint approximation

{\hat{f}}_{X_{n}}

is identical to

f_{X_{n}}

, for all

n > 0

. Second and third, when

f_{Y}

is the PDF associated with a Gamma distribution and an inverse normal distribution, respectively, the saddlepoint approximation

{\hat{f}}_{X_{n}}

is exact up to a normalization constant for all

n > 0

.

An approximation to the CDF

F_{X_{n}}

can be obtained by integrating the PDF in (4), cf., [6,7,8]. In particular, the result reported in [6] leads to an asymptotic expansion of the CDF of

X_{n}

, for all

x \in R

, of the form:

\begin{matrix} F_{X_{n}} (x) & = & {\hat{F}}_{X_{n}} (x) + O (\frac{1}{\sqrt{n}} exp (n K_{Y} (z_{0}) - x z_{0})), \end{matrix}

(11)

where the function

{\hat{F}}_{X_{n}} : R \to R

is the saddlepoint approximation of

F_{X_{n}}

. That is, for all

x \in R

,

\begin{matrix} {\hat{F}}_{X_{n}} (x) & = & 1_{\{z_{0} > 0\}} + {(- 1)}^{1_{\{z_{0} > 0\}}} exp (n K_{Y} (z_{0}) - z_{0} x + \frac{1}{2} z_{0}^{2} n K_{Y}^{(2)} (z_{0})) Q (| z_{0} | \sqrt{n K_{Y}^{(2)} (z_{0})}), \end{matrix}

(12)

where the function

Q : R \to [0, 1]

is the complementary CDF of a Gaussian random variable with zero mean and unit variance. That is, for all

t \in R

,

Q (t) = \frac{1}{\sqrt{2 π}} \int_{t}^{\infty} exp (- \frac{x^{2}}{2}) d x .

(13)

Finally, from the central limit theorem [3], for large values of n and for all

x \in R

, a reasonable approximation to

F_{X_{n}} (x)

is

1 - Q (x)

. In the following, this approximation is referred to as the normal approximation of

F_{X_{n}}

.

1.1. Contributions

The main contribution of this work is an upper bound on the error induced by the saddlepoint approximation

{\hat{F}}_{X_{n}}

in (12) (Theorem 3 in Section 2.2). This result builds upon two observations. The first observation is that the CDF

F_{X_{n}}

can be written for all

x \in R

in the form,

\begin{matrix} F_{X_{n}} (x) = 1_{{z_{0} ⩽ 0}} E_{P_{S_{n}}} [exp (n K_{Y} (z_{0}) - z_{0} S_{n}) 1_{{S_{n} ⩽ x}}] + 1_{{z_{0} > 0}} (1 - E_{P_{S_{n}}} [exp (n K_{Y} (z_{0}) - z_{0} S_{n}) 1_{{S_{n} > x}}]), \end{matrix}

(14)

where the random variable

S_{n} = \sum_{t = 1}^{n} Y_{t}^{(z_{0})}

(15)

has a probability distribution denoted by

P_{S_{n}}

, and the random variables

Y_{1}^{(z_{0})}

,

Y_{2}^{(z_{0})}

, …,

Y_{n}^{(z_{0})}

are independent with probability distribution

P_{Y^{(z_{0})}}

. The distribution

P_{Y^{(z_{0})}}

is an exponentially tilted distribution [9] with respect to the distribution

P_{Y}

at the saddlepoint

z_{0}

. More specifically, the Radon–Nikodym derivative of the distribution

P_{Y^{(z_{0})}}

with respect to the distribution

P_{Y}

satisfies for all

y \in supp P_{Y}

,

\begin{matrix} \frac{d P_{Y^{(z_{0})}}}{d P_{Y}} (y) = exp (- (K_{Y} (z_{0}) - z_{0} y)) . \end{matrix}

(16)

The second observation is that the saddlepoint approximation

{\hat{F}}_{X_{n}}

in (12) can be written for all

x \in R

in the form,

\begin{matrix} {\hat{F}}_{X_{n}} (x) = 1_{{z_{0} ⩽ 0}} E_{P_{Z_{n}}} [exp (n K_{Y} (z_{0}) - z_{0} Z_{n}) 1_{{Z_{n} ⩽ x}}] + 1_{{z_{0} > 0}} (1 - E_{P_{Z_{n}}} [exp (n K_{Y} (z_{0}) - z_{0} Z_{n}) 1_{{Z_{n} > x}}]), \end{matrix}

(17)

where

Z_{n}

is a Gaussian random variable with mean x, variance

n K_{Y}^{(2)} (z_{0})

, and probability distribution

P_{Z_{n}}

. Note that the means of the random variable

S_{n}

in (14) and

Z_{n}

in (17) are equal to

n K_{Y}^{(1)} (z_{0})

, whereas their variances are equal to

n K_{Y}^{(2)} (z_{0})

. Note also that, from (6), it holds that

x = n K_{Y}^{(1)} (z_{0})

.

Using these observations, it holds that the absolute difference between

F_{X_{n}}

in (14) and

{\hat{F}}_{X_{n}}

in (17) satisfies for all

x \in R

,

\begin{matrix} |F_{X_{n}} (x) - {\hat{F}}_{X_{n}} (x)| \\ = 1_{{z_{0} ⩽ 0}} |E_{P_{S_{n}}} [exp (n K_{Y} (z_{0}) - z_{0} S_{n}) 1_{{S_{n} ⩽ x}}] - E_{P_{Z_{n}}} [exp (n K_{Y} (z_{0}) - z_{0} Z_{n}) 1_{{Z_{n} ⩽ x}}]| \\ + 1_{{z_{0} > 0}} |E_{P_{S_{n}}} [exp (n K_{Y} (z_{0}) - z_{0} S_{n}) 1_{{S_{n} > x}}] - E_{P_{Z_{n}}} [exp (n K_{Y} (z_{0}) - z_{0} Z_{n}) 1_{{Z_{n} > x}}]| . \end{matrix}

(18)

A step forward (Lemma A1 in Appendix A) is to note that, when x is such that

z_{0} ⩽ 0

, then,

\begin{matrix} |E_{P_{S_{n}}} [exp (n K_{Y} (z_{0}) - z_{0} S_{n}) 1_{{S_{n} ⩽ x}}] - E_{P_{Z_{n}}} [exp (n K_{Y} (z_{0}) - z_{0} Z_{n}) 1_{{Z_{n} ⩽ x}}]| \\ ⩽ exp (n K_{Y} (z_{0}) - z_{0} x) min \{1, 2 sup_{a \in R} |F_{S_{n}} (a) - F_{Z_{n}} (a)|\}, \end{matrix}

(19)

and when x is such that

z_{0} > 0

, it holds that

\begin{matrix} |E_{P_{S_{n}}} [exp (n K_{Y} (z_{0}) - z_{0} S_{n}) 1_{{S_{n} > x}}] - E_{P_{Z_{n}}} [exp (n K_{Y} (z_{0}) - z_{0} Z_{n}) 1_{{Z_{n} > x}}]| \\ ⩽ exp (n K_{Y} (z_{0}) - z_{0} x) min \{1, 2 sup_{a \in R} |F_{S_{n}} (a) - F_{Z_{n}} (a)|\}, \end{matrix}

(20)

where

F_{S_{n}}

and

F_{Z_{n}}

are the CDFs of the random variables

S_{n}

and

Z_{n}

, respectively. The final result is obtained by observing that

{sup}_{a \in R} |F_{S_{n}} (a) - F_{Z_{n}} (a)|

can be upper bounded using the Berry–Esseen Theorem (Theorem 1 in Section 2.1). This is essentially due to the fact that the random variable

S_{n}

is the sum of n independent random variables, i.e., (15), and

Z_{n}

is a Gaussian random variable, and both

S_{n}

and

Z_{n}

possess identical means and variances. Thus, the main result (Theorem 3 in Section 2.2) is that, for all

x \in R

,

\begin{matrix} |F_{X_{n}} (x) - {\hat{F}}_{X_{n}} (x)| ⩽ \frac{2 ξ_{Y} (z_{0})}{\sqrt{n}} exp (n K_{Y} (z_{0}) - z_{0} x), \end{matrix}

(21)

where

\begin{matrix} ξ_{Y} (z_{0}) = c_{1} (\frac{E_{P_{Y}} [{|Y - K_{Y}^{(1)} (z_{0})|}^{3} exp (z_{0} Y - K_{Y} (z_{0}))]}{{(K_{Y}^{(2)} (z_{0}))}^{3 / 2}} + c_{2}), \end{matrix}

(22)

with

\begin{matrix} c_{1} ≜ 0.33554, and \end{matrix}

(23a)

\begin{matrix} c_{2} ≜ 0.415 . \end{matrix}

(23b)

Finally, note that (21) holds for any finite value of n and admits the asymptotic scaling law with respect to n suggested in (11).

1.2. Applications

In the realm of information theory, the normal approximation has played a central role in the calculation of bounds on the minimum decoding error probability (DEP) in point-to-point memoryless channels, cf., [10,11]. Thanks to the normal approximation, simple approximations for the dependence testing (DT) bound, the random coding union bound (RCU) bound, and the meta converse (MC) bound have been obtained in [10,12]. The success of these approximations stems from the fact that they are easy to calculate. Nonetheless, easy computation comes at the expense of loose upper and lower bounds and thus uncontrolled approximation errors.

On the other hand, saddlepoint techniques have been extensively used to approximate existing lower and upper bounds on the minimum DEP. See, for instance, [13,14] in the case of the RCU bound and the MC bound. Nonetheless, the errors induced by saddlepoint approximations are often neglected due to the fact that calculating them involves a large number of optimizations and numerical integrations. Currently, the validation of saddlepoint approximations is carried through Monte Carlo simulations. Within this context, the main objectives of this paper are twofold:

(a)

to analytically assess the tightness of the approximation of DT and MC bounds based on the saddlepoint approximation of the CDFs of sums of i.i.d. random variables;

(b)

to provide new lower and upper bounds on the minimum DEP by providing a lower bound on the MC bound and an upper bound on the DT bound. Numerical experimentation of these bounds is presented for the binary symmetric channel (BSC), the additive white Gaussian noise (AWGN) channel, and the additive symmetric

α

-stable noise (S

α

S) channel, where the new bounds are tight and obtained at low computational cost.

2. Sums of Independent and Identically Distributed Random Variables

In this section, upper bounds on the absolute error of approximating

F_{X_{n}}

by the normal approximation and the saddlepoint approximation are presented.

2.1. Error Induced by the Normal Approximation

Given a random variable Y, let the function

ξ_{Y} : R \overset{}{\to} R

be for all

t \in R

:

\begin{matrix} ξ_{Y} (t) & ≜ & c_{1} (\frac{E_{P_{Y}} [{|Y - K_{Y}^{(1)} (t)|}^{3} exp (t Y - K_{Y} (t))]}{{(K_{Y}^{(2)} (t))}^{3 / 2}} + c_{2}), \end{matrix}

(24)

where

c_{1}

and

c_{2}

are defined in (23).

The following theorem, known as the Berry–Esseen theorem [3], introduces an upper bound on the approximation error induced by the normal approximation.

Theorem 1

(Berry–Esseen [15]). Let

Y_{1}

,

Y_{2}

, …,

Y_{n}

be i.i.d random variables with probability distribution

P_{Y}

. Let also

Z_{n}

be a Gaussian random variable with mean

n K_{Y}^{(1)} (0)

, variance

n K_{Y}^{(2)} (0)

, and CDF denoted by

F_{Z_{n}}

. Then, the CDF of the random variable

X_{n}

=

Y_{1}

+

Y_{2}

+…+

Y_{n}

, denoted by

F_{X_{n}}

, satisfies

\begin{matrix} sup_{a \in R} |F_{X_{n}} (a) - F_{Z_{n}} (a)| ⩽ min \{1, \frac{ξ_{Y} (0)}{\sqrt{n}}\}, \end{matrix}

(25)

where the functions

K_{Y}^{(1)}

,

K_{Y}^{(2)}

and

ξ_{Y}

are defined in (9), (10), and (24).

An immediate result from Theorem 1 gives the following upper and lower bounds on

F_{X_{n}} (a)

, for all

a \in R

,

\begin{matrix} F_{X_{n}} (a) ⩽ F_{Z_{n}} (a) + min \{1, \frac{ξ_{Y} (0)}{\sqrt{n}}\} ≜ \bar{Σ} (a, n), and \end{matrix}

(26)

\begin{matrix} F_{X_{n}} (a) ⩾ F_{Z_{n}} (a) - min \{1, \frac{ξ_{Y} (0)}{\sqrt{n}}\} ≜ \underset{̲}{Σ} (a, n) . \end{matrix}

(27)

The main drawback of Theorem 1 is that the upper bound on the approximation error does not depend on the exact value of a. More importantly, for some values of a and n, the upper bound on the approximation error might be particularly big, which leads to irrelevant results.

2.2. Error Induced by the Saddlepoint Approximation

The following theorem introduces an upper bound on the approximation error induced by approximating the CDF

F_{X_{n}}

of

X_{n}

in (1) by the function

η_{Y} :

R^{2}

×

N

→

R

defined such that for all

(θ

, a,

n)

∈

R^{2}

×

N

,

\begin{matrix} η_{Y} (θ, a, n) \\ ≜ 1_{{θ > 0}} + {(- 1)}^{1_{{θ > 0}}} exp (\frac{1}{2} n θ^{2} K_{Y}^{(2)} (θ) + n K_{Y} (θ) - n θ K_{Y}^{(1)} (θ)) Q ({(- 1)}^{1_{{θ ⩽ 0}}} \frac{a + n θ K_{Y}^{(2)} (θ) - n K_{Y}^{(1)} (θ)}{\sqrt{n K_{Y}^{(2)} (θ)}}), \end{matrix}

(28)

where the function

Q : R \to [0, 1]

is the complementary CDF of the standard Gaussian distribution defined in (13). Note that

η_{Y} (θ, n, a)

is identical to

{\hat{F}}_{X_{n}} (a)

, when

θ

is chosen to satisfy the saddlepoint

K_{Y}^{(1)} (θ) = \frac{a}{n}

. Note also that

η_{Y} (0, n, a)

is the CDF of a Gaussian random variable with mean

n K_{Y}^{(1)} (0)

and variance

n K_{Y}^{(2)} (0)

, which are the mean and the variance of

X_{n}

in (1), respectively.

Theorem 2.

Let

Y_{1}

,

Y_{2}

, …,

Y_{n}

be i.i.d. random variables with probability distribution

P_{Y}

and CGF

K_{Y}

. Let also

F_{X_{n}}

be the CDF of the random variable

X_{n}

=

Y_{1}

+

Y_{2}

+…+

Y_{n}

. Hence, for all a∈

R

and for all

θ \in Θ_{Y}

, it holds that

\begin{matrix} |F_{X_{n}} (a) - η_{Y} (θ, a, n)| ⩽ exp (n K_{Y} (θ) - θ a) min \{1, \frac{2 ξ_{Y} (θ)}{\sqrt{n}}\}, \end{matrix}

(29)

where

\begin{matrix} Θ_{Y} ≜ {t \in R : K_{Y} (t) < \infty}; \end{matrix}

(30)

and the functions

ξ_{Y}

and

η_{Y}

are defined in (24) and (28), respectively.

Proof.

The proof of Theorem 2 is presented in Appendix A. ☐

This result leads to the following upper and lower bounds on

F_{X_{n}} (a)

, for all

a \in R

,

\begin{matrix} F_{X_{n}} (a) & ⩽ & η_{Y} (θ, a, n) + exp (n K_{Y} (θ) - θ a) min \{1, \frac{2 ξ_{Y} (θ)}{\sqrt{n}}\}, and \end{matrix}

(31)

\begin{matrix} F_{X_{n}} (a) & ⩾ & η_{Y} (θ, a, n) - exp (n K_{Y} (θ) - θ a) min \{1, \frac{2 ξ_{Y} (θ)}{\sqrt{n}}\}, \end{matrix}

(32)

with

θ

∈

Θ_{Y}

.

The advantages of approximating

F_{X_{n}}

by using Theorem 2 instead of Theorem 1 are twofold. First, both the approximation

η_{Y}

and the corresponding approximation error depend on the exact value of a. In particular, the approximation can be optimized for each value of a via the parameter

θ

. Second, the parameter

θ

in (29) can be optimized to improve either the upper bound in (31) or the lower bound in (32) for some

a \in R

. Nonetheless, such optimizations are not necessarily simple.

An alternative to the optimization on

θ

in (31) and (32) is to choose

θ

such that it minimizes

n K_{Y} (θ) - θ a

. This follows the intuition that, for some values of a and n, the term

exp (n K_{Y} (θ) - θ a)

is the one that influences the most the value of the right-hand side of (29). To build upon this idea, consider the following lemma.

Lemma 1.

Consider a random variable Y with probability distribution

P_{Y}

and CGF

K_{Y}

. Given

n \in N

, let the function

h : R \to R

be defined for all

a \in R

satisfying

\frac{a}{n} \in int C_{Y}

, with

int C_{Y}

denoting the interior of the convex hull of

supp P_{X_{n}}

, as follows:

\begin{matrix} h (a) = inf_{θ \in Θ_{Y}} n K_{Y} (θ) - θ a, \end{matrix}

(33)

where

Θ_{Y}

is defined in (30). Then, the function h is concave and for all

a \in R

,

\begin{matrix} h (a) ⩽ h (n E_{P_{Y}} [Y]) = 0 . \end{matrix}

(34)

Furthermore,

\begin{matrix} h (a) = n K_{Y} (θ^{⋆}) - θ^{⋆} a, \end{matrix}

(35)

where

θ^{⋆}

is the unique solution in θ to

\begin{matrix} n K_{Y}^{(1)} (θ) = a, \end{matrix}

(36)

with

K_{Y}^{(1)}

is defined in (9).

Proof.

The proof of Lemma 1 is presented in Appendix B. ☐

Given

(a, n) \in R \times N

, the value of

h (a)

in (33) is the argument that minimizes the exponential term in (29). An interesting observation from Lemma 1 is that the maximum of h is zero, and it is reached when

a = n E_{P_{Y}} [Y] = E_{P_{X_{n}}} [X_{n}]

. In this case,

θ^{⋆} = 0

, and thus, from (31) and (32), it holds that

\begin{matrix} F_{X_{n}} (a) & ⩽ & η_{Y} (0, a, n) + min \{1, \frac{2 ξ_{Y} (0)}{\sqrt{n}}\} \end{matrix}

\begin{matrix} = & F_{Z_{n}} (a) + min \{1, \frac{2 ξ_{Y} (0)}{\sqrt{n}}\}, and \\ F_{X_{n}} (a) & ⩾ & η_{Y} (0, a, n) - min \{1, \frac{2 ξ_{Y} (0)}{\sqrt{n}}\} \end{matrix}

(37)

\begin{matrix} = & F_{Z_{n}} (a) - min \{1, \frac{2 ξ_{Y} (0)}{\sqrt{n}}\}, \end{matrix}

(38)

where

F_{Z_{n}}

is the CDF defined in Theorem 1. Hence, the upper bound in (37) and the lower bound in (38) obtained from Theorem 2 are worse than those in (26) and (27) obtained from Theorem 1. In a nutshell, for values of a around the vicinity of

n E_{P_{Y}} [Y] = E_{P_{X_{n}}} [X_{n}]

, it is more interesting to use Theorem 1 instead of Theorem 2.

Alternatively, given that h is non-positive and concave, when

|a - n E_{P_{Y}} [Y]|

=

| a -

E_{P_{X_{n}}} [X_{n}] |

>

γ

, with

γ

sufficiently large, it follows that

\begin{matrix} exp (n K_{Y} (θ^{⋆}) - θ^{⋆} a) < min \{1, \frac{ξ_{Y} (0)}{\sqrt{n}}\}, \end{matrix}

(39)

with

θ^{⋆}

defined in (36). Hence, in this case, the right-hand side of (29) is always smaller than the right-hand side of (25). That is, for such values of a and n, the upper and lower bounds in (31) and (32) are better than those in (26) and (27), respectively. The following theorem leverages this observation.

Theorem 3.

Let

Y_{1}

,

Y_{2}

, …,

Y_{n}

be i.i.d. random variables with probability distribution

P_{Y}

and CGF

K_{Y}

. Let also

F_{X_{n}}

be the CDF of the random variable

X_{n} = Y_{1} + Y_{2} + \dots + Y_{n}

. Hence, for all a∈

int C_{X_{n}}

, with

int C_{X_{n}}

the interior of the convex hull of

supp P_{X_{n}}

, it holds that

\begin{matrix} |F_{X_{n}} (a) - {\hat{F}}_{X_{n}} (a)| ⩽ exp (n K_{Y} (θ^{⋆}) - θ^{⋆} a) min \{1, \frac{2 ξ_{Y} (θ^{⋆})}{\sqrt{n}}\}, \end{matrix}

(40)

where

θ^{⋆}

is defined in (36), and the functions

{\hat{F}}_{X_{n}}

and

ξ_{Y}

are defined in (12), and (24), respectively.

Proof.

The proof of Theorem 3 is presented in Appendix C. ☐

An immediate result from Theorem 3 gives the following upper and lower bounds on

F_{X} (a)

, for all

a \in R

,

\begin{matrix} F_{X_{n}} (a) & ⩽ & {\hat{F}}_{X_{n}} (a) + exp (n K_{Y} (θ^{⋆}) - θ^{⋆} a) min \{1, \frac{2 ξ_{Y} (θ^{⋆})}{\sqrt{n}}\} ≜ \bar{Ω} (a, n), and \end{matrix}

(41)

\begin{matrix} F_{X_{n}} (a) & ⩾ & {\hat{F}}_{X_{n}} (a) - exp (n K_{Y} (θ^{⋆}) - θ^{⋆} a) min \{1, \frac{2 ξ_{Y} (θ^{⋆})}{\sqrt{n}}\} ≜ \underset{̲}{Ω} (a, n) . \end{matrix}

(42)

The following section presents two examples that highlight the observations mentioned above.

2.3. Examples

Example 1

(Discrete random variable). Let the random variables

Y_{1}

,

Y_{2}

, …,

Y_{n}

in (1) be i.i.d. Bernoulli random variables with parameter

p = 0.2

and

n = 100

. In this case,

E_{P_{X_{n}}} [X_{n}] = n E_{P_{Y}} [Y] = 20

. Figure 1 depicts the CDF

F_{X_{100}}

of

X_{100}

in (1); the normal approximation

F_{Z_{100}}

in (25); and the saddlepoint approximation

{\hat{F}}_{X_{100}}

in (12). Therein, it is also depicted the upper and lower bounds due to the normal approximation

\bar{Σ}

in (26) and

\underset{̲}{Σ}

in (27), respectively; and the upper and lower bounds due to the saddlepoint approximation

\bar{Ω}

in (41) and

\underset{̲}{Ω}

in (42), respectively. These functions are plotted as a function of a, with a∈

[5, 35]

.

Example 2

(Continuous random variable). Let the random variables

Y_{1}

,

Y_{2}

, …,

Y_{n}

in (1) be i.i.d. chi-squared random variables with parameter

k = 1

and

n = 50

. In this case,

E_{P_{X_{n}}} [X_{n}] = n E_{P_{Y}} [Y] = 50

. Figure 2 depicts the CDF

F_{X_{50}}

of

X_{50}

in (1); the normal approximation

F_{Z_{50}}

in (25); and the saddlepoint approximation

{\hat{F}}_{X_{50}}

in (12). Therein, it is also depicted the upper and lower bounds due to the normal approximation

\bar{Σ}

in (26) and

\underset{̲}{Σ}

in (27), respectively; and the upper and lower bounds due to the saddlepoint approximation

\bar{Ω}

in (41) and

\underset{̲}{Ω}

in (42), respectively. These functions are plotted as a function of a, with a∈

[0, 100]

.

3. Application to Information Theory: Channel Coding

This section focuses on the study of the DEP in point-to-point memoryless channels. The problem is formulated in Section 3.1. The main results presented in this section consist of lower and upper bounds on the DEP. The former, which are obtained building upon the existing DT bound [10], are presented in Section 3.2. The latter, which are obtained from the MC bound [10], are presented in Section 3.3.

3.1. System Model

Consider a point-to-point communication in which a transmitter aims at sending information to one receiver through a noisy memoryless channel. Such a channel can be modeled by a random transformation

\begin{matrix} (X^{n}, Y^{n}, P_{Y | X}), \end{matrix}

(43)

where

n \in

N

is the blocklength and

X

and

Y

are the channel input and channel output sets. Given the channel inputs

x

=

(x_{1}

,

x_{2}

, …,

x_{n})

∈

X^{n}

, the outputs

y

=

(y_{1}

,

y_{2}

, …,

y_{n})

∈

Y^{n}

are observed at the receiver with probability

\begin{matrix} P_{Y | X} (y | x) = \prod_{t = 1}^{n} P_{Y | X} (y_{t} | x_{t}), \end{matrix}

(44)

where, for all

x \in X

,

P_{Y | X = x} \in ▵ (Y)

, with

▵ (Y)

, the set of all possible probability distributions whose support is a subset of

Y

. The objective of the communication is to transmit a message index i, which is a realization of a random variable W that is uniformly distributed over the set

\begin{matrix} W ≜ {1, 2, \dots, M}, \end{matrix}

(45)

with 1 <M<∞. To achieve this objective, the transmitter uses an

(n

, M,

λ)

-code, where

λ

∈

[0, 1]

.

Definition 1

(

(n

, M,

λ)

-code). Given a tuple

(M

, n,

λ)

∈

N^{2}

×

[0, 1]

, an

(n

, M,

λ)

-code for the random transformation in (43) is a system

\begin{matrix} \{(u (1), D (1)), (u (2), D (2)), \dots, (u (M), D (M))\}, \end{matrix}

(46)

where for all

(j, ℓ) \in W^{2}

, with

j \neq ℓ

:

\begin{matrix} u (j) & = & (u_{1} (j), u_{2} (j), \dots, u_{n} (j)) \in X^{n}, \end{matrix}

(47a)

\begin{matrix} D (j) \cap D (ℓ) & = & \emptyset, \end{matrix}

(47b)

\begin{matrix} ⋃_{j \in W} D (j) & \subseteq & Y^{n}, and \end{matrix}

(47c)

\begin{matrix} \frac{1}{M} \sum_{i = 1}^{M} E_{P_{Y | X = u (i)}} [1_{\{Y \notin D (i)\}}] & ⩽ & λ . \end{matrix}

(47d)

To transmit message index i∈

W

, the transmitter uses the codeword

u (i)

. For all t∈{ 1,2,…,

n}

, at channel use t, the transmitter inputs the symbol

u_{t} (i)

into the channel. Assume that, at the end of channel use t, the receiver observes the output

y_{t}

. After n channel uses, the receiver uses the vector

y

=

(y_{1}

,

y_{2}

,…,

y_{n})

and determines that the symbol j was transmitted if

y

∈

D (j)

, with j∈

W

.

Given the

(n

,M,

λ)

-code described by the system in (46), the DEP of the message index i can be computed as

E_{P_{Y | X = u (i)}} [

1_{{Y \notin D (i)}}]

. As a consequence, the average DEP is

\frac{1}{M} \sum_{i = 1}^{M} E_{P_{Y | X = u (i)}} [1_{{Y \notin D (i)}}] .

(48)

Note that, from (47d), the average DEP of such an

(n, M, λ)

-code is upper bounded by

λ

. Given a fixed pair

(n

,

M)

∈

N^{2}

, the minimum

λ

for which an

(n

,M,

λ)

-code exists is defined hereunder.

Definition 2.

Given a pair

(n

,

M)

∈

N^{2}

, the minimum average DEP for the random transformation in (43), denoted by

λ^{*} (n, M)

, is given by

\begin{matrix} λ^{*} (n, M) = min \{λ \in [0, 1] : \exists (n, M, λ) - code\} . \end{matrix}

(49)

When

λ

is chosen accordingly with the reliability constraints, an

(n, M, λ)

-code is said to transmit at an information rate

R = \frac{\log_{2} (M)}{n}

bits per channel use.

The remainder of this section introduces the DT and MC bounds. The DT bound is one of the tightest existing upper bounds on

λ^{*} (n, M)

in (49), whereas the MC bound is one of the tightest lower bounds.

3.2. Dependence Testing Bound

This section describes an upper bound on

λ^{*} (n, M)

, for a fixed pair

(n, M) \in N^{2}

. Given a probability distribution

P_{X} \in ▵ (X^{n})

, let the random variable

ι (X; Y)

satisfy

ι (X; Y) ≜ \ln (\frac{d P_{X Y}}{d P_{X} P_{Y}} (X, Y)),

(50)

where the function

\frac{d P_{X Y}}{d P_{X} P_{Y}} : X^{n} \times Y^{n} \to R

denotes the Radon–Nikodym derivative of the joint probability measure

P_{X Y}

with respect to the product of probability measures

P_{X} P_{Y}

, with

P_{X Y} = P_{X} P_{Y | X}

and

P_{Y}

the corresponding marginal. Let the function

T : N^{2} \times ▵ (X^{n}) \to R_{+}

be for all

(n

,

M)

∈

N^{2}

and for all probability distributions

P_{X} \in ▵ (X^{n})

,

\begin{matrix} T (n, M, P_{X}) & = & E_{P_{X} P_{Y | X}} [1_{\{ι (X; Y) ⩽ \ln (\frac{M - 1}{2})\}}] + \frac{M - 1}{2} E_{P_{X} P_{Y}} [1_{\{ι (X; Y) > \ln (\frac{M - 1}{2})\}}] . \end{matrix}

(51)

Using this notation, the following lemma states the DT bound.

Lemma 2

(Dependence testing bound [10]). Given a pair

(n

,

M)

∈

N^{2}

, the following holds for all

P_{X} \in ▵ (X^{n})

, with respect to the random transformation in (43):

\begin{matrix} λ^{*} (n, M) ⩽ T (n, M, P_{X}), \end{matrix}

(52)

with the function T defined in (51).

Note that the input probability distribution

P_{X}

in Lemma 2 can be chosen among all possible probability distributions

P_{X} \in ▵ (X^{n})

to minimize the right-hand side of (52), which improves the bound. Note also that with some loss of optimality, the optimization domain can be restricted to the set of product probability distributions for which for all

x

∈

X^{n}

,

\begin{matrix} P_{X} (x) = \prod_{t = 1}^{n} P_{X} (x_{t}), \end{matrix}

(53)

with

P_{X} \in ▵ (X)

. Hence, subject to (44), the random variable

ι (X; Y)

in (50) can be written as the sum of i.i.d. random variables, i.e.,

\begin{matrix} ι (X; Y) = \sum_{t = 1}^{n} ι (X_{t}; Y_{t}) . \end{matrix}

(54)

This observation motivates the application of the results of Section 2 to provide upper and lower bounds on the function T in (51), for some given values

(n, M) \in N^{2}

and a given distribution

P_{X} \in ▵ (X^{n})

for the random transformation in (43) subject to (44). These bounds become significantly relevant when the exact value of

T (n, M, P_{X})

cannot be calculated with respect to the random transformation in (43). In such a case, providing upper and lower bounds on

T (n, M, P_{X})

helps in approximating its exact value subject to an error sufficiently small such that the approximation is relevant.

3.2.1. Normal Approximation

This section describes the normal approximation of the function T in (51). That is, the random variable

ι (X; Y)

is assumed to satisfy (54) and to follow a Gaussian distribution. More specifically, for all

P_{X} \in ▵ (X)

, let

\begin{matrix} μ (P_{X}) ≜ E_{P_{X} P_{Y | X}} [ι (X; Y)], \end{matrix}

(55)

\begin{matrix} σ (P_{X}) ≜ E_{P_{X} P_{Y | X}} [{(ι (X; Y) - μ (P_{X}))}^{2}], and \end{matrix}

(56)

\begin{matrix} ξ (P_{X}) ≜ c_{1} (\frac{E_{P_{X} P_{Y | X}} [| ι (X; Y) - μ (P_{X}) |^{3}]}{σ {(P_{X})}^{\frac{3}{2}}} + c_{2}), \end{matrix}

(57)

with

c_{1}

and

c_{2}

defined in (23), be functions of the input distribution

P_{X}

. In particular,

μ (P_{X})

and

σ (P_{X})

are respectively the first moment and the second central moment of the random variables

ι (X_{1}; Y_{1})

,

ι (X_{2}; Y_{2})

…

ι (X_{n}; Y_{n})

. Using this notation, consider the functions

D : N^{2} \times ▵ (X) \to R_{+}

and

N : N^{2} \times ▵ (X) \to R_{+}

such that for all

(n, M) \in N^{2}

and for all

P_{X} \in ▵ (X)

,

\begin{matrix} D (n, M, P_{X}) = max \{0, α (n, M, P_{X}) - \frac{ξ (P_{X})}{\sqrt{n}}\}, and \end{matrix}

(58)

\begin{matrix} N (n, M, P_{X}) = min \{1, α (n, M, P_{X}) + \frac{5 ξ (P_{X})}{\sqrt{n}} + \frac{2 \ln (2)}{σ {(P_{X})}^{\frac{1}{2}} \sqrt{2 n π}}\}, \end{matrix}

(59)

where

\begin{matrix} α (n, M, P_{X}) & ≜ & Q (\frac{n μ (P_{X}) - \ln (\frac{M - 1}{2})}{\sqrt{n σ (P_{X})}}) . \end{matrix}

(60)

Using this notation, the following theorem introduces lower and upper bounds on the function T in (51).

Theorem 4.

Given a pair

(n, M) \in N^{2}

, for all input distributions

P_{X} \in ▵ (X^{n})

subject to (53), the following holds with respect to the random transformation in (43) subject to (44),

\begin{matrix} D (n, M, P_{X}) ⩽ T (n, M, P_{X}) ⩽ N (n, M, P_{X}), \end{matrix}

(61)

where the functions T, D and N are defined in (51), (58) and (59), respectively.

Proof.

The proof of Theorem 4 is presented in [12]. Essentially, it relies on Theorem 1 for upper and lower bounding the terms

E_{P_{X} P_{Y | X}} [1_{\{ι (X; Y) ⩽ \ln (\frac{M - 1}{2})\}}]

in (51). The upper bound on

E_{P_{X} P_{Y}} [1_{\{ι (X; Y) > \ln (\frac{M - 1}{2})\}}]

in (51) follows from Lemma 47 in [10]. ☐

In [12], the function

α (n, M, P_{X})

in (60) is often referred to as the normal approximation of

T (n, M, P_{X})

, which is indeed a language abuse. In Section 2.1, a comment is given on the fact that the lower and upper bounds, i.e., the functions D in (58) and N in (59), are often too far from the normal approximation

α

in (60).

3.2.2. Saddlepoint Approximation

This section describes an approximation of the function T in (51) by using the saddlepoint approximation of the CDF of the random variable

ι (X; Y)

, as suggested in Section 2.2. Given a distribution

P_{X} \in ▵ (X)

, the moment generating function of

ι (X; Y)

is

\begin{matrix} φ (P_{X}, θ) ≜ E_{P_{X} P_{Y | X}} [exp (θ ι (X; Y))], \end{matrix}

(62)

with

θ \in R

. For all

P_{X} \in ▵ (X)

and for all

θ \in R

, consider the following functions:

\begin{matrix} μ (P_{X}, θ) ≜ E_{P_{X} P_{Y | X}} [\frac{ι (X; Y) exp (θ ι (X; Y))}{φ (P_{X}, θ)}], \end{matrix}

(63)

\begin{matrix} V (P_{X}, θ) ≜ E_{P_{X} P_{Y | X}} [{(ι (X; Y) - μ (P_{X}, θ))}^{2} \frac{exp (θ ι (X; Y))}{φ (P_{X}, θ)}], and \end{matrix}

(64)

\begin{matrix} ξ (P_{X}, θ) ≜ c_{1} (\frac{E_{P_{X} P_{Y | X}} [{|ι (X; Y) - μ (P_{X}, θ)|}^{3} \frac{exp (θ ι (X; Y))}{φ (P_{X}, θ)}]}{{(V (P_{X}, θ))}^{3 / 2}} + c_{2}), \end{matrix}

(65)

where

c_{1}

and

c_{2}

are defined in (23). Using this notation, consider the functions

β_{1} : N^{2} \times R \times ▵ (X) \to R_{+}

and

β_{2} : N^{2} \times R \times ▵ (X) \to R_{+}

:

\begin{matrix} β_{1} (n, M, θ, P_{X}) = 1_{{θ > 0}} + {(- 1)}^{1_{{θ > 0}}} exp (n \ln (φ (P_{X}, θ)) - θ \ln (\frac{M - 1}{2}) + \frac{1}{2} θ^{2} n V (P_{X}, θ)) Q (\sqrt{n V (P_{X}, θ)} | θ |), \end{matrix}

(66)

and

\begin{matrix} β_{2} (n, M, θ, P_{X}) \\ = 1_{{θ ⩽ - 1}} + {(- 1)}^{1_{{θ ⩽ - 1}}} exp (n \ln (φ (P_{X}, θ)) - (θ + 1) \ln (\frac{M - 1}{2}) + \frac{1}{2} {(θ + 1)}^{2} n V (P_{X}, θ)) Q (\sqrt{n V (P_{X}, θ)} | θ + 1 |) . \end{matrix}

(67)

Note that

β_{1}

is the saddlepoint approximation of the CDF of the random variable

ι (X; Y)

in (54) when

X

and

Y

follow the distribution

P_{X} P_{Y | X}

. Note also that

β_{2}

is the saddlepoint approximation of the complementary CDF of the random variable

ι (X; Y)

in (54) when

X

and

Y

follow the distribution

P_{X} P_{Y}

.

Consider also the following functions:

\begin{matrix} G_{1} (n, M, θ, P_{X}) = β_{1} (n, M, θ, P_{X}) - \frac{2 ξ (P_{X}, θ)}{\sqrt{n}} exp (n \ln (φ (P_{X}, θ)) - θ \ln (\frac{M - 1}{2})), \end{matrix}

(68)

\begin{matrix} G_{2} (n, M, θ, P_{X}) = β_{2} (n, M, θ, P_{X}) - \frac{2 ξ (P_{X}, θ)}{\sqrt{n}} exp (n \ln (φ (P_{X}, θ)) - (θ + 1) \ln (\frac{M - 1}{2})), \end{matrix}

(69)

\begin{matrix} G (n, M, θ, P_{X}) = max \{0, G_{1} (n, M, θ, P_{X})\} + \frac{M - 1}{2} max \{0, G_{2} (n, M, θ, P_{X})\}, and \end{matrix}

(70)

\begin{matrix} S (n, M, θ, P_{X}) = min \{1, β (n, M, θ, P_{X}) + \frac{4 ξ (P_{X}, θ)}{\sqrt{n}} exp (n \ln (φ (P_{X}, θ)) - θ \ln (\frac{M - 1}{2}))\}, \end{matrix}

(71)

where,

\begin{matrix} β (n, M, θ, P_{X}) = β_{1} (n, M, θ, P_{X}) + \frac{M - 1}{2} β_{2} (n, M, θ, P_{X}), \end{matrix}

(72)

with

β_{1}

in (66) and

β_{2}

in (67). Often, the function

β

in (72) is referred to as the saddlepoint approximation of the function T in (51), which is indeed a language abuse.

The following theorem introduces new lower and upper bounds on the function T in (51).

Theorem 5.

Given a pair

(n, M) \in N^{2}

, for all input distributions

P_{X} \in ▵ (X^{n})

subject to (53), the following holds with respect to the random transformation in (43) subject to (44),

\begin{matrix} G (n, M, θ, P_{X}) ⩽ T (n, M, P_{X}) ⩽ S (n, M, θ, P_{X}) \end{matrix}

(73)

where θ is the unique solution in t to

\begin{matrix} n μ (P_{X}, t) = \ln (\frac{M - 1}{2}), \end{matrix}

(74)

and the functions T, G, and S are defined in (51), (70), and (71).

Proof.

The proof of Theorem 5 is provided in Appendix F. In a nutshell, the proof relies on Theorem 3 for independently bounding the terms

E_{P_{X} P_{Y | X}} [1_{\{ι (X; Y) ⩽ \ln (\frac{M - 1}{2})\}}]

and

E_{P_{X} P_{Y}} [1_{\{ι (X; Y) > \ln (\frac{M - 1}{2})\}}]

in (51). ☐

3.3. Meta Converse Bound

This section describes a lower bound on

λ^{*} (n, M)

, for a fixed pair

(n, M) \in N^{2}

. Given two probability distributions

P_{X Y} \in ▵ (X^{n} \times Y^{n})

and

Q_{Y} \in ▵ (Y^{n})

, let the random variable

\tilde{ι} (X; Y | Q_{Y})

satisfy

\tilde{ι} (X; Y | Q_{Y}) ≜ \ln (\frac{d P_{X Y}}{d P_{X} Q_{Y}} (X, Y)) .

(75)

For all

(n

,M,

γ)

∈

N^{2} \times R

and for all probability distributions

P_{X} \in ▵ (X^{n})

and

Q_{Y} \in ▵ (Y^{n})

, let the function

C : N^{2} \times ▵ (X^{n}) \times ▵ (Y^{n}) \times R_{+} \to R_{+}

be

\begin{matrix} C (n, M, P_{X}, Q_{Y}, γ) ≜ E_{P_{X} P_{Y | X}} [1_{\{\tilde{ι} (X; Y | Q_{Y}) ⩽ \ln (γ)\}}] + γ (E_{P_{X} Q_{Y}} [1_{\{\tilde{ι} (X; Y | Q_{Y}) > \ln (γ)\}}] - \frac{1}{M}) . \end{matrix}

(76)

Using this notation, the following lemma describes the MC bound.

Lemma 3

(MC Bound [10,13]). Given a pair

(n

,

M)

∈

N^{2}

, the following holds for all

Q_{Y} \in Δ (Y^{n})

, with respect to the random transformation in (43):

\begin{matrix} λ^{*} (n, M) ⩾ inf_{P_{X} \in Δ (X^{n})} max_{γ ⩾ 0} C (n, M, P_{X}, Q_{Y}, γ), \end{matrix}

(77)

where the function C is defined in (76).

Note that the output probability distribution

Q_{Y}

in Lemma 3 can be chosen among all possible probability distributions

Q_{Y} \in ▵ (Y^{n})

to maximize the right-hand side of (76), which improves the bound. Note also that, with some loss of optimality, the optimization domain can be restricted to the set of probability distributions for which for all

y

∈

Y^{n}

,

\begin{matrix} Q_{Y} (y) = \prod_{t = 1}^{n} Q_{Y} (y_{t}), \end{matrix}

(78)

with

Q_{Y} \in ▵ (Y)

. Hence, subject to (44), for all

x \in X^{n}

, the random variable

\tilde{ι} (x; Y | Q_{Y})

in (76) can be written as the sum of the independent random variables, i.e.,

\begin{matrix} \tilde{ι} (x; Y | Q_{Y}) = \sum_{t = 1}^{n} \tilde{ι} (x_{t}; Y_{t} | Q_{Y}) . \end{matrix}

(79)

With some loss of generality, the focus is on a channel transformation of the form in (43) for which the following condition holds: The infimum in (77) is achieved by a product distribution, i.e.,

P_{X}

is of the form in (53), when the probability distribution

Q_{Y}

satisfies (78). Note that this condition is met by memoryless channels such as the BSC, the AWGN and S

α

S channels with binary antipodal inputs, i.e., input alphabets are of the form

X = {a, - a}

, with

a \in R

. This follows from the fact that the random variable

\tilde{ι} (x; Y | Q_{Y})

is invariant of the choice of

x \in X^{n}

when the probability distribution

Q_{Y}

satisfies (78) and for all

y \in Y

,

\begin{matrix} Q_{Y} (y) = \frac{P_{Y | X} (y | - a) + P_{Y | X} (y | a)}{2} . \end{matrix}

(80)

Under these conditions, the random variable

\tilde{ι} (X; Y | Q_{Y})

in (76) can be written as the sum of i.i.d. random variables, i.e.,

\begin{matrix} \tilde{ι} (X; Y | Q_{Y}) = \sum_{t = 1}^{n} \tilde{ι} (X_{t}; Y_{t} | Q_{Y}) . \end{matrix}

(81)

This observation motivates the application of the results of Section 2 to provide upper and lower bounds on the function C in (76), for some given values

(n, M) \in N^{2}

and given distributions

P_{X} \in ▵ (X^{n})

and

Q_{Y} \in ▵ (Y^{n})

. These bounds become significantly relevant when the exact value of

C (n, M, P_{X}, Q_{Y}, γ)

cannot be calculated with respect to the random transformation in (43). In such a case, providing upper and lower bounds on

C (n, M, P_{X}, Q_{Y}, γ)

helps in approximating its exact value subject to an error sufficiently small such that the approximation is relevant.

3.3.1. Normal Approximation

This section describes the normal approximation of the function C in (76), that is to say, the random variable

\tilde{ι} (X; Y | Q_{Y})

is assumed to satisfy (81) and to follow a Gaussian distribution. More specifically, for all

(P_{X}, Q_{Y}) \in ▵ (X) \times ▵ (Y)

, let

\begin{matrix} \tilde{μ} (P_{X}, Q_{Y}) ≜ E_{P_{X} P_{Y | X}} [\tilde{ι} (X; Y | Q_{Y})], \end{matrix}

(82)

\begin{matrix} \tilde{σ} (P_{X}, Q_{Y}) ≜ E_{P_{X} P_{Y | X}} [{(\tilde{ι} (X; Y | Q_{Y}) - \tilde{μ} (P_{X}, Q_{Y}))}^{2}], and \end{matrix}

(83)

\begin{matrix} \tilde{ξ} (P_{X}, Q_{Y}) ≜ c_{1} (\frac{E_{P_{X} P_{Y | X}} [| \tilde{ι} (X; Y | Q_{Y}) - \tilde{μ} (P_{X}, Q_{Y}) |^{3}]}{{(\tilde{σ} (P_{X}, Q_{Y}))}^{3 / 2}} + c_{2}) \end{matrix}

(84)

with

c_{1}

and

c_{2}

defined in (23), be functions of the input and output distributions

P_{X}

and

Q_{Y}

, respectively. In particular,

\tilde{μ} (P_{X}, Q_{Y})

and

\tilde{σ} (P_{X}, Q_{Y})

are respectively the first moment and the second central moment of the random variables

\tilde{ι} (X_{1}; Y_{1} | Q_{Y}), \tilde{ι} (X_{2}; Y_{2} | Q_{Y}), \dots \tilde{ι} (X_{n}; Y_{n} | Q_{Y})

. Using this notation, consider the functions

\tilde{D} : N^{2} \times ▵ (X) \times ▵ (Y) \times R_{+} \to R_{+}

and

\tilde{N} : N^{2} \times ▵ (X) \times ▵ (Y) \times R_{+} \to R_{+}

such that, for all

(n, M, γ) \in N^{2} \times R_{+}

and for all

P_{X} \in ▵ (X)

and for all

Q_{Y} \in ▵ (Y)

,

\begin{matrix} \tilde{D} (n, M, P_{X}, Q_{Y}, γ) = max \{0, \tilde{α} (n, M, P_{X}, Q_{Y}, γ) - \frac{\tilde{ξ} (P_{X}, Q_{Y})}{\sqrt{n}}\}, and \end{matrix}

(85)

\begin{matrix} \tilde{N} (n, M, P_{X}, Q_{Y}, γ) = min \{1, \tilde{α} (n, M, P_{X}, Q_{Y}, γ) + \frac{5 \tilde{ξ} (P_{X}, Q_{Y})}{\sqrt{n}} + \frac{2 \ln (2)}{\tilde{σ} {(P_{X}, Q_{Y})}^{\frac{1}{2}} \sqrt{2 n π}}\}, \end{matrix}

(86)

where

\begin{matrix} \tilde{α} (n, M, P_{X}, Q_{Y}, γ) & ≜ & Q (\frac{n \tilde{μ} (P_{X}, Q_{Y}) - \ln (γ)}{\sqrt{n \tilde{σ} (P_{X}, Q_{Y})}}) - \frac{γ}{M} . \end{matrix}

(87)

Using this notation, the following theorem introduces lower and upper bounds on the function C in (76).

Theorem 6.

Given a pair

(n, M) \in N^{2}

, for all input distributions

P_{X} \in ▵ (X^{n})

subject to (53), for all output distributions

Q_{Y} \in ▵ (Y^{n})

subject to (78), and for all

γ ⩾ 0

, the following holds with respect to the random transformation in (43) subject to (44),

\begin{matrix} \tilde{D} (n, M, P_{X}, Q_{Y}, γ) ⩽ C (n, M, P_{X}, Q_{Y}, γ) ⩽ \tilde{N} (n, M, P_{X}, Q_{Y}, γ), \end{matrix}

(88)

where the functions C,

\tilde{D}

, and

\tilde{N}

are defined in (76), (85), and (86), respectively.

Proof.

The proof of Theorem 6 is partially presented in [10]. Essentially, it relies on Theorem 1 for upper and lower bounding the term

E_{P_{X} P_{Y | X}} [1_{\{\tilde{ι} (X; Y | Q_{Y}) ⩽ \ln (γ)\}}]

in (76); and using Lemma 47 in [10] for upper bounding the term

E_{P_{X} Q_{Y}} [1_{\{\tilde{ι} (X; Y | Q_{Y}) > \ln (γ)\}}]

in (76). ☐

The function

\tilde{α} (n, M, P_{X}, Q_{Y}, γ)

in (87) is often referred to as the normal approximation of

C (n, M, P_{X})

, which is indeed a language abuse. In Section 2.1, a comment is given on the fact that the lower and upper bounds on the normal approximation, i.e., the functions

\tilde{D}

in (85) and

\tilde{N}

in (86), are often too far from the normal approximation

\tilde{α}

in (87).

3.3.2. Saddlepoint Approximation

This section describes an approximation of the function C in (76) by using the saddlepoint approximation of the CDF of the random variable

\tilde{ι} (X; Y | Q_{Y})

, as suggested in Section 2.2. Given two distributions

P_{X} \in ▵ (X)

and

Q_{Y} \in ▵ (Y)

, let the random variable

\tilde{ι} (X; Y | Q_{Y})

satisfy

\begin{matrix} \tilde{ι} (X; Y | Q_{Y}) ≜ \ln (\frac{d P_{X} P_{Y | X}}{d P_{X} Q_{Y}} (X, Y)), \end{matrix}

(89)

where

P_{Y | X}

is in (44). The moment generating function of

\tilde{ι} (X; Y | Q_{Y})

is

\begin{matrix} \tilde{φ} (P_{X}, Q_{Y}, θ) ≜ E_{P_{X} P_{Y | X}} [exp (θ \tilde{ι} (X; Y | Q_{Y}))], \end{matrix}

(90)

with

θ \in R

. For all

P_{X} \in ▵ (X)

and

Q_{Y} \in ▵ (Y)

, and for all

θ \in R

, consider the following functions:

\begin{matrix} \tilde{μ} (P_{X}, Q_{Y}, θ) ≜ E_{P_{X} P_{Y | X}} [\frac{\tilde{ι} (X; Y | Q_{Y}) exp (θ \tilde{ι} (X; Y | Q_{Y}))}{\tilde{φ} (P_{X}, Q_{Y}, θ)}], \end{matrix}

(91)

\begin{matrix} \tilde{V} (P_{X}, Q_{Y}, θ) ≜ E_{P_{X} P_{Y | X}} [{(\tilde{ι} (X; Y | Q_{Y}) - \tilde{μ} (P_{X}, Q_{Y}, θ))}^{2} \frac{exp (θ \tilde{ι} (X; Y | Q_{Y}))}{\tilde{φ} (P_{X}, Q_{Y}, θ)}], and \end{matrix}

(92)

\begin{matrix} \tilde{ξ} (P_{X}, Q_{Y}, θ) ≜ c_{1} (\frac{E_{P_{X} P_{Y | X}} [{|\tilde{ι} (X; Y | Q_{Y}) - \tilde{μ} (P_{X}, Q_{Y}, θ)|}^{3} \frac{exp (θ \tilde{ι} (X; Y | Q_{Y}))}{\tilde{φ} (P_{X}, Q_{Y}, θ)}]}{{(\tilde{V} (P_{X}, Q_{Y}, θ))}^{3 / 2}} + c_{2}), \end{matrix}

(93)

where

c_{1}

and

c_{2}

are defined in (23). Using this notation, consider the functions

{\tilde{β}}_{1} : N \times R_{+} \times R \times ▵ (X) \times ▵ (Y) \to R_{+}

and

{\tilde{β}}_{2} : N \times R_{+} \times R \times ▵ (X) \times ▵ (Y) \to R_{+}

:

\begin{matrix} {\tilde{β}}_{1} (n, γ, θ, P_{X}, Q_{Y}) \end{matrix}

\begin{matrix} = 1_{{θ > 0}} + {(- 1)}^{1_{{θ > 0}}} exp (n \ln (\tilde{φ} (P_{X}, Q_{Y}, θ)) - θ \ln (γ) + \frac{1}{2} θ^{2} n \tilde{V} (P_{X}, Q_{Y}, θ)) Q (\sqrt{n \tilde{V} (P_{X}, Q_{Y}, θ)} | θ |), and \\ {\tilde{β}}_{2} (n, γ, θ, P_{X}, Q_{Y}) \\ = 1_{{θ ⩽ - 1}} + {(- 1)}^{1_{{θ ⩽ - 1}}} exp (n \ln (\tilde{φ} (P_{X}, Q_{Y}, θ)) - (θ + 1) \ln (γ) + \frac{1}{2} {(θ + 1)}^{2} n \tilde{V} (P_{X}, Q_{Y}, θ)) \end{matrix}

(94)

\begin{matrix} Q (\sqrt{n \tilde{V} (P_{X}, Q_{Y}, θ)} | θ + 1 |) . \end{matrix}

(95)

Note that

{\tilde{β}}_{1}

and

{\tilde{β}}_{2}

are the saddlepoint approximation of the CDF and the complementary CDF of the random variable

\tilde{ι} (X; Y | Q_{Y})

in (81) when

(X, Y)

follows the distribution

P_{X} P_{Y | X}

and

P_{X} Q_{Y}

, respectively. Consider also the following functions:

\begin{matrix} {\tilde{G}}_{1} (n, γ, θ, P_{X}, Q_{Y}) = {\tilde{β}}_{1} (n, γ, θ, P_{X}, Q_{Y}) - \frac{2 \tilde{ξ} (P_{X}, Q_{Y}, θ)}{\sqrt{n}} exp (n \ln (\tilde{φ} (P_{X}, Q_{Y}, θ)) - θ \ln (γ)), \end{matrix}

(96)

\begin{matrix} {\tilde{G}}_{2} (n, γ, θ, P_{X}, Q_{Y}) = {\tilde{β}}_{2} (n, γ, θ, P_{X}, Q_{Y}) - \frac{2 \tilde{ξ} (P_{X}, Q_{Y}, θ)}{\sqrt{n}} exp (n \ln (\tilde{φ} (P_{X}, Q_{Y}, θ)) - (θ + 1) \ln (γ)), \end{matrix}

(97)

\begin{matrix} \tilde{G} (n, γ, θ, P_{X}, Q_{Y}, M) = max \{0, {\tilde{G}}_{1} (n, γ, θ, P_{X}, Q_{Y})\} + γ max \{0, {\tilde{G}}_{2} (n, γ, θ, P_{X}, Q_{Y})\} - \frac{γ}{M}, \end{matrix}

(98)

\begin{matrix} \tilde{S} (n, γ, θ, P_{X}, Q_{Y}, M) = min \{1, \tilde{β} (n, γ, θ, P_{X}, Q_{Y}, M) + \frac{4 \tilde{ξ} (P_{X}, Q_{Y}, θ)}{\sqrt{n}} exp (n \ln (\tilde{φ} (P_{X}, Q_{Y}, θ)) - θ \ln (γ))\}, \end{matrix}

(99)

and

\tilde{β} (n, γ, θ, P_{X}, Q_{Y}, M) = {\tilde{β}}_{1} (n, γ, θ, P_{X}, Q_{Y}) + γ {\tilde{β}}_{2} (n, γ, θ, P_{X}, Q_{Y}) - \frac{γ}{M} .

(100)

The function

\tilde{β} (n, γ, θ, P_{X}, Q_{Y}, M)

in (100) is referred to as the saddlepoint approximation of the function C in (76), which is indeed a language abuse.

The following theorem introduces new lower and upper bounds on the function C in (76).

Theorem 7.

Given a pair

(n, M) \in N^{2}

, for all input distributions

P_{X} \in ▵ (X^{n})

subject to (53), for all output distributions

Q_{Y} \in ▵ (Y^{n})

subject to (81) such that for all

x \in X

,

P_{Y | X = x}

is absolutely continuous with respect to

Q_{Y}

, for all

γ ⩾ 0

, the following holds with respect to the random transformation in (43) subject to (44),

\begin{matrix} \tilde{G} (n, γ, θ, P_{X}, Q_{Y}, M) ⩽ C (n, M, P_{X}, Q_{Y}, γ) ⩽ \tilde{S} (n, γ, θ, P_{X}, Q_{Y}, M) \end{matrix}

(101)

where θ is the unique solution in t to

\begin{matrix} n μ (P_{X}, t) = \ln (γ), \end{matrix}

(102)

and the functions C,

\tilde{G}

, and

\tilde{S}

are defined in (76), (98) and (99).

Proof.

The proof of Theorem 7 is provided in Appendix G. ☐

Note that, in (101), the parameter

γ

can be optimized as in (77).

3.4. Numerical Experimentation

The normal and the saddlepoint approximations of the DT and MC bounds as well as their corresponding upper and lower bounds presented from Section 3.2.1 to Section 3.3.2 are studied in the cases of the BSC, the AWGN channel, and the S

α

S channel. The latter is defined by the random transformation in (43) subject to (44) and for all

(x, y) \in X \times Y

:

\begin{matrix} P_{Y | X} (y | x) = P_{Z} (y - x), \end{matrix}

(103)

where

P_{Z}

is a probability distribution satisfying for all

t \in R

,

\begin{matrix} E_{P_{Z}} [exp (i t Z)] = exp (- {|σ t|}^{α}), \end{matrix}

(104)

with

i = \sqrt{- 1}

. The reals

α \in (0, 2]

and

σ \in R_{+}

in (104) are parameters of the S

α

S channel.

In the following figures, Figure 3, Figure 4 and Figure 5, the channel inputs are discrete

X = {- 1, 1}

,

P_{X}

is the uniform distribution, and

θ

is chosen to be the unique solution to t in (74) or (102) depending on whether the DT or MC bound is considered. For the results relative to the MC bound,

Q_{Y}

is chosen to be equal to the distribution

P_{Y}

, i.e., the marginal of

P_{X} P_{Y | X}

. The parameter

γ

is chosen to maximize the function

C (n, 2^{n R}, P_{X}, Q_{Y}, γ)

in (76). The plots in Figure 3a, Figure 4a and Figure 5a illustrate the function

T (n, 2^{n R}, P_{X})

in (51) as well as the bounds in Theorems 4 and 5. Figure 3b, Figure 4b and Figure 5b illustrate the function C in (76) and the bounds in Theorems 6 and 7. The normal approximations, i.e,

α (n, 2^{n R}, P_{X})

in (60) and

\tilde{α} (n, 2^{n R}, P_{X}, Q_{Y}, γ)

in (87), of the DT and MC bounds, respectively, are plotted in black diamonds. The upper bounds, i.e.,

N (n, 2^{n R}, P_{X})

in (59) and

\tilde{N} (n, 2^{n R}, P_{X}, Q_{Y}, γ)

in (86), are plotted in blue squares. The lower bounds of the DT and MC bounds, i.e.,

D (n, M, P_{X})

in (58) and

\tilde{D} (n, 2^{n R}, P_{X}, Q_{Y}, γ)

in (85), are non-positive in these cases, and thus do not appear in the figures. The saddlepoint approximations of the DT and MC bounds, i.e.,

β (n, 2^{n R}, θ, P_{X})

in (72) and

\tilde{β} (n, γ, θ, P_{X}, Q_{Y}, 2^{n R})

in (100), respectively, are plotted in black stars. The upper bounds, i.e.,

S (n, 2^{n R}, θ, P_{X})

in (71) and

\tilde{S} (n, γ, θ, P_{X}, Q_{Y}, 2^{n R})

in (99), are plotted in blue upward-pointing triangles. The lower bounds, i.e.,

G (n, 2^{n R}, θ, P_{X})

in (70) and

\tilde{G} (n, γ, θ, P_{X}, Q_{Y}, 2^{n R})

in (98), are plotted in red downward-pointing triangles.

Figure 3 illustrates the case of a BSC with cross-over probability

δ = 0.11

. The information rates are chosen to be

R = 0.32

and

R = 0.42

bits per channel use in Figure 3a,b, respectively. The functions T and C can be calculated exactly and thus they are plotted in magenta asterisks in Figure 3a,b, respectively. In these figures, it can be observed that the saddlepoint approximations of the DT and MC bounds, i.e.,

β

and

\tilde{β}

, respectively, overlap with the functions T and C. These observations are in line with those reported in [13]. Therein, the saddlepoint approximations of the RCU bound and the MC bound are both shown to be precise approximations. Alternatively, the normal approximations of the DT and MC bounds, i.e.,

α

and

\tilde{α}

, do not overlap with T and C respectively.

In Figure 3, it can be observed that the new bounds on the DT and MC provided in Theorems 5 and 7, respectively, are tighter than those in Theorems 4 and 6. Indeed, the upper-bounds N and

\tilde{N}

on the DT and MC bounds derived from the normal approximations

α

and

\tilde{α}

, are several order of magnitude above T and C, respectively. This observation remains valid for AWGN channels in Figure 4 and S

α

S channels in Figure 5, respectively. Note that, in Figure 3a, for

n > 1000

, the normal approximation

α

is below the lower bound G showing that approximating T by

α

is too optimistic. These results show that the use of the Berry–Esseen Theorem to approximate the DT and MC bounds may lead to erroneous conclusions due to the uncontrolled error made on the approximation.

Figure 4 and Figure 5 illustrate the cases of a real-valued AWGN channel and a S

α

S channel, respectively. The signal-to-noise ratio (SNR) is

SNR = 1

for the AWGN channel. The information rate is

R = 0.425

bits per channel use for the AWGN channel and

R = 0.38

bits per channel use for the S

α

S channel with

(α, σ) = (1.4, 0.6)

. In both cases, the functions T in (51) and C in (76) can not be computed explicitly and hence does not appear in Figure 4 and Figure 5. In addition, the lower bounds

D (n, M, P_{X})

and

\tilde{D} (n, 2^{n R}, P_{X}, Q_{Y}, γ)

obtained from Theorems 4 and 6 are non-positive in these cases, and thus, do not appear on these figures.

In Figure 4, note that the saddlepoint approximations,

β

and

\tilde{β}

, are well bounded by Theorems 5 and 7 for a large range of blocklengths. Alternatively, the lower bounds D and

\tilde{D}

based on the normal approximation do not even exist in that case.

In Figure 5, note that the upper bounds S and

\tilde{S}

on the DT and MC respectively are relatively tight compared to those in AWGN channel case. This characteristic is of a particular importance in a channel such as S

α

S channel, where the DT and MC bounds remain computable only by Monte Carlo simulations.

4. Discussion and Further Work

One of the main results of this work is Theorem 3, which gives an upper bound on the error induced by the saddlepoint approximation of the CDF of a sum of i.i.d. random variables. This result paves the way to study channel coding problems at any finite blocklength and any constraint on the DEP. In particular, Theorem 3 is used to bound the DT and MC bounds in point-to-point memoryless channels. This leads to tighter bounds than those obtained from Berry–Esseen Theorem (Theorem 1), cf., examples in Section 3.4, particularly for the small values of the DEP.

The bound on the approximation error presented in Theorem 2 uses a triangle inequality in the proof of Lemma A1, which is loose. This is essentially the reason why Theorem 2 is not reduced to the Berry–Esseen Theorem when the parameter

θ

is equal to zero. An interesting extension of this work is to tighten the inequality in Lemma A1 such that the Berry–Esseen Theorem can be obtained as a special case of Theorem 2, i.e., when

θ = 0

. If such improvement on Theorem 2 is possible, Theorem 3 will be significantly improved and it would be more precise everywhere and in particular in the vicinity of the mean of the sum in (1).

Author Contributions

All authors have equally contributed to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially funded by the French National Agency for Research (ANR) under grant ANR-16-CE25-0001.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 2

The proof of Theorem 2 relies on the notion of exponentially tilted distributions. Let

φ_{Y}

be the moment generating function of the distribution

P_{Y}

. Given

θ \in Θ_{Y}

, let

Y_{1}^{(θ)}

,

Y_{2}^{(θ)}

, …,

Y_{n}^{(θ)}

be random variables whose joint probability distribution, denoted by

P_{Y_{1}^{(θ)} Y_{2}^{(θ)} \dots Y_{n}^{(θ)}}

, satisfies for all

(y_{1}, y_{2}, \dots, y_{n}) \in R^{n}

,

\begin{matrix} \frac{d P_{Y_{1}^{(θ)} Y_{2}^{(θ)} \dots Y_{n}^{(θ)}}}{d P_{Y_{1} Y_{2} \dots Y_{n}}} (y_{1}, y_{2}, \dots, y_{n}) = \frac{exp (θ \sum_{j = 1}^{n} y_{j})}{{(φ_{Y} (θ))}^{n}} . \end{matrix}

(A1)

That is, the distribution

P_{Y_{1}^{(θ)} Y_{2}^{(θ)} \dots Y_{n}^{(θ)}}

is an exponentially tilted distribution with respect to

P_{Y_{1} Y_{2} \dots Y_{n}}

. Using this notation, for all

A \subseteq R

and for all

θ \in Θ_{Y}

,

\begin{matrix} P_{X_{n}} (A) & = & E_{P_{X_{n}}} [1_{{X_{n} \in A}}] \end{matrix}

(A2a)

\begin{matrix} = & E_{P_{Y_{1} Y_{2} \dots Y_{n}}} [1_{\{\sum_{j = 1}^{n} Y_{j} \in A\}}] \end{matrix}

(A2b)

\begin{matrix} = & E_{P_{Y_{1}^{(θ)} Y_{2}^{(θ)} \dots Y_{n}^{(θ)}}} [\frac{d P_{Y_{1} Y_{2} \dots Y_{n}}}{d P_{Y_{1}^{(θ)} Y_{2}^{(θ)} \dots Y_{n}^{(θ)}}} (Y_{1}^{(θ)}, Y_{2}^{(θ)}, \dots, Y_{n}^{(θ)}) 1_{\{\sum_{j = 1}^{n} Y_{j}^{(θ)} \in A\}}] \end{matrix}

(A2c)

\begin{matrix} = & E_{P_{Y_{1}^{(θ)} Y_{2}^{(θ)} \dots Y_{n}^{(θ)}}} [{(\frac{d P_{Y_{1}^{(θ)} Y_{2}^{(θ)} \dots Y_{n}^{(θ)}}}{d P_{Y_{1} Y_{2} \dots Y_{n}}} (Y_{1}^{(θ)}, Y_{2}^{(θ)}, \dots, Y_{n}^{(θ)}))}^{- 1} 1_{\{\sum_{j = 1}^{n} Y_{j}^{(θ)} \in A\}}] \end{matrix}

(A2d)

\begin{matrix} = & E_{P_{Y_{1}^{(θ)} Y_{2}^{(θ)} \dots Y_{n}^{(θ)}}} [{(\frac{exp (θ \sum_{j = 1}^{n} Y_{j}^{(θ)})}{{(φ_{Y} (θ))}^{n}})}^{- 1} 1_{\{\sum_{j = 1}^{n} Y_{j}^{(θ)} \in A\}}] \end{matrix}

(A2e)

\begin{matrix} = & {(φ_{Y} (θ))}^{n} E_{P_{Y_{1}^{(θ)} Y_{2}^{(θ)} \dots Y_{n}^{(θ)}}} [exp (- θ \sum_{j = 1}^{n} Y_{j}^{(θ)}) 1_{\{\sum_{j = 1}^{n} Y_{j}^{(θ)} \in A\}}] \end{matrix}

(A2f)

For the ease of the notation, consider the random variable

\begin{matrix} S_{n, θ} = \sum_{j = 1}^{n} Y_{j}^{(θ)}, \end{matrix}

(A3)

whose probability distribution is denoted by

P_{S_{n, θ}}

. Hence, plugging (A3) in (A2f) yields,

\begin{matrix} P_{X_{n}} (A) & = & {(φ_{Y} (θ))}^{n} E_{P_{S_{n, θ}}} [exp (- θ S_{n, θ}) 1_{\{S_{n, θ} \in A\}}] . \end{matrix}

(A4)

The proof continues by upper bounding the following absolute difference

\begin{matrix} |P_{X_{n}} (A) - {(φ_{Y} (θ))}^{n} E_{P_{Z_{n, θ}}} [exp (- θ Z_{n, θ}) 1_{\{Z_{n, θ} \in A\}}]|, \end{matrix}

(A5)

where

Z_{n, θ}

is a Gaussian random variable with the same mean and variance as

S_{n, θ}

, and probability distribution denoted by

P_{Z_{n, θ}}

. The relevance of the absolute difference in (A5) is that it is equal to the error of calculating

P_{X_{n}} (A)

under the assumption that the resulting random variable

S_{n}

follows a Gaussian distribution. The following lemma provides an upper bound on the absolute difference in (A5) in terms of the Kolmogorov–Smirnov distance between the distributions

P_{S_{n, θ}}

and

P_{Z_{n, θ}}

, denoted by

\begin{matrix} Δ (P_{S_{n, θ}}, P_{Z_{n, θ}}) ≜ sup_{x \in R} |F_{S_{n, θ}} (x) - F_{Z_{n, θ}} (x)|, \end{matrix}

(A6)

where

F_{S_{n, θ}}

and

F_{Z_{n, θ}}

are the CDFs of the random variables

S_{n, θ}

and

Z_{n, θ}

, respectively.

Lemma A1.

Given

θ \in Θ_{Y}

and

a \in R

, consider the following conditions:

(i): $θ ⩽ 0$ and $A = (- \infty, a]$ , and
(ii): $θ > 0$ and $A = (a, \infty)$ .

If at least one of the above conditions is satisfied, then the absolute difference in (A5) satisfies

\begin{matrix} |P_{X_{n}} (A) - {(φ_{Y} (θ))}^{n} E_{P_{Z_{n, θ}}} [exp (- θ Z_{n, θ}) 1_{\{Z_{n, θ} \in A\}}]| ⩽ \frac{{(φ_{Y} (θ))}^{n}}{exp (θ a)} min \{1, 2 Δ (P_{S_{n, θ}}, P_{Z_{n, θ}})\} . \end{matrix}

(A7)

Proof.

The proof of Lemma A1 is presented in Appendix D. ☐

The proof continues by providing an upper bound on

Δ (P_{S_{n, θ}}, P_{Z_{n, θ}})

in (A7) leveraging the observation that

S_{n, θ}

is the sum of n independent and identically distributed random variables. This follows immediately from the assumptions of Theorem 2, nonetheless, for the sake of completeness, the following lemma provides a proof of this statement.

Lemma A2.

For all

θ \in Θ_{Y}

,

Y_{1}^{(θ)}

,

Y_{2}^{(θ)}

, …,

Y_{n}^{(θ)}

are mutually independent and identically distributed random variables with probability distribution

P_{Y^{(θ)}}

. Moreover,

P_{Y^{(θ)}}

is an exponential tilted distribution with respect to

P_{Y}

. That is,

P_{Y^{(θ)}}

satisfies for all

y \in R

,

\begin{matrix} \frac{d P_{Y^{(θ)}}}{d P_{Y}} (y) = \frac{exp (θ y)}{φ_{Y} (θ)} . \end{matrix}

(A8)

Proof.

The proof of Lemma A2 is presented in Appendix E. ☐

Lemma A2 paves the way for obtaining an upper bound on

Δ (P_{S_{n, θ}}, P_{Z_{n, θ}})

in (A7) via the Berry–Esseen Theorem (Theorem 1). Let

μ_{θ}

,

V_{θ}

, and

T_{θ}

be the mean, the variance, and the third absolute central moment of the random variable

Y^{(θ)}

, whose probability distribution is

P_{Y^{(θ)}}

in (A8). More specifically:

\begin{matrix} μ_{θ} & = & E_{P_{Y^{(θ)}}} [Y^{(θ)}] = E_{P_{Y}} [\frac{Y exp (θ Y)}{φ_{Y} (θ)}], \end{matrix}

(A9)

\begin{matrix} V_{θ} & = & E_{P_{Y^{(θ)}}} [{(Y^{(θ)} - μ_{θ})}^{2}] = E_{P_{Y}} [\frac{{(Y - μ_{θ})}^{2} exp (θ Y)}{φ_{Y} (θ)}], and \end{matrix}

(A10)

\begin{matrix} T_{θ} & = & E_{P_{Y^{(θ)}}} [| Y^{(θ)} - μ_{θ} |^{3}] = E_{P_{Y}} [\frac{| Y - μ_{θ} |^{3} exp (θ Y)}{φ_{Y} (θ)}] . \end{matrix}

(A11)

Let also

ξ_{θ}

be

\begin{matrix} ξ_{θ} = c_{1} (\frac{T_{θ}}{V_{θ}^{3 / 2}} + c_{2}), \end{matrix}

(A12)

with

c_{1}

and

c_{2}

defined in (23).

From Theorem 1, it follows that

Δ (P_{S_{n, θ}}, P_{Z_{n, θ}})

in (A7) satisfies:

\begin{matrix} Δ (P_{S_{n, θ}}, P_{Z_{n, θ}}) ⩽ min \{1, \frac{ξ_{θ}}{\sqrt{n}}\} ⩽ \frac{ξ_{θ}}{\sqrt{n}} . \end{matrix}

(A13)

Plugging (A13) in (A7) yields

\begin{matrix} |P_{X_{n}} (A) - \frac{{(φ_{Y} (θ))}^{n}}{exp (θ b)} E_{P_{Z_{n, θ}}} [exp (- θ Z_{n, θ}) 1 \{Z_{n, θ} \in A\}]| ⩽ \frac{{(φ_{Y} (θ))}^{n}}{exp (θ a)} min \{1, 2 \frac{ξ_{θ}}{\sqrt{n}}\}, \end{matrix}

(A14)

under the assumption that at least one of the conditions of Lemma A1 is met.

The proof ends by obtaining a closed-form expression of the term

E_{P_{Z_{n, θ}}} [

exp (- θ Z_{n, θ})

1_{{Z_{n, θ} \in A}}

] in (A14) under the assumption that at least one of the conditions of Lemma A1 is met. First, assuming that condition

(i)

in Lemma A1 holds, it follows that:

\begin{matrix} E_{P_{Z_{n, θ}}} [exp (- θ Z_{n, θ}) 1_{{Z_{n, θ} \in A}}] \\ = \int_{- \infty}^{a} exp (- θ z) \frac{1}{\sqrt{2 π n V_{θ}}} exp (- \frac{{(z - n μ_{θ})}^{2}}{2 n V_{θ}}) d z \end{matrix}

(A15a)

\begin{matrix} = \int_{- \infty}^{a} \frac{1}{\sqrt{2 π n V_{θ}}} exp (- \frac{z^{2} - 2 z n μ_{θ} + n^{2} μ_{θ}^{2} + 2 n θ V_{θ} z}{2 n V_{θ}}) d z \end{matrix}

(A15b)

\begin{matrix} = \int_{- \infty}^{a} \frac{1}{\sqrt{2 π n V_{θ}}} exp (- \frac{{(z - n μ_{θ} + n θ V_{θ})}^{2} - n^{2} θ^{2} V_{θ}^{2} + 2 n μ_{θ} n θ V_{θ}}{2 n V_{θ}}) d z \end{matrix}

(A15c)

\begin{matrix} = exp (- θ n μ_{θ} + \frac{1}{2} n V_{θ} θ^{2}) \int_{- \infty}^{a} \frac{1}{\sqrt{2 π n V_{θ}}} exp (- \frac{{(z - n μ_{θ} + n θ V_{θ})}^{2}}{2 n V_{θ}}) d z \end{matrix}

(A15d)

\begin{matrix} = exp (- θ n μ_{θ} + \frac{1}{2} n V_{θ} θ^{2}) \int_{- \infty}^{\frac{a - n μ_{θ} + n θ V_{θ}}{\sqrt{n V_{θ}}}} \frac{1}{\sqrt{2 π}} exp (- \frac{t^{2}}{2}) d t \end{matrix}

(A15e)

\begin{matrix} = exp (- θ n μ_{θ} + \frac{1}{2} n V_{θ} θ^{2}) Q (- \frac{a - n μ_{θ} + n θ V_{θ}}{\sqrt{n V_{θ}}}) . \end{matrix}

(A15f)

Second, assuming that condition

(i i)

in Lemma A1 holds, it follows that:

\begin{matrix} E_{P_{Z_{n, θ}}} [exp (- θ Z_{n, θ}) 1_{{Z_{n, θ} \in A}}] & = & \int_{a}^{\infty} exp (- θ z) \frac{1}{\sqrt{2 π n V_{θ}}} exp (- \frac{{(z - n μ_{θ})}^{2}}{2 n V_{θ}}) d z \end{matrix}

(A16a)

\begin{matrix} = & exp (- θ n μ_{θ} + \frac{1}{2} n V_{θ} θ^{2}) \int_{\frac{a - n μ_{θ} + n θ V_{θ}}{\sqrt{n V_{θ}}}}^{\infty} \frac{1}{\sqrt{2 π}} exp (- \frac{t^{2}}{2}) d t \end{matrix}

(A16b)

\begin{matrix} = & exp (- θ n μ_{θ} + \frac{1}{2} n V_{θ} θ^{2}) Q (\frac{a - n μ_{θ} + n θ V_{θ}}{\sqrt{n V_{θ}}}), \end{matrix}

(A16c)

where Q in n (A15f) and (A16c) is the complementary CDF of the standard Gaussian distribution defined in (13).

The expressions in (A15f) and (A16c) can be jointly written as follows:

\begin{matrix} E_{P_{Z_{n, θ}}} [exp (- θ Z_{n, θ}) 1_{{Z_{n, θ} \in A}}] = exp (- θ n μ_{θ} + \frac{1}{2} n V_{θ} θ^{2}) Q ({(- 1)}^{1_{{θ ⩽ 0}}} \frac{a - n μ_{θ} + n θ V_{θ}}{\sqrt{n V_{θ}}}), \end{matrix}

(A17)

under the assumption that at least one of the conditions

(i)

or

(i i)

in Lemma A1 holds.

Finally, under the same assumption, plugging (A17) in (A14) yields

\begin{matrix} | P_{X_{n}} (A) - exp (n \ln (φ_{Y} (θ)) - n θ μ_{θ} + \frac{1}{2} n θ^{2} V_{θ}) Q ({(- 1)}^{1_{{θ ⩽ 0}}} \frac{a + n θ V_{θ} - n μ_{θ}}{\sqrt{n V_{θ}}}) | \\ ⩽ exp (n \ln (φ_{Y} (θ)) - θ a) min \{1, \frac{2 ξ_{θ}}{\sqrt{n}}\} . \end{matrix}

(A18)

Under condition

(i)

in Lemma A1, the inequality in (A18) can be written as follows:

\begin{matrix} | F_{X_{n}} (a) - exp (n \ln (φ_{Y} (θ)) - n θ μ_{θ} + \frac{1}{2} n θ^{2} V_{θ}) \cdot Q ({(- 1)}^{1_{{θ ⩽ 0}}} \frac{a + n θ V_{θ} - n μ_{θ}}{\sqrt{n V_{θ}}}) | \\ ⩽ exp (n \ln (φ_{Y} (θ)) - θ a) min \{1, \frac{2 ξ_{θ}}{\sqrt{n}}\} . \end{matrix}

(A19)

Alternatively, under condition

(i i)

in Lemma A1, it follows from (A18) that

\begin{matrix} | 1 - F_{X_{n}} (a) - exp (n \ln (φ_{Y} (θ)) - n θ μ_{θ} + \frac{1}{2} n θ^{2} V_{θ}) \cdot Q ({(- 1)}^{1_{{θ ⩽ 0}}} \frac{a + n θ V_{θ} - n μ_{θ}}{\sqrt{n V_{θ}}}) | \\ ⩽ exp (n \ln (φ_{Y} (θ)) - θ a) min \{1, \frac{2 ξ_{θ}}{\sqrt{n}}\}, \end{matrix}

(A20)

Then, jointly writing (A19) and (A20), it follows that, for all

a \in R

and for all

θ \in Θ_{Y}

,

\begin{matrix} |F_{X_{n}} (a) - 1_{{θ > 0}} - {(- 1)}^{1_{{θ > 0}}} exp (n \ln (φ_{Y} (θ)) - n θ μ_{θ} + \frac{1}{2} n θ^{2} V_{θ}) Q ({(- 1)}^{1_{{θ ⩽ 0}}} \frac{a + n θ V_{θ} - n μ_{θ}}{\sqrt{n V_{θ}}})| \\ ⩽ exp (n \ln (φ_{Y} (θ)) - θ a) min \{1, \frac{2 ξ_{θ}}{\sqrt{n}}\}, \end{matrix}

(A21)

which can also be written as

\begin{matrix} |F_{X_{n}} (a) - η_{Y} (θ, a, n)| ⩽ exp (n K_{Y} (θ) - θ a) min \{1, \frac{2 ξ_{Y} (θ)}{\sqrt{n}}\} . \end{matrix}

(A22)

This completes the proof.

Appendix B. Proof of Lemma 1

Let

g : R^{2} \times N \to R

be for all

(θ, a, n) \in R^{2} \times N

,

g (θ, a, n) = n K_{Y} (θ) - θ a = n \ln (φ_{Y} (θ)) - θ a .

(A23)

First, note that for all

θ \in Θ_{Y}

and for all

n \in N

, the function g is a concave function of a. Hence, from the definition of the function h in (33), h is concave.

Second, note that

0 \in Θ_{Y}

given that

φ_{Y} (0) = 1 < \infty

. Hence, from (33), it holds that, for all

a \in R

,

\begin{matrix} h (a) ⩽ n K_{Y} (0) = n \ln (φ_{Y} (0)) = n \ln (1) = 0 . \end{matrix}

(A24a)

This shows that the function h in (33) is not positive.

Third, the next step of the proof consists of proving the equality in (35). For doing so, let

θ^{⋆} : R \times N \to R

be for all

(a, n) \in R \times N

,

\begin{matrix} θ^{⋆} (a, n) = \underset{θ \in Θ_{Y}}{\arg inf} g (θ, a, n) . \end{matrix}

(A25b)

Note that the function g is a convex in

θ

. This follows by verifying that its second derivative with respect to

θ

is positive. That is,

\begin{matrix} \frac{d}{d θ} g (θ, a, n) & = & \frac{n}{φ_{Y} (θ)} \frac{d}{d θ} φ_{Y} (θ) - a, and \end{matrix}

(A26a)

\begin{matrix} \frac{d^{2}}{d θ^{2}} g (θ, a, n) & = & \frac{n}{{(φ_{Y} (θ))}^{2}} (φ_{Y} (θ) \frac{d^{2}}{d θ^{2}} φ_{Y} (θ) - {(\frac{d}{d θ} φ_{Y} (θ))}^{2}) \end{matrix}

(A26b)

\begin{matrix} = & n (\frac{1}{φ_{Y} (θ)} \frac{d^{2}}{d θ^{2}} φ_{Y} (θ) - {(\frac{1}{φ_{Y} (θ)} \frac{d}{d θ} φ_{Y} (θ))}^{2}) \end{matrix}

(A26c)

\begin{matrix} = & n (\frac{1}{φ_{Y} (θ)} \frac{d^{2}}{d θ^{2}} E_{P_{Y}} [exp (θ Y)] - {(\frac{1}{φ_{Y} (θ)} \frac{d}{d θ} E_{P_{Y}} [exp (θ Y)])}^{2}) \end{matrix}

(A26d)

\begin{matrix} = & n (\frac{E_{P_{Y}} [Y^{2} exp (θ Y)]}{E_{P_{Y}} [exp (θ Y)]} - {(\frac{E_{P_{Y}} [Y exp (θ Y)]}{E_{P_{Y}} [exp (θ Y)]})}^{2}) \end{matrix}

(A26e)

\begin{matrix} = & n (E_{P_{Y}} [\frac{Y^{2} exp (θ Y)}{E_{P_{Y}} [exp (θ Y)]}] - {(E_{P_{Y}} [\frac{Y exp (θ Y)}{E_{P_{Y}} [exp (θ Y)]}])}^{2}) \\ = & n (E_{P_{Y}} [\frac{Y^{2} exp (θ Y)}{E_{P_{Y}} [exp (θ Y)]}] - 2 E_{P_{Y}} [\frac{Y exp (θ Y)}{E_{P_{Y}} [exp (θ Y)]}] K_{Y}^{(1)} (θ) + {(K_{Y}^{(1)} (θ))}^{2}) \end{matrix}

(A26f)

\begin{matrix} = & n E_{P_{Y}} [\frac{{(Y - K_{Y}^{(1)} (θ))}^{2} exp (θ Y)}{E_{P_{Y}} [exp (θ Y)]}] > 0 . \end{matrix}

(A26g)

Hence, if the first derivative of g with respect to

θ

(see (A26a)) admits a zero in

Θ_{Y}

, then

θ^{⋆} (a, n)

is the unique solution in

θ

to the following equality:

\begin{matrix} \frac{d}{d θ} g (θ, a, n) & = & \frac{n}{φ_{Y} (θ)} \frac{d}{d θ} φ_{Y} (θ) - a = 0 . \end{matrix}

(A27)

Equation (A27) in

θ

can be rewritten as follows:

\begin{matrix} \frac{a}{n} & = & \frac{1}{φ_{Y} (θ)} \frac{d}{d θ} φ_{Y} (θ) \end{matrix}

(A28a)

\begin{matrix} = & \frac{1}{E_{P_{Y}} [exp (θ Y)]} \frac{d}{d θ} E_{P_{Y}} [exp (θ Y)] \end{matrix}

(A28b)

\begin{matrix} = & \frac{1}{E_{P_{Y}} [exp (θ Y)]} E_{P_{Y}} [Y exp (θ Y)] \end{matrix}

(A28c)

\begin{matrix} = & E_{P_{Y}} [\frac{Y exp (θ Y)}{E_{P_{Y}} [exp (θ Y)]}] \end{matrix}

(A28d)

\begin{matrix} = & K_{Y}^{(1)} (θ) . \end{matrix}

(A28e)

From (A28d), it follows that

\frac{a}{n}

is the mean of a random variable that follows an exponentially tilted distribution with respect to

P_{Y}

. Thus, there exists a solution in

θ

for (A28d) if and only if

\frac{a}{n} \in int C_{Y}

—hence the equality in (35).

Finally, from (A28d),

a = n E_{P_{Y}} [Y]

implies that

θ^{⋆} (a, n) = 0

. Hence,

h (n E_{P_{Y}} [Y]) = 0

from (35). This completes the proof for

h (n E_{P_{Y}} [Y]) = 0

.

Appendix C. Proof of Theorem 3

From Lemma 1, it holds that given

(a, n) \in R \times N

such that

\frac{a}{n} \in int C_{Y}

,

\begin{matrix} n K_{Y}^{(1)} (θ^{⋆}) = a . \end{matrix}

(A29)

Then, plugging (A29) in the expression of

η_{Y} (θ^{⋆}, a, n)

, with function

η_{Y}

defined in (28), the following holds:

\begin{matrix} η_{Y} (θ^{⋆}, a, n) \\ = 1_{{θ^{⋆} > 0}} + {(- 1)}^{1_{{θ^{⋆} > 0}}} exp (\frac{1}{2} n {(θ^{⋆})}^{2} K_{Y}^{(2)} (θ) + n K_{Y}^{} (θ^{⋆}) - θ^{⋆} a) Q ({(- 1)}^{1_{{θ^{⋆} ⩽ 0}}} \frac{a + n θ^{⋆} K_{Y}^{(2)} (θ^{⋆}) - a}{\sqrt{n K_{Y}^{(2)} (θ^{⋆})}}) \end{matrix}

(A30a)

\begin{matrix} = 1_{{θ^{⋆} > 0}} + {(- 1)}^{1_{{θ^{⋆} > 0}}} exp (\frac{1}{2} n {(θ^{⋆})}^{2} K_{Y}^{(2)} (θ) + n K_{Y}^{} (θ^{⋆}) - θ^{⋆} a) Q ({(- 1)}^{1_{{θ^{⋆} ⩽ 0}}} θ^{⋆} \sqrt{n K_{Y}^{(2)} (θ^{⋆})}) \end{matrix}

(A30b)

\begin{matrix} = 1_{{θ^{⋆} > 0}} + {(- 1)}^{1_{{θ^{⋆} > 0}}} exp (\frac{1}{2} n {(θ^{⋆})}^{2} K_{Y}^{(2)} (θ) + n K_{Y}^{} (θ^{⋆}) - θ^{⋆} a) Q (| θ^{⋆} | \sqrt{n K_{Y}^{(2)} (θ^{⋆})}) \end{matrix}

(A30c)

\begin{matrix} = {\hat{F}}_{X_{n}} (a), \end{matrix}

(A30d)

where equality in (A30d) follows (12). Finally, plugging (A30d) in (29) yields

\begin{matrix} |F_{X_{n}} (a) - {\hat{F}}_{X_{n}} (a)| ⩽ exp (n K_{Y}^{} (θ^{⋆}) - θ^{⋆} a) min \{1, \frac{2 ξ_{Y} (θ^{⋆})}{\sqrt{n}}\} . \end{matrix}

(A31)

This completes the proof by observing that

\frac{a}{n} \in int C_{Y}

is equivalent to

a \in int C_{X_{n}}

.

Appendix D. Proof of Lemma A1

The left-hand side of (A7) satisfies

\begin{matrix} |P_{X_{n}} (A) - {(φ_{Y} (θ))}^{n} E_{P_{Z_{n, θ}}} [exp (- θ Z_{n, θ}) 1_{\{Z_{n, θ} \in A\}}]| \\ = & {(φ_{Y} (θ))}^{n} |E_{P_{S_{n, θ}}} [exp (- θ S_{n, θ}) 1_{\{S_{n, θ} \in A\}}] - E_{P_{Z_{n, θ}}} [exp (- θ Z_{n, θ}) 1_{\{Z_{n, θ} \in A\}}]| . \end{matrix}

(A32)

The focus is on obtaining explicit expressions for the terms

E_{P_{S_{n, θ}}} [exp (- θ S_{n, θ}) 1_{\{S_{n, θ} \in A\}}]

and

E_{P_{Z_{n, θ}}} [exp (- θ Z_{n, θ}) 1_{\{Z_{n, θ} \in A\}}]

in (A32). First, consider the case in which the random variable

S_{n, θ}

is absolutely continuous and denote its probability density function by

f_{S_{n, θ}}

and its CDF by

F_{S_{n, θ}}

. Then,

\begin{matrix} E_{P_{S_{n, θ}}} [exp (- θ S_{n, θ}) 1_{{S_{n, θ} \in A}}] & = & \int_{A} exp (- θ x) f_{S_{n, θ}} (x) d x . \end{matrix}

(A33)

Using integration by parts in (A33), under the assumption

(i)

or

(i i)

in Lemma A1, the following holds:

\begin{matrix} E_{P_{S_{n, θ}}} [exp (- θ S_{n, θ}) 1_{{S_{n, θ} \in A}}] = {(- 1)}^{1_{{θ > 0}}} exp (- θ a) F_{S_{n, θ}} (a) - \int_{A} θ exp (- θ x) F_{S_{n, θ}} (x) d x . \end{matrix}

(A34)

Second, consider the case in which the random variable

S_{n, θ}

is discrete and denote its probability mass function by

p_{S_{n, θ}}

and its CDF by

F_{S_{n, θ}}

. Let the support of

S_{n, θ}

be

{s_{0}

,

s_{1}

, …,

s_{ℓ}} \subset R

, with

ℓ \in N

. Assume that condition

(i)

in Lemma A1 is satisfied. Then,

A \cap {s_{0}, s_{1}, \dots, s_{l}} = {s_{0}, s_{1}, \dots, s_{u}},

(A35)

with

u ⩽ ℓ

, and

\begin{matrix} E_{P_{S_{n, θ}}} [exp (- θ S_{n, θ}) 1_{{S_{n, θ} \in A}}] \\ = \sum_{k = 0}^{u} exp (- θ s_{k}) p_{S_{n, θ}} (s_{k}) \end{matrix}

(A36a)

\begin{matrix} = F_{S_{n, θ}} (s_{0}) exp (- θ s_{0}) + \sum_{k = 1}^{u} (F_{S_{n, θ}} (s_{k}) - F_{S_{n, θ}} (s_{k - 1})) exp (- θ s_{k}) \end{matrix}

(A36b)

\begin{matrix} = \sum_{k = 0}^{u} F_{S_{n, θ}} (s_{k}) exp (- θ s_{k}) - \sum_{k = 1}^{u} F_{S_{n, θ}} (s_{k - 1}) exp (- θ s_{k}) \end{matrix}

(A36c)

\begin{matrix} = \sum_{k = 0}^{u} F_{S_{n, θ}} (s_{k}) exp (- θ s_{k}) - \sum_{k = 0}^{u - 1} F_{S_{n, θ}} (s_{k}) exp (- θ s_{k + 1}) \end{matrix}

(A36d)

\begin{matrix} = F_{S_{n, θ}} (s_{u}) exp (- θ s_{u}) - \sum_{k = 0}^{u - 1} F_{S_{n, θ}} (s_{k}) (exp (- θ s_{k + 1}) - exp (- θ s_{k})) \end{matrix}

(A36e)

\begin{matrix} = F_{S_{n, θ}} (s_{u}) exp (- θ s_{u}) - \sum_{k = 0}^{u - 1} \int_{s_{k}}^{s_{k + 1}} θ exp (- θ t) F_{S_{n, θ}} (s_{k}) d t \end{matrix}

(A36f)

\begin{matrix} = F_{S_{n, θ}} (s_{u}) exp (- θ s_{u}) - \int_{s_{0}}^{s_{u}} θ exp (- θ t) F_{S_{n, θ}} (t) d t \end{matrix}

(A36g)

\begin{matrix} = F_{S_{n, θ}} (a) exp (- θ a) - F_{S_{n, θ}} (a) exp (- θ a) + F_{S_{n, θ}} (s_{u}) exp (- θ s_{u}) - \int_{s_{0}}^{s_{u}} F_{S_{n, θ}} (t) θ exp (- θ t) d t \end{matrix}

(A36h)

\begin{matrix} = F_{S_{n, θ}} (a) exp (- θ a) - F_{S_{n, θ}} (s_{u}) exp (- θ a) + F_{S_{n, θ}} (s_{u}) exp (- θ s_{u}) - \int_{s_{0}}^{s_{u}} θ exp (- θ t) F_{S_{n, θ}} (t) d t \end{matrix}

(A36i)

\begin{matrix} = F_{S_{n, θ}} (a) exp (- θ a) - F_{S_{n, θ}} (s_{u}) (exp (- θ a) - exp (- θ s_{u})) - \int_{s_{0}}^{s_{u}} θ exp (- θ t) F_{S_{n, θ}} (t) d t \end{matrix}

(A36j)

\begin{matrix} = F_{S_{n, θ}} (a) exp (- θ a) - \int_{s_{u}}^{a} θ exp (- θ t) F_{S_{n, θ}} (s_{u}) d t - \int_{s_{0}}^{s_{u}} θ exp (- θ t) F_{S_{n, θ}} (t) d t \end{matrix}

(A36k)

\begin{matrix} = exp (- θ a) F_{S_{n, θ}} (a) - \int_{s_{0}}^{a} θ exp (- θ t) F_{S_{n, θ}} (t) d t \end{matrix}

(A36l)

\begin{matrix} = exp (- θ a) F_{S_{n, θ}} (a) - \int_{- \infty}^{a} θ exp (- θ t) F_{S_{n, θ}} (t) d t, \end{matrix}

(A36m)

which is an expression of the same form as the one in (A34). Alternatively, assume that condition

(i i)

in Lemma A1 holds. Then,

A \cap {s_{0}, s_{1}, \dots, s_{l}} = {s_{u}, s_{u + 1}, \dots, s_{l}},

(A37)

with

u ⩽ ℓ

, and

\begin{matrix} E_{P_{S_{n, θ}}} [exp (- θ S_{n, θ}) 1_{{S_{n, θ} \in A}}] \\ = \sum_{k = u}^{l} exp (- θ s_{k}) p_{S_{n, θ}} (s_{k}) \end{matrix}

(A38a)

\begin{matrix} = (F_{S_{n, θ}} (s_{u}) - F_{S_{n, θ}} (a)) exp (- θ s_{u}) + \sum_{k = u + 1}^{l} (F_{S_{n, θ}} (s_{k}) - F_{S_{n, θ}} (s_{k - 1})) exp (- θ s_{k}) \end{matrix}

(A38b)

\begin{matrix} = - F_{S_{n, θ}} (a) exp (- θ s_{u}) + \sum_{k = u}^{l} F_{S_{n, θ}} (s_{k}) exp (- θ s_{k}) - \sum_{k = u + 1}^{l} F_{S_{n, θ}} (s_{k - 1}) exp (- θ s_{k}) \end{matrix}

(A38c)

\begin{matrix} = - F_{S_{n, θ}} (a) exp (- θ s_{u}) + \sum_{k = u}^{l} F_{S_{n, θ}} (s_{k}) exp (- θ s_{k}) - \sum_{k = u}^{l - 1} F_{S_{n, θ}} (s_{k}) exp (- θ s_{k + 1}) \end{matrix}

(A38d)

\begin{matrix} = F_{S_{n, θ}} (s_{l}) exp (- θ s_{l}) - F_{S_{n, θ}} (a) exp (- θ s_{u}) - \sum_{k = u}^{l - 1} F_{S_{n, θ}} (s_{k}) (exp (- θ s_{k + 1}) - exp (- θ s_{k})) \end{matrix}

(A38e)

\begin{matrix} = - F_{S_{n, θ}} (a) exp (- θ s_{u}) - \int_{s_{l}}^{\infty} θ exp (- θ s_{t}) F_{S_{n, θ}} (s_{l}) d t - \sum_{k = u}^{l - 1} \int_{s_{k}}^{s_{k + 1}} θ exp (- θ t) F_{S_{n, θ}} (s_{k}) d t \end{matrix}

(A38f)

\begin{matrix} = - F_{S_{n, θ}} (a) exp (- θ s_{u}) - \int_{s_{u}}^{\infty} θ exp (- θ t) F_{S_{n, θ}} (t) d t \end{matrix}

(A38g)

\begin{matrix} = F_{S_{n, θ}} (a) exp (- θ a) - F_{S_{n, θ}} (a) exp (- θ a) - F_{S_{n, θ}} (a) exp (- θ s_{u}) - \int_{s_{u}}^{\infty} θ exp (- θ t) F_{S_{n, θ}} (t) d t \end{matrix}

(A38h)

\begin{matrix} = - F_{S_{n, θ}} (a) exp (- θ a) - F_{S_{n, θ}} (a) (exp (- θ s_{u}) - exp (- θ a)) - \int_{s_{u}}^{\infty} θ exp (- θ t) F_{S_{n, θ}} (t) d t \end{matrix}

(A38i)

\begin{matrix} = - F_{S_{n, θ}} (a) exp (- θ a) - \int_{a}^{s_{u}} θ exp (- θ t) F_{S_{n, θ}} (a) d t - \int_{s_{u}}^{\infty} θ exp (- θ t) F_{S_{n, θ}} (t) d t \end{matrix}

(A38j)

\begin{matrix} = - F_{S_{n, θ}} (a) exp (- θ a) - \int_{a}^{\infty} θ exp (- θ t) F_{S_{n, θ}} (t) d t, \end{matrix}

(A38k)

which is an expression of the same form as those in (A34) and (A36m).

Note that, under the assumption that at least one of the conditions in Lemma A1 holds, the expressions in (A34), (A36m), and (A38k) can be jointly written as follows:

\begin{matrix} E_{P_{S_{n, θ}}} [exp (- θ S_{n, θ}) 1_{{S_{n, θ} \in A}}] = {(- 1)}^{1_{{θ > 0}}} exp (- θ a) F_{S_{n, θ}} (a) - \int_{A} θ exp (- θ x) F_{S_{n, θ}} (x) d x . \end{matrix}

(A39)

The expression in (A39) does not involve particular assumptions on the random variable

S_{n, θ}

other than being discrete or absolutely continuous. Hence, the same expression holds with respect to the random variable

Z_{n, θ}

in (A32). More specifically,

\begin{matrix} E_{P_{Z_{n, θ}}} [exp (- θ Z_{n, θ}) 1_{{Z_{n, θ} \in A}}] = {(- 1)}^{1_{{θ > 0}}} exp (- θ a) F_{Z_{n, θ}} (a) - \int_{A} θ exp (- θ x) F_{Z_{n, θ}} (x) d x, \end{matrix}

(A40)

where

F_{Z_{n, θ}}

is the CDF of the random variable

Z_{n, θ}

.

The proof ends by plugging (A39) and (A40) into the right-hand side of (A32). This yields

\begin{matrix} |P_{X_{n}} (A) - {(φ_{Y} (θ))}^{n} E_{P_{Z_{n, θ}}} [exp (- θ Z_{n, θ}) 1_{\{Z_{n, θ} \in A\}}]| \\ = {(φ_{Y} (θ))}^{n} | {(- 1)}^{1_{{θ > 0}}} exp (- θ a) F_{S_{n, θ}} (a) - \int_{A} θ exp (- θ x) F_{S_{n, θ}} (x) d x \\ - {(- 1)}^{1_{{θ > 0}}} exp (- θ a) F_{Z_{n, θ}} (a) + \int_{A} θ exp (- θ x) F_{Z_{n, θ}} (x) d x | \end{matrix}

(A41a)

\begin{matrix} = {(φ_{Y} (θ))}^{n} |{(- 1)}^{1_{{θ > 0}}} exp (- a) (F_{S_{n, θ}} (a) - F_{Z_{n, θ}} (a)) - \int_{A} θ exp (- θ x) (F_{S_{n, θ}} (x) - F_{Z_{n, θ}} (x)) d x| \end{matrix}

(A41b)

\begin{matrix} \leq {(φ_{Y} (θ))}^{n} (|exp (- θ a) (F_{S_{n, θ}} (a) - F_{Z_{n, θ}} (a))| + |\int_{A} θ exp (- θ x) (F_{S_{n, θ}} (x) - F_{Z_{n, θ}} (x)) d x|) \end{matrix}

(A41c)

\begin{matrix} \leq {(φ_{Y} (θ))}^{n} (exp (- θ a) Δ (P_{S_{n, θ}}, P_{Z_{n, θ}}) + \int_{A} |θ exp (- θ x)| Δ (P_{S_{n, θ}}, P_{Z_{n, θ}}) d x) \end{matrix}

(A41d)

\begin{matrix} = {(φ_{Y} (θ))}^{n} (exp (- θ a) Δ (P_{S_{n, θ}}, P_{Z_{n, θ}}) + Δ (P_{S_{n, θ}}, P_{Z_{n, θ}}) |\int_{A} θ exp (- θ x) d x|) \end{matrix}

(A41e)

\begin{matrix} = {(φ_{Y} (θ))}^{n} (exp (- θ a) Δ (P_{S_{n, θ}}, P_{Z_{n, θ}}) + Δ (P_{S_{n, θ}}, P_{Z_{n, θ}}) exp (- θ a)) \end{matrix}

(A41f)

\begin{matrix} = 2 \frac{{(φ_{Y} (θ))}^{n}}{exp (θ a)} Δ (P_{S_{n, θ}}, P_{Z_{n, θ}}) . \end{matrix}

(A41g)

Finally, under the assumption that at least one of the conditions in Lemma A1 holds, then

\begin{matrix} |P_{X_{n}} (A) - {(φ_{Y} (θ))}^{n} E_{P_{Z_{n, θ}}} [exp (- θ Z_{n, θ}) 1_{\{Z_{n, θ} \in A\}}]| \\ (A42a) & \leq & {(φ_{Y} (θ))}^{n} max (E_{P_{S_{n, θ}}} [exp (- θ S_{n, θ}) 1 \{S_{n, θ} \in A\}], E_{P_{Z_{n, θ}}} [exp (- θ Z_{n, θ}) 1 \{Z_{n, θ} \in A\}]) \\ (A42b) & \leq & {(φ_{Y} (θ))}^{n} exp (- θ a) = \frac{{(φ_{Y} (θ))}^{n}}{exp (θ a)} . \end{matrix}

Under the same assumption, the expressions in (A41g) and (A42b) can be jointly written as follows:

\begin{matrix} |P_{X_{n}} (A) - {(φ_{Y} (θ))}^{n} E_{P_{Z_{n, θ}}} [exp (- θ Z_{n, θ}) 1_{\{Z_{n, θ} \in A\}}]| ⩽ \frac{{(φ_{Y} (θ))}^{n}}{exp (θ a)} min \{2 Δ (P_{S_{n, θ}}, P_{Z_{n, θ}}), 1\} . \end{matrix}

(A43b)

This concludes the proof of Lemma A1.

Appendix E. Proof of Lemma A2

In the case in which Y is discrete (

p_{Y}

,

p_{Y^{(θ)}}

,

p_{Y_{1}^{(θ)} Y_{2}^{(θ)} \dots Y_{n}^{(θ)}}

denote probability mass functions) or absolutely continuous random variables (

p_{Y}

,

p_{Y^{(θ)}}

,

p_{Y_{1}^{(θ)} Y_{2}^{(θ)} \dots Y_{n}^{(θ)}}

denote probability density functions), the following holds for all

(y_{1}, y_{2}, \dots, y_{n}) \in R^{n}

,

\begin{matrix} \frac{d P_{Y_{1}^{(θ)} Y_{2}^{(θ)} \dots Y_{n}^{(θ)}}}{d P_{Y_{1} Y_{2} \dots Y_{n}}} (y_{1}, y_{2}, \dots, y_{n}) & = & \frac{p_{Y_{1}^{(θ)} Y_{2}^{(θ)} \dots Y_{n}^{(θ)}} (y_{1}, y_{2}, \dots, y_{n})}{\prod_{j = 1}^{n} p_{Y} (y_{j})}, \end{matrix}

(A44)

and for all

y \in R

,

\begin{matrix} \frac{d P_{Y^{(θ)}}}{d P_{Y}} (y) & = & \frac{p_{Y^{(θ)}} (y)}{p_{Y} (y)} . \end{matrix}

(A45)

Equating the right-hand side of both (A1) and (A44), it yields for all

(y_{1}, y_{2}, \dots, y_{n}) \in R^{n}

\begin{matrix} p_{Y_{1}^{(θ)} Y_{2}^{(θ)} \dots Y_{n}^{(θ)}} (y_{1}, y_{2}, \dots, y_{n}) & = \prod_{j = 1}^{n} \frac{exp (θ y_{j})}{φ_{Y} (θ)} p_{Y} (y_{j}) . \end{matrix}

(A46)

Hence,

Y_{1}^{(θ)}

,

Y_{2}^{(θ)}

, …,

Y_{n}^{(θ)}

are mutually independent and identically distributed. Moreover, for all

y \in R

,

\begin{matrix} p_{Y^{(θ)}} (y) & = & \frac{exp (θ y)}{φ_{Y} (θ)} p_{Y} (y) . \end{matrix}

(A47)

Finally, plugging (A47) in (A45) yields, for all

y \in R

,

\begin{matrix} \frac{d P_{Y^{(θ)}}}{d P_{Y}} (y) & = & \frac{exp (θ y)}{φ_{Y} (θ)}, \end{matrix}

(A48)

which completes the proof.

Appendix F. Proof of Theorem 5

For a fixed product probability input distribution

P_{X}

in (53) and for the random transformation in (44), the upper bound

T (n, M, P_{X})

in (51) can be written in the form of a weighted sum of the CDF and the complementary CDF of the random variables variables

W_{n}

and

V_{n}

that are sums of i.i.d random variables, respectively. That is,

\begin{matrix} W_{n} & = & \sum_{t = 1}^{n} ι (X_{t}; Y_{t}), and \end{matrix}

(A49)

\begin{matrix} V_{n} & = & \sum_{t = 1}^{n} ι ({\bar{X}}_{t}; Y_{t}), \end{matrix}

(A50)

where

(X_{t}, Y_{t}) \sim P_{X} P_{Y | X}

and

({\bar{X}}_{t}, Y_{t}) \sim P_{\bar{X}} P_{Y}

with

P_{X} = P_{\bar{X}}

. More specifically, the function T in (51) can be rewritten in the form

\begin{matrix} T (n, M, P_{X}) & = & F_{W_{n}} (\ln (\frac{M - 1}{2})) + \frac{M - 1}{2} (1 - F_{V_{n}} (\ln (\frac{M - 1}{2}))), \end{matrix}

(A51)

where

F_{W_{n}}

and

F_{V_{n}}

are the CDFs of

W_{n}

and

V_{n}

, respectively.

The next step derives the upper and lower bounds on

F_{W_{n}} (\ln (\frac{M - 1}{2}))

and

1 - F_{V_{n}} (\ln (\frac{M - 1}{2}))

by using the result of Theorem 3. That is,

\begin{matrix} F_{W_{n}} (\ln (\frac{M - 1}{2})) \end{matrix}

\begin{matrix} ⩽ ζ_{ι (X; Y)} (θ, \ln (\frac{M - 1}{2}), n) + exp (n \ln (φ_{ι (X; Y)} (θ)) - θ \ln (\frac{M - 1}{2})) min \{1, \frac{2 ξ_{ι (X; Y)} (θ)}{\sqrt{n}}\}, \\ F_{W_{n}} (\ln (\frac{M - 1}{2})) \end{matrix}

(A52)

\begin{matrix} ⩾ ζ_{ι (X; Y)} (θ, \ln (\frac{M - 1}{2}), n) - exp (n \ln (φ_{ι (X; Y)} (θ)) - θ \ln (\frac{M - 1}{2})) min \{1, \frac{2 ξ_{ι (X; Y)} (θ)}{\sqrt{n}}\}, \\ 1 - F_{V_{n}} (\ln (\frac{M - 1}{2})) \end{matrix}

(A53)

\begin{matrix} ⩽ 1 - ζ_{ι (\bar{X}; Y)} (θ, \ln (\frac{M - 1}{2}), n) + exp (n \ln (φ_{ι (\bar{X}; Y)} (θ)) - θ \ln (\frac{M - 1}{2})) min \{1, \frac{2 ξ_{ι (\bar{X}; Y)} (θ)}{\sqrt{n}}\}, and \\ 1 - F_{V_{n}} (\ln (\frac{M - 1}{2})) \end{matrix}

(A54)

\begin{matrix} ⩾ 1 - ζ_{ι (\bar{X}; Y)} (θ, \ln (\frac{M - 1}{2}), n) - exp (n \ln (φ_{ι (\bar{X}; Y)} (θ)) - θ \ln (\frac{M - 1}{2})) min \{1, \frac{2 ξ_{ι (\bar{X}; Y)} (θ)}{\sqrt{n}}\}, \end{matrix}

(A55)

where

θ

and

τ

satisfy

\begin{matrix} n μ_{ι (X; Y)} (θ) & = & \ln (\frac{M - 1}{2}) = n μ_{ι (\bar{X}; Y)} (τ), \end{matrix}

(A56)

and for all

t \in R

,

\begin{matrix} φ_{ι (X; Y)} (t) & = & E_{P_{X} P_{Y | X}} [exp (t ι (X; Y))], \end{matrix}

(A57)

\begin{matrix} φ_{ι (\bar{X}; Y)} (t) & = & E_{P_{\bar{X}} P_{Y}} [exp (t ι (\bar{X}; Y))], \end{matrix}

(A58)

\begin{matrix} ξ_{ι (X; Y)} (t) & = & c_{1} (\frac{E_{P_{X} P_{Y | X}} [{|ι (X; Y) - μ_{ι (X; Y)} (t)|}^{3} \frac{exp (t ι (X; Y))}{φ_{ι (X; Y)} (t)}]}{{(V_{ι (X; Y)} (t))}^{3 / 2}} + c_{2}), \end{matrix}

(A59)

\begin{matrix} ξ_{ι (\bar{X}; Y)} (t) & = & c_{1} (\frac{E_{P_{\bar{X}} P_{Y}} [{|ι (\bar{X}; Y) - μ_{ι (\bar{X}; Y)} (t)|}^{3} \frac{exp (t ι (\bar{X}; Y))}{φ_{ι (\bar{X}; Y)} (t)}]}{{(V_{ι (\bar{X}; Y)} (t))}^{3 / 2}} + c_{2}), \end{matrix}

(A60)

\begin{matrix} μ_{ι (X; Y)} (t) & = & E_{P_{X} P_{Y | X}} [ι (X; Y) \frac{exp (t ι (X; Y))}{φ_{ι (X; Y)} (t)}], \end{matrix}

(A61)

\begin{matrix} μ_{ι (\bar{X}; Y)} (t) & = & E_{P_{\bar{X}} P_{Y}} [ι (\bar{X}; Y) \frac{exp (t ι (\bar{X}; Y))}{φ_{ι (\bar{X}; Y)} (t)}], \end{matrix}

(A62)

\begin{matrix} V_{ι (X; Y)} (t) & = & E_{P_{X} P_{Y | X}} [{(ι (X; Y) - μ_{ι (X; Y)} (t))}^{2} \frac{exp (t ι (X; Y))}{φ_{ι (X; Y)} (t)}], \end{matrix}

(A63)

\begin{matrix} V_{ι (\bar{X}; Y)} (t) & = & E_{P_{\bar{X}} P_{Y}} [{(ι (\bar{X}; Y) - μ_{ι (\bar{X}; Y)} (t))}^{2} \frac{exp (t ι (\bar{X}; Y))}{φ_{ι (\bar{X}; Y)} (t)}], \end{matrix}

(A64)

with

c_{1}

and

c_{2}

defined in (23); and for all

(t, a, n) \in R^{2} \times N

\begin{matrix} ζ_{ι (X; Y)} (t, a, n) \end{matrix}

\begin{matrix} ≜ 1_{{t > 0}} + {(- 1)}^{1_{{t > 0}}} exp (\frac{1}{2} n t^{2} V_{ι (X; Y)} (t) + n \ln (φ_{ι (X; Y)} (t)) - t a) Q (| t | \sqrt{n V_{ι (X; Y)} (t)}), \\ ζ_{ι (\bar{X}; Y)} (t, a, n) \end{matrix}

(A65)

\begin{matrix} ≜ 1_{{t > 0}} + {(- 1)}^{1_{{t > 0}}} exp (\frac{1}{2} n t^{2} V_{ι (\bar{X}; Y)} (t) + n \ln (φ_{ι (\bar{X}; Y)} (t)) - t a) Q (| t | \sqrt{n V_{ι (\bar{X}; Y)} (t)}) . \end{matrix}

(A66)

The next step simplifies the expressions on the right hand-side of (A54) and (A55) by studying the relation between

φ_{ι (X; Y)}

and

φ_{ι (\bar{X}; Y)}

,

θ

and

τ

,

V_{ι (X; Y)}

and

V_{ι (\bar{X}; Y)}

,

ξ_{ι (X; Y)}

and

ξ_{ι (\bar{X}; Y)}

.

First, from (A57), using the change of measure from

P_{X} P_{Y | X}

to

P_{\bar{X}} P_{Y}

because

P_{X} P_{Y | X}

is absolutely continuous with respect to

P_{\bar{X}} P_{Y}

, it holds that

\begin{matrix} φ_{ι (X; Y)} (t) & = & E_{P_{\bar{X}} P_{Y}} [\frac{d P_{X} P_{Y | X}}{d P_{\bar{X}} P_{Y}} (\bar{X}; Y) exp (t ι (\bar{X}; Y))] \end{matrix}

(A67)

\begin{matrix} = & E_{P_{\bar{X}} P_{Y}} [exp ((t + 1) ι (\bar{X}; Y))] . \end{matrix}

(A68)

Then, from (A57) and (A58), it holds that

\begin{matrix} φ_{ι (X; Y)} (t) & = & φ_{ι (\bar{X}; Y)} (t + 1) . \end{matrix}

(A69)

This concludes the relation between

φ_{ι (X; Y)}

and

φ_{ι (\bar{X}; Y)}

.

Second, from (A61), using the change of measure from

P_{X} P_{Y | X}

to

P_{\bar{X}} P_{Y}

, it holds that

\begin{matrix} μ_{ι (X; Y)} (t) & = & E_{P_{\bar{X}} P_{Y}} [ι (\bar{X}; Y) \frac{exp (t ι (\bar{X}; Y))}{φ_{ι (X; Y)} (t)} \frac{d P_{X} P_{Y | X}}{d P_{\bar{X}} P_{Y}} (\bar{X}; Y)] \end{matrix}

(A70)

\begin{matrix} = & E_{P_{\bar{X}} P_{Y}} [ι (\bar{X}; Y) \frac{exp ((t + 1) ι (\bar{X}; Y))}{φ_{ι (X; Y)} (t)}] . \end{matrix}

(A71)

Then, from (A69) and (A71), it holds that

\begin{matrix} μ_{ι (X; Y)} (t) & = & E_{P_{\bar{X}} P_{Y}} [ι (\bar{X}; Y) \frac{exp ((t + 1) ι (\bar{X}; Y))}{φ_{ι (\bar{X}; Y)} (t + 1)}] . \end{matrix}

(A72)

From (A62) and (A72), it holds that

\begin{matrix} μ_{ι (X; Y)} (t) & = & μ_{ι (\bar{X}; Y)} (t + 1) . \end{matrix}

(A73)

This concludes the relation between

μ_{ι (X; Y)}

and

μ_{ι (\bar{X}; Y)}

.

Third, from (A56) and (A73), it holds that

\begin{matrix} τ = θ + 1 . \end{matrix}

(A74)

This concludes the relation between

τ

and

θ

.

Fourth, from (A63), using the change of measure from

P_{X} P_{Y | X}

to

P_{\bar{X}} P_{Y}

, it holds that

\begin{matrix} V_{ι (X; Y)} (t) & = & E_{P_{\bar{X}} P_{Y}} [{(ι (\bar{X}; Y) - μ_{ι (X; Y)} (t))}^{2} \frac{exp (t ι (\bar{X}; Y))}{φ_{ι (X; Y)} (t)} \frac{d P_{X} P_{Y | X}}{d P_{\bar{X}} P_{Y}} (\bar{X}; Y)] \end{matrix}

(A75)

\begin{matrix} = & E_{P_{\bar{X}} P_{Y}} [{(ι (\bar{X}; Y) - μ_{ι (X; Y)} (t))}^{2} \frac{exp ((t + 1) ι (\bar{X}; Y))}{φ_{ι (X; Y)} (t)}] . \end{matrix}

(A76)

From (A69), (A73), and (A76), it holds that

\begin{matrix} V_{ι (X; Y)} (t) & = & E_{P_{\bar{X}} P_{Y}} [{(ι (\bar{X}; Y) - μ_{ι (\bar{X}; Y)} (t + 1))}^{2} \frac{exp ((t + 1) ι (\bar{X}; Y))}{φ_{ι (\bar{X}; Y)} (t + 1)}] . \end{matrix}

(A77)

From (A64) and (A77), it holds that

\begin{matrix} V_{ι (X; Y)} (t) & = & V_{ι (\bar{X}; Y)} (t + 1) . \end{matrix}

(A78)

This concludes the relation between

V_{ι (X; Y)}

and

V_{ι (\bar{X}; Y)}

.

Fifth, from (A59), using the change of measure from

P_{X} P_{Y | X}

to

P_{\bar{X}} P_{Y}

, it holds that

\begin{matrix} ξ_{ι (X; Y)} (t) & = & c_{1} (\frac{E_{P_{\bar{X}} P_{Y}} [{|ι (\bar{X}; Y) - μ_{ι (X; Y)} (t)|}^{3} \frac{exp (t ι (\bar{X}; Y))}{φ_{ι (X; Y)} (t)} \frac{d P_{X} P_{Y | X}}{d P_{\bar{X}} P_{Y}} (\bar{X}; Y)]}{{(V_{ι (X; Y)} (t))}^{3 / 2}} + c_{2}) \end{matrix}

(A79)

\begin{matrix} = & c_{1} (\frac{E_{P_{\bar{X}} P_{Y}} [{|ι (\bar{X}; Y) - μ_{ι (X; Y)} (t)|}^{3} \frac{exp ((t + 1) ι (\bar{X}; Y))}{φ_{ι (X; Y)} (t)}]}{{(V_{ι (X; Y)} (t))}^{3 / 2}} + c_{2}) . \end{matrix}

(A80)

From (A69), (A73), (A78), and (A80), it holds that

\begin{matrix} ξ_{ι (X; Y)} (t) & = & c_{1} (\frac{E_{P_{\bar{X}} P_{Y}} [{|ι (\bar{X}; Y) - μ_{ι (\bar{X}; Y)} (t + 1)|}^{3} \frac{exp ((t + 1) ι (\bar{X}; Y))}{φ_{ι (\bar{X}; Y)} (t + 1)}]}{{(V_{ι (\bar{X}; Y)} (t + 1))}^{3 / 2}} + c_{2}) . \end{matrix}

(A81)

From (A60) and (A81), it holds that

\begin{matrix} ξ_{ι (X; Y)} (t) & = & ξ_{ι (\bar{X}; Y)} (t + 1) . \end{matrix}

(A82)

This concludes the relation between

ξ_{ι (X; Y)}

and

ξ_{ι (\bar{X}; Y)}

.

Sixth, plugging (A69), (A73), and (A78) into (A65), for all

t \in R

, it holds that

\begin{matrix} ζ_{ι (\bar{X}; Y)} (t, a, n) \\ ≜ 1_{{t > 0}} + {(- 1)}^{1_{{t > 0}}} exp (\frac{1}{2} n t^{2} V_{ι (X; Y)} (t - 1) + n \ln (φ_{ι (X; Y)} (t - 1)) - t a) Q (| t | \sqrt{n V_{ι (X; Y)} (t - 1)}) . \end{matrix}

(A83)

Then, from (67) and (A83), it holds that

\begin{matrix} ζ_{ι (\bar{X}; Y)} (t, \ln (\frac{M - 1}{2}), n) = 1 - β_{2} (n, M, t - 1, P_{X}) . \end{matrix}

(A84)

Then, plugging (A69), (A73), (A74), (A78), (A82), and (A84) into the right hand-side of (A54), it holds that

\begin{matrix} 1 - F_{V_{n}} (\ln (\frac{M - 1}{2})) \end{matrix}

\begin{matrix} ⩽ β_{2} (n, M, θ, P_{X}) + exp (n \ln (φ_{ι (X; Y)} (θ)) - (θ + 1) \ln (\frac{M - 1}{2})) min \{1, \frac{2 ξ_{ι (X; Y)} (θ)}{\sqrt{n}}\} \end{matrix}

(A85)

\begin{matrix} ⩽ β_{2} (n, M, θ, P_{X}) + exp (n \ln (φ_{ι (X; Y)} (θ)) - (θ + 1) \ln (\frac{M - 1}{2})) \frac{2 ξ_{ι (X; Y)} (θ)}{\sqrt{n}} . \end{matrix}

(A86)

Alternatively, plugging (A69), (A73), (A74), (A78), (A82), and (A84) into the right hand-side of (A55), it holds that

\begin{matrix} 1 - F_{V_{n}} (\ln (\frac{M - 1}{2})) \end{matrix}

\begin{matrix} ⩾ β_{2} (n, M, θ, P_{X}) - exp (n \ln (φ_{ι (X; Y)} (θ)) - (θ + 1) \ln (\frac{M - 1}{2})) min \{1, \frac{2 ξ_{ι (X; Y)} (θ)}{\sqrt{n}}\} \end{matrix}

(A87)

\begin{matrix} ⩾ β_{2} (n, M, θ, P_{X}) - exp (n \ln (φ_{ι (X; Y)} (θ)) - (θ + 1) \ln (\frac{M - 1}{2})) \frac{2 ξ_{ι (X; Y)} (θ)}{\sqrt{n}} \end{matrix}

(A88)

\begin{matrix} = G_{2} (n, M, θ, P_{X}), \end{matrix}

(A89)

where the equality in (A89) follows from (69). Observing that

1 - F_{V_{n}}

is a positive function, then from (A88), it holds that

\begin{matrix} 1 - F_{V_{n}} (\ln (\frac{M - 1}{2})) & ⩾ & max \{0, G_{2} (n, M, θ, P_{X})\} . \end{matrix}

(A90)

Seventh, from (66) and (A65), it holds that

\begin{matrix} ζ_{ι (X; Y)} (t, \ln (\frac{M - 1}{2}), n) = β_{1} (n, M, t, P_{X}) . \end{matrix}

(A91)

Then, plugging (A69), (A73), (A74), (A78), (A82), and (A91) into the right hand-side of (A52), it holds that

\begin{matrix} F_{W_{n}} (\ln (\frac{M - 1}{2})) \end{matrix}

\begin{matrix} ⩽ β_{1} (n, M, θ, P_{X}) + exp (n \ln (φ_{ι (X; Y)} (θ)) - θ \ln (\frac{M - 1}{2})) min \{1, \frac{2 ξ_{ι (X; Y)} (θ)}{\sqrt{n}}\} \end{matrix}

(A92)

\begin{matrix} ⩽ β_{1} (n, M, θ, P_{X}) + exp (n \ln (φ_{ι (X; Y)} (θ)) - θ \ln (\frac{M - 1}{2})) \frac{2 ξ_{ι (X; Y)} (θ)}{\sqrt{n}} . \end{matrix}

(A93)

Alternatively, plugging (A69), (A73), (A74), (A78), (A82), and (A84) into the right hand-side of (A53), it holds that

\begin{matrix} F_{W_{n}} (\ln (\frac{M - 1}{2})) \end{matrix}

\begin{matrix} ⩾ β_{1} (n, M, θ, P_{X}) - exp (n \ln (φ_{ι (X; Y)} (θ)) - θ \ln (\frac{M - 1}{2})) min \{1, \frac{2 ξ_{ι (X; Y)} (θ)}{\sqrt{n}}\} \end{matrix}

(A94)

\begin{matrix} ⩾ β_{1} (n, M, θ, P_{X}) - exp (n \ln (φ_{ι (X; Y)} (θ)) - θ \ln (\frac{M - 1}{2})) \frac{2 ξ_{ι (X; Y)} (θ)}{\sqrt{n}} \end{matrix}

(A95)

\begin{matrix} = G_{1} (n, M, θ, P_{X}), \end{matrix}

(A96)

where the equality in (A96) follows from (68). Observing that

F_{W_{n}}

is a positive function, then, from (A95), it holds that

\begin{matrix} F_{W_{n}} (\ln (\frac{M - 1}{2})) ⩾ max \{0, G_{1} (n, M, θ, P_{X})\} . \end{matrix}

(A97)

Finally, plugging (A86) and (A93) in (A51), it holds that

\begin{matrix} T (n, M, P_{X}) \end{matrix}

\begin{matrix} ⩽ β_{1} (n, M, θ, P_{X}) + \frac{M - 1}{2} β_{2} (n, M, θ, P_{X}) + exp (n \ln (φ_{ι (X; Y)} (θ)) - θ \ln (\frac{M - 1}{2})) \frac{4 ξ_{ι (X; Y)} (θ)}{\sqrt{n}} \end{matrix}

(A98)

\begin{matrix} = β (n, M, θ, P_{X}) + exp (n \ln (φ_{ι (X; Y)} (θ)) - θ \ln (\frac{M - 1}{2})) \frac{4 ξ_{ι (X; Y)} (θ)}{\sqrt{n}}, \end{matrix}

(A99)

where the equality in (A99) follows from (72). Observing that

T (n, M, P_{X}) ⩽ 1

, from (A99), it holds that

\begin{matrix} T (n, M, P_{X}) & ⩽ & min \{1, β (n, M, θ, P_{X}) + exp (n \ln (φ_{ι (X; Y)} (θ)) - θ \ln (\frac{M - 1}{2})) \frac{4 ξ_{ι (X; Y)} (θ)}{\sqrt{n}}\} \end{matrix}

(A100)

\begin{matrix} = & S (n, M, θ, P_{X}), \end{matrix}

(A101)

where the equality in (A96) follows from (71).

Alternatively, plugging (A89) and (A96) in (A51), it holds that

\begin{matrix} T (n, M, P_{X}) & ⩾ & max \{0, G_{1} (n, M, θ, P_{X})\} + \frac{M - 1}{2} max \{0, G_{2} (n, M, θ, P_{X})\} \end{matrix}

(A102)

\begin{matrix} = & G (n, M, θ, P_{X}), \end{matrix}

(A103)

where the equality in (A96) follows from (71). Combining (A101) and (A103) concludes the proof.

Appendix G. Proof of Theorem 7

Note that, for given distributions

P_{X}

subject (53),

Q_{Y}

subject to (81), and for a random transformation in (43) subject to (44), the lower bound

C (n

,M,

P_{X}

,

Q_{Y}

,

γ)

in (76) can be written in the form of a weighted sum of the CDF and the complementary CDF of the random variables variables

W_{n}

and

V_{n}

that are sums of i.i.d random variables, respectively. That is,

\begin{matrix} W_{n} & = & \sum_{t = 1}^{n} \tilde{ι} (X_{t}; Y_{t} | Q_{Y}), \end{matrix}

(A104)

\begin{matrix} V_{n} & = & \sum_{t = 1}^{n} \tilde{ι} ({\bar{X}}_{t}; Y_{t} | Q_{Y}), \end{matrix}

(A105)

where

(X_{t}, Y_{t}) \sim P_{X} P_{Y | X}

and

({\bar{X}}_{t}, Y_{t}) \sim P_{\bar{X}} Q_{Y}

with

P_{X} = P_{\bar{X}}

. More specifically, the function C in (76) can be written in the form

\begin{matrix} C (n, M, P_{X}, Q_{Y}, γ) & = & F_{W_{n}} (\ln (γ)) + γ (1 - F_{V_{n}} (\ln (γ))) - \frac{γ}{M}, \end{matrix}

(A106)

where

F_{W_{n}}

and

F_{V_{n}}

are the CDFs of the random variables

W_{n}

and

V_{n}

, respectively.

The next step derives the upper and lower bounds on

F_{W_{n}} (\ln (γ))

and

1 - F_{V_{n}} (\ln (γ))

by using the result of Theorem 3. That is,

\begin{matrix} F_{W_{n}} (\ln (γ)) ⩽ ζ_{\tilde{ι} (X; Y | Q_{Y})} (θ, \ln (γ), n) + exp (n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (θ)) - θ \ln (γ)) min \{1, \frac{2 ξ_{\tilde{ι} (X; Y | Q_{Y})} (θ)}{\sqrt{n}}\}, \end{matrix}

(A107)

\begin{matrix} F_{W_{n}} (\ln (γ)) ⩾ ζ_{\tilde{ι} (X; Y | Q_{Y})} (θ, \ln (γ), n) - exp (n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (θ)) - θ \ln (γ)) min \{1, \frac{2 ξ_{\tilde{ι} (X; Y | Q_{Y})} (θ)}{\sqrt{n}}\}, \end{matrix}

(A108)

\begin{matrix} 1 - F_{V_{n}} (\ln (γ)) ⩽ 1 - ζ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (θ, \ln (γ), n) + exp (n \ln (φ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (θ)) - θ \ln (γ)) min \{1, \frac{2 ξ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (θ)}{\sqrt{n}}\}, \end{matrix}

(A109)

and

\begin{matrix} 1 - F_{V_{n}} (\ln (γ)) ⩾ 1 - ζ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (θ, \ln (γ), n) - exp (n \ln (φ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (θ)) - θ \ln (γ)) min \{1, \frac{2 ξ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (θ)}{\sqrt{n}}\}, \end{matrix}

(A110)

where

θ

and

τ

satisfy

\begin{matrix} n μ_{\tilde{ι} (X; Y | Q_{Y})} (θ) & = & \ln (γ) = n μ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (τ), \end{matrix}

(A111)

and for all

t \in R

\begin{matrix} φ_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & E_{P_{X} P_{Y | X}} [exp (t \tilde{ι} (X; Y | Q_{Y}))], \end{matrix}

(A112)

\begin{matrix} φ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t) & = & E_{P_{\bar{X}} Q_{Y}} [exp (t \tilde{ι} (\bar{X}; Y | Q_{Y}))], \end{matrix}

(A113)

\begin{matrix} ξ_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & c_{1} (\frac{E_{P_{X} P_{Y | X}} [{|\tilde{ι} (X; Y | Q_{Y}) - μ_{\tilde{ι} (X; Y | Q_{Y})} (t)|}^{3} \frac{exp (t \tilde{ι} (X; Y | Q_{Y}))}{φ_{\tilde{ι} (X; Y | Q_{Y})} (t)}]}{{(V_{\tilde{ι} (X; Y | Q_{Y})} (t))}^{3 / 2}} + c_{2}), \end{matrix}

(A114)

\begin{matrix} ξ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t) & = & c_{1} (\frac{E_{P_{\bar{X}} Q_{Y}} [{|\tilde{ι} (\bar{X}; Y | Q_{Y}) - μ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t)|}^{3} \frac{exp (t \tilde{ι} (\bar{X}; Y | Q_{Y}))}{φ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t)}]}{{(V_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t))}^{3 / 2}} + c_{2}), \end{matrix}

(A115)

\begin{matrix} μ_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & E_{P_{X} P_{Y | X}} [\tilde{ι} (X; Y | Q_{Y}) \frac{exp (t \tilde{ι} (X; Y | Q_{Y}))}{φ_{\tilde{ι} (X; Y | Q_{Y})} (t)}], \end{matrix}

(A116)

\begin{matrix} μ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t) & = & E_{P_{\bar{X}} Q_{Y}} [\tilde{ι} (\bar{X}; Y | Q_{Y}) \frac{exp (t \tilde{ι} (\bar{X}; Y | Q_{Y}))}{φ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t)}], \end{matrix}

(A117)

\begin{matrix} V_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & E_{P_{X} P_{Y | X}} [{(\tilde{ι} (X; Y | Q_{Y}) - μ_{\tilde{ι} (X; Y | Q_{Y})} (t))}^{2} \frac{exp (t \tilde{ι} (X; Y | Q_{Y}))}{φ_{\tilde{ι} (X; Y | Q_{Y})} (t)}], \end{matrix}

(A118)

\begin{matrix} V_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t) & = & E_{P_{\bar{X}} Q_{Y}} [{(\tilde{ι} (\bar{X}; Y | Q_{Y}) - μ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t))}^{2} \frac{exp (t \tilde{ι} (\bar{X}; Y | Q_{Y}))}{φ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t)}], \end{matrix}

(A119)

with

c_{1}

and

c_{2}

defined in (23); and for all

(t, a, n) \in R^{2} \times N

\begin{matrix} ζ_{\tilde{ι} (X; Y | Q_{Y})} (t, a, n) \end{matrix}

\begin{matrix} ≜ 1_{{t > 0}} + {(- 1)}^{1_{{t > 0}}} exp (\frac{1}{2} n t^{2} V_{\tilde{ι} (X; Y | Q_{Y})} (t) + n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (t)) - t a) Q (| t | \sqrt{n V_{\tilde{ι} (X; Y | Q_{Y})} (t)}), \\ ζ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t, a, n) \end{matrix}

(A120)

\begin{matrix} ≜ 1_{{t > 0}} + {(- 1)}^{1_{{t > 0}}} exp (\frac{1}{2} n t^{2} V_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t) + n \ln (φ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t)) - t a) Q (| t | \sqrt{n V_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t)}) . \end{matrix}

(A121)

The next step simplifies the expressions on the right hand-side of (A109) and (A110) by studying the relation between

φ_{\tilde{ι} (X; Y | Q_{Y})}

and

φ_{\tilde{ι} (\bar{X}; Y | Q_{Y})}

,

θ

and

τ

,

V_{\tilde{ι} (X; Y | Q_{Y})}

and

V_{\tilde{ι} (\bar{X}; Y | Q_{Y})}

,

ξ_{\tilde{ι} (X; Y | Q_{Y})}

and

ξ_{\tilde{ι} (\bar{X}; Y | Q_{Y})}

when the

P_{Y | X}

is absolutely continuous with respect to

Q_{Y}

.

First, from (A112), using the change of measure from

P_{X} P_{Y | X}

to

P_{\bar{X}} Q_{Y}

because

P_{X} P_{Y | X}

is absolutely continuous with respect to

P_{\bar{X}} Q_{Y}

, it holds that

\begin{matrix} φ_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & E_{P_{\bar{X}} Q_{Y}} [\frac{d P_{X} P_{Y | X}}{d P_{\bar{X}} Q_{Y}} (\bar{X}; Y) exp (t \tilde{ι} (\bar{X}; Y | Q_{Y})] \end{matrix}

(A122)

\begin{matrix} = & E_{P_{\bar{X}} Q_{Y}} [exp ((t + 1) \tilde{ι} (\bar{X}; Y | Q_{Y}))] . \end{matrix}

(A123)

Then, from (A112) and (A113), it holds that

\begin{matrix} φ_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & φ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t + 1) . \end{matrix}

(A124)

This concludes the relation between

φ_{\tilde{ι} (X; Y | Q_{Y})}

and

φ_{\tilde{ι} (\bar{X}; Y | Q_{Y})}

.

Second, from (A116), using the change of measure from

P_{X} P_{Y | X}

to

P_{\bar{X}} Q_{Y}

, it holds that

\begin{matrix} μ_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & E_{P_{\bar{X}} Q_{Y}} [\tilde{ι} (\bar{X}; Y | Q_{Y}) \frac{exp (t \tilde{ι} (\bar{X}; Y | Q_{Y})}{φ_{\tilde{ι} (X; Y | Q_{Y})} (t)} \frac{d P_{X} P_{Y | X}}{d P_{\bar{X}} Q_{Y}} (\bar{X}; Y)] \end{matrix}

(A125)

\begin{matrix} = & E_{P_{\bar{X}} Q_{Y}} [\tilde{ι} (\bar{X}; Y | Q_{Y}) \frac{exp ((t + 1) \tilde{ι} (\bar{X}; Y | Q_{Y}))}{φ_{\tilde{ι} (X; Y | Q_{Y})} (t)}] . \end{matrix}

(A126)

Then, from (A124) and (A126), it holds that

\begin{matrix} μ_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & E_{P_{\bar{X}} Q_{Y}} [\tilde{ι} (\bar{X}; Y | Q_{Y}) \frac{exp ((t + 1) \tilde{ι} (\bar{X}; Y | Q_{Y}))}{φ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t + 1)}] . \end{matrix}

(A127)

From (A117) and (A127), it holds that

\begin{matrix} μ_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & μ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t + 1) . \end{matrix}

(A128)

This concludes the relation between

μ_{\tilde{ι} (X; Y | Q_{Y})}

and

μ_{\tilde{ι} (\bar{X}; Y | Q_{Y})}

.

Third, from (A111) and (A128), it holds that

\begin{matrix} τ = θ + 1 . \end{matrix}

(A129)

This concludes the relation between

τ

and

θ

.

Fourth, from (A118), using the change of measure from

P_{X} P_{Y | X}

to

P_{\bar{X}} Q_{Y}

, it holds that

\begin{matrix} V_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & E_{P_{\bar{X}} Q_{Y}} [{(\tilde{ι} (\bar{X}; Y | Q_{Y}) - μ_{\tilde{ι} (X; Y | Q_{Y})} (t))}^{2} \frac{exp (t \tilde{ι} (\bar{X}; Y | Q_{Y})}{φ_{\tilde{ι} (X; Y | Q_{Y})} (t)} \frac{d P_{X} P_{Y | X}}{d P_{\bar{X}} Q_{Y}} (\bar{X}; Y)] \end{matrix}

(A130)

\begin{matrix} = & E_{P_{\bar{X}} Q_{Y}} [{(\tilde{ι} (\bar{X}; Y | Q_{Y}) - μ_{\tilde{ι} (X; Y | Q_{Y})} (t))}^{2} \frac{exp ((t + 1) \tilde{ι} (\bar{X}; Y | Q_{Y}))}{φ_{\tilde{ι} (X; Y | Q_{Y})} (t)}] . \end{matrix}

(A131)

From (A124), (A128), and (A131), it holds that

\begin{matrix} V_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & E_{P_{\bar{X}} Q_{Y}} [{(\tilde{ι} (\bar{X}; Y | Q_{Y}) - μ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t + 1))}^{2} \frac{exp ((t + 1) \tilde{ι} (\bar{X}; Y | Q_{Y}))}{φ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t + 1)}] . \end{matrix}

(A132)

From (A119) and (A132), it holds that

\begin{matrix} V_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & V_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t + 1) . \end{matrix}

(A133)

This concludes the relation between

V_{\tilde{ι} (X; Y | Q_{Y})}

and

V_{\tilde{ι} (\bar{X}; Y | Q_{Y})}

.

Fifth, from (A114), using the change of measure from

P_{X} P_{Y | X}

to

P_{\bar{X}} Q_{Y}

, it holds that

\begin{matrix} ξ_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & c_{1} (\frac{E_{P_{\bar{X}} Q_{Y}} [{|\tilde{ι} (\bar{X}; Y | Q_{Y}) - μ_{\tilde{ι} (X; Y | Q_{Y})} (t)|}^{3} \frac{exp (t \tilde{ι} (\bar{X}; Y | Q_{Y})}{φ_{\tilde{ι} (X; Y | Q_{Y})} (t)} \frac{d P_{X} P_{Y | X}}{d P_{\bar{X}} Q_{Y}} (\bar{X}; Y)]}{{(V_{\tilde{ι} (X; Y | Q_{Y})} (t))}^{3 / 2}} + c_{2}) \end{matrix}

(A134)

\begin{matrix} = & c_{1} (\frac{E_{P_{\bar{X}} Q_{Y}} [{|\tilde{ι} (\bar{X}; Y | Q_{Y}) - μ_{\tilde{ι} (X; Y | Q_{Y})} (t)|}^{3} \frac{exp ((t + 1) \tilde{ι} (\bar{X}; Y | Q_{Y}))}{φ_{\tilde{ι} (X; Y | Q_{Y})} (t)}]}{{(V_{\tilde{ι} (X; Y | Q_{Y})} (t))}^{3 / 2}} + c_{2}) . \end{matrix}

(A135)

From (A124), (A128), (A133), and (A135), it holds that

\begin{matrix} ξ_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & c_{1} (\frac{E_{P_{\bar{X}} Q_{Y}} [{|\tilde{ι} (\bar{X}; Y | Q_{Y}) - μ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t + 1)|}^{3} \frac{exp ((t + 1) \tilde{ι} (\bar{X}; Y | Q_{Y}))}{φ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t + 1)}]}{{(V_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t + 1))}^{3 / 2}} + c_{2}) . \end{matrix}

(A136)

From (A115) and (A136), it holds that

\begin{matrix} ξ_{\tilde{ι} (X; Y | Q_{Y})} (t) & = & ξ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t + 1) . \end{matrix}

(A137)

This concludes the relation between

ξ_{\tilde{ι} (X; Y | Q_{Y})}

and

ξ_{\tilde{ι} (\bar{X}; Y | Q_{Y})}

.

Sixth, plugging (A124), (A128), and (A133) into (A120), for all

t \in R

, it holds that

\begin{matrix} ζ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t, a, n) \\ ≜ 1_{{t > 0}} + {(- 1)}^{1_{{t > 0}}} exp (\frac{1}{2} n t^{2} V_{\tilde{ι} (X; Y | Q_{Y})} (t - 1) + n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (t - 1)) - t a) Q (| t | \sqrt{n V_{\tilde{ι} (X; Y | Q_{Y})} (t - 1)}) . \end{matrix}

(A138)

Then, from (95) and (A138), it holds that

\begin{matrix} ζ_{\tilde{ι} (\bar{X}; Y | Q_{Y})} (t, \ln (γ), n) = 1 - {\tilde{β}}_{2} (n, γ, t - 1, P_{X}, Q_{Y}) . \end{matrix}

(A139)

Then, plugging (A124), (A128), (A129), (A133), (A137), and (A139) into the right hand-side of (A109), it holds that

\begin{matrix} 1 - F_{V_{n}} (\ln (γ)) \end{matrix}

\begin{matrix} ⩽ {\tilde{β}}_{2} (n, γ, θ, P_{X}, Q_{Y}) + exp (n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (θ)) - (θ + 1) \ln (γ)) min \{1, \frac{2 ξ_{\tilde{ι} (X; Y | Q_{Y})} (θ)}{\sqrt{n}}\} \end{matrix}

(A140)

\begin{matrix} ⩽ {\tilde{β}}_{2} (n, γ, θ, P_{X}, Q_{Y}) + exp (n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (θ)) - (θ + 1) \ln (γ)) \frac{2 ξ_{\tilde{ι} (X; Y | Q_{Y})} (θ)}{\sqrt{n}} . \end{matrix}

(A141)

Alternatively, plugging (A124), (A128), (A129), (A133), (A137), and (A139) into the right hand-side of (A110), it holds that

\begin{matrix} 1 - F_{V_{n}} (\ln (γ)) \end{matrix}

\begin{matrix} ⩾ {\tilde{β}}_{2} (n, γ, θ, P_{X}, Q_{Y}) - exp (n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (θ)) - (θ + 1) \ln (γ)) min \{1, \frac{2 ξ_{\tilde{ι} (X; Y | Q_{Y})} (θ)}{\sqrt{n}}\} \end{matrix}

(A142)

\begin{matrix} ⩾ {\tilde{β}}_{2} (n, γ, θ, P_{X}, Q_{Y}) - exp (n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (θ)) - (θ + 1) \ln (γ)) \frac{2 ξ_{\tilde{ι} (X; Y | Q_{Y})} (θ)}{\sqrt{n}} \end{matrix}

(A143)

\begin{matrix} = {\tilde{G}}_{2} (n, γ, θ, P_{X}, Q_{Y}), \end{matrix}

(A144)

where the equality in (A144) follows from (97). Observing that

1 - F_{V_{n}}

is a positive function, then, from (A143), it holds that

\begin{matrix} 1 - F_{V_{n}} (\ln (γ)) ⩾ max \{0, {\tilde{G}}_{2} (n, γ, θ, P_{X}, Q_{Y})\} . \end{matrix}

(A145)

Seventh, from (94) and (A120), it holds that

\begin{matrix} ζ_{\tilde{ι} (X; Y | Q_{Y})} (t, \ln (γ), n) = {\tilde{β}}_{1} (n, γ, t, P_{X}, Q_{Y}) . \end{matrix}

(A146)

Then, plugging (A124), (A128), (A129), (A133), (A137), and (A146) into the right hand-side of (A107), it holds that

\begin{matrix} F_{W_{n}} (\ln (γ)) \end{matrix}

\begin{matrix} ⩽ {\tilde{β}}_{1} (n, γ, θ, P_{X}, Q_{Y}) + exp (n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (θ)) - θ \ln (γ)) min \{1, \frac{2 ξ_{\tilde{ι} (X; Y | Q_{Y})} (θ)}{\sqrt{n}}\} \end{matrix}

(A147)

\begin{matrix} ⩽ {\tilde{β}}_{1} (n, γ, θ, P_{X}, Q_{Y}) + exp (n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (θ)) - θ \ln (γ)) \frac{2 ξ_{\tilde{ι} (X; Y | Q_{Y})} (θ)}{\sqrt{n}} . \end{matrix}

(A148)

Alternatively, plugging (A124), (A128), (A129), (A133), (A137), and (A139) into the right hand-side of (A108), it holds that

\begin{matrix} F_{W_{n}} (\ln (γ)) \end{matrix}

\begin{matrix} ⩾ {\tilde{β}}_{1} (n, γ, θ, P_{X}, Q_{Y}) - exp (n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (θ)) - θ \ln (γ)) min \{1, \frac{2 ξ_{\tilde{ι} (X; Y | Q_{Y})} (θ)}{\sqrt{n}}\} \end{matrix}

(A149)

\begin{matrix} ⩾ {\tilde{β}}_{1} (n, γ, θ, P_{X}, Q_{Y}) - exp (n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (θ)) - θ \ln (γ)) \frac{2 ξ_{\tilde{ι} (X; Y | Q_{Y})} (θ)}{\sqrt{n}} \end{matrix}

(A150)

\begin{matrix} = {\tilde{G}}_{1} (n, γ, θ, P_{X}, Q_{Y}), \end{matrix}

(A151)

where the equality in (A151) follows from (96). Observing that

F_{W_{n}}

is a positive function, then from (A150), it holds that

\begin{matrix} F_{W_{n}} (\ln (γ)) ⩾ max \{0, {\tilde{G}}_{1} (n, γ, θ, P_{X}, Q_{Y})\} . \end{matrix}

(A152)

Finally, plugging (A141) and (A148) in (A106), it holds that

\begin{matrix} C (n, M, P_{X}, Q_{Y}, γ) \\ ⩽ {\tilde{β}}_{1} (n, γ, θ, P_{X}, Q_{Y}) + γ {\tilde{β}}_{2} (n, γ, θ, P_{X}, Q_{Y}) + exp (n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (θ)) - θ \ln (γ)) \frac{4 ξ_{\tilde{ι} (X; Y | Q_{Y})} (θ)}{\sqrt{n}} - \frac{γ}{M} \end{matrix}

(A153)

\begin{matrix} = \tilde{β} (n, γ, θ, P_{X}, Q_{Y}, M) + exp (n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (θ)) - θ \ln (γ)) \frac{4 ξ_{\tilde{ι} (X; Y | Q_{Y})} (θ)}{\sqrt{n}}, \end{matrix}

(A154)

where the equality in (A150) follows from (100). Observing that

C (n, M, P_{X}, Q_{Y}, γ) + \frac{γ}{M} ⩽ 1

, from (A153), it holds that

\begin{matrix} C (n, M, P_{X}, Q_{Y}, γ) \end{matrix}

\begin{matrix} ⩽ min \{1, \tilde{β} (n, γ, θ, P_{X}, Q_{Y}) + exp (n \ln (φ_{\tilde{ι} (X; Y | Q_{Y})} (θ)) - θ \ln (γ)) \frac{4 ξ_{\tilde{ι} (X; Y | Q_{Y})} (θ)}{\sqrt{n}}\} \end{matrix}

(A155)

\begin{matrix} = \tilde{S} (n, γ, θ, P_{X}, Q_{Y}, M), \end{matrix}

(A156)

where (A156) follows from (99).

Alternatively, plugging (A144) and (A151) in (A106), it holds that

\begin{matrix} C (n, M, P_{X}, Q_{Y}, γ) & ⩾ & max \{0, {\tilde{G}}_{1} (n, γ, θ, P_{X}, Q_{Y})\} + γ max \{0, {\tilde{G}}_{2} (n, γ, θ, P_{X}, Q_{Y})\} - \frac{γ}{M} \end{matrix}

(A157)

\begin{matrix} = & \tilde{G} (n, γ, θ, P_{X}, Q_{Y}, M), \end{matrix}

(A158)

where the equality in (A158) follows from (98). Combining (A156) and (A158) concludes the proof.

References

Zolotarev, V.M. One-Dimensional Stable Distributions; American Mathematical Society: Providence, RI, USA, 1986. [Google Scholar]
Jensen, J.L. Saddlepoint Approximations; Clarendon Press—Oxford: New York, NY, USA, 1995. [Google Scholar]
Feller, W. An Introduction to Probability Theory and Its Applications, 2nd ed.; John Wiley and Sons: New York, NY, USA, 1971; Volume 2. [Google Scholar]
Daniels, H.E. Saddlepoint Approximations in Statistics. Ann. Math. Stat. 1954, 25, 631–650. [Google Scholar] [CrossRef]
Daniels, H.E. Exact saddlepoint approximations. Biometrika 1980, 67, 59–63. [Google Scholar] [CrossRef]
Daniels, H.E. Tail Probability Approximations. Int. Stat. Rev./Rev. Int. Stat. 1987, 55, 37–48. [Google Scholar] [CrossRef]
Temme, N. The Uniform Asymptotic Expansion of a Class of Integrals Related to Cumulative Distribution Functions. SIAM J. Math. Anal. 1982, 13, 239–253. [Google Scholar] [CrossRef] [Green Version]
Lugannani, R.; Rice, S. Saddle Point Approximation for the Distribution of the Sum of Independent Random Variables. Adv. Appl. Probab. 1980, 12, 475–490. [Google Scholar] [CrossRef]
Esscher, F. On the probability function in the collective theory of risk. Skand. Aktuarie Tidskr. 1932, 15, 175–195. [Google Scholar]
Polyanskiy, Y.; Poor, H.V.; Verdú, S. Channel Coding Rate in the Finite Blocklength Regime. IEEE Trans. Inf. Theory 2010, 56, 2307–2359. [Google Scholar] [CrossRef]
MolavianJazi, E.; Laneman, J.N. A Second-Order Achievable Rate Region for Gaussian Multi-Access Channels via a Central Limit Theorem for Functions. IEEE Trans. Inf. Theory 2015, 61, 6719–6733. [Google Scholar] [CrossRef]
MolavianJazi, E. A Unified Approach to Gaussian Channels with Finite Blocklength. Ph.D. Thesis, University of Notre Dame, Notre Dame, Indiana, 2014. [Google Scholar]
Font-Segura, J.; Vazquez-Vilar, G.; Martinez, A.; Guillén i Fàbregas, A.; Lancho, A. Saddlepoint approximations of lower and upper bounds to the error probability in channel coding. In Proceedings of the 52nd Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, 21–23 March 2018; pp. 1–6. [Google Scholar]
Martinez, A.; Scarlett, J.; Dalai, M.; Guillén i Fàbregas, A. A complex-integration approach to the saddlepoint approximation for random-coding bounds. In Proceedings of the 11th International Symposium on Wireless Communications Systems (ISWCS), Barcelona, Spain, 26–29 August 2014; pp. 618–621. [Google Scholar]
Shevtsova, I. On the absolute constants in the Berry–Esseen type inequalities for identically distributed summands. arXiv 2011, arXiv:1111.6554. [Google Scholar]

Figure 1. Sum of 100 Bernoulli random variables with parameter

p = 0.2

. The function

F_{X_{100}} (a)

(asterisk markers *) in Example 1; the function

F_{Z_{100}} (a)

(star markers ⋆) in (25); the function

{\hat{F}}_{X_{100}} (a)

(diamond markers ⋄) in (12); the function

\bar{Σ} (a, 100)

(circle marker ∘) in (26); the function

\underset{̲}{Σ} (a, 100)

(square marker □) in (27); the function

\bar{Ω} (a, 100)

(upward-pointing triangle marker ▵) in (41); and the function

\underset{̲}{Ω} (a, 100)

(downward-pointing triangle marker ▿) in (42) are plotted as functions of a, with

a \in [5, 35]

.

Figure 1. Sum of 100 Bernoulli random variables with parameter

p = 0.2

. The function

F_{X_{100}} (a)

(asterisk markers *) in Example 1; the function

F_{Z_{100}} (a)

(star markers ⋆) in (25); the function

{\hat{F}}_{X_{100}} (a)

(diamond markers ⋄) in (12); the function

\bar{Σ} (a, 100)

(circle marker ∘) in (26); the function

\underset{̲}{Σ} (a, 100)

(square marker □) in (27); the function

\bar{Ω} (a, 100)

(upward-pointing triangle marker ▵) in (41); and the function

\underset{̲}{Ω} (a, 100)

(downward-pointing triangle marker ▿) in (42) are plotted as functions of a, with

a \in [5, 35]

.

Figure 2. Sum of 50 Chi-squared random variables with parameter

k = 1

. The function

F_{X_{50}} (a)

(asterisk markers *) in Example 2; the function

F_{Z_{50}} (a)

(star markers ⋆) in (25); the function

{\hat{F}}_{X_{50}} (a)

(diamond markers ⋄) in (12); the function

\bar{Σ} (a, 50)

(circle marker ∘) in (26); the function

\underset{̲}{Σ} (a, 50)

(square marker □) in (27); the function

\bar{Ω} (a, 50)

(upward-pointing triangle marker ▵) in (41); and the function

\underset{̲}{Ω} (a, 50)

(downward-pointing triangle marker ▿) in (42) are plotted as functions of a, with

a \in [0, 100]

.

Figure 2. Sum of 50 Chi-squared random variables with parameter

k = 1

. The function

F_{X_{50}} (a)

(asterisk markers *) in Example 2; the function

F_{Z_{50}} (a)

(star markers ⋆) in (25); the function

{\hat{F}}_{X_{50}} (a)

(diamond markers ⋄) in (12); the function

\bar{Σ} (a, 50)

(circle marker ∘) in (26); the function

\underset{̲}{Σ} (a, 50)

(square marker □) in (27); the function

\bar{Ω} (a, 50)

(upward-pointing triangle marker ▵) in (41); and the function

\underset{̲}{Ω} (a, 50)

(downward-pointing triangle marker ▿) in (42) are plotted as functions of a, with

a \in [0, 100]

.

Figure 3. Normal and saddlepoint approximations to the functions T (Figure 3a) in (51) and C (Figure 3b) in (76) as functions of the blocklength n for the case of a BSC with cross-over probability

δ = 0.11

. The information rate is

R = 0.32

and

R = 0.42

bits per channel use for Figure 3a,b, respectively. The channel input distribution

P_{X}

is chosen to be the uniform distribution, the output distribution

Q_{Y}

is chosen to be the channel output distribution

P_{Y}

, and the parameter

γ

is chosen to maximize C in (76). The parameter

θ

is chosen to be respectively the unique solution to t in (74) in Figure 3a and in (102) in Figure 3b.

Figure 3. Normal and saddlepoint approximations to the functions T (Figure 3a) in (51) and C (Figure 3b) in (76) as functions of the blocklength n for the case of a BSC with cross-over probability

δ = 0.11

. The information rate is

R = 0.32

and

R = 0.42

bits per channel use for Figure 3a,b, respectively. The channel input distribution

P_{X}

is chosen to be the uniform distribution, the output distribution

Q_{Y}

is chosen to be the channel output distribution

P_{Y}

, and the parameter

γ

is chosen to maximize C in (76). The parameter

θ

is chosen to be respectively the unique solution to t in (74) in Figure 3a and in (102) in Figure 3b.

Figure 4. Normal and saddlepoint approximations to the functions T (Figure 4a) in (51) and C (Figure 4b) in (76) as functions of the blocklength n for the case of a real-valued AWGN channel with discrete channel inputs,

X = {- 1, 1}

, signal to noise ratio

SNR = 1

, and information rate

R = 0.425

bits per channel use. The channel input distribution

P_{X}

is chosen to be the uniform distribution, the output distribution

Q_{Y}

is chosen to be the channel output distribution

P_{Y}

, and the parameter

γ

is chosen to maximize C in (76). The parameter

θ

is respectively chosen to be the unique solution to t in (74) in Figure 4a and in (102) in Figure 4b.

Figure 4. Normal and saddlepoint approximations to the functions T (Figure 4a) in (51) and C (Figure 4b) in (76) as functions of the blocklength n for the case of a real-valued AWGN channel with discrete channel inputs,

X = {- 1, 1}

, signal to noise ratio

SNR = 1

, and information rate

R = 0.425

bits per channel use. The channel input distribution

P_{X}

is chosen to be the uniform distribution, the output distribution

Q_{Y}

is chosen to be the channel output distribution

P_{Y}

, and the parameter

γ

is chosen to maximize C in (76). The parameter

θ

is respectively chosen to be the unique solution to t in (74) in Figure 4a and in (102) in Figure 4b.

Figure 5. Normal and saddlepoint approximation to the functions T (Figure 5a) in (51) and C (Figure 5b) in (76) as functions of the blocklength n for the case of a real-valued symmetric

α

-stable noise channel with discrete channel inputs

X = {- 1, 1}

, shape parameter

α = 1.4

, dispersion parameter

σ = 0.6

, and information rate

R = 0.38

bits per channel use. The channel input distribution

P_{X}

is chosen to be the uniform distribution, the output distribution

Q_{Y}

is chosen to be the channel output distribution

P_{Y}

, and the parameter

γ

is chosen to maximize C in (76). The parameter

θ

is respectively chosen to be the unique solution to t in (74) in Figure 5a and in (102) in Figure 5b.

Figure 5. Normal and saddlepoint approximation to the functions T (Figure 5a) in (51) and C (Figure 5b) in (76) as functions of the blocklength n for the case of a real-valued symmetric

α

-stable noise channel with discrete channel inputs

X = {- 1, 1}

, shape parameter

α = 1.4

, dispersion parameter

σ = 0.6

, and information rate

R = 0.38

bits per channel use. The channel input distribution

P_{X}

is chosen to be the uniform distribution, the output distribution

Q_{Y}

is chosen to be the channel output distribution

P_{Y}

, and the parameter

γ

is chosen to maximize C in (76). The parameter

θ

is respectively chosen to be the unique solution to t in (74) in Figure 5a and in (102) in Figure 5b.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Anade, D.; Gorce, J.-M.; Mary, P.; Perlaza, S.M. An Upper Bound on the Error Induced by Saddlepoint Approximations—Applications to Information Theory. Entropy 2020, 22, 690. https://0-doi-org.brum.beds.ac.uk/10.3390/e22060690

AMA Style

Anade D, Gorce J-M, Mary P, Perlaza SM. An Upper Bound on the Error Induced by Saddlepoint Approximations—Applications to Information Theory. Entropy. 2020; 22(6):690. https://0-doi-org.brum.beds.ac.uk/10.3390/e22060690

Chicago/Turabian Style

Anade, Dadja, Jean-Marie Gorce, Philippe Mary, and Samir M. Perlaza. 2020. "An Upper Bound on the Error Induced by Saddlepoint Approximations—Applications to Information Theory" Entropy 22, no. 6: 690. https://0-doi-org.brum.beds.ac.uk/10.3390/e22060690

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

An Upper Bound on the Error Induced by Saddlepoint Approximations—Applications to Information Theory^†

Abstract

1. Introduction

1.1. Contributions

1.2. Applications

2. Sums of Independent and Identically Distributed Random Variables

2.1. Error Induced by the Normal Approximation

2.2. Error Induced by the Saddlepoint Approximation

2.3. Examples

3. Application to Information Theory: Channel Coding

3.1. System Model

3.2. Dependence Testing Bound

3.2.1. Normal Approximation

3.2.2. Saddlepoint Approximation

3.3. Meta Converse Bound

3.3.1. Normal Approximation

3.3.2. Saddlepoint Approximation

3.4. Numerical Experimentation

4. Discussion and Further Work

Author Contributions

Funding

Conflicts of Interest

Appendix A. Proof of Theorem 2

Appendix B. Proof of Lemma 1

Appendix C. Proof of Theorem 3

Appendix D. Proof of Lemma A1

Appendix E. Proof of Lemma A2

Appendix F. Proof of Theorem 5

Appendix G. Proof of Theorem 7

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

An Upper Bound on the Error Induced by Saddlepoint Approximations—Applications to Information Theory †

Abstract

1. Introduction

1.1. Contributions

1.2. Applications

2. Sums of Independent and Identically Distributed Random Variables

2.1. Error Induced by the Normal Approximation

2.2. Error Induced by the Saddlepoint Approximation

2.3. Examples

3. Application to Information Theory: Channel Coding

3.1. System Model

3.2. Dependence Testing Bound

3.2.1. Normal Approximation

3.2.2. Saddlepoint Approximation

3.3. Meta Converse Bound

3.3.1. Normal Approximation

3.3.2. Saddlepoint Approximation

3.4. Numerical Experimentation

4. Discussion and Further Work

Author Contributions

Funding

Conflicts of Interest

Appendix A. Proof of Theorem 2

Appendix B. Proof of Lemma 1

Appendix C. Proof of Theorem 3

Appendix D. Proof of Lemma A1

Appendix E. Proof of Lemma A2

Appendix F. Proof of Theorem 5

Appendix G. Proof of Theorem 7

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

An Upper Bound on the Error Induced by Saddlepoint Approximations—Applications to Information Theory^†