Article

An Upper Bound on the Error Induced by Saddlepoint Approximations—Applications to Information Theory †

1 Laboratoire CITI, a Joint Laboratory between INRIA, the Université de Lyon and the Institut National de Sciences Appliquées (INSA) de Lyon, 6 Av. des Arts, 69621 Villeurbanne, France
2 IETR and the Institut National de Sciences Appliquées (INSA) de Rennes, 20 Avenue des Buttes de Coësmes, CS 70839, 35708 Rennes, France
3 INRIA, Centre de Recherche de Sophia Antipolis—Méditerranée, 2004 Route des Lucioles, 06902 Sophia Antipolis, France
4 Princeton University, Electrical Engineering Department, Princeton, NJ 08544, USA
* Author to whom correspondence should be addressed.
† Parts of this paper appear in the proceedings of the IEEE International Conference on Communications (ICC), Dublin, Ireland, June 2020, and in the INRIA technical report RR-9329.
Authors are listed in alphabetical order.
Submission received: 12 May 2020 / Revised: 10 June 2020 / Accepted: 11 June 2020 / Published: 20 June 2020
(This article belongs to the Special Issue Wireless Networks: Information Theoretic Perspectives)

Abstract:
This paper introduces an upper bound on the absolute difference between: (a) the cumulative distribution function (CDF) of the sum of a finite number of independent and identically distributed random variables with finite absolute third moment; and (b) a saddlepoint approximation of such a CDF. This upper bound, which is particularly precise in the regime of large deviations, is used to study the dependence testing (DT) bound and the meta converse (MC) bound on the decoding error probability (DEP) in point-to-point memoryless channels. Often, these bounds cannot be calculated analytically, and thus lower and upper bounds on them become particularly useful. Within this context, the main results include, respectively, new upper and lower bounds on the DT and MC bounds. Numerical experimentation with these bounds is presented in the case of the binary symmetric channel, the additive white Gaussian noise channel, and the additive symmetric α-stable noise channel.

1. Introduction

This paper focuses on approximating the cumulative distribution function (CDF) of sums of a finite number of real-valued independent and identically distributed (i.i.d.) random variables with finite absolute third moment. More specifically, let $Y_1, Y_2, \ldots, Y_n$, with $n$ an integer and $2 \leq n < \infty$, be real-valued random variables with probability distribution $P_Y$. Denote by $F_Y$ the CDF associated with $P_Y$, and, if it exists, denote by $f_Y$ the corresponding probability density function (PDF). Let also

$$X_n = \sum_{t=1}^{n} Y_t \tag{1}$$

be a random variable with distribution $P_{X_n}$. Denote by $F_{X_n}$ the CDF and, if it exists, by $f_{X_n}$ the PDF associated with $P_{X_n}$. The objective is to provide a positive function that approximates $F_{X_n}$ and an upper bound on the resulting approximation error. In the following, a positive function $g: \mathbb{R} \to \mathbb{R}_+$ is said to approximate $F_{X_n}$ with an approximation error that is upper bounded by a function $\epsilon: \mathbb{R} \to \mathbb{R}_+$ if, for all $x \in \mathbb{R}$,

$$\left| F_{X_n}(x) - g(x) \right| \leq \epsilon(x). \tag{2}$$
The case in which $Y_1, Y_2, \ldots, Y_n$ in (1) are stable random variables with $F_Y$ analytically expressible is trivial. This is essentially because the sum $X_n$ follows the same distribution as a random variable $a_n Y + b_n$, where $(a_n, b_n) \in \mathbb{R}^2$ and $Y$ is a random variable whose CDF is $F_Y$. Examples of this case are random variables following the Gaussian, Cauchy, or Lévy distributions [1].
In general, the problem of calculating the CDF of $X_n$ boils down to calculating $n-1$ convolutions. More specifically, it holds that

$$f_{X_n}(x) = \int_{-\infty}^{\infty} f_{X_{n-1}}(x - t)\, f_Y(t)\, \mathrm{d}t, \tag{3}$$

where $f_{X_1} = f_Y$. Even for discrete random variables and small values of $n$, the integral in (3) often requires excessive computational resources [2].
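For discrete random variables, the integral in (3) becomes a discrete convolution. The following minimal sketch, assuming an illustrative Bernoulli($p$) choice of $P_Y$, computes the exact distribution of $X_n$ by repeated convolution:

```python
# A minimal sketch of computing the PMF of X_n = Y_1 + ... + Y_n by the
# repeated convolution in (3), here for Bernoulli(p) summands; the values
# of p and n are illustrative, not taken from the paper.
import numpy as np

def pmf_of_sum(pmf_y, n):
    """Convolve a PMF supported on {0, 1, ..., k} with itself n - 1 times."""
    pmf = pmf_y.copy()
    for _ in range(n - 1):
        pmf = np.convolve(pmf, pmf_y)  # one convolution per additional summand
    return pmf

p = 0.2
pmf_y = np.array([1 - p, p])        # Bernoulli(p) PMF on {0, 1}
pmf_x100 = pmf_of_sum(pmf_y, 100)   # PMF of X_100 (a Binomial(100, p))
cdf_x100 = np.cumsum(pmf_x100)      # exact CDF by accumulation
print(cdf_x100[20])                 # P[X_100 <= 20]
```

Even this naive approach costs one full convolution per summand, which illustrates why direct computation quickly becomes expensive for large supports or large $n$.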
When the PDF of the random variable $X_n$ cannot be conveniently obtained but only the first $r$ moments are known, with $r \in \mathbb{N}$, an approximation of the PDF can be obtained by using an Edgeworth expansion. Nonetheless, the resulting relative error in the large deviation regime makes these approximations inaccurate [3].
When the cumulant generating function (CGF) associated with $F_Y$, denoted by $K_Y: \mathbb{R} \to \mathbb{R}$, is known, the PDF $f_{X_n}$ can be obtained via the Laplace inversion lemma [2]. That is, given two reals $\alpha_- < 0$ and $\alpha_+ > 0$, if $K_Y$ is analytic for all $z \in \{a + ib \in \mathbb{C} : (a, b) \in \mathbb{R}^2 \text{ and } \alpha_- \leq a \leq \alpha_+\} \subset \mathbb{C}$, then

$$f_{X_n}(x) = \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} \exp\big(n K_Y(z) - zx\big)\, \mathrm{d}z, \tag{4}$$

with $i = \sqrt{-1}$ and $\gamma \in (\alpha_-, \alpha_+)$. Note that the domain of $K_Y$ in (4) has been extended to the complex plane, and thus it is often referred to as the complex CGF. With an abuse of notation, both the CGF and the complex CGF are identically denoted.
In the case in which $n$ is sufficiently large, an approximation to the Bromwich integral in (4) can be obtained by choosing the contour to include the unique saddlepoint of the integrand, as suggested in [4]. The intuition behind this lies in the following observations:
(i) the saddlepoint, denoted by $z_0$, is unique, real, and $z_0 \in (\alpha_-, \alpha_+)$;
(ii) within a neighborhood of the saddlepoint of the form $|z - z_0| < \epsilon$, with $z \in \mathbb{C}$ and $\epsilon > 0$ sufficiently small, $\mathrm{Im}\big(n K_Y(z) - zx\big) = 0$ and $\mathrm{Re}\big(n K_Y(z) - zx\big)$ can be assumed constant; and
(iii) outside such a neighborhood, the integrand is negligible.
From (i), it follows that the derivative of $n K_Y(t) - tx$ with respect to $t$, with $t \in \mathbb{R}$, is equal to zero when evaluated at the saddlepoint $z_0$. More specifically, for all $t \in \mathbb{R}$,

$$\frac{\mathrm{d}}{\mathrm{d}t} K_Y(t) = \mathbb{E}_{P_Y}\big[Y \exp\big(tY - K_Y(t)\big)\big], \tag{5}$$

and thus

$$\mathbb{E}_{P_Y}\big[Y \exp\big(z_0 Y - K_Y(z_0)\big)\big] = \frac{x}{n}, \tag{6}$$

which shows the dependence of $z_0$ on both $x$ and $n$.
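In practice, the saddlepoint equation (6) is a one-dimensional root-finding problem. The following hedged sketch solves it numerically for an illustrative Bernoulli($p$) choice of $P_Y$:

```python
# A hedged numerical sketch of solving the saddlepoint equation
# K_Y^{(1)}(z_0) = x/n in (6), for Bernoulli(p) summands (illustrative choice).
import numpy as np
from scipy.optimize import brentq

p, n, x = 0.2, 100, 30.0

def K(t):   # CGF of a Bernoulli(p) random variable
    return np.log(1.0 - p + p * np.exp(t))

def K1(t):  # first derivative K_Y^{(1)} of the CGF
    return p * np.exp(t) / (1.0 - p + p * np.exp(t))

# z_0 is the unique real root of K_Y^{(1)}(t) - x/n; the bracket is ad hoc.
z0 = brentq(lambda t: K1(t) - x / n, -50.0, 50.0)
print(z0, n * K1(z0))  # n * K_Y^{(1)}(z_0) recovers x
```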
A Taylor series expansion of the exponent $n K_Y(z) - zx$ in the neighborhood of $z_0$ leads to the following asymptotic expansion in powers of $\frac{1}{n}$ of the Bromwich integral in (4):

$$f_{X_n}(x) = \hat{f}_{X_n}(x)\left(1 + \frac{1}{n}\left(\frac{1}{8}\frac{K_Y^{(4)}(z_0)}{\big(K_Y^{(2)}(z_0)\big)^2} - \frac{5}{24}\frac{\big(K_Y^{(3)}(z_0)\big)^2}{\big(K_Y^{(2)}(z_0)\big)^3}\right) + O\!\left(\frac{1}{n^2}\right)\right), \tag{7}$$

where $\hat{f}_{X_n}: \mathbb{R} \to \mathbb{R}_+$ is

$$\hat{f}_{X_n}(x) = \sqrt{\frac{1}{2\pi n K_Y^{(2)}(z_0)}}\, \exp\big(n K_Y(z_0) - z_0 x\big), \tag{8}$$

and, for all $k \in \mathbb{N}$ and $t \in \mathbb{R}$, the notation $K_Y^{(k)}(t)$ represents the $k$-th real derivative of the CGF $K_Y$ evaluated at $t$. The first two derivatives $K_Y^{(1)}$ and $K_Y^{(2)}$ play a central role, and thus it is worth providing explicit expressions:

$$K_Y^{(1)}(t) \triangleq \mathbb{E}_{P_Y}\big[Y \exp\big(tY - K_Y(t)\big)\big], \quad \text{and} \tag{9}$$

$$K_Y^{(2)}(t) \triangleq \mathbb{E}_{P_Y}\Big[\big(Y - K_Y^{(1)}(t)\big)^2 \exp\big(tY - K_Y(t)\big)\Big]. \tag{10}$$
The function $\hat{f}_{X_n}$ in (8) is referred to as the saddlepoint approximation of the PDF $f_{X_n}$ and was first introduced in [4]. Nonetheless, $\hat{f}_{X_n}$ is not necessarily a PDF, as often its integral over $\mathbb{R}$ is not equal to one. A particular exception is observed in only three cases [5]. First, when $f_Y$ is the PDF of a Gaussian random variable, the saddlepoint approximation $\hat{f}_{X_n}$ is identical to $f_{X_n}$ for all $n > 0$. Second and third, when $f_Y$ is the PDF associated with a Gamma distribution or an inverse normal distribution, respectively, the saddlepoint approximation $\hat{f}_{X_n}$ is exact up to a normalization constant for all $n > 0$.
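The Gamma case mentioned above can be checked directly. The following sketch evaluates (8) for a sum of i.i.d. Exponential(1) variables, for which $X_n$ is Gamma($n$, 1) and the exact PDF is available; the values of $n$ and $x$ are illustrative:

```python
# A minimal sketch of the saddlepoint density approximation (8) for a sum of
# n i.i.d. Exponential(1) variables; X_n is then Gamma(n, 1), so the exact PDF
# is available for comparison. Parameters are illustrative.
import numpy as np
from scipy.stats import gamma

n, x = 50, 60.0
# CGF of Exp(1): K(t) = -ln(1 - t) for t < 1, K'(t) = 1/(1 - t), K''(t) = 1/(1 - t)^2.
z0 = 1.0 - n / x                        # unique solution of K'(z0) = x/n
K, K2 = -np.log(1.0 - z0), 1.0 / (1.0 - z0) ** 2
f_hat = np.exp(n * K - z0 * x) / np.sqrt(2 * np.pi * n * K2)
print(f_hat, gamma.pdf(x, a=n))         # saddlepoint vs. exact Gamma(n, 1) PDF
```

The two printed values differ only by a multiplicative constant close to one, consistent with the exactness-up-to-normalization property stated above.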
An approximation to the CDF $F_{X_n}$ can be obtained by integrating the PDF in (4); cf. [6,7,8]. In particular, the result reported in [6] leads to an asymptotic expansion of the CDF of $X_n$, for all $x \in \mathbb{R}$, of the form

$$F_{X_n}(x) = \hat{F}_{X_n}(x) + O\!\left(\frac{1}{\sqrt{n}}\exp\big(n K_Y(z_0) - x z_0\big)\right), \tag{11}$$

where the function $\hat{F}_{X_n}: \mathbb{R} \to \mathbb{R}$ is the saddlepoint approximation of $F_{X_n}$. That is, for all $x \in \mathbb{R}$,

$$\hat{F}_{X_n}(x) = \mathbb{1}_{\{z_0 > 0\}} + (-1)^{\mathbb{1}_{\{z_0 > 0\}}} \exp\!\left(n K_Y(z_0) - z_0 x + \frac{1}{2} z_0^2\, n K_Y^{(2)}(z_0)\right) Q\!\left(|z_0| \sqrt{n K_Y^{(2)}(z_0)}\right), \tag{12}$$

where the function $Q: \mathbb{R} \to [0, 1]$ is the complementary CDF of a Gaussian random variable with zero mean and unit variance. That is, for all $t \in \mathbb{R}$,

$$Q(t) = \frac{1}{\sqrt{2\pi}} \int_{t}^{\infty} \exp\!\left(-\frac{x^2}{2}\right) \mathrm{d}x. \tag{13}$$

A numerical evaluation of (12) is sketched below.
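The following hedged sketch evaluates $\hat{F}_{X_n}$ in (12) for an illustrative sum of Bernoulli($p$) variables and compares it against the exact Binomial CDF:

```python
# A hedged sketch of the saddlepoint CDF approximation in (12), evaluated for
# a sum of n Bernoulli(p) variables; p, n, and x are illustrative choices.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import binom, norm

p, n, x = 0.2, 100, 30.0
K  = lambda t: np.log(1 - p + p * np.exp(t))            # CGF of Bernoulli(p)
K1 = lambda t: p * np.exp(t) / (1 - p + p * np.exp(t))  # K_Y^{(1)}
K2 = lambda t: K1(t) * (1 - K1(t))                      # K_Y^{(2)} for a Bernoulli

z0 = brentq(lambda t: K1(t) - x / n, -30, 30)           # saddlepoint, cf. (6)
Q  = norm.sf                                            # Gaussian tail in (13)
core = np.exp(n * K(z0) - z0 * x + 0.5 * z0**2 * n * K2(z0)) \
       * Q(abs(z0) * np.sqrt(n * K2(z0)))
F_hat = 1.0 - core if z0 > 0 else core                  # the two cases in (12)
print(F_hat, binom.cdf(x, n, p))                        # approximation vs. exact
```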
Finally, from the central limit theorem [3], for large values of $n$ and for all $x \in \mathbb{R}$, a reasonable approximation to $F_{X_n}(x)$ is $1 - Q\Big(\frac{x - n K_Y^{(1)}(0)}{\sqrt{n K_Y^{(2)}(0)}}\Big)$. In the following, this approximation is referred to as the normal approximation of $F_{X_n}$.

1.1. Contributions

The main contribution of this work is an upper bound on the error induced by the saddlepoint approximation $\hat{F}_{X_n}$ in (12) (Theorem 3 in Section 2.2). This result builds upon two observations. The first observation is that the CDF $F_{X_n}$ can be written, for all $x \in \mathbb{R}$, in the form

$$F_{X_n}(x) = \mathbb{1}_{\{z_0 \leq 0\}}\, \mathbb{E}_{P_{S_n}}\big[\exp\big(n K_Y(z_0) - z_0 S_n\big)\mathbb{1}_{\{S_n \leq x\}}\big] + \mathbb{1}_{\{z_0 > 0\}}\Big(1 - \mathbb{E}_{P_{S_n}}\big[\exp\big(n K_Y(z_0) - z_0 S_n\big)\mathbb{1}_{\{S_n > x\}}\big]\Big), \tag{14}$$

where the random variable

$$S_n = \sum_{t=1}^{n} Y_t^{(z_0)} \tag{15}$$

has a probability distribution denoted by $P_{S_n}$, and the random variables $Y_1^{(z_0)}, Y_2^{(z_0)}, \ldots, Y_n^{(z_0)}$ are independent with probability distribution $P_Y^{(z_0)}$. The distribution $P_Y^{(z_0)}$ is an exponentially tilted distribution [9] with respect to the distribution $P_Y$ at the saddlepoint $z_0$. More specifically, the Radon–Nikodym derivative of the distribution $P_Y^{(z_0)}$ with respect to the distribution $P_Y$ satisfies, for all $y \in \mathrm{supp}\, P_Y$,

$$\frac{\mathrm{d}P_Y^{(z_0)}}{\mathrm{d}P_Y}(y) = \exp\big(z_0 y - K_Y(z_0)\big). \tag{16}$$

The second observation is that the saddlepoint approximation $\hat{F}_{X_n}$ in (12) can be written, for all $x \in \mathbb{R}$, in the form

$$\hat{F}_{X_n}(x) = \mathbb{1}_{\{z_0 \leq 0\}}\, \mathbb{E}_{P_{Z_n}}\big[\exp\big(n K_Y(z_0) - z_0 Z_n\big)\mathbb{1}_{\{Z_n \leq x\}}\big] + \mathbb{1}_{\{z_0 > 0\}}\Big(1 - \mathbb{E}_{P_{Z_n}}\big[\exp\big(n K_Y(z_0) - z_0 Z_n\big)\mathbb{1}_{\{Z_n > x\}}\big]\Big), \tag{17}$$

where $Z_n$ is a Gaussian random variable with mean $x$, variance $n K_Y^{(2)}(z_0)$, and probability distribution $P_{Z_n}$. Note that the means of the random variables $S_n$ in (15) and $Z_n$ in (17) are both equal to $n K_Y^{(1)}(z_0)$, whereas their variances are both equal to $n K_Y^{(2)}(z_0)$. Note also that, from (6), it holds that $x = n K_Y^{(1)}(z_0)$.
Using these observations, it follows that the absolute difference between $F_{X_n}$ in (14) and $\hat{F}_{X_n}$ in (17) satisfies, for all $x \in \mathbb{R}$,

$$\big|F_{X_n}(x) - \hat{F}_{X_n}(x)\big| = \mathbb{1}_{\{z_0 \leq 0\}}\Big|\mathbb{E}_{P_{S_n}}\big[\exp\big(n K_Y(z_0) - z_0 S_n\big)\mathbb{1}_{\{S_n \leq x\}}\big] - \mathbb{E}_{P_{Z_n}}\big[\exp\big(n K_Y(z_0) - z_0 Z_n\big)\mathbb{1}_{\{Z_n \leq x\}}\big]\Big| + \mathbb{1}_{\{z_0 > 0\}}\Big|\mathbb{E}_{P_{S_n}}\big[\exp\big(n K_Y(z_0) - z_0 S_n\big)\mathbb{1}_{\{S_n > x\}}\big] - \mathbb{E}_{P_{Z_n}}\big[\exp\big(n K_Y(z_0) - z_0 Z_n\big)\mathbb{1}_{\{Z_n > x\}}\big]\Big|. \tag{18}$$

A step forward (Lemma A1 in Appendix A) is to note that, when $x$ is such that $z_0 \leq 0$, then

$$\Big|\mathbb{E}_{P_{S_n}}\big[\exp\big(n K_Y(z_0) - z_0 S_n\big)\mathbb{1}_{\{S_n \leq x\}}\big] - \mathbb{E}_{P_{Z_n}}\big[\exp\big(n K_Y(z_0) - z_0 Z_n\big)\mathbb{1}_{\{Z_n \leq x\}}\big]\Big| \leq \exp\big(n K_Y(z_0) - z_0 x\big) \min\Big\{1,\, 2\sup_{a \in \mathbb{R}}\big|F_{S_n}(a) - F_{Z_n}(a)\big|\Big\}, \tag{19}$$

and, when $x$ is such that $z_0 > 0$, it holds that

$$\Big|\mathbb{E}_{P_{S_n}}\big[\exp\big(n K_Y(z_0) - z_0 S_n\big)\mathbb{1}_{\{S_n > x\}}\big] - \mathbb{E}_{P_{Z_n}}\big[\exp\big(n K_Y(z_0) - z_0 Z_n\big)\mathbb{1}_{\{Z_n > x\}}\big]\Big| \leq \exp\big(n K_Y(z_0) - z_0 x\big) \min\Big\{1,\, 2\sup_{a \in \mathbb{R}}\big|F_{S_n}(a) - F_{Z_n}(a)\big|\Big\}, \tag{20}$$

where $F_{S_n}$ and $F_{Z_n}$ are the CDFs of the random variables $S_n$ and $Z_n$, respectively. The final result is obtained by observing that $\sup_{a \in \mathbb{R}}\big|F_{S_n}(a) - F_{Z_n}(a)\big|$ can be upper bounded using the Berry–Esseen theorem (Theorem 1 in Section 2.1). This is essentially due to the fact that the random variable $S_n$ is the sum of $n$ independent random variables, cf. (15), $Z_n$ is a Gaussian random variable, and both $S_n$ and $Z_n$ possess identical means and variances. Thus, the main result (Theorem 3 in Section 2.2) is that, for all $x \in \mathbb{R}$,

$$\big|F_{X_n}(x) - \hat{F}_{X_n}(x)\big| \leq \frac{2\,\xi_Y(z_0)}{\sqrt{n}} \exp\big(n K_Y(z_0) - z_0 x\big), \tag{21}$$

where

$$\xi_Y(z_0) = c_1 \frac{\mathbb{E}_{P_Y}\Big[\big|Y - K_Y^{(1)}(z_0)\big|^3 \exp\big(z_0 Y - K_Y(z_0)\big)\Big]}{\big(K_Y^{(2)}(z_0)\big)^{3/2}} + c_2, \tag{22}$$

with

$$c_1 = 0.33554 \quad \text{and} \quad c_2 = 0.415. \tag{23}$$

Finally, note that (21) holds for any finite value of $n$ and admits the asymptotic scaling law with respect to $n$ suggested in (11).
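For Bernoulli summands, every quantity in (21)–(22) is available in closed form, so the error bound can be evaluated directly. The following is a minimal sketch under that illustrative assumption:

```python
# A minimal sketch of the error bound (21): for Bernoulli(p) summands, the
# exponentially tilted distribution in (16) is Bernoulli(q) with q = K_Y^{(1)}(z_0),
# so the absolute third central moment in (22) has a closed form. Values are
# illustrative.
import numpy as np

p, n, x = 0.2, 100, 30.0
c1, c2 = 0.33554, 0.415
q = x / n                                  # tilted mean, q = K_Y^{(1)}(z_0), cf. (6)
z0 = np.log(q * (1 - p) / (p * (1 - q)))   # closed-form saddlepoint for a Bernoulli
K = np.log(1 - p + p * np.exp(z0))         # K_Y(z_0)
V = q * (1 - q)                            # K_Y^{(2)}(z_0)
T = q * (1 - q) * (q**2 + (1 - q)**2)      # tilted third absolute central moment
xi = c1 * T / V**1.5 + c2                  # xi_Y(z_0) in (22)
bound = 2 * xi / np.sqrt(n) * np.exp(n * K - z0 * x)
print(bound)                               # upper bound on |F_Xn(x) - F_hat_Xn(x)|
```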

1.2. Applications

In the realm of information theory, the normal approximation has played a central role in the calculation of bounds on the minimum decoding error probability (DEP) in point-to-point memoryless channels, cf. [10,11]. Thanks to the normal approximation, simple approximations of the dependence testing (DT) bound, the random coding union (RCU) bound, and the meta converse (MC) bound have been obtained in [10,12]. The success of these approximations stems from the fact that they are easy to calculate. Nonetheless, easy computation comes at the expense of loose upper and lower bounds, and thus uncontrolled approximation errors.
On the other hand, saddlepoint techniques have been extensively used to approximate existing lower and upper bounds on the minimum DEP. See, for instance, [13,14] in the case of the RCU bound and the MC bound. Nonetheless, the errors induced by saddlepoint approximations are often neglected due to the fact that calculating them involves a large number of optimizations and numerical integrations. Currently, the validation of saddlepoint approximations is carried out through Monte Carlo simulations. Within this context, the main objectives of this paper are twofold: (a) to analytically assess the tightness of the approximations of the DT and MC bounds based on the saddlepoint approximation of the CDFs of sums of i.i.d. random variables; and (b) to provide new lower and upper bounds on the minimum DEP by providing a lower bound on the MC bound and an upper bound on the DT bound. Numerical experimentation with these bounds is presented for the binary symmetric channel (BSC), the additive white Gaussian noise (AWGN) channel, and the additive symmetric α-stable noise (SαS) channel, where the new bounds are tight and obtained at low computational cost.

2. Sums of Independent and Identically Distributed Random Variables

In this section, upper bounds on the absolute error of approximating $F_{X_n}$ by the normal approximation and by the saddlepoint approximation are presented.

2.1. Error Induced by the Normal Approximation

Given a random variable $Y$, let the function $\xi_Y: \mathbb{R} \to \mathbb{R}$ be, for all $t \in \mathbb{R}$,

$$\xi_Y(t) \triangleq c_1 \frac{\mathbb{E}_{P_Y}\Big[\big|Y - K_Y^{(1)}(t)\big|^3 \exp\big(tY - K_Y(t)\big)\Big]}{\big(K_Y^{(2)}(t)\big)^{3/2}} + c_2, \tag{24}$$

where $c_1$ and $c_2$ are defined in (23).
The following theorem, known as the Berry–Esseen theorem [3], introduces an upper bound on the approximation error induced by the normal approximation.
Theorem 1
(Berry–Esseen [15]). Let $Y_1, Y_2, \ldots, Y_n$ be i.i.d. random variables with probability distribution $P_Y$. Let also $Z_n$ be a Gaussian random variable with mean $n K_Y^{(1)}(0)$, variance $n K_Y^{(2)}(0)$, and CDF denoted by $F_{Z_n}$. Then, the CDF of the random variable $X_n = Y_1 + Y_2 + \cdots + Y_n$, denoted by $F_{X_n}$, satisfies

$$\sup_{a \in \mathbb{R}} \big|F_{X_n}(a) - F_{Z_n}(a)\big| \leq \min\left\{1, \frac{\xi_Y(0)}{\sqrt{n}}\right\}, \tag{25}$$

where the functions $K_Y^{(1)}$, $K_Y^{(2)}$, and $\xi_Y$ are defined in (9), (10), and (24), respectively.
An immediate result of Theorem 1 is the following pair of upper and lower bounds on $F_{X_n}(a)$, for all $a \in \mathbb{R}$:

$$F_{X_n}(a) \leq F_{Z_n}(a) + \min\left\{1, \frac{\xi_Y(0)}{\sqrt{n}}\right\} \triangleq \bar{\Sigma}(a, n), \quad \text{and} \tag{26}$$

$$F_{X_n}(a) \geq F_{Z_n}(a) - \min\left\{1, \frac{\xi_Y(0)}{\sqrt{n}}\right\} \triangleq \underline{\Sigma}(a, n). \tag{27}$$
The main drawback of Theorem 1 is that the upper bound on the approximation error does not depend on the exact value of $a$. More importantly, for some values of $a$ and $n$, the upper bound on the approximation error might be particularly large, which renders the resulting bounds uninformative. This behavior is illustrated by the sketch below.
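The following hedged sketch evaluates the Berry–Esseen bounds (26)–(27) for an illustrative Bernoulli($p$) choice of $P_Y$; the constant error band $\xi_Y(0)/\sqrt{n}$ is the same at every point $a$, which is precisely the drawback discussed above:

```python
# A hedged sketch of the Berry-Esseen bounds (26)-(27) for Bernoulli(p)
# summands; all parameter values are illustrative.
import numpy as np
from scipy.stats import norm

p, n = 0.2, 100
c1, c2 = 0.33554, 0.415
mu, V = p, p * (1 - p)                  # K_Y^{(1)}(0) and K_Y^{(2)}(0)
T = p * (1 - p) * (p**2 + (1 - p)**2)   # E|Y - p|^3 for a Bernoulli(p)
xi0 = c1 * T / V**1.5 + c2              # xi_Y(0) in (24)

a = np.linspace(5, 35, 7)
F_Z = norm.cdf(a, loc=n * mu, scale=np.sqrt(n * V))  # normal approximation
eps = min(1.0, xi0 / np.sqrt(n))                     # a-independent error band
print(np.c_[a, np.clip(F_Z - eps, 0, 1), np.clip(F_Z + eps, 0, 1)])
```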

2.2. Error Induced by the Saddlepoint Approximation

The following theorem introduces an upper bound on the approximation error induced by approximating the CDF $F_{X_n}$ of $X_n$ in (1) by the function $\eta_Y: \mathbb{R}^2 \times \mathbb{N} \to \mathbb{R}$, defined such that, for all $(\theta, a, n) \in \mathbb{R}^2 \times \mathbb{N}$,

$$\eta_Y(\theta, a, n) \triangleq \mathbb{1}_{\{\theta > 0\}} + (-1)^{\mathbb{1}_{\{\theta > 0\}}} \exp\!\left(\frac{1}{2} n \theta^2 K_Y^{(2)}(\theta) + n K_Y(\theta) - n\theta K_Y^{(1)}(\theta)\right) Q\!\left((-1)^{\mathbb{1}_{\{\theta \leq 0\}}}\, \frac{a + n\theta K_Y^{(2)}(\theta) - n K_Y^{(1)}(\theta)}{\sqrt{n K_Y^{(2)}(\theta)}}\right), \tag{28}$$

where the function $Q: \mathbb{R} \to [0, 1]$ is the complementary CDF of the standard Gaussian distribution defined in (13). Note that $\eta_Y(\theta, a, n)$ is identical to $\hat{F}_{X_n}(a)$ when $\theta$ is chosen to satisfy the saddlepoint equation $K_Y^{(1)}(\theta) = \frac{a}{n}$. Note also that $\eta_Y(0, a, n)$ is the CDF of a Gaussian random variable with mean $n K_Y^{(1)}(0)$ and variance $n K_Y^{(2)}(0)$, which are, respectively, the mean and the variance of $X_n$ in (1).
Theorem 2.
Let $Y_1, Y_2, \ldots, Y_n$ be i.i.d. random variables with probability distribution $P_Y$ and CGF $K_Y$. Let also $F_{X_n}$ be the CDF of the random variable $X_n = Y_1 + Y_2 + \cdots + Y_n$. Hence, for all $a \in \mathbb{R}$ and for all $\theta \in \Theta_Y$, it holds that

$$\big|F_{X_n}(a) - \eta_Y(\theta, a, n)\big| \leq \exp\big(n K_Y(\theta) - \theta a\big) \min\left\{1, \frac{2\,\xi_Y(\theta)}{\sqrt{n}}\right\}, \tag{29}$$

where

$$\Theta_Y \triangleq \{t \in \mathbb{R} : K_Y(t) < \infty\}, \tag{30}$$

and the functions $\xi_Y$ and $\eta_Y$ are defined in (24) and (28), respectively.
Proof. 
The proof of Theorem 2 is presented in Appendix A. ☐
This result leads to the following upper and lower bounds on $F_{X_n}(a)$, for all $a \in \mathbb{R}$:

$$F_{X_n}(a) \leq \eta_Y(\theta, a, n) + \exp\big(n K_Y(\theta) - \theta a\big) \min\left\{1, \frac{2\,\xi_Y(\theta)}{\sqrt{n}}\right\}, \quad \text{and} \tag{31}$$

$$F_{X_n}(a) \geq \eta_Y(\theta, a, n) - \exp\big(n K_Y(\theta) - \theta a\big) \min\left\{1, \frac{2\,\xi_Y(\theta)}{\sqrt{n}}\right\}, \tag{32}$$

with $\theta \in \Theta_Y$.
The advantages of approximating $F_{X_n}$ by using Theorem 2 instead of Theorem 1 are twofold. First, both the approximation $\eta_Y$ and the corresponding approximation error depend on the exact value of $a$. In particular, the approximation can be optimized for each value of $a$ via the parameter $\theta$. Second, the parameter $\theta$ in (29) can be optimized to improve either the upper bound in (31) or the lower bound in (32) for some $a \in \mathbb{R}$. Nonetheless, such optimizations are not necessarily simple.
An alternative to the optimization over $\theta$ in (31) and (32) is to choose $\theta$ such that it minimizes $n K_Y(\theta) - \theta a$. This follows the intuition that, for some values of $a$ and $n$, the term $\exp\big(n K_Y(\theta) - \theta a\big)$ is the one that most influences the value of the right-hand side of (29). To build upon this idea, consider the following lemma.
Lemma 1.
Consider a random variable $Y$ with probability distribution $P_Y$ and CGF $K_Y$. Given $n \in \mathbb{N}$, let the function $h: \mathbb{R} \to \mathbb{R}$ be defined, for all $a \in \mathbb{R}$ satisfying $\frac{a}{n} \in \mathrm{int}\, \mathcal{C}_Y$, with $\mathrm{int}\, \mathcal{C}_Y$ denoting the interior of the convex hull of $\mathrm{supp}\, P_Y$, as follows:

$$h(a) = \inf_{\theta \in \Theta_Y} \big(n K_Y(\theta) - \theta a\big), \tag{33}$$

where $\Theta_Y$ is defined in (30). Then, the function $h$ is concave, and, for all $a \in \mathbb{R}$,

$$h(a) \leq h\big(n\,\mathbb{E}_{P_Y}[Y]\big) = 0. \tag{34}$$

Furthermore,

$$h(a) = n K_Y(\theta^\star) - \theta^\star a, \tag{35}$$

where $\theta^\star$ is the unique solution in $\theta$ to

$$n K_Y^{(1)}(\theta) = a, \tag{36}$$

with $K_Y^{(1)}$ defined in (9).
Proof. 
The proof of Lemma 1 is presented in Appendix B. ☐
Given $(a, n) \in \mathbb{R} \times \mathbb{N}$, the value $h(a)$ in (33) is the minimum of the exponent of the exponential term in (29). An interesting observation from Lemma 1 is that the maximum of $h$ is zero, and it is reached when $a = n\,\mathbb{E}_{P_Y}[Y] = \mathbb{E}_{P_{X_n}}[X_n]$. In this case, $\theta^\star = 0$, and thus, from (31) and (32), it holds that

$$F_{X_n}(a) \leq \eta_Y(0, a, n) + \min\left\{1, \frac{2\,\xi_Y(0)}{\sqrt{n}}\right\} = F_{Z_n}(a) + \min\left\{1, \frac{2\,\xi_Y(0)}{\sqrt{n}}\right\}, \quad \text{and} \tag{37}$$

$$F_{X_n}(a) \geq \eta_Y(0, a, n) - \min\left\{1, \frac{2\,\xi_Y(0)}{\sqrt{n}}\right\} = F_{Z_n}(a) - \min\left\{1, \frac{2\,\xi_Y(0)}{\sqrt{n}}\right\}, \tag{38}$$

where $F_{Z_n}$ is the CDF defined in Theorem 1. Hence, the upper bound in (37) and the lower bound in (38) obtained from Theorem 2 are worse than those in (26) and (27) obtained from Theorem 1. In a nutshell, for values of $a$ in the vicinity of $n\,\mathbb{E}_{P_Y}[Y] = \mathbb{E}_{P_{X_n}}[X_n]$, it is more interesting to use Theorem 1 than Theorem 2.
Alternatively, given that $h$ is non-positive and concave, when $\big|a - n\,\mathbb{E}_{P_Y}[Y]\big| = \big|a - \mathbb{E}_{P_{X_n}}[X_n]\big| > \gamma$, with $\gamma$ sufficiently large, it follows that

$$\exp\big(n K_Y(\theta^\star) - \theta^\star a\big) < \min\left\{1, \frac{\xi_Y(0)}{\sqrt{n}}\right\}, \tag{39}$$

with $\theta^\star$ defined in (36). Hence, in this case, the right-hand side of (29) is always smaller than the right-hand side of (25). That is, for such values of $a$ and $n$, the upper and lower bounds in (31) and (32) are better than those in (26) and (27), respectively. The following theorem leverages this observation.
Theorem 3.
Let $Y_1, Y_2, \ldots, Y_n$ be i.i.d. random variables with probability distribution $P_Y$ and CGF $K_Y$. Let also $F_{X_n}$ be the CDF of the random variable $X_n = Y_1 + Y_2 + \cdots + Y_n$. Hence, for all $a \in \mathrm{int}\, \mathcal{C}_{X_n}$, with $\mathrm{int}\, \mathcal{C}_{X_n}$ the interior of the convex hull of $\mathrm{supp}\, P_{X_n}$, it holds that

$$\big|F_{X_n}(a) - \hat{F}_{X_n}(a)\big| \leq \exp\big(n K_Y(\theta^\star) - \theta^\star a\big) \min\left\{1, \frac{2\,\xi_Y(\theta^\star)}{\sqrt{n}}\right\}, \tag{40}$$

where $\theta^\star$ is defined in (36), and the functions $\hat{F}_{X_n}$ and $\xi_Y$ are defined in (12) and (24), respectively.
Proof. 
The proof of Theorem 3 is presented in Appendix C. ☐
An immediate result of Theorem 3 is the following pair of upper and lower bounds on $F_{X_n}(a)$, for all $a \in \mathbb{R}$:

$$F_{X_n}(a) \leq \hat{F}_{X_n}(a) + \exp\big(n K_Y(\theta^\star) - \theta^\star a\big) \min\left\{1, \frac{2\,\xi_Y(\theta^\star)}{\sqrt{n}}\right\} \triangleq \bar{\Omega}(a, n), \quad \text{and} \tag{41}$$

$$F_{X_n}(a) \geq \hat{F}_{X_n}(a) - \exp\big(n K_Y(\theta^\star) - \theta^\star a\big) \min\left\{1, \frac{2\,\xi_Y(\theta^\star)}{\sqrt{n}}\right\} \triangleq \underline{\Omega}(a, n). \tag{42}$$
The following section presents two examples that highlight the observations mentioned above.

2.3. Examples

Example 1
(Discrete random variable). Let the random variables $Y_1, Y_2, \ldots, Y_n$ in (1) be i.i.d. Bernoulli random variables with parameter $p = 0.2$, and let $n = 100$. In this case, $\mathbb{E}_{P_{X_n}}[X_n] = n\,\mathbb{E}_{P_Y}[Y] = 20$. Figure 1 depicts the CDF $F_{X_{100}}$ of $X_{100}$ in (1), the normal approximation $F_{Z_{100}}$ in (25), and the saddlepoint approximation $\hat{F}_{X_{100}}$ in (12). Also depicted therein are the upper and lower bounds due to the normal approximation, $\bar{\Sigma}$ in (26) and $\underline{\Sigma}$ in (27), respectively, and the upper and lower bounds due to the saddlepoint approximation, $\bar{\Omega}$ in (41) and $\underline{\Omega}$ in (42), respectively. These functions are plotted as functions of $a$, with $a \in [5, 35]$.
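The three central curves of Figure 1 can be reproduced with the following sketch; plotting is left out, and only a few sample points of $a$ are evaluated:

```python
# A sketch reproducing the quantities of Example 1: the exact Binomial(100, 0.2)
# CDF, its normal approximation, and the saddlepoint approximation (12).
import numpy as np
from scipy.stats import binom, norm

p, n = 0.2, 100
for a in [10.0, 20.0, 30.0]:
    q = a / n
    z0 = np.log(q * (1 - p) / (p * (1 - q)))   # saddlepoint, cf. (36)
    K, V = np.log(1 - p + p * np.exp(z0)), q * (1 - q)
    core = np.exp(n * K - z0 * a + 0.5 * z0**2 * n * V) \
           * norm.sf(abs(z0) * np.sqrt(n * V))
    F_sp = 1.0 - core if z0 > 0 else core      # \hat F_{X_n}(a) in (12)
    F_no = norm.cdf(a, n * p, np.sqrt(n * p * (1 - p)))
    print(a, binom.cdf(a, n, p), F_no, F_sp)   # exact, normal, saddlepoint
```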
Example 2
(Continuous random variable). Let the random variables $Y_1, Y_2, \ldots, Y_n$ in (1) be i.i.d. chi-squared random variables with parameter $k = 1$, and let $n = 50$. In this case, $\mathbb{E}_{P_{X_n}}[X_n] = n\,\mathbb{E}_{P_Y}[Y] = 50$. Figure 2 depicts the CDF $F_{X_{50}}$ of $X_{50}$ in (1), the normal approximation $F_{Z_{50}}$ in (25), and the saddlepoint approximation $\hat{F}_{X_{50}}$ in (12). Also depicted therein are the upper and lower bounds due to the normal approximation, $\bar{\Sigma}$ in (26) and $\underline{\Sigma}$ in (27), respectively, and the upper and lower bounds due to the saddlepoint approximation, $\bar{\Omega}$ in (41) and $\underline{\Omega}$ in (42), respectively. These functions are plotted as functions of $a$, with $a \in [0, 100]$.

3. Application to Information Theory: Channel Coding

This section focuses on the study of the DEP in point-to-point memoryless channels. The problem is formulated in Section 3.1. The main results presented in this section consist of upper and lower bounds on the DEP. The former, which are obtained by building upon the existing DT bound [10], are presented in Section 3.2. The latter, which are obtained from the MC bound [10], are presented in Section 3.3.

3.1. System Model

Consider a point-to-point communication system in which a transmitter aims at sending information to one receiver through a noisy memoryless channel. Such a channel can be modeled by a random transformation

$$(\mathcal{X}^n, \mathcal{Y}^n, P_{Y|X}), \tag{43}$$

where $n \in \mathbb{N}$ is the blocklength and $\mathcal{X}$ and $\mathcal{Y}$ are the channel input and channel output sets. Given the channel inputs $x = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$, the outputs $y = (y_1, y_2, \ldots, y_n) \in \mathcal{Y}^n$ are observed at the receiver with probability

$$P_{Y|X}(y|x) = \prod_{t=1}^{n} P_{Y|X}(y_t|x_t), \tag{44}$$

where, for all $x \in \mathcal{X}$, $P_{Y|X=x} \in \triangle(\mathcal{Y})$, with $\triangle(\mathcal{Y})$ the set of all possible probability distributions whose support is a subset of $\mathcal{Y}$. The objective of the communication is to transmit a message index $i$, which is a realization of a random variable $W$ that is uniformly distributed over the set

$$\mathcal{W} \triangleq \{1, 2, \ldots, M\}, \tag{45}$$

with $1 < M < \infty$. To achieve this objective, the transmitter uses an $(n, M, \lambda)$-code, where $\lambda \in [0, 1]$.
Definition 1
($(n, M, \lambda)$-code). Given a tuple $(n, M, \lambda) \in \mathbb{N}^2 \times [0, 1]$, an $(n, M, \lambda)$-code for the random transformation in (43) is a system

$$\Big\{\big(u(1), \mathcal{D}(1)\big),\, \big(u(2), \mathcal{D}(2)\big),\, \ldots,\, \big(u(M), \mathcal{D}(M)\big)\Big\}, \tag{46}$$

where, for all $(j, \ell) \in \mathcal{W}^2$, with $j \neq \ell$:

$$u(j) = \big(u_1(j), u_2(j), \ldots, u_n(j)\big) \in \mathcal{X}^n, \tag{47a}$$
$$\mathcal{D}(j) \cap \mathcal{D}(\ell) = \emptyset, \tag{47b}$$
$$\bigcup_{j \in \mathcal{W}} \mathcal{D}(j) \subseteq \mathcal{Y}^n, \quad \text{and} \tag{47c}$$
$$\frac{1}{M} \sum_{i=1}^{M} \mathbb{E}_{P_{Y|X=u(i)}}\big[\mathbb{1}_{\{Y \notin \mathcal{D}(i)\}}\big] \leq \lambda. \tag{47d}$$
To transmit message index $i \in \mathcal{W}$, the transmitter uses the codeword $u(i)$. For all $t \in \{1, 2, \ldots, n\}$, at channel use $t$, the transmitter inputs the symbol $u_t(i)$ into the channel. Assume that, at the end of channel use $t$, the receiver observes the output $y_t$. After $n$ channel uses, the receiver uses the vector $y = (y_1, y_2, \ldots, y_n)$ and determines that the message index $j$ was transmitted if $y \in \mathcal{D}(j)$, with $j \in \mathcal{W}$.
Given the $(n, M, \lambda)$-code described by the system in (46), the DEP of the message index $i$ can be computed as $\mathbb{E}_{P_{Y|X=u(i)}}\big[\mathbb{1}_{\{Y \notin \mathcal{D}(i)\}}\big]$. As a consequence, the average DEP is

$$\frac{1}{M} \sum_{i=1}^{M} \mathbb{E}_{P_{Y|X=u(i)}}\big[\mathbb{1}_{\{Y \notin \mathcal{D}(i)\}}\big]. \tag{48}$$

Note that, from (47d), the average DEP of such an $(n, M, \lambda)$-code is upper bounded by $\lambda$. Given a fixed pair $(n, M) \in \mathbb{N}^2$, the minimum $\lambda$ for which an $(n, M, \lambda)$-code exists is defined hereunder.
Definition 2.
Given a pair $(n, M) \in \mathbb{N}^2$, the minimum average DEP for the random transformation in (43), denoted by $\lambda^*(n, M)$, is given by

$$\lambda^*(n, M) = \min\big\{\lambda \in [0, 1] : \exists\, (n, M, \lambda)\text{-code}\big\}. \tag{49}$$

When $\lambda$ is chosen in accordance with the reliability constraints, an $(n, M, \lambda)$-code is said to transmit at an information rate $R = \frac{\log_2(M)}{n}$ bits per channel use.
The remainder of this section introduces the DT and MC bounds. The DT bound is one of the tightest existing upper bounds on $\lambda^*(n, M)$ in (49), whereas the MC bound is one of the tightest lower bounds.

3.2. Dependence Testing Bound

This section describes an upper bound on $\lambda^*(n, M)$ for a fixed pair $(n, M) \in \mathbb{N}^2$. Given a probability distribution $P_X \in \triangle(\mathcal{X}^n)$, let the random variable $\iota(X; Y)$ satisfy

$$\iota(X; Y) \triangleq \ln\frac{\mathrm{d}P_{XY}}{\mathrm{d}(P_X P_Y)}(X, Y), \tag{50}$$

where the function $\frac{\mathrm{d}P_{XY}}{\mathrm{d}(P_X P_Y)}: \mathcal{X}^n \times \mathcal{Y}^n \to \mathbb{R}$ denotes the Radon–Nikodym derivative of the joint probability measure $P_{XY}$ with respect to the product of probability measures $P_X P_Y$, with $P_{XY} = P_X P_{Y|X}$ and $P_Y$ the corresponding marginal. Let the function $T: \mathbb{N}^2 \times \triangle(\mathcal{X}^n) \to \mathbb{R}_+$ be, for all $(n, M) \in \mathbb{N}^2$ and for all probability distributions $P_X \in \triangle(\mathcal{X}^n)$,

$$T(n, M, P_X) = \mathbb{E}_{P_X P_{Y|X}}\Big[\mathbb{1}_{\big\{\iota(X;Y) \leq \ln\frac{M-1}{2}\big\}}\Big] + \frac{M-1}{2}\, \mathbb{E}_{P_X P_Y}\Big[\mathbb{1}_{\big\{\iota(X;Y) > \ln\frac{M-1}{2}\big\}}\Big]. \tag{51}$$
Using this notation, the following lemma states the DT bound.
Lemma 2
(Dependence testing bound [10]). Given a pair $(n, M) \in \mathbb{N}^2$, the following holds for all $P_X \in \triangle(\mathcal{X}^n)$, with respect to the random transformation in (43):

$$\lambda^*(n, M) \leq T(n, M, P_X), \tag{52}$$

with the function $T$ defined in (51).
Note that the input probability distribution $P_X$ in Lemma 2 can be chosen among all possible probability distributions in $\triangle(\mathcal{X}^n)$ to minimize the right-hand side of (52), which improves the bound. Note also that, with some loss of optimality, the optimization domain can be restricted to the set of product probability distributions for which, for all $x \in \mathcal{X}^n$,

$$P_X(x) = \prod_{t=1}^{n} P_X(x_t), \tag{53}$$

with $P_X \in \triangle(\mathcal{X})$. Hence, subject to (44), the random variable $\iota(X; Y)$ in (50) can be written as a sum of i.i.d. random variables, i.e.,

$$\iota(X; Y) = \sum_{t=1}^{n} \iota(X_t; Y_t). \tag{54}$$

This observation motivates the application of the results of Section 2 to provide upper and lower bounds on the function $T$ in (51), for given values $(n, M) \in \mathbb{N}^2$ and a given distribution $P_X \in \triangle(\mathcal{X}^n)$, with respect to the random transformation in (43) subject to (44). These bounds become particularly relevant when the exact value of $T(n, M, P_X)$ cannot be calculated. In such a case, upper and lower bounds on $T(n, M, P_X)$ allow approximating its exact value with an error that is small enough for the approximation to be relevant. For channels in which (51) is computable, such as the BSC, it can be evaluated directly, as sketched below.
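For a BSC with uniform inputs, $P_Y$ is uniform, so $\iota(X;Y) = n\ln 2 + k\ln\delta + (n-k)\ln(1-\delta)$, where $k$ is the number of flipped bits; $k$ follows a Binomial($n$, $\delta$) law under $P_X P_{Y|X}$ and a Binomial($n$, $1/2$) law under $P_X P_Y$. Both expectations in (51) then reduce to Binomial tail sums, as in the following hedged sketch (parameter values are illustrative):

```python
# A hedged sketch of computing the DT bound T(n, M, P_X) in (51) exactly
# for a BSC with uniform inputs; delta, n, and R are illustrative.
import numpy as np
from scipy.stats import binom

delta, n, R = 0.11, 500, 0.32
M = 2.0 ** (n * R)
thr = np.log((M - 1) / 2)
k = np.arange(n + 1)
info = n * np.log(2) + k * np.log(delta) + (n - k) * np.log(1 - delta)
below = info <= thr                          # event {i(X;Y) <= ln((M-1)/2)}
T = binom.pmf(k, n, delta)[below].sum() \
    + (M - 1) / 2 * binom.pmf(k, n, 0.5)[~below].sum()
print(T)                                     # upper bound on lambda*(n, M)
```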

3.2.1. Normal Approximation

This section describes the normal approximation of the function $T$ in (51). That is, the random variable $\iota(X; Y)$ is assumed to satisfy (54) and to follow a Gaussian distribution. More specifically, for all $P_X \in \triangle(\mathcal{X})$, let

$$\mu(P_X) \triangleq \mathbb{E}_{P_X P_{Y|X}}\big[\iota(X; Y)\big], \tag{55}$$

$$\sigma(P_X) \triangleq \mathbb{E}_{P_X P_{Y|X}}\Big[\big(\iota(X; Y) - \mu(P_X)\big)^2\Big], \quad \text{and} \tag{56}$$

$$\xi(P_X) \triangleq c_1 \frac{\mathbb{E}_{P_X P_{Y|X}}\big[|\iota(X; Y) - \mu(P_X)|^3\big]}{\sigma(P_X)^{3/2}} + c_2, \tag{57}$$

with $c_1$ and $c_2$ defined in (23), be functions of the input distribution $P_X$. In particular, $\mu(P_X)$ and $\sigma(P_X)$ are, respectively, the first moment and the second central moment of the random variables $\iota(X_1; Y_1)$, $\iota(X_2; Y_2)$, \ldots, $\iota(X_n; Y_n)$. Using this notation, consider the functions $D: \mathbb{N}^2 \times \triangle(\mathcal{X}) \to \mathbb{R}_+$ and $N: \mathbb{N}^2 \times \triangle(\mathcal{X}) \to \mathbb{R}_+$ such that, for all $(n, M) \in \mathbb{N}^2$ and for all $P_X \in \triangle(\mathcal{X})$,

$$D(n, M, P_X) = \max\left\{0,\; \alpha(n, M, P_X) - \frac{\xi(P_X)}{\sqrt{n}}\right\}, \quad \text{and} \tag{58}$$

$$N(n, M, P_X) = \min\left\{1,\; \alpha(n, M, P_X) + \frac{5\,\xi(P_X)}{\sqrt{n}} + \frac{2\ln 2}{\sigma(P_X)^{1/2}}\sqrt{\frac{2}{n\pi}}\right\}, \tag{59}$$

where

$$\alpha(n, M, P_X) \triangleq Q\!\left(\frac{n\,\mu(P_X) - \ln\frac{M-1}{2}}{\sqrt{n\,\sigma(P_X)}}\right). \tag{60}$$
Using this notation, the following theorem introduces lower and upper bounds on the function T in (51).
Theorem 4.
Given a pair ( n , M ) N 2 , for all input distributions P X X n subject to (53), the following holds with respect to the random transformation in (43) subject to (44),
D ( n , M , P X ) T ( n , M , P X ) N ( n , M , P X ) ,
where the functions T, D and N are defined in (51), (58) and (59), respectively.
Proof. 
The proof of Theorem 4 is presented in [12]. Essentially, it relies on Theorem 1 for upper and lower bounding the term $\mathbb{E}_{P_X P_{Y|X}}\big[\mathbb{1}_{\{\iota(X;Y) \leq \ln\frac{M-1}{2}\}}\big]$ in (51). The upper bound on $\mathbb{E}_{P_X P_Y}\big[\mathbb{1}_{\{\iota(X;Y) > \ln\frac{M-1}{2}\}}\big]$ in (51) follows from Lemma 47 in [10]. ☐
In [12], the function $\alpha(n, M, P_X)$ in (60) is often referred to as the normal approximation of $T(n, M, P_X)$, which is indeed an abuse of language. As commented in Section 2.1, the lower and upper bounds, i.e., the functions $D$ in (58) and $N$ in (59), are often too far from the normal approximation $\alpha$ in (60). A sketch of the computation of $\alpha$ for the BSC follows.
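For a BSC with uniform inputs, the per-letter moments in (55)–(56) have closed forms, $\mu = \ln 2 - H(\delta)$ and $\sigma = \delta(1-\delta)\ln^2\frac{1-\delta}{\delta}$, so (60) is immediate; parameter values below are illustrative:

```python
# A minimal sketch of the normal approximation alpha(n, M, P_X) in (60)
# for a BSC with uniform inputs; delta, n, and R are illustrative.
import numpy as np
from scipy.stats import norm

delta, n, R = 0.11, 500, 0.32
M = 2.0 ** (n * R)
mu = np.log(2) + delta * np.log(delta) + (1 - delta) * np.log(1 - delta)
var = delta * (1 - delta) * np.log((1 - delta) / delta) ** 2   # sigma(P_X) in (56)
alpha = norm.sf((n * mu - np.log((M - 1) / 2)) / np.sqrt(n * var))  # (60)
print(alpha)
```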

3.2.2. Saddlepoint Approximation

This section describes an approximation of the function $T$ in (51) obtained by using the saddlepoint approximation of the CDF of the random variable $\iota(X; Y)$, as suggested in Section 2.2. Given a distribution $P_X \in \triangle(\mathcal{X})$, the moment generating function of $\iota(X; Y)$ is

$$\varphi(P_X, \theta) \triangleq \mathbb{E}_{P_X P_{Y|X}}\big[\exp\big(\theta\,\iota(X; Y)\big)\big], \tag{62}$$

with $\theta \in \mathbb{R}$. For all $P_X \in \triangle(\mathcal{X})$ and for all $\theta \in \mathbb{R}$, consider the following functions:

$$\mu(P_X, \theta) \triangleq \mathbb{E}_{P_X P_{Y|X}}\left[\iota(X; Y)\, \frac{\exp\big(\theta\,\iota(X; Y)\big)}{\varphi(P_X, \theta)}\right], \tag{63}$$

$$V(P_X, \theta) \triangleq \mathbb{E}_{P_X P_{Y|X}}\left[\big(\iota(X; Y) - \mu(P_X, \theta)\big)^2\, \frac{\exp\big(\theta\,\iota(X; Y)\big)}{\varphi(P_X, \theta)}\right], \quad \text{and} \tag{64}$$

$$\xi(P_X, \theta) \triangleq c_1\, \frac{\mathbb{E}_{P_X P_{Y|X}}\left[\big|\iota(X; Y) - \mu(P_X, \theta)\big|^3\, \frac{\exp(\theta\,\iota(X; Y))}{\varphi(P_X, \theta)}\right]}{V(P_X, \theta)^{3/2}} + c_2, \tag{65}$$
where $c_1$ and $c_2$ are defined in (23). Using this notation, consider the functions $\beta_1: \mathbb{N}^2 \times \mathbb{R} \times \triangle(\mathcal{X}) \to \mathbb{R}_+$ and $\beta_2: \mathbb{N}^2 \times \mathbb{R} \times \triangle(\mathcal{X}) \to \mathbb{R}_+$:

$$\beta_1(n, M, \theta, P_X) = \mathbb{1}_{\{\theta > 0\}} + (-1)^{\mathbb{1}_{\{\theta > 0\}}} \exp\!\left(n\ln\varphi(P_X, \theta) - \theta\ln\frac{M-1}{2} + \frac{1}{2}\theta^2 n V(P_X, \theta)\right) Q\!\left(\sqrt{n V(P_X, \theta)}\,|\theta|\right), \tag{66}$$

and

$$\beta_2(n, M, \theta, P_X) = \mathbb{1}_{\{\theta \leq -1\}} + (-1)^{\mathbb{1}_{\{\theta \leq -1\}}} \exp\!\left(n\ln\varphi(P_X, \theta) - (\theta+1)\ln\frac{M-1}{2} + \frac{1}{2}(\theta+1)^2 n V(P_X, \theta)\right) Q\!\left(\sqrt{n V(P_X, \theta)}\,|\theta+1|\right). \tag{67}$$

Note that $\beta_1$ is the saddlepoint approximation of the CDF of the random variable $\iota(X; Y)$ in (54) when $(X, Y)$ follows the distribution $P_X P_{Y|X}$. Note also that $\beta_2$ is the saddlepoint approximation of the complementary CDF of the random variable $\iota(X; Y)$ in (54) when $(X, Y)$ follows the distribution $P_X P_Y$.
Consider also the following functions:

$$G_1(n, M, \theta, P_X) = \beta_1(n, M, \theta, P_X) - \frac{2\,\xi(P_X, \theta)}{\sqrt{n}} \exp\!\left(n\ln\varphi(P_X, \theta) - \theta\ln\frac{M-1}{2}\right), \tag{68}$$

$$G_2(n, M, \theta, P_X) = \beta_2(n, M, \theta, P_X) - \frac{2\,\xi(P_X, \theta)}{\sqrt{n}} \exp\!\left(n\ln\varphi(P_X, \theta) - (\theta+1)\ln\frac{M-1}{2}\right), \tag{69}$$

$$G(n, M, \theta, P_X) = \max\big\{0,\, G_1(n, M, \theta, P_X)\big\} + \frac{M-1}{2}\max\big\{0,\, G_2(n, M, \theta, P_X)\big\}, \quad \text{and} \tag{70}$$

$$S(n, M, \theta, P_X) = \min\left\{1,\; \beta(n, M, \theta, P_X) + \frac{4\,\xi(P_X, \theta)}{\sqrt{n}} \exp\!\left(n\ln\varphi(P_X, \theta) - \theta\ln\frac{M-1}{2}\right)\right\}, \tag{71}$$

where

$$\beta(n, M, \theta, P_X) = \beta_1(n, M, \theta, P_X) + \frac{M-1}{2}\,\beta_2(n, M, \theta, P_X), \tag{72}$$

with $\beta_1$ in (66) and $\beta_2$ in (67). The function $\beta$ in (72) is often referred to as the saddlepoint approximation of the function $T$ in (51), which is indeed an abuse of language; a sketch of its computation for the BSC follows.
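For a BSC with uniform inputs, $\iota$ takes the value $\ln(2(1-\delta))$ with probability $1-\delta$ and $\ln(2\delta)$ with probability $\delta$ under $P_X P_{Y|X}$, so (62)–(65) reduce to two-point sums. The sketch below assumes this setup and that the root of (74) lies in $(-1, 0)$, which holds in the regime used here, so the indicators in (66)–(67) vanish; all parameter values are illustrative:

```python
# A hedged sketch of the saddlepoint approximation beta in (72) for a BSC
# with uniform inputs; delta, n, and R are illustrative.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

delta, n, R = 0.11, 500, 0.32
M = 2.0 ** (n * R)
t = np.log((M - 1) / 2)

vals = np.array([np.log(2 * (1 - delta)), np.log(2 * delta)])  # values of iota
prob = np.array([1 - delta, delta])                            # under P_X P_{Y|X}

phi = lambda th: prob @ np.exp(th * vals)                            # (62)
mu  = lambda th: (prob * np.exp(th * vals)) @ vals / phi(th)         # (63)
V   = lambda th: (prob * np.exp(th * vals)) @ (vals - mu(th))**2 / phi(th)  # (64)

theta = brentq(lambda th: n * mu(th) - t, -0.999, 5.0)               # root of (74)

def core(s):
    # exp(n ln phi(theta) - s t + s^2 n V / 2) Q(|s| sqrt(n V)), cf. (66)-(67)
    v = V(theta)
    return np.exp(n * np.log(phi(theta)) - s * t + 0.5 * s**2 * n * v) \
           * norm.sf(abs(s) * np.sqrt(n * v))

# theta in (-1, 0) here, so beta_1 = core(theta) and beta_2 = core(theta + 1):
beta = core(theta) + (M - 1) / 2 * core(theta + 1)                   # (72)
print(theta, beta)
```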
The following theorem introduces new lower and upper bounds on the function T in (51).
Theorem 5.
Given a pair ( n , M ) N 2 , for all input distributions P X X n subject to (53), the following holds with respect to the random transformation in (43) subject to (44),
G ( n , M , θ , P X ) T ( n , M , P X ) S ( n , M , θ , P X )
where θ is the unique solution in t to
n μ ( P X , t ) = ln M 1 2 ,
and the functions T, G, and S are defined in (51), (70), and (71).
Proof. 
The proof of Theorem 5 is provided in Appendix F. In a nutshell, it relies on Theorem 3 for independently bounding the terms $\mathbb{E}_{P_X P_{Y|X}}\big[\mathbb{1}_{\{\iota(X;Y) \leq \ln\frac{M-1}{2}\}}\big]$ and $\mathbb{E}_{P_X P_Y}\big[\mathbb{1}_{\{\iota(X;Y) > \ln\frac{M-1}{2}\}}\big]$ in (51). ☐

3.3. Meta Converse Bound

This section describes a lower bound on $\lambda^*(n, M)$ for a fixed pair $(n, M) \in \mathbb{N}^2$. Given two probability distributions $P_{XY} \in \triangle(\mathcal{X}^n \times \mathcal{Y}^n)$ and $Q_Y \in \triangle(\mathcal{Y}^n)$, let the random variable $\tilde{\iota}(X; Y|Q_Y)$ satisfy

$$\tilde{\iota}(X; Y|Q_Y) \triangleq \ln\frac{\mathrm{d}P_{XY}}{\mathrm{d}(P_X Q_Y)}(X, Y). \tag{75}$$

For all $(n, M, \gamma) \in \mathbb{N}^2 \times \mathbb{R}_+$ and for all probability distributions $P_X \in \triangle(\mathcal{X}^n)$ and $Q_Y \in \triangle(\mathcal{Y}^n)$, let the function $C: \mathbb{N}^2 \times \triangle(\mathcal{X}^n) \times \triangle(\mathcal{Y}^n) \times \mathbb{R}_+ \to \mathbb{R}_+$ be

$$C(n, M, P_X, Q_Y, \gamma) \triangleq \mathbb{E}_{P_X P_{Y|X}}\big[\mathbb{1}_{\{\tilde{\iota}(X;Y|Q_Y) \leq \ln\gamma\}}\big] + \gamma\left(\mathbb{E}_{P_X Q_Y}\big[\mathbb{1}_{\{\tilde{\iota}(X;Y|Q_Y) > \ln\gamma\}}\big] - \frac{1}{M}\right). \tag{76}$$
Using this notation, the following lemma describes the MC bound.
Lemma 3
(MC bound [10,13]). Given a pair $(n, M) \in \mathbb{N}^2$, the following holds for all $Q_Y \in \triangle(\mathcal{Y}^n)$, with respect to the random transformation in (43):

$$\lambda^*(n, M) \geq \inf_{P_X \in \triangle(\mathcal{X}^n)}\, \max_{\gamma \geq 0}\; C(n, M, P_X, Q_Y, \gamma), \tag{77}$$

where the function $C$ is defined in (76).
Note that the output probability distribution $Q_Y$ in Lemma 3 can be chosen among all possible probability distributions in $\triangle(\mathcal{Y}^n)$ to maximize the right-hand side of (77), which improves the bound. Note also that, with some loss of optimality, the optimization domain can be restricted to the set of product probability distributions for which, for all $y \in \mathcal{Y}^n$,

$$Q_Y(y) = \prod_{t=1}^{n} Q_Y(y_t), \tag{78}$$

with $Q_Y \in \triangle(\mathcal{Y})$. Hence, subject to (44), for all $x \in \mathcal{X}^n$, the random variable $\tilde{\iota}(x; Y|Q_Y)$ in (75) can be written as a sum of independent random variables, i.e.,

$$\tilde{\iota}(x; Y|Q_Y) = \sum_{t=1}^{n} \tilde{\iota}(x_t; Y_t|Q_Y). \tag{79}$$

With some loss of generality, the focus is on channel transformations of the form in (43) for which the following condition holds: the infimum in (77) is achieved by a product distribution, i.e., $P_X$ is of the form in (53), when the probability distribution $Q_Y$ satisfies (78). Note that this condition is met by memoryless channels such as the BSC, the AWGN, and the SαS channels with binary antipodal inputs, i.e., input alphabets of the form $\mathcal{X} = \{-a, a\}$, with $a \in \mathbb{R}$. This follows from the fact that the random variable $\tilde{\iota}(x; Y|Q_Y)$ is invariant with respect to the choice of $x \in \mathcal{X}^n$ when the probability distribution $Q_Y$ satisfies (78) and, for all $y \in \mathcal{Y}$,

$$Q_Y(y) = \frac{P_{Y|X}(y|a) + P_{Y|X}(y|-a)}{2}. \tag{80}$$

Under these conditions, the random variable $\tilde{\iota}(X; Y|Q_Y)$ in (75) can be written as a sum of i.i.d. random variables, i.e.,

$$\tilde{\iota}(X; Y|Q_Y) = \sum_{t=1}^{n} \tilde{\iota}(X_t; Y_t|Q_Y). \tag{81}$$
This observation motivates the application of the results of Section 2 to provide upper and lower bounds on the function $C$ in (76), for given values $(n, M) \in \mathbb{N}^2$ and given distributions $P_X \in \triangle(\mathcal{X}^n)$ and $Q_Y \in \triangle(\mathcal{Y}^n)$. These bounds become particularly relevant when the exact value of $C(n, M, P_X, Q_Y, \gamma)$ cannot be calculated with respect to the random transformation in (43). In such a case, upper and lower bounds on $C(n, M, P_X, Q_Y, \gamma)$ allow approximating its exact value with an error that is small enough for the approximation to be relevant. For the BSC, (76) can be evaluated directly, as sketched below.
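The following hedged sketch evaluates $C(n, M, P_X, Q_Y, \gamma)$ in (76) exactly for a BSC with uniform inputs and $Q_Y = P_Y$ (uniform): as for the DT bound, $\tilde{\iota}(X;Y|Q_Y) = n\ln 2 + k\ln\delta + (n-k)\ln(1-\delta)$ with $k$ the number of flips, distributed Binomial($n$, $\delta$) under $P_X P_{Y|X}$ and Binomial($n$, $1/2$) under $P_X Q_Y$; the grid optimization over $\gamma$, as in (77), is a coarse illustration:

```python
# A hedged sketch of evaluating C in (76) exactly for a BSC with uniform
# inputs and Q_Y = P_Y; delta, n, R, and the gamma grid are illustrative.
import numpy as np
from scipy.stats import binom

delta, n, R = 0.11, 500, 0.42
M = 2.0 ** (n * R)
k = np.arange(n + 1)
info = n * np.log(2) + k * np.log(delta) + (n - k) * np.log(1 - delta)

def C(gamma):
    below = info <= np.log(gamma)
    return (binom.pmf(k, n, delta)[below].sum()
            + gamma * (binom.pmf(k, n, 0.5)[~below].sum() - 1.0 / M))

grid = np.exp(np.linspace(0.8, 1.2, 41) * np.log(M))  # coarse search over gamma
print(max(C(g) for g in grid))                         # lower bound on lambda*
```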

3.3.1. Normal Approximation

This section describes the normal approximation of the function $C$ in (76); that is, the random variable $\tilde{\iota}(X; Y|Q_Y)$ is assumed to satisfy (81) and to follow a Gaussian distribution. More specifically, for all $(P_X, Q_Y) \in \triangle(\mathcal{X}) \times \triangle(\mathcal{Y})$, let

$$\tilde{\mu}(P_X, Q_Y) \triangleq \mathbb{E}_{P_X P_{Y|X}}\big[\tilde{\iota}(X; Y|Q_Y)\big], \tag{82}$$

$$\tilde{\sigma}(P_X, Q_Y) \triangleq \mathbb{E}_{P_X P_{Y|X}}\Big[\big(\tilde{\iota}(X; Y|Q_Y) - \tilde{\mu}(P_X, Q_Y)\big)^2\Big], \quad \text{and} \tag{83}$$

$$\tilde{\xi}(P_X, Q_Y) \triangleq c_1\,\frac{\mathbb{E}_{P_X P_{Y|X}}\big[|\tilde{\iota}(X; Y|Q_Y) - \tilde{\mu}(P_X, Q_Y)|^3\big]}{\tilde{\sigma}(P_X, Q_Y)^{3/2}} + c_2, \tag{84}$$

with $c_1$ and $c_2$ defined in (23), be functions of the input and output distributions $P_X$ and $Q_Y$, respectively. In particular, $\tilde{\mu}(P_X, Q_Y)$ and $\tilde{\sigma}(P_X, Q_Y)$ are, respectively, the first moment and the second central moment of the random variables $\tilde{\iota}(X_1; Y_1|Q_Y)$, $\tilde{\iota}(X_2; Y_2|Q_Y)$, \ldots, $\tilde{\iota}(X_n; Y_n|Q_Y)$. Using this notation, consider the functions $\tilde{D}: \mathbb{N}^2 \times \triangle(\mathcal{X}) \times \triangle(\mathcal{Y}) \times \mathbb{R}_+ \to \mathbb{R}_+$ and $\tilde{N}: \mathbb{N}^2 \times \triangle(\mathcal{X}) \times \triangle(\mathcal{Y}) \times \mathbb{R}_+ \to \mathbb{R}_+$ such that, for all $(n, M, \gamma) \in \mathbb{N}^2 \times \mathbb{R}_+$, for all $P_X \in \triangle(\mathcal{X})$, and for all $Q_Y \in \triangle(\mathcal{Y})$,

$$\tilde{D}(n, M, P_X, Q_Y, \gamma) = \max\left\{0,\; \tilde{\alpha}(n, M, P_X, Q_Y, \gamma) - \frac{\tilde{\xi}(P_X, Q_Y)}{\sqrt{n}}\right\}, \quad \text{and} \tag{85}$$

$$\tilde{N}(n, M, P_X, Q_Y, \gamma) = \min\left\{1,\; \tilde{\alpha}(n, M, P_X, Q_Y, \gamma) + \frac{5\,\tilde{\xi}(P_X, Q_Y)}{\sqrt{n}} + \frac{2\ln 2}{\tilde{\sigma}(P_X, Q_Y)^{1/2}}\sqrt{\frac{2}{n\pi}}\right\}, \tag{86}$$

where

$$\tilde{\alpha}(n, M, P_X, Q_Y, \gamma) \triangleq Q\!\left(\frac{n\,\tilde{\mu}(P_X, Q_Y) - \ln\gamma}{\sqrt{n\,\tilde{\sigma}(P_X, Q_Y)}}\right) - \frac{\gamma}{M}. \tag{87}$$
Using this notation, the following theorem introduces lower and upper bounds on the function C in (76).
Theorem 6.
Given a pair ( n , M ) N 2 , for all input distributions P X X n subject to (53), for all output distributions Q Y Y n subject to (78), and for all γ 0 , the following holds with respect to the random transformation in (43) subject to (44),
D ˜ ( n , M , P X , Q Y , γ ) C ( n , M , P X , Q Y , γ ) N ˜ ( n , M , P X , Q Y , γ ) ,
where the functions C, D ˜ , and N ˜ are defined in (76), (85), and (86), respectively.
Proof. 
The proof of Theorem 6 is partially presented in [10]. Essentially, it relies on Theorem 1 for upper and lower bounding the term $\mathbb{E}_{P_X P_{Y|X}}\big[\mathbb{1}_{\{\tilde{\iota}(X;Y|Q_Y) \leq \ln\gamma\}}\big]$ in (76), and on Lemma 47 in [10] for upper bounding the term $\mathbb{E}_{P_X Q_Y}\big[\mathbb{1}_{\{\tilde{\iota}(X;Y|Q_Y) > \ln\gamma\}}\big]$ in (76). ☐
The function $\tilde{\alpha}(n, M, P_X, Q_Y, \gamma)$ in (87) is often referred to as the normal approximation of $C(n, M, P_X, Q_Y, \gamma)$, which is indeed an abuse of language. As commented in Section 2.1, the lower and upper bounds, i.e., the functions $\tilde{D}$ in (85) and $\tilde{N}$ in (86), are often too far from the normal approximation $\tilde{\alpha}$ in (87).

3.3.2. Saddlepoint Approximation

This section describes an approximation of the function $C$ in (76) obtained by using the saddlepoint approximation of the CDF of the random variable $\tilde{\iota}(X; Y|Q_Y)$, as suggested in Section 2.2. Given two distributions $P_X \in \triangle(\mathcal{X})$ and $Q_Y \in \triangle(\mathcal{Y})$, let the random variable $\tilde{\iota}(X; Y|Q_Y)$ satisfy

$$\tilde{\iota}(X; Y|Q_Y) \triangleq \ln\frac{\mathrm{d}(P_X P_{Y|X})}{\mathrm{d}(P_X Q_Y)}(X, Y), \tag{89}$$

where $P_{Y|X}$ is in (44). The moment generating function of $\tilde{\iota}(X; Y|Q_Y)$ is

$$\tilde{\varphi}(P_X, Q_Y, \theta) \triangleq \mathbb{E}_{P_X P_{Y|X}}\big[\exp\big(\theta\,\tilde{\iota}(X; Y|Q_Y)\big)\big], \tag{90}$$

with $\theta \in \mathbb{R}$. For all $P_X \in \triangle(\mathcal{X})$ and $Q_Y \in \triangle(\mathcal{Y})$, and for all $\theta \in \mathbb{R}$, consider the following functions:

$$\tilde{\mu}(P_X, Q_Y, \theta) \triangleq \mathbb{E}_{P_X P_{Y|X}}\left[\tilde{\iota}(X; Y|Q_Y)\, \frac{\exp\big(\theta\,\tilde{\iota}(X; Y|Q_Y)\big)}{\tilde{\varphi}(P_X, Q_Y, \theta)}\right], \tag{91}$$

$$\tilde{V}(P_X, Q_Y, \theta) \triangleq \mathbb{E}_{P_X P_{Y|X}}\left[\big(\tilde{\iota}(X; Y|Q_Y) - \tilde{\mu}(P_X, Q_Y, \theta)\big)^2\, \frac{\exp\big(\theta\,\tilde{\iota}(X; Y|Q_Y)\big)}{\tilde{\varphi}(P_X, Q_Y, \theta)}\right], \quad \text{and} \tag{92}$$

$$\tilde{\xi}(P_X, Q_Y, \theta) \triangleq c_1\, \frac{\mathbb{E}_{P_X P_{Y|X}}\left[\big|\tilde{\iota}(X; Y|Q_Y) - \tilde{\mu}(P_X, Q_Y, \theta)\big|^3\, \frac{\exp(\theta\,\tilde{\iota}(X; Y|Q_Y))}{\tilde{\varphi}(P_X, Q_Y, \theta)}\right]}{\tilde{V}(P_X, Q_Y, \theta)^{3/2}} + c_2, \tag{93}$$

where $c_1$ and $c_2$ are defined in (23). Using this notation, consider the functions $\tilde{\beta}_1: \mathbb{N} \times \mathbb{R}_+ \times \mathbb{R} \times \triangle(\mathcal{X}) \times \triangle(\mathcal{Y}) \to \mathbb{R}_+$ and $\tilde{\beta}_2: \mathbb{N} \times \mathbb{R}_+ \times \mathbb{R} \times \triangle(\mathcal{X}) \times \triangle(\mathcal{Y}) \to \mathbb{R}_+$:

$$\tilde{\beta}_1(n, \gamma, \theta, P_X, Q_Y) = \mathbb{1}_{\{\theta > 0\}} + (-1)^{\mathbb{1}_{\{\theta > 0\}}} \exp\!\left(n\ln\tilde{\varphi}(P_X, Q_Y, \theta) - \theta\ln\gamma + \frac{1}{2}\theta^2 n \tilde{V}(P_X, Q_Y, \theta)\right) Q\!\left(\sqrt{n\tilde{V}(P_X, Q_Y, \theta)}\,|\theta|\right), \quad \text{and} \tag{94}$$

$$\tilde{\beta}_2(n, \gamma, \theta, P_X, Q_Y) = \mathbb{1}_{\{\theta \leq -1\}} + (-1)^{\mathbb{1}_{\{\theta \leq -1\}}} \exp\!\left(n\ln\tilde{\varphi}(P_X, Q_Y, \theta) - (\theta+1)\ln\gamma + \frac{1}{2}(\theta+1)^2 n \tilde{V}(P_X, Q_Y, \theta)\right) Q\!\left(\sqrt{n\tilde{V}(P_X, Q_Y, \theta)}\,|\theta+1|\right). \tag{95}$$

Note that $\tilde{\beta}_1$ and $\tilde{\beta}_2$ are the saddlepoint approximations of the CDF and the complementary CDF of the random variable $\tilde{\iota}(X; Y|Q_Y)$ in (81) when $(X, Y)$ follows the distributions $P_X P_{Y|X}$ and $P_X Q_Y$, respectively. Consider also the following functions:
$$\tilde{G}_1(n, \gamma, \theta, P_X, Q_Y) = \tilde{\beta}_1(n, \gamma, \theta, P_X, Q_Y) - \frac{2\,\tilde{\xi}(P_X, Q_Y, \theta)}{\sqrt{n}} \exp\!\left(n\ln\tilde{\varphi}(P_X, Q_Y, \theta) - \theta\ln\gamma\right), \tag{96}$$

$$\tilde{G}_2(n, \gamma, \theta, P_X, Q_Y) = \tilde{\beta}_2(n, \gamma, \theta, P_X, Q_Y) - \frac{2\,\tilde{\xi}(P_X, Q_Y, \theta)}{\sqrt{n}} \exp\!\left(n\ln\tilde{\varphi}(P_X, Q_Y, \theta) - (\theta+1)\ln\gamma\right), \tag{97}$$

$$\tilde{G}(n, \gamma, \theta, P_X, Q_Y, M) = \max\big\{0,\, \tilde{G}_1(n, \gamma, \theta, P_X, Q_Y)\big\} + \gamma\max\big\{0,\, \tilde{G}_2(n, \gamma, \theta, P_X, Q_Y)\big\} - \frac{\gamma}{M}, \quad \text{and} \tag{98}$$

$$\tilde{S}(n, \gamma, \theta, P_X, Q_Y, M) = \min\left\{1,\; \tilde{\beta}(n, \gamma, \theta, P_X, Q_Y, M) + \frac{4\,\tilde{\xi}(P_X, Q_Y, \theta)}{\sqrt{n}} \exp\!\left(n\ln\tilde{\varphi}(P_X, Q_Y, \theta) - \theta\ln\gamma\right)\right\}, \tag{99}$$

where

$$\tilde{\beta}(n, \gamma, \theta, P_X, Q_Y, M) = \tilde{\beta}_1(n, \gamma, \theta, P_X, Q_Y) + \gamma\,\tilde{\beta}_2(n, \gamma, \theta, P_X, Q_Y) - \frac{\gamma}{M}. \tag{100}$$

The function $\tilde{\beta}(n, \gamma, \theta, P_X, Q_Y, M)$ in (100) is referred to as the saddlepoint approximation of the function $C$ in (76), which is indeed an abuse of language.
The following theorem introduces new lower and upper bounds on the function C in (76).
Theorem 7.
Given a pair ( n , M ) N 2 , for all input distributions P X X n subject to (53), for all output distributions Q Y Y n subject to (81) such that for all x X , P Y | X = x is absolutely continuous with respect to Q Y , for all γ 0 , the following holds with respect to the random transformation in (43) subject to (44),
G ˜ ( n , γ , θ , P X , Q Y , M ) C ( n , M , P X , Q Y , γ ) S ˜ ( n , γ , θ , P X , Q Y , M )
where θ is the unique solution in t to
n μ ( P X , t ) = ln γ ,
and the functions C, G ˜ , and S ˜ are defined in (76), (98) and (99).
Proof. 
The proof of Theorem 7 is provided in Appendix G. ☐
Note that, in (101), the parameter γ can be optimized as in (77).

3.4. Numerical Experimentation

The normal and saddlepoint approximations of the DT and MC bounds, as well as their corresponding upper and lower bounds presented from Section 3.2.1 to Section 3.3.2, are studied in the cases of the BSC, the AWGN channel, and the SαS channel. The latter is defined by the random transformation in (43) subject to (44) with, for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$,

$$P_{Y|X}(y|x) = P_Z(y - x), \tag{103}$$

where $P_Z$ is a probability distribution satisfying, for all $t \in \mathbb{R}$,

$$\mathbb{E}_{P_Z}\big[\exp(itZ)\big] = \exp\big(-\sigma|t|^\alpha\big), \tag{104}$$

with $i = \sqrt{-1}$. The reals $\alpha \in (0, 2]$ and $\sigma \in \mathbb{R}_+$ in (104) are parameters of the SαS channel; a sketch of a generator of such noise is given below.
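Since $P_Z$ in (104) has no closed-form PDF in general, simulations of the SαS channel require a sampler. The following hedged sketch uses the Chambers–Mallows–Stuck method, which matches the characteristic function (104); it is an illustrative generator, not the one used in the paper:

```python
# A hedged sketch of sampling symmetric alpha-stable noise via the
# Chambers-Mallows-Stuck method (symmetric case, alpha != 1), scaled so that
# E[exp(itZ)] = exp(-sigma |t|^alpha) as in (104). Values are illustrative.
import numpy as np

def sas_samples(alpha, sigma, size, rng=np.random.default_rng(0)):
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform phase
    w = rng.exponential(1.0, size)                # exponential envelope
    z = (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
         * (np.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))
    return sigma ** (1 / alpha) * z               # dispersion-sigma scaling

noise = sas_samples(alpha=1.4, sigma=0.6, size=10)
print(noise)
```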
In the following figures, Figure 3, Figure 4, and Figure 5, the channel inputs are binary antipodal, $\mathcal{X} = \{-1, 1\}$, $P_X$ is the uniform distribution, and $\theta$ is chosen to be the unique solution in $t$ to (74) or (102), depending on whether the DT or the MC bound is considered. For the results relative to the MC bound, $Q_Y$ is chosen equal to the distribution $P_Y$, i.e., the marginal of $P_X P_{Y|X}$. The parameter $\gamma$ is chosen to maximize the function $C(n, 2^{nR}, P_X, Q_Y, \gamma)$ in (76). The plots in Figure 3a, Figure 4a, and Figure 5a illustrate the function $T(n, 2^{nR}, P_X)$ in (51) as well as the bounds in Theorems 4 and 5. Figure 3b, Figure 4b, and Figure 5b illustrate the function $C$ in (76) and the bounds in Theorems 6 and 7. The normal approximations of the DT and MC bounds, i.e., $\alpha(n, 2^{nR}, P_X)$ in (60) and $\tilde{\alpha}(n, 2^{nR}, P_X, Q_Y, \gamma)$ in (87), respectively, are plotted in black diamonds. The upper bounds, i.e., $N(n, 2^{nR}, P_X)$ in (59) and $\tilde{N}(n, 2^{nR}, P_X, Q_Y, \gamma)$ in (86), are plotted in blue squares. The lower bounds of the DT and MC bounds, i.e., $D(n, 2^{nR}, P_X)$ in (58) and $\tilde{D}(n, 2^{nR}, P_X, Q_Y, \gamma)$ in (85), are non-positive in these cases and thus do not appear in the figures. The saddlepoint approximations of the DT and MC bounds, i.e., $\beta(n, 2^{nR}, \theta, P_X)$ in (72) and $\tilde{\beta}(n, \gamma, \theta, P_X, Q_Y, 2^{nR})$ in (100), respectively, are plotted in black stars. The upper bounds, i.e., $S(n, 2^{nR}, \theta, P_X)$ in (71) and $\tilde{S}(n, \gamma, \theta, P_X, Q_Y, 2^{nR})$ in (99), are plotted in blue upward-pointing triangles. The lower bounds, i.e., $G(n, 2^{nR}, \theta, P_X)$ in (70) and $\tilde{G}(n, \gamma, \theta, P_X, Q_Y, 2^{nR})$ in (98), are plotted in red downward-pointing triangles.
Figure 3 illustrates the case of a BSC with crossover probability $\delta = 0.11$. The information rates are chosen to be $R = 0.32$ and $R = 0.42$ bits per channel use in Figure 3a,b, respectively. The functions $T$ and $C$ can be calculated exactly, and thus they are plotted in magenta asterisks in Figure 3a,b, respectively. In these figures, it can be observed that the saddlepoint approximations of the DT and MC bounds, i.e., $\beta$ and $\tilde{\beta}$, respectively, overlap with the functions $T$ and $C$. These observations are in line with those reported in [13], where the saddlepoint approximations of the RCU bound and the MC bound are both shown to be precise. Alternatively, the normal approximations of the DT and MC bounds, i.e., $\alpha$ and $\tilde{\alpha}$, do not overlap with $T$ and $C$, respectively.
In Figure 3, it can be observed that the new bounds on the DT and MC bounds provided in Theorems 5 and 7, respectively, are tighter than those in Theorems 4 and 6. Indeed, the upper bounds $N$ and $\tilde{N}$ on the DT and MC bounds derived from the normal approximations $\alpha$ and $\tilde{\alpha}$ are several orders of magnitude above $T$ and $C$, respectively. This observation remains valid for the AWGN channel in Figure 4 and the SαS channel in Figure 5. Note that, in Figure 3a, for $n > 1000$, the normal approximation $\alpha$ is below the lower bound $G$, showing that approximating $T$ by $\alpha$ is too optimistic. These results show that the use of the Berry–Esseen theorem to approximate the DT and MC bounds may lead to erroneous conclusions due to the uncontrolled error made in the approximation.
Figure 4 and Figure 5 illustrate the cases of a real-valued AWGN channel and an SαS channel, respectively. The signal-to-noise ratio is SNR = 1 for the AWGN channel. The information rate is $R = 0.425$ bits per channel use for the AWGN channel and $R = 0.38$ bits per channel use for the SαS channel with $(\alpha, \sigma) = (1.4, 0.6)$. In both cases, the functions $T$ in (51) and $C$ in (76) cannot be computed explicitly and hence do not appear in Figure 4 and Figure 5. In addition, the lower bounds $D(n, 2^{nR}, P_X)$ and $\tilde{D}(n, 2^{nR}, P_X, Q_Y, \gamma)$ obtained from Theorems 4 and 6 are non-positive in these cases and thus do not appear in these figures either.
In Figure 4, note that the saddlepoint approximations $\beta$ and $\tilde{\beta}$ are tightly bounded by Theorems 5 and 7 over a large range of blocklengths. Alternatively, the lower bounds $D$ and $\tilde{D}$ based on the normal approximation are vacuous in this case.
In Figure 5, note that the upper bounds $S$ and $\tilde{S}$ on the DT and MC bounds, respectively, are relatively tight compared with those in the AWGN case. This characteristic is of particular importance in a channel such as the SαS channel, for which the DT and MC bounds remain computable only through Monte Carlo simulations.

4. Discussion and Further Work

One of the main results of this work is Theorem 3, which gives an upper bound on the error induced by the saddlepoint approximation of the CDF of a sum of i.i.d. random variables. This result paves the way to studying channel coding problems at any finite blocklength and under any constraint on the DEP. In particular, Theorem 3 is used to bound the DT and MC bounds in point-to-point memoryless channels. This leads to tighter bounds than those obtained from the Berry–Esseen theorem (Theorem 1), cf. the examples in Section 3.4, particularly for small values of the DEP.
The bound on the approximation error presented in Theorem 2 uses a triangle inequality in the proof of Lemma A1, which is loose. This is essentially the reason why Theorem 2 does not reduce to the Berry–Esseen theorem when the parameter $\theta$ is equal to zero. An interesting extension of this work is to tighten the inequality in Lemma A1 such that the Berry–Esseen theorem can be obtained as a special case of Theorem 2, i.e., when $\theta = 0$. If such an improvement of Theorem 2 is possible, Theorem 3 would be significantly improved, becoming more precise everywhere, in particular in the vicinity of the mean of the sum in (1).

Author Contributions

All authors have equally contributed to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially funded by the French National Agency for Research (ANR) under grant ANR-16-CE25-0001.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 2

The proof of Theorem 2 relies on the notion of exponentially tilted distributions. Let $\varphi_Y$ be the moment generating function of the distribution $P_Y$. Given $\theta \in \Theta_Y$, let $Y_1^{(\theta)}, Y_2^{(\theta)}, \ldots, Y_n^{(\theta)}$ be random variables whose joint probability distribution, denoted by $P_{Y_1^{(\theta)} \cdots Y_n^{(\theta)}}$, satisfies, for all $(y_1, y_2, \ldots, y_n) \in \mathbb{R}^n$,

$$\frac{\mathrm{d}P_{Y_1^{(\theta)} \cdots Y_n^{(\theta)}}}{\mathrm{d}P_{Y_1 \cdots Y_n}}(y_1, y_2, \ldots, y_n) = \frac{\exp\big(\theta \sum_{j=1}^{n} y_j\big)}{\varphi_Y(\theta)^n}. \tag{A1}$$

That is, the distribution $P_{Y_1^{(\theta)} \cdots Y_n^{(\theta)}}$ is an exponentially tilted distribution with respect to $P_{Y_1 \cdots Y_n}$. Using this notation, for all $\mathcal{A} \subseteq \mathbb{R}$ and for all $\theta \in \Theta_Y$,

$$\begin{aligned}
P_{X_n}(\mathcal{A}) &= \mathbb{E}_{P_{X_n}}\big[\mathbb{1}_{\{X_n \in \mathcal{A}\}}\big] = \mathbb{E}_{P_{Y_1 \cdots Y_n}}\Big[\mathbb{1}_{\big\{\sum_{j=1}^{n} Y_j \in \mathcal{A}\big\}}\Big] \\
&= \mathbb{E}_{P_{Y_1^{(\theta)} \cdots Y_n^{(\theta)}}}\left[\frac{\mathrm{d}P_{Y_1 \cdots Y_n}}{\mathrm{d}P_{Y_1^{(\theta)} \cdots Y_n^{(\theta)}}}\big(Y_1^{(\theta)}, \ldots, Y_n^{(\theta)}\big)\, \mathbb{1}_{\big\{\sum_{j=1}^{n} Y_j^{(\theta)} \in \mathcal{A}\big\}}\right] \\
&= \mathbb{E}_{P_{Y_1^{(\theta)} \cdots Y_n^{(\theta)}}}\left[\left(\frac{\mathrm{d}P_{Y_1^{(\theta)} \cdots Y_n^{(\theta)}}}{\mathrm{d}P_{Y_1 \cdots Y_n}}\big(Y_1^{(\theta)}, \ldots, Y_n^{(\theta)}\big)\right)^{\!-1} \mathbb{1}_{\big\{\sum_{j=1}^{n} Y_j^{(\theta)} \in \mathcal{A}\big\}}\right] \\
&= \mathbb{E}_{P_{Y_1^{(\theta)} \cdots Y_n^{(\theta)}}}\left[\varphi_Y(\theta)^n \exp\Big(-\theta \sum_{j=1}^{n} Y_j^{(\theta)}\Big)\, \mathbb{1}_{\big\{\sum_{j=1}^{n} Y_j^{(\theta)} \in \mathcal{A}\big\}}\right] \\
&= \varphi_Y(\theta)^n\, \mathbb{E}_{P_{Y_1^{(\theta)} \cdots Y_n^{(\theta)}}}\left[\exp\Big(-\theta \sum_{j=1}^{n} Y_j^{(\theta)}\Big)\, \mathbb{1}_{\big\{\sum_{j=1}^{n} Y_j^{(\theta)} \in \mathcal{A}\big\}}\right]. \tag{A2}
\end{aligned}$$

For ease of notation, consider the random variable

$$S_{n,\theta} = \sum_{j=1}^{n} Y_j^{(\theta)}, \tag{A3}$$

whose probability distribution is denoted by $P_{S_{n,\theta}}$. Hence, plugging (A3) into (A2) yields

$$P_{X_n}(\mathcal{A}) = \varphi_Y(\theta)^n\, \mathbb{E}_{P_{S_{n,\theta}}}\big[\exp(-\theta S_{n,\theta})\, \mathbb{1}_{\{S_{n,\theta} \in \mathcal{A}\}}\big]. \tag{A4}$$
The proof continues by upper bounding the absolute difference

$$\Big|P_{X_n}(\mathcal{A}) - \varphi_Y(\theta)^n\, \mathbb{E}_{P_{Z_{n,\theta}}}\big[\exp(-\theta Z_{n,\theta})\, \mathbb{1}_{\{Z_{n,\theta} \in \mathcal{A}\}}\big]\Big|, \tag{A5}$$

where $Z_{n,\theta}$ is a Gaussian random variable with the same mean and variance as $S_{n,\theta}$, and probability distribution denoted by $P_{Z_{n,\theta}}$. The relevance of the absolute difference in (A5) is that it is equal to the error of calculating $P_{X_n}(\mathcal{A})$ under the assumption that the random variable $S_{n,\theta}$ follows a Gaussian distribution. The following lemma provides an upper bound on the absolute difference in (A5) in terms of the Kolmogorov–Smirnov distance between the distributions $P_{S_{n,\theta}}$ and $P_{Z_{n,\theta}}$, denoted by

$$\Delta\big(P_{S_{n,\theta}}, P_{Z_{n,\theta}}\big) \triangleq \sup_{x \in \mathbb{R}} \big|F_{S_{n,\theta}}(x) - F_{Z_{n,\theta}}(x)\big|, \tag{A6}$$

where $F_{S_{n,\theta}}$ and $F_{Z_{n,\theta}}$ are the CDFs of the random variables $S_{n,\theta}$ and $Z_{n,\theta}$, respectively.
Lemma A1.
Given θ Θ Y and a R , consider the following conditions:
(i) 
θ 0 and A = , a , and
(ii) 
θ > 0 and A = a , .
If at least one of the above conditions is satisfied, then the absolute difference in (A5) satisfies
P X n ( A ) φ Y ( θ ) n E P Z n , θ exp θ Z n , θ 1 Z n , θ A φ Y ( θ ) n exp ( θ a ) min 1 , 2 Δ ( P S n , θ , P Z n , θ ) .
Proof. 
The proof of Lemma A1 is presented in Appendix D. ☐
The proof continues by providing an upper bound on $\Delta(P_{S_{n,\theta}}, P_{Z_{n,\theta}})$ in (A7), leveraging the observation that $S_{n,\theta}$ is the sum of $n$ independent and identically distributed random variables. This follows immediately from the assumptions of Theorem 2; nonetheless, for the sake of completeness, the following lemma provides a proof of this statement.
Lemma A2.
For all θ Θ Y , Y 1 ( θ ) , Y 2 ( θ ) , …, Y n ( θ ) are mutually independent and identically distributed random variables with probability distribution P Y ( θ ) . Moreover, P Y ( θ ) is an exponential tilted distribution with respect to P Y . That is, P Y ( θ ) satisfies for all y R ,
d P Y ( θ ) d P Y ( y ) = exp θ y φ Y ( θ ) .
Proof. 
The proof of Lemma A2 is presented in Appendix E. ☐
Lemma A2 paves the way for obtaining an upper bound on $\Delta(P_{S_{n,\theta}}, P_{Z_{n,\theta}})$ in (A7) via the Berry–Esseen theorem (Theorem 1). Let $\mu_\theta$, $V_\theta$, and $T_\theta$ be the mean, the variance, and the third absolute central moment of the random variable $Y^{(\theta)}$, whose probability distribution is $P_Y^{(\theta)}$ in (A8). More specifically:

$$\mu_\theta = \mathbb{E}_{P_Y^{(\theta)}}\big[Y^{(\theta)}\big] = \mathbb{E}_{P_Y}\left[Y\,\frac{\exp(\theta Y)}{\varphi_Y(\theta)}\right], \tag{A9}$$

$$V_\theta = \mathbb{E}_{P_Y^{(\theta)}}\big[(Y^{(\theta)} - \mu_\theta)^2\big] = \mathbb{E}_{P_Y}\left[(Y - \mu_\theta)^2\,\frac{\exp(\theta Y)}{\varphi_Y(\theta)}\right], \quad \text{and} \tag{A10}$$

$$T_\theta = \mathbb{E}_{P_Y^{(\theta)}}\big[|Y^{(\theta)} - \mu_\theta|^3\big] = \mathbb{E}_{P_Y}\left[|Y - \mu_\theta|^3\,\frac{\exp(\theta Y)}{\varphi_Y(\theta)}\right]. \tag{A11}$$

Let also $\xi_\theta$ be

$$\xi_\theta = c_1 \frac{T_\theta}{V_\theta^{3/2}} + c_2, \tag{A12}$$

with $c_1$ and $c_2$ defined in (23).
From Theorem 1, it follows that $\Delta(P_{S_{n,\theta}}, P_{Z_{n,\theta}})$ in (A7) satisfies

$$\Delta(P_{S_{n,\theta}}, P_{Z_{n,\theta}}) \leq \min\left\{1, \frac{\xi_\theta}{\sqrt{n}}\right\} \leq \frac{\xi_\theta}{\sqrt{n}}. \tag{A13}$$
Plugging (A13) into (A7) yields

$$\Big|P_{X_n}(\mathcal{A}) - \varphi_Y(\theta)^n\, \mathbb{E}_{P_{Z_{n,\theta}}}\big[\exp(-\theta Z_{n,\theta})\, \mathbb{1}_{\{Z_{n,\theta} \in \mathcal{A}\}}\big]\Big| \leq \varphi_Y(\theta)^n \exp(-\theta a)\, \min\left\{1, \frac{2\,\xi_\theta}{\sqrt{n}}\right\}, \tag{A14}$$

under the assumption that at least one of the conditions of Lemma A1 is met.
The proof ends by obtaining a closed-form expression for the term $\mathbb{E}_{P_{Z_{n,\theta}}}\big[\exp(-\theta Z_{n,\theta})\mathbb{1}_{\{Z_{n,\theta} \in \mathcal{A}\}}\big]$ in (A14) under the assumption that at least one of the conditions of Lemma A1 is met. First, assuming that condition (i) in Lemma A1 holds, it follows that

$$\begin{aligned}
\mathbb{E}_{P_{Z_{n,\theta}}}\big[\exp(-\theta Z_{n,\theta})\mathbb{1}_{\{Z_{n,\theta} \in \mathcal{A}\}}\big]
&= \int_{-\infty}^{a} \exp(-\theta z)\, \frac{1}{\sqrt{2\pi n V_\theta}} \exp\!\left(-\frac{(z - n\mu_\theta)^2}{2 n V_\theta}\right) \mathrm{d}z \\
&= \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi n V_\theta}} \exp\!\left(-\frac{z^2 - 2n\mu_\theta z + n^2\mu_\theta^2 + 2 n\theta V_\theta z}{2 n V_\theta}\right) \mathrm{d}z \\
&= \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi n V_\theta}} \exp\!\left(-\frac{(z - n\mu_\theta + n\theta V_\theta)^2 - n^2\theta^2 V_\theta^2 + 2 n^2\mu_\theta\theta V_\theta}{2 n V_\theta}\right) \mathrm{d}z \\
&= \exp\!\left(-\theta n\mu_\theta + \frac{1}{2} n V_\theta \theta^2\right) \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi n V_\theta}} \exp\!\left(-\frac{(z - n\mu_\theta + n\theta V_\theta)^2}{2 n V_\theta}\right) \mathrm{d}z \\
&= \exp\!\left(-\theta n\mu_\theta + \frac{1}{2} n V_\theta \theta^2\right) \int_{-\frac{a - n\mu_\theta + n\theta V_\theta}{\sqrt{n V_\theta}}}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{t^2}{2}\right) \mathrm{d}t \\
&= \exp\!\left(-\theta n\mu_\theta + \frac{1}{2} n V_\theta \theta^2\right) Q\!\left(-\frac{a - n\mu_\theta + n\theta V_\theta}{\sqrt{n V_\theta}}\right). \tag{A15}
\end{aligned}$$

Second, assuming that condition (ii) in Lemma A1 holds, it follows that

$$\begin{aligned}
\mathbb{E}_{P_{Z_{n,\theta}}}\big[\exp(-\theta Z_{n,\theta})\mathbb{1}_{\{Z_{n,\theta} \in \mathcal{A}\}}\big]
&= \int_{a}^{\infty} \exp(-\theta z)\, \frac{1}{\sqrt{2\pi n V_\theta}} \exp\!\left(-\frac{(z - n\mu_\theta)^2}{2 n V_\theta}\right) \mathrm{d}z \\
&= \exp\!\left(-\theta n\mu_\theta + \frac{1}{2} n V_\theta \theta^2\right) \int_{\frac{a - n\mu_\theta + n\theta V_\theta}{\sqrt{n V_\theta}}}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{t^2}{2}\right) \mathrm{d}t \\
&= \exp\!\left(-\theta n\mu_\theta + \frac{1}{2} n V_\theta \theta^2\right) Q\!\left(\frac{a - n\mu_\theta + n\theta V_\theta}{\sqrt{n V_\theta}}\right), \tag{A16}
\end{aligned}$$

where $Q$ in (A15) and (A16) is the complementary CDF of the standard Gaussian distribution defined in (13).
The expressions in (A15) and (A16) can be jointly written as follows:

$$\mathbb{E}_{P_{Z_{n,\theta}}}\big[\exp(-\theta Z_{n,\theta})\mathbb{1}_{\{Z_{n,\theta} \in \mathcal{A}\}}\big] = \exp\!\left(-\theta n\mu_\theta + \frac{1}{2} n V_\theta \theta^2\right) Q\!\left((-1)^{\mathbb{1}_{\{\theta \leq 0\}}}\frac{a - n\mu_\theta + n\theta V_\theta}{\sqrt{n V_\theta}}\right), \tag{A17}$$

under the assumption that at least one of the conditions (i) or (ii) in Lemma A1 holds.
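The closed form (A17) is easy to validate numerically. The following hedged sketch compares it against a Monte Carlo estimate for a Gaussian $Z$ with illustrative mean $m$, variance $v$, and tilt $\theta \leq 0$ (condition (i)):

```python
# A hedged numerical check of the closed form (A17): for a Gaussian Z with
# mean m and variance v, E[exp(-theta Z) 1{Z <= a}] should equal
# exp(-theta m + v theta^2 / 2) Q(-(a - m + theta v)/sqrt(v)).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
theta, m, v, a = -0.3, 5.0, 4.0, 6.0            # condition (i): theta <= 0
z = rng.normal(m, np.sqrt(v), 2_000_000)
mc = np.mean(np.exp(-theta * z) * (z <= a))     # Monte Carlo estimate
closed = np.exp(-theta * m + 0.5 * v * theta**2) \
         * norm.sf(-(a - m + theta * v) / np.sqrt(v))
print(mc, closed)                               # the two values should agree
```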
Finally, under the same assumption, plugging (A17) into (A14) yields

$$\left|P_{X_n}(\mathcal{A}) - \exp\!\left(n\ln\varphi_Y(\theta) - n\theta\mu_\theta + \frac{1}{2}n\theta^2 V_\theta\right) Q\!\left((-1)^{\mathbb{1}_{\{\theta \leq 0\}}}\frac{a + n\theta V_\theta - n\mu_\theta}{\sqrt{n V_\theta}}\right)\right| \leq \exp\big(n\ln\varphi_Y(\theta) - \theta a\big)\min\left\{1, \frac{2\,\xi_\theta}{\sqrt{n}}\right\}. \tag{A18}$$

Under condition (i) in Lemma A1, the inequality in (A18) can be written as follows:

$$\left|F_{X_n}(a) - \exp\!\left(n\ln\varphi_Y(\theta) - n\theta\mu_\theta + \frac{1}{2}n\theta^2 V_\theta\right) Q\!\left((-1)^{\mathbb{1}_{\{\theta \leq 0\}}}\frac{a + n\theta V_\theta - n\mu_\theta}{\sqrt{n V_\theta}}\right)\right| \leq \exp\big(n\ln\varphi_Y(\theta) - \theta a\big)\min\left\{1, \frac{2\,\xi_\theta}{\sqrt{n}}\right\}. \tag{A19}$$

Alternatively, under condition (ii) in Lemma A1, it follows from (A18) that

$$\left|1 - F_{X_n}(a) - \exp\!\left(n\ln\varphi_Y(\theta) - n\theta\mu_\theta + \frac{1}{2}n\theta^2 V_\theta\right) Q\!\left((-1)^{\mathbb{1}_{\{\theta \leq 0\}}}\frac{a + n\theta V_\theta - n\mu_\theta}{\sqrt{n V_\theta}}\right)\right| \leq \exp\big(n\ln\varphi_Y(\theta) - \theta a\big)\min\left\{1, \frac{2\,\xi_\theta}{\sqrt{n}}\right\}. \tag{A20}$$

Then, jointly writing (A19) and (A20), it follows that, for all $a \in \mathbb{R}$ and for all $\theta \in \Theta_Y$,

$$\left|F_{X_n}(a) - \mathbb{1}_{\{\theta > 0\}} - (-1)^{\mathbb{1}_{\{\theta > 0\}}}\exp\!\left(n\ln\varphi_Y(\theta) - n\theta\mu_\theta + \frac{1}{2}n\theta^2 V_\theta\right) Q\!\left((-1)^{\mathbb{1}_{\{\theta \leq 0\}}}\frac{a + n\theta V_\theta - n\mu_\theta}{\sqrt{n V_\theta}}\right)\right| \leq \exp\big(n\ln\varphi_Y(\theta) - \theta a\big)\min\left\{1, \frac{2\,\xi_\theta}{\sqrt{n}}\right\}, \tag{A21}$$

which can also be written as

$$\big|F_{X_n}(a) - \eta_Y(\theta, a, n)\big| \leq \exp\big(n K_Y(\theta) - \theta a\big)\min\left\{1, \frac{2\,\xi_Y(\theta)}{\sqrt{n}}\right\}. \tag{A22}$$

This completes the proof.

Appendix B. Proof of Lemma 1

Let $g: \mathbb{R}^2 \times \mathbb{N} \to \mathbb{R}$ be, for all $(\theta, a, n) \in \mathbb{R}^2 \times \mathbb{N}$,

$$g(\theta, a, n) = n K_Y(\theta) - \theta a = n \ln\varphi_Y(\theta) - \theta a. \tag{A23}$$

First, note that, for all $\theta \in \Theta_Y$ and for all $n \in \mathbb{N}$, the function $g$ is a concave (indeed, affine) function of $a$. Hence, from the definition of the function $h$ in (33), as a pointwise infimum of concave functions, $h$ is concave.
Second, note that $0 \in \Theta_Y$, given that $\varphi_Y(0) = 1 < \infty$. Hence, from (33), it holds that, for all $a \in \mathbb{R}$,

$$h(a) \leq n K_Y(0) = n \ln\varphi_Y(0) = n \ln 1 = 0. \tag{A24}$$

This shows that the function $h$ in (33) is non-positive.
Third, the next step of the proof consists of proving the equality in (35). To do so, let $\theta^\star: \mathbb{R} \times \mathbb{N} \to \mathbb{R}$ be, for all $(a, n) \in \mathbb{R} \times \mathbb{N}$,

$$\theta^\star(a, n) = \arg\inf_{\theta \in \Theta_Y} g(\theta, a, n). \tag{A25}$$

Note that the function $g$ is convex in $\theta$. This follows by verifying that its second derivative with respect to $\theta$ is positive. That is,

$$\frac{\mathrm{d}}{\mathrm{d}\theta} g(\theta, a, n) = \frac{n}{\varphi_Y(\theta)}\frac{\mathrm{d}}{\mathrm{d}\theta}\varphi_Y(\theta) - a, \quad \text{and} \tag{A26}$$

$$\begin{aligned}
\frac{\mathrm{d}^2}{\mathrm{d}\theta^2} g(\theta, a, n)
&= \frac{n}{\varphi_Y(\theta)^2}\left(\varphi_Y(\theta)\frac{\mathrm{d}^2}{\mathrm{d}\theta^2}\varphi_Y(\theta) - \left(\frac{\mathrm{d}}{\mathrm{d}\theta}\varphi_Y(\theta)\right)^{\!2}\right) \\
&= n\left(\frac{1}{\varphi_Y(\theta)}\frac{\mathrm{d}^2}{\mathrm{d}\theta^2}\mathbb{E}_{P_Y}\big[\exp(\theta Y)\big] - \left(\frac{1}{\varphi_Y(\theta)}\frac{\mathrm{d}}{\mathrm{d}\theta}\mathbb{E}_{P_Y}\big[\exp(\theta Y)\big]\right)^{\!2}\right) \\
&= n\left(\frac{\mathbb{E}_{P_Y}\big[Y^2\exp(\theta Y)\big]}{\mathbb{E}_{P_Y}\big[\exp(\theta Y)\big]} - \left(\frac{\mathbb{E}_{P_Y}\big[Y\exp(\theta Y)\big]}{\mathbb{E}_{P_Y}\big[\exp(\theta Y)\big]}\right)^{\!2}\right) \\
&= n\,\mathbb{E}_{P_Y}\left[\big(Y - K_Y^{(1)}(\theta)\big)^2\,\frac{\exp(\theta Y)}{\mathbb{E}_{P_Y}\big[\exp(\theta Y)\big]}\right] > 0.
\end{aligned}$$
Hence, if the first derivative of g with respect to θ (see (A26a)) admits a zero in Θ Y , then θ ( a , n ) is the unique solution in θ to the following equality:
d d θ g ( θ , a , n ) = n φ Y ( θ ) d d θ φ Y ( θ ) a = 0 .
Equation (A27) in $\theta$ can be rewritten as follows:
$$\frac{a}{n}=\frac{1}{\varphi_{Y}(\theta)}\frac{\mathrm{d}}{\mathrm{d}\theta}\varphi_{Y}(\theta)\tag{A28a}$$
$$=\frac{1}{\mathbb{E}_{P_{Y}}\!\left[\exp(\theta Y)\right]}\frac{\mathrm{d}}{\mathrm{d}\theta}\mathbb{E}_{P_{Y}}\!\left[\exp(\theta Y)\right]\tag{A28b}$$
$$=\frac{1}{\mathbb{E}_{P_{Y}}\!\left[\exp(\theta Y)\right]}\mathbb{E}_{P_{Y}}\!\left[Y\exp(\theta Y)\right]\tag{A28c}$$
$$=\frac{\mathbb{E}_{P_{Y}}\!\left[Y\exp(\theta Y)\right]}{\mathbb{E}_{P_{Y}}\!\left[\exp(\theta Y)\right]}\tag{A28d}$$
$$=K_{Y}^{(1)}(\theta).\tag{A28e}$$
From (A28d), it follows that $a/n$ is the mean of a random variable that follows an exponentially tilted distribution with respect to $P_{Y}$. Thus, there exists a solution in $\theta$ to (A28d) if and only if $a/n\in\operatorname{int}\mathcal{C}_{Y}$, hence the equality in (35).
Finally, from (A28d), $a=n\,\mathbb{E}_{P_{Y}}[Y]$ implies that $\theta(a,n)=0$. Hence, from (35), $h\!\left(n\,\mathbb{E}_{P_{Y}}[Y]\right)=0$. This completes the proof.
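For intuition on how the saddlepoint $\theta(a,n)$ in (A25) and the exponent $h(a)$ behave, the following is a minimal numerical sketch, assuming $Y$ is Bernoulli with an arbitrarily chosen parameter $p$ (the solver bracket is also an arbitrary choice); it is an illustration, not part of the proof:

```python
import numpy as np
from scipy.optimize import brentq

p, n = 0.2, 100  # assumed Bernoulli parameter and blocklength (arbitrary)

def K(theta):
    # CGF of Y ~ Bernoulli(p): K_Y(theta) = ln(1 - p + p e^theta)
    return np.log(1 - p + p * np.exp(theta))

def K1(theta):
    # K_Y^(1)(theta): mean of the exponentially tilted distribution, cf. (A28e)
    return p * np.exp(theta) / (1 - p + p * np.exp(theta))

def h(a):
    # h(a) = inf_theta { n K_Y(theta) - theta a }, attained where n K1(theta) = a
    theta = brentq(lambda t: n * K1(t) - a, -40.0, 40.0)
    return n * K(theta) - theta * a

print(h(10.0))    # strictly negative away from the mean
print(h(n * p))   # approximately 0 at a = n E[Y], as proven above
```

Since $nK_{Y}^{(1)}$ is increasing (by the convexity established in (A26g)), a bracketing root solver suffices whenever $a/n$ lies in the interior of $\mathcal{C}_{Y}$.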

Appendix C. Proof of Theorem 3

From Lemma 1, given $(a,n)\in\mathbb{R}\times\mathbb{N}$ such that $a/n\in\operatorname{int}\mathcal{C}_{Y}$, it holds that
$$nK_{Y}^{(1)}(\theta)=a.\tag{A29}$$
Then, plugging (A29) in the expression of $\eta_{Y}(\theta,a,n)$, with the function $\eta_{Y}$ defined in (28), the following holds:
$$\eta_{Y}(\theta,a,n)=\mathbb{1}_{\{\theta>0\}}+(-1)^{\mathbb{1}_{\{\theta>0\}}}\exp\!\left(\frac{1}{2}n\theta^{2}K_{Y}^{(2)}(\theta)+nK_{Y}(\theta)-\theta a\right)Q\!\left((-1)^{\mathbb{1}_{\{\theta\leq0\}}}\frac{a+n\theta K_{Y}^{(2)}(\theta)-a}{\sqrt{nK_{Y}^{(2)}(\theta)}}\right)\tag{A30a}$$
$$=\mathbb{1}_{\{\theta>0\}}+(-1)^{\mathbb{1}_{\{\theta>0\}}}\exp\!\left(\frac{1}{2}n\theta^{2}K_{Y}^{(2)}(\theta)+nK_{Y}(\theta)-\theta a\right)Q\!\left((-1)^{\mathbb{1}_{\{\theta\leq0\}}}\,\theta\sqrt{nK_{Y}^{(2)}(\theta)}\right)\tag{A30b}$$
$$=\mathbb{1}_{\{\theta>0\}}+(-1)^{\mathbb{1}_{\{\theta>0\}}}\exp\!\left(\frac{1}{2}n\theta^{2}K_{Y}^{(2)}(\theta)+nK_{Y}(\theta)-\theta a\right)Q\!\left(|\theta|\sqrt{nK_{Y}^{(2)}(\theta)}\right)\tag{A30c}$$
$$=\hat{F}_{X_n}(a),\tag{A30d}$$
where the equality in (A30d) follows from (12). Finally, plugging (A30d) in (29) yields
$$\left|F_{X_n}(a)-\hat{F}_{X_n}(a)\right|\leq\exp\!\left(nK_{Y}(\theta)-\theta a\right)\min\!\left(1,\frac{2\xi_{Y}(\theta)}{\sqrt{n}}\right).\tag{A31}$$
This completes the proof by observing that $a/n\in\operatorname{int}\mathcal{C}_{Y}$ is equivalent to $a\in\operatorname{int}\mathcal{C}_{X_n}$.
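A compact numerical sketch of this result, again assuming a Bernoulli $Y$ with an arbitrary parameter $p$: it evaluates the saddlepoint approximation $\hat{F}_{X_n}$ of (A30c) and the exponential factor $\exp(nK_{Y}(\theta)-\theta a)$ of (A31) against the exact binomial CDF. The factor $\min(1,2\xi_{Y}(\theta)/\sqrt{n})$ is omitted here since the constants $c_{1}$ and $c_{2}$ are given in (23) in the main text:

```python
import numpy as np
from scipy.stats import binom, norm
from scipy.optimize import brentq

p, n = 0.2, 100                      # assumed Bernoulli parameter (arbitrary)
Q = norm.sf                          # the Gaussian tail function Q(.)

K  = lambda t: np.log(1 - p + p * np.exp(t))            # K_Y(t)
K1 = lambda t: p * np.exp(t) / (1 - p + p * np.exp(t))  # K_Y^(1)(t)
K2 = lambda t: K1(t) * (1 - K1(t))                      # K_Y^(2)(t): tilted variance

def F_hat(a):
    # Saddlepoint approximation of F_{X_n}(a), following (A29) and (A30c)
    th = brentq(lambda t: n * K1(t) - a, -40.0, 40.0)
    v = np.exp(0.5 * n * th**2 * K2(th) + n * K(th) - th * a) \
        * Q(abs(th) * np.sqrt(n * K2(th)))
    return 1.0 - v if th > 0 else v

for a in [10.0, 15.0, 25.0, 30.0]:
    th = brentq(lambda t: n * K1(t) - a, -40.0, 40.0)
    print(a, binom.cdf(a, n, p), F_hat(a), np.exp(n * K(th) - th * a))
```

The printed exponential factor shrinks rapidly away from the mean, which is why the bound in (A31) is most informative in the large-deviation regime.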

Appendix D. Proof of Lemma A1

The left-hand side of (A7) satisfies
$$\left|P_{X_n}(\mathcal{A})-\varphi_{Y}(\theta)^{n}\,\mathbb{E}_{P_{Z_{n,\theta}}}\!\left[\exp\left(-\theta Z_{n,\theta}\right)\mathbb{1}_{\{Z_{n,\theta}\in\mathcal{A}\}}\right]\right|=\varphi_{Y}(\theta)^{n}\left|\mathbb{E}_{P_{S_{n,\theta}}}\!\left[\exp\left(-\theta S_{n,\theta}\right)\mathbb{1}_{\{S_{n,\theta}\in\mathcal{A}\}}\right]-\mathbb{E}_{P_{Z_{n,\theta}}}\!\left[\exp\left(-\theta Z_{n,\theta}\right)\mathbb{1}_{\{Z_{n,\theta}\in\mathcal{A}\}}\right]\right|.\tag{A32}$$
The focus is on obtaining explicit expressions for the terms $\mathbb{E}_{P_{S_{n,\theta}}}\!\left[\exp\left(-\theta S_{n,\theta}\right)\mathbb{1}_{\{S_{n,\theta}\in\mathcal{A}\}}\right]$ and $\mathbb{E}_{P_{Z_{n,\theta}}}\!\left[\exp\left(-\theta Z_{n,\theta}\right)\mathbb{1}_{\{Z_{n,\theta}\in\mathcal{A}\}}\right]$ in (A32). First, consider the case in which the random variable $S_{n,\theta}$ is absolutely continuous, and denote its probability density function by $f_{S_{n,\theta}}$ and its CDF by $F_{S_{n,\theta}}$. Then,
$$\mathbb{E}_{P_{S_{n,\theta}}}\!\left[\exp\left(-\theta S_{n,\theta}\right)\mathbb{1}_{\{S_{n,\theta}\in\mathcal{A}\}}\right]=\int_{\mathcal{A}}\exp(-\theta x)f_{S_{n,\theta}}(x)\,\mathrm{d}x.\tag{A33}$$
Using integration by parts in (A33), under assumption $(i)$ or $(ii)$ in Lemma A1, the following holds:
$$\mathbb{E}_{P_{S_{n,\theta}}}\!\left[\exp\left(-\theta S_{n,\theta}\right)\mathbb{1}_{\{S_{n,\theta}\in\mathcal{A}\}}\right]=(-1)^{\mathbb{1}_{\{\theta>0\}}}\exp(-\theta a)F_{S_{n,\theta}}(a)+\int_{\mathcal{A}}\theta\exp(-\theta x)F_{S_{n,\theta}}(x)\,\mathrm{d}x.\tag{A34}$$
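The identity (A34) is straightforward to verify numerically. A minimal sketch, assuming condition $(ii)$, that is, $\mathcal{A}=[a,\infty)$ with $\theta>0$, and taking $S_{n,\theta}$ to be a standard exponential random variable purely for illustration:

```python
import numpy as np
from scipy.integrate import quad

theta, a = 0.7, 1.3                      # assumed theta > 0 and A = [a, infinity)
f = lambda x: np.exp(-x)                 # density of an Exp(1) stand-in for S
F = lambda x: 1.0 - np.exp(-x)           # its CDF

lhs, _ = quad(lambda x: np.exp(-theta * x) * f(x), a, np.inf)
tail, _ = quad(lambda x: theta * np.exp(-theta * x) * F(x), a, np.inf)
rhs = -np.exp(-theta * a) * F(a) + tail  # (A34) with (-1)^{1{theta>0}} = -1
print(lhs, rhs)                          # the two values agree
```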
Second, consider the case in which the random variable $S_{n,\theta}$ is discrete, and denote its probability mass function by $p_{S_{n,\theta}}$ and its CDF by $F_{S_{n,\theta}}$. Let the support of $S_{n,\theta}$ be $\{s_{0},s_{1},\ldots,s_{\ell}\}\subset\mathbb{R}$, with $\ell\in\mathbb{N}$. Assume that condition $(i)$ in Lemma A1 is satisfied. Then,
$$\mathcal{A}\cap\{s_{0},s_{1},\ldots,s_{\ell}\}=\{s_{0},s_{1},\ldots,s_{u}\},\tag{A35}$$
with $u\leq\ell$, and
$$\mathbb{E}_{P_{S_{n,\theta}}}\!\left[\exp\left(-\theta S_{n,\theta}\right)\mathbb{1}_{\{S_{n,\theta}\in\mathcal{A}\}}\right]=\sum_{k=0}^{u}\exp\left(-\theta s_{k}\right)p_{S_{n,\theta}}(s_{k})\tag{A36a}$$
$$=F_{S_{n,\theta}}(s_{0})\exp\left(-\theta s_{0}\right)+\sum_{k=1}^{u}\left(F_{S_{n,\theta}}(s_{k})-F_{S_{n,\theta}}(s_{k-1})\right)\exp\left(-\theta s_{k}\right)\tag{A36b}$$
$$=\sum_{k=0}^{u}F_{S_{n,\theta}}(s_{k})\exp\left(-\theta s_{k}\right)-\sum_{k=1}^{u}F_{S_{n,\theta}}(s_{k-1})\exp\left(-\theta s_{k}\right)\tag{A36c}$$
$$=\sum_{k=0}^{u}F_{S_{n,\theta}}(s_{k})\exp\left(-\theta s_{k}\right)-\sum_{k=0}^{u-1}F_{S_{n,\theta}}(s_{k})\exp\left(-\theta s_{k+1}\right)\tag{A36d}$$
$$=F_{S_{n,\theta}}(s_{u})\exp\left(-\theta s_{u}\right)-\sum_{k=0}^{u-1}F_{S_{n,\theta}}(s_{k})\left(\exp\left(-\theta s_{k+1}\right)-\exp\left(-\theta s_{k}\right)\right)\tag{A36e}$$
$$=F_{S_{n,\theta}}(s_{u})\exp\left(-\theta s_{u}\right)+\sum_{k=0}^{u-1}\int_{s_{k}}^{s_{k+1}}\theta\exp(-\theta t)F_{S_{n,\theta}}(s_{k})\,\mathrm{d}t\tag{A36f}$$
$$=F_{S_{n,\theta}}(s_{u})\exp\left(-\theta s_{u}\right)+\int_{s_{0}}^{s_{u}}\theta\exp(-\theta t)F_{S_{n,\theta}}(t)\,\mathrm{d}t\tag{A36g}$$
$$=F_{S_{n,\theta}}(a)\exp\left(-\theta a\right)-F_{S_{n,\theta}}(a)\exp\left(-\theta a\right)+F_{S_{n,\theta}}(s_{u})\exp\left(-\theta s_{u}\right)+\int_{s_{0}}^{s_{u}}\theta\exp(-\theta t)F_{S_{n,\theta}}(t)\,\mathrm{d}t\tag{A36h}$$
$$=F_{S_{n,\theta}}(a)\exp\left(-\theta a\right)-F_{S_{n,\theta}}(s_{u})\exp\left(-\theta a\right)+F_{S_{n,\theta}}(s_{u})\exp\left(-\theta s_{u}\right)+\int_{s_{0}}^{s_{u}}\theta\exp(-\theta t)F_{S_{n,\theta}}(t)\,\mathrm{d}t\tag{A36i}$$
$$=F_{S_{n,\theta}}(a)\exp\left(-\theta a\right)-F_{S_{n,\theta}}(s_{u})\left(\exp\left(-\theta a\right)-\exp\left(-\theta s_{u}\right)\right)+\int_{s_{0}}^{s_{u}}\theta\exp(-\theta t)F_{S_{n,\theta}}(t)\,\mathrm{d}t\tag{A36j}$$
$$=F_{S_{n,\theta}}(a)\exp\left(-\theta a\right)+\int_{s_{u}}^{a}\theta\exp(-\theta t)F_{S_{n,\theta}}(s_{u})\,\mathrm{d}t+\int_{s_{0}}^{s_{u}}\theta\exp(-\theta t)F_{S_{n,\theta}}(t)\,\mathrm{d}t\tag{A36k}$$
$$=\exp\left(-\theta a\right)F_{S_{n,\theta}}(a)+\int_{s_{0}}^{a}\theta\exp(-\theta t)F_{S_{n,\theta}}(t)\,\mathrm{d}t\tag{A36l}$$
$$=\exp\left(-\theta a\right)F_{S_{n,\theta}}(a)+\int_{-\infty}^{a}\theta\exp(-\theta t)F_{S_{n,\theta}}(t)\,\mathrm{d}t,\tag{A36m}$$
which is an expression of the same form as the one in (A34). Alternatively, assume that condition $(ii)$ in Lemma A1 holds. Then,
$$\mathcal{A}\cap\{s_{0},s_{1},\ldots,s_{\ell}\}=\{s_{u},s_{u+1},\ldots,s_{\ell}\},\tag{A37}$$
with $u\leq\ell$, and
$$\mathbb{E}_{P_{S_{n,\theta}}}\!\left[\exp\left(-\theta S_{n,\theta}\right)\mathbb{1}_{\{S_{n,\theta}\in\mathcal{A}\}}\right]=\sum_{k=u}^{\ell}\exp\left(-\theta s_{k}\right)p_{S_{n,\theta}}(s_{k})\tag{A38a}$$
$$=\left(F_{S_{n,\theta}}(s_{u})-F_{S_{n,\theta}}(a)\right)\exp\left(-\theta s_{u}\right)+\sum_{k=u+1}^{\ell}\left(F_{S_{n,\theta}}(s_{k})-F_{S_{n,\theta}}(s_{k-1})\right)\exp\left(-\theta s_{k}\right)\tag{A38b}$$
$$=-F_{S_{n,\theta}}(a)\exp\left(-\theta s_{u}\right)+\sum_{k=u}^{\ell}F_{S_{n,\theta}}(s_{k})\exp\left(-\theta s_{k}\right)-\sum_{k=u+1}^{\ell}F_{S_{n,\theta}}(s_{k-1})\exp\left(-\theta s_{k}\right)\tag{A38c}$$
$$=-F_{S_{n,\theta}}(a)\exp\left(-\theta s_{u}\right)+\sum_{k=u}^{\ell}F_{S_{n,\theta}}(s_{k})\exp\left(-\theta s_{k}\right)-\sum_{k=u}^{\ell-1}F_{S_{n,\theta}}(s_{k})\exp\left(-\theta s_{k+1}\right)\tag{A38d}$$
$$=F_{S_{n,\theta}}(s_{\ell})\exp\left(-\theta s_{\ell}\right)-F_{S_{n,\theta}}(a)\exp\left(-\theta s_{u}\right)-\sum_{k=u}^{\ell-1}F_{S_{n,\theta}}(s_{k})\left(\exp\left(-\theta s_{k+1}\right)-\exp\left(-\theta s_{k}\right)\right)\tag{A38e}$$
$$=-F_{S_{n,\theta}}(a)\exp\left(-\theta s_{u}\right)+\int_{s_{\ell}}^{\infty}\theta\exp(-\theta t)F_{S_{n,\theta}}(s_{\ell})\,\mathrm{d}t+\sum_{k=u}^{\ell-1}\int_{s_{k}}^{s_{k+1}}\theta\exp(-\theta t)F_{S_{n,\theta}}(s_{k})\,\mathrm{d}t\tag{A38f}$$
$$=-F_{S_{n,\theta}}(a)\exp\left(-\theta s_{u}\right)+\int_{s_{u}}^{\infty}\theta\exp(-\theta t)F_{S_{n,\theta}}(t)\,\mathrm{d}t\tag{A38g}$$
$$=-F_{S_{n,\theta}}(a)\exp\left(-\theta a\right)+F_{S_{n,\theta}}(a)\exp\left(-\theta a\right)-F_{S_{n,\theta}}(a)\exp\left(-\theta s_{u}\right)+\int_{s_{u}}^{\infty}\theta\exp(-\theta t)F_{S_{n,\theta}}(t)\,\mathrm{d}t\tag{A38h}$$
$$=-F_{S_{n,\theta}}(a)\exp\left(-\theta a\right)-F_{S_{n,\theta}}(a)\left(\exp\left(-\theta s_{u}\right)-\exp\left(-\theta a\right)\right)+\int_{s_{u}}^{\infty}\theta\exp(-\theta t)F_{S_{n,\theta}}(t)\,\mathrm{d}t\tag{A38i}$$
$$=-F_{S_{n,\theta}}(a)\exp\left(-\theta a\right)+\int_{a}^{s_{u}}\theta\exp(-\theta t)F_{S_{n,\theta}}(a)\,\mathrm{d}t+\int_{s_{u}}^{\infty}\theta\exp(-\theta t)F_{S_{n,\theta}}(t)\,\mathrm{d}t\tag{A38j}$$
$$=-F_{S_{n,\theta}}(a)\exp\left(-\theta a\right)+\int_{a}^{\infty}\theta\exp(-\theta t)F_{S_{n,\theta}}(t)\,\mathrm{d}t,\tag{A38k}$$
which is an expression of the same form as those in (A34) and (A36m).
Note that, under the assumption that at least one of the conditions in Lemma A1 holds, the expressions in (A34), (A36m), and (A38k) can be jointly written as follows:
$$\mathbb{E}_{P_{S_{n,\theta}}}\!\left[\exp\left(-\theta S_{n,\theta}\right)\mathbb{1}_{\{S_{n,\theta}\in\mathcal{A}\}}\right]=(-1)^{\mathbb{1}_{\{\theta>0\}}}\exp(-\theta a)F_{S_{n,\theta}}(a)+\int_{\mathcal{A}}\theta\exp(-\theta x)F_{S_{n,\theta}}(x)\,\mathrm{d}x.\tag{A39}$$
The expression in (A39) does not involve particular assumptions on the random variable $S_{n,\theta}$ other than being discrete or absolutely continuous. Hence, the same expression holds with respect to the random variable $Z_{n,\theta}$ in (A32). More specifically,
$$\mathbb{E}_{P_{Z_{n,\theta}}}\!\left[\exp\left(-\theta Z_{n,\theta}\right)\mathbb{1}_{\{Z_{n,\theta}\in\mathcal{A}\}}\right]=(-1)^{\mathbb{1}_{\{\theta>0\}}}\exp(-\theta a)F_{Z_{n,\theta}}(a)+\int_{\mathcal{A}}\theta\exp(-\theta x)F_{Z_{n,\theta}}(x)\,\mathrm{d}x,\tag{A40}$$
where $F_{Z_{n,\theta}}$ is the CDF of the random variable $Z_{n,\theta}$.
The proof ends by plugging (A39) and (A40) into the right-hand side of (A32). This yields
$$\left|P_{X_n}(\mathcal{A})-\varphi_{Y}(\theta)^{n}\,\mathbb{E}_{P_{Z_{n,\theta}}}\!\left[\exp\left(-\theta Z_{n,\theta}\right)\mathbb{1}_{\{Z_{n,\theta}\in\mathcal{A}\}}\right]\right|=\varphi_{Y}(\theta)^{n}\left|(-1)^{\mathbb{1}_{\{\theta>0\}}}\exp(-\theta a)F_{S_{n,\theta}}(a)+\int_{\mathcal{A}}\theta\exp(-\theta x)F_{S_{n,\theta}}(x)\,\mathrm{d}x-(-1)^{\mathbb{1}_{\{\theta>0\}}}\exp(-\theta a)F_{Z_{n,\theta}}(a)-\int_{\mathcal{A}}\theta\exp(-\theta x)F_{Z_{n,\theta}}(x)\,\mathrm{d}x\right|\tag{A41a}$$
$$=\varphi_{Y}(\theta)^{n}\left|(-1)^{\mathbb{1}_{\{\theta>0\}}}\exp(-\theta a)\left(F_{S_{n,\theta}}(a)-F_{Z_{n,\theta}}(a)\right)+\int_{\mathcal{A}}\theta\exp(-\theta x)\left(F_{S_{n,\theta}}(x)-F_{Z_{n,\theta}}(x)\right)\mathrm{d}x\right|\tag{A41b}$$
$$\leq\varphi_{Y}(\theta)^{n}\left(\exp(-\theta a)\left|F_{S_{n,\theta}}(a)-F_{Z_{n,\theta}}(a)\right|+\int_{\mathcal{A}}|\theta|\exp(-\theta x)\left|F_{S_{n,\theta}}(x)-F_{Z_{n,\theta}}(x)\right|\mathrm{d}x\right)\tag{A41c}$$
$$\leq\varphi_{Y}(\theta)^{n}\left(\exp(-\theta a)\,\Delta\!\left(P_{S_{n,\theta}},P_{Z_{n,\theta}}\right)+\int_{\mathcal{A}}|\theta|\exp(-\theta x)\,\Delta\!\left(P_{S_{n,\theta}},P_{Z_{n,\theta}}\right)\mathrm{d}x\right)\tag{A41d}$$
$$=\varphi_{Y}(\theta)^{n}\left(\exp(-\theta a)\,\Delta\!\left(P_{S_{n,\theta}},P_{Z_{n,\theta}}\right)+\Delta\!\left(P_{S_{n,\theta}},P_{Z_{n,\theta}}\right)\int_{\mathcal{A}}|\theta|\exp(-\theta x)\,\mathrm{d}x\right)\tag{A41e}$$
$$=\varphi_{Y}(\theta)^{n}\left(\exp(-\theta a)\,\Delta\!\left(P_{S_{n,\theta}},P_{Z_{n,\theta}}\right)+\Delta\!\left(P_{S_{n,\theta}},P_{Z_{n,\theta}}\right)\exp(-\theta a)\right)\tag{A41f}$$
$$=2\,\varphi_{Y}(\theta)^{n}\exp(-\theta a)\,\Delta\!\left(P_{S_{n,\theta}},P_{Z_{n,\theta}}\right).\tag{A41g}$$
Finally, under the assumption that at least one of the conditions in Lemma A1 holds, it also holds that
$$\left|P_{X_n}(\mathcal{A})-\varphi_{Y}(\theta)^{n}\,\mathbb{E}_{P_{Z_{n,\theta}}}\!\left[\exp\left(-\theta Z_{n,\theta}\right)\mathbb{1}_{\{Z_{n,\theta}\in\mathcal{A}\}}\right]\right|\leq\varphi_{Y}(\theta)^{n}\max\!\left(\mathbb{E}_{P_{S_{n,\theta}}}\!\left[\exp\left(-\theta S_{n,\theta}\right)\mathbb{1}_{\{S_{n,\theta}\in\mathcal{A}\}}\right],\mathbb{E}_{P_{Z_{n,\theta}}}\!\left[\exp\left(-\theta Z_{n,\theta}\right)\mathbb{1}_{\{Z_{n,\theta}\in\mathcal{A}\}}\right]\right)\tag{A42a}$$
$$\leq\varphi_{Y}(\theta)^{n}\exp(-\theta a),\tag{A42b}$$
where (A42b) follows by noticing that, under either condition in Lemma A1, $\exp(-\theta x)\leq\exp(-\theta a)$ for all $x\in\mathcal{A}$.
Under the same assumption, the expressions in (A41g) and (A42b) can be jointly written as follows:
$$\left|P_{X_n}(\mathcal{A})-\varphi_{Y}(\theta)^{n}\,\mathbb{E}_{P_{Z_{n,\theta}}}\!\left[\exp\left(-\theta Z_{n,\theta}\right)\mathbb{1}_{\{Z_{n,\theta}\in\mathcal{A}\}}\right]\right|\leq\varphi_{Y}(\theta)^{n}\exp(-\theta a)\min\!\left(2\,\Delta\!\left(P_{S_{n,\theta}},P_{Z_{n,\theta}}\right),1\right).\tag{A43}$$
This concludes the proof of Lemma A1.

Appendix E. Proof of Lemma A2

In the case in which $Y$ is discrete ($p_{Y}$, $p_{Y^{(\theta)}}$, and $p_{Y_{1}^{(\theta)}Y_{2}^{(\theta)}\cdots Y_{n}^{(\theta)}}$ denote probability mass functions) or absolutely continuous ($p_{Y}$, $p_{Y^{(\theta)}}$, and $p_{Y_{1}^{(\theta)}Y_{2}^{(\theta)}\cdots Y_{n}^{(\theta)}}$ denote probability density functions), the following holds for all $(y_{1},y_{2},\ldots,y_{n})\in\mathbb{R}^{n}$:
$$\frac{\mathrm{d}P_{Y_{1}^{(\theta)}Y_{2}^{(\theta)}\cdots Y_{n}^{(\theta)}}}{\mathrm{d}P_{Y_{1}Y_{2}\cdots Y_{n}}}(y_{1},y_{2},\ldots,y_{n})=\frac{p_{Y_{1}^{(\theta)}Y_{2}^{(\theta)}\cdots Y_{n}^{(\theta)}}(y_{1},y_{2},\ldots,y_{n})}{\prod_{j=1}^{n}p_{Y}(y_{j})},\tag{A44}$$
and, for all $y\in\mathbb{R}$,
$$\frac{\mathrm{d}P_{Y^{(\theta)}}}{\mathrm{d}P_{Y}}(y)=\frac{p_{Y^{(\theta)}}(y)}{p_{Y}(y)}.\tag{A45}$$
Equating the right-hand sides of (A1) and (A44) yields, for all $(y_{1},y_{2},\ldots,y_{n})\in\mathbb{R}^{n}$,
$$p_{Y_{1}^{(\theta)}Y_{2}^{(\theta)}\cdots Y_{n}^{(\theta)}}(y_{1},y_{2},\ldots,y_{n})=\prod_{j=1}^{n}\frac{\exp\left(\theta y_{j}\right)}{\varphi_{Y}(\theta)}\,p_{Y}(y_{j}).\tag{A46}$$
Hence, $Y_{1}^{(\theta)},Y_{2}^{(\theta)},\ldots,Y_{n}^{(\theta)}$ are mutually independent and identically distributed. Moreover, for all $y\in\mathbb{R}$,
$$p_{Y^{(\theta)}}(y)=\frac{\exp\left(\theta y\right)}{\varphi_{Y}(\theta)}\,p_{Y}(y).\tag{A47}$$
Finally, plugging (A47) in (A45) yields, for all $y\in\mathbb{R}$,
$$\frac{\mathrm{d}P_{Y^{(\theta)}}}{\mathrm{d}P_{Y}}(y)=\frac{\exp\left(\theta y\right)}{\varphi_{Y}(\theta)},\tag{A48}$$
which completes the proof.
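Lemma A2 states that $P_{Y^{(\theta)}}$ is the exponentially tilted version of $P_{Y}$. A minimal numerical sketch of (A47), assuming a Bernoulli $P_{Y}$ with arbitrarily chosen parameter $p$ and tilt $\theta$; it checks that the tilted weights form a probability mass function and that their mean equals $K_{Y}^{(1)}(\theta)$, the quantity appearing in (A28e):

```python
import numpy as np

p, theta = 0.2, 0.8              # assumed Bernoulli parameter and tilt (arbitrary)
y = np.array([0.0, 1.0])         # support of Y
p_Y = np.array([1.0 - p, p])     # pmf of Y

phi = np.sum(np.exp(theta * y) * p_Y)    # phi_Y(theta)
p_tilt = np.exp(theta * y) * p_Y / phi   # tilted pmf, following (A47)

print(p_tilt.sum())                      # 1.0: a valid pmf
print(np.dot(y, p_tilt))                 # tilted mean
print(p * np.exp(theta) / (1 - p + p * np.exp(theta)))  # K_Y^(1)(theta): same value
```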

Appendix F. Proof of Theorem 5

For a fixed product probability input distribution $P_{X}$ in (53) and for the random transformation in (44), the upper bound $T(n,M,P_{X})$ in (51) can be written as a weighted sum of the CDF and the complementary CDF of two random variables $W_{n}$ and $V_{n}$, each of which is a sum of i.i.d. random variables. That is,
$$W_{n}=\sum_{t=1}^{n}\iota(X_{t};Y_{t}),\ \text{and}\tag{A49}$$
$$V_{n}=\sum_{t=1}^{n}\iota(\bar{X}_{t};Y_{t}),\tag{A50}$$
where $(X_{t},Y_{t})\sim P_{X}P_{Y|X}$ and $(\bar{X}_{t},Y_{t})\sim P_{\bar{X}}P_{Y}$, with $P_{X}=P_{\bar{X}}$. More specifically, the function $T$ in (51) can be rewritten in the form
$$T(n,M,P_{X})=F_{W_{n}}\!\left(\ln\frac{M-1}{2}\right)+\frac{M-1}{2}\left(1-F_{V_{n}}\!\left(\ln\frac{M-1}{2}\right)\right),\tag{A51}$$
where $F_{W_{n}}$ and $F_{V_{n}}$ are the CDFs of $W_{n}$ and $V_{n}$, respectively. A Monte Carlo sanity check of this decomposition is sketched below.
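The following sketch evaluates (A51) by simulation for a BSC with uniform $P_{X}$, in which case $P_{Y}$ is uniform and $\iota(x;y)=\ln 2+\ln(1-\delta)$ when $y=x$ and $\ln 2+\ln\delta$ otherwise. All parameters ($\delta$, $n$, the rate, and the sample size) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
delta, n = 0.11, 200
M = 2 ** int(0.32 * n)                 # rate 0.32 bits per channel use
thr = np.log((M - 1) / 2)
N = 50_000                             # Monte Carlo samples

i_eq, i_neq = np.log(2 * (1 - delta)), np.log(2 * delta)

# W_n: (X_t, Y_t) ~ P_X P_{Y|X}, so X_t != Y_t with probability delta
W = np.where(rng.random((N, n)) < delta, i_neq, i_eq).sum(axis=1)
# V_n: (Xbar_t, Y_t) ~ P_X P_Y, so Xbar_t != Y_t with probability 1/2
V = np.where(rng.random((N, n)) < 0.5, i_neq, i_eq).sum(axis=1)

T = np.mean(W <= thr) + (M - 1) / 2 * np.mean(V > thr)   # (A51)
print(T)
```

Note that $1-F_{V_{n}}$ at this threshold is exponentially small, so its naive Monte Carlo estimate is typically exactly $0$; this is precisely the regime in which the exponentially tilted bounds derived next are needed.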
The next step derives upper and lower bounds on $F_{W_{n}}\!\left(\ln\frac{M-1}{2}\right)$ and $1-F_{V_{n}}\!\left(\ln\frac{M-1}{2}\right)$ by using the result of Theorem 3. That is,
$$F_{W_{n}}\!\left(\ln\frac{M-1}{2}\right)\leq\zeta_{\iota(X;Y)}\!\left(\theta,\ln\frac{M-1}{2},n\right)+\exp\!\left(n\ln\varphi_{\iota(X;Y)}(\theta)-\theta\ln\frac{M-1}{2}\right)\min\!\left(1,\frac{2\xi_{\iota(X;Y)}(\theta)}{\sqrt{n}}\right),\tag{A52}$$
$$F_{W_{n}}\!\left(\ln\frac{M-1}{2}\right)\geq\zeta_{\iota(X;Y)}\!\left(\theta,\ln\frac{M-1}{2},n\right)-\exp\!\left(n\ln\varphi_{\iota(X;Y)}(\theta)-\theta\ln\frac{M-1}{2}\right)\min\!\left(1,\frac{2\xi_{\iota(X;Y)}(\theta)}{\sqrt{n}}\right),\tag{A53}$$
$$1-F_{V_{n}}\!\left(\ln\frac{M-1}{2}\right)\leq1-\zeta_{\iota(\bar{X};Y)}\!\left(\tau,\ln\frac{M-1}{2},n\right)+\exp\!\left(n\ln\varphi_{\iota(\bar{X};Y)}(\tau)-\tau\ln\frac{M-1}{2}\right)\min\!\left(1,\frac{2\xi_{\iota(\bar{X};Y)}(\tau)}{\sqrt{n}}\right),\ \text{and}\tag{A54}$$
$$1-F_{V_{n}}\!\left(\ln\frac{M-1}{2}\right)\geq1-\zeta_{\iota(\bar{X};Y)}\!\left(\tau,\ln\frac{M-1}{2},n\right)-\exp\!\left(n\ln\varphi_{\iota(\bar{X};Y)}(\tau)-\tau\ln\frac{M-1}{2}\right)\min\!\left(1,\frac{2\xi_{\iota(\bar{X};Y)}(\tau)}{\sqrt{n}}\right),\tag{A55}$$
where $\theta$ and $\tau$ satisfy
$$n\,\mu_{\iota(X;Y)}(\theta)=\ln\frac{M-1}{2}=n\,\mu_{\iota(\bar{X};Y)}(\tau),\tag{A56}$$
and, for all $t\in\mathbb{R}$,
$$\varphi_{\iota(X;Y)}(t)=\mathbb{E}_{P_{X}P_{Y|X}}\!\left[\exp\left(t\,\iota(X;Y)\right)\right],\tag{A57}$$
$$\varphi_{\iota(\bar{X};Y)}(t)=\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\exp\left(t\,\iota(\bar{X};Y)\right)\right],\tag{A58}$$
$$\xi_{\iota(X;Y)}(t)=c_{1}\,\frac{\mathbb{E}_{P_{X}P_{Y|X}}\!\left[\left|\iota(X;Y)-\mu_{\iota(X;Y)}(t)\right|^{3}\frac{\exp\left(t\,\iota(X;Y)\right)}{\varphi_{\iota(X;Y)}(t)}\right]}{V_{\iota(X;Y)}(t)^{3/2}}+c_{2},\tag{A59}$$
$$\xi_{\iota(\bar{X};Y)}(t)=c_{1}\,\frac{\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\left|\iota(\bar{X};Y)-\mu_{\iota(\bar{X};Y)}(t)\right|^{3}\frac{\exp\left(t\,\iota(\bar{X};Y)\right)}{\varphi_{\iota(\bar{X};Y)}(t)}\right]}{V_{\iota(\bar{X};Y)}(t)^{3/2}}+c_{2},\tag{A60}$$
$$\mu_{\iota(X;Y)}(t)=\mathbb{E}_{P_{X}P_{Y|X}}\!\left[\iota(X;Y)\,\frac{\exp\left(t\,\iota(X;Y)\right)}{\varphi_{\iota(X;Y)}(t)}\right],\tag{A61}$$
$$\mu_{\iota(\bar{X};Y)}(t)=\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\iota(\bar{X};Y)\,\frac{\exp\left(t\,\iota(\bar{X};Y)\right)}{\varphi_{\iota(\bar{X};Y)}(t)}\right],\tag{A62}$$
$$V_{\iota(X;Y)}(t)=\mathbb{E}_{P_{X}P_{Y|X}}\!\left[\left(\iota(X;Y)-\mu_{\iota(X;Y)}(t)\right)^{2}\frac{\exp\left(t\,\iota(X;Y)\right)}{\varphi_{\iota(X;Y)}(t)}\right],\tag{A63}$$
$$V_{\iota(\bar{X};Y)}(t)=\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\left(\iota(\bar{X};Y)-\mu_{\iota(\bar{X};Y)}(t)\right)^{2}\frac{\exp\left(t\,\iota(\bar{X};Y)\right)}{\varphi_{\iota(\bar{X};Y)}(t)}\right],\tag{A64}$$
with $c_{1}$ and $c_{2}$ defined in (23); and, for all $(t,a,n)\in\mathbb{R}^{2}\times\mathbb{N}$,
$$\zeta_{\iota(X;Y)}(t,a,n)=\mathbb{1}_{\{t>0\}}+(-1)^{\mathbb{1}_{\{t>0\}}}\exp\!\left(\frac{1}{2}nt^{2}V_{\iota(X;Y)}(t)+n\ln\varphi_{\iota(X;Y)}(t)-ta\right)Q\!\left(|t|\sqrt{nV_{\iota(X;Y)}(t)}\right),\ \text{and}\tag{A65}$$
$$\zeta_{\iota(\bar{X};Y)}(t,a,n)=\mathbb{1}_{\{t>0\}}+(-1)^{\mathbb{1}_{\{t>0\}}}\exp\!\left(\frac{1}{2}nt^{2}V_{\iota(\bar{X};Y)}(t)+n\ln\varphi_{\iota(\bar{X};Y)}(t)-ta\right)Q\!\left(|t|\sqrt{nV_{\iota(\bar{X};Y)}(t)}\right).\tag{A66}$$
The next step simplifies the expressions on the right-hand side of (A54) and (A55) by studying the relations between $\varphi_{\iota(X;Y)}$ and $\varphi_{\iota(\bar{X};Y)}$; $\mu_{\iota(X;Y)}$ and $\mu_{\iota(\bar{X};Y)}$; $\theta$ and $\tau$; $V_{\iota(X;Y)}$ and $V_{\iota(\bar{X};Y)}$; and $\xi_{\iota(X;Y)}$ and $\xi_{\iota(\bar{X};Y)}$.
First, from (A57), using the change of measure from $P_{X}P_{Y|X}$ to $P_{\bar{X}}P_{Y}$, because $P_{X}P_{Y|X}$ is absolutely continuous with respect to $P_{\bar{X}}P_{Y}$, it holds that
$$\varphi_{\iota(X;Y)}(t)=\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\frac{\mathrm{d}P_{X}P_{Y|X}}{\mathrm{d}P_{\bar{X}}P_{Y}}\!\left(\bar{X};Y\right)\exp\left(t\,\iota(\bar{X};Y)\right)\right]\tag{A67}$$
$$=\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\exp\left((t+1)\,\iota(\bar{X};Y)\right)\right].\tag{A68}$$
Then, from (A57) and (A58), it holds that
$$\varphi_{\iota(X;Y)}(t)=\varphi_{\iota(\bar{X};Y)}(t+1).\tag{A69}$$
This concludes the relation between $\varphi_{\iota(X;Y)}$ and $\varphi_{\iota(\bar{X};Y)}$.
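The identity (A69) can be checked numerically in closed form for the BSC with uniform inputs (the crossover probability $\delta$ and the evaluation point $t$ below are arbitrary choices):

```python
import numpy as np

delta, t = 0.11, -0.4
i_eq, i_neq = np.log(2 * (1 - delta)), np.log(2 * delta)

# phi_{iota(X;Y)}(t): under P_X P_{Y|X}, disagreement has probability delta
phi_XY = (1 - delta) * np.exp(t * i_eq) + delta * np.exp(t * i_neq)
# phi_{iota(Xbar;Y)}(t+1): under P_X P_Y, disagreement has probability 1/2
phi_bar = 0.5 * np.exp((t + 1) * i_eq) + 0.5 * np.exp((t + 1) * i_neq)

print(phi_XY, phi_bar)   # the two values coincide, as stated in (A69)
```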
Second, from (A61), using the change of measure from $P_{X}P_{Y|X}$ to $P_{\bar{X}}P_{Y}$, it holds that
$$\mu_{\iota(X;Y)}(t)=\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\iota(\bar{X};Y)\,\frac{\exp\left(t\,\iota(\bar{X};Y)\right)}{\varphi_{\iota(X;Y)}(t)}\,\frac{\mathrm{d}P_{X}P_{Y|X}}{\mathrm{d}P_{\bar{X}}P_{Y}}\!\left(\bar{X};Y\right)\right]\tag{A70}$$
$$=\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\iota(\bar{X};Y)\,\frac{\exp\left((t+1)\,\iota(\bar{X};Y)\right)}{\varphi_{\iota(X;Y)}(t)}\right].\tag{A71}$$
Then, from (A69) and (A71), it holds that
$$\mu_{\iota(X;Y)}(t)=\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\iota(\bar{X};Y)\,\frac{\exp\left((t+1)\,\iota(\bar{X};Y)\right)}{\varphi_{\iota(\bar{X};Y)}(t+1)}\right].\tag{A72}$$
From (A62) and (A72), it holds that
$$\mu_{\iota(X;Y)}(t)=\mu_{\iota(\bar{X};Y)}(t+1).\tag{A73}$$
This concludes the relation between $\mu_{\iota(X;Y)}$ and $\mu_{\iota(\bar{X};Y)}$.
Third, from (A56) and (A73), it holds that
$$\tau=\theta+1.\tag{A74}$$
This concludes the relation between τ and θ .
Fourth, from (A63), using the change of measure from $P_{X}P_{Y|X}$ to $P_{\bar{X}}P_{Y}$, it holds that
$$V_{\iota(X;Y)}(t)=\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\left(\iota(\bar{X};Y)-\mu_{\iota(X;Y)}(t)\right)^{2}\frac{\exp\left(t\,\iota(\bar{X};Y)\right)}{\varphi_{\iota(X;Y)}(t)}\,\frac{\mathrm{d}P_{X}P_{Y|X}}{\mathrm{d}P_{\bar{X}}P_{Y}}\!\left(\bar{X};Y\right)\right]\tag{A75}$$
$$=\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\left(\iota(\bar{X};Y)-\mu_{\iota(X;Y)}(t)\right)^{2}\frac{\exp\left((t+1)\,\iota(\bar{X};Y)\right)}{\varphi_{\iota(X;Y)}(t)}\right].\tag{A76}$$
From (A69), (A73), and (A76), it holds that
$$V_{\iota(X;Y)}(t)=\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\left(\iota(\bar{X};Y)-\mu_{\iota(\bar{X};Y)}(t+1)\right)^{2}\frac{\exp\left((t+1)\,\iota(\bar{X};Y)\right)}{\varphi_{\iota(\bar{X};Y)}(t+1)}\right].\tag{A77}$$
From (A64) and (A77), it holds that
$$V_{\iota(X;Y)}(t)=V_{\iota(\bar{X};Y)}(t+1).\tag{A78}$$
This concludes the relation between $V_{\iota(X;Y)}$ and $V_{\iota(\bar{X};Y)}$.
Fifth, from (A59), using the change of measure from $P_{X}P_{Y|X}$ to $P_{\bar{X}}P_{Y}$, it holds that
$$\xi_{\iota(X;Y)}(t)=c_{1}\,\frac{\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\left|\iota(\bar{X};Y)-\mu_{\iota(X;Y)}(t)\right|^{3}\frac{\exp\left(t\,\iota(\bar{X};Y)\right)}{\varphi_{\iota(X;Y)}(t)}\,\frac{\mathrm{d}P_{X}P_{Y|X}}{\mathrm{d}P_{\bar{X}}P_{Y}}\!\left(\bar{X};Y\right)\right]}{V_{\iota(X;Y)}(t)^{3/2}}+c_{2}\tag{A79}$$
$$=c_{1}\,\frac{\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\left|\iota(\bar{X};Y)-\mu_{\iota(X;Y)}(t)\right|^{3}\frac{\exp\left((t+1)\,\iota(\bar{X};Y)\right)}{\varphi_{\iota(X;Y)}(t)}\right]}{V_{\iota(X;Y)}(t)^{3/2}}+c_{2}.\tag{A80}$$
From (A69), (A73), (A78), and (A80), it holds that
$$\xi_{\iota(X;Y)}(t)=c_{1}\,\frac{\mathbb{E}_{P_{\bar{X}}P_{Y}}\!\left[\left|\iota(\bar{X};Y)-\mu_{\iota(\bar{X};Y)}(t+1)\right|^{3}\frac{\exp\left((t+1)\,\iota(\bar{X};Y)\right)}{\varphi_{\iota(\bar{X};Y)}(t+1)}\right]}{V_{\iota(\bar{X};Y)}(t+1)^{3/2}}+c_{2}.\tag{A81}$$
From (A60) and (A81), it holds that
$$\xi_{\iota(X;Y)}(t)=\xi_{\iota(\bar{X};Y)}(t+1).\tag{A82}$$
This concludes the relation between $\xi_{\iota(X;Y)}$ and $\xi_{\iota(\bar{X};Y)}$.
Sixth, plugging (A69), (A73), and (A78) into (A66), for all $t\in\mathbb{R}$, it holds that
$$\zeta_{\iota(\bar{X};Y)}(t,a,n)=\mathbb{1}_{\{t>0\}}+(-1)^{\mathbb{1}_{\{t>0\}}}\exp\!\left(\frac{1}{2}nt^{2}V_{\iota(X;Y)}(t-1)+n\ln\varphi_{\iota(X;Y)}(t-1)-ta\right)Q\!\left(|t|\sqrt{nV_{\iota(X;Y)}(t-1)}\right).\tag{A83}$$
Then, from (67) and (A83), it holds that
$$\zeta_{\iota(\bar{X};Y)}\!\left(t,\ln\frac{M-1}{2},n\right)=1-\beta_{2}(n,M,t-1,P_{X}).\tag{A84}$$
Then, plugging (A69), (A73), (A74), (A78), (A82), and (A84) into the right-hand side of (A54), it holds that
$$1-F_{V_{n}}\!\left(\ln\frac{M-1}{2}\right)\leq\beta_{2}(n,M,\theta,P_{X})+\exp\!\left(n\ln\varphi_{\iota(X;Y)}(\theta)-(\theta+1)\ln\frac{M-1}{2}\right)\min\!\left(1,\frac{2\xi_{\iota(X;Y)}(\theta)}{\sqrt{n}}\right)\tag{A85}$$
$$\leq\beta_{2}(n,M,\theta,P_{X})+\exp\!\left(n\ln\varphi_{\iota(X;Y)}(\theta)-(\theta+1)\ln\frac{M-1}{2}\right)\frac{2\xi_{\iota(X;Y)}(\theta)}{\sqrt{n}}.\tag{A86}$$
Alternatively, plugging (A69), (A73), (A74), (A78), (A82), and (A84) into the right-hand side of (A55), it holds that
$$1-F_{V_{n}}\!\left(\ln\frac{M-1}{2}\right)\geq\beta_{2}(n,M,\theta,P_{X})-\exp\!\left(n\ln\varphi_{\iota(X;Y)}(\theta)-(\theta+1)\ln\frac{M-1}{2}\right)\min\!\left(1,\frac{2\xi_{\iota(X;Y)}(\theta)}{\sqrt{n}}\right)\tag{A87}$$
$$\geq\beta_{2}(n,M,\theta,P_{X})-\exp\!\left(n\ln\varphi_{\iota(X;Y)}(\theta)-(\theta+1)\ln\frac{M-1}{2}\right)\frac{2\xi_{\iota(X;Y)}(\theta)}{\sqrt{n}}\tag{A88}$$
$$=G_{2}(n,M,\theta,P_{X}),\tag{A89}$$
where the equality in (A89) follows from (69). Observing that $1-F_{V_{n}}$ is nonnegative, it follows from (A88) that
$$1-F_{V_{n}}\!\left(\ln\frac{M-1}{2}\right)\geq\max\!\left(0,G_{2}(n,M,\theta,P_{X})\right).\tag{A90}$$
Seventh, from (66) and (A65), it holds that
$$\zeta_{\iota(X;Y)}\!\left(t,\ln\frac{M-1}{2},n\right)=\beta_{1}(n,M,t,P_{X}).\tag{A91}$$
Then, plugging (A69), (A73), (A74), (A78), (A82), and (A91) into the right-hand side of (A52), it holds that
$$F_{W_{n}}\!\left(\ln\frac{M-1}{2}\right)\leq\beta_{1}(n,M,\theta,P_{X})+\exp\!\left(n\ln\varphi_{\iota(X;Y)}(\theta)-\theta\ln\frac{M-1}{2}\right)\min\!\left(1,\frac{2\xi_{\iota(X;Y)}(\theta)}{\sqrt{n}}\right)\tag{A92}$$
$$\leq\beta_{1}(n,M,\theta,P_{X})+\exp\!\left(n\ln\varphi_{\iota(X;Y)}(\theta)-\theta\ln\frac{M-1}{2}\right)\frac{2\xi_{\iota(X;Y)}(\theta)}{\sqrt{n}}.\tag{A93}$$
Alternatively, plugging (A69), (A73), (A74), (A78), (A82), and (A91) into the right-hand side of (A53), it holds that
$$F_{W_{n}}\!\left(\ln\frac{M-1}{2}\right)\geq\beta_{1}(n,M,\theta,P_{X})-\exp\!\left(n\ln\varphi_{\iota(X;Y)}(\theta)-\theta\ln\frac{M-1}{2}\right)\min\!\left(1,\frac{2\xi_{\iota(X;Y)}(\theta)}{\sqrt{n}}\right)\tag{A94}$$
$$\geq\beta_{1}(n,M,\theta,P_{X})-\exp\!\left(n\ln\varphi_{\iota(X;Y)}(\theta)-\theta\ln\frac{M-1}{2}\right)\frac{2\xi_{\iota(X;Y)}(\theta)}{\sqrt{n}}\tag{A95}$$
$$=G_{1}(n,M,\theta,P_{X}),\tag{A96}$$
where the equality in (A96) follows from (68). Observing that $F_{W_{n}}$ is nonnegative, it follows from (A95) that
$$F_{W_{n}}\!\left(\ln\frac{M-1}{2}\right)\geq\max\!\left(0,G_{1}(n,M,\theta,P_{X})\right).\tag{A97}$$
Finally, plugging (A86) and (A93) in (A51), it holds that
$$T(n,M,P_{X})\leq\beta_{1}(n,M,\theta,P_{X})+\frac{M-1}{2}\,\beta_{2}(n,M,\theta,P_{X})+\exp\!\left(n\ln\varphi_{\iota(X;Y)}(\theta)-\theta\ln\frac{M-1}{2}\right)\frac{4\xi_{\iota(X;Y)}(\theta)}{\sqrt{n}}\tag{A98}$$
$$=\beta(n,M,\theta,P_{X})+\exp\!\left(n\ln\varphi_{\iota(X;Y)}(\theta)-\theta\ln\frac{M-1}{2}\right)\frac{4\xi_{\iota(X;Y)}(\theta)}{\sqrt{n}},\tag{A99}$$
where the equality in (A99) follows from (72). Observing that $T(n,M,P_{X})\leq1$, from (A99), it holds that
$$T(n,M,P_{X})\leq\min\!\left(1,\beta(n,M,\theta,P_{X})+\exp\!\left(n\ln\varphi_{\iota(X;Y)}(\theta)-\theta\ln\frac{M-1}{2}\right)\frac{4\xi_{\iota(X;Y)}(\theta)}{\sqrt{n}}\right)\tag{A100}$$
$$=S(n,M,\theta,P_{X}),\tag{A101}$$
where the equality in (A101) follows from (71).
Alternatively, plugging (A90) and (A97) in (A51), it holds that
$$T(n,M,P_{X})\geq\max\!\left(0,G_{1}(n,M,\theta,P_{X})\right)+\frac{M-1}{2}\max\!\left(0,G_{2}(n,M,\theta,P_{X})\right)\tag{A102}$$
$$=G(n,M,\theta,P_{X}),\tag{A103}$$
where the equality in (A103) follows from (70). Combining (A101) and (A103) concludes the proof.

Appendix G. Proof of Theorem 7

Note that, for given distributions $P_{X}$ subject to (53) and $Q_{Y}$ subject to (81), and for a random transformation in (43) subject to (44), the lower bound $C(n,M,P_{X},Q_{Y},\gamma)$ in (76) can be written as a weighted sum of the CDF and the complementary CDF of two random variables $W_{n}$ and $V_{n}$, each of which is a sum of i.i.d. random variables. That is,
$$W_{n}=\sum_{t=1}^{n}\tilde{\iota}(X_{t};Y_{t}|Q_{Y}),\ \text{and}\tag{A104}$$
$$V_{n}=\sum_{t=1}^{n}\tilde{\iota}(\bar{X}_{t};Y_{t}|Q_{Y}),\tag{A105}$$
where $(X_{t},Y_{t})\sim P_{X}P_{Y|X}$ and $(\bar{X}_{t},Y_{t})\sim P_{\bar{X}}Q_{Y}$, with $P_{X}=P_{\bar{X}}$. More specifically, the function $C$ in (76) can be written in the form
$$C(n,M,P_{X},Q_{Y},\gamma)=F_{W_{n}}\!\left(\ln\gamma\right)+\gamma\left(1-F_{V_{n}}\!\left(\ln\gamma\right)\right)-\frac{\gamma}{M},\tag{A106}$$
where $F_{W_{n}}$ and $F_{V_{n}}$ are the CDFs of the random variables $W_{n}$ and $V_{n}$, respectively. A Monte Carlo sketch of this decomposition appears below.
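As with (A51), the decomposition (A106) can be checked by simulation. A sketch for the BSC with uniform inputs, taking $Q_{Y}$ equal to the channel output distribution $P_{Y}$ so that $\tilde{\iota}$ coincides with $\iota$; the values of $\delta$, $n$, $M$, and $\gamma$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
delta, n, M = 0.11, 200, 2 ** 84        # rate 0.42 bits per channel use
gamma = np.exp(40.0)                    # an arbitrary choice of gamma
thr = np.log(gamma)
N = 50_000

i_eq, i_neq = np.log(2 * (1 - delta)), np.log(2 * delta)
W = np.where(rng.random((N, n)) < delta, i_neq, i_eq).sum(axis=1)
V = np.where(rng.random((N, n)) < 0.5, i_neq, i_eq).sum(axis=1)

C = np.mean(W <= thr) + gamma * np.mean(V > thr) - gamma / M   # (A106)
print(C)
```

Here again the term $\gamma\left(1-F_{V_{n}}(\ln\gamma)\right)$ is exponentially small and invisible to naive sampling, which motivates the tilted bound $\tilde{\beta}_{2}$ derived next.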
The next step derives upper and lower bounds on $F_{W_{n}}\!\left(\ln\gamma\right)$ and $1-F_{V_{n}}\!\left(\ln\gamma\right)$ by using the result of Theorem 3. That is,
$$F_{W_{n}}\!\left(\ln\gamma\right)\leq\zeta_{\tilde{\iota}(X;Y|Q_{Y})}\!\left(\theta,\ln\gamma,n\right)+\exp\!\left(n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)-\theta\ln\gamma\right)\min\!\left(1,\frac{2\xi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)}{\sqrt{n}}\right),\tag{A107}$$
$$F_{W_{n}}\!\left(\ln\gamma\right)\geq\zeta_{\tilde{\iota}(X;Y|Q_{Y})}\!\left(\theta,\ln\gamma,n\right)-\exp\!\left(n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)-\theta\ln\gamma\right)\min\!\left(1,\frac{2\xi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)}{\sqrt{n}}\right),\tag{A108}$$
$$1-F_{V_{n}}\!\left(\ln\gamma\right)\leq1-\zeta_{\tilde{\iota}(\bar{X};Y|Q_{Y})}\!\left(\tau,\ln\gamma,n\right)+\exp\!\left(n\ln\varphi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(\tau)-\tau\ln\gamma\right)\min\!\left(1,\frac{2\xi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(\tau)}{\sqrt{n}}\right),\ \text{and}\tag{A109}$$
$$1-F_{V_{n}}\!\left(\ln\gamma\right)\geq1-\zeta_{\tilde{\iota}(\bar{X};Y|Q_{Y})}\!\left(\tau,\ln\gamma,n\right)-\exp\!\left(n\ln\varphi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(\tau)-\tau\ln\gamma\right)\min\!\left(1,\frac{2\xi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(\tau)}{\sqrt{n}}\right),\tag{A110}$$
where $\theta$ and $\tau$ satisfy
$$n\,\mu_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)=\ln\gamma=n\,\mu_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(\tau),\tag{A111}$$
and, for all $t\in\mathbb{R}$,
$$\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(t)=\mathbb{E}_{P_{X}P_{Y|X}}\!\left[\exp\left(t\,\tilde{\iota}(X;Y|Q_{Y})\right)\right],\tag{A112}$$
$$\varphi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t)=\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\exp\left(t\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)\right],\tag{A113}$$
$$\xi_{\tilde{\iota}(X;Y|Q_{Y})}(t)=c_{1}\,\frac{\mathbb{E}_{P_{X}P_{Y|X}}\!\left[\left|\tilde{\iota}(X;Y|Q_{Y})-\mu_{\tilde{\iota}(X;Y|Q_{Y})}(t)\right|^{3}\frac{\exp\left(t\,\tilde{\iota}(X;Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(t)}\right]}{V_{\tilde{\iota}(X;Y|Q_{Y})}(t)^{3/2}}+c_{2},\tag{A114}$$
$$\xi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t)=c_{1}\,\frac{\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\left|\tilde{\iota}(\bar{X};Y|Q_{Y})-\mu_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t)\right|^{3}\frac{\exp\left(t\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t)}\right]}{V_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t)^{3/2}}+c_{2},\tag{A115}$$
$$\mu_{\tilde{\iota}(X;Y|Q_{Y})}(t)=\mathbb{E}_{P_{X}P_{Y|X}}\!\left[\tilde{\iota}(X;Y|Q_{Y})\,\frac{\exp\left(t\,\tilde{\iota}(X;Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(t)}\right],\tag{A116}$$
$$\mu_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t)=\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\tilde{\iota}(\bar{X};Y|Q_{Y})\,\frac{\exp\left(t\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t)}\right],\tag{A117}$$
$$V_{\tilde{\iota}(X;Y|Q_{Y})}(t)=\mathbb{E}_{P_{X}P_{Y|X}}\!\left[\left(\tilde{\iota}(X;Y|Q_{Y})-\mu_{\tilde{\iota}(X;Y|Q_{Y})}(t)\right)^{2}\frac{\exp\left(t\,\tilde{\iota}(X;Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(t)}\right],\tag{A118}$$
$$V_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t)=\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\left(\tilde{\iota}(\bar{X};Y|Q_{Y})-\mu_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t)\right)^{2}\frac{\exp\left(t\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t)}\right],\tag{A119}$$
with $c_{1}$ and $c_{2}$ defined in (23); and, for all $(t,a,n)\in\mathbb{R}^{2}\times\mathbb{N}$,
$$\zeta_{\tilde{\iota}(X;Y|Q_{Y})}(t,a,n)=\mathbb{1}_{\{t>0\}}+(-1)^{\mathbb{1}_{\{t>0\}}}\exp\!\left(\frac{1}{2}nt^{2}V_{\tilde{\iota}(X;Y|Q_{Y})}(t)+n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(t)-ta\right)Q\!\left(|t|\sqrt{nV_{\tilde{\iota}(X;Y|Q_{Y})}(t)}\right),\ \text{and}\tag{A120}$$
$$\zeta_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t,a,n)=\mathbb{1}_{\{t>0\}}+(-1)^{\mathbb{1}_{\{t>0\}}}\exp\!\left(\frac{1}{2}nt^{2}V_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t)+n\ln\varphi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t)-ta\right)Q\!\left(|t|\sqrt{nV_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t)}\right).\tag{A121}$$
The next step simplifies the expressions on the right-hand side of (A109) and (A110) by studying the relations between $\varphi_{\tilde{\iota}(X;Y|Q_{Y})}$ and $\varphi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}$; $\mu_{\tilde{\iota}(X;Y|Q_{Y})}$ and $\mu_{\tilde{\iota}(\bar{X};Y|Q_{Y})}$; $\theta$ and $\tau$; $V_{\tilde{\iota}(X;Y|Q_{Y})}$ and $V_{\tilde{\iota}(\bar{X};Y|Q_{Y})}$; and $\xi_{\tilde{\iota}(X;Y|Q_{Y})}$ and $\xi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}$, when $P_{Y|X}$ is absolutely continuous with respect to $Q_{Y}$.
First, from (A112), using the change of measure from $P_{X}P_{Y|X}$ to $P_{\bar{X}}Q_{Y}$, because $P_{X}P_{Y|X}$ is absolutely continuous with respect to $P_{\bar{X}}Q_{Y}$, it holds that
$$\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(t)=\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\frac{\mathrm{d}P_{X}P_{Y|X}}{\mathrm{d}P_{\bar{X}}Q_{Y}}\!\left(\bar{X};Y\right)\exp\left(t\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)\right]\tag{A122}$$
$$=\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\exp\left((t+1)\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)\right].\tag{A123}$$
Then, from (A112) and (A113), it holds that
$$\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(t)=\varphi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t+1).\tag{A124}$$
This concludes the relation between $\varphi_{\tilde{\iota}(X;Y|Q_{Y})}$ and $\varphi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}$.
Second, from (A116), using the change of measure from $P_{X}P_{Y|X}$ to $P_{\bar{X}}Q_{Y}$, it holds that
$$\mu_{\tilde{\iota}(X;Y|Q_{Y})}(t)=\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\tilde{\iota}(\bar{X};Y|Q_{Y})\,\frac{\exp\left(t\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(t)}\,\frac{\mathrm{d}P_{X}P_{Y|X}}{\mathrm{d}P_{\bar{X}}Q_{Y}}\!\left(\bar{X};Y\right)\right]\tag{A125}$$
$$=\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\tilde{\iota}(\bar{X};Y|Q_{Y})\,\frac{\exp\left((t+1)\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(t)}\right].\tag{A126}$$
Then, from (A124) and (A126), it holds that
$$\mu_{\tilde{\iota}(X;Y|Q_{Y})}(t)=\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\tilde{\iota}(\bar{X};Y|Q_{Y})\,\frac{\exp\left((t+1)\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t+1)}\right].\tag{A127}$$
From (A117) and (A127), it holds that
$$\mu_{\tilde{\iota}(X;Y|Q_{Y})}(t)=\mu_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t+1).\tag{A128}$$
This concludes the relation between $\mu_{\tilde{\iota}(X;Y|Q_{Y})}$ and $\mu_{\tilde{\iota}(\bar{X};Y|Q_{Y})}$.
Third, from (A111) and (A128), it holds that
$$\tau=\theta+1.\tag{A129}$$
This concludes the relation between τ and θ .
Fourth, from (A118), using the change of measure from $P_{X}P_{Y|X}$ to $P_{\bar{X}}Q_{Y}$, it holds that
$$V_{\tilde{\iota}(X;Y|Q_{Y})}(t)=\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\left(\tilde{\iota}(\bar{X};Y|Q_{Y})-\mu_{\tilde{\iota}(X;Y|Q_{Y})}(t)\right)^{2}\frac{\exp\left(t\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(t)}\,\frac{\mathrm{d}P_{X}P_{Y|X}}{\mathrm{d}P_{\bar{X}}Q_{Y}}\!\left(\bar{X};Y\right)\right]\tag{A130}$$
$$=\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\left(\tilde{\iota}(\bar{X};Y|Q_{Y})-\mu_{\tilde{\iota}(X;Y|Q_{Y})}(t)\right)^{2}\frac{\exp\left((t+1)\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(t)}\right].\tag{A131}$$
From (A124), (A128), and (A131), it holds that
$$V_{\tilde{\iota}(X;Y|Q_{Y})}(t)=\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\left(\tilde{\iota}(\bar{X};Y|Q_{Y})-\mu_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t+1)\right)^{2}\frac{\exp\left((t+1)\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t+1)}\right].\tag{A132}$$
From (A119) and (A132), it holds that
$$V_{\tilde{\iota}(X;Y|Q_{Y})}(t)=V_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t+1).\tag{A133}$$
This concludes the relation between $V_{\tilde{\iota}(X;Y|Q_{Y})}$ and $V_{\tilde{\iota}(\bar{X};Y|Q_{Y})}$.
Fifth, from (A114), using the change of measure from $P_{X}P_{Y|X}$ to $P_{\bar{X}}Q_{Y}$, it holds that
$$\xi_{\tilde{\iota}(X;Y|Q_{Y})}(t)=c_{1}\,\frac{\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\left|\tilde{\iota}(\bar{X};Y|Q_{Y})-\mu_{\tilde{\iota}(X;Y|Q_{Y})}(t)\right|^{3}\frac{\exp\left(t\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(t)}\,\frac{\mathrm{d}P_{X}P_{Y|X}}{\mathrm{d}P_{\bar{X}}Q_{Y}}\!\left(\bar{X};Y\right)\right]}{V_{\tilde{\iota}(X;Y|Q_{Y})}(t)^{3/2}}+c_{2}\tag{A134}$$
$$=c_{1}\,\frac{\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\left|\tilde{\iota}(\bar{X};Y|Q_{Y})-\mu_{\tilde{\iota}(X;Y|Q_{Y})}(t)\right|^{3}\frac{\exp\left((t+1)\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(t)}\right]}{V_{\tilde{\iota}(X;Y|Q_{Y})}(t)^{3/2}}+c_{2}.\tag{A135}$$
From (A124), (A128), (A133), and (A135), it holds that
$$\xi_{\tilde{\iota}(X;Y|Q_{Y})}(t)=c_{1}\,\frac{\mathbb{E}_{P_{\bar{X}}Q_{Y}}\!\left[\left|\tilde{\iota}(\bar{X};Y|Q_{Y})-\mu_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t+1)\right|^{3}\frac{\exp\left((t+1)\,\tilde{\iota}(\bar{X};Y|Q_{Y})\right)}{\varphi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t+1)}\right]}{V_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t+1)^{3/2}}+c_{2}.\tag{A136}$$
From (A115) and (A136), it holds that
$$\xi_{\tilde{\iota}(X;Y|Q_{Y})}(t)=\xi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t+1).\tag{A137}$$
This concludes the relation between $\xi_{\tilde{\iota}(X;Y|Q_{Y})}$ and $\xi_{\tilde{\iota}(\bar{X};Y|Q_{Y})}$.
Sixth, plugging (A124), (A128), and (A133) into (A121), for all $t\in\mathbb{R}$, it holds that
$$\zeta_{\tilde{\iota}(\bar{X};Y|Q_{Y})}(t,a,n)=\mathbb{1}_{\{t>0\}}+(-1)^{\mathbb{1}_{\{t>0\}}}\exp\!\left(\frac{1}{2}nt^{2}V_{\tilde{\iota}(X;Y|Q_{Y})}(t-1)+n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(t-1)-ta\right)Q\!\left(|t|\sqrt{nV_{\tilde{\iota}(X;Y|Q_{Y})}(t-1)}\right).\tag{A138}$$
Then, from (95) and (A138), it holds that
$$\zeta_{\tilde{\iota}(\bar{X};Y|Q_{Y})}\!\left(t,\ln\gamma,n\right)=1-\tilde{\beta}_{2}(n,\gamma,t-1,P_{X},Q_{Y}).\tag{A139}$$
Then, plugging (A124), (A128), (A129), (A133), (A137), and (A139) into the right-hand side of (A109), it holds that
$$1-F_{V_{n}}\!\left(\ln\gamma\right)\leq\tilde{\beta}_{2}(n,\gamma,\theta,P_{X},Q_{Y})+\exp\!\left(n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)-(\theta+1)\ln\gamma\right)\min\!\left(1,\frac{2\xi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)}{\sqrt{n}}\right)\tag{A140}$$
$$\leq\tilde{\beta}_{2}(n,\gamma,\theta,P_{X},Q_{Y})+\exp\!\left(n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)-(\theta+1)\ln\gamma\right)\frac{2\xi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)}{\sqrt{n}}.\tag{A141}$$
Alternatively, plugging (A124), (A128), (A129), (A133), (A137), and (A139) into the right-hand side of (A110), it holds that
$$1-F_{V_{n}}\!\left(\ln\gamma\right)\geq\tilde{\beta}_{2}(n,\gamma,\theta,P_{X},Q_{Y})-\exp\!\left(n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)-(\theta+1)\ln\gamma\right)\min\!\left(1,\frac{2\xi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)}{\sqrt{n}}\right)\tag{A142}$$
$$\geq\tilde{\beta}_{2}(n,\gamma,\theta,P_{X},Q_{Y})-\exp\!\left(n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)-(\theta+1)\ln\gamma\right)\frac{2\xi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)}{\sqrt{n}}\tag{A143}$$
$$=\tilde{G}_{2}(n,\gamma,\theta,P_{X},Q_{Y}),\tag{A144}$$
where the equality in (A144) follows from (97). Observing that $1-F_{V_{n}}$ is nonnegative, it follows from (A143) that
$$1-F_{V_{n}}\!\left(\ln\gamma\right)\geq\max\!\left(0,\tilde{G}_{2}(n,\gamma,\theta,P_{X},Q_{Y})\right).\tag{A145}$$
Seventh, from (94) and (A120), it holds that
$$\zeta_{\tilde{\iota}(X;Y|Q_{Y})}\!\left(t,\ln\gamma,n\right)=\tilde{\beta}_{1}(n,\gamma,t,P_{X},Q_{Y}).\tag{A146}$$
Then, plugging (A124), (A128), (A129), (A133), (A137), and (A146) into the right-hand side of (A107), it holds that
$$F_{W_{n}}\!\left(\ln\gamma\right)\leq\tilde{\beta}_{1}(n,\gamma,\theta,P_{X},Q_{Y})+\exp\!\left(n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)-\theta\ln\gamma\right)\min\!\left(1,\frac{2\xi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)}{\sqrt{n}}\right)\tag{A147}$$
$$\leq\tilde{\beta}_{1}(n,\gamma,\theta,P_{X},Q_{Y})+\exp\!\left(n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)-\theta\ln\gamma\right)\frac{2\xi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)}{\sqrt{n}}.\tag{A148}$$
Alternatively, plugging (A124), (A128), (A129), (A133), (A137), and (A146) into the right-hand side of (A108), it holds that
$$F_{W_{n}}\!\left(\ln\gamma\right)\geq\tilde{\beta}_{1}(n,\gamma,\theta,P_{X},Q_{Y})-\exp\!\left(n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)-\theta\ln\gamma\right)\min\!\left(1,\frac{2\xi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)}{\sqrt{n}}\right)\tag{A149}$$
$$\geq\tilde{\beta}_{1}(n,\gamma,\theta,P_{X},Q_{Y})-\exp\!\left(n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)-\theta\ln\gamma\right)\frac{2\xi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)}{\sqrt{n}}\tag{A150}$$
$$=\tilde{G}_{1}(n,\gamma,\theta,P_{X},Q_{Y}),\tag{A151}$$
where the equality in (A151) follows from (96). Observing that $F_{W_{n}}$ is nonnegative, it follows from (A150) that
$$F_{W_{n}}\!\left(\ln\gamma\right)\geq\max\!\left(0,\tilde{G}_{1}(n,\gamma,\theta,P_{X},Q_{Y})\right).\tag{A152}$$
Finally, plugging (A141) and (A148) in (A106), it holds that
$$C(n,M,P_{X},Q_{Y},\gamma)\leq\tilde{\beta}_{1}(n,\gamma,\theta,P_{X},Q_{Y})+\gamma\,\tilde{\beta}_{2}(n,\gamma,\theta,P_{X},Q_{Y})+\exp\!\left(n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)-\theta\ln\gamma\right)\frac{4\xi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)}{\sqrt{n}}-\frac{\gamma}{M}\tag{A153}$$
$$=\tilde{\beta}(n,\gamma,\theta,P_{X},Q_{Y},M)+\exp\!\left(n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)-\theta\ln\gamma\right)\frac{4\xi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)}{\sqrt{n}},\tag{A154}$$
where the equality in (A154) follows from (100). Observing that $C(n,M,P_{X},Q_{Y},\gamma)+\frac{\gamma}{M}\leq1$, from (A153), it holds that
$$C(n,M,P_{X},Q_{Y},\gamma)\leq\min\!\left(1,\tilde{\beta}(n,\gamma,\theta,P_{X},Q_{Y},M)+\exp\!\left(n\ln\varphi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)-\theta\ln\gamma\right)\frac{4\xi_{\tilde{\iota}(X;Y|Q_{Y})}(\theta)}{\sqrt{n}}\right)\tag{A155}$$
$$=\tilde{S}(n,\gamma,\theta,P_{X},Q_{Y},M),\tag{A156}$$
where (A156) follows from (99).
Alternatively, plugging (A145) and (A152) in (A106), it holds that
$$C(n,M,P_{X},Q_{Y},\gamma)\geq\max\!\left(0,\tilde{G}_{1}(n,\gamma,\theta,P_{X},Q_{Y})\right)+\gamma\max\!\left(0,\tilde{G}_{2}(n,\gamma,\theta,P_{X},Q_{Y})\right)-\frac{\gamma}{M}\tag{A157}$$
$$=\tilde{G}(n,\gamma,\theta,P_{X},Q_{Y},M),\tag{A158}$$
where the equality in (A158) follows from (98). Combining (A156) and (A158) concludes the proof.

Figure 1. Sum of 100 Bernoulli random variables with parameter $p=0.2$. The function $F_{X_{100}}(a)$ (asterisk markers) in Example 1; the function $F_{Z_{100}}(a)$ (star markers) in (25); the function $\hat{F}_{X_{100}}(a)$ (diamond markers) in (12); the function $\overline{\Sigma}(a,100)$ (circle markers) in (26); the function $\underline{\Sigma}(a,100)$ (square markers) in (27); the function $\overline{\Omega}(a,100)$ (upward-pointing triangle markers) in (41); and the function $\underline{\Omega}(a,100)$ (downward-pointing triangle markers) in (42) are plotted as functions of $a$, with $a\in[5,35]$.
Figure 2. Sum of 50 chi-squared random variables with parameter $k=1$. The function $F_{X_{50}}(a)$ (asterisk markers) in Example 2; the function $F_{Z_{50}}(a)$ (star markers) in (25); the function $\hat{F}_{X_{50}}(a)$ (diamond markers) in (12); the function $\overline{\Sigma}(a,50)$ (circle markers) in (26); the function $\underline{\Sigma}(a,50)$ (square markers) in (27); the function $\overline{\Omega}(a,50)$ (upward-pointing triangle markers) in (41); and the function $\underline{\Omega}(a,50)$ (downward-pointing triangle markers) in (42) are plotted as functions of $a$, with $a\in[0,100]$.
Figure 3. Normal and saddlepoint approximations to the functions T (Figure 3a) in (51) and C (Figure 3b) in (76) as functions of the blocklength n for the case of a BSC with cross-over probability δ = 0.11. The information rate is R = 0.32 and R = 0.42 bits per channel use for Figure 3a,b, respectively. The channel input distribution P X is chosen to be the uniform distribution, the output distribution Q Y is chosen to be the channel output distribution P Y, and the parameter γ is chosen to maximize C in (76). The parameter θ is chosen to be the unique solution in t to (74) for Figure 3a and to (102) for Figure 3b.
Figure 4. Normal and saddlepoint approximations to the functions T (Figure 4a) in (51) and C (Figure 4b) in (76) as functions of the blocklength n for the case of a real-valued AWGN channel with discrete channel inputs, X = { −1, 1 }, signal-to-noise ratio SNR = 1, and information rate R = 0.425 bits per channel use. The channel input distribution P X is chosen to be the uniform distribution, the output distribution Q Y is chosen to be the channel output distribution P Y, and the parameter γ is chosen to maximize C in (76). The parameter θ is chosen to be the unique solution in t to (74) for Figure 4a and to (102) for Figure 4b.
Figure 5. Normal and saddlepoint approximations to the functions T (Figure 5a) in (51) and C (Figure 5b) in (76) as functions of the blocklength n for the case of a real-valued symmetric α-stable noise channel with discrete channel inputs X = { −1, 1 }, shape parameter α = 1.4, dispersion parameter σ = 0.6, and information rate R = 0.38 bits per channel use. The channel input distribution P X is chosen to be the uniform distribution, the output distribution Q Y is chosen to be the channel output distribution P Y, and the parameter γ is chosen to maximize C in (76). The parameter θ is chosen to be the unique solution in t to (74) for Figure 5a and to (102) for Figure 5b.
