Article

A Class of New Metrics Based on Triangular Discrimination

1 Department of Risk Management and Insurance, Nankai University, No. 94 Weijin Road, Tianjin 300071, China
2 School of Statistics and Mathematics, Zhongnan University of Economics and Law, No. 182 Nanhu Avenue, Wuhan 430073, China
* Author to whom correspondence should be addressed.
Submission received: 2 June 2015 / Revised: 8 July 2015 / Accepted: 14 July 2015 / Published: 17 July 2015
(This article belongs to the Section Information Theory and Methodology)

Abstract:
In information theory, statistics and other application areas, information-theoretic divergences are widely used. To meet the requirement of metric properties, we introduce a class of new metrics based on the triangular discrimination which are bounded. Moreover, we obtain some sharp inequalities relating the triangular discrimination to other information-theoretic divergences. Their asymptotic approximation properties are also discussed.

1. Introduction

In many applications such as pattern recognition, machine learning, statistics, optimization and other applied branches of mathematics, it is beneficial to use information-theoretic divergences rather than the squared Euclidean distance to estimate the (dis)similarity of two probability distributions or positive arrays [1,2,3,4,5,6,7,8,9]. Among them the Kullback–Leibler divergence (relative entropy), triangular discrimination, variation distance, Hellinger distance, Jensen–Shannon divergence, symmetric Chi-square divergence, J-divergence and other important measures often play a critical role. Unfortunately, most of these divergences satisfy neither the metric properties nor boundedness [10]. As we know, metric properties are the preconditions for numerous convergence properties of iterative algorithms [11]. Moreover, boundedness is a major concern in numerical computation and simulation. In [12], Endres and Schindelin proved that the square root of twice the Jensen–Shannon divergence is a metric. The triangular discrimination presented by Topsøe in [13] is a non-logarithmic measure and is computationally simple. Inspired by [12], we study the triangular discrimination. The main result of this paper is a class of new metrics derived from the triangular discrimination. Finally, some new relationships among the triangular discrimination, the Jensen–Shannon divergence, the square of the Hellinger distance and the variation distance are also obtained.

2. Definition and Auxiliary Results

Definition 1. Let
$$\Gamma_n = \left\{ P = (p_1, p_2, \ldots, p_n) \;\middle|\; p_i \geq 0,\ \sum_{i=1}^{n} p_i = 1 \right\}, \qquad n \geq 2,$$
be the set of all complete finite discrete probability distributions. For all $P, Q \in \Gamma_n$, the triangular discrimination is defined by
$$\Delta(P,Q) = \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{p_i + q_i}.$$
In the above definition, we use the convention $\frac{0}{0} = 0$, which is justified by a limiting argument.
The triangular discrimination is obviously symmetric, nonnegative and vanishes for $P = Q$, but it does not fulfill the triangle inequality. In view of the foregoing, the concept of triangular discrimination should be generalized. For $P, Q \in \Gamma_n$, we study the function $\Delta_\alpha(P,Q)$:
$$\Delta_\alpha(P,Q) = \left( \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{p_i + q_i} \right)^{\alpha}, \tag{2}$$
where $\alpha \in (0, +\infty)$.
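For readers who want to experiment with these quantities, the following is a minimal numerical sketch (assuming Python with NumPy; the helper names `triangular_discrimination` and `delta_alpha` are ours, not from the paper) that evaluates $\Delta(P,Q)$ and $\Delta_\alpha(P,Q) = (\Delta(P,Q))^{\alpha}$ on two discrete distributions, using the convention $0/0 = 0$.

```python
import numpy as np

def triangular_discrimination(p, q):
    """Delta(P, Q) = sum_i (p_i - q_i)^2 / (p_i + q_i), reading 0/0 as 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    s = p + q
    safe = np.where(s > 0, s, 1.0)                    # avoid division by zero
    return float(np.sum(np.where(s > 0, (p - q) ** 2 / safe, 0.0)))

def delta_alpha(p, q, alpha=0.5):
    """Delta_alpha(P, Q) = Delta(P, Q) ** alpha (a metric for 0 < alpha <= 1/2)."""
    return triangular_discrimination(p, q) ** alpha

P = [0.2, 0.3, 0.5]
Q = [0.4, 0.4, 0.2]
print(triangular_discrimination(P, Q))   # Delta(P, Q)
print(delta_alpha(P, Q, alpha=0.5))      # the new metric with alpha = 1/2
```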
In the following, the $\alpha$ power of the summand in $\Delta(P,Q)$ is discussed for all $\alpha \in (0,+\infty)$.
Definition 2. Let the function $L(p,q): [0,+\infty) \times [0,+\infty) \to [0,+\infty)$ be defined by
$$L(p,q) = \frac{(p-q)^2}{p+q}.$$
It is easy to see that $L(p,q) \geq 0$ and $L(p,q) = L(q,p)$. For every $\alpha \in (0,+\infty)$, we consider below whether $(L(p,q))^{\alpha}$ satisfies the triangle inequality.
Lemma 1. If the function $g: [0,a) \cup (a,+\infty) \to (-\infty,+\infty)$ is defined by
$$g(x) = \frac{(x-a)(x+3a)}{(x+a)^2 \sqrt{L(a,x)}}$$
with $a > 0$, then
$$\lim_{x \to a^+} g(x) = \sqrt{\frac{2}{a}}, \qquad \lim_{x \to a^-} g(x) = -\sqrt{\frac{2}{a}}.$$
Proof. As
$$g(x) = \begin{cases} \dfrac{x+3a}{(x+a)^{3/2}}, & x > a, \\[2mm] -\dfrac{x+3a}{(x+a)^{3/2}}, & 0 \leq x < a, \end{cases}$$
we can get
$$\lim_{x \to a^+} g(x) = \frac{a+3a}{(a+a)^{3/2}} = \sqrt{\frac{2}{a}}, \qquad \lim_{x \to a^-} g(x) = -\frac{a+3a}{(a+a)^{3/2}} = -\sqrt{\frac{2}{a}}.$$
Lemma 2. If the function $h: [0,+\infty) \to (0,+\infty)$ is defined by $h(x) = \dfrac{3x+a}{(x+a)^{3/2}}$ with $a > 0$, then $h$ is monotonically increasing on $[0,a)$ and monotonically decreasing on $(a,+\infty)$.
Proof. Straightforward differentiation shows
$$h'(x) = \frac{3(a-x)}{2(x+a)^{5/2}},$$
so $h'(x) > 0$ on $[0,a)$ and $h'(x) < 0$ on $(a,+\infty)$. Thus the lemma holds.
Assuming $0 < p < q$, we introduce the function $R_{pq}: [0,+\infty) \to [0,+\infty)$ defined by
$$R_{pq}(r) = \sqrt{L(p,r)} + \sqrt{L(q,r)}.$$
Lemma 3. The function $R_{pq}(r)$ has two minima, one at $r = p$ and the other at $r = q$.
Proof. The derivative of the function $R_{pq}(r)$ is
$$R'_{pq}(r) = \frac{1}{2}\left[ \frac{(r-p)(r+3p)}{(r+p)^2 \sqrt{L(p,r)}} + \frac{(r-q)(r+3q)}{(r+q)^2 \sqrt{L(q,r)}} \right]. \tag{4}$$
So $R'_{pq}(r) < 0$ for $r \in [0,p)$ and $R'_{pq}(r) > 0$ for $r \in (q,+\infty)$. It shows $R_{pq}(r)$ is monotonically decreasing on $[0,p)$ and monotonically increasing on $[q,+\infty)$.
Next consider the monotonicity of $R_{pq}(r)$ in the open interval $(p,q)$.
From Lemma 1, we have
$$\lim_{r \to p^+} \frac{(r-p)(r+3p)}{(r+p)^2 \sqrt{L(p,r)}} = \sqrt{\frac{2}{p}}, \qquad \lim_{r \to q^-} \frac{(r-q)(r+3q)}{(r+q)^2 \sqrt{L(q,r)}} = -\sqrt{\frac{2}{q}}. \tag{5}$$
From Lemma 2, we have
$$\frac{(p-q)(p+3q)}{(p+q)^2 \sqrt{L(p,q)}} = -\frac{p+3q}{(p+q)^{3/2}} > -\frac{p+3p}{(p+p)^{3/2}} = -\sqrt{\frac{2}{p}}, \qquad \frac{(q-p)(q+3p)}{(p+q)^2 \sqrt{L(p,q)}} = \frac{q+3p}{(p+q)^{3/2}} < \frac{q+3q}{(q+q)^{3/2}} = \sqrt{\frac{2}{q}}. \tag{6}$$
Using (5) and (6),
$$\lim_{r \to p^+} R'_{pq}(r) = \frac{1}{2}\left[ \lim_{r \to p^+} \frac{(r-p)(r+3p)}{(r+p)^2 \sqrt{L(p,r)}} + \frac{(p-q)(p+3q)}{(p+q)^2 \sqrt{L(p,q)}} \right] = \frac{1}{2}\left[ \sqrt{\frac{2}{p}} - \frac{p+3q}{(p+q)^{3/2}} \right] > 0,$$
$$\lim_{r \to q^-} R'_{pq}(r) = \frac{1}{2}\left[ \frac{(q-p)(q+3p)}{(q+p)^2 \sqrt{L(p,q)}} + \lim_{r \to q^-} \frac{(r-q)(r+3q)}{(r+q)^2 \sqrt{L(q,r)}} \right] = \frac{1}{2}\left[ \frac{q+3p}{(p+q)^{3/2}} - \sqrt{\frac{2}{q}} \right] < 0.$$
Let
$$A(y,r) = \frac{(r-y)(r+3y)}{(r+y)^2 \sqrt{L(y,r)}} = \frac{(r-y)(r+3y)}{(r+y)^2 \sqrt{r}\,\sqrt{L\!\left(\frac{y}{r},\,1\right)}} = \frac{1}{\sqrt{r}}\, B(y,r), \qquad y > 0,$$
then
$$\frac{\partial B(y,r)}{\partial r} = -\frac{3y\,|r-y|}{2\sqrt{r(r+y)}\,(r+y)^{2}} \leq 0.$$
The equality holds if and only if $r = y$. So, with respect to the variable $r$ in the open interval $(p,q)$, $B(p,r)$ and $B(q,r)$ are both monotonically decreasing, and hence $B(p,r) + B(q,r)$ is also monotonically decreasing. Using (4),
$$R'_{pq}(r) = \frac{1}{2}\left[ A(p,r) + A(q,r) \right] = \frac{1}{2\sqrt{r}}\left[ B(p,r) + B(q,r) \right],$$
this shows $\lim_{r \to p^+} [B(p,r) + B(q,r)] > 0$ and $\lim_{r \to q^-} [B(p,r) + B(q,r)] < 0$. So $B(p,r) + B(q,r)$ has exactly one zero in the open interval $(p,q)$ with respect to the variable $r$. As a consequence, $R'_{pq}(r)$ has exactly one zero $x_0$ in the open interval $(p,q)$. This means $R'_{pq}(r) > 0$ on $(p, x_0)$ and $R'_{pq}(r) < 0$ on $(x_0, q)$. From the above we know $R_{pq}(r)$ has exactly one maximum and no minimum in the open interval $(p,q)$.
As a result, the conclusion in the lemma is obtained.   ☐
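The shape described in Lemma 3 is easy to check numerically. Below is a small sketch (our own illustration, assuming NumPy; not part of the proof) that evaluates $R_{pq}(r)$ on a grid for one choice of $0 < p < q$ and confirms that the smallest values occur at $r = p$ and $r = q$, both equal to $\sqrt{L(p,q)}$.

```python
import numpy as np

def L(a, b):
    return 0.0 if a + b == 0 else (a - b) ** 2 / (a + b)

def R(p, q, r):
    return np.sqrt(L(p, r)) + np.sqrt(L(q, r))

p, q = 0.2, 0.7
rs = np.linspace(0.0, 1.5, 15001)
vals = np.array([R(p, q, r) for r in rs])

print(np.sqrt(L(p, q)))        # common minimum value sqrt(L(p, q))
print(vals.min())              # grid minimum, approximately the same
print(R(p, q, p), R(p, q, q))  # values at r = p and r = q, both equal sqrt(L(p, q))
```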
Theorem 1. Let $p, q, r \in [0,+\infty)$; then
$$(L(p,q))^{\frac{1}{2}} \leq (L(p,r))^{\frac{1}{2}} + (L(q,r))^{\frac{1}{2}}. \tag{7}$$
Proof. If p = q , then L ( p , q ) = 0 . The triangle inequality (7) obviously holds.
If $p \neq q$ and one of $p, q$ is equal to 0, it is easy to obtain that (7) holds.
Next we assume $0 < p < q$ without loss of generality. Note that the following formula is valid:
$$(L(p,q))^{\frac{1}{2}} = \lim_{r \to p} \left[ (L(p,r))^{\frac{1}{2}} + (L(q,r))^{\frac{1}{2}} \right] = \lim_{r \to q} \left[ (L(p,r))^{\frac{1}{2}} + (L(q,r))^{\frac{1}{2}} \right].$$
From Lemma 3, $R_{pq}(r) = (L(p,r))^{\frac{1}{2}} + (L(q,r))^{\frac{1}{2}}$ attains its minimum value $(L(p,q))^{\frac{1}{2}}$ at $r = p$ and $r = q$, so the triangle inequality (7) holds for every $r \in [0,+\infty)$.   ☐
Corollary 1. Let $p, q, r \in [0,+\infty)$. If $0 < \alpha < \frac{1}{2}$, then
$$(L(p,q))^{\alpha} \leq (L(p,r))^{\alpha} + (L(q,r))^{\alpha}. \tag{8}$$
Proof. Let $a, b > 0$ and $0 < \gamma < 1$; then $a^{\gamma} + b^{\gamma} \geq (a+b)^{\gamma}$, which follows from the concavity of $x^{\gamma}$. Choose $\gamma = 2\alpha$, so that $0 < \gamma < 1$ and $\alpha = \frac{\gamma}{2}$. Thus, from Theorem 1,
$$(L(p,r))^{\alpha} + (L(q,r))^{\alpha} = \left( (L(p,r))^{\frac{1}{2}} \right)^{\gamma} + \left( (L(q,r))^{\frac{1}{2}} \right)^{\gamma} \geq \left( (L(p,r))^{\frac{1}{2}} + (L(q,r))^{\frac{1}{2}} \right)^{\gamma} \geq \left( (L(p,q))^{\frac{1}{2}} \right)^{\gamma} = (L(p,q))^{\alpha}.$$
This is the triangle inequality (8) for the function ( L ( p , q ) ) α .   ☐
Theorem 2. Let $p, q, r \in [0,+\infty)$. If $\alpha > \frac{1}{2}$, then the triangle inequality (8) does not hold in general.
Proof. Assuming $0 < p < q$, let $l(r) = (L(p,r))^{\alpha} + (L(q,r))^{\alpha}$. Firstly, the following formula is valid:
$$(L(p,q))^{\alpha} = \lim_{r \to p} \left[ (L(p,r))^{\alpha} + (L(q,r))^{\alpha} \right] = \lim_{r \to q} \left[ (L(p,r))^{\alpha} + (L(q,r))^{\alpha} \right].$$
The derivative of the function l is
$$l'(r) = \alpha \left[ \frac{(r-p)(3p+r)}{(p+r)^2} \bigl( L(p,r) \bigr)^{\alpha-1} + \frac{(r-q)(3q+r)}{(q+r)^2} \bigl( L(q,r) \bigr)^{\alpha-1} \right].$$
When $r \in (p,q)$ and $\frac{1}{2} < \alpha < 1$ (for $\alpha \geq 1$ the first term in $l'(r)$ clearly tends to 0 as $r \to p^+$), let
$$m(r) = \left[ \frac{(r-p)(3p+r)}{(p+r)^2} \bigl( L(p,r) \bigr)^{\alpha-1} \right]^{\frac{1}{1-\alpha}}.$$
Using l'Hôpital's rule,
$$\lim_{r \to p^+} m(r) = \lim_{r \to p^+} \frac{8p^2}{(1-\alpha)(p+r)^3} \left[ \frac{(r-p)(3p+r)}{(p+r)^2} \right]^{\frac{2\alpha-1}{1-\alpha}} = 0.$$
So
$$\lim_{r \to p^+} l'(r) = \alpha\, \frac{(p-q)(3q+p)}{(q+p)^2} \bigl( L(p,q) \bigr)^{\alpha-1} < 0.$$
According to the definition of the derivative, there exists a $\delta > 0$ such that for any $s \in (p, p+\delta)$,
$$(L(p,q))^{\alpha} = \lim_{r \to p^+} \left[ (L(p,r))^{\alpha} + (L(q,r))^{\alpha} \right] > (L(p,s))^{\alpha} + (L(q,s))^{\alpha}.$$
This shows the triangle inequality (8) does not hold.   ☐
To sum up the theorems and corollary above, we can obtain the main theorem:
Theorem 3. The function $(L(p,q))^{\alpha}$ satisfies the triangle inequality (8) if and only if $0 < \alpha \leq \frac{1}{2}$.
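Theorem 3 can be probed by brute force. The sketch below (ours, assuming NumPy) samples random triples $(p,q,r)$ and counts violations of inequality (8): none should appear for $\alpha = 1/2$, while violations are found easily for $\alpha = 3/4$, in line with Theorem 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def count_violations(alpha, trials=200000):
    """Count triples (p, q, r) in (0, 1)^3 with L(p,q)^a > L(p,r)^a + L(q,r)^a."""
    p, q, r = rng.uniform(0.0, 1.0, size=(3, trials))
    L = lambda a, b: (a - b) ** 2 / (a + b)
    lhs = L(p, q) ** alpha
    rhs = L(p, r) ** alpha + L(q, r) ** alpha
    return int(np.sum(lhs > rhs + 1e-12))

print(count_violations(0.5))    # expected: 0   (triangle inequality holds)
print(count_violations(0.75))   # expected: > 0 (violations occur for alpha > 1/2)
```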

3. Metric Properties of $\Delta_\alpha(P,Q)$

In this section, we mainly prove the following theorem:
Theorem 4. The function $\Delta_\alpha(P,Q)$ is a metric on the space $\Gamma_n$ if and only if $0 < \alpha \leq \frac{1}{2}$.
Proof. From (2) we can get $\Delta_\alpha(P,Q) = \left( \sum_{i=1}^{n} L(p_i,q_i) \right)^{\alpha}$. It is easy to see that $\Delta_\alpha(P,Q) \geq 0$ with equality only for $P = Q$, and that $\Delta_\alpha(P,Q) = \Delta_\alpha(Q,P)$. So the remaining question is whether the triangle inequality
$$\Delta_\alpha(P,Q) \leq \Delta_\alpha(P,R) + \Delta_\alpha(Q,R) \tag{9}$$
holds for any $P, Q, R \in \Gamma_n$.
When $P = Q$, $\Delta_\alpha(P,Q) = 0$ and the triangle inequality (9) holds trivially. So we assume $P \neq Q$ in the following.
Next we consider the value of α in two cases respectively:
(i) $0 < \alpha \leq \frac{1}{2}$:
From Theorem 3, the inequality $(L(p_i,q_i))^{\alpha} \leq (L(p_i,r_i))^{\alpha} + (L(q_i,r_i))^{\alpha}$ holds for every $i$. Applying Minkowski's inequality we have
$$\left( \sum_{i=1}^{n} L(p_i,q_i) \right)^{\alpha} = \left[ \sum_{i=1}^{n} \left( (L(p_i,q_i))^{\alpha} \right)^{\frac{1}{\alpha}} \right]^{\alpha} \leq \left[ \sum_{i=1}^{n} \left( (L(p_i,r_i))^{\alpha} + (L(q_i,r_i))^{\alpha} \right)^{\frac{1}{\alpha}} \right]^{\alpha} \leq \left[ \sum_{i=1}^{n} \left( (L(p_i,r_i))^{\alpha} \right)^{\frac{1}{\alpha}} \right]^{\alpha} + \left[ \sum_{i=1}^{n} \left( (L(q_i,r_i))^{\alpha} \right)^{\frac{1}{\alpha}} \right]^{\alpha} = \left( \sum_{i=1}^{n} L(p_i,r_i) \right)^{\alpha} + \left( \sum_{i=1}^{n} L(q_i,r_i) \right)^{\alpha}.$$
So the triangle inequality (9) holds.
(ii) $\alpha > \frac{1}{2}$:
Let
$$F(x_1, \ldots, x_n) = F_1(x_1, \ldots, x_n) + F_2(x_1, \ldots, x_n),$$
where
$$F_1(x_1, \ldots, x_n) = \left( \sum_{i=1}^{n} \frac{(p_i - x_i)^2}{p_i + x_i} \right)^{\alpha}, \qquad F_2(x_1, \ldots, x_n) = \left( \sum_{i=1}^{n} \frac{(q_i - x_i)^2}{q_i + x_i} \right)^{\alpha}.$$
Then $F(p_1, \ldots, p_n) = F(q_1, \ldots, q_n) = \Delta_\alpha(P,Q)$.
Next we prove that $(p_1, \ldots, p_n)$ and $(q_1, \ldots, q_n)$ are not extreme points of the function $F(x_1, \ldots, x_n)$. By symmetry we only need to prove that $(p_1, \ldots, p_n)$ is not an extreme point.
By partial derivative,
$$\left.\frac{\partial F}{\partial x_i}\right|_{(x_1,\ldots,x_n)=(p_1,\ldots,p_n)} = \left.\frac{\partial F_1}{\partial x_i}\right|_{(x_1,\ldots,x_n)=(p_1,\ldots,p_n)} + \left.\frac{\partial F_2}{\partial x_i}\right|_{(x_1,\ldots,x_n)=(p_1,\ldots,p_n)}. \tag{10}$$
Since $P \neq Q$, we might as well assume $p_1 \neq q_1$ and $p_1 > 0$.
$$\left.\frac{\partial F_2}{\partial x_1}\right|_{(x_1,\ldots,x_n)=(p_1,\ldots,p_n)} = \alpha\, \frac{(p_1-q_1)(p_1+3q_1)}{(p_1+q_1)^2} \left( \sum_{i=1}^{n} \frac{(p_i-q_i)^2}{p_i+q_i} \right)^{\alpha-1} \neq 0. \tag{11}$$
$$\left.\frac{\partial F_1}{\partial x_1}\right|_{(x_1,\ldots,x_n)=(p_1,\ldots,p_n)} = \lim_{\Delta x_1 \to 0} \frac{1}{\Delta x_1}\Bigl[ F_1(p_1+\Delta x_1, p_2, \ldots, p_n) - F_1(p_1, \ldots, p_n) \Bigr] = \lim_{\Delta x_1 \to 0} \frac{1}{\Delta x_1} \left( \frac{(\Delta x_1)^2}{2p_1 + \Delta x_1} \right)^{\alpha} = \lim_{\Delta x_1 \to 0} \frac{(\Delta x_1)^{2\alpha-1}}{(2p_1 + \Delta x_1)^{\alpha}} = 0. \tag{12}$$
Then, since $2\alpha - 1 > 0$, the limit in (12) is indeed 0; substituting (11) and (12) into (10), we have
$$\left.\frac{\partial F}{\partial x_1}\right|_{(x_1,\ldots,x_n)=(p_1,\ldots,p_n)} \neq 0.$$
Therefore, $(p_1, \ldots, p_n)$ is not an extreme point of the function $F(x_1, \ldots, x_n)$. For the same reason, $(q_1, \ldots, q_n)$ is not an extreme point either.
Using the definition of an extreme point, there exists a point $R = (r_1, \ldots, r_n)$ such that $F(r_1, \ldots, r_n) < F(p_1, \ldots, p_n) = \Delta_\alpha(P,Q)$. As $F_1(r_1, \ldots, r_n) = \Delta_\alpha(P,R)$ and $F_2(r_1, \ldots, r_n) = \Delta_\alpha(Q,R)$, it follows that $\Delta_\alpha(P,R) + \Delta_\alpha(Q,R) < \Delta_\alpha(P,Q)$, which contradicts the triangle inequality (9).
From what has been discussed above, the conclusion in the theorem is obtained.   ☐
The generalization of this result to continuous probability distributions is straightforward. Consider a measurable space $(X, \mathcal{A})$, and let $P$, $Q$ be probability distributions with Radon–Nikodym densities $p = \frac{dP}{d\mu}$ and $q = \frac{dQ}{d\mu}$ with respect to a dominating $\sigma$-finite measure $\mu$. Then
$$\Delta_\alpha(P,Q) = \left( \int_X \frac{(p-q)^2}{p+q}\, d\mu \right)^{\alpha}$$
is a metric if and only if $0 < \alpha \leq \frac{1}{2}$.
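As an illustration of the continuous case, $\Delta_\alpha$ can be approximated numerically. The following is a sketch under our own assumptions (two Gaussian densities with respect to Lebesgue measure, evaluated on a finite grid with a plain Riemann sum; the example is not from the paper).

```python
import numpy as np

# Two densities w.r.t. Lebesgue measure on a grid: N(0, 1) and N(1, 1).
x = np.linspace(-12.0, 12.0, 24001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)
q = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2 * np.pi)

alpha = 0.5
delta = np.sum((p - q) ** 2 / (p + q)) * dx   # integral of (p - q)^2 / (p + q)
print(delta, delta ** alpha)                  # Delta(P, Q) and the metric Delta_alpha
```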
Next we discuss the maximum and minimum of $\Delta_\alpha(P,Q)$. Obviously the minimum value $\Delta_\alpha(P,Q) = 0$ is attained if and only if $P = Q$. Since $\Delta(P,Q)$ can be rewritten in the form
$$\Delta(P,Q) = \sum_{i=1}^{n} \frac{(p_i-q_i)^2}{p_i+q_i} = \sum_{i=1}^{n} \left( p_i + q_i - \frac{4p_iq_i}{p_i+q_i} \right) = 2 - \sum_{i=1}^{n} \frac{4p_iq_i}{p_i+q_i} \leq 2,$$
$\Delta(P,Q)$ attains its maximum 2 exactly when $p_i q_i = 0$ for every $i$, in particular when $P$ and $Q$ are two distinct deterministic distributions. The metric $\Delta_\alpha(P,Q)$ therefore achieves its maximum value $2^{\alpha}$.
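A quick numerical check of both the boundedness and the metric property for $\alpha = 1/2$ (a sketch of ours, assuming NumPy; not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def delta(p, q):
    s = p + q
    safe = np.where(s > 0, s, 1.0)
    return float(np.sum(np.where(s > 0, (p - q) ** 2 / safe, 0.0)))

def d(p, q, alpha=0.5):
    return delta(p, q) ** alpha

# The maximum 2**alpha is reached by distributions with disjoint supports.
P = np.array([1.0, 0.0, 0.0])
Q = np.array([0.0, 0.3, 0.7])
print(d(P, Q), 2 ** 0.5)          # both equal sqrt(2)

# Random check of the triangle inequality (9) on Gamma_n for alpha = 1/2.
def random_dist(n):
    w = rng.random(n)
    return w / w.sum()

violations = 0
for _ in range(20000):
    A, B, C = random_dist(4), random_dist(4), random_dist(4)
    if d(A, B) > d(A, C) + d(B, C) + 1e-12:
        violations += 1
print(violations)                 # expected: 0
```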

4. Some Inequalities among the Information-Theoretic Divergences

Definition 3. For all $P, Q \in \Gamma_n$, the Jensen–Shannon divergence is defined by
$$JS(P,Q) = \frac{1}{2}\sum_{i=1}^{n} \left[ p_i \ln\frac{2p_i}{p_i+q_i} + q_i \ln\frac{2q_i}{p_i+q_i} \right].$$
The square of the Hellinger distance is defined by
$$H^2(P,Q) = \frac{1}{2}\sum_{i=1}^{n} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2.$$
The variation distance is defined by
$$V(P,Q) = \sum_{i=1}^{n} |p_i - q_i|.$$
Next we introduce Csiszár's f-divergence [14].
Definition 4. Let $f: [0,+\infty) \to (-\infty,+\infty)$ be a convex function satisfying $f(1) = 0$; the f-divergence measure introduced by Csiszár is defined as
$$C_f(P,Q) = \sum_{i=1}^{n} q_i\, f\!\left( \frac{p_i}{q_i} \right) \tag{15}$$
for all $P, Q \in \Gamma_n$.
The triangular discrimination, the Jensen–Shannon divergence, the square of the Hellinger distance and the variation distance are all f-divergences.
Example 1. (Triangular discrimination) Let us consider
$$f_\Delta(x) = \frac{(x-1)^2}{x+1}, \qquad x \in [0,+\infty),$$
in (15). Then we can verify that $f_\Delta(x)$ is convex because $f_\Delta''(x) = \frac{8}{(x+1)^3} \geq 0$; moreover, $f_\Delta(1) = 0$, $f_\Delta(x) \geq 0$ and $C_{f_\Delta}(P,Q) = \Delta(P,Q)$.
Example 2. (Jensen–Shannon divergence) Let us consider
$$f_{JS}(x) = \frac{x}{2}\ln\frac{2x}{x+1} + \frac{1}{2}\ln\frac{2}{x+1}, \qquad x \in [0,+\infty),$$
in (15). Then we can verify that $f_{JS}(x)$ is convex because $f_{JS}''(x) = \frac{1}{2x^2+2x} \geq 0$; moreover, $f_{JS}(1) = 0$ and $C_{f_{JS}}(P,Q) = JS(P,Q)$. By the standard inequality $\ln x \geq 1 - \frac{1}{x}$, we also have $f_{JS}(x) \geq \frac{x}{2}\left(1 - \frac{x+1}{2x}\right) + \frac{1}{2}\left(1 - \frac{x+1}{2}\right) = 0$.
Example 3. (Square of the Hellinger distance) Let us consider
$$f_h(x) = \frac{1}{2}\left( \sqrt{x} - 1 \right)^2, \qquad x \in [0,+\infty),$$
in (15). Then we can verify that $f_h(x)$ is convex because $f_h''(x) = \frac{1}{4x\sqrt{x}} \geq 0$; moreover, $f_h(1) = 0$, $f_h(x) \geq 0$ and $C_{f_h}(P,Q) = H^2(P,Q)$.
Example 4. (Variation distance) Let us consider
$$f_V(x) = |x - 1|, \qquad x \in [0,+\infty),$$
in (15). Then we can easily see that $f_V(x)$ is convex, $f_V(1) = 0$, $f_V(x) \geq 0$ and $C_{f_V}(P,Q) = V(P,Q)$.
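The four examples can be cross-checked numerically: computing $C_f$ from definition (15) with each generator should reproduce the direct formulas of Definition 3. The sketch below (ours, assuming NumPy and strictly positive $q_i$) does exactly that.

```python
import numpy as np

def csiszar(f, p, q):
    """C_f(P, Q) = sum_i q_i * f(p_i / q_i), assuming all q_i > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * f(p / q)))

f_tri = lambda x: (x - 1) ** 2 / (x + 1)
f_js  = lambda x: 0.5 * x * np.log(2 * x / (x + 1)) + 0.5 * np.log(2 / (x + 1))
f_hel = lambda x: 0.5 * (np.sqrt(x) - 1) ** 2
f_var = lambda x: np.abs(x - 1)

P = np.array([0.2, 0.3, 0.5])
Q = np.array([0.4, 0.4, 0.2])

# Direct definitions for comparison.
tri = np.sum((P - Q) ** 2 / (P + Q))
js  = 0.5 * np.sum(P * np.log(2 * P / (P + Q)) + Q * np.log(2 * Q / (P + Q)))
hel = 0.5 * np.sum((np.sqrt(P) - np.sqrt(Q)) ** 2)
var = np.sum(np.abs(P - Q))

for f, direct in [(f_tri, tri), (f_js, js), (f_hel, hel), (f_var, var)]:
    print(csiszar(f, P, Q), float(direct))   # each pair of numbers should agree
```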
Theorem 5. Let $f_1, f_2$ be two nonnegative generating functions, and suppose there exist real constants $k, K$ with $k < K$ such that, whenever $f_2(x) \neq 0$,
$$k \leq \frac{f_1(x)}{f_2(x)} \leq K,$$
and whenever $f_2(x) = 0$, also $f_1(x) = 0$. Then we have the inequalities
$$k\, C_{f_2}(P,Q) \leq C_{f_1}(P,Q) \leq K\, C_{f_2}(P,Q).$$
Proof. The conditions can be rewritten as $k f_2(x) \leq f_1(x) \leq K f_2(x)$. So, from formula (15),
$$C_{f_1}(P,Q) = \sum_{i=1}^{n} q_i f_1\!\left(\frac{p_i}{q_i}\right) \geq \sum_{i=1}^{n} q_i\, k\, f_2\!\left(\frac{p_i}{q_i}\right) = k \sum_{i=1}^{n} q_i f_2\!\left(\frac{p_i}{q_i}\right) = k\, C_{f_2}(P,Q)$$
and
$$C_{f_1}(P,Q) = \sum_{i=1}^{n} q_i f_1\!\left(\frac{p_i}{q_i}\right) \leq \sum_{i=1}^{n} q_i\, K\, f_2\!\left(\frac{p_i}{q_i}\right) = K \sum_{i=1}^{n} q_i f_2\!\left(\frac{p_i}{q_i}\right) = K\, C_{f_2}(P,Q).$$
We have shown that $f_\Delta, f_{JS}, f_h, f_V$ are all nonnegative. In the following we derive some inequalities among the corresponding divergences.
Theorem 6.
$$\frac{1}{4}\Delta(P,Q) \leq JS(P,Q) \leq \frac{\ln 2}{2}\Delta(P,Q).$$
Proof. When $x \neq 1$, both $f_\Delta(x)$ and $f_{JS}(x)$ are nonzero. We consider the function
$$\phi(x) = \frac{f_{JS}(x)}{f_\Delta(x)} = \frac{\frac{x}{2}\ln\frac{2x}{x+1} + \frac{1}{2}\ln\frac{2}{x+1}}{\frac{(x-1)^2}{x+1}}.$$
The derivative of the function ϕ ( x ) is
$$\phi'(x) = \frac{(1+3x)\ln x + 4(1+x)\ln\frac{2}{x+1}}{2(1-x)^3}. \tag{16}$$
Let
$$\psi(x) = (1+3x)\ln x + 4(1+x)\ln\frac{2}{x+1}.$$
Straightforward differentiation shows
$$\psi'(x) = 3\ln x + 4\ln\frac{2}{1+x} + \frac{1}{x} - 1, \qquad \psi''(x) = -\frac{(x-1)^2}{x^2(x+1)} \leq 0.$$
So $\psi(x)$ is a concave function on $(0,+\infty)$ and $\psi(1) = \psi'(1) = 0$. This means $\psi(x)$ attains its maximum 0 at the point $x = 1$. Accordingly, $\psi(x) < 0$ when $x \neq 1$. From (16), we find
$$\phi'(x) < 0 \ \ \text{for } 0 < x < 1, \qquad \phi'(x) > 0 \ \ \text{for } x > 1,$$
and
$$\lim_{x \to 0^+} \phi(x) = \frac{\frac{1}{2}\ln 2}{1} = \frac{\ln 2}{2}.$$
Using l'Hôpital's rule (differentiating twice),
$$\lim_{x \to 1} \phi(x) = \lim_{x \to 1} \frac{\frac{1}{2}\left(\frac{1}{x} - \frac{1}{x+1}\right)}{\frac{8}{(x+1)^3}} = \frac{1}{4}.$$
Using l'Hôpital's rule (differentiating once),
$$\lim_{x \to +\infty} \phi(x) = \lim_{x \to +\infty} \frac{\frac{1}{2}\ln\frac{2x}{x+1}}{\frac{(x-1)(x+3)}{(x+1)^2}} = \frac{\ln 2}{2}.$$
Thus
$$\frac{1}{4} \leq \phi(x) = \frac{f_{JS}(x)}{f_\Delta(x)} \leq \frac{\ln 2}{2}.$$
When $x = 1$, $f_\Delta(1) = f_{JS}(1) = 0$. As a consequence of Theorem 5, we obtain the result
$$\frac{1}{4}\, C_{f_\Delta}(P,Q) \leq C_{f_{JS}}(P,Q) \leq \frac{\ln 2}{2}\, C_{f_\Delta}(P,Q).$$
Thus the theorem is proved.   ☐
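The two-sided bound on $\phi(x)$ can also be checked directly on a grid (a small sketch of ours, assuming NumPy; it merely evaluates the generators and is not part of the proof).

```python
import numpy as np

x = np.linspace(1e-6, 200.0, 200001)
x = x[np.abs(x - 1.0) > 1e-6]            # exclude x = 1, where both generators vanish

f_tri = (x - 1) ** 2 / (x + 1)
f_js  = 0.5 * x * np.log(2 * x / (x + 1)) + 0.5 * np.log(2 / (x + 1))
phi = f_js / f_tri

print(phi.min(), phi.max())   # expected: within [1/4, ln(2)/2 ~ 0.3466]
```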
Theorem 7.
$$JS(P,Q) \leq H^2(P,Q) \leq \frac{1}{\ln 2}\, JS(P,Q).$$
Proof. When $x \neq 1$, both $f_h(x)$ and $f_{JS}(x)$ are nonzero. We consider the function
$$\xi(x) = \frac{f_{JS}(x)}{f_h(x)} = \frac{\frac{x}{2}\ln\frac{2x}{x+1} + \frac{1}{2}\ln\frac{2}{x+1}}{\frac{1}{2}(\sqrt{x}-1)^2}.$$
The derivative of the function $\xi(x)$ is
$$\xi'(x) = \frac{\ln\frac{2}{x+1} + \sqrt{x}\,\ln\frac{2x}{x+1}}{\sqrt{x}\,(1-\sqrt{x})^3}.$$
By the standard inequality $\ln x \geq 1 - \frac{1}{x}$,
$$\ln\frac{2}{x+1} + \sqrt{x}\,\ln\frac{2x}{x+1} \geq \left(1 - \frac{x+1}{2}\right) + \sqrt{x}\left(1 - \frac{x+1}{2x}\right) = \frac{(\sqrt{x}-1)^2(\sqrt{x}+1)}{2\sqrt{x}} > 0.$$
So
$$\xi'(x) > 0 \ \ \text{for } 0 < x < 1, \qquad \xi'(x) < 0 \ \ \text{for } x > 1,$$
and
$$\lim_{x \to 0^+} \xi(x) = \frac{\frac{1}{2}\ln 2}{\frac{1}{2}} = \ln 2.$$
Using l'Hôpital's rule (differentiating twice),
$$\lim_{x \to 1} \xi(x) = \lim_{x \to 1} \frac{\frac{1}{2}\left(\frac{1}{x} - \frac{1}{x+1}\right)}{\frac{1}{4x\sqrt{x}}} = 1.$$
Using l'Hôpital's rule (differentiating once),
$$\lim_{x \to +\infty} \xi(x) = \lim_{x \to +\infty} \frac{\frac{1}{2}\ln\frac{2x}{x+1}}{\frac{\sqrt{x}-1}{2\sqrt{x}}} = \ln 2.$$
Thus
$$\ln 2 \leq \xi(x) = \frac{f_{JS}(x)}{f_h(x)} \leq 1,$$
or
$$1 \leq \frac{1}{\xi(x)} = \frac{f_h(x)}{f_{JS}(x)} \leq \frac{1}{\ln 2}.$$
When $x = 1$, $f_h(1) = f_{JS}(1) = 0$. As a consequence of Theorem 5, we obtain the result
$$C_{f_{JS}}(P,Q) \leq C_{f_h}(P,Q) \leq \frac{1}{\ln 2}\, C_{f_{JS}}(P,Q).$$
Thus the theorem is proved.   ☐
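Analogously, the ratio $\xi(x)$ can be checked to stay between $\ln 2$ and $1$ (again a sketch of ours, assuming NumPy).

```python
import numpy as np

x = np.linspace(1e-6, 200.0, 200001)
x = x[np.abs(x - 1.0) > 1e-6]            # exclude x = 1, where both generators vanish

f_js  = 0.5 * x * np.log(2 * x / (x + 1)) + 0.5 * np.log(2 / (x + 1))
f_hel = 0.5 * (np.sqrt(x) - 1) ** 2
xi = f_js / f_hel

print(xi.min(), xi.max())     # expected: within [ln(2) ~ 0.6931, 1]
```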
Theorem 8.
$$\frac{1}{2}V^2(P,Q) \leq \Delta(P,Q) \leq V(P,Q).$$
Proof. When $x \neq 1$, both $f_\Delta(x)$ and $f_V(x)$ are nonzero. We consider the function
$$\frac{f_\Delta(x)}{f_V(x)} = \frac{\frac{(x-1)^2}{x+1}}{|x-1|} = \frac{|x-1|}{x+1} \leq 1.$$
When $x = 1$, $f_\Delta(1) = f_V(1) = 0$. As a consequence of Theorem 5, we obtain $C_{f_\Delta}(P,Q) \leq C_{f_V}(P,Q)$, which means $\Delta(P,Q) \leq V(P,Q)$. Next, by the Cauchy–Schwarz inequality,
$$\frac{1}{2}V^2(P,Q) = \frac{1}{2}\left( \sum_{i=1}^{n} |p_i - q_i| \right)^2 \leq \frac{1}{2}\left( \sum_{i=1}^{n} (p_i + q_i) \right)\left( \sum_{i=1}^{n} \frac{(p_i-q_i)^2}{p_i+q_i} \right) = \frac{1}{2}\cdot 2 \cdot \sum_{i=1}^{n} \frac{(p_i-q_i)^2}{p_i+q_i} = \Delta(P,Q).$$
Thus the theorem is proved.   ☐
From the above theorems, inequalities among these measures are given by
$$\frac{1}{8}V^2(P,Q) \leq \frac{1}{4}\Delta(P,Q) \leq JS(P,Q) \leq H^2(P,Q) \leq \frac{1}{\ln 2}\,JS(P,Q) \leq \frac{1}{2}\Delta(P,Q) \leq \frac{1}{2}V(P,Q).$$
These inequalities are sharper than the inequalities in [13] Theorem 2 and [15] (Section 3.1).
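The whole chain can be sanity-checked on random distributions. The following sketch (ours, assuming NumPy; interior distributions only, to avoid 0·ln 0 issues) verifies the displayed inequalities numerically.

```python
import numpy as np

rng = np.random.default_rng(2)

def measures(P, Q):
    tri = np.sum((P - Q) ** 2 / (P + Q))
    js  = 0.5 * np.sum(P * np.log(2 * P / (P + Q)) + Q * np.log(2 * Q / (P + Q)))
    hel = 0.5 * np.sum((np.sqrt(P) - np.sqrt(Q)) ** 2)
    var = np.sum(np.abs(P - Q))
    return tri, js, hel, var

def random_dist(n):
    w = rng.random(n) + 1e-6
    return w / w.sum()

ok = True
for _ in range(20000):
    P, Q = random_dist(5), random_dist(5)
    tri, js, hel, var = measures(P, Q)
    chain = [var ** 2 / 8, tri / 4, js, hel, js / np.log(2), tri / 2, var / 2]
    ok = ok and all(chain[i] <= chain[i + 1] + 1e-12 for i in range(len(chain) - 1))
print(ok)   # expected: True
```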

5. Asymptotic Approximation

Definition 5. For all $P, Q \in \Gamma_n$, the Chi-square divergence is defined by
$$\chi^2(P,Q) = \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{q_i}.$$
In [12], the Jensen–Shannon divergence is written as
$$JS(P,Q) = \frac{1}{2}D\!\left(P \,\middle\|\, \frac{P+Q}{2}\right) + \frac{1}{2}D\!\left(Q \,\middle\|\, \frac{P+Q}{2}\right) \approx \frac{1}{2}\sum_{i=1}^{n} \frac{1}{4q_i}(p_i - q_i)^2 = \frac{1}{8}\chi^2(P,Q)$$
when $P$ is close to $Q$.
In this section, we discuss the asymptotic approximation of $\Delta(P,Q)$ and $H^2(P,Q)$ when $P \to Q$ in the $L_2$ norm.
Theorem 9. If $\|P - Q\|_2 \to 0$, then
$$\Delta(P,Q) \approx \frac{1}{2}\chi^2(P,Q), \qquad H^2(P,Q) \approx \frac{1}{8}\chi^2(P,Q).$$
Proof. From the Taylor series expansion at $q$, we have
$$\frac{(x-q)^2}{x+q} = \frac{(x-q)^2}{2q} + o\!\left((x-q)^2\right), \qquad \frac{1}{2}\left(\sqrt{x}-\sqrt{q}\right)^2 = \frac{(x-q)^2}{8q} + o\!\left((x-q)^2\right).$$
Hence
$$\Delta(P,Q) = \sum_{i=1}^{n} \frac{(p_i-q_i)^2}{p_i+q_i} = \sum_{i=1}^{n} \frac{(p_i-q_i)^2}{2q_i} + o\!\left(\|P-Q\|_2^2\right) = \frac{1}{2}\chi^2(P,Q) + o\!\left(\|P-Q\|_2^2\right),$$
$$H^2(P,Q) = \sum_{i=1}^{n} \frac{1}{2}\left(\sqrt{p_i}-\sqrt{q_i}\right)^2 = \sum_{i=1}^{n} \frac{(p_i-q_i)^2}{8q_i} + o\!\left(\|P-Q\|_2^2\right) = \frac{1}{8}\chi^2(P,Q) + o\!\left(\|P-Q\|_2^2\right).$$
Equivalently, $JS(P,Q) \approx H^2(P,Q) \approx \frac{1}{4}\Delta(P,Q) \approx \frac{1}{8}\chi^2(P,Q)$ when $P \to Q$. So in some cases one of these information-theoretic divergences can be substituted for another. The asymptotic property also sheds light on the boundedness of the triangular discrimination and, in turn, of the new metrics.
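The asymptotic statement is easy to visualise numerically: shrink a zero-sum perturbation of a fixed distribution and watch the ratios approach 1/2 and 1/8. A minimal sketch (ours, assuming NumPy) follows.

```python
import numpy as np

def divergences(P, Q):
    chi2 = np.sum((P - Q) ** 2 / Q)
    tri  = np.sum((P - Q) ** 2 / (P + Q))
    hel  = 0.5 * np.sum((np.sqrt(P) - np.sqrt(Q)) ** 2)
    js   = 0.5 * np.sum(P * np.log(2 * P / (P + Q)) + Q * np.log(2 * Q / (P + Q)))
    return chi2, tri, hel, js

Q = np.array([0.1, 0.2, 0.3, 0.4])
eps = np.array([0.015, -0.015, 0.015, -0.015])   # perturbation summing to zero

for t in [1.0, 0.1, 0.01]:
    P = Q + t * eps
    chi2, tri, hel, js = divergences(P, Q)
    print(t, tri / chi2, hel / chi2, js / chi2)  # ratios -> 1/2, 1/8, 1/8 as t -> 0
```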

Acknowledgments

The authors would like to thank the editor and referees for their helpful suggestions and comments on the manuscript. This work was supported by the China Postdoctoral Science Foundation (2015M571255), the National Science Foundation of China (NSF of China) Grant No. 71171119, the Fundamental Research Funds for the Central Universities (FRF-CU) Grant No. 2722013JC082, and the Fundamental Research Funds for the Central Universities under Grant No. NKZXTD1403.

Author Contributions

Wrote the paper: Guoxiang Lu and Bingqing Li. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Basseville, M. Divergence measures for statistical data processing—An annotated bibliography. Signal Process. 2013, 93, 621–633. [Google Scholar] [CrossRef]
  2. Csiszár, I.; Shields, P.C. Information theory and statistics: A tutorial. Found. Trends Commun. Inf. Theory 2004, 1, 417–528. [Google Scholar] [CrossRef]
  3. Dragomir, S.S.; Gluščević, V. Some inequalities for the Kullback–Leibler and χ2-distances in information theory and applications. Tamsui Oxf. J. Math. Sci. 2001, 17, 97–111. [Google Scholar]
  4. Reid, M.D.; Williamson, R.C. Information, divergence and risk for binary experiments. J. Mach. Learn. Res. 2011, 12, 731–817. [Google Scholar]
  5. Liese, F.; Vajda, I. On divergences and informations in statistics and information theory. IEEE Trans. Inf. Theory 2006, 52, 4394–4412. [Google Scholar] [CrossRef]
  6. Vajda, I. Theory of Statistical Inference and Information; Kluwer Academic Press: London, UK, 1989. [Google Scholar]
  7. Csiszár, I. Axiomatic characterizations of information measures. Entropy 2008, 10, 261–273. [Google Scholar] [CrossRef]
  8. Cichocki, A.; Cruces, S.; Amari, S. Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy 2011, 13, 134–170. [Google Scholar] [CrossRef]
  9. Taneja, I.J. Seven means, generalized triangular discrimination, and generating divergence measures. Information 2013, 4, 198–239. [Google Scholar] [CrossRef]
  10. Arndt, C. Information Measures: Information and its Description in Science and Engineering; Springer Verlag: Berlin, Germany, 2004. [Google Scholar]
  11. Brown, R.F. A Topological Introduction to Nonlinear Analysis; Birkhäuser: Basel, Switzerland, 1993. [Google Scholar]
  12. Endres, D.M.; Schindelin, J.E. A new metric for probability distributions. IEEE Trans. Inf. Theory 2003, 49, 1858–1860. [Google Scholar] [CrossRef] [Green Version]
  13. Topsøe, F. Some inequalities for information divergence and related measures of discrimination. IEEE Trans. Inf. Theory 2000, 46, 1602–1609. [Google Scholar] [CrossRef]
  14. Csiszár, I. Information type measures of differences of probability distribution and indirect observations. Studia Sci. Math. Hungar 1967, 2, 299–318. [Google Scholar]
  15. Taneja, I.J. Refinement inequalities among symmetric divergence measures. Austr. J. Math. Anal. Appl. 2005, 2. Available online: http://ajmaa.org/cgi-bin/paper.pl?string=v2n1/V2I1P8.tex (accessed on 14 July 2015). [Google Scholar]
