1. Introduction
The Sharma–Mittal entropy was introduced as a new two-parameter measure of information [1]. It has previously been studied in the context of multi-dimensional harmonic-oscillator systems [2]. This entropy can also be formulated for exponential families, to which many common statistical distributions, including the Gaussians and discrete multinomials (that is, normalized histograms), belong. In physical applications it plays a major role in the field of thermo-statistics [3].
The Sharma–Mittal entropy has also been applied to the analysis of the results of machine learning methods [4,5]. Additionally, the divergence based on the considered entropy can serve as a cost function in the context of the so-called Twin Gaussian Processes [6].
It was originally shown in [7] that the Sharma–Mittal entropy generalizes both the Tsallis and the Rényi entropy, which arise as its limiting cases. In [8], the authors suggested a physical meaning of the Sharma–Mittal entropy, namely the free-energy difference between the equilibrium and the off-equilibrium distribution.
Recently, a manuscript was published showing, in opposition to the work [8], that beyond convenient thermodynamic systems the Sharma–Mittal entropy does not reduce only to the Kullback–Leibler entropy. In [9], Verma and Merigó presented the use of the Sharma–Mittal entropy in an intuitionistic fuzzy environment. Additionally, in [5], Koltcov et al. demonstrated that the Sharma–Mittal entropy is a tool for selecting both the number of topics and the values of hyper-parameters while simultaneously controlling for semantic stability, which none of the existing metrics can do.
Other applications of the considered entropy include interesting results in the cosmological setting, such as black-hole thermodynamics [10]. Namely, it helps us to describe the currently accelerating universe by using the vacuum energy in a suitable manner [11]. In addition, the authors of [12] established a relation between anomalous diffusion processes and the Sharma–Mittal entropy.
This paper is based on publications in which we introduced new types of f-divergences [13,14,15,16].
In this paper we generalize Sharma–Mittal-type divergences in order to obtain new types of divergences and, hence, new inequalities. From these inequalities it is possible to derive new results and generalizations of known divergences, as well as lower and upper bounds which determine the level of the uncertainty measure.
2. Sharma–Mittal Type Divergences
Throughout, $\mathbb{R}_{+}$ and $\mathbb{R}_{++}$ denote the sets of non-negative and positive real numbers, respectively, i.e., $\mathbb{R}_{+}=[0,\infty)$ and $\mathbb{R}_{++}=(0,\infty)$.
Let $P=(p_1,\ldots,p_n)$ and $Q=(q_1,\ldots,q_n)$ with $\sum_{i=1}^{n}p_i=\sum_{i=1}^{n}q_i=1$, $p_i,q_i\ge 0$. The relative entropy (also called the Kullback–Leibler divergence) is defined by (see [17])
$$D(P\|Q)=\sum_{i=1}^{n}p_i\log\frac{p_i}{q_i}.$$
In the above definition, based on continuity arguments, we use the conventions $0\log\frac{0}{q}=0$ and $p\log\frac{p}{0}=\infty$ for $p>0$. Additionally, $0\log\frac{0}{0}=0$.
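As a numerical illustration (not part of the original derivation; the helper name `kl_divergence` is ours), the definition and its conventions can be sketched as follows:

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(P||Q) = sum_i p_i * log(p_i / q_i)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # conventions: 0*log(0/q) = 0 and 0*log(0/0) = 0
        if qi == 0.0:
            return math.inf  # convention: p*log(p/0) = infinity for p > 0
        total += pi * math.log(pi / qi)
    return total

# The divergence vanishes for identical distributions and is positive otherwise.
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

Note that the conventions are exactly what make the value finite whenever $P$ is absolutely continuous with respect to $Q$.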
Let $f:\mathbb{R}_{++}\to\mathbb{R}$ be a convex function, and let $P$ and $Q$ be as above. The Csiszár f-divergence is defined by (see [15])
$$I_f(P,Q)=\sum_{i=1}^{n}q_i\,f\!\left(\frac{p_i}{q_i}\right),$$
with the conventions $f(0)=\lim_{t\to 0^{+}}f(t)$, $0\,f\!\left(\frac{0}{0}\right)=0$ and $0\,f\!\left(\frac{a}{0}\right)=a\lim_{t\to\infty}\frac{f(t)}{t}$ for $a>0$ (see [18,19,20]).
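For illustration only (a sketch with our own helper names), the convex generator $f(t)=t\log t$ recovers the relative entropy as a Csiszár f-divergence:

```python
import math

def csiszar_f_divergence(f, p, q):
    # I_f(P, Q) = sum_i q_i * f(p_i / q_i); we restrict attention to
    # strictly positive q_i so the 0-denominator conventions are not needed.
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def f_kl(t):
    # convex generator t*log(t), with f(0) = lim_{t->0+} t*log(t) = 0
    return t * math.log(t) if t > 0 else 0.0

P, Q = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]
print(csiszar_f_divergence(f_kl, P, Q))
```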
The Tsallis divergence of order $\alpha$ ($\alpha>0$, $\alpha\neq 1$) is defined by (see [17])
$$D_{\alpha}^{T}(P\|Q)=\frac{1}{\alpha-1}\left(\sum_{i=1}^{n}p_i^{\alpha}q_i^{1-\alpha}-1\right).$$
The Rényi divergence of order $\alpha$ ($\alpha>0$, $\alpha\neq 1$) is defined by (see [17,21])
$$D_{\alpha}^{R}(P\|Q)=\frac{1}{\alpha-1}\log\sum_{i=1}^{n}p_i^{\alpha}q_i^{1-\alpha}.$$
The Sharma–Mittal divergence of order $\alpha$ and degree $\beta$ is defined by (see [4])
$$D_{\alpha,\beta}^{SM}(P\|Q)=\frac{1}{\beta-1}\left[\left(\sum_{i=1}^{n}p_i^{\alpha}q_i^{1-\alpha}\right)^{\frac{1-\beta}{1-\alpha}}-1\right]$$
for all $\alpha>0$, $\alpha\neq 1$ and $\beta\neq 1$.
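The Tsallis, Rényi and Sharma–Mittal divergences introduced above can be checked numerically; this sketch (function names are ours) verifies that the Sharma–Mittal divergence reproduces the Tsallis divergence for β = α and approaches the Rényi divergence as β → 1:

```python
import math

def tsallis(p, q, a):
    # Tsallis divergence of order alpha (alpha > 0, alpha != 1)
    s = sum(pi**a * qi**(1 - a) for pi, qi in zip(p, q))
    return (s - 1) / (a - 1)

def renyi(p, q, a):
    # Renyi divergence of order alpha (alpha > 0, alpha != 1)
    s = sum(pi**a * qi**(1 - a) for pi, qi in zip(p, q))
    return math.log(s) / (a - 1)

def sharma_mittal(p, q, a, b):
    # Sharma-Mittal divergence of order alpha and degree beta
    # (alpha > 0, alpha != 1, beta != 1)
    s = sum(pi**a * qi**(1 - a) for pi, qi in zip(p, q))
    return (s ** ((1 - b) / (1 - a)) - 1) / (b - 1)

P, Q = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]
print(sharma_mittal(P, Q, 0.5, 0.5))  # beta = alpha: the Tsallis case
```

Taking β numerically close to 1 (or α close to 1 in the Tsallis case) illustrates the limiting behaviour discussed in the sequel.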
Let $f$ be a convex function on an interval $I\subseteq\mathbb{R}$. Let $x_i\in I$ and $\lambda_i\ge 0$ with $\sum_{i=1}^{n}\lambda_i=1$ for $i=1,\ldots,n$. Jensen's inequality is as follows (see [22]):
$$f\!\left(\sum_{i=1}^{n}\lambda_i x_i\right)\le\sum_{i=1}^{n}\lambda_i f(x_i).$$
When the inner function is convex and the outer function is convex and increasing, the composition of the two functions is convex. We assume that the probabilities $p_i>0$ and $q_i>0$ for $i=1,\ldots,n$.
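A quick numerical check of Jensen's inequality (an illustrative sketch, not part of the original text) with the convex function f(x) = x²:

```python
# Jensen's inequality: f(sum(l_i * x_i)) <= sum(l_i * f(x_i))
# for convex f, points x_i in I and weights l_i >= 0 summing to 1.

def f(x):
    return x * x  # a simple convex function on the whole real line

weights = [0.2, 0.3, 0.5]
points = [1.0, 4.0, 2.0]

lhs = f(sum(l * x for l, x in zip(weights, points)))
rhs = sum(l * f(x) for l, x in zip(weights, points))
print(lhs, rhs)  # the left-hand side never exceeds the right-hand side
```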
It is known (see [4]) that the Sharma–Mittal divergence reduces to the Tsallis divergence when $\beta=\alpha$ and to the Rényi divergence in the limit $\beta\to 1$.
Let $h:\mathbb{R}_{++}\to\mathbb{R}$ be a differentiable function. Then the Sharma–Mittal h-divergence is defined as follows:
$$D_{h,\alpha,\beta}^{SM}(P\|Q)=\frac{1}{\beta-1}\left[h\!\left(\left(\sum_{i=1}^{n}p_i^{\alpha}q_i^{1-\alpha}\right)^{\frac{1-\beta}{1-\alpha}}\right)-1\right]\quad(5)$$
for all $\alpha>0$, $\alpha\neq 1$ and $\beta\neq 1$.
If we assume that $h(x)=x$, then (5) becomes the Sharma–Mittal divergence.
When $h(x)=1+\log x$ for all $x>0$, then (5) becomes the Rényi divergence of order $\alpha$.
We now let $\beta$ tend to $1$. Let $h$ be a differentiable function with $h(1)=1$ and $h'(1)=1$. Then,
$$\lim_{\beta\to 1}D_{h,\alpha,\beta}^{SM}(P\|Q)=\frac{1}{\alpha-1}\log\sum_{i=1}^{n}p_i^{\alpha}q_i^{1-\alpha}=D_{\alpha}^{R}(P\|Q).$$
Hence, the Sharma–Mittal h-divergence tends to the Rényi divergence of order $\alpha$.
Remark 1. If, additionally, α tends to 1, then, based on the proof of Equation (11) from [16], the Sharma–Mittal h-divergence tends to the relative entropy (the Kullback–Leibler divergence).

We now define a new generalized Sharma–Mittal divergence, where $\phi$ is an increasing, non-negative and differentiable function on $\mathbb{R}_{++}$.
We assume that we are given a family of functions which are increasing and non-negative on $\mathbb{R}_{++}$, and such that every function in the family is differentiable.
According to [16], if we substitute a function from this family for $\phi$, then the following holds. Under the additional assumption stated above, we then have:
Remark 2. If, in (6), $\beta\to 1$ under the assumptions on $\phi$, then the generalized $\phi$–Sharma–Mittal divergence tends to the generalized $\phi$–Rényi divergence. The function $\phi$ is a generalization of the function which is used, for example, in the Csiszár f-divergence. This condition means that the limit of the generalized Sharma–Mittal divergence is equal to the generalized $\phi$–Rényi divergence. Hence we have implications for generalized forms of entropies.
Remark 3. Additionally, when, in (6), both parameters tend to 1, the generalized $\phi$–Sharma–Mittal divergence tends to the Kullback–Leibler divergence; this follows from Remark 2.

Remark 4. In (6), when the parameter $\beta$ tends to $\alpha$, the generalized $\phi$–Sharma–Mittal divergence tends to the Tsallis f-divergence of order α.

This work is more theoretical than practical. Therefore, the implications are formulated in the mathematical domain, that is, by constructing a general model which yields the known specific cases.
3. Jensen–Sharma–Mittal and Jeffreys–Sharma–Mittal Divergences
The Jensen–Shannon divergence (Jensen–Shannon entropy) is defined as follows (see [17]):
$$JS(P,Q)=\frac{1}{2}D\!\left(P\,\Big\|\,\frac{P+Q}{2}\right)+\frac{1}{2}D\!\left(Q\,\Big\|\,\frac{P+Q}{2}\right).$$
The Jeffreys divergence (Jeffreys entropy) is defined as follows (see [17]):
$$J(P,Q)=D(P\|Q)+D(Q\|P)=\sum_{i=1}^{n}(p_i-q_i)\log\frac{p_i}{q_i}.$$
We introduce a new generalized Jensen–Sharma–Mittal divergence, defined as follows, with the assumptions as before.
We similarly introduce a new generalized Jeffreys–Sharma–Mittal divergence as follows.
Taking into account the inequality from [17],
$$JS(P,Q)\le\frac{1}{4}J(P,Q),$$
describing the relation between the Jensen–Shannon and Jeffreys divergences, we can formulate the following:
We define the Jensen–Sharma–Mittal h-divergence, where, in (8), we make the corresponding substitution. Then, it takes the form:
In the same way, we define the Jeffreys–Sharma–Mittal h-divergence:
Additionally, if the function $h(x)=x$, then we obtain the Jensen–Sharma–Mittal and the Jeffreys–Sharma–Mittal divergences of order $\alpha$ and degree $\beta$, respectively.
When, in (8) and (9), $\beta\to 1$ and we substitute as above, we obtain the generalized Jensen–Rényi and Jeffreys–Rényi divergences defined in [16], respectively:
The following theorem is a generalization and refinement of the inequalities for some known divergences; it provides lower and upper bounds for the generalized Jeffreys–Sharma–Mittal divergence, allowing a more accurate estimation of its uncertainty measure.
Theorem 1. Let and be two discrete probability distributions with , , , , , where is an interval, such that . Let be an increasing, non-negative and differentiable function for which and where , and be a convex and increasing function on .
Then, the following inequalities are valid:

Proof. Taking into account the assumptions, we can formulate the following inequality:
The function h is increasing and convex; therefore, from (4) and (17) we obtain the inequalities:
In the same way, we obtain the following inequalities:
Taking into account (2), (18), (19) and the definition of the Jeffreys divergence, it holds that:
The above inequality gives the upper bound for the generalized Jeffreys–Sharma–Mittal divergence.
By using the convexity of the function h, the following inequality is valid:
From (7), the above derivative function is equal to:
The function log is concave and increasing. Then, it holds that:
Hence, from (21) and (22) we have the inequality:
Similarly, we obtain the second inequality:
From (6), (23) and (24) we have that:
Then, by using the definition (9), we have:
This result is the lower bound of the generalized Jeffreys–Sharma–Mittal divergence.
Combining (20) and (25), we obtain the expected inequalities (16). □
Corollary 1. When we substitute accordingly, then from (16) we obtain the inequalities for the Jeffreys–Sharma–Mittal h-divergence:

We now formulate a theorem thanks to which the estimation of the generalized Jensen–Sharma–Mittal divergence will be possible.
Theorem 2. Let and be two discrete probability distributions with , , , , , where is an interval such that . Let be an increasing, non-negative and differentiable function for which and where , and be a convex and increasing function on .
Then, the following inequalities are valid:

Proof. Let us consider the function
Using the assumptions that the function h is differentiable and convex, we can formulate the following inequality:
Taking into account the concavity of the function log, we have that:
Then, we obtain that (27) is greater than
We do the same with the function
Hence, we have that (30) is greater than
Then, combining (27) and (29)–(31), and using the definition (8), the following inequality holds:
and it is the lower bound of the generalized Jensen–Sharma–Mittal divergence.
When we consider the analogous function, then for the convex and increasing function h we have from (4) that (33) is smaller than
In a similar way, we conclude the following inequality for the second function, and we have
Then, combining (33)–(36) and the definition (8), with the proper transformations we obtain the inequality
which is the upper bound of the generalized Jensen–Sharma–Mittal divergence.
Taking into account (32) and (37), we obtain (26). □
Corollary 2. When we substitute accordingly, then from (26) we obtain the inequalities for the Jensen–Sharma–Mittal h-divergence:

Remark 5. It can be seen that the lower bounds for both the Jeffreys (25) and the Jensen (32) Sharma–Mittal divergences are independent of the function h.

Remark 6. Taking into account the inequality (10), we obtain an alternative upper bound for the Jensen–Sharma–Mittal and a lower bound for the Jeffreys–Sharma–Mittal generalized divergences, respectively.

4. Applications
In this section we show how our theory works.
4.1. Bounds for Sharma–Mittal Divergences
For the functions specified above, and based on Theorems 1 and 2, we obtain the lower and upper bounds for the Jeffreys–Sharma–Mittal and Jensen–Sharma–Mittal divergences, respectively, as follows:
Remark 7. The above lower bounds (38) and (39) are the same for the Rényi-type divergences because they are independent of the parameter β, which in that case approaches 1.

Remark 8. Substituting different values for the parameters α and β, subject to the assumptions of Theorems 1 and 2 about the functions h and ϕ, we can formulate new types of divergences and related inequalities based on the generalized Sharma–Mittal divergence.
4.2. Bounds for Tsallis Divergences
When we make the same assumptions as for the Sharma–Mittal divergences, with the additional condition that $\beta=\alpha$, we obtain the bounds for the Tsallis-type divergences as follows:
4.3. Bounds for Kullback–Leibler Divergences
In the same setting as in the case of the Tsallis divergence, with additionally both $\alpha$ and $\beta$ approaching 1, we obtain new upper bounds for the Jeffreys and Jensen–Shannon divergences, respectively.
The last inequality is equivalent to .
5. Summary
In this paper, new types of entropy have been defined which generalize others known and used so far in information theory.
The manuscript deals mainly with issues in the field of pure mathematics; therefore, the standard axioms of entropy used in thermodynamics could, in this case, be extended with other assumptions and properties.
These divergences have been introduced with a view to the new physical interpretations that could be generated from them.
The generalized Sharma–Mittal and, consequently, the Jensen–Sharma–Mittal and Jeffreys–Sharma–Mittal divergences have been defined in order to obtain better estimates for known entropies, which allows a more accurate determination of the dispersion measure of different distributions.
The derived inequalities provide both upper and lower bounds for the considered f-divergences. As a consequence, we obtain specific estimates for some new order measures. Hence, they offer much wider interpretation possibilities when comparing probability distributions in the sense of mutual distances in different spaces.
In the era of advancing quantum mechanics, scientists are striving to build quantum computers with very high computing power. The obtained results, despite their mathematical and analytical complexity, can quickly generate specific numerical intervals that estimate the newly introduced entropies. Therefore, results such as those in this paper will be useful in developing issues in information theory.
This work belongs to the area of pure mathematics; it is therefore more theoretical than practical, and it makes it possible to recover the existing known entropies by means of the newly defined generalizations. These generalizations can be used to interpret various physical phenomena. The aim of this manuscript was to provide some new theoretical solutions for physicists who, with their knowledge and experience, will be able to look for new applications.