1. Introduction
The Sharma–Mittal entropy was introduced as a new two-parameter measure of information [1]. It has previously been studied in the context of multi-dimensional harmonic-oscillator systems [2]. This entropy can also be formulated for exponential families, to which many common statistical distributions, including the Gaussians and discrete multinomials (that is, normalized histograms), belong. In physical applications it plays a major role in the field of thermo-statistics [3].
The Sharma–Mittal entropy has also been applied to the analysis of the results of machine learning methods [4,5]. Additionally, the divergence based on the considered entropy can serve as a cost function in the context of the so-called Twin Gaussian Processes [6].
It was originally shown in [7] that the Sharma–Mittal entropy generalizes both the Tsallis and the Rényi entropy, which arise as its limiting cases. In [8], the authors suggested a physical meaning of the Sharma–Mittal entropy, namely the free-energy difference between the equilibrium and the off-equilibrium distribution.
Recently, a manuscript was published showing, in opposition to the work [8], that beyond convenient thermodynamic systems the Sharma–Mittal entropy does not reduce only to the Kullback–Leibler entropy. In [9], Verma and Merigó presented the use of the Sharma–Mittal entropy in an intuitionistic fuzzy environment. Additionally, in [5], Koltcov et al. demonstrated that the Sharma–Mittal entropy is a tool for selecting both the number of topics and the values of hyper-parameters while simultaneously controlling for semantic stability, which none of the existing metrics can do.
Other applications of the considered entropy include interesting results in the cosmological setting, such as black-hole thermodynamics [10]. Namely, it helps us to describe the currently accelerating universe by using the vacuum energy in a suitable manner [11]. In addition, the authors of [12] established a relation between anomalous diffusion processes and the Sharma–Mittal entropy.
This paper is based on publications in which we introduced new types of f-divergences [13,14,15,16].
In this paper we generalize Sharma–Mittal-type divergences in order to obtain new types of divergences and, hence, new inequalities. From these inequalities it is possible to derive new results and generalizations of known divergences, as well as lower and upper bounds which determine the level of the uncertainty measure.
2. Sharma–Mittal Type Divergences
Throughout, $\mathbb{R}_{+}$ and $\mathbb{R}_{++}$ denote the sets of non-negative and positive real numbers, respectively, i.e., $\mathbb{R}_{+}=[0,\infty)$ and $\mathbb{R}_{++}=(0,\infty)$.
Let $P=(p_1,\ldots,p_n)$ and $Q=(q_1,\ldots,q_n)$ with $\sum_{i=1}^{n}p_i=\sum_{i=1}^{n}q_i=1$, $p_i,q_i\ge 0$. The relative entropy (also called the Kullback–Leibler divergence) is defined by (see [17])
$$D(P\|Q)=\sum_{i=1}^{n}p_i\log\frac{p_i}{q_i}.$$
In the above definition, based on continuity arguments, we use the conventions $0\log\frac{0}{q}=0$ and $p\log\frac{p}{0}=\infty$ for $p>0$. Additionally, $0\log\frac{0}{0}=0$.
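As a numerical illustration (not part of the original derivation; the helper name `kl_divergence` is ours), the definition and its conventions can be sketched as follows:

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(P||Q) = sum_i p_i * log(p_i / q_i)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # conventions: 0*log(0/q) = 0 and 0*log(0/0) = 0
        if qi == 0.0:
            return math.inf  # convention: p*log(p/0) = infinity for p > 0
        total += pi * math.log(pi / qi)
    return total

# The divergence vanishes for identical distributions and is positive otherwise.
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

Note that the conventions are exactly what make the value finite whenever $P$ is absolutely continuous with respect to $Q$.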
Let $f:\mathbb{R}_{++}\to\mathbb{R}$ be a convex function, and let $P$ and $Q$ be as above. The Csiszár f-divergence is defined by (see [15])
$$I_f(P,Q)=\sum_{i=1}^{n}q_i\,f\!\left(\frac{p_i}{q_i}\right),$$
with the conventions $f(0)=\lim_{t\to 0^{+}}f(t)$, $0\,f\!\left(\frac{0}{0}\right)=0$ and $0\,f\!\left(\frac{a}{0}\right)=a\lim_{t\to\infty}\frac{f(t)}{t}$ for $a>0$ (see [18,19,20]).
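For illustration only (a sketch with our own helper names), the convex generator $f(t)=t\log t$ recovers the relative entropy as a Csiszár f-divergence:

```python
import math

def csiszar_f_divergence(f, p, q):
    # I_f(P, Q) = sum_i q_i * f(p_i / q_i); we restrict attention to
    # strictly positive q_i so the 0-denominator conventions are not needed.
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def f_kl(t):
    # convex generator t*log(t), with f(0) = lim_{t->0+} t*log(t) = 0
    return t * math.log(t) if t > 0 else 0.0

P, Q = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]
print(csiszar_f_divergence(f_kl, P, Q))
```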
The Tsallis divergence of order $\alpha$ ($\alpha>0$, $\alpha\neq 1$) is defined by (see [17])
$$D_{\alpha}^{T}(P\|Q)=\frac{1}{\alpha-1}\left(\sum_{i=1}^{n}p_i^{\alpha}q_i^{1-\alpha}-1\right).$$
The Rényi divergence of order $\alpha$ ($\alpha>0$, $\alpha\neq 1$) is defined by (see [17,21])
$$D_{\alpha}^{R}(P\|Q)=\frac{1}{\alpha-1}\log\sum_{i=1}^{n}p_i^{\alpha}q_i^{1-\alpha}.$$
The Sharma–Mittal divergence of order $\alpha$ and degree $\beta$ is defined by (see [4])
$$D_{\alpha,\beta}^{SM}(P\|Q)=\frac{1}{\beta-1}\left[\left(\sum_{i=1}^{n}p_i^{\alpha}q_i^{1-\alpha}\right)^{\frac{1-\beta}{1-\alpha}}-1\right]$$
for all $\alpha>0$, $\alpha\neq 1$ and $\beta\neq 1$.
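The Tsallis, Rényi and Sharma–Mittal divergences introduced above can be checked numerically; this sketch (function names are ours) verifies that the Sharma–Mittal divergence reproduces the Tsallis divergence for β = α and approaches the Rényi divergence as β → 1:

```python
import math

def tsallis(p, q, a):
    # Tsallis divergence of order alpha (alpha > 0, alpha != 1)
    s = sum(pi**a * qi**(1 - a) for pi, qi in zip(p, q))
    return (s - 1) / (a - 1)

def renyi(p, q, a):
    # Renyi divergence of order alpha (alpha > 0, alpha != 1)
    s = sum(pi**a * qi**(1 - a) for pi, qi in zip(p, q))
    return math.log(s) / (a - 1)

def sharma_mittal(p, q, a, b):
    # Sharma-Mittal divergence of order alpha and degree beta
    # (alpha > 0, alpha != 1, beta != 1)
    s = sum(pi**a * qi**(1 - a) for pi, qi in zip(p, q))
    return (s ** ((1 - b) / (1 - a)) - 1) / (b - 1)

P, Q = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]
print(sharma_mittal(P, Q, 0.5, 0.5))  # beta = alpha: the Tsallis case
```

Taking β numerically close to 1 (or α close to 1 in the Tsallis case) illustrates the limiting behaviour discussed in the sequel.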
Let $f$ be a convex function on an interval $I\subseteq\mathbb{R}$. Let $x_i\in I$ and $\lambda_i\ge 0$ with $\sum_{i=1}^{n}\lambda_i=1$ for $i=1,\ldots,n$. Jensen's inequality is as follows (see [22]):
$$f\!\left(\sum_{i=1}^{n}\lambda_i x_i\right)\le\sum_{i=1}^{n}\lambda_i f(x_i).$$
When the inner function is convex and the outer function is convex and increasing, the composition of the two functions is convex. We assume that the probabilities $p_i>0$ and $q_i>0$ for $i=1,\ldots,n$.
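A quick numerical check of Jensen's inequality (an illustrative sketch, not part of the original text) with the convex function f(x) = x²:

```python
# Jensen's inequality: f(sum(l_i * x_i)) <= sum(l_i * f(x_i))
# for convex f, points x_i in I and weights l_i >= 0 summing to 1.

def f(x):
    return x * x  # a simple convex function on the whole real line

weights = [0.2, 0.3, 0.5]
points = [1.0, 4.0, 2.0]

lhs = f(sum(l * x for l, x in zip(weights, points)))
rhs = sum(l * f(x) for l, x in zip(weights, points))
print(lhs, rhs)  # the left-hand side never exceeds the right-hand side
```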
It is known (see [4]) that the Sharma–Mittal divergence reduces to the Tsallis divergence when $\beta=\alpha$ and to the Rényi divergence in the limit $\beta\to 1$.
Let $h:\mathbb{R}_{++}\to\mathbb{R}$ be a differentiable function. Then the Sharma–Mittal h-divergence is defined as follows:
$$D_{h,\alpha,\beta}^{SM}(P\|Q)=\frac{1}{\beta-1}\left[h\!\left(\left(\sum_{i=1}^{n}p_i^{\alpha}q_i^{1-\alpha}\right)^{\frac{1-\beta}{1-\alpha}}\right)-1\right]\quad(5)$$
for all $\alpha>0$, $\alpha\neq 1$ and $\beta\neq 1$.
If we assume that $h(x)=x$, then (5) becomes the Sharma–Mittal divergence.
When $h(x)=1+\log x$ for all $x>0$, then (5) becomes the Rényi divergence of order $\alpha$.
We now let $\beta$ tend to $1$. Let $h$ be a differentiable function with $h(1)=1$ and $h'(1)=1$. Then,
$$\lim_{\beta\to 1}D_{h,\alpha,\beta}^{SM}(P\|Q)=\frac{1}{\alpha-1}\log\sum_{i=1}^{n}p_i^{\alpha}q_i^{1-\alpha}=D_{\alpha}^{R}(P\|Q).$$
Hence, the Sharma–Mittal h-divergence tends to the Rényi divergence of order $\alpha$.
Remark 1. If, additionally, α tends to 1, then, based on the proof of Equation (11) from [16], the Sharma–Mittal h-divergence tends to the relative entropy (the Kullback–Leibler divergence).

We now define a new generalized Sharma–Mittal divergence, where $\phi$ is an increasing, non-negative and differentiable function on $\mathbb{R}_{++}$.
We assume that we are given a family of functions which are increasing and non-negative on $\mathbb{R}_{++}$, and such that every function in the family is differentiable.
According to [16], if we substitute a function from this family for $\phi$, then the following holds. Under the additional assumption stated above, we then have:
Remark 2. If, in (6), $\beta\to 1$ under the assumptions on $\phi$, then the generalized $\phi$–Sharma–Mittal divergence tends to the generalized $\phi$–Rényi divergence. The function $\phi$ is a generalization of the function which is used, for example, in the Csiszár f-divergence. This condition means that the limit of the generalized Sharma–Mittal divergence is equal to the generalized $\phi$–Rényi divergence. Hence we have implications for generalized forms of entropies.
Remark 3. Additionally, when, in (6), both parameters tend to 1, the generalized $\phi$–Sharma–Mittal divergence tends to the Kullback–Leibler divergence; this follows from Remark 2.

Remark 4. In (6), when the parameter $\beta$ tends to $\alpha$, the generalized $\phi$–Sharma–Mittal divergence tends to the Tsallis f-divergence of order α.

This work is more theoretical than practical. Therefore, the implications are formulated in the mathematical domain, that is, by constructing a general model which yields the known specific cases.
3. Jensen–Sharma–Mittal and Jeffreys–Sharma–Mittal Divergences
The Jensen–Shannon divergence (Jensen–Shannon entropy) is defined as follows (see [17]):
$$JS(P,Q)=\frac{1}{2}D\!\left(P\,\Big\|\,\frac{P+Q}{2}\right)+\frac{1}{2}D\!\left(Q\,\Big\|\,\frac{P+Q}{2}\right).$$
The Jeffreys divergence (Jeffreys entropy) is defined as follows (see [17]):
$$J(P,Q)=D(P\|Q)+D(Q\|P)=\sum_{i=1}^{n}(p_i-q_i)\log\frac{p_i}{q_i}.$$
We introduce a new generalized Jensen–Sharma–Mittal divergence, defined as follows, with the assumptions as before.
We similarly introduce a new generalized Jeffreys–Sharma–Mittal divergence as follows.
Taking into account the inequality from [17],
$$JS(P,Q)\le\frac{1}{4}J(P,Q),$$
describing the relation between the Jensen–Shannon and Jeffreys divergences, we can formulate the following:
We define the Jensen–Sharma–Mittal h-divergence, where, in (8), we make the corresponding substitution. Then, it takes the form:
In the same way, we define the Jeffreys–Sharma–Mittal h-divergence:
Additionally, if the function $h(x)=x$, then we obtain the Jensen–Sharma–Mittal and the Jeffreys–Sharma–Mittal divergences of order $\alpha$ and degree $\beta$, respectively.
When, in (8) and (9), $\beta\to 1$ and we substitute as above, we obtain the generalized Jensen–Rényi and Jeffreys–Rényi divergences defined in [16], respectively:
The following theorem is a generalization and refinement of the inequalities for some known divergences; it provides lower and upper bounds for the generalized Jeffreys–Sharma–Mittal divergence, allowing a more accurate estimation of its uncertainty measure.
Theorem 1. Let and be two discrete probability distributions with , , , , , where is an interval, such that . Let be an increasing, non-negative and differentiable function for which and where , and be a convex and increasing function on .
Then, the following inequalities are valid:

Proof. Taking into account the assumptions, we can formulate the following inequality:
The function h is increasing and convex; therefore, from (4) and (17) we obtain the inequalities:
In the same way, we obtain the following inequalities:
Taking into account (2), (18), (19) and the definition of the Jeffreys divergence, it holds that:
The above inequality gives the upper bound for the generalized Jeffreys–Sharma–Mittal divergence.
By using the convexity of the function h, the following inequality is valid:
From (7), the above derivative function is equal to:
The function log is concave and increasing. Then, it holds that:
Hence, from (21) and (22) we have the inequality:
Similarly, we obtain the second inequality:
From (6), (23) and (24) we have that:
Then, by using the definition (9), we have:
This result is the lower bound of the generalized Jeffreys–Sharma–Mittal divergence.
Combining (20) and (25), we obtain the expected inequalities (16). □
Corollary 1. When we substitute accordingly, then from (16) we obtain the inequalities for the Jeffreys–Sharma–Mittal h-divergence:

We now formulate a theorem thanks to which the estimation of the generalized Jensen–Sharma–Mittal divergence will be possible.
Theorem 2. Let and be two discrete probability distributions with , , , , , where is an interval such that . Let be an increasing, non-negative and differentiable function for which and where , and be a convex and increasing function on .
Then, the following inequalities are valid:

Proof. Let us consider the function
Using the assumptions that the function h is differentiable and convex, we can formulate the following inequality:
Taking into account the concavity of the function log, we have that:
Then, we obtain that (27) is greater than
We do the same with the function
Hence, we have that (30) is greater than
Then, combining (27) and (29)–(31), and using the definition (8), the following inequality holds:
and it is the lower bound of the generalized Jensen–Sharma–Mittal divergence.
When we consider the analogous function, then for the convex and increasing function h we have from (4) that (33) is smaller than
In a similar way, we conclude the following inequality for the second function, and we have
Then, combining (33)–(36) and the definition (8), with the proper transformations we obtain the inequality
which is the upper bound of the generalized Jensen–Sharma–Mittal divergence.
Taking into account (32) and (37), we obtain (26). □
Corollary 2. When we substitute accordingly, then from (26) we obtain the inequalities for the Jensen–Sharma–Mittal h-divergence:

Remark 5. It can be seen that the lower bounds for both the Jeffreys (25) and the Jensen (32) Sharma–Mittal divergences are independent of the function h.

Remark 6. Taking into account the inequality (10), we obtain an alternative upper bound for the Jensen–Sharma–Mittal and a lower bound for the Jeffreys–Sharma–Mittal generalized divergences, respectively.

4. Applications
In this section we show how our theory works.
4.1. Bounds for Sharma–Mittal Divergences
For the functions specified above, and based on Theorems 1 and 2, we obtain the lower and upper bounds for the Jeffreys–Sharma–Mittal and Jensen–Sharma–Mittal divergences, respectively, as follows:
Remark 7. The above lower bounds (38) and (39) are the same for the Rényi-type divergences because they are independent of the parameter β, which in that case approaches 1.

Remark 8. Substituting different values for the parameters α and β, subject to the assumptions of Theorems 1 and 2 about the functions h and ϕ, we can formulate new types of divergences and related inequalities based on the generalized Sharma–Mittal divergence.
4.2. Bounds for Tsallis Divergences
When we make the same assumptions as for the Sharma–Mittal divergences, with the additional condition that $\beta=\alpha$, we obtain the bounds for the Tsallis-type divergences as follows:
4.3. Bounds for Kullback–Leibler Divergences
In the same setting as in the case of the Tsallis divergence, with additionally both $\alpha$ and $\beta$ approaching 1, we obtain new upper bounds for the Jeffreys and Jensen–Shannon divergences, respectively.
The last inequality is equivalent to .
5. Summary
In this paper, new types of entropy have been defined which generalize others known and used so far in information theory.
The manuscript deals mainly with issues in the field of pure mathematics; therefore, the standard axioms of entropy used in thermodynamics could, in this case, be extended with other assumptions and properties.
These divergences have been introduced with a view to the new physical interpretations that could be generated from them.
The generalized Sharma–Mittal and, consequently, the Jensen–Sharma–Mittal and Jeffreys–Sharma–Mittal divergences have been defined in order to obtain better estimates for known entropies, which allows a more accurate determination of the dispersion measure of different distributions.
The derived inequalities provide both upper and lower bounds for the considered f-divergences. As a consequence, we obtain specific estimates for some new order measures. Hence, they offer much wider interpretation possibilities when comparing probability distributions in the sense of mutual distances in different spaces.
In the era of advancing quantum mechanics, scientists are striving to build quantum computers with very high computing power. The obtained results, despite their mathematical and analytical complexity, can quickly generate specific numerical intervals that estimate the newly introduced entropies. Therefore, results such as those in this paper will be useful in developing issues in information theory.
This work belongs to the area of pure mathematics; it is therefore more theoretical than practical, and it makes it possible to recover the existing known entropies by means of the newly defined generalizations. These generalizations can be used to interpret various physical phenomena. The aim of this manuscript was to provide some new theoretical solutions for physicists who, with their knowledge and experience, will be able to look for new applications.