Article

On Extractable Shared Information

Johannes Rauh, Pradeep Kr. Banerjee, Eckehard Olbrich, Jürgen Jost and Nils Bertschinger
1 Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany
2 Frankfurt Institute for Advanced Studies, 60438 Frankfurt, Germany
* Author to whom correspondence should be addressed.
Submission received: 31 May 2017 / Accepted: 22 June 2017 / Published: 3 July 2017

Abstract:
We consider the problem of quantifying the information shared by a pair of random variables $X_1, X_2$ about another variable $S$. We propose a new measure of shared information, called extractable shared information, that is left monotonic; that is, the information shared about $S$ is bounded from below by the information shared about $f(S)$ for any function $f$. We show that our measure leads to a new nonnegative decomposition of the mutual information $I(S;X_1X_2)$ into shared, complementary and unique components. We study properties of this decomposition and show that a left monotonic shared information is not compatible with a Blackwell interpretation of unique information. We also discuss whether it is possible to have a decomposition in which both shared and unique information are left monotonic.

1. Introduction

A series of recent papers have focused on the bivariate information decomposition problem [1,2,3,4,5,6]. Consider three random variables $S$, $X_1$, $X_2$ with finite alphabets $\mathcal{S}$, $\mathcal{X}_1$ and $\mathcal{X}_2$, respectively. The total information that the pair $(X_1,X_2)$ conveys about the target $S$ can have aspects of shared or redundant information (conveyed by both $X_1$ and $X_2$), of unique information (conveyed exclusively by either $X_1$ or $X_2$), and of complementary or synergistic information (retrievable only from the joint variable $(X_1,X_2)$). In general, all three kinds of information may be present concurrently. One would like to express this by decomposing the mutual information $I(S;X_1X_2)$ into a sum of nonnegative components with a well-defined operational interpretation. One possible application area is in the neurosciences. In [7], it is argued that such a decomposition can provide a framework to analyze neural information processing using information theory that can integrate and go beyond previous attempts.
For the general case of $k$ finite source variables $(X_1,\ldots,X_k)$, Williams and Beer [3] proposed the partial information lattice framework that specifies how the total information about the target $S$ is shared across the singleton sources and their disjoint or overlapping coalitions. The lattice is a consequence of certain natural properties of shared information (sometimes called the Williams–Beer axioms). In the bivariate case ($k=2$), the decomposition has the form
$$I(S;X_1X_2) = \underbrace{SI(S;X_1,X_2)}_{\text{shared}} + \underbrace{CI(S;X_1,X_2)}_{\text{complementary}} + \underbrace{UI(S;X_1\setminus X_2)}_{\text{unique }(X_1\text{ w.r.t. }X_2)} + \underbrace{UI(S;X_2\setminus X_1)}_{\text{unique }(X_2\text{ w.r.t. }X_1)}, \tag{1}$$
$$I(S;X_1) = SI(S;X_1,X_2) + UI(S;X_1\setminus X_2), \tag{2}$$
$$I(S;X_2) = SI(S;X_1,X_2) + UI(S;X_2\setminus X_1), \tag{3}$$
where $SI(S;X_1,X_2)$, $UI(S;X_1\setminus X_2)$, $UI(S;X_2\setminus X_1)$, and $CI(S;X_1,X_2)$ are nonnegative functions that depend continuously on the joint distribution of $(S,X_1,X_2)$. The difference between shared and complementary information is the familiar co-information [8] (or interaction information [9]), a symmetric generalization of the mutual information for three variables,
$$CoI(S;X_1,X_2) = I(S;X_1) - I(S;X_1|X_2) = SI(S;X_1,X_2) - CI(S;X_1,X_2).$$
Equations (1) to (3) leave only a single degree of freedom; i.e., it suffices to specify either a measure for $SI$, for $CI$, or for $UI$.
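The co-information above is easy to compute directly from a joint distribution. The following is a minimal Python sketch; the array layout `p[s, x1, x2]`, the base-2 logarithms, and the function names are our own conventions, not anything prescribed by the paper.

```python
import numpy as np

def mutual_info(p_xy):
    """I(X;Y) in bits for a joint distribution given as a 2-D array."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask])).sum())

def co_information(p):
    """CoI(S;X1,X2) = I(S;X1) - I(S;X1|X2) for an array p[s, x1, x2]."""
    cmi = 0.0
    for x2 in range(p.shape[2]):
        p_x2 = p[:, :, x2].sum()
        if p_x2 > 0:
            cmi += p_x2 * mutual_info(p[:, :, x2] / p_x2)
    return mutual_info(p.sum(axis=2)) - cmi

# S = Xor(X1, X2) with independent uniform bits: CoI = -1 bit,
# illustrating that the co-information can be negative (pure synergy).
p = np.zeros((2, 2, 2))
for x1 in range(2):
    for x2 in range(2):
        p[x1 ^ x2, x1, x2] = 0.25
print(co_information(p))  # -1.0
```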
Williams and Beer not only introduced the general partial information framework, but also proposed a measure of $SI$ to fill this framework. While their measure has subsequently been criticized for “not measuring the right thing” [4,5,6], there has been no successful attempt to find better measures, except for the bivariate case ($k=2$) [1,4]. One problem seems to be the lack of a clear consensus on what an ideal measure of shared (or unique or complementary) information should look like and what properties it should satisfy. In particular, the Williams–Beer axioms only put crude bounds on the values of the functions $SI$, $UI$ and $CI$. Therefore, additional axioms have been proposed by various authors [4,5,6]. Unfortunately, some of these properties contradict each other [5], and the question of the right axiomatic characterization remains open.
The Williams–Beer axioms do not say anything about what should happen when the target variable S undergoes a local transformation. In this context, the following left monotonicity property was proposed in [5]:
(LM) $SI(S;X_1,X_2) \ge SI(f(S);X_1,X_2)$ for any function $f$.      (left monotonicity)
Left monotonicity for unique or complementary information can be defined similarly. The property captures the intuition that shared information should only decrease if the target performs some local operation (e.g., coarse graining) on her variable $S$. As argued in [2], left monotonicity of shared and unique information are indeed desirable properties. Unfortunately, none of the measures of shared information proposed so far satisfy left monotonicity.
In this contribution, we study a construction that enforces left monotonicity. Namely, given a measure of shared information $SI$, define
$$\overline{SI}(S;X_1,X_2) := \sup_{f:\mathcal{S}\to\mathcal{S}'} SI(f(S);X_1,X_2), \tag{4}$$
where the supremum runs over all functions $f:\mathcal{S}\to\mathcal{S}'$ from the domain $\mathcal{S}$ of $S$ to an arbitrary finite set $\mathcal{S}'$. By construction, $\overline{SI}$ satisfies left monotonicity, and $\overline{SI}$ is the smallest function bounded from below by $SI$ that satisfies left monotonicity.
Changing the definition of shared information in the information decomposition framework of Equations (1)–(3) leads to new definitions of unique and complementary information:
$$\overline{UI}^*(S;X_1\setminus X_2) := I(S;X_1) - \overline{SI}(S;X_1,X_2), \qquad \overline{UI}^*(S;X_2\setminus X_1) := I(S;X_2) - \overline{SI}(S;X_1,X_2),$$
$$\overline{CI}^*(S;X_1,X_2) := I(S;X_1X_2) - \overline{SI}(S;X_1,X_2) - \overline{UI}^*(S;X_1\setminus X_2) - \overline{UI}^*(S;X_2\setminus X_1).$$
In general, $\overline{UI}^*(S;X_1\setminus X_2) \neq \overline{UI}(S;X_1\setminus X_2) := \sup_{f:\mathcal{S}\to\mathcal{S}'} UI(f(S);X_1\setminus X_2)$. Thus, our construction cannot enforce left monotonicity for both $UI$ and $SI$ in parallel.
Lemma 2 shows that $\overline{SI}$, $\overline{UI}^*$ and $\overline{CI}^*$ are nonnegative and thus define a nonnegative bivariate decomposition. We study this decomposition in Section 4. In Theorem 1, we show that our construction is not compatible with a decision-theoretic interpretation of unique information proposed in [1]. In Section 5, we ask whether it is possible to find an information decomposition in which both shared and unique information measures are left monotonic. Our construction cannot directly be generalized to ensure left monotonicity of two functions simultaneously. Nevertheless, it is possible that such a decomposition exists, and in Proposition 1, we prove bounds on the corresponding shared information measure.
Our original motivation for the definition of $\overline{SI}$ was to find a bivariate decomposition in which the shared information satisfies left monotonicity. However, one could also ask whether left monotonicity is a required property of shared information, as put forward in [2]. In contrast, Harder et al. [4] argue that redundancy can also arise by means of a mechanism. Applying a function to $S$ corresponds to such a mechanism that singles out a certain aspect of $S$. Even if the $X_i$ share nothing about the whole of $S$, they might still share information about this aspect of $S$, which means that the shared information will increase. With this intuition, we can interpret $\overline{SI}$ not as an improved measure of shared information, but as a measure of extractable shared information, because it asks for the maximal amount of shared information that can be extracted from $S$ by further processing $S$ through a local mechanism. More generally, one can apply a similar construction to arbitrary information measures. We explore this idea in Section 3 and discuss probabilistic generalizations and relations to other information measures. In Section 6, we apply our construction to existing measures of shared information.

2. Properties of Information Decompositions

2.1. The Williams–Beer Axioms

Although we are mostly concerned with the case $k=2$, let us first recall the three axioms that Williams and Beer [3] proposed for a measure of shared information with arbitrarily many arguments:
(S) $SI(S;X_1,\ldots,X_k)$ is symmetric under permutations of $X_1,\ldots,X_k$,    (Symmetry)
(SR) $SI(S;X_1) = I(S;X_1)$,   (Self-redundancy)
(M) $SI(S;X_1,\ldots,X_{k-1},X_k) \le SI(S;X_1,\ldots,X_{k-1})$,
with equality if $X_i = f(X_k)$ for some $i<k$ and some function $f$.   (Monotonicity)
Any measure of $SI$ satisfying these axioms is nonnegative. Moreover, the axioms imply the following:
(RM) $SI(S;X_1,\ldots,X_k) \ge SI(S;f_1(X_1),\ldots,f_k(X_k))$ for all functions $f_1,\ldots,f_k$.   (right monotonicity)
Williams and Beer also defined a function
$$I_{\min}(S;X_1,\ldots,X_k) = \sum_{s} P_S(s) \min_{i} \sum_{x_i} P_{X_i|S}(x_i|s) \log \frac{P_{S|X_i}(s|x_i)}{P_S(s)} \tag{5}$$
and showed that $I_{\min}$ satisfies their axioms.
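For the bivariate case, Equation (5) can be evaluated directly from the joint distribution. Below is a small Python sketch; the array layout `p[s, x1, x2]`, the base-2 logarithm, and the function name `i_min` are our own choices, not notation from the paper.

```python
import numpy as np

def i_min(p):
    """Williams-Beer I_min(S; X1, X2) of Equation (5), in bits.

    p: joint distribution as an array p[s, x1, x2].
    """
    p = np.asarray(p, dtype=float)
    p_s = p.sum(axis=(1, 2))                            # P(s)
    pair_marginals = [p.sum(axis=2), p.sum(axis=1)]     # P(s, x1) and P(s, x2)
    total = 0.0
    for s in range(p.shape[0]):
        if p_s[s] == 0:
            continue
        specific = []                                   # specific information I(S=s; X_i)
        for p_sxi in pair_marginals:
            p_xi = p_sxi.sum(axis=0)                    # P(x_i)
            val = 0.0
            for x, p_joint in enumerate(p_sxi[s]):
                if p_joint > 0:
                    # P(x_i|s) * log( P(s|x_i) / P(s) )
                    val += (p_joint / p_s[s]) * np.log2(p_joint / (p_xi[x] * p_s[s]))
            specific.append(val)
        total += p_s[s] * min(specific)
    return total

# Copy example from Section 2.2: S = (X1, X2) with independent uniform bits
p = np.zeros((4, 2, 2))
for x1 in range(2):
    for x2 in range(2):
        p[2 * x1 + x2, x1, x2] = 0.25
print(i_min(p))  # 1.0 bit, as discussed in Section 2.2
```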

2.2. The Copy Example and the Identity Axiom

Let $X_1, X_2$ be independent uniformly distributed binary random variables, and consider the copy function $\mathrm{COPY}(X_1,X_2) := (X_1,X_2)$. One point of criticism of $I_{\min}$ is the fact that $X_1$ and $X_2$ share $I_{\min}(\mathrm{COPY}(X_1,X_2);X_1,X_2) = 1$ bit about $\mathrm{COPY}(X_1,X_2)$ according to $I_{\min}$, even though they are independent. Harder et al. [4] argue that the shared information about the copied pair should equal the mutual information:
(Id)   $SI(\mathrm{COPY}(X_1,X_2);X_1,X_2) = I(X_1;X_2)$.      (Identity)
Ref. [4] also proposed a bivariate measure of shared information that satisfies (Id). Similarly, the measure of bivariate shared information proposed in [1] satisfies (Id). However, (Id) is incompatible with a nonnegative information decomposition according to the Williams–Beer axioms for $k \ge 3$ [2].
On the other hand, Ref. [5] uses an example from game theory to give an intuitive explanation of how even independent variables $X_1$ and $X_2$ can have nontrivial shared information. However, in any case, the value of 1 bit assigned by $I_{\min}$ is deemed too large.

2.3. The Blackwell Property and Property (*)

One of the reasons that it is so difficult to find good definitions of shared, unique or synergistic information is that a clear operational idea behind these notions is missing. Starting from an operational idea about decision problems, Ref. [1] proposed the following property for the unique information, which we now propose to call the Blackwell property:
(BP) For a given joint distribution $P_{SX_1X_2}$, $UI(S;X_1\setminus X_2)$ vanishes if and only if there exists a random variable $X_1'$ such that $S - X_2 - X_1'$ is a Markov chain and $P_{SX_1'} = P_{SX_1}$.    (Blackwell property)
In other words, the channel $S \to X_1$ is a garbling or degradation of the channel $S \to X_2$. Blackwell’s theorem [10] implies that this garbling property is equivalent to the fact that any decision problem in which the task is to predict $S$ can be solved just as well with the knowledge of $X_2$ as with the knowledge of $X_1$. We refer to Section 2 in [1] for the details.
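Whether one channel is a garbling of another can be decided by a small linear-programming feasibility check: we ask for a row-stochastic matrix $\lambda$ with $P_{X_1|S} = P_{X_2|S}\,\lambda$. The following Python sketch illustrates the idea; the function name `is_garbling` and the use of `scipy.optimize.linprog` are our own choices, and exact equality constraints may need a small numerical tolerance in practice.

```python
import numpy as np
from scipy.optimize import linprog

def is_garbling(k1, k2):
    """Is channel k1 (rows k1[s, x1] = P(x1|s)) a garbling of k2 (rows P(x2|s))?

    Feasibility of k1 = k2 @ L for a row-stochastic matrix L[x2, x1].
    """
    n_s, n1 = k1.shape
    n2 = k2.shape[1]
    n_var = n2 * n1                       # variables L[x2, x1], flattened
    a_eq, b_eq = [], []
    # garbling constraints: sum_x2 k2[s, x2] * L[x2, x1] = k1[s, x1]
    for s in range(n_s):
        for x1 in range(n1):
            row = np.zeros(n_var)
            for x2 in range(n2):
                row[x2 * n1 + x1] = k2[s, x2]
            a_eq.append(row)
            b_eq.append(k1[s, x1])
    # stochasticity: each row of L sums to one
    for x2 in range(n2):
        row = np.zeros(n_var)
        row[x2 * n1:(x2 + 1) * n1] = 1.0
        a_eq.append(row)
        b_eq.append(1.0)
    res = linprog(c=np.zeros(n_var), A_eq=np.array(a_eq), b_eq=b_eq,
                  bounds=[(0, 1)] * n_var, method="highs")
    return res.status == 0                # 0: a feasible (optimal) point was found

# trivial sanity check: every channel is a garbling of the identity channel
identity = np.eye(3)
noisy = np.array([[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]])
print(is_garbling(noisy, identity))  # True
```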
Ref. [1] also proposed the following property:
(*) $SI$ and $UI$ depend only on the marginal distributions $P_{SX_1}$ and $P_{SX_2}$ of the pairs $(S,X_1)$ and $(S,X_2)$.
This property was in part motivated by (BP), which also depends only on the channels $S \to X_1$ and $S \to X_2$ and thus on $P_{SX_1}$ and $P_{SX_2}$. Most information decompositions proposed so far satisfy property (*).

3. Extractable Information Measures

One can interpret $\overline{SI}$ as a measure of extractable shared information. We explain this idea in a more general setting.
For fixed $k$, let $IM(S;X_1,\ldots,X_k)$ be an arbitrary information measure that measures one aspect of the information that $X_1,\ldots,X_k$ contain about $S$. At this point, we do not specify what precisely an information measure is, except that it is a function that assigns a real number to any joint distribution of $S,X_1,\ldots,X_k$. The notation is, of course, suggestive of the fact that we mostly think about one of the measures $SI$, $UI$ or $CI$, in which the first argument plays a special role. However, $IM$ could also be the mutual information $I(S;X_1)$, the entropy $H(S)$, or the co-information $CoI(S;X_1,X_2)$. We define the corresponding extractable information measure as
$$\overline{IM}(S;X_1,\ldots,X_k) := \sup_{f} IM(f(S);X_1,\ldots,X_k), \tag{6}$$
where the supremum runs over all functions $f:\mathcal{S}\to\mathcal{S}'$ from the domain of $S$ to an arbitrary finite set $\mathcal{S}'$. The intuition is that $\overline{IM}$ is the maximal possible amount of $IM$ one can “extract” from $(X_1,\ldots,X_k)$ by transforming $S$. Clearly, the precise interpretation depends on the interpretation of $IM$.
This construction has the following general properties:
  • Most information measures satisfy $IM(O;X_1,\ldots,X_k) = 0$ when $O$ is a constant random variable. In this case, $\overline{IM}(S;X_1,\ldots,X_k) \ge 0$. For example, even though the co-information can be negative, the extractable co-information is never negative.
  • Suppose that $IM$ satisfies left monotonicity. Then, $\overline{IM} = IM$. For example, entropy $H$ and mutual information $I$ satisfy left monotonicity, and so $\overline{H} = H$ and $\overline{I} = I$. Similarly, as shown in [2], the measure of unique information $\widetilde{UI}$ defined in [1] satisfies left monotonicity, and so $\overline{\widetilde{UI}} = \widetilde{UI}$.
  • In fact, $\overline{IM}$ is the smallest left monotonic information measure that is at least as large as $IM$.
The next result shows that our construction preserves monotonicity properties of the other arguments of $IM$. It follows that, by iterating this construction, one can construct an information measure that is monotonic in all arguments.
Lemma 1.
Let $f_1,\ldots,f_k$ be fixed functions. If $IM$ satisfies $IM(S;f_1(X_1),\ldots,f_k(X_k)) \le IM(S;X_1,\ldots,X_k)$ for all $S$, then $\overline{IM}(S;f_1(X_1),\ldots,f_k(X_k)) \le \overline{IM}(S;X_1,\ldots,X_k)$ for all $S$.
Proof. 
Let $f^* = \arg\max_f IM(f(S);f_1(X_1),\ldots,f_k(X_k))$. Then,
$$\overline{IM}(S;f_1(X_1),\ldots,f_k(X_k)) = IM(f^*(S);f_1(X_1),\ldots,f_k(X_k)) \overset{(a)}{\le} IM(f^*(S);X_1,\ldots,X_k) \le \sup_f IM(f(S);X_1,\ldots,X_k) = \overline{IM}(S;X_1,\ldots,X_k),$$
where (a) follows from the assumptions. ☐
As a generalization of the construction, instead of looking at “deterministic extractability,” one can also look at “probabilistic extractability” and replace $f$ by a stochastic matrix. This leads to the definition
$$\overline{\overline{IM}}(S;X_1,\ldots,X_k) := \sup_{P_{S'|S}} IM(S';X_1,\ldots,X_k), \tag{7}$$
where the supremum now runs over all random variables $S'$ that are independent of $X_1,\ldots,X_k$ given $S$. The function $\overline{\overline{IM}}$ is the smallest function bounded from below by $IM$ that satisfies
(PLM) $IM(S;X_1,X_2) \ge IM(S';X_1,X_2)$ whenever $S'$ is independent of $X_1,X_2$ given $S$.      (probabilistic left monotonicity)
An example of this construction is the intrinsic conditional information $I(X;Y\downarrow Z) := \min_{P_{Z'|Z}} I(X;Y|Z')$, which was defined in [11] to study the secret-key rate, which is the maximal rate at which a secret can be generated by two agents knowing $X$ or $Y$, respectively, such that a third agent who knows $Z$ has arbitrarily small information about this key. The min instead of the max in the definition implies that $I(X;Y\downarrow Z)$ is “anti-monotone” in $Z$.
In this paper, we restrict ourselves to the deterministic notions, since many of the properties we want to discuss can already be explained using deterministic extractability. Moreover, the optimization problem (6) is a finite optimization problem and thus much easier to solve than Equation (7).

4. Extractable Shared Information

We now specialize to the case of shared information. The first result is that when we apply our construction to a measure of shared information that belongs to a bivariate information decomposition, we again obtain a bivariate information decomposition.
Lemma 2.
Suppose that $SI$ is a measure of shared information coming from a nonnegative bivariate information decomposition (satisfying Equations (1) to (3)). Then, $\overline{SI}$ defines a nonnegative information decomposition; that is, the derived functions
$$\overline{UI}^*(S;X_1\setminus X_2) := I(S;X_1) - \overline{SI}(S;X_1,X_2), \qquad \overline{UI}^*(S;X_2\setminus X_1) := I(S;X_2) - \overline{SI}(S;X_1,X_2),$$
$$\text{and}\quad \overline{CI}^*(S;X_1,X_2) := I(S;X_1X_2) - \overline{SI}(S;X_1,X_2) - \overline{UI}^*(S;X_1\setminus X_2) - \overline{UI}^*(S;X_2\setminus X_1)$$
are nonnegative. These quantities relate to the original decomposition by
$$\text{a)}\ \ \overline{SI}(S;X_1,X_2) \ge SI(S;X_1,X_2), \qquad \text{b)}\ \ \overline{CI}^*(S;X_1,X_2) \ge CI(S;X_1,X_2),$$
$$\text{c)}\ \ UI(f^*(S);X_1\setminus X_2) \le \overline{UI}^*(S;X_1\setminus X_2) \le UI(S;X_1\setminus X_2),$$
where $f^*$ is a function that achieves the supremum in Equation (4).
Proof. 
$$\text{a)}\ \ \overline{SI}(S;X_1,X_2) \ge SI(S;X_1,X_2) \ge 0,$$
$$\text{b)}\ \ \overline{CI}^*(S;X_1,X_2) = \overline{SI}(S;X_1,X_2) - CoI(S;X_1,X_2) \ge SI(S;X_1,X_2) - CoI(S;X_1,X_2) = CI(S;X_1,X_2) \ge 0,$$
$$\text{c)}\ \ \overline{UI}^*(S;X_1\setminus X_2) = I(S;X_1) - \overline{SI}(S;X_1,X_2) \le I(S;X_1) - SI(S;X_1,X_2) = UI(S;X_1\setminus X_2),$$
$$\overline{UI}^*(S;X_1\setminus X_2) = I(S;X_1) - \overline{SI}(S;X_1,X_2) \ge I(f^*(S);X_1) - SI(f^*(S);X_1,X_2) = UI(f^*(S);X_1\setminus X_2) \ge 0,$$
where we have used the data processing inequality. ☐
Lemma 3.
  1. If $SI$ satisfies (*), then $\overline{SI}$ also satisfies (*).
  2. If $SI$ is right monotonic, then $\overline{SI}$ is also right monotonic.
Proof. 
(1) is direct, and (2) follows from Lemma 1. ☐
Without further assumptions on $SI$, we cannot say much about when $\overline{SI}$ vanishes. However, the condition that $\overline{UI}^*$ vanishes has strong consequences.
Lemma 4.
Suppose that $\overline{UI}^*(S;X_1\setminus X_2)$ vanishes, and let $f^*$ be a function that achieves the supremum in Equation (4). Then, $X_1 - f^*(S) - S$ is a Markov chain. Moreover, $UI(f^*(S);X_1\setminus X_2) = 0$.
Proof. 
Suppose that $\overline{UI}^*(S;X_1\setminus X_2) = 0$. Then, $I(S;X_1) = \overline{SI}(S;X_1,X_2) = SI(f^*(S);X_1,X_2) \le I(f^*(S);X_1) \le I(S;X_1)$. Thus, the data processing inequality holds with equality. This implies that $X_1 - f^*(S) - S$ is a Markov chain. The identity $UI(f^*(S);X_1\setminus X_2) = 0$ follows from the same chain of inequalities. ☐
Theorem 1.
If $UI$ has the Blackwell property, then $\overline{UI}^*$ does not have the Blackwell property.
Proof. 
As shown in the example in the appendix, there exist random variables $S$, $X_1$, $X_2$ and a function $f$ that satisfy:
  1. $S$ and $X_1$ are independent given $f(S)$.
  2. The channel $f(S) \to X_1$ is a garbling of the channel $f(S) \to X_2$.
  3. The channel $S \to X_1$ is not a garbling of the channel $S \to X_2$.
We claim that $f$ solves the optimization problem (4). Indeed, for an arbitrary function $f'$,
$$SI(f'(S);X_1,X_2) \le I(f'(S);X_1) \le I(S;X_1) = I(f(S);X_1) = SI(f(S);X_1,X_2).$$
Here, $I(S;X_1) = I(f(S);X_1)$ follows from property 1, and the last equality follows from property 2 together with the Blackwell property of $UI$ and Equation (2). Thus, $f$ solves the maximization problem (4).
If $UI$ satisfies the Blackwell property, then properties 2 and 3 imply $UI(f(S);X_1\setminus X_2) = 0$ and $UI(S;X_1\setminus X_2) > 0$. On the other hand,
$$\overline{UI}^*(S;X_1\setminus X_2) = I(S;X_1) - \overline{SI}(S;X_1,X_2) = I(S;X_1) - SI(f(S);X_1,X_2) = I(S;X_1) - I(f(S);X_1) + UI(f(S);X_1\setminus X_2) = 0.$$
Thus, $\overline{UI}^*(S;X_1\setminus X_2)$ vanishes although, by property 3, the channel $S \to X_1$ is not a garbling of the channel $S \to X_2$; that is, $\overline{UI}^*$ does not satisfy the Blackwell property. ☐
Corollary 1.
There is no bivariate information decomposition in which $UI$ satisfies the Blackwell property and $SI$ satisfies left monotonicity.
Proof. 
If $SI$ satisfies left monotonicity, then $\overline{SI} = SI$. Thus, $UI = \overline{UI}^*$ cannot satisfy the Blackwell property by Theorem 1. ☐

5. Left Monotonic Information Decompositions

Is it possible to have an extractable information decomposition? More precisely, is it possible to have an information decomposition in which all information measures are left monotonic? The obvious strategy of starting with an arbitrary information decomposition and replacing each partial information measure by its extractable analogue does not work, since this would mean increasing all partial information measures (unless they are extractable already), but then their sum would also increase. For example, in the bivariate case, when $SI$ is replaced by a larger function $\overline{SI}$, then $UI$ needs to be replaced by a smaller function, due to the constraints (2) and (3).
As argued in [2], it is intuitive that $UI$ should be left monotonic. As argued above (and in [5]), it is also desirable that $SI$ be left monotonic. The intuition for synergy is much less clear. In the following, we restrict our focus to the bivariate case and study the implications of requiring both $SI$ and $UI$ to be left monotonic. Proposition 1 gives bounds on the corresponding $SI$ measure.
Proposition 1.
Suppose that $SI$, $UI$ and $CI$ define a bivariate information decomposition, and suppose that $SI$ and $UI$ are left monotonic. Then,
$$SI(f(X_1,X_2);X_1,X_2) \le I(X_1;X_2) \tag{8}$$
for any function $f$.
Before proving the proposition, let us make some remarks. Inequality (8) is related to the identity axiom. Indeed, it is easy to derive Inequality (8) from the identity axiom and from the assumption that $SI$ is left monotonic. Although Inequality (8) may seem plausible at first sight, none of the information decompositions proposed so far satisfies this property (the intersection information measure from [12] satisfies left monotonicity and has been proposed as a measure of shared information, but it does not lead to a nonnegative information decomposition).
Proof. 
If $SI$ is left monotonic, then
$$SI(f(X_1,X_2);X_1,X_2) \le SI(\mathrm{COPY}(X_1,X_2);X_1,X_2) = I(\mathrm{COPY}(X_1,X_2);X_1) - UI(\mathrm{COPY}(X_1,X_2);X_1\setminus X_2).$$
If $UI$ is left monotonic, then
$$UI(\mathrm{COPY}(X_1,X_2);X_1\setminus X_2) \ge UI(X_1;X_1\setminus X_2) = I(X_1;X_1) - SI(X_1;X_1,X_2).$$
Note that $I(X_1;X_1) = H(X_1) = I(\mathrm{COPY}(X_1,X_2);X_1)$ and
$$SI(X_1;X_1,X_2) = I(X_1;X_2) - UI(X_1;X_2\setminus X_1) = I(X_1;X_2).$$
Putting these inequalities together, we obtain $SI(f(X_1,X_2);X_1,X_2) \le I(X_1;X_2)$. ☐

6. Examples

In this section, we apply our construction to Williams and Beer’s measure $I_{\min}$ [3] and to the bivariate measure of shared information $\widetilde{SI}$ proposed in [1].
First, we make some remarks on how to compute the extractable information measure (under the assumption that one knows how to compute the underlying information measure itself). The optimization problem (4) is a discrete optimization problem. The search space is the set of functions from the support $\mathcal{S}$ of $S$ to some finite set $\mathcal{S}'$. For the information measures that we have in mind, we may restrict to surjective functions $f$, since the information measures only depend on events with positive probability. Thus, we may restrict to sets $\mathcal{S}'$ with $|\mathcal{S}'| \le |\mathcal{S}|$. Moreover, the information measures are invariant under permutations of the alphabet $\mathcal{S}'$. Therefore, the only thing that matters about $f$ is which elements of $\mathcal{S}$ are mapped to the same element of $\mathcal{S}'$. Thus, any function $f:\mathcal{S}\to\mathcal{S}'$ corresponds to a partition of $\mathcal{S}$, where $s, s' \in \mathcal{S}$ belong to the same block if and only if $f(s) = f(s')$, and it suffices to look at all such partitions. The number of partitions of a finite set $\mathcal{S}$ is the Bell number $B_{|\mathcal{S}|}$.
The Bell numbers increase super-exponentially, and for larger sets $\mathcal{S}$, the search space of the optimization problem (4) becomes quite large. For smaller problems, enumerating all partitions in order to find the maximum is still feasible (a brute-force sketch is given after the table of Bell numbers below). For larger problems, one would need a better understanding of the optimization problem. For reference, some Bell numbers are:
n      3    4     6      10
B_n    5    15    203    115975
As always, symmetries may help: in the Copy example discussed below, where $|\mathcal{S}| = 4$, it suffices to study six functions instead of $B_4 = 15$.
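The following Python sketch makes the brute-force strategy explicit: it enumerates all partitions of the support of $S$, coarse-grains the target accordingly, and maximizes a user-supplied measure. The function names (`partitions`, `extractable`), the array layout `p[s, x1, x2]`, and the callback interface are our own conventions.

```python
import numpy as np

def partitions(items):
    """Yield all set partitions of `items` as lists of blocks (B_n of them)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # put `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [part[i] + [first]] + part[i + 1:]
        # ... or into a new block of its own
        yield part + [[first]]

def extractable(measure, p):
    """sup_f measure(f(S); X1, X2) over all coarse-grainings f of the target.

    p: joint distribution p[s, x1, x2]; measure: callable on such arrays
    (e.g. the i_min sketch from Section 2.1).
    """
    best = -np.inf
    for part in partitions(range(p.shape[0])):
        block_of = {s: i for i, block in enumerate(part) for s in block}
        q = np.zeros((len(part),) + p.shape[1:])
        for s in range(p.shape[0]):
            q[block_of[s]] += p[s]        # merge target values in the same block
        best = max(best, measure(q))
    return best
```

Since the number of partitions is the Bell number $B_{|\mathcal{S}|}$, this brute-force approach is only practical for small target alphabets, in line with the remarks above.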
We now compare the measure $\overline{I}_{\min}$, an extractable version of Williams and Beer’s measure $I_{\min}$ (see Equation (5) above), to the measure $\overline{\widetilde{SI}}$, an extractable version of the measure $\widetilde{SI}$ proposed in [1]. For the latter, we briefly recall the definitions. Let $\Delta$ be the set of all joint distributions of random variables $(S,X_1,X_2)$ with given state spaces $\mathcal{S}$, $\mathcal{X}_1$, $\mathcal{X}_2$. Fix $P = P_{SX_1X_2} \in \Delta$. Define $\Delta_P$ as the set of all distributions $Q_{SX_1X_2}$ that preserve the marginals of the pairs $(S,X_1)$ and $(S,X_2)$; that is,
$$\Delta_P := \bigl\{ Q_{SX_1X_2} \in \Delta : Q_{SX_1} = P_{SX_1},\ Q_{SX_2} = P_{SX_2} \bigr\}.$$
Then, define the functions
$$\widetilde{UI}(S;X_1\setminus X_2) := \min_{Q\in\Delta_P} I_Q(S;X_1|X_2), \qquad \widetilde{UI}(S;X_2\setminus X_1) := \min_{Q\in\Delta_P} I_Q(S;X_2|X_1),$$
$$\widetilde{SI}(S;X_1,X_2) := \max_{Q\in\Delta_P} CoI_Q(S;X_1,X_2), \qquad \widetilde{CI}(S;X_1,X_2) := I(S;X_1X_2) - \min_{Q\in\Delta_P} I_Q(S;X_1X_2),$$
where the index $Q$ in $I_Q$ or $CoI_Q$ indicates that the corresponding quantity is computed with respect to the joint distribution $Q$. The decomposition corresponding to $\widetilde{SI}$ satisfies the Blackwell property and the identity axiom [1]. $\widetilde{UI}$ is left monotonic, but $\widetilde{SI}$ is not [2]. In particular, $\overline{\widetilde{SI}} \neq \widetilde{SI}$. $\widetilde{SI}$ can be characterized as the smallest measure of shared information that satisfies property (*). Therefore, $\overline{\widetilde{SI}}$ is the smallest left monotonic measure of shared information that satisfies property (*).
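Computing $\widetilde{UI}$ (and hence, via Equation (2), $\widetilde{SI}(S;X_1,X_2) = I(S;X_1) - \widetilde{UI}(S;X_1\setminus X_2)$) requires a convex optimization over $\Delta_P$. The following is a rough, generic Python sketch using `scipy.optimize.minimize`; the parameterization, the function names, and the choice of solver are our own, and a dedicated solver along the lines discussed in [1] is preferable in practice.

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint, Bounds

def cmi(q_flat, shape, eps=1e-12):
    """I_Q(S; X1 | X2) in bits for a flattened joint distribution Q[s, x1, x2]."""
    q = np.clip(q_flat.reshape(shape), 0.0, None)
    p_x2 = q.sum(axis=(0, 1))
    p_sx2 = q.sum(axis=1)
    p_x1x2 = q.sum(axis=0)
    ratio = (q * p_x2[None, None, :] + eps) / (p_sx2[:, None, :] * p_x1x2[None, :, :] + eps)
    return float(np.sum(q * np.log2(ratio)))

def ui_tilde(p):
    """Rough sketch of UI~(S; X1 \\ X2) = min over Delta_P of I_Q(S; X1 | X2)."""
    p = np.asarray(p, dtype=float)
    shape = p.shape
    rows, rhs = [], []
    # equality constraints defining Delta_P: fix the (S,X1) and (S,X2) marginals
    for s in range(shape[0]):
        for x1 in range(shape[1]):
            a = np.zeros(shape)
            a[s, x1, :] = 1.0
            rows.append(a.ravel())
            rhs.append(p[s, x1, :].sum())
        for x2 in range(shape[2]):
            a = np.zeros(shape)
            a[s, :, x2] = 1.0
            rows.append(a.ravel())
            rhs.append(p[s, :, x2].sum())
    res = minimize(cmi, p.ravel(), args=(shape,), method="trust-constr",
                   constraints=[LinearConstraint(np.array(rows), rhs, rhs)],
                   bounds=Bounds(0.0, 1.0))
    return float(res.fun)
```

Such a generic solver gives at best a numerical approximation; the quality of the result depends on the optimizer and its tolerances.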
Let $\mathcal{X}_1 = \mathcal{X}_2 = \{0,1\}$, and let $X_1, X_2$ be independent uniformly distributed random variables. Table 1 collects the values of shared information about $f(X_1,X_2)$ for various functions $f$ (in bits).
The function $f_1 : \{00,01,10,11\} \to \{0,1,2\}$ is defined as
$$f_1(X_1,X_2) := \begin{cases} X_1, & \text{if } X_2 = 1, \\ 2, & \text{if } X_2 = 0. \end{cases}$$
The Sum function is defined as $f(X_1,X_2) := X_1 + X_2$. Table 1 contains (up to symmetry) all possible non-trivial functions $f$. The values for the extractable measures are derived from the values of the corresponding non-extractable measures. Note that the values of the extractable versions differ from the original ones only for $\mathrm{COPY}$. In these examples, $\overline{I}_{\min} = I_{\min}$, but as shown in [5], $I_{\min}$ is not left monotonic in general.
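As an illustration, and reusing the hypothetical `i_min` and `extractable` sketches from Sections 2.1 and 6 above, the $I_{\min}$ entries of Table 1 can be reproduced numerically; for instance, for the Copy example:

```python
import numpy as np

# Copy example: S = (X1, X2) with independent uniform bits
p_copy = np.zeros((4, 2, 2))
for x1 in range(2):
    for x2 in range(2):
        p_copy[2 * x1 + x2, x1, x2] = 0.25

print(i_min(p_copy))               # 1.0 bit  (I_min column of Table 1)
print(extractable(i_min, p_copy))  # 1.0 bit  (extractable version, unchanged)
```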

7. Conclusions

We introduced a new measure of shared information that satisfies the left monotonicity property with respect to local operations on the target variable. Left monotonicity corresponds to the idea that local processing will remove information in the target variable and thus should lead to lower values of measures which quantify information about the target variable. Our measure fits the bivariate information decomposition framework; that is, we also obtain corresponding measures of unique and synergistic information. However, we also have shown that left monotonicity for the shared information contradicts the Blackwell property of the unique information, which limits the value of a left monotonic measure of shared information for information decomposition.
We also presented an alternative interpretation of the construction used in this paper. Starting from an arbitrary measure of shared information $SI$ (which need not be left monotonic), we interpret the left monotonic measure $\overline{SI}$ as the amount of shared information that can be extracted from $S$ by local processing.
Our initial motivation for the construction of $\overline{SI}$ was the question of to what extent shared information originates from the redundancy between the predictors $X_1$ and $X_2$, and to what extent it is created by the mechanism that generates $S$. These two different flavors of redundancy were called source redundancy and mechanistic redundancy, respectively, in [4]. While $\overline{SI}$ cannot be used to completely disentangle source and mechanistic redundancy, it can be seen as a measure of the maximum amount of redundancy that can be created from $S$ using a (deterministic) mechanism. In this sense, we believe that it is an important step towards a better understanding of this problem and related questions.

Acknowledgments

We thank the participants of the PID workshop at FIAS in Frankfurt in December 2016 for many stimulating discussions on this subject. Nils Bertschinger thanks H. C. Maucher for funding his position.

Author Contributions

The research was initiated by Jürgen Jost and carried out by all authors, with main contributions of Johannes Rauh and Pradeep Kr. Banerjee. The manuscript was written by Johannes Rauh, Pradeep Kr. Banerjee and Eckehard Olbrich. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Counterexample in Theorem 1

Consider the joint distribution
$f(s)$   $s$   $x_1$   $x_2$   $P_{f(S)SX_1X_2}$
0        0     0       0       1/4
0        1     0       1       1/4
0        0     1       0       1/8
0        1     1       0       1/8
1        2     1       1       1/4
and the function $f : \{0,1,2\} \to \{0,1\}$ with $f(0) = f(1) = 0$ and $f(2) = 1$. Then, $X_1$ and $X_2$ are independent uniform binary random variables, and $f(S) = \mathrm{And}(X_1,X_2)$. In addition, $S - f(S) - X_1$ is a Markov chain. By symmetry, the joint distributions of the pairs $(f(S),X_1)$ and $(f(S),X_2)$ are identical, and so the two channels $f(S) \to X_1$ and $f(S) \to X_2$ are identical; hence, trivially, one is a garbling of the other. However, one can check that the channel $S \to X_1$ is not a garbling of the channel $S \to X_2$.
This example is discussed in more detail in Rauh et al. [13].
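For readers who want to verify these claims numerically, here is a small self-contained Python check of the first three properties; the last one, that $S \to X_1$ is not a garbling of $S \to X_2$, can be confirmed with the linear-programming sketch from Section 2.3. The array layout and variable names are our own.

```python
import numpy as np

# joint distribution P(s, x1, x2) from the table above
p = np.zeros((3, 2, 2))
p[0, 0, 0] = 1/4
p[1, 0, 1] = 1/4
p[0, 1, 0] = 1/8
p[1, 1, 0] = 1/8
p[2, 1, 1] = 1/4

f = np.array([0, 0, 1])                  # f(0) = f(1) = 0, f(2) = 1

# X1 and X2 are independent uniform binary variables
assert np.allclose(p.sum(axis=0), 0.25)

# f(S) = And(X1, X2) on the support of P
for s, x1, x2 in zip(*np.nonzero(p)):
    assert f[s] == (x1 & x2)

for fs in range(2):
    block = p[f == fs]                   # restrict to {s : f(s) = fs}
    mass = block.sum()
    # S -- f(S) -- X1 is a Markov chain: P(s, x1 | f(S)) factorizes
    joint = block.sum(axis=2) / mass     # P(s, x1 | f(S) = fs)
    assert np.allclose(joint, joint.sum(axis=1, keepdims=True)
                              * joint.sum(axis=0, keepdims=True))
    # the channels f(S) -> X1 and f(S) -> X2 coincide
    assert np.allclose(block.sum(axis=(0, 2)) / mass,
                       block.sum(axis=(0, 1)) / mass)

print("all checks passed")
```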

References

  1. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying unique information. Entropy 2014, 16, 2161–2183.
  2. Rauh, J.; Bertschinger, N.; Olbrich, E.; Jost, J. Reconsidering unique information: Towards a multivariate information decomposition. In Proceedings of the 2014 IEEE International Symposium on Information Theory (ISIT), Honolulu, HI, USA, 29 June–4 July 2014; pp. 2232–2236.
  3. Williams, P.; Beer, R. Nonnegative Decomposition of Multivariate Information. arXiv 2010, arXiv:1004.2515v1.
  4. Harder, M.; Salge, C.; Polani, D. A bivariate measure of redundant information. Phys. Rev. E 2013, 87, 012130.
  5. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J. Shared Information—New Insights and Problems in Decomposing Information in Complex Systems. In Proceedings of the European Conference on Complex Systems 2012, Brussels, Belgium, 2–7 September 2012; pp. 251–269.
  6. Griffith, V.; Koch, C. Quantifying Synergistic Mutual Information. In Guided Self-Organization: Inception; Prokopenko, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2014; Volume 9, pp. 159–190.
  7. Wibral, M.; Priesemann, V.; Kay, J.W.; Lizier, J.T.; Phillips, W.A. Partial information decomposition as a unified approach to the specification of neural goal functions. Brain Cogn. 2017, 112, 25–38.
  8. Bell, A.J. The Co-Information Lattice. In Proceedings of the Fourth International Workshop on Independent Component Analysis and Blind Signal Separation (ICA 03), Nara, Japan, 1–4 April 2003; pp. 921–926.
  9. McGill, W. Multivariate information transmission. IRE Trans. Inf. Theory 1954, 4, 93–111.
  10. Blackwell, D. Equivalent Comparisons of Experiments. Ann. Math. Stat. 1953, 24, 265–272.
  11. Maurer, U.; Wolf, S. The intrinsic conditional mutual information and perfect secrecy. In Proceedings of the 1997 IEEE International Symposium on Information Theory, Ulm, Germany, 29 June–4 July 1997.
  12. Griffith, V.; Chong, E.K.P.; James, R.G.; Ellison, C.J.; Crutchfield, J.P. Intersection Information Based on Common Randomness. Entropy 2014, 16, 1985–2000.
  13. Rauh, J.; Banerjee, P.K.; Olbrich, E.; Jost, J.; Bertschinger, N.; Wolpert, D. Coarse-graining and the Blackwell order. arXiv 2017, arXiv:1701.07805.
Table 1. Shared information about $f(X_1,X_2)$ for various functions $f$ (in bits).

f         $I_{\min}$       $\overline{I}_{\min}$    $\widetilde{SI}$     $\overline{\widetilde{SI}}$
Copy      1                1                        0                    1/2
And/Or    3/4 log(4/3)     3/4 log(4/3)             3/4 log(4/3)         3/4 log(4/3)
Xor       0                0                        0                    0
Sum       1/2              1/2                      1/2                  1/2
$X_1$     0                0                        0                    0
$f_1$     1/2              1/2                      0                    0
