Article

A Foundational Approach to Generalising the Maximum Entropy Inference Process to the Multi-Agent Context

School of Mathematics, University of Leeds, Leeds LS2 9JT, UK
Submission received: 1 December 2014 / Revised: 10 December 2014 / Accepted: 13 January 2015 / Published: 2 February 2015
(This article belongs to the Special Issue Maximum Entropy Applied to Inductive Logic and Reasoning)

Abstract

The present paper seeks to establish a logical foundation for studying axiomatically multi-agent probabilistic reasoning over a discrete space of outcomes. We study the notion of a social inference process which generalises the concept of an inference process for a single agent which was used by Paris and Vencovská to characterise axiomatically the method of maximum entropy inference. Axioms for a social inference process are introduced and discussed, and a particular social inference process called the Social Entropy Process, or SEP, is defined which satisfies these axioms. SEP is justified heuristically by an information theoretic argument, and incorporates both the maximum entropy inference process for a single agent and the multi–agent normalised geometric mean pooling operator.

1. Introduction

In this introduction we describe briefly the context to the conceptual framework first sketched in [1], and which is developed further in the present work. In section 1.1 we explain how the present paper is structured, while in the remaining sections of the chapter we introduce some necessary background ideas and technical prerequisites. We also indicate in some places in this chapter details which can be omitted by readers who are only interested in some aspects of the present work.

1.1. Overall Structure

Intuitively a social inference process is just a general method for aggregating the partially defined probabilistic beliefs of a finite number of agents into a single probabilistic belief function. While the probabilistic beliefs of each individual agent are assumed to be consistent, it is not assumed that the union of the beliefs of any two or more agents is consistent.
The notion of a social inference process includes as special cases two much older, but quite distinct, concepts from probabilistic reasoning: the notion of a single agent inference process of [2] and [3], and the notion of a multi–agent discrete probabilistic pooling operator familiar from decision theory (see [4] or [5]). Both of these older notions have been studied intensively from an axiomatic standpoint, with considerable success, particularly in the case of inference processes.
One aim of this paper is to illustrate how the use of the axiomatic method applied to social inference processes can illuminate the study of particular examples of such processes. In particular it can illustrate how an initially attractive, but fundamentally ad hoc, definition of a social inference process may fail some quite basic desideratum. On the other hand the formalisation inherent in the axiomatic study of social inference processes may perhaps dissuade researchers from naively criticising a social inference process for failing to satisfy a combination of desiderata which cannot in fact be satisfied by any social inference process. There is an interesting historic parallel here with the case of (single agent) inference processes: the centre of mass inference process CM was well-known and popular 25 years ago for presumably pragmatic reasons, yet in [3] it was shown that it fails to satisfy some quite elementary desiderata such as Language Invariance, which had not previously been formulated. On the other hand at the time of the first rigorous axiomatic treatment of the notion of inference process in [2] and [3] the maximum entropy inference process ME was often criticised for its failure to satisfy a desideratum known as representation independence, a superficially attractive principle which Paris [3] showed with a simple proof to be incoherent, since it cannot be satisfied by any inference process. The historical point being made here is that had the axiomatic approach to inference processes been formulated earlier this would have spared extensive, but pointless, criticisms of ME on the grounds that it was “representation dependent”1.
The necessary background material and notation covering inference processes and pooling operators respectively is covered briefly in sections 1.2 and 1.3 below, while in section 1.4 the notion of a social inference process is formally introduced.
Chapter 2 is devoted to developing an axiomatic framework in order to capture the intended intuitive notion corresponding to the formal representation of a social inference process. This requires some considerable care in first formulating informally exactly what notion it is that we are trying to capture. We may then test the consequences of our subsequent axiomatic formalisation against our intuitions and experience in a process which may later be refined and iterated. Such an approach to the foundations of a mathematically tractable domain of thought is sometimes referred to by logicians and philosophers of mathematics as informal rigour2. Accordingly the first two sections of Chapter 2 are devoted to a detailed analysis of the heuristics and assumptions lying behind our approach. Although these sections are important in terms of justifying and explaining our methodology, they are not required for the formal development, and may therefore be omitted by readers who are only interested in the latter. In section 2.3 we develop a set of principles which we believe that any social inference process should satisfy on the basis of the assumptions explained in sections 2.1 and 2.2.
Chapter 3 is devoted to the particular social inference process SEP, the Social Entropy Process, first defined in [1]. In section 3.1, we formally define SEP, making clear the information theoretic intuitions behind the definition, and establishing certain structural properties, including the relationship of SEP to ME, minimum cross–entropy, and the normalised geometric mean pooling operator. A number of technical results are necessary to this development to ensure that it makes sense mathematically. The reader who wishes to skip these details on a first reading can glean the bare definition of SEP from definitions 8, 10 and 12.
In section 3.2 we prove that SEP satisfies all the principles formulated in Section 2.3. In section 3.3 we consider briefly certain other principles for an inference process resulting from possible generalisations of principles satisfied by ME.
Our definition of SEP in 3.1 proceeds in two stages. At the first stage the probabilistic information K from all the agents is merged by a natural and informationally conservative process Δ to form a non–empty closed convex set of probability functions Δ(K), which can be considered as the preferred set of possible probabilistic belief functions of the collective, or “collective knowledge base”. At the second stage the unique probability function from the set Δ(K) which has maximum entropy is chosen to be the definitive belief function of the collective, and is denoted by ME Δ(K). At first sight the necessity to impose such a second stage in order to extract a unique probabilistic belief function might seem like an ad hoc artifice. However in Chapter 4 we show that the second stage of the definition can be eliminated by imagining that the agents collectively appoint a new, unbiased, and self–effacing agent as a chairman, whose own personal belief function assigns equal probability to each possible outcome. The chairman then seeks to minimise her own influence by imagining that each of the other agents has been replaced by n clones, where n is large. If the chairman then calculates the first stage procedure for the entire virtual set of agents including herself, and lets n tend to infinity, the result converges to the same single probability function as that defined by SEP, thus eliminating any direct use of ME. The technical theorem corresponding to this result is stated and proved in 4.2.
In Chapter 5 we give a brief critical evaluation of our work, suggest directions for future research, and list a number of open problems.

1.2. Basic Concepts and Notation for Single Agent Probabilistic Inference

The framework and terminology which we introduce in this section is in essence that of Paris and Vencovská [2,3], which we will extend in section 1.4 to the multi–agent context.
In order to fix notation let $At = \{\alpha_1, \alpha_2, \ldots, \alpha_J\}$ denote some fixed finite set of mutually exclusive and exhaustive atomic events, or, as we prefer to think of them in a logical framework, atoms of some finite Boolean algebra of propositions. We shall refer to $At = \{\alpha_1, \alpha_2, \ldots, \alpha_J\}$ as atoms. A probability function w on At is a function $w : At \to [0, 1]$ such that $\sum_{j=1}^{J} w(\alpha_j) = 1$. Slightly abusing notation we will identify w with the vector of values $\langle w_1, \ldots, w_J \rangle$ where $w_j$ denotes $w(\alpha_j)$ for $j = 1, \ldots, J$. The set of all such vectors is denoted by $\mathbb{D}^J$. All other more complex events considered are equivalent to disjunctions of the $\alpha_j$ and are represented by the Greek letters θ, ϕ, ψ etc. A probability function w is assumed to extend so as to take values on complex events in the standard way, i.e., for any θ
$$w(\theta) = \sum_{\alpha_j \models \theta} w(\alpha_j)$$
where ⊨ denotes the classical notion of logical implication, and whenever a sentence θ ∈ SL is not satisfiable we set w(θ) = 0. Conditional probabilities are defined in the usual manner
$$w(\theta \mid \phi) = \frac{w(\theta \wedge \phi)}{w(\phi)}$$
when w(ϕ) ≠ 0 and are left undefined otherwise.
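The definitions above can be made concrete with a small sketch. It assumes three atoms and an arbitrary illustrative belief function; the names `w`, `prob` and `cond_prob` are hypothetical and do not come from the paper. An event is represented by the set of atoms occurring in its disjunction, so that an unsatisfiable sentence corresponds to the empty set.

```python
from fractions import Fraction

# Hypothetical example: three atoms with an illustrative belief function.
w = {"a1": Fraction(1, 2), "a2": Fraction(1, 3), "a3": Fraction(1, 6)}

def is_probability_function(w):
    """Check that every value lies in [0, 1] and that the values sum to 1."""
    return all(0 <= p <= 1 for p in w.values()) and sum(w.values()) == 1

def prob(w, event):
    """w(theta): the sum of w over the atoms that imply the event.

    The event is given as the set of atoms in its disjunction; the empty
    set stands for an unsatisfiable sentence and receives probability 0.
    """
    return sum((w[a] for a in event), Fraction(0))

def cond_prob(w, theta, phi):
    """w(theta | phi) = w(theta and phi) / w(phi), or None if w(phi) = 0."""
    denom = prob(w, phi)
    if denom == 0:
        return None  # left undefined, as in the text
    return prob(w, theta & phi) / denom

theta = {"a1", "a2"}
phi = {"a2", "a3"}
print(prob(w, theta))            # 5/6
print(cond_prob(w, theta, phi))  # 2/3
```

Exact rational arithmetic is used here only so that the normalisation condition can be checked with equality rather than a tolerance.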
If some $w \in \mathbb{D}^J$ represents the subjective belief of an individual A in the outcomes of At we refer to w as A’s belief function. We note that in this paper the term “belief function” will always denote a probability function in the above sense.
Remark 1. We should note that in the framework of Paris and Vencovská the atoms $\alpha_1, \alpha_2, \ldots, \alpha_J$ of At are usually taken to be the atoms of the Boolean (Lindenbaum) algebra generated by a finite language of the propositional calculus $L = \{p_1, \ldots, p_k\}$ where the $p_i$ are the propositional variables. Thus up to logical equivalence the atoms are just the $2^k$ sentences of the form
$$\bigwedge_{i=1}^{k} \pm p_i$$
where $\pm p_i$ denotes either $p_i$ or $\neg p_i$. In such a presentation J is $2^k$ and so is necessarily a power of 2. More complex “events” are just sentences of the language, which by the disjunctive normal form theorem are logically equivalent to disjunctions of atoms. This addition of an extra semantic layer in the form of an underlying language L which generates the atoms has important conceptual advantages in the formulation and justification of certain natural principles such as the Language Invariance and Irrelevant Information principles of [3]. However since we shall only consider principles of this latter type in sections 3.3 and 5, we may otherwise assume that the mutually exclusive and exhaustive atoms $\alpha_1, \alpha_2, \ldots, \alpha_J$ are given a priori, rather than being generated as atoms of a propositional language L, and we may then allow J to take any positive integral value3.
The problematic of Paris and Vencovská is that of a single individual A whose belief function w is in general not completely specified, but whose set of beliefs is instead regarded as a set of constraints K on the possible values which the vector w may take. The constraint set K therefore defines a certain subregion of $\mathbb{D}^J$, denoted by $V_K$, consisting of all vectors $w \in \mathbb{D}^J$ which satisfy the constraints in K. In the special case when K is the empty set of constraints, the corresponding region $V_K$ is just $\mathbb{D}^J$ itself. We say that K is consistent if $V_K \neq \emptyset$, and that w is consistent with K if $w \in V_K$.
It is assumed that the constraint sets K which we consider are consistent, and are such that $V_K$ has pleasant geometrical properties. More precisely, the exact requirement on a set of constraints K is that the set $V_K$ forms a non-empty closed convex region of Euclidean space. Throughout the rest of this paper all constraint sets to which we refer will be assumed to satisfy this requirement, and we shall refer to such constraint sets as nice constraint sets. This formulation ensures that linear equality constraint conditions such as w(θ) = a, w(ϕ) = b w(ψ), and w(ψ | θ) = c, where a, b, c ∈ [0, 1] and θ, ϕ, and ψ are Boolean combinations of the $\alpha_j$’s, are all permissible in K provided that the resulting constraint set K is consistent. Here a conditional constraint such as w(ψ | θ) = c is interpreted as w(ψ ∧ θ) = c w(θ), which is always a well-defined linear constraint, albeit vacuous when w(θ) = 0. See e.g. [3] for further details.
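To illustrate why a conditional constraint is linear in w, the following sketch builds the coefficient vector r for which the constraint w(ψ | θ) = c, read as w(ψ ∧ θ) = c w(θ), becomes the homogeneous linear equation Σ r_j w_j = 0. The function name and the set-of-atoms representation of events are assumptions made for this example only.

```python
from fractions import Fraction

def conditional_constraint_row(atoms, psi, theta, c):
    """Coefficients r such that sum_j r[j] * w_j = 0 exactly when
    w(psi and theta) = c * w(theta).

    Events psi and theta are given as sets of atoms. An atom in both
    events contributes 1 - c, an atom in theta only contributes -c,
    and an atom outside theta contributes 0.
    """
    row = []
    for a in atoms:
        coeff = Fraction(0)
        if a in theta:
            coeff -= c
            if a in psi:
                coeff += 1
        row.append(coeff)
    return row

atoms = ["a1", "a2", "a3", "a4"]
psi, theta = {"a1", "a2"}, {"a2", "a3"}
row = conditional_constraint_row(atoms, psi, theta, Fraction(1, 4))

# A belief function with w(psi | theta) = (1/10) / (4/10) = 1/4.
w = [Fraction(2, 5), Fraction(1, 10), Fraction(3, 10), Fraction(1, 5)]
print(sum(r * p for r, p in zip(row, w)))  # 0

# The constraint holds vacuously when w(theta) = 0.
w_zero = [Fraction(1, 2), Fraction(0), Fraction(0), Fraction(1, 2)]
print(sum(r * p for r, p in zip(row, w_zero)))  # 0
```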
We should perhaps remark here that while we have allowed the notion of a nice set of constraints to include more general constraints which do not have the form of linear equalities of the type above, the philosophical justification for the approach which we develop in the present paper is most clearly applicable when the constraints have this form. This observation does not however affect in any way the validity of the formal mathematical results.
A nice set of constraints K as above is called a knowledge base. Where these constraints correspond to an individual A’s probabilistic beliefs, we say that A has knowledge base K. Note that if $K_1$ and $K_2$ are knowledge bases, then $V_{K_1 \cup K_2} = V_{K_1} \cap V_{K_2}$, and that $K_1 \cup K_2$ is also a knowledge base provided that it is consistent.
Paris and Vencovská ask the question: given that an individual A’s belief function is subject to the constraint set K, by what rational principles should A choose her belief function w consistent with K, in the absence of any other information?
A rule I which for every such K chooses such a $w \in V_K$ is called an inference process. Given K we denote the belief function w chosen by I by I(K). The question above can then be reformulated as: what self-evident general principles should an inference process I satisfy? This question has been intensively studied over the last twenty–five years, and much is known. In particular in [2], Paris and Vencovská found an elegant set of principles which uniquely characterise the maximum entropy inference process4, ME, which is defined as follows: given K as above, ME(K) chooses that unique belief function w which maximises the Shannon entropy of w, defined as
$$-\sum_{j=1}^{J} w_j \log w_j$$
subject to the condition that $w \in V_K$. Although some of the principles used to characterise ME may individually be open to philosophical challenge, they are sufficiently convincing overall to give ME the appearance of a gold standard, in the sense that no other known inference process satisfies an equally convincing set of principles. Other popular inference processes which satisfy many, but not all, of these principles are the minimum distance inference process, MD, the limit centre of mass process, CM, all Renyi inference processes, and the Maximin process of [6]5. The Paris-Vencovská axiomatic characterisation of ME is particularly striking because it is quite independent of historically much earlier justifications of ME which stem either from ideas in statistical mechanics (see [7–9]), or from axiomatic treatments of the concept of information itself (as in [10–12]). While both of the latter kinds of treatment are conceptually attractive it might be argued that they carry more philosophical baggage than does a purely axiomatic treatment of the desiderata to be satisfied by an abstract notion of inference process.
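As a minimal numerical illustration of the definition of ME, consider the deliberately simple knowledge base K = {w(α_1) = a}, which is an assumption of this example rather than a case treated in the text. For this K the entropy maximiser shares the remaining mass 1 − a equally among the other atoms. The sketch below computes that vector and checks on a grid that no other member of V_K has larger Shannon entropy; the function names are hypothetical.

```python
import math

def shannon_entropy(w):
    """Return -sum_j w_j log w_j, using the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in w if p > 0)

def me_first_atom_fixed(a, J):
    """ME(K) for the illustrative knowledge base K = {w(alpha_1) = a}.

    Assumes J >= 2 and 0 <= a <= 1. The remaining mass 1 - a is shared
    equally among the other J - 1 atoms, which maximises the entropy.
    """
    return [a] + [(1 - a) / (J - 1)] * (J - 1)

J, a = 3, 0.5
best = me_first_atom_fixed(a, J)

# Every member of V_K has the form <a, t, 1 - a - t>; scan a grid of t values.
candidates = [[a, t / 100, 1 - a - t / 100] for t in range(0, 51)]
assert all(shannon_entropy(c) <= shannon_entropy(best) + 1e-12 for c in candidates)
print(best, shannon_entropy(best))
```

A grid scan is not a proof of optimality; it only makes the constrained maximisation tangible for one small case.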

1.3. Pooling Operators

An apparently very different framework of probabilistic inference, this time in the multi–agent context, has been much studied in the decision theoretic literature. Given the set of possible atoms At as before, let $\{A_i \mid i = 1, \ldots, m\}$ be a finite set of agents each of whom possesses her own particular probabilistic belief function $w^{(i)}$ on At, and let us suppose that these $w^{(i)}$ have already been determined. How then should these individual belief functions be aggregated so as to yield a single probabilistic belief function v which most accurately represents the collective beliefs of the agents? We call such an aggregated belief function a social belief function, and a general method of aggregation a pooling operator. Again we can ask: what principles should a pooling operator satisfy? In this framework various plausible principles have been investigated extensively in the literature, and have in particular been used to characterise two popular, but very different pooling operators LinOp and LogOp. LinOp takes v to be the arithmetic mean of the $w^{(i)}$
$$v_j = \frac{1}{m} \sum_{i=1}^{m} w_j^{(i)} \quad \text{for each } j = 1, \ldots, J$$
whereas LogOp chooses v to be the normalised geometric mean given by
$$v_j = \frac{\left( \prod_{i=1}^{m} w_j^{(i)} \right)^{1/m}}{\sum_{k=1}^{J} \left( \prod_{i=1}^{m} w_k^{(i)} \right)^{1/m}} \quad \text{for each } j = 1, \ldots, J$$
Various continua of other pooling operators related to LinOp and LogOp have also been investigated. However the existing axiomatic analysis of pooling operators, while technically simpler than the analysis of inference processes, is also more ambiguous and perhaps less intellectually satisfying in its conclusions than the analysis of inference processes developed within the Paris-Vencovská framework; in the former case one arrives at rival, apparently plausible, axiomatic characterisations of various pooling operators, including in particular LinOp and LogOp, without any very convincing foundational criteria for deciding, within the limited context of the framework, which operator is justified, if any6. Strictly from a logician’s point of view LogOp has by far the most attractive invariance properties of pooling operators which have been studied, but it has one major drawback from the perspective of decision theory or AI: it allows a single agent to have a completely disproportionate influence over the social belief function in the case when the agent’s belief in some event is zero or close to zero. For this reason LogOp and its variants tend to be eschewed by decision theorists in favour of “softer” operators such as LinOp. We will argue in this paper that from a foundational point of view such pragmatism is misconceived. The solution to the conundrum lies rather in a deeper analysis of the semantics underlying the notion of a pooling operator. By embedding the concept of a pooling operator in the broader framework of social inference processes, we are able to see where the problem lies, and the outlines of possible solutions, a matter to which we return in our concluding chapter.
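The two pooling operators, and the disproportionate influence of a zero belief under LogOp, can be seen in a short sketch. The function names `lin_op` and `log_op` and the three belief vectors are hypothetical choices for this example. Raising an error when the normalising sum is zero is also an assumption of the sketch, since the formula for LogOp is then a division by zero.

```python
import math

def lin_op(ws):
    """LinOp: the arithmetic mean of the belief functions, atom by atom."""
    m, J = len(ws), len(ws[0])
    return [sum(w[j] for w in ws) / m for j in range(J)]

def log_op(ws):
    """LogOp: the normalised geometric mean of the belief functions."""
    m, J = len(ws), len(ws[0])
    geo = [math.prod(w[j] for w in ws) ** (1.0 / m) for j in range(J)]
    total = sum(geo)
    if total == 0:
        # Assumption of this sketch: treat the degenerate case as an error.
        raise ValueError("LogOp undefined: every atom has probability 0 for some agent")
    return [g / total for g in geo]

# Two agents give the third atom probability 0.6, one agent gives it 0.
ws = [
    [0.5, 0.5, 0.0],
    [0.2, 0.2, 0.6],
    [0.2, 0.2, 0.6],
]
print(lin_op(ws))  # the third atom keeps substantial probability
print(log_op(ws))  # the third atom is forced to probability 0
```

In this example a single agent's zero overrides the other two agents entirely under LogOp, whereas LinOp merely averages it in.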

1.4. The Multi-agent Problematic

In the present paper we seek to extend the Paris-Vencovská notion of inference process to the multi–agent case, thereby encompassing both the Paris-Vencovská framework of inference processes and the framework of pooling operators as special, or marginal, cases. To this end we consider, for any m ≥ 1, a set M consisting of m individuals $A_1, \ldots, A_m$, each of whom possesses her own nice set of constraints, respectively $K_1, \ldots, K_m$, on her possible belief function on the set of outcomes $\{\alpha_1, \alpha_2, \ldots, \alpha_J\}$. (Note that we are only assuming here that the beliefs of each individual are consistent, not that the beliefs of different individuals are jointly consistent). We shall refer to such a set M of individuals as a college. The intuitive problem now is how the college M should choose a single belief function which best represents the totality of information conveyed by $K_1, \ldots, K_m$.
Definition 1. Let C denote a given fixed class of constraint sets. A social inference process for C is a function, F, which chooses, for any m ≥ 1 and constraint sets $K_1, \ldots, K_m \in C$, a probability function on At, denoted by $F(K_1, \ldots, K_m)$, which we refer to as the social belief function defined by F acting on $K_1, \ldots, K_m$.
When considering general properties of unspecified social inference processes, we may not specify exactly what the class C is, but in general we shall always assume that C is a class of nice constraint sets.
Note that, trivially, provided that, when m = 1, $F(K) \in V_K$ for all $K \in C$, F marginalises to an inference process. On the other hand, in the special case where $K_1, \ldots, K_m$ are such that $V_{K_i}$ is a singleton for all $i = 1, \ldots, m$, then F marginalises to a pooling operator. The new framework therefore encompasses naturally as special cases the two classical frameworks described in sections 1.2 and 1.3 above.
Again we can ask: what principles would we wish such a social inference process F to satisfy in the absence of any further information? Is there any social inference process F which satisfies them? If so, to which inference process and to which pooling operator does such an F marginalise? It turns out that merely by posing these questions in the right framework, and by making certain simple mathematical observations, we can gain considerable insight.

2. An Axiomatic Framework for a Social Inference Process

2.1. Background Heuristics: Rational Norms for Collective Probabilistic Reasoning

Our approach to multi–agent probabilistic reasoning is both rational and normative: we are concerned with how an independent external chairman of a college of agents should by some objective process aggregate the probabilistic information declared to her by members of the college into an optimal single belief function, on the assumption that the chairman herself has no other information about the agents than that which they declare. However, in order to place ourselves in a position to formulate rational criteria for such a process to satisfy, we are compelled to make certain idealising assumptions analogous to those made in the classical treatment of inference processes in [2,3,13], but with a somewhat more complex analysis owing to the multi–agent context. We present three such assumptions in subsections 2.1.1, 2.1.2 and 2.1.3 below. The first two assumptions are close to those made in the classical framework of inference processes, but the third assumption is specific to the multi–agent context.
We stress that our approach in this paper is strictly foundational. We insist on the importance of the qualification above that the chairman, with whose viewpoint we identify, is given no further information than that stated in the problem. In particular the chairman knows nothing about the expertise or reliability of the agents, or about the independence of their opinions. Nor will we be concerned with limitations on computability. However the very fact that we make the qualification above forces us to clarify more precisely the idealising assumptions which the chairman must make about her relationship to the information provided by the agents.
In spite of the fact that the idealising assumptions are unrealistic in practice, in line with Chomsky’s criticism of the dominant methodology in artificial intelligence [14], we believe that this is the correct initial foundational approach if a general theory of multi–agent probabilistic inference is to have any chance of success. One should start from the simplest theoretical problematic; only when one has understood such a simple case does it make sense to try to deepen our understanding by progressively introducing other factors which make the problematic more realistic. In the present context, examples of such second level factors may be limitations on computational complexity, or the level of trust which we assign to the information conveyed by particular agents. The implications of taking into account the question of trust are discussed briefly in Chapter 5.1.

2.1.1. The Total Evidence Principle

Of crucial importance in our general problematic is the assumption above that all the relevant communicable probabilistic knowledge of an individual agent is incorporated in the given formal representation K of her probabilistic knowledge base. This or a similar assumption is sometimes referred to as the Principle of Total Evidence7. As was pointed out forcefully by Jaynes in his work justifying the use of maximum entropy inference, in order to avoid hopeless confusion, it is essential that an assumption of this kind be studiously respected in any formal study of the general axiomatic or logical characteristics of a mode of probabilistic inference: otherwise the intrinsic meaning of a formalised problem can be surreptitiously changed by sleight of hand, resulting in the generation of an inexhaustible supply of phony paradoxes or inconsistencies (cf. [7,8,15,16] ). However as pointed out by Adamčík and the author in [17] the practical exigencies demanded in the study of particular probabilistic problems arising from the real world have tended to result in a lack of attention being paid to more foundational studies which would require the total evidence principle to be taken seriously:
“…when applied to the formalisation of any real life problem considered by a human agent, the Principle of Total Evidence is never observed in practice. This banal fact of life has historically bedevilled theoretical discussion of probabilistic inference, because it is often extremely hard to give any real world example to illustrate an abstract principle of probabilistic inference without an opponent being tempted to challenge one’s reasoning using implicit or intuitive background information concerning the example, which has not been included in its formal representation. In the context of multi–agent probabilistic inference this situation has resulted in a heavy concentration of research on computationally pragmatic approaches to specialised problems of probabilistic inference, and a notable neglect of the study of more abstract axiomatic or foundational frameworks. This neglect appears to the authors to be unfortunate, not least because the foundations of artificial intelligence would seem to demand that the Principle of Total Evidence be taken seriously.”
Note that the Total Evidence Principle is also assumed in the justification of the classical inference process framework of [3].

2.1.2. Assumption of Logical and Computational Closure

We assume that there are no restrictions on the ability of individual agents to calculate the probabilistic consequences of any given constraint set K, and that consequently there is no essential semantic difference between the status of the probabilistic knowledge represented by K and that represented by its representation $V_K$ in Euclidean space. Consequently if K and K′ are constraint sets such that $V_K = V_{K'}$ we shall regard them as equivalent knowledge bases from the point of view of any agent. Under this assumption we may therefore informally identify an agent’s knowledge base K with its representation $V_K$. Notice that under this assumption an agent will be aware whether or not a set of constraints is consistent, and from this point of view our previously stated requirement that a knowledge base be consistent seems reasonable. Of course, as is well known, unaided individual human agents’ assessments are notoriously inconsistent in practice [18], and furthermore if we assume that P ≠ NP then the calculations which are required for the present assumption are in general infeasible (cf. Chapter 10 of [3]). Nevertheless, as in the case of inference processes, this does not diminish the value of our assumption as a normative tool.

2.1.3. The Intersubjectivity Assumption

For the rest of this paper we will assume for ease of exposition that the college M appoints an independent chairman A0, whom we may suppose to be a mathematically trained philosopher, and whose only task is to aggregate the knowledge bases of the agents in the college into a social belief function v according to strictly rational criteria, but ignoring any personal beliefs which A0 herself may hold.
The Intersubjectivity Assumption states that in performing the function above the chairman treats the knowledge base provided by each agent as if it represented intersubjective probabilistic information8. By this we mean that whatever the unknown background observations or introspections might be from which any particular agent Ai’s knowledge base Ki arises, the process by which Ki arose is assumed to be in conformity with the laws of probability, and intersubjective in the sense that any other agent with exactly the same background information and experience as, say, Ai would arrive at a set of constraints equivalent to Ki. The fact that the union of the knowledge bases K1 and K2 of respective agents A1 and A2 may be inconsistent does not in any way contradict this assumption, since the limited background observations or introspections of each agent may be different, which may result in the agents’ probabilistic assessments being incompatible. While the intersubjectivity assumption might seem grossly unrealistic, particularly in the case of human agents, it is nevertheless a valuable idealisation, not least because it helps to identify which features of an agent’s possible relation to her reported information we are not taking into account.
The information which the agents report is thus taken at face value and treated as if it were totally trustworthy by chairman A0, even though the chairman may recognise that such trust is not merited. While this fact of itself indicates one of the principal limitations of the initial framework of rational collective reasoning which we are attempting to formulate, as we will outline in Chapter 5 it also suggests natural ways in which a notion of degree of trust could later be incorporated into the framework, thus mitigating the effects of this limitation. Incorporating such a notion would allow a social inference process to accommodate the less than complete trust which a chairman might actually hold in the information provided by individual agents.

2.2. Towards a Framework for Rational Collective Probabilistic Reasoning

While particular examples of special social inference processes can be found in many places in the literature (see e.g. [19–25]), the abstract idea of a social inference process was first formulated in [1] and has been recently studied further in [17,26,27] where the properties of a number of different social inference processes are considered. However most of the earlier work published on particular social inference processes has, with few exceptions, been pragmatically motivated, and has not considered broader foundational questions or logical justifications. This is due in some cases to a concern to find a computationally practical solution to a more specialised problem, and in other cases to a tempting reductionism, which would see the problem of finding a social inference process as a two stage process in which a favored classical inference process I is first chosen and applied to the constraints $K_i$ of each agent i to yield a belief function $w^{(i)}$ appropriate to that agent, and a preferred pooling operator is then applied to the set of $w^{(i)}$ to yield a social belief function. Following the terminology of Adamčík [26] we shall call a social inference process which has this special reductionist form obdurate. Of course from a reductionist point of view the concept of a social inference process is not particularly interesting foundationally, since we could hardly expect an analysis of such social inference processes to tell us anything fundamentally new about collective probabilistic reasoning9. A notable exception to such approaches is found in the work of Williamson [28], which offers a detailed philosophical analysis of the principles underlying the merging of probabilistic evidence from an objective Bayesian perspective, which is not reductionist in the sense above, but which is somewhat different from our own10.
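The two stage, obdurate form described above can be sketched as follows. This is only an illustration of the reductionist construction under strong simplifying assumptions, not a process advocated in this paper: each knowledge base is restricted to constraints that fix the probabilities of individual atoms, the inference process is ME (which for such constraints shares the remaining mass equally among the unconstrained atoms), and the pooling operator is LinOp. All names are hypothetical.

```python
def me_fixed_atoms(fixed, J):
    """ME for a knowledge base that fixes w_j for the atom indices in `fixed`.

    `fixed` maps atom indices (0-based) to probabilities. The remaining
    mass is shared equally among the unconstrained atoms.
    """
    free = [j for j in range(J) if j not in fixed]
    rest = 1.0 - sum(fixed.values())
    if rest < -1e-12 or (not free and abs(rest) > 1e-12):
        raise ValueError("inconsistent knowledge base")
    share = max(rest, 0.0) / len(free) if free else 0.0
    return [fixed.get(j, share) for j in range(J)]

def lin_op(ws):
    """LinOp: the arithmetic mean of the belief functions, atom by atom."""
    m, J = len(ws), len(ws[0])
    return [sum(w[j] for w in ws) / m for j in range(J)]

def obdurate(kbs, J, inference=me_fixed_atoms, pool=lin_op):
    """Stage 1: apply the inference process to each agent's knowledge base.
    Stage 2: apply the pooling operator to the resulting belief functions."""
    return pool([inference(kb, J) for kb in kbs])

# Agent 1 believes w(alpha_1) = 0.6; agent 2 claims total ignorance.
kbs = [{0: 0.6}, {}]
print(obdurate(kbs, 3))
```

Note that after stage 1 the ignorant agent is indistinguishable from an agent who firmly believes the uniform distribution, which is one way of seeing what information the two stage construction throws away.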
Our approach here is radically non-reductionist. We reject the two stage approach above on the grounds that the classical notion of an inference process applies to an isolated single individual, and is valid only on the assumption that that individual has absolutely no knowledge or beliefs other than those specified by her personal constraint set. Indeed the preliminary point should be made that in the case of an isolated individual A, whereas A’s constraint set K is subjective and personal to that individual, the actual passage from K to A’s assumed belief function w via an inference process should be made using rational or normative principles, and should therefore be considered to have an objective character. Nor should we confuse the epistemological status of w with that of K. By hypothesis K represents the sum total of A’s beliefs; ipso facto K also represents, in general, a description of the extent of A’s ignorance. While w may be regarded as the belief function which best represents A’s subjective beliefs, it must not be confused with those beliefs themselves, since in the passage from K to w it is clear that certain “information” has been discarded¹¹; thus, while w is determined by K once an inference process is given and applied, neither K nor $V_K$ can be recaptured from w. As a trivial example we may note that specifying that A’s constraint set K is empty, i.e., that A claims total ignorance, is informationally very different from specifying that K is such that $V_K = \{ \langle \tfrac{1}{J}, \tfrac{1}{J}, \ldots, \tfrac{1}{J} \rangle \}$, although the application of ME, or of any other reasonable inference process, yields $w = \langle \tfrac{1}{J}, \tfrac{1}{J}, \ldots, \tfrac{1}{J} \rangle$ in both cases. This example of an agent who is totally ignorant has an illustrative force which we return to later.
From this point of view the situation of an individual who is a member of a college whose members seek to elicit an optimal “social” belief function to best represent the belief of the collective seems quite different from that of an isolated individual. Indeed in the collective context it appears more natural to assume as a normative principle that, if the social belief function is to be optimal, then each individual member $A_i$ should be deemed to choose her personal belief function $w^{(i)}$ so as to take account of the information provided by the other individuals, in such a way that $w^{(i)}$ is consistent with her own knowledge base $K_i$, while being informationally as close as possible to the social belief function $F(K_1, \ldots, K_m)$ which is to be defined. We will show in chapter 3 that this suggestive, but imprecise, idea can be made mathematically coherent, and can be used to define a particular social inference process with pleasing properties. Notice however that it is not necessary to assume that a given $A_i$ subjectively or consciously holds the particular personal belief function $w^{(i)}$ which is attributed to her by the procedure above: such a $w^{(i)}$ is viewed as nothing more than the belief function which $A_i$ ought rationally to hold, given the personal knowledge base $K_i$ which represents her own beliefs, together with the extra information which would be available to her if she were to be made aware of the knowledge bases of the remaining members of the college. Just as in the case of an isolated individual, the passage from $A_i$’s actual subjective belief set $K_i$ to her notional subjective belief function $w^{(i)}$ has an intersubjective or normative character: however the calculation of $w^{(i)}$ now depends not only on $K_i$ but on the knowledge bases of all the other members of the college.
Considerations similar to the above also give rise to an important general principle which we believe a social inference process should satisfy, which we will call Collegiality. In the next section we shall introduce this principle together with some other desiderata for a social inference process to satisfy. The latter are either natural symmetry principles or fairly straightforward generalisations of familiar desiderata from the Paris-Vencovská framework of inference processes.

2.3. Desiderata for a Social Inference Process

The Equivalence Principle
If for all $i = 1, \ldots, m$ we have $V_{K_i} = V_{K'_i}$, then
$$F(K_1, \ldots, K_m) = F(K'_1, \ldots, K'_m)$$
Otherwise expressed, the Equivalence Principle states that substituting constraint sets which are equivalent, in the sense that the set of belief functions which satisfy them is unchanged, will leave the values of F invariant. This principle is a familiar one adopted from the theory of inference processes (cf. [3]), and is in line with our assumption in section 2.1.2. In this paper we shall always consider only social inference processes (or inference processes) which satisfy the Equivalence Principle. For this reason we may occasionally allow a certain sloppiness of notation in the sequel by identifying a constraint set K with its set of solutions $V_K$ where the meaning is clear and this avoids an awkward notation. In particular if Δ is a non-empty closed convex set of belief functions then we may write ME(Δ) to denote the unique w ∈ Δ which maximises the Shannon entropy function.
The Anonymity Principle
For any permutation σ of 1, …, m
$$F(K_{\sigma(1)}, \ldots, K_{\sigma(m)}) = F(K_1, \ldots, K_m)$$
A consequence of the above principle is that $F(K_1, \ldots, K_m)$ depends only on the multiset of knowledge bases $\{K_1, \ldots, K_m\}$ and not on the order in which the $K_i$’s are listed.
The following natural principle ensures that F does not choose a belief function which violates the beliefs of some member of the college unless there is no alternative. The principle also ensures that F behaves like a classical inference process in the special case when m = 1.
The Consistency Principle
If $K_1, \ldots, K_m$ are such that
$$\bigcap_{i=1}^{m} V_{K_i} \neq \emptyset$$
then
$$F(K_1, \ldots, K_m) \in \bigcap_{i=1}^{m} V_{K_i}$$
Let σ denote a permutation of the atoms of At. Such a σ induces a corresponding permutation on the coordinates of probability distributions $\langle w_1, \ldots, w_J \rangle$, and on the corresponding coordinates of variables occurring in the constraints of constraint sets $K_i$, which we denote below with an obvious notation. The following principle is again a familiar one satisfied by classical inference processes (see [3]):
The Atomic Renaming Principle
For any permutation σ of the atoms of At, and for all $K_1, \ldots, K_m$
$$F(\sigma(K_1), \ldots, \sigma(K_m)) = \sigma(F(K_1, \ldots, K_m))$$
The following principle is characteristic of the non-reductionist approach which we described in section 2.2:
The Collegiality Principle
A social inference process F satisfies the Collegiality Principle (abbreviated to Collegiality) if for any m ≥ 1 and $A_1, \ldots, A_m$ with respective knowledge bases $K_1, \ldots, K_m$, if for some k < m $F(K_1, \ldots, K_k)$ is consistent with $K_{k+1} \cup K_{k+2} \cup \ldots \cup K_m$, then
$$F(K_1, \ldots, K_m) = F(K_1, \ldots, K_k)$$
Collegiality may be interpreted as stating the following: if the social belief function v generated by some subset of the college is consistent with the individual beliefs of the remaining members, then v is also the social belief function of the whole college. The following immediate consequence of collegiality is worth a special mention:
Corollary 2 (The Ignorance Principle). For any m ≥ 1 and all knowledge bases $K_1, \ldots, K_m$
$$F(K_1, \ldots, K_m) = F(K_1, \ldots, K_m, \emptyset)$$
where ∅ denotes the knowledge base with empty set of constraints.
Proof. This follows at once from the collegiality principle.
The ignorance principle just states that adding to the college a new agent who declares that she has no probabilistic knowledge concerning At will leave the social belief function unchanged. The ignorance principle is of interest firstly because it seems particularly hard to challenge, and secondly because it seems to encapsulate the essence of the difference in information between an agent asserting that she has an empty knowledge base, and the same agent asserting that her knowledge base is $\alpha_1 = \alpha_2 = \ldots = \alpha_J = \tfrac{1}{J}$. Indeed this observation leads to the conclusion that obdurate social inference processes have serious credibility problems, since any obdurate F which satisfies atomic renaming must either fail to satisfy the ignorance principle or else must marginalise to a pooling operator with pathological behaviour. In particular the social inference process of Kern-Isberner and Rödder defined¹² in [23] is easily shown not to satisfy the ignorance principle. Furthermore in [29] Adamčík shows that a very large class of obdurate social inference processes, including that of [23], cannot satisfy the consistency principle either¹³.
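The contrast between an empty knowledge base and the uniform assertion can be seen in a small numerical sketch. We take J = 2 and a hypothetical obdurate process built from ME followed by a pooling operator. The knowledge base $K_1 = \{w(\alpha_1) = 0.9\}$, the function names `lin_op` and `log_op`, and the convention that a single agent pools to herself are all assumptions made for illustration; ME is applied by hand, since ME of a fully determined knowledge base is the determined distribution and ME of the empty knowledge base is uniform.

```python
import math

def lin_op(beliefs):
    """Arithmetic mean of belief functions (LinOp), coordinate by coordinate."""
    m = len(beliefs)
    return [sum(w[j] for w in beliefs) / m for j in range(len(beliefs[0]))]

def log_op(beliefs):
    """Normalised geometric mean of belief functions (LogOp)."""
    m = len(beliefs)
    geo = [math.prod(w[j] for w in beliefs) ** (1.0 / m)
           for j in range(len(beliefs[0]))]
    total = sum(geo)
    return [g / total for g in geo]

# Agent 1 asserts w(alpha_1) = 0.9, so ME(K_1) is forced to be <0.9, 0.1>.
me_k1 = [0.9, 0.1]
# Agent 2 asserts nothing; ME of the empty knowledge base is uniform.
me_empty = [0.5, 0.5]

alone = lin_op([me_k1])                       # single agent: pooling changes nothing
with_ignorant_lin = lin_op([me_k1, me_empty])  # ME followed by LinOp
with_ignorant_log = log_op([me_k1, me_empty])  # ME followed by LogOp
```

Under these assumptions the pooled results are ⟨0.7, 0.3⟩ for LinOp and ⟨0.75, 0.25⟩ for LogOp, both different from ⟨0.9, 0.1⟩. Adding the ignorant agent has therefore shifted the social belief function, contrary to the ignorance principle.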
The consistency and collegiality principles together immediately imply that F satisfies the following unanimity property:
Lemma 3 (Unanimity Principle). If F satisfies Consistency and Collegiality then for any K
$$F(K, \ldots, K) = F(K).$$
Proof. Immediate from definitions. □
Our next axiom goes to the heart of certain basic intuitions concerning probability. For expository reasons we will consider first the case when m = 1, in which case we are essentially discussing a principle to be satisfied by a classical inference process. First we introduce some fairly obvious terminology.
Let w denote $A_1$’s belief function. (Since we are considering the case when m = 1 we will drop the superscript from $w^{(1)}$ for ease of notation). For some non-empty set of atoms $\{\alpha_{j_1}, \ldots, \alpha_{j_t}\}$ let ϕ denote the event $\bigvee_{r=1}^{t} \alpha_{j_r}$. Suppose that K denotes a set of constraints on the variables $w_{j_1}, \ldots, w_{j_t}$ which defines a non-empty closed convex region of t-dimensional Euclidean space with $\sum_{r=1}^{t} w_{j_r} \leq 1$ and all $w_{j_r} \geq 0$. We shall refer to such a K as a nice set of constraints about ϕ. Such a set of constraints K may also be thought of as a constraint set on the w which determines a closed convex region $V_K$ of $\mathbb{D}^J$ defined by
$$V_K = \{ w \in \mathbb{D}^J \mid \langle w_{j_1}, \ldots, w_{j_t} \rangle \text{ satisfies } K \}.$$
Now let $\hat{w}_r$ denote $w(\alpha_{j_r} \mid \phi)$ for r = 1 … t, with the $\hat{w}_r$ undefined if w(ϕ) = 0. Then $\hat{w} = \langle \hat{w}_1, \ldots, \hat{w}_t \rangle$ is a probability distribution provided that w(ϕ) ≠ 0. Let K be a nice set of constraints on the probability distribution $\hat{w}$: we shall refer to such a K as a nice set of constraints conditioned on ϕ. In line with our previous conventions we shall consider such K to be trivially satisfied in the case when w(ϕ) = 0.
Again an important point here is that while a nice set of constraints K conditioned on ϕ as above is given as a set of constraints on $\hat{w}$ it can equally well be interpreted as defining a certain equivalent set of constraints on w instead, and it is easy to see that, with a slight abuse of notation, the corresponding region $V_K$ of $\mathbb{D}^J$ defined by
$$V_K = \{ w \mid \hat{w} \text{ satisfies } K \}$$
is both convex and closed.
In what follows we may regard both a nice set of constraints conditioned on some event ϕ, and a nice set of constraints about some event ϕ, as if they defined constraints on the probability function w, as explained above.
Notice that while a nice set of constraints conditioned on ϕ can say nothing about the value of belief in ϕ itself, a nice set of constraints about ϕ may do so, and may even fix belief in ϕ at a particular value.
The following principle captures a basic intuition about probabilistic reasoning which is valid for all standard inference processes:
The Locality Principle (for an Inference Process)
An inference process I satisfies the locality principle if for all sentences ϕ and θ, every nice set of constraints K conditioned on ϕ, and every nice set of constraints K* about ¬ϕ,
$$I(K \cup K^*)(\theta \mid \phi) = I(K)(\theta \mid \phi)$$
provided that $I(K \cup K^*)(\phi) \neq 0$ and $I(K)(\phi) \neq 0$.
Let us refer to the set of all events which logically imply the event ϕ as the world of ϕ. Then the Locality Principle may be roughly paraphrased as saying that if K contains only information about the relative size of probabilistic beliefs about events in the world of ϕ, while K* contains only information about beliefs concerning events in the world of ¬ϕ, then the values which the inference process I calculates for probabilities of events conditioned on ϕ should be unaffected by the information in K*, except in the trivial case when belief in ϕ is forced to take the value 0. Put rather more succinctly: beliefs about the world of ¬ϕ should not affect beliefs conditioned on ϕ. Note that we cannot expect to satisfy a strengthened version of this principle which would have belief in the events in the world of ϕ unaffected by K* since the constraints in K* may well affect belief in ϕ itself. Thus the Locality Principle asserts that, ceteris paribus, rationally derived relative probabilities between events inside a “world” are unaffected by information about what happens strictly outside that world.
The Locality Principle is in essence a combination of both the Relativisation Principle¹⁴ of Paris [3] and the Homogeneity Axiom of Hawes [6]. The following theorem, which demonstrates that the most commonly accepted inference processes all satisfy Locality, is very similar to results proved previously, especially to results in [6]. It follows from the theorem below that if we reject the Locality Principle for an inference process, then we are in effect forced to reject not just ME, but also all currently known plausible inference processes, including all inference processes derived by maximising a generalized notion of entropy. This is an important point heuristically when we come to extend the Locality Principle to the multi–agent case¹⁵.
Theorem 4. The inference processes ME, CM, MD (minimum distance), together with all Renyi inference processes¹⁶, and the Maximin inference process of [6], all satisfy the Locality Principle.
Proof. Let F be a real valued function defined on the domain
$$\bigcup_{J \in \mathbb{N}^+} [0,1]^J$$
by
$$F(w) = \sum_{j=1}^{J} f(w_j) \tag{1}$$
for some function $f : [0,1] \to \mathbb{R}$.
We will say that F is deflation proof if for every $J \in \mathbb{N}^+$, all $w, v \in \mathbb{D}^J$, and every λ ∈ (0, 1)
$$F(\lambda w) < F(\lambda v) \text{ if and only if } F(w) < F(v) \tag{2}$$
Here λw denotes the scalar multiplication of w by λ. Note that λw will not be a vector in $\mathbb{D}^J$ in the above case since its coordinates sum to λ instead of 1.
We will see below that any inference process I such that I(K) is defined to be that point $v \in V_K$ which maximises a strictly concave deflation proof function F of the above form satisfies the locality principle.
We first note the following lemma:
Lemma 5. The inference processes listed in the statement of Theorem 4, with the exception of CM and Maximin, may all be defined by the maximisation of deflation proof strictly concave functions of the form (1) above.
Proof. The inference process ME is defined by maximising
$$F(w) = -\sum_{j=1}^{J} w_j \log w_j$$
subject to the given constraints. Now for $w \in \mathbb{D}^J$
$$F(\lambda w) = -\sum_{j=1}^{J} \lambda w_j \log \lambda w_j = -\lambda \log \lambda + \lambda F(w)$$
from which (2) follows at once.
The Renyi inference process $\mathrm{REN}_r$, where r is a fixed positive real parameter not equal to 1, is given by maximising the function
$$F(w) = -\sum_{j=1}^{J} (w_j)^r$$
for $w \in V_K$ in the case when r > 1, and by maximising
$$F(w) = \sum_{j=1}^{J} (w_j)^r$$
for $w \in V_K$ in the case when 0 < r < 1.
Since for the above functions $F(\lambda w) = \lambda^r F(w)$, they also trivially satisfy (2) and so are deflation proof. Note that the minimum distance inference process MD is just $\mathrm{REN}_2$. The functions F defined above are all strictly concave (see e.g. [3]) and so the lemma follows.
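The scaling identities used in this proof are easy to check numerically. The following sketch evaluates the ME objective and the Renyi objectives at two arbitrarily chosen distributions and an arbitrary λ; the helper names and the sample values are illustrative assumptions, not part of the original argument.

```python
import math

def neg_entropy_sum(w):
    """ME objective F(w) = -sum_j w_j log w_j, with 0 log 0 taken as 0."""
    return -sum(x * math.log(x) for x in w if x > 0)

def renyi_sum(w, r):
    """Renyi objective: -sum_j w_j^r for r > 1, and sum_j w_j^r for 0 < r < 1."""
    s = sum(x ** r for x in w)
    return -s if r > 1 else s

def scale(lam, w):
    """Scalar multiplication of the vector w by lam."""
    return [lam * x for x in w]

w = [0.5, 0.3, 0.2]
v = [0.6, 0.3, 0.1]
lam = 0.4

# Identity from the proof: F(lam * w) = -lam * log(lam) + lam * F(w).
lhs = neg_entropy_sum(scale(lam, w))
rhs = -lam * math.log(lam) + lam * neg_entropy_sum(w)

# Deflation proofness: scaling both arguments preserves the strict ordering.
me_order_kept = (
    (neg_entropy_sum(scale(lam, w)) < neg_entropy_sum(scale(lam, v)))
    == (neg_entropy_sum(w) < neg_entropy_sum(v))
)
renyi_order_kept = all(
    (renyi_sum(scale(lam, w), r) < renyi_sum(scale(lam, v), r))
    == (renyi_sum(w, r) < renyi_sum(v, r))
    for r in (0.5, 2.0, 3.0)
)
```

A check of this kind on a single pair of points is of course no substitute for the proof; it only confirms that the signs and scaling factors are as stated.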
Returning to the main proof, let I be an inference process such that I(K) is defined by the maximisation of a deflation proof strictly concave function F of the form (1) above. Let ϕ, θ, K, and K* be as in the statement of the locality principle. Without loss of generality we may assume for notational convenience that the atoms are so ordered that for some k with 1 ≤ k < J
$$\phi \equiv \bigvee_{j=1}^{k} \alpha_j \quad \text{and} \quad \neg\phi \equiv \bigvee_{j=k+1}^{J} \alpha_j$$
Let $u = I(K)$ and let $v = I(K \cup K^*)$. Let u(ϕ) = a and let v(ϕ) = b. By hypothesis we know that a and b are non-zero. It suffices for us to show that
$$\left\langle \frac{v_1}{b}, \ldots, \frac{v_k}{b} \right\rangle = \left\langle \frac{u_1}{a}, \ldots, \frac{u_k}{a} \right\rangle \tag{3}$$
Now notice that since the constraints of K* refer only to coordinates k + 1 … J while the constraints of K refer only to coordinates 1 … k, the solution v which by definition maximizes $\sum_{j=1}^{J} f(w_j)$ subject to the condition that $w \in V_{K \cup K^*}$, must also satisfy the condition that $\langle v_1, \ldots, v_k \rangle$ is that vector $\langle w_1, \ldots, w_k \rangle$ which maximizes $\sum_{j=1}^{k} f(w_j)$ subject to $\langle \frac{w_1}{b}, \ldots, \frac{w_k}{b} \rangle$ satisfying the constraints of K together with the constraint that $\sum_{j=1}^{k} w_j = b$. Now changing variables by setting $y_j = \frac{w_j}{b}$ with $y = \langle y_1, \ldots, y_k \rangle$ this is equivalent to maximizing
$$F(by) = \sum_{j=1}^{k} f(b y_j)$$
subject to $y \in \mathbb{D}^k$ and y satisfying the constraints of K. However since F is deflation proof (and strictly concave) the unique $y \in \mathbb{D}^k$ which achieves this maximisation does not depend on b and by setting b = 1 we see that it is just the unique vector $y \in \mathbb{D}^k$ maximising F(y) and satisfying the constraints in K. Since this definition is independent of both K* and b, it follows by replacing K* by the empty set of constraints and b by a that equation (3) holds, which completes the proof for the case of inference processes defined by the maximisation of a deflation proof strictly concave function of the form (1) above. By Lemma 5 the theorem follows for all the inference processes mentioned except for CM and Maximin.
The fact that the limit centre of mass inference process, CM, satisfies locality may either be proved using the standard definition of CM in [3], and slightly modifying the idea of the proof above, or simply by observing that by a result of Hawes [6], for any knowledge base K
$$\mathrm{CM}(K) = \lim_{r \to 0^+} \mathrm{REN}_r(K)$$
and then applying the results above already proved for $\mathrm{REN}_r$.
The result for Maximin also follows easily from results in Hawes [6]. This completes the proof of Theorem 4.
While Theorem 4 above merely provides very strong corroborating evidence in favour of accepting the Locality Principle for an inference process, an interesting aspect of the intuition underlying the principle is that the justification for it appears no less cogent when we attempt to generalise it to the context of a social inference process. If we accept the intuition in favour of the Locality Principle in the case of a single individual then it is hard to see why we should reject analogous arguments in the case of a social belief function which is derived by considering the beliefs of m individuals each of whom has knowledge bases of the type considered above. The argument is a general informational one: if information about probabilities conditioned on ϕ is unaffected by information about the world of ¬ϕ, then, ceteris paribus, this should be true regardless of whether the information is obtained from one agent or from many agents. Accordingly we may formulate more generally
The General Locality Principle (for a social inference process F)
For any m ≥ 1 let M be a college of m individuals $A_1, \ldots, A_m$. If for each i = 1…m $K_i$ is a nice set of constraints conditioned on ϕ, and $K_i^*$ is a nice set of constraints about ¬ϕ, then for every event θ
$$F(K_1 \cup K_1^*, \ldots, K_m \cup K_m^*)(\theta \mid \phi) = F(K_1, \ldots, K_m)(\theta \mid \phi)$$
provided that $F(K_1 \cup K_1^*, \ldots, K_m \cup K_m^*)(\phi) \neq 0$ and $F(K_1, \ldots, K_m)(\phi) \neq 0$.
At this point we make a simple observation. In the very special marginal case when for each i the knowledge bases $K_i \cup K_i^*$ are such as to completely determine $A_i$’s belief function, so that the task of F reduces to that of a pooling operator, the locality principle above reduces to a condition closely related to the well-known condition on a pooling operator that it be externally Bayesian¹⁷. We will not discuss this further here except to note the important point that if F is taken to satisfy General Locality, then this fact alone seriously restricts those pooling operators to which it is possible for F to marginalise. Thus while LogOp satisfies the relevant cases of General Locality, as follows from Theorem 14 below, the popular pooling operator LinOp does not do so. The following provides a simple counterexample:
Example 1 (Counterexample to General Locality for LinOp).
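Proof. Let J = 3, let $\theta = \alpha_1 \vee \alpha_2$ and let
$$K_1 = \{ w(\alpha_1 \mid \theta) = \tfrac{2}{3} \} \text{ and } K_1^* = \{ w(\neg\theta) = \tfrac{1}{4} \}, \qquad K_2 = \{ w(\alpha_1 \mid \theta) = \tfrac{1}{3} \} \text{ and } K_2^* = \{ w(\neg\theta) = \tfrac{5}{6} \}$$
Then the unique belief function satisfying $K_1 \cup K_1^*$ is $w^{(1)} = \langle \tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4} \rangle$ while the unique belief function satisfying $K_2 \cup K_2^*$ is $w^{(2)} = \langle \tfrac{1}{18}, \tfrac{1}{9}, \tfrac{5}{6} \rangle$.
Applying LinOp we obtain
$$\mathrm{LinOp}(K_1 \cup K_1^*, K_2 \cup K_2^*)(\alpha_1 \mid \theta) = \tfrac{20}{33}$$
If we now set
$$K_1^{**} = \{ w(\neg\theta) = \tfrac{3}{4} \} \text{ and } K_2^{**} = \{ w(\neg\theta) = \tfrac{1}{2} \}$$
then the unique belief function satisfying $K_1 \cup K_1^{**}$ is $w^{(1)} = \langle \tfrac{1}{6}, \tfrac{1}{12}, \tfrac{3}{4} \rangle$ while the unique belief function satisfying $K_2 \cup K_2^{**}$ is $w^{(2)} = \langle \tfrac{1}{6}, \tfrac{1}{3}, \tfrac{1}{2} \rangle$.
Applying LinOp gives
$$\mathrm{LinOp}(K_1 \cup K_1^{**}, K_2 \cup K_2^{**})(\alpha_1 \mid \theta) = \tfrac{4}{9} \neq \mathrm{LinOp}(K_1 \cup K_1^*, K_2 \cup K_2^*)(\alpha_1 \mid \theta)$$
showing that General Locality fails for any F which marginalises to the pooling operator LinOp. By contrast it is easily verified that
$$\mathrm{LogOp}(K_1 \cup K_1^{**}, K_2 \cup K_2^{**})(\alpha_1 \mid \theta) = \tfrac{1}{2} = \mathrm{LogOp}(K_1 \cup K_1^*, K_2 \cup K_2^*)(\alpha_1 \mid \theta)$$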
as expected.
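The arithmetic of Example 1 can be reproduced mechanically. In the sketch below the helper names `belief`, `lin_op_cond` and `log_op_cond` are our own illustrative choices; LinOp is evaluated in exact rational arithmetic, while LogOp is evaluated in floating point because it involves square roots.

```python
from fractions import Fraction as Fr
import math

def belief(cond_a1, not_theta):
    """Unique w on three atoms with w(a1 | a1 v a2) = cond_a1 and w(a3) = not_theta."""
    theta = 1 - not_theta
    return [cond_a1 * theta, (1 - cond_a1) * theta, not_theta]

def lin_op_cond(ws):
    """LinOp value of a1 conditioned on theta = a1 v a2 (exact arithmetic)."""
    a1 = sum(w[0] for w in ws) / len(ws)
    a2 = sum(w[1] for w in ws) / len(ws)
    return a1 / (a1 + a2)

def log_op_cond(ws):
    """LogOp value of a1 conditioned on theta; the normalising constant cancels."""
    m = len(ws)
    g1 = math.prod(float(w[0]) for w in ws) ** (1.0 / m)
    g2 = math.prod(float(w[1]) for w in ws) ** (1.0 / m)
    return g1 / (g1 + g2)

# Same conditional constraints in both cases; only the information about
# the world of not-theta differs.
first = [belief(Fr(2, 3), Fr(1, 4)), belief(Fr(1, 3), Fr(5, 6))]
second = [belief(Fr(2, 3), Fr(3, 4)), belief(Fr(1, 3), Fr(1, 2))]

lin_first, lin_second = lin_op_cond(first), lin_op_cond(second)
log_first, log_second = log_op_cond(first), log_op_cond(second)
```

Here `lin_first` and `lin_second` evaluate to 20/33 and 4/9 respectively, while `log_first` and `log_second` both agree with 1/2 up to rounding, in line with the values stated in the example.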
Related facts concerning LinOp and LogOp have been widely noted in the literature on pooling operators; what we are noting that is new here is that arguments in favour of the General Locality Principle in the far broader context of a social inference process give a quite new perspective on the relative acceptability of classical pooling operators such as LogOp and LinOp.
Our final axiom relates to a hypothetical situation where several exact copies of a college are amalgamated into a single college.
A clone of a member $A_i$ of M is a member $A_{i'}$ whose set of belief constraints on her belief function is identical to that of $A_i$: i.e., $K_{i'} = K_i$. Suppose now that each member $A_i$ of M is replaced by n clones of $A_i$, so that we obtain a new college M* with nm members. M* may equally be regarded as n copies of M amalgamated into a single college; so since the social belief function associated with each of these copies of M would be the same, we may argue that surely the result of amalgamating the copies into a single college M* should again yield the same social belief function.
For any knowledge base K let nK stand for a sequence of n copies of K. Then the heuristic argument above generates the following:
The Proportionality Principle
For any integer n ≥ 1
$$F(nK_1, nK_2, \ldots, nK_m) = F(K_1, K_2, \ldots, K_m)$$
Notice that for the single agent case m = 1 this principle reduces to the Unanimity Principle 3. The Proportionality Principle looks rather innocent. Nevertheless we shall see in Theorem 17 of chapter 4 that a slight variant of the same idea, formulated as a limiting version, has some unexpected consequences.
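In the marginal case where each knowledge base determines a unique belief function and F marginalises to LogOp, the Proportionality Principle can be observed directly: the geometric mean over n clones of each member coincides with the geometric mean over the original members. The following sketch, with an arbitrarily chosen two-member college and n = 3 as illustrative assumptions, checks this numerically.

```python
import math

def log_op(beliefs):
    """Normalised geometric mean of a list of belief functions (LogOp)."""
    m = len(beliefs)
    geo = [math.prod(w[j] for w in beliefs) ** (1.0 / m)
           for j in range(len(beliefs[0]))]
    total = sum(geo)
    return [g / total for g in geo]

# A college of two members whose belief functions are fully determined.
college = [[0.5, 0.25, 0.25], [0.1, 0.3, 0.6]]

# Replace each member by n clones, giving a college with n * m members.
n = 3
cloned = [w for w in college for _ in range(n)]

original = log_op(college)
amalgamated = log_op(cloned)
max_gap = max(abs(a - b) for a, b in zip(original, amalgamated))
```

Up to floating point rounding `max_gap` is zero, as the principle requires in this special case.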

3. The Social Entropy Process SEP

3.1. Definition of SEP

In this section we introduce a natural social inference process, SEP, which extends both the inference process ME and the pooling operator LogOp. Our heuristic derivation of SEP will be purely information theoretic. We prove certain important structural properties necessary to show that SEP is well-defined, and we show in Theorem 14 that SEP satisfies the seven principles introduced in the previous section.
In order to avoid problems with our definition of SEP however, we are forced to add a slight further restriction to the set of m knowledge bases $K_1, \ldots, K_m$ which respectively represent the belief sets of the individuals $A_1, \ldots, A_m$. We assume in this section that the constraints are such that there exists at least one atom $\alpha_{j_0}$ such that no knowledge base $K_i$ forces $\alpha_{j_0}$ to take belief 0. In the special case when each $K_i$ specifies a unique probability distribution, the condition corresponds to that necessary to ensure that LogOp is well-defined.
In order to motivate the definition of SEP heuristically, let us consider again the task of the college chairman $A_0$. Following the reasoning elaborated in sections 2.1.3 and 2.2, $A_0$ decides that as an initial criterion she will choose a social belief function $v = \langle v_1, \ldots, v_J \rangle$ in such a manner as to minimize the average informational distance between $\langle v_1, \ldots, v_J \rangle$ and the m belief functions $w^{(i)} = \langle w^{(i)}_1, \ldots, w^{(i)}_J \rangle$ of the members of M, where the $w^{(i)}$ are all simultaneously chosen in such a manner as to minimize this quantity subject to the relevant sets of belief constraints $K_i$ of the members of the college.
The standard measure of informational distance between probability distributions v and u is the well-studied notion of Kullback-Leibler divergence [30], sometimes known as cross-entropy, given by
$$\mathrm{KL}(v, u) = \sum_{j=1}^{J} v_j \log \frac{v_j}{u_j}$$
where the convention is observed that $v_j \log \frac{v_j}{u_j}$ takes the value 0 if $v_j = 0$ and the value +∞ if $v_j \neq 0$ and $u_j = 0$.
We recall that Kullback-Leibler divergence is not a symmetric function; intuitively in the context of updating for a single agent KL(v, u) represents the informational distance from old belief function u to new belief function v. Using this notion of informational distance $A_0$’s idea is therefore to choose v and $w^{(1)}, \ldots, w^{(m)}$ with each $w^{(i)}$ satisfying $K_i$, so as to minimize
$$\frac{1}{m} \sum_{i=1}^{m} \mathrm{KL}(v, w^{(i)})$$
We will see below that, while such a procedure will not by itself always produce unique belief functions for v and the associated $w^{(1)}, \ldots, w^{(m)}$, the set of possible belief functions satisfying these criteria has both a pleasant characterisation and a tight mathematical structure.
A fundamental property of Kullback-Leibler divergence which we shall need is
Lemma 6 (Gibbs Inequality). For all belief functions v and u
$$\mathrm{KL}(v, u) \geq 0$$
with equality holding if and only if v = u.
Proof. See [30] or [3].
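For readers who wish to experiment, the definition of Kullback-Leibler divergence together with the two conventions stated above can be written as a short function. The function name `kl` and the sample distributions are illustrative assumptions.

```python
import math

def kl(v, u):
    """Kullback-Leibler divergence KL(v, u) = sum_j v_j log(v_j / u_j)."""
    total = 0.0
    for vj, uj in zip(v, u):
        if vj == 0:
            continue            # convention: the term is 0 when v_j = 0
        if uj == 0:
            return math.inf     # convention: +infinity when v_j != 0 and u_j = 0
        total += vj * math.log(vj / uj)
    return total

v = [0.5, 0.5, 0.0]
u = [0.25, 0.25, 0.5]

forward = kl(v, u)    # finite: equals log 2 for these values
backward = kl(u, v)   # infinite: u gives weight to an atom which v rules out
same = kl(v, v)       # zero, the equality case of the Gibbs inequality
```

The pair `forward` and `backward` illustrates the asymmetry of the divergence, and all three values are non-negative as the Gibbs inequality requires.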
The next lemma allows us to express A0’s criterion above in a much more convenient mathematical form.
Lemma 7. Let $K_1, \ldots, K_m$ be constraint sets on belief functions $w^{(1)}, \ldots, w^{(m)}$ respectively. Then the following are equivalent:
  • The belief functions $v, w^{(1)}, \ldots, w^{(m)}$ minimize the quantity
    $$\frac{1}{m} \sum_{i=1}^{m} \mathrm{KL}(v, w^{(i)}) \tag{4}$$
    subject to the given constraints.
  • The belief functions $w^{(1)}, \ldots, w^{(m)}$ maximize the quantity
    $$\sum_{j=1}^{J} \Big[ \prod_{i=1}^{m} w_j^{(i)} \Big]^{\frac{1}{m}} \tag{5}$$
    subject to the given constraints, and
    $$v_j = \frac{\Big[ \prod_{i=1}^{m} w_j^{(i)} \Big]^{\frac{1}{m}}}{\sum_{j=1}^{J} \Big[ \prod_{i=1}^{m} w_j^{(i)} \Big]^{\frac{1}{m}}} \tag{6}$$
    for all j = 1 … J.
Proof. We note first that by our assumptions concerning the constraint sets, the minimum value of (4) must be finite. For by assumption there exists some $j_0$ and some $u^{(i)} \in V_{K_i}$ such that $u^{(i)}_{j_0} \neq 0$ for all i = 1 … m; then replacing each $w^{(i)}$ by $u^{(i)}$ and setting $v_{j_0} = 1$ and all other $v_j$ equal to zero gives (4) a finite value. From this it follows that, at any point where (4) attains its minimum, for any j if $v_j$ is non-zero then $w_j^{(i)}$ is non-zero for all i = 1 … m. Thus we can rewrite (4) as
$$\sum_{j=1}^{J} v_j \log \frac{v_j}{\Big[ \prod_{i=1}^{m} w_j^{(i)} \Big]^{\frac{1}{m}}}$$
or, equivalently, as
$$\sum_{j=1}^{J} v_j \log \frac{v_j}{\left( \dfrac{\Big[ \prod_{i=1}^{m} w_j^{(i)} \Big]^{\frac{1}{m}}}{\sum_{j=1}^{J} \Big[ \prod_{i=1}^{m} w_j^{(i)} \Big]^{\frac{1}{m}}} \right)} \; - \; \log \sum_{j=1}^{J} \Big[ \prod_{i=1}^{m} w_j^{(i)} \Big]^{\frac{1}{m}}$$
which, by the Gibbs inequality, will for any given $w^{(1)}, \ldots, w^{(m)}$ take its minimum value when the first term vanishes and v is given by the expression at (6). On the other hand the second term is minimized when $\sum_{j=1}^{J} \big[ \prod_{i=1}^{m} w_j^{(i)} \big]^{\frac{1}{m}}$ is maximized. It follows that the minimum possible value of (4) is obtained by first maximizing $\sum_{j=1}^{J} \big[ \prod_{i=1}^{m} w_j^{(i)} \big]^{\frac{1}{m}}$ subject to the constraints, and then letting v be determined by the equation (6). □
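The decomposition used in this proof can be checked numerically: for fixed members' belief functions, the average divergence from any v equals the divergence of v from the normalised geometric mean, minus the logarithm of the sum of geometric means. The sketch below uses arbitrarily chosen strictly positive distributions; all names and sample values are illustrative assumptions.

```python
import math

def kl(v, u):
    """KL(v, u) for strictly positive u (sufficient for this illustration)."""
    return sum(vj * math.log(vj / uj) for vj, uj in zip(v, u) if vj > 0)

def geo_means(ws):
    """Coordinatewise geometric means [prod_i w_j^(i)]^(1/m)."""
    m = len(ws)
    return [math.prod(w[j] for w in ws) ** (1.0 / m) for j in range(len(ws[0]))]

ws = [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]   # members' belief functions
v = [0.3, 0.3, 0.4]                        # an arbitrary candidate social belief function

g = geo_means(ws)
G = sum(g)                                 # the sum of geometric means
pooled = [x / G for x in g]                # the normalised geometric mean of the members

avg_kl = sum(kl(v, w) for w in ws) / len(ws)
decomposed = kl(v, pooled) - math.log(G)

# Choosing v = pooled makes the first term vanish, leaving only -log G.
avg_at_pooled = sum(kl(pooled, w) for w in ws) / len(ws)
```

Here `avg_kl` and `decomposed` agree up to rounding, and `avg_at_pooled` equals `-math.log(G)`, which is no greater than `avg_kl`. This is exactly why maximising the sum of geometric means minimises the average divergence.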
The above lemma shows that Chairman $A_0$’s initial criterion for selecting appropriate v for consideration as the social belief function can be reduced to the problem of finding those sequences of belief functions $w^{(1)}, \ldots, w^{(m)}$ which maximize $\sum_{j=1}^{J} \big[ \prod_{i=1}^{m} w_j^{(i)} \big]^{\frac{1}{m}}$, subject to each $w^{(i)}$ satisfying the relevant set of constraints $K_i$. Notice that the function being maximized above is just a sum of geometric means. Since this function is bounded and continuous and the space over which it is being maximized is by assumption closed, a maximum value is certainly attained.
In order to make our presentation more readable we shall in future abbreviate $K_1, \ldots, K_m$ by $\vec{K}$.
Definition 8. For a sequence of knowledge bases $\vec{K}$ we define
$$M_{\vec{K}} = \max \Big\{ \sum_{j=1}^{J} \Big[ \prod_{i=1}^{m} w_j^{(i)} \Big]^{\frac{1}{m}} \;\Big|\; w^{(i)} \in V_{K_i} \text{ for all } i = 1, \ldots, m \Big\}$$
It is now easy to see that
Lemma 9. Given knowledge bases $K_1, \ldots, K_m$ and $M_{\vec{K}}$ defined as above then $0 < M_{\vec{K}} \leq 1$. Furthermore the value $M_{\vec{K}} = 1$ occurs if and only if for every j = 1 … J and for all $i, i' \in \{1, \ldots, m\}$, $w_j^{(i)} = w_j^{(i')}$. Hence given $K_1, \ldots, K_m$ the following are equivalent:
  • $M_{\vec{K}} = 1$
  • Every $w^{(1)}, \ldots, w^{(m)}$ which generates the value $M_{\vec{K}}$ satisfies $w^{(1)} = \ldots = w^{(m)} = v$.
  • The knowledge bases $K_1, \ldots, K_m$ are jointly consistent: i.e., there exists some belief function which satisfies all of them.
Proof. Let $w^{(1)}, \ldots, w^{(m)}$ be belief functions satisfying $K_1, \ldots, K_m$ respectively, and which generate the value $M_{\vec{K}}$. First note that by assumption for some $j_0$ no $K_i$ forces the probability given to atom $\alpha_{j_0}$ to be zero, and hence $M_{\vec{K}} > 0$, since it is possible to choose belief functions $u^{(1)}, \ldots, u^{(m)}$ respectively consistent with $K_1, \ldots, K_m$ such that $\big[ \prod_{i=1}^{m} u_{j_0}^{(i)} \big]^{\frac{1}{m}} > 0$.
Now by applying the arithmetic-geometric mean inequality m times we get
$$M_{\vec{K}} = \sum_{j=1}^{J} \Big[ \prod_{i=1}^{m} w_j^{(i)} \Big]^{\frac{1}{m}} \leq \sum_{j=1}^{J} \frac{1}{m} \sum_{i=1}^{m} w_j^{(i)} = 1 \quad \text{since} \quad \sum_{i=1}^{m} \sum_{j=1}^{J} w_j^{(i)} = m.$$
Moreover since equality for any of the arithmetic-geometric mean inequalities occurs just when all the terms are equal, the case $M_{\vec{K}} = 1$ occurs if and only if $w^{(1)} = w^{(2)} = \ldots = w^{(m)} = v$. This suffices to prove the lemma. □
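The bound in Lemma 9 is easily illustrated: the sum of geometric means falls strictly below 1 when the members' belief functions differ, and equals 1 when they coincide. The sample distributions below are illustrative assumptions.

```python
import math

def geo_sum(ws):
    """Sum over atoms of the geometric mean of the members' beliefs in that atom."""
    m = len(ws)
    return sum(math.prod(w[j] for w in ws) ** (1.0 / m) for j in range(len(ws[0])))

differing = [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
identical = [[0.5, 0.3, 0.2], [0.5, 0.3, 0.2]]

below_one = geo_sum(differing)   # strictly less than 1 by the AM-GM inequality
equal_one = geo_sum(identical)   # equals 1 (up to rounding) when all rows agree
```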
Now it is obvious from the above that Chairman $A_0$’s proposed method of choosing v will not in general result in a uniquely defined social belief function. Indeed if $\bigcap_{i=1}^{m} V_{K_i} \neq \emptyset$ then any point w in this intersection, if adopted as the belief function of each member, will generate the maximum possible value for $M_{\vec{K}}$ of 1 and so will be a possible candidate for a social belief function v. Moreover even if $\bigcap_{i=1}^{m} V_{K_i} = \emptyset$ the process above may not result in a unique choice of either the $w^{(i)}$ or of v.
Chairman A0 now reasons as follows: if the result of the above operation of minimizing the average Kullback-Leibler divergence does not result in a unique solution for v, then the best rational recourse which she has left is to choose that v which has maximum entropy from the set of possible v previously obtained, assuming of course that such a choice is well-defined. Chairman A0 reasons that by adopting this procedure she is treating the set of v defined by minimizing the average Kullback-Leibler divergence of v with possible belief functions of college members as if that were the set of her own possible belief functions, and then choosing a belief function from that set by applying the ME inference process, as she would if that were indeed the case.
However in order to show that this procedure is well-defined, Chairman A0 needs to prove certain technical results.
Definition 10. For knowledge bases $\vec{K}$ we define
$$\Gamma(\vec{K}) = \Big\{ \langle w^{(1)}, \ldots, w^{(m)} \rangle \in \bigotimes_{i=1}^{m} V_{K_i} \;\Big|\; \sum_{j=1}^{J} \Big[ \prod_{i=1}^{m} w_j^{(i)} \Big]^{\frac{1}{m}} = M_{\vec{K}} \Big\}$$
By Lemma 7, each point $\langle w^{(1)}, \ldots, w^{(m)} \rangle$ in $\Gamma(\vec{K})$ gives rise to a uniquely determined corresponding social belief function v whose j’th coordinate is given by
$$v_j = \frac{1}{M_{\vec{K}}} \Big[ \prod_{i=1}^{m} w_j^{(i)} \Big]^{\frac{1}{m}}$$
We will refer to the v thus obtained from $\langle w^{(1)}, \ldots, w^{(m)} \rangle$ as
$$\mathrm{LogOp}(w^{(1)}, \ldots, w^{(m)})$$
and we let
$$\Delta(\vec{K}) = \{ \mathrm{LogOp}(w^{(1)}, \ldots, w^{(m)}) \mid \langle w^{(1)}, \ldots, w^{(m)} \rangle \in \Gamma(\vec{K}) \}$$
$\Delta(\vec{K})$ is thus the candidate set of possible social belief functions from which Chairman $A_0$ wishes to make her final choice by selecting the point in this set which has maximum entropy.
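To make these definitions concrete, consider a hypothetical college of two agents over two atoms, with jointly inconsistent knowledge bases $K_1 = \{w(\alpha_1) \geq 0.7\}$ and $K_2 = \{w(\alpha_1) \leq 0.4\}$. The sketch below approximates the maximum of the sum of geometric means by exhaustive search over a grid and then forms the corresponding normalised geometric mean. The knowledge bases, the grid resolution and all names are assumptions made for illustration, and a grid search is of course only an approximation to the true maximisation.

```python
import math

def geo_sum(a, b):
    """Sum of geometric means for w1 = <a, 1 - a> and w2 = <b, 1 - b>."""
    return math.sqrt(a * b) + math.sqrt((1 - a) * (1 - b))

# Grids covering the two constraint sets: a in [0.7, 1] and b in [0, 0.4].
grid_a = [i / 100 for i in range(70, 101)]
grid_b = [k / 100 for k in range(0, 41)]

# Approximate the maximum value and a maximising pair by exhaustive search.
M, best_a, best_b = max((geo_sum(a, b), a, b) for a in grid_a for b in grid_b)

# The corresponding social belief function: the normalised geometric mean.
v = [math.sqrt(best_a * best_b) / M,
     math.sqrt((1 - best_a) * (1 - best_b)) / M]
```

In this example the search selects a = 0.7 and b = 0.4, the points of the two constraint sets which lie closest together, and the maximum value lies strictly between 0 and 1, as Lemma 9 predicts for jointly inconsistent knowledge bases. Since the maximising pair is unique here, the candidate set of social belief functions is a singleton and no further appeal to maximum entropy is needed.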
From now on we shall abbreviate a typical point $\langle w^{(1)}, \ldots, w^{(m)} \rangle$ in $\bigotimes_{i=1}^{m} V_{K_i}$ by $\vec{w}$. For any such $\vec{w}$ we denote the vector $\langle w_j^{(1)}, \ldots, w_j^{(m)} \rangle$ by $\mathbf{w}_j$. Thus we may think of $\vec{w}$ as an m × J matrix with rows $w^{(i)}$, columns $\mathbf{w}_j$, and individual entries $w_j^{(i)}$.
Our problem is to analyze the linked structures of $\Gamma(\vec{K})$ and $\Delta(\vec{K})$, and in particular to show that $\Delta(\vec{K})$ is convex. A slight complicating factor in this analysis turns out to be the possibility that some entries in a matrix $\vec{w} \in \Gamma(\vec{K})$ may turn out to be zero. Notice that the corresponding social belief function v will have j’th coordinate $v_j$ equal to zero if and only if some entry in the column vector $\mathbf{w}_j$ is equal to zero. Such zero entries $v_j$ may be classified as of two possible kinds: either $v_j = 0$ because for some i the knowledge base $K_i$ forces $w_j^{(i)} = 0$, or, when this is not the case, because for some i it just so happens that $w_j^{(i)} = 0$. The first case is in a certain sense trivial since for an arbitrary $\vec{w} \in \bigotimes_{i=1}^{m} V_{K_i}$ the columns $\mathbf{w}_j$ corresponding to such j will make zero contribution to the function to be maximised. For this reason it is convenient to introduce a notation which allows us to eliminate such j from consideration. Accordingly, for given $\vec{K}$, we define the set of significant j, $\mathrm{Sig}_{\vec{K}}$, by:
$$\mathrm{Sig}_{\vec{K}} = \{ j \mid \text{for no } i \text{ is it the case that } w_j^{(i)} = 0 \text{ for all } w^{(i)} \in V_{K_i} \}$$
Notice that by our initial assumption about $\vec{K}$ at the beginning of this section $\mathrm{Sig}_{\vec{K}}$ is non-empty.
For any $\vec{w} \in \bigotimes_{i=1}^{m} V_{K_i}$ we now define $\vec{w}^{\,\mathrm{Sig}_{\vec{K}}}$ to be the projection of $\vec{w}$ on to those coordinates (i, j) such that $j \in \mathrm{Sig}_{\vec{K}}$; i.e., $\vec{w}^{\,\mathrm{Sig}_{\vec{K}}}$ may be viewed as the matrix obtained from the matrix $\vec{w}$ by deleting those columns j for which $j \notin \mathrm{Sig}_{\vec{K}}$. Similarly for any probability function w we define $w^{\mathrm{Sig}_{\vec{K}}}$ to be the projection of w to a vector obtained by deleting those coordinates which are not in $\mathrm{Sig}_{\vec{K}}$. (Notice however that the effect of this is that the sum of the components of such a $w^{\mathrm{Sig}_{\vec{K}}}$ may be less than unity). Similarly we define
Γ Sig ( K ) = { w Sig K | w Γ ( K ) }
and
Δ Sig ( K ) = { v Sig K | v Δ ( K ) }
Note that in contrast to the situation for the row vectors of a matrix in Γ Sig ( K ) , the components of any vector in Δ Sig ( K ) do sum to unity, and that there is therefore a trivial homeomorphism between Δ Sig ( K ) and Δ ( K ) .
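The definition of the significant coordinates can be made concrete under one extra assumption that the text does not make: suppose each $V_{K_i}$ is a polytope described by a finite list of vertices. Since all coordinates are non-negative, a knowledge base then forces coordinate $j$ to be zero exactly when every vertex has a zero there. The names `significant_coords` and `project`, the vertex representation, and the 0-based indexing of atoms are all hypothetical choices for this sketch.

```python
def significant_coords(vertex_sets, tol=1e-12):
    """Return the 0-based indices j that no agent forces to zero.

    vertex_sets[i] is a list of vertices (probability vectors) whose
    convex hull is taken to be V_{K_i}; this representation is assumed.
    """
    J = len(vertex_sets[0][0])
    sig = []
    for j in range(J):
        # Agent i forces w_j = 0 iff every vertex of V_{K_i} is zero at j.
        forced_zero = any(
            all(abs(vertex[j]) <= tol for vertex in vertices)
            for vertices in vertex_sets
        )
        if not forced_zero:
            sig.append(j)
    return sig

def project(vector, sig):
    """Delete the coordinates of `vector` that are not significant."""
    return [vector[j] for j in sig]

# Agent 1 is certain that the third atom is false; agent 2 is unconstrained.
V1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
V2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sig = significant_coords([V1, V2])
print(sig, project([0.25, 0.25, 0.5], sig))
```

In the usage example the projected row of the second agent sums to 0.5, which illustrates the remark that a projected row vector may sum to less than unity.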
The next theorem18, which guarantees that Chairman A0’s plan is realisable, provides a crucial structure theorem for Γ ( K ) and Δ ( K ) , which depends strongly on the concavity properties of the geometric mean function and of sums of such functions.
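Theorem 11 (Structure of $\Gamma(\mathbf{K})$ and $\Delta(\mathbf{K})$).
Let $\mathbf{K}$ be a fixed vector of knowledge bases such that $\Delta(\mathbf{K})$ is not a singleton.
  (i) Let $\mathbf{w} \in \Gamma^{\mathrm{Sig}}(\mathbf{K})$, and let $v$ be the corresponding point in $\Delta^{\mathrm{Sig}}(\mathbf{K})$.
    Then for each $j \in \mathrm{Sig}_{\mathbf{K}}$ either $w_j^{(i)} = 0$ for all $i = 1 \ldots m$ or $w_j^{(i)}$ is nonzero for all $i = 1 \ldots m$.
    Furthermore in the case when $w_j^{(i)}$ is nonzero for all $i = 1 \ldots m$, if $\mathbf{w}'$ is any other point in $\Gamma^{\mathrm{Sig}}(\mathbf{K})$ with corresponding point $v'$ in $\Delta^{\mathrm{Sig}}(\mathbf{K})$, then
    $$\mathbf{w}'_j = (1 + \mu_j)\, \mathbf{w}_j \quad \text{for some } \mu_j \text{ with } \mu_j \geq -1$$
    and hence also
    $$v'_j = (1 + \mu_j)\, v_j$$
  (ii) There is a point $\mathbf{w} \in \Gamma^{\mathrm{Sig}}(\mathbf{K})$ with corresponding $v \in \Delta^{\mathrm{Sig}}(\mathbf{K})$ such that for every other point $\mathbf{w}' \in \Gamma^{\mathrm{Sig}}(\mathbf{K})$ with corresponding $v' \in \Delta^{\mathrm{Sig}}(\mathbf{K})$, for each $j \in \mathrm{Sig}_{\mathbf{K}}$ there exists $\mu_j \geq -1$ such that
    $$\mathbf{w}'_j = (1 + \mu_j)\, \mathbf{w}_j$$
    and
    $$v'_j = (1 + \mu_j)\, v_j$$
  (iii) The regions $\Gamma^{\mathrm{Sig}}(\mathbf{K})$, $\Delta^{\mathrm{Sig}}(\mathbf{K})$, $\Gamma(\mathbf{K})$, and $\Delta(\mathbf{K})$ are all compact and convex.
  (iv) If $\mathrm{LogOp}^{\mathrm{Sig}}$ denotes the function defined on $\Gamma^{\mathrm{Sig}}(\mathbf{K})$ by restricting the definition of the LogOp function defined on $\Gamma(\mathbf{K})$ in 3.5 above to those $j$ which are in $\mathrm{Sig}_{\mathbf{K}}$, then
    $$\mathrm{LogOp}^{\mathrm{Sig}} : \Gamma^{\mathrm{Sig}}(\mathbf{K}) \to \Delta^{\mathrm{Sig}}(\mathbf{K})$$
    is a continuous bijection.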
Proof. Define the function $F : \bigotimes_{i=1}^{m} \mathbb{D}^J \to \mathbb{R}$ by
$$F(\mathbf{w}) = \sum_{j=1}^{J} \left[ \prod_{i=1}^{m} w_j^{(i)} \right]^{\frac{1}{m}}$$
This is the function which is to be maximised for $\mathbf{w} \in \bigotimes_{i=1}^{m} V_{K_i}$ in order to define the points in the region $\Gamma(\mathbf{K})$. We note first of all that for non-negative arguments the geometric mean function is always concave (see e.g. [31]) and hence a sum of such functions is also concave. Since the region $\bigotimes_{i=1}^{m} V_{K_i}$ is convex and compact by its definition, it follows that $F$ attains a maximum value and hence that $\Gamma(\mathbf{K})$ is non-empty. Moreover it is an easy consequence of the definition of a concave function that the set of points which give maximal value to such a function over a compact convex region itself forms a compact convex set. Thus $\Gamma(\mathbf{K})$ is compact and convex. Since both compactness and convexity are preserved by projections in Euclidean space it follows that $\Gamma^{\mathrm{Sig}}(\mathbf{K})$ is also compact and convex.
Let $\left[ \bigotimes_{i=1}^{m} V_{K_i} \right]^{\mathrm{Sig}_{\mathbf{K}}}$ denote the projection of $\bigotimes_{i=1}^{m} V_{K_i}$ onto those coordinates with $j \in \mathrm{Sig}_{\mathbf{K}}$. This region is also compact and convex. Then if we define $F^{\mathrm{Sig}}$ for any $\mathbf{w} \in \bigotimes_{i=1}^{m} V_{K_i}$ by
$$F^{\mathrm{Sig}}(\mathbf{w}^{\mathrm{Sig}_{\mathbf{K}}}) = \sum_{j \in \mathrm{Sig}_{\mathbf{K}}} \left[ \prod_{i=1}^{m} w_j^{(i)} \right]^{\frac{1}{m}}$$
then it is clear that
$$F^{\mathrm{Sig}}(\mathbf{w}^{\mathrm{Sig}_{\mathbf{K}}}) = F(\mathbf{w})$$
so that it suffices for us to confine our analysis to $F^{\mathrm{Sig}}$ acting on the points in $\left[ \bigotimes_{i=1}^{m} V_{K_i} \right]^{\mathrm{Sig}_{\mathbf{K}}}$.
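Now let us consider a general point $\mathbf{a} \in \Gamma^{\mathrm{Sig}}(\mathbf{K})$. We will show that for every $j \in \mathrm{Sig}_{\mathbf{K}}$ we cannot have that $a_j^{(i)} = 0$ while $a_j^{(i')} \neq 0$ for some $i, i' \in \{1 \ldots m\}$. Suppose for contradiction that such $j$, $i$ and $i'$ exist. We first note that there exists some $\mathbf{b} \in \left[ \bigotimes_{i=1}^{m} V_{K_i} \right]^{\mathrm{Sig}_{\mathbf{K}}}$ such that $b_j^{(i)} \neq 0$ for all $i = 1 \ldots m$ and all $j \in \mathrm{Sig}_{\mathbf{K}}$. This follows from the convexity of $\left[ \bigotimes_{i=1}^{m} V_{K_i} \right]^{\mathrm{Sig}_{\mathbf{K}}}$ since for each particular $i$ and $j$ we can by our assumptions choose some $\mathbf{x} \in \left[ \bigotimes_{i=1}^{m} V_{K_i} \right]^{\mathrm{Sig}_{\mathbf{K}}}$ such that $x_j^{(i)} \neq 0$ and by convexity we can then form a suitable $\mathbf{b}$ by taking the arithmetic mean of all these. So let us fix some such $\mathbf{b}$.
Let $\mathbf{u} = \mathbf{b} - \mathbf{a}$. Then by convexity, for any $\lambda \in [0, 1]$, the point $\mathbf{a} + \lambda \mathbf{u}$ is in $\left[ \bigotimes_{i=1}^{m} V_{K_i} \right]^{\mathrm{Sig}_{\mathbf{K}}}$. Note that by the definition of $\mathbf{b}$, for all $i$ and $j$ if $a_j^{(i)} = 0$ then $u_j^{(i)} > 0$.
Consider the behaviour of $F^{\mathrm{Sig}}(\mathbf{a} + \lambda \mathbf{u})$ as $\lambda \to 0$. Now differentiating with respect to $\lambda$ we get
$$\frac{d F^{\mathrm{Sig}}}{d \lambda}(\mathbf{a} + \lambda \mathbf{u}) = \frac{1}{m} \sum_{j \in \mathrm{Sig}_{\mathbf{K}}} \left[ \prod_{i=1}^{m} \left( a_j^{(i)} + \lambda u_j^{(i)} \right) \right]^{\frac{1}{m}} \sum_{i=1}^{m} \frac{u_j^{(i)}}{a_j^{(i)} + \lambda u_j^{(i)}}$$
As $\lambda \to 0^{+}$ we see that all terms on the right hand side are bounded except in the case of those $i, j$ where $a_j^{(i)} = 0$ and at least one $a_j^{(i')}$ is non-zero for some $i' \neq i$, in which case that term tends to $+\infty$. Since we are supposing that such $j$, $i$ and $i'$ do exist, it follows that $F^{\mathrm{Sig}}$ is increasing as $\mathbf{a} + \lambda \mathbf{u}$ moves away from $\mathbf{a}$, and hence since $F^{\mathrm{Sig}}$ is continuous at $\mathbf{a}$, $\mathbf{a}$ cannot be a maximum point of $F^{\mathrm{Sig}}$ over $\left[ \bigotimes_{i=1}^{m} V_{K_i} \right]^{\mathrm{Sig}_{\mathbf{K}}}$, contradicting hypothesis. Thus we have shown that for any point $\mathbf{w}$ in $\Gamma^{\mathrm{Sig}}(\mathbf{K})$ if some column vector of $\mathbf{w}$ has a zero entry then that column vector is identically zero, which establishes the first part of (i).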
The second part of (i) follows directly from (ii), so we will prove (ii) instead.
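By (i) and the convexity of $\Gamma^{\mathrm{Sig}}(\mathbf{K})$ there exists an $\mathbf{a}$ such that if there exists any $\mathbf{b}$ in $\Gamma^{\mathrm{Sig}}(\mathbf{K})$ for which for some $j$ in $\mathrm{Sig}_{\mathbf{K}}$ $\mathbf{b}_j$ is not a zero vector then all the entries of $\mathbf{a}_j$ are non-zero. Let us fix such an $\mathbf{a}$ and let $\mathbf{b}$ be any other point in $\Gamma^{\mathrm{Sig}}(\mathbf{K})$. Again we set $\mathbf{u} = \mathbf{b} - \mathbf{a}$ and consider $\lambda \in [0, 1]$, noting that in this case by the convexity of $\Gamma^{\mathrm{Sig}}(\mathbf{K})$, $\mathbf{a} + \lambda \mathbf{u}$ is a point of $\Gamma^{\mathrm{Sig}}(\mathbf{K})$, and hence $F^{\mathrm{Sig}}(\mathbf{a} + \lambda \mathbf{u}) = M_{\mathbf{K}}$ and so has constant value.
Let $\mathrm{Sig}^{*}_{\mathbf{K}}$ denote $\{ j \mid j \in \mathrm{Sig}_{\mathbf{K}} \text{ and } \mathbf{a}_j \neq \mathbf{0} \}$. Then by the definition of $\mathbf{a}$ and of $\mathrm{Sig}^{*}_{\mathbf{K}}$
$$F^{\mathrm{Sig}}(\mathbf{a} + \lambda \mathbf{u}) = \sum_{j \in \mathrm{Sig}^{*}_{\mathbf{K}}} \left( \prod_{i=1}^{m} \left( a_j^{(i)} + \lambda u_j^{(i)} \right) \right)^{\frac{1}{m}}$$
Noting that all the $a_j^{(i)}$ occurring on the right are by definition non-zero, differentiating twice with respect to $\lambda$ we have
$$\frac{\partial^2 F^{\mathrm{Sig}}}{\partial \lambda^2}(\mathbf{a} + \lambda \mathbf{u}) = \frac{1}{m^2} \sum_{j \in \mathrm{Sig}^{*}_{\mathbf{K}}} \left[ \prod_{i=1}^{m} \left( a_j^{(i)} + \lambda u_j^{(i)} \right) \right]^{\frac{1}{m}} \left[ \left[ \sum_{i=1}^{m} \frac{u_j^{(i)}}{a_j^{(i)} + \lambda u_j^{(i)}} \right]^2 - m \sum_{i=1}^{m} \frac{\left( u_j^{(i)} \right)^2}{\left( a_j^{(i)} + \lambda u_j^{(i)} \right)^2} \right]$$
Since $F^{\mathrm{Sig}}$ is constant for $\lambda \in [0, 1]$, setting the above expression equal to 0 for $\lambda = 0$ we get
$$\frac{1}{m^2} \sum_{j \in \mathrm{Sig}^{*}_{\mathbf{K}}} \left[ \prod_{i=1}^{m} a_j^{(i)} \right]^{\frac{1}{m}} \left[ \left[ \sum_{i=1}^{m} \frac{u_j^{(i)}}{a_j^{(i)}} \right]^2 - m \sum_{i=1}^{m} \left[ \frac{u_j^{(i)}}{a_j^{(i)}} \right]^2 \right] = 0$$
from which we obtain
$$\sum_{j \in \mathrm{Sig}^{*}_{\mathbf{K}}} \left[ \prod_{i=1}^{m} a_j^{(i)} \right]^{\frac{1}{m}} \sum_{i, i' = 1}^{m} \left[ \frac{u_j^{(i)}}{a_j^{(i)}} - \frac{u_j^{(i')}}{a_j^{(i')}} \right]^2 = 0$$
From the negative definite form of the above expression we deduce that for all $j \in \mathrm{Sig}^{*}_{\mathbf{K}}$ and all $i, i' = 1 \ldots m$
$$\frac{u_j^{(i)}}{a_j^{(i)}} = \frac{u_j^{(i')}}{a_j^{(i')}}$$
whence for all $j \in \mathrm{Sig}^{*}_{\mathbf{K}}$ and all $i, i' = 1 \ldots m$
$$\frac{b_j^{(i)}}{a_j^{(i)}} = \frac{b_j^{(i')}}{a_j^{(i')}}$$
which suffices to establish part (ii) of the theorem.
To show (iv) note that the function $\mathrm{LogOp}^{\mathrm{Sig}} : \Gamma^{\mathrm{Sig}}(\mathbf{K}) \to \Delta^{\mathrm{Sig}}(\mathbf{K})$ is by definition continuous and surjective. However by (ii) it is also clearly injective. Finally to show part (iii) we have already noted that $\Gamma(\mathbf{K})$ and $\Gamma^{\mathrm{Sig}}(\mathbf{K})$ are compact and convex. Since $\Delta(\mathbf{K})$ and $\Delta^{\mathrm{Sig}}(\mathbf{K})$ are the continuous images of these compact sets under LogOp and $\mathrm{LogOp}^{\mathrm{Sig}}$ respectively, it follows that $\Delta(\mathbf{K})$ and $\Delta^{\mathrm{Sig}}(\mathbf{K})$ are also compact. From the convexity of $\Gamma^{\mathrm{Sig}}(\mathbf{K})$ the convexity of $\Delta^{\mathrm{Sig}}(\mathbf{K})$ follows by (ii), while the convexity of $\Delta(\mathbf{K})$ follows immediately from that of $\Delta^{\mathrm{Sig}}(\mathbf{K})$. This completes the proof of Theorem 11. □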
Now since $\Delta(\mathbf{K})$ is a compact convex set by Theorem 11(iii) and since the entropy function
$$-\sum_{j=1}^{J} v_j \log v_j$$
is strictly concave and bounded over this set, the set contains a unique point vME at which the entropy function achieves its maximum value. It follows at once that the following formal definition of the social inference process SEP defines, for every $\mathbf{K}$ satisfying the conditions of this section, a unique social belief function.
Definition 12. The Social Entropy Process, SEP, is the social inference process defined by
$$\mathrm{SEP}(\mathbf{K}) = \mathrm{ME}(\Delta(\mathbf{K}))$$
where $\mathrm{ME}(\Delta(\mathbf{K}))$ denotes the unique maximum entropy point in $\Delta(\mathbf{K})$.
We remark that it follows immediately from the definition above that the social inference process SEP marginalises to the inference process ME and to the pooling operator LogOp.
It is worth noting that Theorem 11(i) at once provides a simple sufficient condition for $\Delta(\mathbf{K})$ to be a singleton and thus for the application of ME in the definition of $\mathrm{SEP}(\mathbf{K})$ to be redundant:
Corollary 13. If $K_1 \ldots K_m$ are such that for each $j = 1 \ldots J$ except possibly at most one there exists some $i$ with $1 \leq i \leq m$ such that the condition $w^{(i)} \in V_{K_i}$ forces $w_j^{(i)}$ to take a unique value, then $\Delta(K_1 \ldots K_m)$ is a singleton. In particular this occurs if for some $i$, $V_{K_i}$ is a singleton.

3.2. Principles Satisfied by SEP

Theorem 14. SEP satisfies the seven principles of section 2.3: Equivalence, Anonymity, Atomic Renaming, Consistency, Collegiality, General Locality, and Proportionality.
Proof. The fact that principles of Equivalence, Anonymity, and Atomic Renaming hold for SEP follows easily from the basic symmetry properties of the definition of SEP.
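To prove that SEP satisfies Consistency, suppose that $\mathbf{K} = K_1 \ldots K_m$ are such that
$$\bigcap_{i=1}^{m} V_{K_i} \neq \emptyset$$
Then for any $u \in \bigcap_{i=1}^{m} V_{K_i}$, if we set
$$v = w^{(1)} = \ldots = w^{(m)} = u$$
then
$$\sum_{j=1}^{J} \left( \prod_{i=1}^{m} w_j^{(i)} \right)^{\frac{1}{m}} = 1$$
and since by Lemma 9 $M_{\mathbf{K}} \leq 1$, it follows that $M_{\mathbf{K}} = 1$, and hence that $u \in \Delta(\mathbf{K})$. Conversely by Lemma 9, since $M_{\mathbf{K}} = 1$, then for any $v \in \Delta(\mathbf{K})$ if some $\mathbf{w} \in \Gamma(\mathbf{K})$ generates $v$, $v = w^{(1)} = \ldots = w^{(m)}$, and so $v \in \bigcap_{i=1}^{m} V_{K_i}$. It follows that
$$\mathrm{SEP}(K_1 \ldots K_m) \in \bigcap_{i=1}^{m} V_{K_i}$$
as required.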
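The role played here by the bound $M_{\mathbf{K}} \leq 1$ can be checked numerically. The sketch below is an illustration rather than part of the proof: it evaluates the sum of geometric means for two agents who hold the same belief function, and for two agents who disagree. The bound itself follows from the arithmetic-geometric mean inequality applied to each column. The function name `pooled_mass` and the particular numbers are assumptions chosen for the example.

```python
import math

def pooled_mass(rows):
    """Sum over atoms j of the geometric mean of the agents' beliefs in j."""
    m = len(rows)
    J = len(rows[0])
    return sum(math.prod(row[j] for row in rows) ** (1.0 / m) for j in range(J))

# Both agents adopt the same belief function u: the sum reaches its bound 1.
u = [1 / 3, 2 / 3, 0.0]
agree = pooled_mass([u, u])

# The agents hold different belief functions: the sum falls below 1.
disagree = pooled_mass([[1 / 3, 1 / 3, 1 / 3], [1 / 6, 2 / 3, 1 / 6]])

print(agree, disagree)
```

When the knowledge bases are jointly consistent, the maximum value 1 is attained only by matrices whose rows all coincide, which is why every point of $\Delta(\mathbf{K})$ then lies in the intersection of the $V_{K_i}$.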
To prove Collegiality suppose that $K_1 \ldots K_m$ are such that for some $k$ with 1 < k < m
$$\mathrm{SEP}(K_1 \ldots K_k) \in \bigcap_{i=k+1}^{m} V_{K_i}$$
Let $v = \mathrm{SEP}(K_1 \ldots K_k)$ and let $\hat{v} = \mathrm{SEP}(K_1 \ldots K_m)$.
Let $\langle w^{(1)} \ldots w^{(k)} \rangle \in \Gamma(K_1 \ldots K_k)$ be such that $v = \mathrm{LogOp}(w^{(1)} \ldots w^{(k)})$.
Similarly let $\langle \hat{w}^{(1)} \ldots \hat{w}^{(m)} \rangle \in \Gamma(K_1 \ldots K_m)$ be such that $\hat{v} = \mathrm{LogOp}(\hat{w}^{(1)} \ldots \hat{w}^{(m)})$. Then by definition
$$\sum_{i=1}^{k} \sum_{j=1}^{J} u_j \log \frac{u_j}{x_j^{(i)}}$$
takes its minimum possible value, for $x^{(1)} \ldots x^{(k)}$ subject to the constraints $K_1 \ldots K_k$ and for belief functions $u$, when $\langle x^{(1)} \ldots x^{(k)} \rangle = \langle w^{(1)} \ldots w^{(k)} \rangle$ and $u = \mathrm{LogOp}(w^{(1)} \ldots w^{(k)}) = v$. We denote this value by Min1. Similarly
$$\sum_{i=1}^{m} \sum_{j=1}^{J} u_j \log \frac{u_j}{x_j^{(i)}}$$
takes its minimum possible value, for $x^{(1)} \ldots x^{(m)}$ subject to the constraints $K_1 \ldots K_m$ and for belief functions $u$, when $\langle x^{(1)} \ldots x^{(m)} \rangle = \langle \hat{w}^{(1)} \ldots \hat{w}^{(m)} \rangle$ and $u = \mathrm{LogOp}(\hat{w}^{(1)} \ldots \hat{w}^{(m)}) = \hat{v}$. We denote this value by Min2.
We now define $w^{(i)}$ to be equal to $v$ for $k + 1 \leq i \leq m$. Notice that by hypothesis $w^{(1)} \ldots w^{(m)}$ now satisfy respectively the constraints $K_1 \ldots K_m$. Hence we have by the definitions above
$$\mathrm{Min2} \leq \sum_{i=1}^{m} \sum_{j=1}^{J} v_j \log \frac{v_j}{w_j^{(i)}} = \sum_{i=1}^{k} \sum_{j=1}^{J} v_j \log \frac{v_j}{w_j^{(i)}} = \mathrm{Min1}. \tag{8}$$
Similarly we also have
$$\mathrm{Min2} = \sum_{i=1}^{m} \sum_{j=1}^{J} \hat{v}_j \log \frac{\hat{v}_j}{\hat{w}_j^{(i)}} \geq \sum_{i=1}^{k} \sum_{j=1}^{J} \hat{v}_j \log \frac{\hat{v}_j}{\hat{w}_j^{(i)}} \geq \mathrm{Min1}. \tag{9}$$
It follows that the six quantities appearing in (8) and (9) above are all equal, and hence that
$v$ and $\hat{v}$ are both in $\Delta(K_1 \ldots K_k) \cap \Delta(K_1 \ldots K_m)$.
However by definition $v$ is the unique belief function with the highest entropy in $\Delta(K_1 \ldots K_k)$, while $\hat{v}$ is the unique belief function with the highest entropy in $\Delta(K_1 \ldots K_m)$. Hence $v = \hat{v}$ as required.
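The sums of Kullback-Leibler divergences compared in this proof are tied to the sum of geometric means by a simple identity: if $v$ is the normalised geometric mean of strictly positive rows $w^{(1)} \ldots w^{(m)}$ and $F$ is their sum of geometric means, then $\sum_i \sum_j v_j \log (v_j / w_j^{(i)}) = -m \log F$, and any other choice of $v$ gives a larger sum. The following sketch checks this numerically and also checks that appending a member whose row equals $v$ adds nothing to the sum, which is the step used to compare the two minima. The function names and the numbers are illustrative assumptions.

```python
import math

def log_op(rows):
    """Normalised geometric mean of the rows, plus the normalising constant."""
    m, J = len(rows), len(rows[0])
    geo = [math.prod(row[j] for row in rows) ** (1.0 / m) for j in range(J)]
    total = sum(geo)
    return [g / total for g in geo], total

def total_divergence(v, rows):
    """Sum over agents i of KL(v || rows[i]); rows are assumed strictly positive."""
    return sum(
        sum(vj * math.log(vj / wj) for vj, wj in zip(v, row) if vj > 0)
        for row in rows
    )

rows = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]
v, total = log_op(rows)

pooled_cost = total_divergence(v, rows)            # cost at the pooled v
closed_form = -len(rows) * math.log(total)         # -m log F
other_cost = total_divergence([0.4, 0.35, 0.25], rows)  # some other belief function
extended_cost = total_divergence(v, rows + [v])    # extra member who already holds v

print(pooled_cost, closed_form, other_cost, extended_cost)
```

Because the extra member contributes a divergence of zero, enlarging the college by members who already agree with $v$ cannot raise the minimum, which is the content of the first chain of relations in the proof.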
To prove General Locality, consider a college with members $A_1 \ldots A_m$ initially having respective knowledge bases $K_1 \ldots K_m$, where each $K_i$ is a nice set of constraints conditioned on some fixed non-contradictory sentence $\phi$. Now for each $i = 1 \ldots m$ let $K_i^{*}$ be a nice set of constraints about $\neg\phi$. We are given that
$\mathrm{SEP}(K_1 \cup K_1^{*}, \ldots, K_m \cup K_m^{*})(\phi) \neq 0$ and that $\mathrm{SEP}(K_1, \ldots, K_m)(\phi) \neq 0$. We must show that for any sentence $\theta$
$$\mathrm{SEP}(K_1 \cup K_1^{*}, \ldots, K_m \cup K_m^{*})(\theta \mid \phi) = \mathrm{SEP}(K_1, \ldots, K_m)(\theta \mid \phi).$$
Clearly for this purpose it suffices to show that for any atom $\alpha$ such that $\alpha \models \phi$
$$\mathrm{SEP}(K_1 \cup K_1^{*}, \ldots, K_m \cup K_m^{*})(\alpha \mid \phi) = \mathrm{SEP}(K_1, \ldots, K_m)(\alpha \mid \phi).$$
Notice that while we assume about each $K_i$ that it determines a closed convex set of probability functions conditioned on $\phi$, such a $K_i$ when interpreted as a set of constraints about beliefs in the original atoms $\alpha_1, \alpha_2, \ldots, \alpha_J$ also determines a closed convex region of $\mathbb{D}^J$ which as usual we denote by $V_{K_i}$. Hence $V_{K_i \cup K_i^{*}}$ is also a closed convex region of $\mathbb{D}^J$. Furthermore the conditions imply that for each $i = 1 \ldots m$, $K_i \cup K_i^{*}$ is consistent, and hence the above applications of SEP are legitimately made.
Without loss of generality we may assume as in the proof of Theorem 4 that the atoms are so ordered that for some $k$ with $1 \leq k < J$
$$\phi \equiv \bigvee_{j=1}^{k} \alpha_j \quad \text{and} \quad \neg\phi \equiv \bigvee_{j=k+1}^{J} \alpha_j$$
Let $u = \mathrm{SEP}(K_1, \ldots, K_m)$ be generated by $\mathbf{x} \in \Gamma(K_1, \ldots, K_m)$, and let $v = \mathrm{SEP}(K_1 \cup K_1^{*}, \ldots, K_m \cup K_m^{*})$ be generated by $\mathbf{y} \in \Gamma(K_1 \cup K_1^{*}, \ldots, K_m \cup K_m^{*})$. For each $i = 1 \ldots m$, let $\sum_{j=1}^{k} x_j^{(i)} = a^{(i)}$, and let $\sum_{j=1}^{k} y_j^{(i)} = b^{(i)}$. Note that $a^{(i)}$ and $b^{(i)}$ are non-zero for all $i$ since otherwise $\phi$ would get social belief zero contradicting hypotheses.
Now consider the point $\mathbf{z} \in \bigotimes_{i=1}^{m} V_{K_i}$ given for each $i = 1 \ldots m$ by
$$z_j^{(i)} = \begin{cases} y_j^{(i)} \dfrac{a^{(i)}}{b^{(i)}} & \text{for } j = 1, \ldots, k \\ x_j^{(i)} & \text{for } j = k+1, \ldots, J \end{cases}$$
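By the definition of the point $\mathbf{x}$ we know that
$$\sum_{j=1}^{J} \left[ \prod_{i=1}^{m} z_j^{(i)} \right]^{\frac{1}{m}} \leq \sum_{j=1}^{J} \left[ \prod_{i=1}^{m} x_j^{(i)} \right]^{\frac{1}{m}}$$
from which it follows that
$$\sum_{j=1}^{k} \left[ \prod_{i=1}^{m} y_j^{(i)} \frac{a^{(i)}}{b^{(i)}} \right]^{\frac{1}{m}} \leq \sum_{j=1}^{k} \left[ \prod_{i=1}^{m} x_j^{(i)} \right]^{\frac{1}{m}}$$
Dividing both sides by $\left[ \prod_{i=1}^{m} a^{(i)} \right]^{\frac{1}{m}}$ we obtain that
$$\sum_{j=1}^{k} \left[ \prod_{i=1}^{m} \frac{y_j^{(i)}}{b^{(i)}} \right]^{\frac{1}{m}} \leq \sum_{j=1}^{k} \left[ \prod_{i=1}^{m} \frac{x_j^{(i)}}{a^{(i)}} \right]^{\frac{1}{m}}.$$
However by repeating a similar argument, but this time with $\mathbf{x}$ and $\mathbf{y}$ interchanged we obtain the reverse inequality, from which it follows that
$$\sum_{j=1}^{k} \left[ \prod_{i=1}^{m} \frac{y_j^{(i)}}{b^{(i)}} \right]^{\frac{1}{m}} = \sum_{j=1}^{k} \left[ \prod_{i=1}^{m} \frac{x_j^{(i)}}{a^{(i)}} \right]^{\frac{1}{m}} = M_1 \text{ say}. \tag{10}$$
Note that the above equality implies that the value $M_1$ does not depend on the $K_i^{*}$ in any way.
Let $\sum_{j=k+1}^{J} \left[ \prod_{i=1}^{m} y_j^{(i)} \right]^{\frac{1}{m}} = M_2$ and let $\left[ \prod_{i=1}^{m} b^{(i)} \right]^{\frac{1}{m}} = B$.
Then from (3) we know that $\sum_{j=1}^{k} \left[ \prod_{i=1}^{m} y_j^{(i)} \right]^{\frac{1}{m}} = M_1 B$.
Let us denote by $C$ the quantity
$$\sum_{j=1}^{J} \left[ \prod_{i=1}^{m} y_j^{(i)} \right]^{\frac{1}{m}} = M_1 B + M_2$$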
and we note that by definition $C$ is the maximal value which can be taken by $\sum_{j=1}^{J} \left[ \prod_{i=1}^{m} t_j^{(i)} \right]^{\frac{1}{m}}$ for any $\mathbf{t} \in \bigotimes_{i=1}^{m} V_{K_i \cup K_i^{*}}$. We now consider those $\mathbf{t}$ of this form for which $t_j^{(i)} = y_j^{(i)}$ for all $j = k+1, \ldots, J$ and all $i = 1, \ldots, m$. Then, since for each $j = 1 \ldots k$, $v_j = C^{-1} \left[ \prod_{i=1}^{m} y_j^{(i)} \right]^{\frac{1}{m}}$, the definition of $\mathbf{y}$ ensures that the column vectors $\mathbf{y}_1 \ldots \mathbf{y}_k$ are of the form $\mathbf{t}_1 \ldots \mathbf{t}_k$ where
$$-\sum_{j=1}^{k} C^{-1} \left[ \prod_{i=1}^{m} t_j^{(i)} \right]^{\frac{1}{m}} \log \left( C^{-1} \left[ \prod_{i=1}^{m} t_j^{(i)} \right]^{\frac{1}{m}} \right) \tag{12}$$
is maximised subject to the conditions that for each $i$ the probability distribution $\left\langle \frac{t_1^{(i)}}{b^{(i)}} \ldots \frac{t_k^{(i)}}{b^{(i)}} \right\rangle$ satisfies the knowledge base $K_i$, that
$$\sum_{j=1}^{k} \left[ \prod_{i=1}^{m} t_j^{(i)} \right]^{\frac{1}{m}} = M_1 B \tag{13}$$
and that for each $i$
$$\sum_{j=1}^{k} t_j^{(i)} = b^{(i)}.$$
Using some elementary algebra and (13) above we can rewrite the quantity in (12) which is to be maximised as
$$-\frac{M_1 B}{C} \log \frac{B}{C} - \frac{B}{C} \sum_{j=1}^{k} \left[ \prod_{i=1}^{m} \frac{t_j^{(i)}}{b^{(i)}} \right]^{\frac{1}{m}} \log \left[ \prod_{i=1}^{m} \frac{t_j^{(i)}}{b^{(i)}} \right]^{\frac{1}{m}} \tag{15}$$
Now since $B$, $C$, and $M_1$ are positive constants for the $\mathbf{t}$ under consideration, it follows that maximising (15), or equivalently (12), under the given constraints, is equivalent to maximising
$$-\sum_{j=1}^{k} \left[ \prod_{i=1}^{m} \frac{t_j^{(i)}}{b^{(i)}} \right]^{\frac{1}{m}} \log \left[ \prod_{i=1}^{m} \frac{t_j^{(i)}}{b^{(i)}} \right]^{\frac{1}{m}}$$
Hence, writing $w_j^{(i)}$ for $\frac{t_j^{(i)}}{b^{(i)}}$, it follows that this is in turn equivalent to maximising
$$-\sum_{j=1}^{k} \left[ \prod_{i=1}^{m} w_j^{(i)} \right]^{\frac{1}{m}} \log \left[ \prod_{i=1}^{m} w_j^{(i)} \right]^{\frac{1}{m}}$$
subject to the constraints that each $k$-dimensional row vector $w^{(i)} = \langle w_1^{(i)} \ldots w_k^{(i)} \rangle$ sums to 1 and satisfies $K_i$ when interpreted as a probability function conditioned on $\phi$, and that
$$\sum_{j=1}^{k} \left[ \prod_{i=1}^{m} w_j^{(i)} \right]^{\frac{1}{m}} = M_1$$
Now by the remark following (10), the value $M_1$ must be the largest possible which can be attained by $\sum_{j=1}^{k} \left[ \prod_{i=1}^{m} w_j^{(i)} \right]^{\frac{1}{m}}$ for the $w^{(i)}$ probability functions satisfying the $K_i$. Hence since the $K_i$ are nice constraint sets, it follows by the fact that SEP is well-defined that any solution for $\mathbf{w}$ to the above maximisation problem generates the unique $\mathrm{SEP}(K_1, \ldots, K_m)$ solution given by
$$\mathrm{SEP}(K_1, \ldots, K_m)(\alpha_j \mid \phi) = \frac{\left[ \prod_{i=1}^{m} w_j^{(i)} \right]^{\frac{1}{m}}}{\sum_{r=1}^{k} \left[ \prod_{i=1}^{m} w_r^{(i)} \right]^{\frac{1}{m}}}$$
for $j = 1 \ldots k$.
However by the definition of the above $w_j^{(i)}$ and the uniqueness of the SEP values, it follows that for such a solution $\mathbf{w}$, for each $j = 1 \ldots k$
$$\left[ \prod_{i=1}^{m} w_j^{(i)} \right]^{\frac{1}{m}} = \left[ \prod_{i=1}^{m} \frac{y_j^{(i)}}{b^{(i)}} \right]^{\frac{1}{m}}$$
whence for each $j = 1 \ldots k$
$$\mathrm{SEP}(K_1, \ldots, K_m)(\alpha_j \mid \phi) = \frac{\left[ \prod_{i=1}^{m} \frac{y_j^{(i)}}{b^{(i)}} \right]^{\frac{1}{m}}}{\sum_{r=1}^{k} \left[ \prod_{i=1}^{m} \frac{y_r^{(i)}}{b^{(i)}} \right]^{\frac{1}{m}}} = \frac{C^{-1} \left[ \prod_{i=1}^{m} y_j^{(i)} \right]^{\frac{1}{m}}}{C^{-1} \sum_{r=1}^{k} \left[ \prod_{i=1}^{m} y_r^{(i)} \right]^{\frac{1}{m}}} = \mathrm{SEP}(K_1 \cup K_1^{*}, \ldots, K_m \cup K_m^{*})(\alpha_j \mid \phi)$$
as required. This concludes the proof of General Locality.
It remains for us to prove that SEP satisfies Proportionality.
Let $K_1, \ldots, K_m$ be knowledge bases and for each $r = 1 \ldots n$ let $K_{ir}$ denote a copy of the knowledge base $K_i$, so that $V_{K_{ir}} = V_{K_i}$. As a shorthand we denote the sequence $K_{i1} \ldots K_{in}$ by $nK_i$. Clearly it suffices for us to prove that
$$\Delta(nK_1, nK_2, \ldots, nK_m) = \Delta(K_1, K_2, \ldots, K_m)$$
Let $v \in \Delta(nK_1, nK_2, \ldots, nK_m)$ be generated by some $\mathbf{w} \in \Gamma(nK_1, nK_2, \ldots, nK_m)$. Then letting
$$\sum_{r=1}^{n} \sum_{i=1}^{m} \sum_{j=1}^{J} v_j \log \frac{v_j}{w_j^{(ir)}} = D \tag{21}$$
by definition $D$ is minimal subject only to the constraints that $w^{(ir)} \in V_{K_i}$ for all $r = 1 \ldots n$ and $i = 1 \ldots m$ (but no constraints on $v$). Then for each $r = 1 \ldots n$
$$\sum_{i=1}^{m} \sum_{j=1}^{J} v_j \log \frac{v_j}{w_j^{(ir)}} = \frac{D}{n} \tag{22}$$
and
$\frac{D}{n}$ is the minimum value which can be taken by
$$\sum_{i=1}^{m} \sum_{j=1}^{J} v_j \log \frac{v_j}{z_j^{(i)}} \quad \text{for } z^{(i)} \in V_{K_i} \tag{23}$$
(22) holds because otherwise we would have that for some $r_0$ with $1 \leq r_0 \leq n$
$$\sum_{i=1}^{m} \sum_{j=1}^{J} v_j \log \frac{v_j}{w_j^{(ir_0)}} < \frac{D}{n}$$
and if we then define $\mathbf{y}$ by
$$y_j^{(ir)} = w_j^{(ir_0)}$$
for all $i$, $j$ and $r$, we would have that $\sum_{r=1}^{n} \sum_{i=1}^{m} \sum_{j=1}^{J} v_j \log \frac{v_j}{y_j^{(ir)}} < D$ contradicting the definition of $D$ in (21). The same argument shows also that (23) holds. From (22) and (23) it follows that $v \in \Delta(K_1, K_2, \ldots, K_m)$.
Conversely if some $v \in \Delta(K_1, K_2, \ldots, K_m)$ is generated by a $\mathbf{z} \in \Gamma(K_1, K_2, \ldots, K_m)$ then it is easy to see that
$$\sum_{i=1}^{m} \sum_{j=1}^{J} v_j \log \frac{v_j}{z_j^{(i)}} = \frac{D}{n} \tag{24}$$
where $D$ is the minimal value defined at (14) since the value of $\sum_{i=1}^{m} \sum_{j=1}^{J} v_j \log \frac{v_j}{z_j^{(i)}}$ cannot be smaller than $\frac{D}{n}$ by the same argument used to show (22) and (23). However if we now define $\mathbf{w}$ by $w_j^{(ir)} = z_j^{(i)}$ then by (24) the equation (21) holds and so $v \in \Delta(nK_1, nK_2, \ldots, nK_m)$.
Thus $\Delta(nK_1, nK_2, \ldots, nK_m) = \Delta(K_1, K_2, \ldots, K_m)$ as required.
This concludes the proof of Theorem 14. □
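The arithmetic behind the Proportionality argument can be illustrated for one fixed matrix of rows: replacing every row by $n$ identical copies leaves the normalised geometric mean unchanged and multiplies the sum of Kullback-Leibler divergences by exactly $n$. The sketch below checks only this pointwise fact, not the full equality of the two $\Delta$ sets; the function names and numbers are assumptions made for the example.

```python
import math

def log_op(rows):
    """Normalised geometric mean of the rows."""
    m, J = len(rows), len(rows[0])
    geo = [math.prod(row[j] for row in rows) ** (1.0 / m) for j in range(J)]
    total = sum(geo)
    return [g / total for g in geo]

def total_divergence(v, rows):
    """Sum over agents i of KL(v || rows[i]); rows are assumed strictly positive."""
    return sum(
        sum(vj * math.log(vj / wj) for vj, wj in zip(v, row) if vj > 0)
        for row in rows
    )

rows = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]
n = 3
cloned = [row for row in rows for _ in range(n)]  # each member replaced by n clones

v_original = log_op(rows)
v_cloned = log_op(cloned)
cost_original = total_divergence(v_original, rows)
cost_cloned = total_divergence(v_cloned, cloned)

print(v_original, v_cloned, cost_original, cost_cloned)
```

In the notation of the proof, `cost_cloned` plays the role of $D$ and `cost_original` the role of $D/n$ for this particular choice of rows.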
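Remark 2. We note that Savage [32] has shown that a certain form of converse of Collegiality holds for SEP. Namely
if $K_1 \ldots K_m$ are such that for each $j = 1 \ldots J$, $\mathrm{SEP}(K_1 \ldots K_m)(\alpha_j) \neq 0$, then if $\mathrm{SEP}(K_1 \ldots K_{m-1}) \notin V_{K_m}$ then $\mathrm{SEP}(K_1 \ldots K_{m-1}) \neq \mathrm{SEP}(K_1 \ldots K_m)$.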
Further related properties of a social inference process may be found in [17] and [26].

3.3. Some Other Possible Principles

We end this chapter with some brief remarks concerning possible generalisations to the context of a social inference process, and in particular to SEP, of some remaining key principles which were identified by Paris and Vencovská ([3], [2]) as characterising the ME inference process.
One such key principle satisfied by ME is that of Open Mindedness. An inference process $I$ satisfies Open Mindedness if for every knowledge base $K$, for all $j = 1 \ldots J$, $I(K)(\alpha_j) \neq 0$ unless $w_j = 0$ for all $w \in V_K$. The most obvious way of extending this principle to the case of a social inference process $F$ would seem to be to propose that for all $j = 1 \ldots J$ and for all $K_1, K_2, \ldots, K_m$, $F(K_1, K_2, \ldots, K_m)(\alpha_j) \neq 0$ unless for some $i$, $w_j^{(i)} = 0$ for all $w^{(i)} \in V_{K_i}$. It is easy to see however that such a principle cannot hold for any $F$ which satisfies the Consistency Principle (cf. [26]). For if we take the example where there are three atoms $\alpha_1, \alpha_2, \alpha_3$, and $K_1 = \{ w(\alpha_1) = \frac{1}{3} \}$, while $K_2 = \{ w(\alpha_2) = \frac{2}{3} \}$, then by the Consistency Principle the only possible social belief function is given by $v = \langle \frac{1}{3}, \frac{2}{3}, 0 \rangle$ despite the fact that neither $K_1$ nor $K_2$ on its own forces belief in $\alpha_3$ to be zero. Furthermore, at least in the case of the inference process SEP, it is easy to show that similar counterexamples $K_1$ and $K_2$ to such a principle can be found where the union of the constraint sets $K_1$ and $K_2$ is not consistent. It seems reasonable to conclude therefore that Open Mindedness, at least in this formulation, is not a reasonable principle for a social inference process. Nevertheless SEP does satisfy the following weak form of Open Mindedness:
Theorem 15 (Weak Open Mindedness for SEP). For any atom $\alpha$ and vector of knowledge bases $\mathbf{K}$, if $\mathrm{SEP}(\mathbf{K})(\alpha) = 0$ then at least one of the following holds:
  1. For some $i = 1 \ldots m$, $w(\alpha) = 0$ for all $w \in V_{K_i}$
  2. For every $i = 1 \ldots m$, $w(\alpha) = 0$ for some $w \in V_{K_i}$
Proof. Since by its definition SEP ( K ) is obtained by applying LogOp to a certain point in Γ ( K ), the result follows easily from part (i) of theorem 11, the structure theorem for Γ ( K ) and Δ ( K ). □
The above result can be rephrased by saying that SEP ( K )(α) will be non-zero unless either some Ki forces w(α) to be zero, or for each i it is consistent with Ki that w(α) = 0. In the converse direction it is clear that the first condition suffices to ensure that SEP ( K )(α) = 0, but that the second condition does not. Note that we have formulated Weak Open Mindedness for SEP in terms that would make sense (but would not necessarily hold) for an arbitrary social inference process, since we do not explicitly refer to Γ ( K ) or Δ ( K ). However it is worth noting that in the case of SEP, theorem 15 still holds if we replace the second condition by the much stronger condition:
2′. For all $i = 1 \ldots m$ and all $\mathbf{w} \in \Gamma(\mathbf{K})$, $w^{(i)}(\alpha) = 0$,
and moreover, in the converse direction, this condition obviously implies the statement that SEP ( K )(α) = 0.
Another pleasing property of the inference process ME, identified in [3] is that of continuity with respect to the Blaschke topology. At present we do not know whether an analogous formulation of this continuity principle holds for SEP, although it seems likely that this is the case.
The Obstinacy Principle for an inference process $I$ states that if $K$ and $K'$ are knowledge bases such that $I(K) \in V_{K'}$ then $I(K \cup K') = I(K)$. While this principle is satisfied by ME, and indeed by nearly all standard inference processes (see [3]), an appropriate straightforward generalisation of the principle to the multi-agent context is not apparent. On the other hand it may be noted that the Collegiality Principle bears a certain formal resemblance to Obstinacy.
The important remaining principles characteristic of ME, as formulated in [2], [3] are those of Language Invariance, Irrelevant Information, and Independence. These important properties have in common the fact that they are most naturally stated in a context where the atoms of At arise as the Boolean atoms of a finite propositional language L as in Remark 1. In the formulations which follow we shall therefore assume this context.
The Language Invariance Principle
For an inference process $I$ this principle states that, for any $K$, $I(K)$ does not depend on the underlying language $L$ in which $K$ is formulated. A fine point here is that strictly speaking an inference process or social inference process is formulated for a fixed language (or set of atoms). So more formally we should refer to an inference process $I_L$ rather than just $I$ in order to make explicit the underlying language $L$. However it is clear that any general formulation of an inference process, such as ME, will in reality represent a family of inference processes $I_L$, one for each possible $L$. Then the following problem naturally arises. Suppose the knowledge base $K$ is formulated in the language $L$, and the language $L$ is now extended to a larger propositional language $L'$ by the addition of some new propositional variables. Then $K$ is also a knowledge base for the language $L'$. Intuitively if $\theta$ is a sentence (i.e., disjunction of atoms) of $L$, we would expect that the mere expansion of the language from $L$ to $L'$ without the addition of any new constraints should not change the belief which is accorded to $\theta$. So following [2] we say that $I$ satisfies Language Invariance if for all $L$, $L'$, $K$ and $\theta$ as above
$$I_{L'}[K](\theta) = I_L[K](\theta)$$
A large number of inference processes, including ME, satisfy Language Invariance (cf. [6] p. 213). Moreover Language Invariance seems as natural a principle for social inference processes as it does for inference processes: we say that $F$ satisfies Language Invariance if
$$F_{L'}[\mathbf{K}](\theta) = F_L[\mathbf{K}](\theta)$$
for all $L \subseteq L'$, sequences of knowledge bases $\mathbf{K}$ formulated in $L$, and sentences $\theta$ of $L$.
Pleasingly SEP does satisfy Language Invariance, as was shown in [27].
The Irrelevant Information Principle
An inference process $I$ satisfies the Irrelevant Information Principle if whenever a finite propositional language $L$ is the union of two disjoint languages $L_1$ and $L_2$, and $K_1$ and $K_2$ are knowledge bases for the languages $L_1$ and $L_2$ respectively, then
$$I_L[K_1 \cup K_2](\theta) = I_L[K_1](\theta)$$
for all sentences $\theta$ of the language $L_1$.
Intuitively this is also a very natural principle which roughly says that adding additional knowledge formulated in a language $L_2$ should not affect our beliefs in sentences formulated in a disjoint language $L_1$ formed on the basis of knowledge about $L_1$. It is however a principle which is hard to satisfy, and excluding some artificial constructions, only two inference processes are known to satisfy it: ME and the Maximin process of [6].
The principle has a natural generalisation to social inference processes. We say that a social inference process $F$ satisfies Irrelevant Information if for any $L$, $L_1$ and $L_2$ as above, and for any sequences of knowledge bases $K_1, \ldots, K_m$ and $K'_1, \ldots, K'_m$ for $L_1$ and $L_2$ respectively
$$F_L[K_1 \cup K'_1, \ldots, K_m \cup K'_m](\theta) = F_L[K_1, \ldots, K_m](\theta)$$
for all sentences $\theta$ of the language $L_1$.
However SEP does not satisfy the Irrelevant Information Principle, although it is known to satisfy a weak form; specifically, with the above notation, the extra conditions are required that (i) $K_1 \cup \ldots \cup K_m$ is consistent and that (ii) $\Delta_{L_1}[K_1, \ldots, K_m]$ is a singleton (see [27]). It is not known if the second condition can be dropped.
Independence
The inference process ME satisfies very natural independence properties which can be expressed in a number of different ways. When considering a formulation which it might be appropriate to generalise to the case of a social inference process, the following property of an inference process I, which is satisfied by ME, seems particularly natural:
Let $L = L_1 \cup L_2$ where $L_1$ and $L_2$ are disjoint propositional languages. Let $K_1$ and $K_2$ be knowledge bases in $L_1$ and $L_2$ respectively. Then
$$I_L[K_1 \cup K_2] = I_{L_1}[K_1] \cdot I_{L_2}[K_2]$$
where the multiplication of the two belief functions on the right is performed in the obvious way to yield a belief function on L.
If an inference process I has the above property for all L, L1, L2, K1 and K2 as above then we say that I satisfies Strong Independence.
The fact that ME satisfies strong independence is proved in [3]. Notice that provided that I satisfies Language Invariance it follows easily that if I satisfies Strong Independence then it satisfies Irrelevant Information. However even in the presence of Language Invariance the converse implication does not hold as the Maximin inference process of [6] attests.
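To make the product of two belief functions concrete: each atom of $L$ is a conjunction of one atom of $L_1$ with one atom of $L_2$, and the product assigns it the product of the two corresponding beliefs. The sketch below uses dictionaries keyed by atom labels; the labels, the function names and the numbers are purely illustrative assumptions.

```python
from itertools import product

def product_belief(u, v):
    """Belief function on the atoms of L = L1 u L2 built from u on L1 and v on L2.

    Atoms of L are represented as pairs (atom of L1, atom of L2).
    """
    return {(a, b): u[a] * v[b] for a, b in product(u, v)}

def marginal_on_first(joint):
    """Recover the belief function on L1 by summing out the L2 component."""
    out = {}
    for (a, _b), p in joint.items():
        out[a] = out.get(a, 0.0) + p
    return out

u = {"p": 0.25, "not p": 0.75}   # belief function on the atoms of L1
v = {"q": 0.5, "not q": 0.5}     # belief function on the atoms of L2
joint = product_belief(u, v)

print(joint)
print(marginal_on_first(joint))
```

Summing out the second component returns the original belief function on $L_1$, which is the sense in which Strong Independence together with Language Invariance yields Irrelevant Information.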
We may generalise the property to a social inference process $F$ by defining $F$ to satisfy Strong Independence if for any $L$, $L_1$ and $L_2$ as above, and for any sequences of knowledge bases $K_1, \ldots, K_m$ and $K'_1, \ldots, K'_m$ for $L_1$ and $L_2$ respectively
$$F_L[K_1 \cup K'_1, \ldots, K_m \cup K'_m] = F_{L_1}[K_1, \ldots, K_m] \cdot F_{L_2}[K'_1, \ldots, K'_m]$$
Again, provided that F satisfies Language Invariance, it follows easily that if F satisfies Strong Independence, then it satisfies Irrelevant Information. Since SEP satisfies Language Invariance but not Irrelevant Information, SEP does not satisfy Strong Independence. However, as is noted in [26], there do exist F which satisfy both Language Invariance and Strong Independence, but the only ones known so far are obdurate.

4. An Alternative Definition of SEP

4.1. The Self–Effacing Chairman

A remarkable characteristic of SEP is that the use of maximum entropy at the second stage of the defining process, which is included in order to force the choice of a social belief function to be unique in cases when this would not otherwise hold, can actually be eliminated by insisting that the social inference process satisfy a limiting variant of the axiom of proportionality. Such an argument counters a possible objection that the invocation of maximum entropy at the second stage of the definition is somewhat artificial. To be precise it is possible to substitute the following procedure to define SEP. We will explain and justify the procedure heuristically before formally stating and proving the corresponding theorem.
In order to calculate a unique social belief function $v$ for a college $M$ with vector of knowledge bases $\mathbf{K} = K_1 \ldots K_m$, Chairman A0 recognises that she may have to use a casting knowledge base of her own in order to eliminate ambiguities caused by the failure of the agreed process of minimising the sum of Kullback-Leibler divergences to provide a unique social belief function. However, as a good chairman, she wishes to intervene in a manner which (a) demonstrates that she is completely unbiased, and (b) reduces to an absolute minimum the effect which her own opinion may have on the outcome. In order to fulfil (a) it seems clear to her that she should choose her casting knowledge base $K_0$ to be a constraint set $I$ with
$$V_I = \left\{ \left\langle \tfrac{1}{J}, \tfrac{1}{J}, \ldots, \tfrac{1}{J} \right\rangle \right\}$$
Her only other possible choice would seem to be to take K0 to be the empty set of constraints, but by Collegiality this would clearly not resolve any ambiguity. On the other hand Chairman A0 worries that if she simply adds in her knowledge base I as a single extra member of the opinion forming body, she may be exerting more influence than is necessary or appropriate, if other opinions are finely balanced. She therefore resolves to dilute her influence in the following manner. Inspired by the Proportionality Principle, she imagines that, for some large finite number n, each member of the college except herself is replaced by exactly n clones, each clone having exactly the same set of constraints as the member replaced; and to this virtual new college of nm members A0 adds herself as a single additional member with knowledge base I as above.
The vector of sets of constraints of the members of the new college of nm + 1 members now looks as follows:
$$K_1, \ldots, K_1, K_2, \ldots, K_2, \ldots, K_m, \ldots, K_m, I$$
Chairman A0 notices that since $V_I$ is a singleton, by Corollary 13 the result of minimising the sum of Kullback-Leibler divergences subject to these constraint sets will, for any given $n$, always yield a unique social belief function. She reasons that if as $n \to \infty$ the resulting sequence of social belief functions converges to a belief function $v$ then this should be an optimal choice of social belief function since her own influence on the process will surely have become as diluted as possible, thus satisfying the condition (b) above. We will prove in Theorem 17 below that not only does this sequence of belief functions converge, but that the resulting limiting belief function $v$ will in fact always be $\mathrm{SEP}(\mathbf{K})$. This is true whether or not $\Delta(\mathbf{K})$ is a singleton. Consequently Chairman A0 can reason that her use of ME in the definition of SEP is fully justified by the above heuristic.
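The dilution heuristic can be tried out on a toy case. The sketch below assumes a college with a single agent over three atoms whose only constraint is $w(\alpha_1) + w(\alpha_2) = 0.6$, a knowledge base invented for this illustration. For a single agent every point of $V_K$ maximises the sum of geometric means, so the candidate set is all of $V_K$ and the maximum entropy choice is $\langle 0.3, 0.3, 0.4 \rangle$. The code replaces the agent by $n$ clones, adds the uniform chairman, and finds the maximiser by a crude grid search, assuming that all clones share the same row. The function name `diluted_pool` and the grid resolution are likewise assumptions.

```python
import math

J = 3  # number of atoms in this toy example

def diluted_pool(n, steps=600):
    """Pooled belief function for n clones of the agent plus the uniform chairman.

    The agent's row is w = (w1, 0.6 - w1, 0.4); the chairman's row is uniform.
    The objective is sum_j (w_j ** n * (1 / J)) ** (1 / (n + 1)), evaluated in
    log space so that large n does not underflow.
    """
    best_value, best_logs = -math.inf, None
    for s in range(1, steps):
        w1 = 0.6 * s / steps
        w = [w1, 0.6 - w1, 0.4]
        logs = [(n * math.log(wj) - math.log(J)) / (n + 1) for wj in w]
        value = sum(math.exp(t) for t in logs)
        if value > best_value:
            best_value, best_logs = value, logs
    geo = [math.exp(t) for t in best_logs]
    total = sum(geo)
    return [g / total for g in geo]

for n in (1, 10, 1000):
    print(n, diluted_pool(n))
```

With one clone the chairman pulls the result noticeably towards the uniform distribution; as $n$ grows the pooled belief function approaches $\langle 0.3, 0.3, 0.4 \rangle$, in line with the limiting behaviour described above.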

4.2. Weak SEP and The Chairman’s Theorem

In order to state formally and prove the result stated above we introduce the following definition:
Definition 16. The Weak Social Entropy Process, WSEP, is defined by
$$\mathrm{WSEP}(\mathbf{K}) = \begin{cases} v & \text{if } \Delta(\mathbf{K}) \text{ is the singleton } \{v\}, \\ \text{undefined} & \text{otherwise.} \end{cases}$$
WSEP is of course not a true social inference process since it is only partially defined. Obviously however $\mathrm{WSEP}(\mathbf{K}) = \mathrm{SEP}(\mathbf{K})$ whenever the former is defined.
We will denote the knowledge bases of the college of $nm + 1$ members
$$K_1, \ldots, K_1,\; K_2, \ldots, K_2,\; \ldots,\; K_m, \ldots, K_m,\; I$$
in abbreviated form by
$$n\mathbf{K}, I.$$
Theorem 17 (The Chairman’s Theorem). For any $\mathbf{K}$ and any $n \in \mathbb{N}^+$
$$\mathrm{WSEP}(n\mathbf{K}, I) = \mathrm{SEP}(n\mathbf{K}, I)$$
and furthermore
$$\lim_{n \to \infty} \mathrm{WSEP}(n\mathbf{K}, I) = \mathrm{SEP}(\mathbf{K})$$
Proof. Since $V_I$ is a singleton, by Corollary 13 $\Delta(n\mathbf{K}, I)$ is always a singleton, and so $\mathrm{WSEP}(n\mathbf{K}, I)$ is a well-defined point for any $n \in \mathbb{N}^+$, from which the first part of the theorem follows trivially. It does not follow from this that $\Gamma(n\mathbf{K}, I)$ is a singleton, but we will show below that nevertheless “significant” coordinates are uniquely determined.
For now let us fix $n$. Then if $\mathrm{WSEP}(n\mathbf{K}, I) = v$ say, and noting that
$$\mathrm{Sig}_{n\mathbf{K}, I} = \mathrm{Sig}_{\mathbf{K}},$$
then for every $j = 1 \ldots J$
$$v_j = 0 \quad\text{if and only if}\quad j \notin \mathrm{Sig}_{\mathbf{K}}$$
This is true because if $w$ is a point in $\Gamma(n\mathbf{K}, I)$ which generates $v$, then if $j \in \mathrm{Sig}_{\mathbf{K}}$ then since $w_j^{(mn+1)} = \frac{1}{J}$ it follows from Theorem 11(i) that every entry in the column vector $w_j$ is non-zero, so $v_j$ is non-zero.
Furthermore for any such $w$ in $\Gamma(n\mathbf{K}, I)$ it is clear that the first $n$ rows, i.e., with $i = 1 \ldots n$, which correspond to the members with knowledge base $K_1$, must all be identical for those entries $w_j^{(i)}$ with $j \in \mathrm{Sig}_{\mathbf{K}}$. For if two of these rows were not so identical then, if they differed in the $j$’th entry for some $j \in \mathrm{Sig}_{\mathbf{K}}$, we could interchange them to obtain a different point $w'$ in $\Gamma(n\mathbf{K}, I)$: however the $j$’th column $w'_j$ could not then be a multiple of $w_j$, contradicting Theorem 11. Moreover exactly the same argument works for the second and subsequent blocks of $n$ rows, up to the $m$’th block of $n$ rows.
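From the above observations it follows that finding a $w$ in $\Gamma(n\mathbf{K}, I)$ is essentially the same problem as finding an $x \in \prod_{i=1}^{m} V_{K_i}$ for which
$$\sum_{j=1}^{J} \left[ \frac{1}{J} \prod_{i=1}^{m} \left( x_j^{(i)} \right)^{n} \right]^{\frac{1}{nm+1}} \quad\text{is maximal},$$
or equivalently, for which the function defined by
$$H^{(n)}(x) = \sum_{j=1}^{J} \left[ \left[ \prod_{i=1}^{m} x_j^{(i)} \right]^{\frac{1}{m}} \right]^{(1 - \epsilon(n))} \quad\text{is maximal},$$
where
$$\epsilon(n) = \frac{1}{mn+1} \to 0 \quad\text{as } n \to \infty.$$
Note that for any $\epsilon(n)$ as above the values of $\left[ \prod_{i=1}^{m} x_j^{(i)} \right]^{\frac{1}{m}}$ for which $H^{(n)}(x)$ is maximal are uniquely determined for each $j = 1 \ldots J$ and are non-zero if and only if $j \in \mathrm{Sig}_{\mathbf{K}}$.
In order to make what follows more readable, we shall temporarily write $\epsilon$ instead of $\epsilon(n)$ and suppress the dependence of $\epsilon$ on $n$.
For any such $\epsilon$ as above we denote the vector of unique values of $\left[ \prod_{i=1}^{m} x_j^{(i)} \right]^{\frac{1}{m}}$ as defined above by
$$y_\epsilon = \langle y_{1,\epsilon}, \ldots, y_{J,\epsilon} \rangle$$
and we denote the maximal value of $H^{(n)}(x)$ by $m_\epsilon$, so that
$$m_\epsilon = \sum_{j=1}^{J} \left( y_{j,\epsilon} \right)^{1-\epsilon}$$
and let
$$M_\epsilon = \sum_{j=1}^{J} y_{j,\epsilon}$$
We need to examine the behaviour of $y_\epsilon$ as $\epsilon \to 0$, i.e., as $n \to \infty$.
Define $M_0$ to be $M_{\mathbf{K}}$, i.e. the maximum possible value of $\sum_{j=1}^{J} \left[ \prod_{i=1}^{m} x_j^{(i)} \right]^{\frac{1}{m}}$. By our initial assumptions $M_0 > 0$.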
A straightforward consequence of the above definitions is the following:
Lemma 18.
$$M_\epsilon \le M_0 \le m_\epsilon \quad\text{for all } \epsilon \in (0, 1)$$
Lemma 19.
$$M_\epsilon \to M_0 \quad\text{as } \epsilon \to 0^{+}.$$
Proof. We show first that the function $y^{1-\epsilon}$ converges uniformly to $y$ as $\epsilon \to 0^{+}$ in the sense that there exists some positive real valued function $T(\epsilon)$ such that for all $y \in [0, 1]$ and all $\epsilon$ with $0 < \epsilon < \frac{1}{2}$
$$y^{1-\epsilon} < y + T(\epsilon) \quad\text{and}\quad \lim_{\epsilon \to 0^{+}} T(\epsilon) = 0.$$
Now
$$y^{1-\epsilon} = y\, e^{-\epsilon \log y}$$
whence, expanding the exponential function as a power series, multiplying by $y$, and taking out a factor of $\epsilon$, we get
$$y^{1-\epsilon} - y = \epsilon \sum_{k=1}^{\infty} \frac{\epsilon^{k-1}\, y\, (-\log y)^{k}}{k!}$$
The absolute value of $y (\log y)^{k}$ is at a maximum when $y = e^{-k}$ and hence the absolute value of the $k$’th term of the above series is bounded by $\frac{\epsilon^{k}}{k!} \left[ \frac{k}{e} \right]^{k}$. Since this bound decreases for decreasing $\epsilon$, we have that for all $\epsilon$ with $0 < \epsilon < \frac{1}{2}$ and $y \in [0, 1]$
$$y^{1-\epsilon} - y < \epsilon \sum_{k=1}^{\infty} \left[ \frac{1}{2} \right]^{k-1} \left[ \frac{k}{e} \right]^{k} \frac{1}{k!}$$
and since the sum converges by d’Alembert’s ratio test, the right hand side provides the required function $T(\epsilon)$.
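The uniform bound can be checked numerically. The sketch below evaluates a truncated version of the series defining $T(\epsilon)$ and compares it with the largest value of $y^{1-\epsilon} - y$ on a grid; the function names `T` and `worst_gap`, the truncation at 200 terms and the grid size are assumptions made for illustration only.

```python
import math

def T(eps, terms=200):
    """Truncated bound eps * sum_{k>=1} (1/2)^(k-1) * (k/e)^k / k!.

    Each term is evaluated through logarithms to avoid overflow in (k/e)^k.
    Truncation can only lower the value, so passing the check below is
    slightly stronger than the bound stated in the text.
    """
    total = 0.0
    for k in range(1, terms + 1):
        log_term = (-(k - 1) * math.log(2.0)
                    + k * (math.log(k) - 1.0)
                    - math.lgamma(k + 1))
        total += math.exp(log_term)
    return eps * total

def worst_gap(eps, grid=10_000):
    """Largest value of y^(1-eps) - y over a uniform grid on [0, 1]."""
    return max((i / grid) ** (1.0 - eps) - i / grid for i in range(grid + 1))

# Pairs (observed gap, bound) for a few values of eps below 1/2.
checks = {eps: (worst_gap(eps), T(eps)) for eps in (0.4, 0.1, 0.01)}
for eps, (gap, bound) in checks.items():
    print(eps, gap, bound)
```

A grid search is of course not a proof; it only illustrates that the series bound dominates the gap and shrinks linearly with $\epsilon$.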
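To complete the proof of Lemma 19 we note that, using Lemma 18 and the above,
$$M_\epsilon \le M_0 \le m_\epsilon = \sum_{j=1}^{J} \left( y_{j,\epsilon} \right)^{1-\epsilon} \le \sum_{j=1}^{J} y_{j,\epsilon} + J\, T(\epsilon) = M_\epsilon + J\, T(\epsilon).$$
Hence, letting $\epsilon$ tend to zero, we obtain the required result. □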
We now note that for fixed $\epsilon$ an equivalent definition of $y_\epsilon$ is as that vector of values which maximises the function
$$G_\epsilon(y) = \frac{1}{\epsilon} \log \left[ \frac{\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} (y_j)^{(1-\epsilon)}}{M_\epsilon} \right] \tag{28}$$
subject to the conditions that
$$y_j = \left[ \prod_{i=1}^{m} x_j^{(i)} \right]^{\frac{1}{m}} \text{ for } j \in \mathrm{Sig}_{\mathbf{K}}, \quad\text{and}\quad x \in \prod_{i=1}^{m} V_{K_i}. \tag{29}$$
For fixed $\epsilon$ we will now consider the behaviour of $G_\epsilon(y)$ for general $y$ satisfying conditions (29) above. Actually we are only interested in those $y$ which are either of the form $y_{\epsilon(n)}$ for some $n$ or which are such that $\sum_{j=1}^{J} y_j = M_0$, and from now on we shall assume that $y$ is of this kind. We note that for $j \in \mathrm{Sig}_{\mathbf{K}}$, $0 < y_j \le 1$, and that for such $y_j$, for any $k \in \mathbb{N}^{+}$, $|y_j (\log y_j)^{k}|$ is uniformly bounded above by $\left( \frac{k}{e} \right)^{k}$ (as in the proof of Lemma 19).
By Lemma 19 it follows that
$$J \ge \sum_{j=1}^{J} y_j > c > 0 \tag{30}$$
for some fixed bound $c$ for all such $y$.
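Now
$$(y_j)^{1-\epsilon} = (y_j)\, e^{-\epsilon \log y_j}$$
$$= y_j - \epsilon\, y_j \log y_j + \sum_{k=2}^{\infty} \frac{y_j\, (-\epsilon \log y_j)^{k}}{k!}$$
whence
$$\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} (y_j)^{1-\epsilon} = \sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j \left[ 1 - \epsilon\, \frac{\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j \log y_j}{\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j} + O(\epsilon^{2}) \right]$$
where the term $O(\epsilon^{2})$ is such that its modulus is, by the argument in the proof of Lemma 19, uniformly bounded by $\epsilon^{2} D$ for some positive constant $D$.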
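Rewriting the equation (28) we now have
$$G_\epsilon(y) = \frac{1}{\epsilon} \left[ \log \left[ 1 - \epsilon\, \frac{\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j \log y_j}{\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j} + O(\epsilon^{2}) \right] + \log \frac{\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j}{M_\epsilon} \right]$$
Expanding the logarithm as a power series and using (30) we obtain
$$G_\epsilon(y) = -\frac{\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j \log y_j}{\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j} + \epsilon R(\epsilon, y) + \frac{1}{\epsilon} \log \frac{\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j}{M_\epsilon} \tag{35}$$
where $|R(\epsilon, y)|$ has a uniform bound independent of $y$ and of $\epsilon$.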
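The expansion of the logarithm can be spelled out as follows. This is only a sketch of the omitted step, valid for all sufficiently small $\epsilon$; the abbreviations $S(y)$ and $a(y)$ are introduced here for convenience and are not the author's notation.

```latex
\begin{align*}
S(y) &= \sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j ,
\qquad
a(y) = \frac{\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j \log y_j}{S(y)} ,
\qquad
|a(y)| \le \frac{J}{e\,c}
\quad \text{since } |y_j \log y_j| \le \tfrac{1}{e} \text{ and } S(y) > c , \\[4pt]
\log\bigl[\, 1 - \epsilon\, a(y) + O(\epsilon^{2}) \,\bigr]
  &= -\epsilon\, a(y) + O(\epsilon^{2})
  \quad \text{uniformly in } y , \\[4pt]
G_\epsilon(y)
  &= \frac{1}{\epsilon}\bigl[ -\epsilon\, a(y) + O(\epsilon^{2}) \bigr]
     + \frac{1}{\epsilon} \log \frac{S(y)}{M_\epsilon}
   = -a(y) + \epsilon R(\epsilon, y) + \frac{1}{\epsilon} \log \frac{S(y)}{M_\epsilon} .
\end{align*}
```

The lower bound $c$ on the sum of the $y_j$ is what keeps $a(y)$, and hence the remainder $R(\epsilon, y)$, bounded independently of $y$.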
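Now notice the following facts about equation (35):
  • For given $\epsilon = \epsilon(n)$ corresponding to a specific value of $n$, the vector $y_\epsilon$ satisfies
    $$G_\epsilon(y_\epsilon) = -\frac{\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_{j,\epsilon} \log y_{j,\epsilon}}{M_\epsilon} + \epsilon R(\epsilon, y_\epsilon) \tag{36}$$
    since the final term vanishes.
  • For any $y$ for which $\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j = M_0$ the final term of (35) is non-negative since $M_\epsilon \le M_0$ by Lemma 18.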
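Let us denote by $z$ that unique $y$ for which
$$\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j = M_0 \quad\text{and}\quad -\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j \log y_j \text{ is maximal}.$$
Then
$$\mathrm{SEP}(\mathbf{K}) = \left\langle \frac{z_1}{M_0}, \ldots, \frac{z_J}{M_0} \right\rangle$$
since
$$-\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y_j \log y_j \text{ is maximal if and only if } -\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} \frac{y_j}{M_0} \log \frac{y_j}{M_0} \text{ is maximal}.$$
To complete the proof of the theorem we need to show that
$$\lim_{n \to \infty} \left\langle \frac{y_{1,\epsilon(n)}}{M_{\epsilon(n)}}, \ldots, \frac{y_{J,\epsilon(n)}}{M_{\epsilon(n)}} \right\rangle = \left\langle \frac{z_1}{M_0}, \ldots, \frac{z_J}{M_0} \right\rangle \tag{38}$$
Since by Lemma 19 $M_\epsilon \to M_0$ as $\epsilon \to 0$, it suffices to show that $y_\epsilon \to z$ as $\epsilon \to 0$.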
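Now since all the $y$ are in $[0, 1]^{J}$, by compactness the sequence of $y_{\epsilon(n)}$ for $n \in \mathbb{N}$ has a convergent subsequence, say $y_{\epsilon(\rho(n))}$, where $\epsilon(\rho(n)) \to 0$ as $n \to \infty$.
Let
$$\lim_{n \to \infty} y_{\epsilon(\rho(n))} = y^{*}.$$
Then from (36) above and the fact that $M_\epsilon \to M_0$, it follows that
$$\lim_{n \to \infty} G_{\epsilon(\rho(n))}\left( y_{\epsilon(\rho(n))} \right) = -\frac{\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y^{*}_j \log y^{*}_j}{M_0}$$
and
$$\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y^{*}_j = M_0$$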
We now show that $y^{*} = z$.
For suppose for contradiction that this were not so. Let
$$\frac{1}{M_0} \left[ -\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} z_j \log z_j + \sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y^{*}_j \log y^{*}_j \right] = d$$
Then $d > 0$ since $y^{*}$ and $z$ both have sum $M_0$ and $z$ is the unique maximum entropy point. Now by (35)
$$\begin{aligned} G_{\epsilon(\rho(n))}(z) &= d - \frac{\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y^{*}_j \log y^{*}_j}{M_0} + \epsilon(\rho(n))\, R(\epsilon(\rho(n)), z) + \frac{1}{\epsilon(\rho(n))} \log \frac{M_0}{M_{\epsilon(\rho(n))}} \\ &\ge d - \frac{\sum_{j \in \mathrm{Sig}_{\mathbf{K}}} y^{*}_j \log y^{*}_j}{M_0} + \epsilon(\rho(n))\, R(\epsilon(\rho(n)), z) \end{aligned}$$
However for large enough $n$ the right-hand side is strictly greater than $G_{\epsilon(\rho(n))}(y_{\epsilon(\rho(n))})$ by (36), (40), and the boundedness of $R$.
This is impossible since then $G_{\epsilon(\rho(n))}(z) > G_{\epsilon(\rho(n))}(y_{\epsilon(\rho(n))})$ which contradicts the definition of $y_{\epsilon(\rho(n))}$. Thus we have shown that $y^{*} = z$.
It remains to show that the whole sequence of the $y_{\epsilon(n)}$ converges to $z$ as $n \to \infty$. If this were not the case then there would be some $\delta > 0$ such that there exists an infinite subsequence $y_{\epsilon(\tau(n))}$ of the $y_{\epsilon(n)}$ such that the $y_{\epsilon(\tau(n))}$ are bounded away from $z$ by Euclidean distance $|y_{\epsilon(\tau(n))} - z| > \delta$ for all $n \in \mathbb{N}$. However now by compactness again this subsequence $y_{\epsilon(\tau(n))}$ itself has an infinite convergent subsequence which converges to a point say $y^{**}$. By the same argument as for $y^{*}$ we must have that $y^{**} = z$; on the other hand by its definition $y^{**}$ is bounded away from $z$ by distance at least $\delta$, which gives a contradiction. Thus we have established (38) and the proof of Theorem 17 is complete.
Remark 3. It is worth noting that in the very special case when there is only a single member $A_1$ of the college apart from the Chairman $A_0$, the explanation of Theorem 17 given at the beginning of this section provides a new interpretation of an old technical result. For in this special case, for any $n \in \mathbb{N}$, $\mathrm{WSEP}(nK_1, I)$ returns that probability function $v$ which satisfies the constraints $K_1$ and which maximises the function
$$\sum_{j=1}^{J} v_j^{\left( \frac{n}{n+1} \right)}$$
In other words for a given $n \in \mathbb{N}$ this gives the same result as applying the Renyi inference process $\mathrm{REN}_r$ with parameter $r = \left( \frac{n}{n+1} \right)$. Now it is an old result (see e.g. [33] or [6]) that as $r \to 1$ the result of applying the Renyi process $\mathrm{REN}_r$ to a given set of constraints $K_1$ tends to the maximum entropy solution for $K_1$. So the heuristic explanation underlying Theorem 17 may be regarded as a generalised interpretation of this classical result, albeit from a quite new perspective.
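The single-agent case lends itself to a small numerical experiment. The sketch below assumes a hypothetical knowledge base over three atoms consisting of the single linear constraint $v_1 = 2 v_2$, so that every admissible belief function has the form $(2t, t, 1 - 3t)$. It maximises the sum of the $v_j^{\,n/(n+1)}$ for increasing $n$ and compares the maximiser with the maximum entropy solution. The golden-section helper and all names are illustrative choices, not the author's method.

```python
import math

def golden_max(f, lo, hi, tol=1e-12):
    """Maximise a strictly concave function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:          # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
        else:                # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
    return (a + b) / 2.0

def point(t):
    """Belief function satisfying the hypothetical constraint v1 = 2 * v2."""
    return (2.0 * t, t, 1.0 - 3.0 * t)

LO, HI = 1e-9, 1.0 / 3.0 - 1e-9   # keep every coordinate strictly positive

def renyi_argmax(r):
    """Parameter t maximising sum_j v_j^r on the constraint set, for 0 < r < 1."""
    return golden_max(lambda t: sum(v ** r for v in point(t)), LO, HI)

def max_entropy_argmax():
    """Parameter t maximising the Shannon entropy on the constraint set."""
    return golden_max(lambda t: -sum(v * math.log(v) for v in point(t)), LO, HI)

t_me = max_entropy_argmax()
gaps = [abs(renyi_argmax(n / (n + 1)) - t_me) for n in (1, 10, 100, 1000)]
print(t_me, gaps)
```

For this particular constraint the maximum entropy solution has the closed form $t = 1/(3 + 4^{1/3})$, which gives an independent check on the search routine.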
Remark 4. The reader might wonder whether perhaps a rather more general “limit proportionality” theorem than Theorem 17 might hold for SEP, which would assert that for any vectors of knowledge bases $\mathbf{K}$ and $\mathbf{K}'$
$$\lim_{n \to \infty} \mathrm{SEP}(n\mathbf{K}, \mathbf{K}') = \mathrm{SEP}(\mathbf{K})$$
Such an assertion is however easily seen to be false even in the simplest of cases, simply by choosing $\mathbf{K}$ to comprise a single knowledge base which is empty, and $\mathbf{K}'$ to comprise a single knowledge base which specifies a single belief function distinct from $\left\langle \tfrac{1}{J}, \tfrac{1}{J}, \ldots, \tfrac{1}{J} \right\rangle$.

5. Conclusion

5.1. A Critical Evaluation and Future Directions

In the present paper we first introduced the general notion of a Social Inference Process, which provides an axiomatic framework for the study of how to elicit a single optimally representative probability function derived from partial information about the probabilistic beliefs of several different agents. We have examined in some detail the properties of the particular social inference process SEP. In particular we have noted that SEP satisfies eight important desiderata: Equivalence, Anonymity, Atomic Renaming, Consistency, Collegiality, General Locality, Proportionality, and Language Invariance (see footnote 19). Some of these desiderata are relatively easy for a Social Inference Process to satisfy, but Collegiality, and in particular General Locality, are harder to satisfy.
SEP was initially defined in two stages. The first is a merging stage, in which a merging operator $\Delta$, given a vector of knowledge bases $\mathbf{K}$, yields a merged knowledge base $\Delta(\mathbf{K})$. In the second stage the social belief function is extracted from this merged knowledge base $\Delta(\mathbf{K})$ by an application of ME. In Section 4 we proved a technical result which shows that the application of ME at the second stage fits in a very natural way with the first stage operator $\Delta(\mathbf{K})$. The merging operator $\Delta(\mathbf{K})$ itself has interesting properties which we have not considered here, but which have been analyzed in [17,26,29]. In particular in [17] it is shown that $\Delta(\mathbf{K})$ satisfies a set of conditions for a merging operator close to those defined by Konieczny and Pino Pérez in [34].
Our work on SEP raises many unsolved problems. The definition of SEP illustrates the information theoretic connections between ME, minimum cross entropy, and LogOp in the context of multi–agent reasoning. However, whether or not one accepts SEP as being the ideal generalisation of ME to a social inference process, there are independent reasons to believe that, if such a correct generalisation F exists, F should marginalise to the LogOp pooling operator. One such reason is the remarkable manner in which ME and LogOp “fit together”, which can be seen by considering results concerning the obdurate social inference process, OSEP, defined by
$$\mathrm{OSEP}[K_1, \ldots, K_m] = \mathrm{LogOp}[\mathrm{ME}(K_1), \ldots, \mathrm{ME}(K_m)]$$
Adamčík showed in [26] that this simple amalgam of ME and LogOp satisfies the Strong Independence Principle and Language Invariance, and hence, a fortiori, the Irrelevant Information Principle. The reason why Strong Independence holds is interesting. The Strong Independence property satisfied by ME states that if $K$ and $K'$ are constraint sets formulated in disjoint propositional languages $L$ and $L'$ respectively and if $K \cup K'$ denotes the combined constraint sets in the language $L \cup L'$, then $\mathrm{ME}(K \cup K')$ is just the product of the two probability functions $\mathrm{ME}(K)$ and $\mathrm{ME}(K')$. Now Adamčík observed that the product condition
$$w^{(i)}(\alpha_j \wedge \alpha'_{j'}) = w^{(i)}(\alpha_j) \cdot w^{(i)}(\alpha'_{j'})$$
where $\alpha_j$ and $\alpha'_{j'}$ range over all the respective atoms of the languages $L$ and $L'$, is actually preserved by the LogOp pooling operation, which suffices to prove the Irrelevant Information result. This independence preservation property may surprise some, owing to a general belief that LogOp does not “preserve independence” because of an old result of Genest and Wagner [35]. However while the result of [35] is technically perfectly correct, that paper misses the interesting logical property of LogOp because the authors did not formulate the independence condition appropriately in terms of belief functions on distinct propositional languages. In fact, from a logician’s viewpoint, it can be argued that there is no good intuitive reason why the more general “formula-wise” notion of independence preservation used in [35] should actually hold. Indeed the fact noted in [35] that the “formula-wise” notion is preserved in the special case of a language with four atoms, which appears to be just an anomalous result in that paper, can easily be seen from a logician’s perspective to be just a special case of Adamčík’s observation above. The situation here is comparable to the controversy over representation independence cited in section 1.1, and again illustrates the value of a foundational approach.
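The preservation of the product condition by LogOp is easy to confirm numerically. The following sketch assumes two hypothetical agents, each holding a product belief function over the atoms of two disjoint two-atom languages. It pools the joint belief functions with the normalised geometric mean and checks that the pooled function is again the product of its own marginals. All names and numbers are illustrative.

```python
import math
from itertools import product

def log_op(beliefs):
    """Normalised geometric mean pooling of strictly positive belief functions.

    Each belief function is a dict mapping atoms to probabilities.
    """
    m = len(beliefs)
    raw = {a: math.exp(sum(math.log(w[a]) for w in beliefs) / m)
           for a in beliefs[0]}
    total = sum(raw.values())
    return {a: x / total for a, x in raw.items()}

def product_belief(p, q):
    """Joint belief function on pairs of atoms satisfying the product condition."""
    return {(a, b): p[a] * q[b] for a, b in product(p, q)}

def marginals(joint):
    """Marginal belief functions of a joint belief function on pairs of atoms."""
    left, right = {}, {}
    for (a, b), x in joint.items():
        left[a] = left.get(a, 0.0) + x
        right[b] = right.get(b, 0.0) + x
    return left, right

# Hypothetical beliefs of two agents on the atoms of L and of L'.
on_L = [{"a1": 0.8, "a2": 0.2}, {"a1": 0.3, "a2": 0.7}]
on_L_prime = [{"b1": 0.6, "b2": 0.4}, {"b1": 0.1, "b2": 0.9}]

joints = [product_belief(p, q) for p, q in zip(on_L, on_L_prime)]
pooled = log_op(joints)
left, right = marginals(pooled)

# Largest deviation of the pooled joint from the product of its marginals.
gap = max(abs(pooled[(a, b)] - left[a] * right[b]) for (a, b) in pooled)
print(gap)
```

The reason is visible in the code: the geometric mean of products factorises into a product of geometric means, and so does the normalising constant. The marginal on each language therefore coincides with the LogOp pooling of the agents' beliefs on that language alone.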
The observations above leave us however with a puzzling situation. We know that there does exist a social inference process extending ME which satisfies the Strong Independence Principle: namely OSEP. This fact is significant because the Strong Independence Principle is hard to satisfy even for an inference process. (Indeed ME is the only reasonable inference process known to satisfy it.)
Moreover in addition to Strong Independence, OSEP also satisfies six of the eight principles satisfied by SEP as listed at the beginning of section 5.1: Equivalence, Anonymity, Atomic Renaming, General Locality, Proportionality, and Language Invariance (see footnote 20). However OSEP suffers from the usual, and in the opinion of the author, fatal, drawbacks of obdurate inference processes: it satisfies neither Consistency nor Collegiality, nor even the (weaker) Ignorance Principle.
On the other hand, while SEP satisfies most of our desiderata for a social inference process, it fails to satisfy Strong Independence or the Irrelevant Information Principle, although it does satisfy a rather weak version of the latter (cf. [27] and section 3.3 above). This raises the obvious question as to whether there exists any social inference process $F$ extending ME which satisfies the same desiderata as those which SEP satisfies and which also satisfies the Strong Independence Principle, or at least the weaker Irrelevant Information Principle. It seems very likely that if such an $F$ exists it would have to marginalise to LogOp. However a non-existence proof would also be very interesting and would certainly strengthen any claim of SEP to foundational optimality.
At this point we should address another obvious criticism of SEP which derives from the previously lauded fact that it marginalises to the pooling operator LogOp. As a consequence however SEP inevitably inherits any criticisms attached to LogOp. The most obvious criticism of LogOp is its extreme behaviour in the case when any of the agents has belief in some particular atom $\alpha$ close to zero. Indeed an agent $A_i$ with belief $w^{(i)}(\alpha) = 0$ has dictatorial powers, forcing the social belief function $v$ to give the value zero to $\alpha$. This is clearly useless from any practical point of view. We now examine briefly how this phenomenon can be explained and perhaps remedied.
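The veto effect is easy to reproduce. The sketch below assumes a hypothetical college of nine agents who each give belief 0.9 to an atom, joined by a tenth agent who gives it belief zero, or alternatively a very small positive belief. The function `log_op` is an illustrative implementation of normalised geometric mean pooling and is not taken from the paper.

```python
import math

def log_op(beliefs):
    """Normalised geometric mean pooling; zero entries are allowed.

    Assumes at least one atom receives non-zero belief from every agent,
    so that the normalising sum is positive.
    """
    m = len(beliefs)
    J = len(beliefs[0])
    raw = [math.prod(w[j] for w in beliefs) ** (1.0 / m) for j in range(J)]
    total = sum(raw)
    return [x / total for x in raw]

majority = [[0.9, 0.1]] * 9                       # nine like-minded agents
no_dissent = log_op(majority)                     # pooled belief without dissent
veto = log_op(majority + [[0.0, 1.0]])            # one agent assigns belief zero
tiny = 2.0 ** -40
near_veto = log_op(majority + [[tiny, 1.0 - tiny]])  # one agent assigns a tiny belief
print(no_dissent, veto, near_veto)
```

A single zero annihilates the pooled belief in the first atom regardless of the other nine opinions, and a very small positive value still drags the pooled belief well below one half.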
Let us recall the Intersubjectivity Assumption that the chairman treats the knowledge base provided by each agent as if it represented intersubjective probabilistic information. This means in effect that the chairman is treating the reported information as if it were intersubjectively trustworthy: in particular the agent is assumed not to be cheating nor to have miscalculated, and any priors which might have been used by her in her calculations are assumed to be hypothetically common to all the agents if they were privy to the same background information. This gives rise to several observations.
The chairman might reason that if pathological priors are ruled out as irrational, and if the number of background observational or mental states of an agent is assumed to be finite, then on the basis of the chairman’s Intersubjectivity Assumption, no agent should definitively assign probability zero to an event unless the agent considers that event logically impossible. On the other hand the same assumption implies that if one agent considers an event to be logically impossible, then all the agents should do likewise. Thus if the chairman is going to be able to abide by her Intersubjectivity Assumption it is necessary that for any atom $\alpha$, either each of the knowledge bases in $\mathbf{K}$ separately forces belief in $\alpha$ to be zero, or none of them do so. The chairman might therefore reasonably insist that for any atom $\alpha$ which is not ruled by prior universal agreement to be logically impossible, no agent shall specify for $\alpha$ a definitive value zero in her knowledge base $K_i$. However while the extreme problem caused by zeros might be evaded in this way, the general problem caused by an agent’s specified belief value close to zero still remains.
From the chairman’s highly idealised viewpoint it now seems that the extreme influence over the social belief which SEP or LogOp gives to an agent who has belief close to zero in a particular atom $\alpha$ is, given her normative assumptions, not quite so unreasonable as it at first appeared. The phenomenon arises precisely because SEP treats all knowledge bases at face value: since some knowledge bases may be providing much more information than others to the social belief function, the chairman may obviously be in trouble if she does not actually have full trust in each agent’s input. It is intuitively clear that an agent who ascribes belief $2^{-40}$ to some particular propositional variable $p$ is providing more information to the college than an agent who ascribes belief $\frac{1}{2}$ to $p$, while an agent whose constraint set is empty supplies no information at all. While for normative reasons the chairman has decided to treat each agent’s knowledge base as if it represented intersubjective knowledge in the sense described, nevertheless because she actually recognises that the information of the agent may be untrustworthy, she may still wish to limit the influence of the agent on the social belief function. Her revised attitude could then be summed up as: “I will take the information you supply at face value, but I will attempt to limit your influence on the social belief function in some proportionate manner”.
The reasons for the chairman’s desire to limit the influence of a particular agent may be of two kinds: intrinsic and extrinsic. By an extrinsic reason we mean some extra information about the agent or the nature of her knowledge, or about the nature of any intended application of the social belief function. By an intrinsic reason we mean a natural caution on the part of the chairman to limit the effects on the social belief function of the knowledge bases of particular agents, based solely on the combined properties of the knowledge bases, and independently of any information of an extrinsic nature. Here we shall consider only the problem of an intrinsic analysis of influence limitation.
There are some obvious ad hoc methods by which the chairman could attempt to limit the influence of individual agents by modifying SEP. For example, given some $\epsilon > 0$ with $\epsilon \ll \frac{1}{J}$, let $\mathbb{D}_\epsilon$ denote the convex subset of $\mathbb{D}^{J}$ consisting of all $w \in \mathbb{D}^{J}$ for which $w_j \ge \epsilon$ for all $j = 1 \ldots J$. Then the chairman, having chosen an $\epsilon$, could replace each $V_{K_i}$ for which $V_{K_i} \cap \mathbb{D}_\epsilon = \emptyset$ by some convex $V^{*}_{K_i}$ consisting of those points in $\mathbb{D}_\epsilon$ which are “informationally closest” to $V_{K_i}$. (We deliberately leave this imprecise because there are several different plausible interpretations.) However, while such ad hoc methods may prevent extreme influence by a single agent in simple situations such as when each agent specifies a single belief function, it is not clear that they will have the desired effect in other situations such as that of the counterexample to Open Mindedness given in section 3.3 above.
The “information” contained in a vector of knowledge bases $\mathbf{K}$ is in general extremely complex, indeed far more so than in the case addressed by pooling operators, because the knowledge bases are likely to contain much information about each agent’s ignorance, a situation which one would expect information theoretic methods to be good at dealing with. It appears to the author that what is needed here in order to deal with the intrinsic problem of trust and influence limitation outlined above, is a foundational study from an information theoretic point of view of the intuitive notion of degree of influence of an agent’s knowledge base $K_i$ on the social belief function, relative to the vector of knowledge bases $\mathbf{K}$. If such an analysis can be carried out it may suggest naturally justifiable ways in which SEP can be modified in order to limit the influence of an agent to a prescribed degree, which could reflect the degree of trust which we are prepared to place on the information provided by the agent.

5.2. Open Problems

  • Is there a social inference process extending ME which satisfies the principles known to be satisfied by SEP (i.e., Language Invariance and those of Theorem 14), and which also satisfies the Strong Independence Principle, or at least the Irrelevant Information Principle?
  • Is there any mathematical “number of possible states” argument similar in spirit to that of statistical mechanics which can be used to derive SEP or some other social inference process, in a manner analogous to the classical derivation of ME as in [9]?
  • Is there some set of principles which can be used to characterise SEP uniquely in a manner similar to that in which ME is characterised in [2]?
  • Is it possible to develop an information theoretic theory of influence and of trust along the lines suggested in the previous section, which could be applied to adapt SEP for practical use?
  • Are there algorithms for the calculation of SEP which are of comparable efficiency to those available for ME?
  • The quantity $M_{\mathbf{K}}$ of Definition 8 appears to be a natural measure of the joint consistency of knowledge bases $\mathbf{K}$. What are its properties when it is viewed as such a measure?

Acknowledgments

My thanks are due to Alena Vencovská and Martin Adamčík for many stimulating discussions over the last four years, and for pointing out some errors in earlier versions of the text. Thanks are also due to Jon Williamson for some insightful criticism, and to two anonymous referees for helpful suggestions. I am very grateful to Dugald Macpherson and the School of Mathematics of Leeds University for their collegiality in granting me an academic refuge after my retirement from Manchester University. Finally my gratitude is also due to Jeff Paris for his inspiration and steadfast support at Manchester over an academic lifetime.
  • 1. See Paris [3] where representation independence is called Atomicity, and [15,16] where representation independence is discussed at greater length. It should be mentioned that even prior to Paris’s result there were very good reasons to distrust the notion of representation independence as E.T. Jaynes, the longtime champion of ME, frequently pointed out to critics of ME. However mere good reasons are never as effective in silencing critics as a proof that their arguments are incoherent, in this case because the proposed desideratum is vacuous.
  • 2. The terminology is due to the logician Georg Kreisel, see e.g. [36], although I do not claim that I use the terminology in the sense that Kreisel intends. Similar ideas can also be found in the work of Lakatos [37].
  • 3. We should note that allowing J to take any positive integral value does not in any real sense generalise the Paris–Vencovská framework, but is sometimes a notational convenience when constructing simple examples.
  • 4. This characterisation considerably strengthens earlier work of [38].
  • 5. See Paris [3] for a general introduction to inference processes, and also Hawes [6], especially the comparative table in Chapter 9, for an excellent résumé of the current state of knowledge concerning this topic. Renyi inference processes are those which maximise one of the family of generalised notions of entropy due to Alfred Renyi (see [6,11,33,39]).
  • 6. See [4,5,20,35,40–45] for further discussion of the axiomatics of pooling operators.
  • 7. This terminology is due to Carnap [46]. The principle is also known as Bernoulli’s Maxim, while in [3] it is called the Watts Assumption.
  • 8. For an interesting account of intersubjective probability see Gillies [47].
  • 9. An interesting example of a social inference process which is de facto obdurate, but whose initial definition does not appear intentionally reductionist, is that given by Kern-Isberner and Rödder in [23]. In effect the social inference process which they define applies the ME inference process to each $K_i$ and then applies a weighted arithmetic mean pooling operator to the resulting points, where the weights are proportional to the exponential entropies of the respective points. See also Adamčík [26] for an account of which principles are satisfied by this social inference process.
  • 10. Technically Williamson is concerned in [28] with the question as to how a single agent should rationally arrive at a unique belief function on the basis of probabilistic evidence which is derived from different “sources” which, taken together, may be inconsistent. Williamson’s proposed solution is to consider the convex hull of the regions of $\mathbb{D}^{J}$ defined by considering maximal consistent unions of the sets of constraints corresponding to the individual sources, and then to choose the maximum entropy solution from the resulting convex region. Of course if we treat the different sources of the agent’s evidence as themselves being agents, then this procedure in effect defines a social inference process. See also [48] and [26].
  • 11. The word “information” is clearly used here in a different sense from that of Shannon information; it would perhaps be more accurate to say that what is being discarded in this case is information about the extent of A’s ignorance.
  • 12. See also footnote 9.
  • 13. We should note that while from our point of view, collegiality appears a very natural principle, this fact depends heavily on our underlying assumptions; from the very different viewpoint of Williamson [48] collegiality is too strong a principle, and it is not even clear if Williamson would accept the ignorance principle.
  • 14. We note here that Csiszár in [49], [50], introduces a property which he calls locality, but which corresponds to the relativisation principle of Paris [3] and is much weaker than the notion of locality in the present paper.
  • 15. The proof of Theorem 4 is not however germane to understanding the remainder of this paper and may safely be skipped if the reader so wishes.
  • 16. See footnote 5. A definition of Renyi processes is given below in the proof of Theorem 4.
  • 17. This condition was first formulated by Madansky [51] in 1964 and further analyzed in [52] and [43], where it is shown that the only externally Bayesian pooling operators are closely related to LogOp. See also [4] for related properties of pooling operators.
  • 18. An earlier version of this theorem which was stated without proof in [1] contains an error because the statement of the result is incorrect for cases in which 0’s appear in the coordinates.
  • 19. The first seven of these were announced, but not proved, in the author’s earlier work [1], while Language Invariance was proved in [27].
  • 20. These properties are all noted by Adamčík in [26], with the exception of General Locality and Proportionality. Proportionality holds trivially since it is satisfied by LogOp, while General Locality holds for OSEP because ME satisfies Locality (cf. Theorem 4) and LogOp satisfies General Locality (an immediate corollary of Theorem 14 above).

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Wilmers, G.M. The Social Entropy Process: Axiomatising the Aggregation of Probabilistic Beliefs. In Probability, Uncertainty and Rationality; Hosni, H., Montagna, F., Eds.; CRM series, Edizioni Della Normale; Scuola Normale Superiore: Pisa, Italy, 2010; Volume 10, pp. 87–104. [Google Scholar]
  2. Paris, J.B.; Vencovská, A. A Note on the Inevitability of Maximum Entropy. Int. J. Approximate Reasoning 1990, 4, 183–224. [Google Scholar]
  3. Paris, J.B. The Uncertain Reasoner’s Companion - A Mathematical Perspective; Cambridge University Press: Cambridge, UK, 1994. [Google Scholar]
  4. Genest, C.; Zidek, J.V. Combining probability distributions: A critique and an annotated bibliography. Stat. Sci. 1986, 1, 114–148. [Google Scholar]
  5. French, S. Group Consensus Probability Distributions: A Critical Survey. In Bayesian Statistics; Bernardo, J. M., De Groot, M.H., Lindley, D.V., Smith, A.F.M., Eds.; North Holland: Amsterdam, The Netherlands, 1985; pp. 183–201. [Google Scholar]
  6. Hawes, P. An Investigation of Properties of Some Inference Processes. PhD Thesis; Manchester University: Manchester, UK, 2007. MIMS eprints, available from http://eprints.ma.man.ac.uk/1304/ accessed on 13 January 2015.
  7. Jaynes, E.T. Where do we Stand on Maximum Entropy. In The Maximum Entropy Formalism; Levine, R.D., Tribus, M., Eds.; 1979; MIT Press: Cambridge, MA, USA. [Google Scholar]
  8. Jaynes, E.T. The Well-Posed Problem. Found. Phys. 1973, 3, 477–493. [Google Scholar]
  9. Paris, J.B.; Vencovská, A. On the Applicability of Maximum Entropy to Inexact Reasoning. Int. J. Approximate Reasoning 1989, 3, 1–34. [Google Scholar]
  10. Shannon, C.E.; Weaver, W. The Mathematical Theory of Communication; University of Illinois Press: Urbana, IL, USA, 1949. [Google Scholar]
  11. Renyi, A. On Measures of Entropy and Information. In Proceedings of the 4th Berkeley Symposium in Mathematical Statistics; University of California Press: Oakland, CA, USA, 1961; Volume 1, pp. 547–561. [Google Scholar]
  12. Fadeev, D.K. Zum Begriff der Entropie einer endlichen Wahrscheinlichkeitsschemas. Arbeiten zur Informationstheorie 1957, I, 85–90, Deutscher Verlag der Wissenschaften, Berlin. [Google Scholar]
  13. Paris, J.B. Common Sense and Maximum Entropy. Synthese 1999, 16, 75–93. [Google Scholar]
  14. Chomsky, N. Interviewed by Katz, Y. Noam Chomsky on where Artificial Intelligence Went Wrong. The Atlantic 2012. [Google Scholar]
  15. Paris, J.B. What You See Is What You Get. Entropy 2014, 16, 6186–6194. [Google Scholar]
  16. Paris, J.B.; Vencovská, A. In Defense of the Maximum Entropy Inference Process. Int. J. Approximate Reasoning 1997, 17, 77–103. [Google Scholar]
  17. Adamčík, M.; Wilmers, G.M. Probabilistic Merging Operators. Logique et Analyse 2014, in press. [Google Scholar]
  18. Tversky, A.; Kahneman, D. Judgment under Uncertainty: Heuristics and Biases. Science 1974, 185, 1124–1131.
  19. Levy, W.B.; Delic, H. Maximum entropy aggregation of individual opinions. IEEE Trans. Syst. Man. Cybern. 1994, 24, 606–613.
  20. Osherson, D.; Vardi, M. Aggregating Disparate Estimates of Chance. Games Econ. Behav. 2006, 148–173.
  21. Kracík, J. On composition of probability density functions. In Multiple Participant Decision Making, Proceedings of the Workshop on Computer-Intensive Methods in Control and Data Processing, Prague, Czech Republic, 12–14 May 2004; pp. 113–121.
  22. Kracík, J. Cooperation Methods in Bayesian Decision Making with Multiple Participants. Ph.D. Thesis, Czech Technical University, Prague, Czech Republic, 2009.
  23. Kern-Isberner, G.; Rödder, W. Belief Revision and Information Fusion on Optimum Entropy. Int. J. Intell. Syst. 2004, 19, 837–857.
  24. Yue, A.; Liu, W. A Syntax-based Framework for Merging Imprecise Probabilistic Logic Programs. Int. Joint Conf. Artif. Intell. 2009, 1990–1995.
  25. Myung, J.; Ramamoorti, S.; Bailey, A.D., Jr. Maximum Entropy Aggregation of Expert Predictions. Manag. Sci. 1996, 42, 1420–1436.
  26. Adamčík, M. Collective Reasoning under Uncertainty and Inconsistency. Ph.D. Thesis, The University of Manchester, Manchester, UK, 2014.
  27. Adamčík, M.; Wilmers, G.M. The Irrelevant Information Principle for Collective Probabilistic Reasoning. Kybernetika 2014, 50, 175–188.
  28. Williamson, J. In Defence of Objective Bayesianism; Oxford University Press: Oxford, UK, 2010.
  29. Adamčík, M. The Information Geometry of Bregman Divergences and Some Applications in Multi-Expert Reasoning. Entropy 2014, 16, 6338–6381.
  30. Kullback, S. Information Theory and Statistics; Wiley: New York, NY, USA, 1959.
  31. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
  32. Savage, S.D. The Logical and Philosophical Foundations of Social Inference Processes. MSc Dissertation, University of Manchester, Manchester, UK, 2010.
  33. Mohamed, I.A.M. Some Properties of the Class of Renyi Generalized Entropies in the Discrete Case. M.Phil. Thesis, School of Mathematics, Manchester University, Manchester, UK, 1998.
  34. Konieczny, S.; Pino Pérez, R. On the Logic of Merging. 1998; pp. 488–498.
  35. Genest, C.; Wagner, C.G. Further evidence against independence preservation in expert judgement synthesis. Aequationes Mathematicae 1987, 32, 74–86.
  36. Kreisel, G. Church’s Thesis and the Ideal of Informal Rigour. Notre Dame J. Formal Logic 1987, 28, 499–518.
  37. Lakatos, I. Proofs and Refutations; Cambridge University Press: Cambridge, UK, 1976.
  38. Shore, J.E.; Johnson, R.W. Axiomatic Derivation of the Principle of Maximum Entropy and the Principle of Minimum Cross-Entropy. IEEE Trans. Inform. Theor. 1980, IT-26, 26–37.
  39. Rényi, A. Wahrscheinlichkeitsrechnung; Deutscher Verlag der Wissenschaften: Berlin, Germany, 1962.
  40. Cooke, R.M. Experts in Uncertainty: Opinion and Subjective Probability; Science, Environmental Ethics and Science Policy Series; Oxford University Press: New York, NY, USA, 1991.
  41. Garg, A.; Jayram, T.S.; Vaithyanathan, S.; Zhu, H. Generalized Opinion Pooling. In Proceedings of the 8th International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, FL, USA, 4–6 January 2004.
  42. Genest, C. A conflict between two axioms for combining subjective distributions. J. Roy. Stat. Soc. 1984, 46, 403–405.
  43. Genest, C.; McConway, K.J.; Schervish, M.J. Characterization of externally Bayesian pooling operators. Ann. Stat. 1986, 14, 487–501.
  44. Wagner, C. Aggregating Subjective Probabilities: Some Limitative Theorems. Notre Dame J. Formal Logic 1984, 25, 233–240.
  45. Wallsten, T.S.; Budescu, D.V.; Erev, I.; Diederich, A. Evaluating and Combining Subjective Probability Estimates. J. Behav. Decis. Making 1997, 10, 243–268.
  46. Carnap, R. On the application of inductive logic. Philosophy and Phenomenological Research 1947, 8, 133–148.
  47. Gillies, D. Philosophical Theories of Probability; Routledge: London, UK, 2000.
  48. Williamson, J. Deliberation, Judgement and the Nature of Evidence. Economics and Philosophy 2014, in press.
  49. Csiszár, I. Why Least Squares and Maximum Entropy? An Axiomatic Approach to Inference for Linear Inverse Problems. Ann. Stat. 1991, 19, 2032–2066.
  50. Csiszár, I. Axiomatic Characterisations of Information Measures. Entropy 2008, 10, 261–273.
  51. Madansky, A. Externally Bayesian Groups; Technical Report RM-4141-PR; RAND Corporation, 1964.
  52. Genest, C. A characterization theorem for externally Bayesian groups. Ann. Stat. 1984, 12, 1100–1105.

Share and Cite

MDPI and ACS Style

Wilmers, G. A Foundational Approach to Generalising the Maximum Entropy Inference Process to the Multi-Agent Context. Entropy 2015, 17, 594-645. https://0-doi-org.brum.beds.ac.uk/10.3390/e17020594

