Article

Aggregating Composite Indicators through the Geometric Mean: A Penalization Approach

by Francesca Mariani and Mariateresa Ciommi *,†
Department of Economics and Social Sciences, Università Politecnica delle Marche, 60121 Ancona, Italy
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Submission received: 28 February 2022 / Revised: 7 April 2022 / Accepted: 13 April 2022 / Published: 18 April 2022
(This article belongs to the Special Issue Control Systems, Mathematical Modeling and Automation)

Abstract

In this paper, we introduce a penalized version of the geometric mean. In analogy with the Mazziotta Pareto Index, this composite indicator is derived as the product of the geometric mean and a penalization term that accounts for the unbalance among indicators. The unbalance is measured in terms of the (horizontal) variability of the normalized indicators, appropriately scaled and transformed via the Box–Cox function of order zero. The penalized geometric mean is used to compute a penalized Human Development Index (HDI), which is compared with the standard geometric mean approach. Data come from the Human Development Data Center for 2019 and refer to the three classical dimensions of the HDI. The results show that the new method does not upset the original HDI ranking, but it has a larger impact on countries with poor performance. The paper proposes a new reading of the Mazziotta Pareto Index in terms of the reliability of the arithmetic mean and generalizes this reading to the geometric mean approach.

1. Introduction

In recent years, there has been increasing interest in measuring well-being through composite indicators, which are obtained by combining individual indicators into a single index based on an underlying model of the multidimensional concept being measured [1]. In the debate on composite indicators, the choice of the aggregation method is the core issue. Indeed, each aggregation method has a corresponding aggregation function, namely the transformation of the indicators used to obtain the composite indicator. Two criteria are usually used to choose among aggregation functions. The first is related to the importance of each single indicator, and the second addresses the compensability, or substitutability, among indicators. When the aggregation function is differentiable, the importance of an indicator is measured by its marginal contribution, computed as the partial derivative of the aggregation function with respect to that indicator. Compensability refers to the possibility of offsetting a low value of one indicator with a high value of another.
This paper addresses the problem of indicator aggregation through a non-compensative approach, by means of a penalization factor that captures the unbalance among indicators. The aim of the paper is twofold. Firstly, in line with the interpretation of the arithmetic mean as a least-squares estimate for data values transformed by the Box–Cox function of order one, as in Berger and Casella [2], we propose a new reading of the so-called Mazziotta Pareto Index (hereafter, MPI) [3] in terms of the Box–Cox transformation of order one. Secondly, we generalize the formula obtained for the MPI to obtain a penalized version of the geometric mean. Specifically, the penalized geometric mean is obtained as the product of the geometric mean and a penalization factor that depends on the (horizontal) variance of the normalized indicators, appropriately scaled and transformed via the Box–Cox function of order zero. Roughly speaking, the penalization factor is a correction term for the geometric mean that accounts for the unbalance among indicators. More precisely, the penalization factor is determined by the variance of the scaled normalized indicators, obtained by dividing the normalized indicators by their geometric mean and then transformed via the Box–Cox function of order zero. This is why the penalization factor can be interpreted as a reliability measure for the geometric mean: the higher the unbalance among indicators, the higher the penalization.
The method presented here to penalize the geometric mean can easily be applied to any generalized mean by considering the appropriate Box–Cox transformation. In this paper, we make a first attempt to bridge two strands of the scientific literature that use generalized means as aggregation operators, namely composite indicator construction and information theory [4].
To illustrate the appeal of our proposal, we focus on the construction of the Human Development Index (HDI). The index was introduced in the first Human Development Report (HDR), published by the United Nations Development Programme in 1990. (All editions are available at: https://hdr.undp.org/en/global-reports, accessed on 28 February 2022.) The index is computed annually for more than 170 countries by combining three dimensions: longevity, knowledge, and access to resources. Longevity is captured by life expectancy at birth. Knowledge is represented by a measure of educational achievement, computed as a weighted sum of expected years of schooling (adult literacy) and mean years of schooling. Finally, the resource dimension is represented by adjusted real GDP per capita in purchasing power parity (PPP) terms.
The paper is organized as follows: in Section 2, we discuss the problem of compensation and balance among indicators and review the relevant literature. In Section 3, we propose a new reading of the Mazziotta Pareto Index. In Section 4, we introduce the penalized geometric mean and present its theoretical framework. In Section 5, we apply the penalized geometric mean to compute the “penalized” Human Development Index (pHDI) and compare HDI and pHDI in terms of the corresponding rankings. Section 6 concludes and suggests possible extensions. Proofs of some propositions of Section 4 are collected in Appendix A.

2. Literature Review

2.1. Compensability and Balance among Indicators

The simplest aggregation approach uses the arithmetic mean. Despite its ease of interpretation, the arithmetic mean suffers from two main drawbacks: full compensability (or perfect substitutability) and (possibly low) reliability. Perfect substitutability allows for unbalances among indicators without any penalization. This means that different distributions of indicator values can yield similar or equal values of the composite indicator. Therefore, unbalance gives rise to a loss of information about the multidimensional nature of the phenomenon under investigation. In this sense, full compensability and unbalance have negative effects on the composite indicator and, for this reason, deserve consideration. On the other hand, it should be recalled that the arithmetic mean is a measure of central tendency whose reliability depends strongly on the dispersion of the data around it. Usually, when the sample size is large enough, dispersion is measured in terms of the standard deviation of the data from the mean: the smaller (larger) the standard deviation, the more (less) reliable the mean. In the context of composite indicators, the relevant dispersion is the horizontal dispersion, that is, the dispersion across indicators, as distinguished from the vertical dispersion, that is, the dispersion across units.
The next paragraphs review the main aspects of compensability and balance among indicators and summarize the methods used for computing the HDI.
When a composite indicator is computed with a linear (weighted) aggregation rule, full compensability among the individual indicators is always assumed. This implies substitutability among the different aspects and, consequently, a poor performance in one aspect can be completely compensated by a surplus in another. However, for a hypothetical well-being indicator, could a low life expectancy be compensated by a high income? Complete compensability among indicators is therefore often not desirable.
The literature offers several non-compensative approaches. For instance, when countries are ranked using as composite index the minimum of the (normalized) indicators, no improvement in the other indicators can modify the value of the final index. However, this approach has drawbacks of its own, since it does not consider the unbalance among indicators. Moreover, two countries with very different profiles but the same minimum obtain the same index value and are ranked in the same position.
Among the non-compensative approaches, the family of aggregation functions based on generalized means of power $\alpha$, $\alpha \in \mathbb{R}$, plays a crucial role. For example, the geometric mean ($\alpha = 0$) is used to compute the Human Development Index [5], whereas the Human Poverty Index for developing countries (HPI-1) computed by UNDP [6] is obtained using the generalized mean of power $\alpha = 3$.
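A small sketch of the generalized (power) mean family may help fix ideas. It assumes equal weights and strictly positive inputs, and it treats $\alpha = 0$ through its limit, the geometric mean; the function name `power_mean` and the sample values are illustrative and not part of the paper.

```python
import math


def power_mean(values, alpha):
    """Equally weighted generalized (power) mean of order alpha.

    alpha = 1 gives the arithmetic mean; alpha = 0 is defined by its limit,
    the geometric mean. Inputs are assumed strictly positive.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("expects a non-empty list of positive values")
    m = len(values)
    if alpha == 0:
        # Limit case: exponential of the arithmetic mean of the logarithms.
        return math.exp(sum(math.log(v) for v in values) / m)
    return (sum(v ** alpha for v in values) / m) ** (1.0 / alpha)


z = [0.2, 0.5, 0.9]  # hypothetical normalized indicators of one unit
for alpha in (-1, 0, 1, 3):
    print(alpha, round(power_mean(z, alpha), 4))
```

For fixed inputs the power mean is non-decreasing in $\alpha$, so higher orders give more weight to the larger values; when the indicators measure deprivation, as in HPI-1, a high order such as $\alpha = 3$ therefore emphasizes the largest deprivations.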
Non-compensative aggregation methods, such as the generalized means, only partially overcome the drawbacks of the arithmetic mean approach by penalizing the unbalance among indicators. Specifically, in the geometric mean approach, the marginal contribution of an indicator to the composite value is much higher when the indicator value is low. In this way, high indicator values are penalized, and improvements in the weak indicators are encouraged. However, the penalization introduced by the geometric mean accounts only partially for unbalance. The following example clarifies the point. Consider a composite indicator obtained as the geometric mean of two indicators whose normalized values range between $e^0$ and $e^1$. The normalized values $z_1 = e^0$ and $z_2 = e^1$ yield the composite indicator value $e^{0.5}$, which is the same value obtained with $z_1 = z_2 = e^{0.5}$. Therefore, despite its non-compensative nature, the geometric mean approach assigns the same composite indicator value to two pairs of indicator values with very different distributions and horizontal dispersions.
In particular, the use of the geometric mean is not sufficient to fully balance the contributions of the single indicators. The reason for this weakness is that the geometric mean is strongly related to the arithmetic mean via the Box–Cox transformation of order zero [7]: the arithmetic mean of the logarithms of the indicator values equals the logarithm of their geometric mean. Hence, in the transformed space, the geometric mean suffers from the same drawbacks as the arithmetic mean.
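The two-indicator example and the link with the Box–Cox transformation of order zero can be reproduced with a short sketch. The values are those of the example in the text; the function name `geometric_mean` is illustrative. Both pairs have geometric mean $e^{0.5}$, while the variance of their logarithms differs.

```python
import math
import statistics


def geometric_mean(values):
    # Box-Cox order zero: average the logarithms, then map back with exp.
    return math.exp(statistics.fmean(math.log(v) for v in values))


unbalanced = [math.exp(0.0), math.exp(1.0)]  # z1 = e^0, z2 = e^1
balanced = [math.exp(0.5), math.exp(0.5)]    # z1 = z2 = e^0.5

gm_unbalanced = geometric_mean(unbalanced)
gm_balanced = geometric_mean(balanced)

# Horizontal dispersion in the log space (population variance of the logs).
var_log_unbalanced = statistics.pvariance([math.log(v) for v in unbalanced])
var_log_balanced = statistics.pvariance([math.log(v) for v in balanced])

print(gm_unbalanced, gm_balanced)            # both equal e^0.5 up to rounding
print(var_log_unbalanced, var_log_balanced)  # 0.25 versus 0 (up to rounding)
```

The geometric mean alone cannot tell the two pairs apart; the dispersion of the logarithms can, and this is the quantity exploited by the penalization introduced in Section 4.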
A first attempt to account for the unbalance among indicators was made by Casadio Tarabusi and Palazzi [8], who use the concave average approach to build the aggregation function of the Sustainability Development Index. This function is the weighted average of a strictly concave parametric transformation of the normalized indicators, whose parameters adjust the intensity of the unbalance penalization. More recently, Casadio Tarabusi and Guarini [9] combine the weighted arithmetic mean and the Min function non-linearly in a parametric aggregation function, called the Mean-Min function, which mediates between the minimum penalization (represented by the arithmetic mean) and the maximum penalization (represented by the Min function). Moreover, the authors introduce a measure of compensation between a pair of indicators, called the Marginal Rate of Compensation (MRC), to evaluate the proportion of the marginal increase (decrease) of one indicator that is compensated by a marginal decrease (increase) of another indicator, keeping the remaining variables unchanged.
Many other non-compensatory aggregation methods have been proposed in the scientific literature. According to El Gibari et al. [10], the construction of a composite indicator involves multi-criteria decision-making (MCDM) theory. They classify MCDM methods into five categories: (i) elementary methods; (ii) value- and utility-based methods; (iii) data envelopment analysis-based methods; (iv) distance function-based methods; and (v) the outranking relation approach. In the first category, Simple Additive Weighting (SAW) implies total compensation, whereas the Weighted Product (WP) method implies partial compensation. According to Lai et al. [11], the main advantage of these methods is that they can reduce complex problems to simple conditions. The second category refers to methods that assign a real number to each alternative and determine a preference order over the alternatives based on the decision-makers’ value judgements [12]. Data Envelopment Analysis (DEA)-based methods use linear programming to evaluate the efficiency of a set of comparable units; DEA allows for full compensation among the criteria [13]. The distance function method requires the definition of a reference point and of a function that measures the deviation between the value of each indicator and its corresponding reference level [14]. Finally, the outranking relation approach relies on pairwise comparisons of alternatives, in order to establish whether a given alternative is at least as good as another. According to El Gibari et al. [10], the two most used methods in this last category are ELECTRE (Elimination and Choice Expressing Reality [15]) and PROMETHEE (Preference Ranking Organization Method for Enrichment Evaluations [16]). Greco et al. [17] identify these two methods as the two main types of non-compensatory aggregation techniques.
The literature also includes the so-called Mixed Strategies. The name reflects the fact that these methods do not fit into any single category, since they combine different approaches to address the issues discussed above [17]. This is the case of the Mazziotta Pareto method [18].
The Mazziotta Pareto Index (MPI) derives from the Method of Penalties by Coefficient of Variation, proposed by the same authors to measure health infrastructure endowment under the assumption of non-substitutability of the indicators. The MPI uses as aggregation function the arithmetic mean adjusted by a penalization coefficient that accounts for the (horizontal) variability of the (appropriately standardized) indicators relative to the mean, in order to penalize an unbalanced distribution of the indicators.
A newer variant of the index allows for comparisons over time [19]. The Adjusted MPI (AMPI) replaces the modified z-score normalization with a rescaling method. In addition, it allows reference points, the so-called goalposts, to be chosen, e.g., so that the average of a given year is set to 100, which facilitates the interpretation of results.

2.2. The Human Development Index

To illustrate the proposed method, we focus on one of the most famous composite indicators defined by means of a geometric mean, namely the Human Development Index, and we compute its corresponding penalized version.
In its original formulation, the HDI was computed as the arithmetic mean of three dimensions: life expectancy, education, and GDP per capita. In 2010, the arithmetic mean was replaced by the geometric mean [5]. Using the geometric mean instead of the arithmetic mean has at least two advantages. Firstly, the geometric mean reduces the level of substitutability between dimensions. Secondly, the geometric mean ensures that a 1% decline in one dimension has the same impact on the aggregate index as a 1% decline in another dimension. Over the years, many modifications of the geometric mean approach have been proposed. For instance, Noorbakhsh [20] proposes modifying the normalization process. Paul [21] suggests overcoming what he defines as the underestimation of achievement at higher levels by assigning higher weights to each of the physical indicators at the margin. Other scholars have proposed modifying the indicators; for instance, Jha et al. [22] propose modifying the health dimension to account for morbidity, and Prados de la Escosura [23] suggests modifying the non-income dimensions by applying a convex achievement function. The HDI has also been investigated from a theoretical point of view. For instance, Chakravarty [24] axiomatically characterizes a family of achievement measures that reduces to the HDI as a special case. Finally, Alkire and Foster [25] propose a multiplicative modification of the HDI that accounts for inequality through a multiplicative inequality measure based on the Atkinson family.

3. A New Reading of the Mazziotta Pareto Index

According to the Method of Penalties by Coefficient of Variation [18,26], denote by $x_{ij}$ the value of indicator $j$ for the $i$-th unit, $i = 1, 2, \dots, n$, $j = 1, 2, \dots, m$. The normalized value of indicator $j$ for the $i$-th unit is obtained by standardizing $x_{ij}$ to have mean 100 and standard deviation 10, as follows:
$$z_{ij} = 100 \pm \frac{x_{ij} - M_{x_j}}{S_{x_j}}\,10, \qquad i = 1, 2, \dots, n, \quad j = 1, 2, \dots, m, \tag{1}$$
where $M_{x_j}$ and $S_{x_j}$ are, respectively, the mean and the standard deviation of the $j$-th indicator. Starting from the normalized indicators (1), the MPI of the $i$-th unit is defined as:
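$$MPI_i^{\pm} = \mu_i\left(1 \pm \frac{S_i^2}{\mu_i^2}\right), \qquad i = 1, 2, \dots, n, \tag{2}$$
where
$$\mu_i = \frac{1}{m}\sum_{j=1}^{m} z_{ij}, \qquad i = 1, 2, \dots, n, \tag{3}$$
is the arithmetic mean of the normalized indicators for unit $i$, and
$$S_i^2 = \frac{1}{m}\sum_{j=1}^{m}\left(z_{ij} - \mu_i\right)^2, \qquad i = 1, 2, \dots, n, \tag{4}$$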
is the (biased) sample variance of the normalized indicators $z_{ij}$ for unit $i$, $i = 1, 2, \dots, n$. The term $1 \pm S_i^2/\mu_i^2$ in (2) penalizes the arithmetic mean $\mu_i$ to account for the (horizontal) variability of the normalized indicators. This penalization has two main effects: firstly, it makes the MPI not fully compensatory; secondly, it discriminates between units with the same arithmetic mean using a criterion based on the reliability of the arithmetic mean itself. In fact, the MPI penalizes more heavily the units with larger (horizontal) variability and, consequently, with a less reliable arithmetic mean. We recall that the sign $\pm$ in (2) depends on the type of phenomenon considered: if increasing values of the indicators correspond to positive variations of the phenomenon (positive polarity), we choose the sign $-$; otherwise (negative polarity), we choose the sign $+$.
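The following minimal sketch implements the normalization (1) and the MPI (2) for small, hypothetical inputs. It assumes that $S_{x_j}$ is the population (divide-by-$n$) standard deviation, which the text does not specify; the function names `normalize` and `mpi` are hypothetical.

```python
import statistics


def normalize(x, polarity):
    """Eq. (1): rescale each column of x to mean 100 and standard deviation 10.

    x[i][j] is indicator j for unit i; polarity[j] is +1 or -1 (the sign in Eq. (1)).
    Assumption: S_xj is the population (1/n) standard deviation.
    """
    columns = list(zip(*x))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]
    return [
        [100 + s * (v - mu) / sd * 10 for v, mu, sd, s in zip(row, means, sds, polarity)]
        for row in x
    ]


def mpi(z_row, penalty_sign=-1):
    """Eq. (2) for one unit: penalty_sign=-1 gives MPI^- (positive polarity)."""
    mu = statistics.fmean(z_row)
    s2 = statistics.pvariance(z_row)  # biased (1/m) variance, Eq. (4)
    return mu * (1 + penalty_sign * s2 / mu ** 2)


# Two hypothetical units with the same mean of normalized values (100).
print(mpi([100.0, 100.0]), mpi([90.0, 110.0]))  # the unbalanced unit is penalized
```

Both units have the same arithmetic mean, but the unbalanced one receives a lower $MPI^-$ (99 rather than 100), because its squared coefficient of variation is $100/100^2 = 0.01$.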
We propose a new reading of the Mazziotta Pareto Index with a twofold purpose: to interpret the penalization term in (2) as a measure of the error committed when the arithmetic mean is used instead of the normalized indicators, and to extend this idea to generalized means. In this paper, we focus on the geometric mean, but the generalization proposed here can be applied to all other generalized means, which will be the object of further research.
To understand the new reading of the Mazziotta Pareto Index, it is necessary to introduce the interpretation of the arithmetic mean proposed by Berger and Casella [2], according to which generalized means can be derived as least-squares estimates of data transformed via a Box–Cox function [7]. Specifically, according to Berger and Casella, the arithmetic mean $\mu_i$ in (3) can be read as the solution of the following optimization problem:
$$\min_{a \in \mathbb{R}} F(a), \tag{5}$$
where
$$F(a) = \frac{1}{m}\sum_{j=1}^{m}\left(h_1(z_{ij}) - h_1(a)\right)^2, \qquad a \in \mathbb{R}, \tag{6}$$
and $h_1(x)$ is the Box–Cox transformation of order one, defined as:
$$h_1(x) = x - 1, \qquad x \in \mathbb{R}. \tag{7}$$
That is, $\mu_i$, as the solution of (5), is the preimage under $h_1$ of the least-squares estimate of the normalized indicators transformed via the function $h_1$, i.e.,
$$\mu_i = h_1^{-1}\left(\frac{1}{m}\sum_{j=1}^{m} h_1(z_{ij})\right), \qquad i = 1, 2, \dots, n. \tag{8}$$
Under the Box–Cox transformation $h_1$, the error made when approximating the normalized indicators of the $i$-th unit with $\mu_i$ is the value of function (6) at the optimum $\mu_i$, and it coincides with the (biased) sample variance of $z_{ij}$, $j = 1, 2, \dots, m$, i.e.,
$$F(\mu_i) = \frac{1}{m}\sum_{j=1}^{m}\left(h_1(z_{ij}) - h_1(\mu_i)\right)^2 = \frac{1}{m}\sum_{j=1}^{m}\left(z_{ij} - \mu_i\right)^2 = S_i^2, \qquad i = 1, 2, \dots, n. \tag{9}$$
The reliability of $\mu_i$ depends on the size of (9). It should be pointed out that each unit $i$, $i = 1, 2, \dots, n$, has a corresponding error $S_i^2$ whose size depends strongly on $\mu_i$. Hence, the errors of units with different means are not comparable.
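A quick numerical check of the Berger and Casella reading for one hypothetical unit: the objective (6) is smallest at the arithmetic mean, and its minimum equals the biased variance (9). This is an illustrative sketch using only the Python standard library; the indicator values are made up.

```python
import statistics


def h1(x):
    # Box-Cox transformation of order one, Eq. (7).
    return x - 1


def F(a, z):
    # Objective of Eq. (6): mean squared gap between h1(z_ij) and h1(a).
    return statistics.fmean((h1(zj) - h1(a)) ** 2 for zj in z)


z = [95.0, 100.0, 112.0]  # hypothetical normalized indicators of one unit
mu = statistics.fmean(z)

# Moving away from the arithmetic mean in either direction increases F ...
for delta in (-1.0, -0.1, 0.1, 1.0):
    assert F(mu + delta, z) > F(mu, z)
# ... and the minimum value equals the biased sample variance, Eq. (9).
print(F(mu, z), statistics.pvariance(z))
```

Algebraically, $F(a) = S_i^2 + (a - \mu_i)^2$, which is why the minimum is attained at $\mu_i$ and equals $S_i^2$.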
To overcome this difficulty, we consider the scaled normalized indicators, obtained by dividing the normalized indicators of the $i$-th unit by the corresponding arithmetic mean $\mu_i$, i.e.,
$$\tilde{z}_{ij} = \frac{z_{ij}}{\mu_i}, \qquad i = 1, 2, \dots, n, \quad j = 1, 2, \dots, m. \tag{10}$$
It is easy to see that the arithmetic mean $\tilde{\mu}_i$ of the scaled values $\tilde{z}_{ij}$, $j = 1, 2, \dots, m$, equals one for every unit $i$, $i = 1, 2, \dots, n$. Under the Box–Cox transformation $h_1$, the error made when approximating the scaled normalized indicators $\tilde{z}_{ij}$ of the $i$-th unit with $\tilde{\mu}_i = 1$ is:
$$\tilde{S}_i^2 = \frac{1}{m}\sum_{j=1}^{m} h_1(\tilde{z}_{ij})^2 = \frac{1}{m}\sum_{j=1}^{m} h_1\!\left(\frac{z_{ij}}{\mu_i}\right)^2, \qquad i = 1, 2, \dots, n. \tag{11}$$
The error in (11) coincides with the (biased) sample variance of the scaled normalized indicators $\tilde{z}_{ij}$ transformed via the function $h_1$. Since $\tilde{\mu}_i = 1$ for every unit, (11) does not depend on the level of the mean and allows units with different means to be compared. The higher the value of $\tilde{S}_i^2$, the higher the loss of information caused by considering $\mu_i$ instead of the normalized indicators $z_{ij}$, regardless of the value of $\mu_i$.
Note that:
$$\tilde{S}_i^2 = \frac{S_i^2}{\mu_i^2}, \qquad i = 1, 2, \dots, n, \tag{12}$$
is the squared coefficient of variation of $z_{ij}$, $j = 1, 2, \dots, m$. Moreover, the MPI of the $i$-th unit defined in (2) can be rewritten as follows:
$$MPI_i^{\pm} = \mu_i\, h_1^{-1}\!\left(\pm\tilde{S}_i^2\right), \qquad i = 1, 2, \dots, n. \tag{13}$$
Equation (13) offers a new reading of the penalization term in the MPI Formula (2): the penalization term is nothing more than the preimage under $h_1$ of $\pm\tilde{S}_i^2$. The larger the error $\tilde{S}_i^2$ (and, consequently, the smaller the reliability of $\mu_i$), the smaller (for positive polarity) or the larger (for negative polarity) the value of the MPI. Therefore, the idea is to discriminate between units with the same arithmetic mean but different reliability of that mean, attributing a smaller (for positive polarity) or larger (for negative polarity) MPI value to the units for which the arithmetic mean is less reliable.
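The identities (12) and (13) can be checked numerically with a short sketch. The indicator values are hypothetical, and `h1_inverse` is simply the inverse of $h_1(x) = x - 1$.

```python
import statistics


def h1_inverse(y):
    # Inverse of the Box-Cox transformation of order one, h1(x) = x - 1.
    return y + 1


z = [95.0, 100.0, 112.0]  # hypothetical normalized indicators of one unit
mu = statistics.fmean(z)
s2 = statistics.pvariance(z)

# Scaled indicators, Eq. (10), and their error under h1, Eq. (11).
z_tilde = [zj / mu for zj in z]
s2_tilde = statistics.fmean((zt - 1) ** 2 for zt in z_tilde)

mpi_minus_eq2 = mu * (1 - s2 / mu ** 2)      # Eq. (2), positive polarity
mpi_minus_eq13 = mu * h1_inverse(-s2_tilde)  # Eq. (13)
print(s2_tilde, s2 / mu ** 2)                # Eq. (12): equal up to rounding
print(mpi_minus_eq2, mpi_minus_eq13)
```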

4. The Penalized Geometric Mean

According to the interpretation of Berger and Casella illustrated in the previous section, the composite indicator of the $i$-th unit given by the geometric mean of the normalized indicators $z_{ij}$,
$$\mu_{0,i} = \left(\prod_{j=1}^{m} z_{ij}\right)^{1/m}, \qquad i = 1, 2, \dots, n, \tag{14}$$
can be expressed as the preimage of the least-squares estimate of the normalized indicators transformed via the Box–Cox function of order zero, $h_0$, as follows:
$$\mu_{0,i} = h_0^{-1}\left(\frac{1}{m}\sum_{j=1}^{m} h_0(z_{ij})\right), \qquad i = 1, 2, \dots, n, \tag{15}$$
where $h_0$ is defined as:
$$h_0(x) = \ln x, \qquad x \in \mathbb{R}^{+}. \tag{16}$$
In analogy with the interpretation of the MPI in Section 3, we define the penalized geometric mean as follows:
$$GM_i^{\pm} = \mu_{0,i}\, h_0^{-1}\!\left(\pm\tilde{S}_{0,i}^2\right) = \mu_{0,i}\exp\left\{\pm\tilde{S}_{0,i}^2\right\}, \qquad i = 1, 2, \dots, n, \tag{17}$$
where
$$\tilde{S}_{0,i}^2 = \frac{1}{m}\sum_{j=1}^{m} h_0\!\left(\frac{z_{ij}}{\mu_{0,i}}\right)^2 = \frac{1}{m}\sum_{j=1}^{m}\left(\ln z_{ij} - \ln\mu_{0,i}\right)^2, \qquad i = 1, 2, \dots, n, \tag{18}$$
is the (biased) sample variance of the scaled normalized indicators $\hat{z}_{ij} = z_{ij}/\mu_{0,i}$, $j = 1, 2, \dots, m$, transformed via the function $h_0$.
The penalized geometric mean (17) is obtained by multiplying the geometric mean $\mu_{0,i}$ by the penalization factor $h_0^{-1}(\pm\tilde{S}_{0,i}^2)$, in order to discriminate between units with the same geometric mean but different reliability of that mean, attributing a smaller (for positive polarity) or larger (for negative polarity) value to the units for which the geometric mean is less reliable. As for the MPI, the reliability of the geometric mean $\mu_{0,i}$ is measured in terms of the reliability of the arithmetic mean of $h_0(\hat{z}_{ij})$, $j = 1, 2, \dots, m$.
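A minimal sketch of the penalized geometric mean (17), assuming strictly positive normalized indicators (the logarithm in $h_0$ is undefined at zero). The function name `penalized_geometric_mean` and the two example units are hypothetical; the units share the same geometric mean, 0.5, so only the penalization separates them.

```python
import math
import statistics


def penalized_geometric_mean(z, sign=-1):
    """Sketch of Eq. (17) for one unit.

    z: strictly positive normalized indicators z_i1, ..., z_im.
    sign: -1 for GM^- (positive polarity), +1 for GM^+ (negative polarity).
    Returns the geometric mean, S~^2_{0,i} of Eq. (18), and GM^-/GM^+.
    """
    logs = [math.log(v) for v in z]  # h0(z_ij) = ln z_ij
    mean_log = statistics.fmean(logs)
    gm = math.exp(mean_log)          # Eq. (15)
    # Eq. (18): biased variance of ln z_ij - ln mu_0,i (here ln mu_0,i = mean_log).
    s2 = statistics.fmean((lv - mean_log) ** 2 for lv in logs)
    return gm, s2, gm * math.exp(sign * s2)


# Two hypothetical units with the same geometric mean (0.5).
print(penalized_geometric_mean([0.5, 0.5]))
print(penalized_geometric_mean([0.25, 1.0]))
```

For the balanced unit the penalization factor is exactly 1, whereas the unbalanced unit has $\tilde{S}_{0,i}^2 = (\ln 2)^2 \approx 0.48$ and therefore a lower $GM_i^-$.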
Proposition 1.
The penalized geometric mean (17) satisfies the following properties:
1. $GM_i^{+} \ge \mu_{0,i} \ge GM_i^{-}$.
2. $GM_i^{+} = GM_i^{-} = \mu_{0,i}$ if and only if $\tilde{S}_{0,i} = 0$.
3. $GM_i^{+} = GM_i^{-}\exp\left\{2\tilde{S}_{0,i}^2\right\}$.
4. Given two units $k$ and $h$ ($k \ne h$) with $\mu_{0,k} = \mu_{0,h}$, we have:
$$GM_k^{-} > GM_h^{-} \iff \tilde{S}_{0,h}^2 > \tilde{S}_{0,k}^2, \qquad GM_k^{+} > GM_h^{+} \iff \tilde{S}_{0,k}^2 > \tilde{S}_{0,h}^2.$$
5. Given two units $k$ and $h$ ($k \ne h$), we have:
$$GM_k^{-} > GM_h^{-} \iff \mu_{0,k} > \mu_{0,h}\exp\left\{\tilde{S}_{0,k}^2 - \tilde{S}_{0,h}^2\right\}, \qquad GM_k^{+} > GM_h^{+} \iff \mu_{0,k} > \mu_{0,h}\exp\left\{\tilde{S}_{0,h}^2 - \tilde{S}_{0,k}^2\right\}.$$
Proof. 
The proof follows easily from (17). □
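Properties 1 to 4 of Proposition 1 can be spot-checked numerically, as in the following sketch with hypothetical indicator values. This is a check on examples, not a proof; the helper name `gm_components` is illustrative.

```python
import math
import statistics


def gm_components(z):
    """Return (mu_0, S~^2_0, GM^-, GM^+) for strictly positive indicators z."""
    logs = [math.log(v) for v in z]
    mean_log = statistics.fmean(logs)
    s2 = statistics.fmean((lv - mean_log) ** 2 for lv in logs)
    mu0 = math.exp(mean_log)
    return mu0, s2, mu0 * math.exp(-s2), mu0 * math.exp(s2)


mu0, s2, gm_minus, gm_plus = gm_components([0.45, 0.7, 0.95])  # hypothetical unit

# Property 1: GM^+ >= mu_0 >= GM^-.
assert gm_plus >= mu0 >= gm_minus
# Property 3: GM^+ = GM^- * exp(2 S~^2).
assert math.isclose(gm_plus, gm_minus * math.exp(2 * s2))
# Property 2: with identical indicators, S~ = 0 and all three values coincide.
b_mu0, b_s2, b_minus, b_plus = gm_components([0.6, 0.6, 0.6])
assert math.isclose(b_s2, 0.0, abs_tol=1e-15)
assert math.isclose(b_minus, b_mu0) and math.isclose(b_plus, b_mu0)
# Property 4: equal geometric means (0.5); the more unbalanced unit has the smaller GM^-.
k = gm_components([0.5, 0.5, 0.5])
h = gm_components([0.25, 0.5, 1.0])
assert math.isclose(k[0], h[0]) and h[1] > k[1] and k[2] > h[2]
print("Proposition 1 checks passed")
```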
The following propositions list some properties of $GM_i^{\pm}$ as a function of $z_{ik}$, $i = 1, 2, \dots, n$, $k = 1, 2, \dots, m$.
Proposition 2.
For $i = 1, 2, \dots, n$ and $k = 1, 2, \dots, m$, the penalized geometric mean (17) satisfies the following properties:
1. $\partial GM_i^{-}/\partial z_{ik} \ge 0$ for $z_{ik} \le z_i^{-}$, and $GM_i^{-}$ has a local maximum at the point $z_i^{-}$;
2. $\partial GM_i^{+}/\partial z_{ik} \le 0$ for $z_{ik} \le z_i^{+}$, and $GM_i^{+}$ has a local minimum at the point $z_i^{+}$,
where $z_i^{\pm} = \mu_{0,i}\exp\left\{\mp\frac{m}{2(m-1)}\right\}$.
Proof. 
See Appendix A.  □
Recalling that the geometric mean $\mu_{0,i}$ is an increasing function of $z_{ik}$, $i = 1, 2, \dots, n$, $k = 1, 2, \dots, m$, for any $z_{ik} > 0$, Proposition 2 establishes that, as $z_{ik}$ increases beyond the threshold value $z_i^{-}$, the negative effect of the penalization term $\exp\{-\tilde{S}_{0,i}^2\}$ on the penalized geometric mean outweighs the growth of the geometric mean. Analogously, as $z_{ik}$ decreases below the threshold value $z_i^{+}$, the positive effect of the penalization term $\exp\{\tilde{S}_{0,i}^2\}$ on the penalized geometric mean outweighs the reduction of the geometric mean. Note that when $z_i^{-} \ge 1$, the penalized geometric mean $GM_i^{-}$ is an increasing function of $z_{ik} \in (0, 1]$ for any $k = 1, 2, \dots, m$.
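As a hedged numerical illustration of Proposition 2, the sketch below fixes the other indicators of a hypothetical unit, scans a grid of values of $z_{ik}$, and locates the maximizer of $GM_i^-$. Since Appendix A is not reproduced here, we assume that $\mu_{0,i}$ in the threshold $z_i^-$ denotes the geometric mean of the $m - 1$ indicators other than $z_{ik}$; under this reading the first-order condition of $\ln GM_i^-$ gives exactly $z_i^-$ (equivalently, at that point $z_{ik}$ equals $e^{1/2}$ times the geometric mean of all $m$ indicators). The function name `gm_minus` and the input values are illustrative.

```python
import math
import statistics


def gm_minus(z):
    # Penalized geometric mean GM^- of Eq. (17) for strictly positive z.
    logs = [math.log(v) for v in z]
    mean_log = statistics.fmean(logs)
    s2 = statistics.fmean((lv - mean_log) ** 2 for lv in logs)
    return math.exp(mean_log - s2)


others = [0.8, 0.9]  # hypothetical fixed values of the other indicators
m = len(others) + 1
grid = [0.01 * k for k in range(1, 501)]  # candidate values of z_ik in (0, 5]
values = [gm_minus([zk] + others) for zk in grid]
best = grid[values.index(max(values))]

# Assumed reading: mu_0,i in the threshold is the geometric mean of the other indicators.
g_others = math.exp(statistics.fmean(math.log(v) for v in others))
threshold = g_others * math.exp(m / (2 * (m - 1)))
print(best, threshold)  # the grid maximizer lies next to the threshold
```

Beyond the maximizer, further increases in $z_{ik}$ lower $GM_i^-$, which is the non-monotonic behaviour described by Proposition 2.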
Following Casadio Tarabusi and Guarini [9], we compute the Marginal Rate of Compensation (MRC) of the penalized geometric mean.
Proposition 3.
The MRC of the penalized geometric mean (17) between variables $z_{ik}$ and $z_{ih}$ is given by:
$$MRC_{kh,i}^{\pm} = \frac{z_{ih}}{z_{ik}} \cdot \frac{m \pm 2(m-1)\ln\frac{z_{ik}}{\mu_{0,i}}}{m \pm 2(m-1)\ln\frac{z_{ih}}{\mu_{0,i}}}.$$
Proof. 
See Appendix A. □
The MRC between variables $z_{ik}$ and $z_{ih}$ represents the proportion of the marginal increase (decrease) of $z_{ik}$ that is compensated by a marginal decrease (increase) of $z_{ih}$, ceteris paribus. We note that, as for the geometric mean, when $z_{ik} \le z_i^{-}$, for the penalized geometric mean $GM_i^{-}$ the decrease (increase) of $z_{ih}$ required to compensate for an increase (decrease) of $z_{ik}$ is larger the smaller the value of $z_{ik}$. Moreover, as $z_{ik}$ approaches zero, $MRC_{kh,i}^{+}$ and $MRC_{kh,i}^{-}$ degenerate to $+\infty$ and $-\infty$, respectively.

5. Empirical Findings

To illustrate the effect of introducing the penalization factor into the geometric mean, we now apply the penalized geometric mean approach (17) to compute a penalized version of the United Nations’ Human Development Index (HDI) for the year 2019. The HDI is a composite indicator obtained by aggregating the life expectancy, education, and per capita income indicators through the geometric mean.
We then compare the classical HDI with its penalized version, which we call pHDI. The pHDI is obtained by applying (17) to the three HDI indicators. Since the HDI has positive polarity, we use the penalized geometric mean $GM_i^{-}$.
Below, we briefly summarize the procedure used to construct the HDI, moving from the normalization step to the computation of the geometric mean. Then, we compute the penalization factor as defined in (17) and, consequently, the pHDI.
The data used in our analysis come from the United Nations Development Programme (UNDP) dataset (http://hdr.undp.org/en/data, accessed on 1 February 2021). Data refer to 2019 and cover 189 countries around the world.

5.1. The HDI: A Brief Introduction and Its Computation

Among the many composite indicators available, the HDI is one of the best known. It has been computed annually since 1990 for almost all countries in the world, and it combines three dimensions, namely the Health, Education, and Economic dimensions. Before 2010, the HDI was computed as the arithmetic mean of the three dimensions. Since 2010, in order to overcome some drawbacks of the arithmetic mean approach, such as the perfect substitutability of the indicators and the dependence on the reference values used for normalizing the indicators, the HDI has been computed by aggregating the three dimensions through the geometric mean. This approach provides rankings that are invariant to the normalization reference, is only partially affected by the substitutability of the indicators, and preserves the ease of computation.
Four variables, belonging to three dimensions, are used to define the HDI. Specifically, the Health dimension is captured by Life Expectancy at birth (LE); the Education dimension is given by the Education Indicator (EI), computed as the arithmetic mean of Mean Years of Schooling (MYS) and Expected Years of Schooling (EYS); and, finally, the Income Indicator (II), computed from GNI per capita in PPP (purchasing power parity) international dollars, represents the Economic dimension (see Table 1).
The first step in computing the HDI is to normalize the variables so that the resulting indicators lie in the same range, $[0, 1]$. Following the classical approach, all variables are normalized according to a min–max method with fixed minimum and maximum values (goalposts). In particular, denoting by $x_{i,LE}$, $x_{i,EYS}$, and $x_{i,MYS}$, respectively, the values of the LE, EYS, and MYS variables for the $i$-th unit, we normalize LE to obtain the LEI indicator, which represents the Health domain, as follows:
$$LEI_i = \frac{x_{i,LE} - 20}{85 - 20}, \qquad i = 1, 2, \dots, 189.$$
In this way, for any $i = 1, 2, \dots, 189$, $LEI_i$ ranges in $[0, 1]$: it equals 1 when life expectancy at birth is 85 years and 0 when it is 20 years.
Analogously, for the Education domain, we normalize the two indicators. We denote by nEYS and nMYS the normalized EYS and MYS, respectively, obtained for the $i$-th unit according to the following formulas:
$$nEYS_i = \frac{x_{i,EYS} - 0}{18 - 0}, \qquad i = 1, 2, \dots, 189,$$
and
$$nMYS_i = \frac{x_{i,MYS} - 0}{15 - 0}, \qquad i = 1, 2, \dots, 189.$$
For the MYS and EYS variables, the minimum and maximum values are, respectively, 0 and 15 years and 0 and 18 years, since there is consensus that school-age children are expected to attend school for at least 15 years and that the expected number of years of schooling is 18. In this way, 15 mean years of schooling corresponds to one, and 18 expected years of schooling corresponds to one. EYS values greater than 18 are arbitrarily set equal to 18. (In the dataset, only 10 countries have such values, namely Australia (22), Belgium (19.8), Denmark (18.9), Finland (19.4), Iceland (19.1), Ireland (18.7), Netherlands (18.5), New Zealand (18.8), Norway (18.1), and Sweden (19.5).)
The Education dimension, EI, is computed as the arithmetic mean of the normalized MYS and EYS variables as follows:
\[ EI_i = \frac{nMYS_i + nEYS_i}{2}, \quad i = 1, 2, \ldots, 189. \]
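A minimal sketch of the Education indicator follows. It assumes that only EYS needs the cap described above (the largest MYS in Table 1, 14.2 years, is already below its 15-year maximum); the function and constant names are hypothetical.

```python
EYS_MAX = 18.0  # maximum (goalpost) for expected years of schooling
MYS_MAX = 15.0  # maximum (goalpost) for mean years of schooling


def education_index(eys: float, mys: float) -> float:
    """EI: arithmetic mean of the normalized MYS and EYS (minimum goalpost 0 for both)."""
    n_eys = min(eys, EYS_MAX) / EYS_MAX  # EYS above 18 is set equal to 18
    n_mys = mys / MYS_MAX
    return (n_mys + n_eys) / 2


# EYS = 22 (Australia's value) is capped at 18; the MYS value 7.5 is hypothetical.
print(education_index(22.0, 7.5))  # 0.75
```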
Finally, II is computed using the logarithm of GNI per capita, that is:
\[ II_i = \frac{\ln(GNIpc_i) - \ln(100)}{\ln(75{,}000) - \ln(100)}, \quad i = 1, 2, \ldots, 189. \]
II is equal to 1 when GNI per capita is $75,000 and equal to 0 when GNI per capita is $100. As for EYS, GNIpc values greater than $75,000 are arbitrarily set equal to this maximum value (in our dataset, only three countries have GNIpc values greater than $75,000, namely Liechtenstein ($131,032), Qatar ($92,418), and Singapore ($88,155)).
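The income normalization works on a log scale, so it can be sketched as a min–max step applied to ln(GNIpc) after capping. The function name and constants below are hypothetical; the goalposts are those stated above.

```python
import math

GNI_MIN, GNI_MAX = 100.0, 75_000.0  # goalposts, PPP international dollars


def income_index(gni_pc: float) -> float:
    """II: min-max normalization of ln(GNI per capita), capped at the upper goalpost."""
    capped = min(gni_pc, GNI_MAX)  # values above 75,000 are set equal to 75,000
    return (math.log(capped) - math.log(GNI_MIN)) / (math.log(GNI_MAX) - math.log(GNI_MIN))


print(income_index(131_032))        # 1.0  (Liechtenstein, capped)
print(round(income_index(754), 2))  # 0.31 (lowest GNIpc in Table 1)
```

The second line reproduces the minimum of II reported in Table 3, which is a quick consistency check on the goalposts.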
Table 2 reports descriptive statistics for the four variables LE, EYS, MYS, and GNIpc.
All the original variables, except for the income variable (GNIpc), have a negative skewness close to zero, meaning that the distributions of LE, EYS, and MYS are roughly symmetric.
Figure 1 displays the three indicators (LEI, EI, and II) obtained after normalization and, for the Education domain, aggregation. The three normalized indicators have different distributions. For instance, for the Health dimension (LEI), fixing the minimum at 20 years yields a normalized distribution whose minimum value is 0.51 (the lowest life expectancy in the sample, 53.3 years, lies well above the 20-year goalpost), whereas the minimum of the Education dimension (EI) is 0.25.
Table 3 reports descriptive statistics for the three normalized indicators.
Although they have different distributions, the three normalized indicators have high levels of correlation: 0.8176 for the Health and Education indicators, 0.8412 for the Economic and Health indicators, and 0.8653 for the Education and Economic indicators.
The HDI for the i-th country is computed as the geometric mean of the three normalized indicators LEI, EI, and II as follows:
\[ HDI_i = \sqrt[3]{LEI_i \cdot EI_i \cdot II_i}, \quad i = 1, 2, \ldots, 189. \]
As discussed in the previous sections, the main advantage of the geometric mean is that a poor performance in any dimension is directly reflected in the final value of the indicator. In other words, a low achievement in one dimension is not linearly compensated for by a higher achievement in another dimension. In this way, the level of substitutability between dimensions is reduced and, more importantly, a 1% decline in the indicator of one dimension has the same impact on the HDI as a 1% decline in another one; that is, life expectancy, education and income have the same importance.
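The geometric-mean aggregation and the equal-impact property just described can be checked numerically. The sketch below uses hypothetical indicator values and a hypothetical helper `hdi`; it only illustrates that, with equal weights, a 1% decline in any one indicator scales the HDI by the same factor 0.99^(1/3).

```python
import math


def hdi(lei: float, ei: float, ii: float) -> float:
    """HDI: unweighted geometric mean of the three normalized indicators."""
    return (lei * ei * ii) ** (1 / 3)


base = hdi(0.8, 0.6, 0.7)  # hypothetical LEI, EI, II values
# Reduce one indicator at a time by 1% and compare with the baseline HDI.
drops = [hdi(0.8 * 0.99, 0.6, 0.7), hdi(0.8, 0.6 * 0.99, 0.7), hdi(0.8, 0.6, 0.7 * 0.99)]
ratios = [d / base for d in drops]
print(all(math.isclose(r, 0.99 ** (1 / 3)) for r in ratios))  # True
```

Under the arithmetic mean, by contrast, a 1% decline in one indicator lowers the index by one third of 1% of that indicator's value, so the effect depends on the level of the affected indicator.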

5.2. The Computation of the pHDI

Before computing the pHDI through the penalized geometric mean in (17), we check the assumption of Proposition 2. For each country i, we compute its own threshold value, defined as \( z_i = \mu_{0,i} \exp\left(\frac{m}{2(m-1)}\right) \), \( i = 1, 2, \ldots, 189 \), where \( \mu_{0,i} \) is the geometric mean of the normalized indicators of country i. Since here we have m = 3 indicators, the threshold value is \( z_i = \mu_{0,i}\, e^{3/4} \), \( i = 1, 2, \ldots, 189 \). The threshold values associated with the countries range in [0.84, 2.03], with a mean value of 1.53 and a standard deviation of 0.32.
Since the normalized indicators never exceed 1, any country whose threshold value is at least 1 automatically satisfies the assumption; we therefore look for the countries whose threshold value is lower than 1. Only 11 countries have a threshold value lower than 1: Burkina Faso (27), Burundi (28), Central African Republic (33), Chad (34), Eritrea (55), Mali (107), Mozambique (118), Niger (125), Sierra Leone (153), South Sudan (159), and Yemen (187) (in brackets, the index i of the country in the alphabetically ordered dataset). For these countries, we check whether all the values of the indicators constituting the pHDI are less than the threshold value. None of the countries violates this condition.
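The assumption check of Proposition 2 can be sketched as follows for m normalized indicators. Because each normalized indicator is at most 1, for m = 3 the check can only fail when the geometric mean is below e^(-3/4) ≈ 0.472. The helper names and the indicator values in the example are hypothetical, and the non-strict inequality follows (A6) in Appendix A.

```python
import math
from typing import Sequence


def threshold(indicators: Sequence[float]) -> float:
    """Threshold z_i = mu_{0,i} * exp(m / (2(m - 1))), with mu_{0,i} the geometric mean."""
    m = len(indicators)
    mu0 = math.prod(indicators) ** (1 / m)
    return mu0 * math.exp(m / (2 * (m - 1)))


def satisfies_assumption(indicators: Sequence[float]) -> bool:
    """True if no normalized indicator exceeds the country's threshold value."""
    z = threshold(indicators)
    return all(x <= z for x in indicators)


print(round(math.exp(3 / 4), 3))                 # 2.117 (scaling factor for m = 3)
print(satisfies_assumption([0.55, 0.25, 0.31]))  # True
print(satisfies_assumption([1.0, 0.05, 0.05]))   # False
```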
The computation of the pHDI, based on Equation (17), requires determining, for each country, the penalization factor \( \exp\{-\tilde{S}^2_{0,i}\} \), \( i = 1, 2, \ldots, 189 \). The penalization factors associated with the countries range from 0.8500307 (for Eritrea) to 0.9999778 (for Kazakhstan), with a mean value of 0.9807745. As displayed in Figure 2, the distribution of the penalization factor is asymmetric, and 87.83% of its values fall in the range [0.95, 1].

5.3. A Comparison between the Two Approaches

Table 4 reports descriptive statistics for the geometric mean and the penalized geometric mean approaches. The HDI and the pHDI display similar mean values (0.72 and 0.71, respectively) and the same maximum value (0.96), although the pHDI has a wider range than the HDI (the minimum value of the pHDI is 0.34, compared with 0.39 for the HDI).
To better compare the values of the two composite indicators, we analyze their distributions (see Figure 3 and Figure 4).
Descriptive statistics, as well as the comparison of the distributions, show that the HDI and the pHDI are similarly distributed, meaning that adding the penalization factor to the HDI does not change its distributional features. However, since there are (albeit small) differences in the ranges of the two indicators, we compare the corresponding rankings. First, we compute Spearman's rank correlation coefficient, a simple non-parametric measure of rank correlation, and test its significance (Spearman's rank correlation test). We find a correlation value of ρ = 0.9984554, and the test rejects the null hypothesis ρ = 0 with a p-value < \( 2.2 \times 10^{-16} \).
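For readers who want to reproduce this step, the sketch below computes Spearman's ρ from two score vectors with the classical no-ties formula ρ = 1 − 6Σd²/(n(n² − 1)). It does not compute the p-value of the test; a library routine such as `scipy.stats.spearmanr`, which also handles ties, would be the natural choice in practice. The helper names and scores are hypothetical.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Rank positions (1 = highest value), assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho via 1 - 6 * sum(d^2) / (n (n^2 - 1)); valid only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


hdi_scores = [0.90, 0.80, 0.70, 0.60]   # hypothetical HDI values
phdi_scores = [0.89, 0.78, 0.65, 0.66]  # hypothetical pHDI values (last two swap ranks)
print(spearman_rho(hdi_scores, phdi_scores))  # 0.8
```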
Then, we compute the difference between the ranking positions obtained with the HDI and the pHDI (HDI rank minus pHDI rank, where rank 1 is the best). A negative value of this difference means that the country occupies a better position according to the HDI than according to the pHDI. What emerges is that 42 countries, i.e., 22.22% of the sample, do not change their ranking positions. The percentage of countries that are ranked better by the HDI than by its penalized version is 43.39% and, consequently, 34.39% of countries display the opposite behavior. On average, the absolute value of the difference between ranking positions is about two positions.
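The rank-difference summary can be sketched as follows, using the Difference = HDI rank − pHDI rank convention of Table 5 (rank 1 is best). Only the ten countries of Table 5 are used, so the shares printed below describe that subset, not the full sample of 189 countries; the helper `summarize` is hypothetical.

```python
# (HDI rank, pHDI rank) pairs taken from Table 5.
table5 = {
    "Maldives": (95, 110),
    "Syrian Arab Republic": (151, 161),
    "Qatar": (45, 53),
    "Lebanon": (92, 100),
    "Cuba": (70, 77),
    "Armenia": (81, 75),
    "Mongolia": (99, 93),
    "Lesotho": (165, 159),
    "Guinea-Bissau": (177, 170),
    "Nigeria": (161, 153),
}

# Difference = HDI rank - pHDI rank; negative means a better position under the HDI.
diffs = {country: h - p for country, (h, p) in table5.items()}


def summarize(differences):
    """Shares of unchanged / HDI-better / pHDI-better units and the mean absolute shift."""
    n = len(differences)
    return {
        "unchanged": sum(d == 0 for d in differences) / n,
        "better_under_hdi": sum(d < 0 for d in differences) / n,
        "better_under_phdi": sum(d > 0 for d in differences) / n,
        "mean_abs_shift": sum(abs(d) for d in differences) / n,
    }


print(diffs["Maldives"])  # -15
# {'unchanged': 0.0, 'better_under_hdi': 0.5, 'better_under_phdi': 0.5, 'mean_abs_shift': 8.1}
print(summarize(list(diffs.values())))
```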
Table 5 reports the ranking positions of the countries with the largest (positive and negative) differences between the two methods.
To investigate the size of the penalization factor in more depth, we plot its values as a function of the HDI ranking position (see Figure 5). The plot reveals that the penalization is larger for countries in the lower part of the HDI ranking: the larger the HDI ranking position of a country (i.e., the worse its HDI rank), the greater the penalization it suffers when the pHDI is used instead of the HDI.
The largest impact, corresponding to the smallest value of the penalization factor (0.8500307), is found for Eritrea, which occupies position 180 in the original HDI ranking, whereas the smallest impact is found for Kazakhstan, with a penalization factor of 0.999977836 (ranked 51st in the HDI ranking), followed by Germany, with a penalization factor of 0.999975672 (ranked 6th in the HDI ranking).
Finally, to investigate the role of the aggregation method, we study the relationship between the HDI and the pHDI using as benchmarks two other versions of the HDI, obtained by aggregating the Health, Education, and Income indicators with the generalized mean of order p = −1 (the harmonic mean) and p = 1 (the arithmetic mean). We chose the harmonic mean because, like the pHDI, it introduces a downward penalization for unbalanced indicator values. Specifically, in Figure 6, we plot the values of three versions of the HDI, obtained with the harmonic mean (p = −1, in green), the arithmetic mean (p = 1, in black), and the penalized geometric mean (in red), against the values of the HDI obtained with the geometric mean. In Figure 7, we plot the ranking positions obtained with the harmonic mean version of the HDI (p = −1, in green), the arithmetic mean version of the HDI (p = 1, in black), and the pHDI (in red) against the HDI ranking. In both figures, to highlight the differences between the three HDI versions and the geometric mean version, we plot the straight line of equality.
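The three aggregation rules compared here belong to the generalized (power) mean family M_p = ((1/m) Σ z_k^p)^(1/p), with the geometric mean obtained as the limit p → 0. The unweighted sketch below, with a hypothetical helper `power_mean` and hypothetical indicator values, shows the ordering harmonic ≤ geometric ≤ arithmetic, which is why the harmonic mean acts as a downward penalization of unbalanced values.

```python
import math
from typing import Sequence


def power_mean(values: Sequence[float], p: float) -> float:
    """Unweighted generalized mean of order p (values must be > 0 when p <= 0)."""
    m = len(values)
    if p == 0:
        # Limit p -> 0: the geometric mean, computed via logarithms.
        return math.exp(sum(math.log(v) for v in values) / m)
    return (sum(v ** p for v in values) / m) ** (1 / p)


x = [0.8, 0.4, 0.6]  # hypothetical (LEI, EI, II) values
harmonic, geometric, arithmetic = (power_mean(x, p) for p in (-1, 0, 1))
print(harmonic <= geometric <= arithmetic)  # True
```

The gap between the three means widens as the indicators become more unbalanced and vanishes when they are all equal.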
Figure 6 shows that the penalization introduced in the pHDI yields values of the composite indicator that are lower than those produced by the harmonic mean approach, especially for the countries with lower HDI values. By contrast, the countries with larger HDI values have similar values under all the versions of the HDI. This behavior is confirmed by Figure 7, where the top-ranked HDI countries lie closer to the line of equality, meaning that their positions are less influenced by the choice of the aggregation method. Taken together, these findings reveal that, unlike the countries at the bottom and in the middle of the HDI ranking, the countries at the top have highly balanced values of the indicators constituting the HDI; moreover, the pHDI penalizes unbalance more than the harmonic mean approach does.

6. Conclusions

A composite indicator is a mathematical combination of a set of indicators, and the crucial aspect is the choice of the aggregation function. There is no universal agreement on which aggregation method to use, even though ease of computation is the criterion that should guide the choice.
The simplest aggregation function is the arithmetic mean. Despite its ease of computation, its major drawback is the substitutability between indicators. One way to overcome this limitation is to introduce, for each unit, a penalization factor that accounts for the horizontal variability among indicators; this is the idea behind the Mazziotta–Pareto aggregation method. Building on this idea, in this paper we propose a theoretical method for penalizing the geometric mean by means of a penalization term that measures the horizontal variability among the normalized indicators, suitably transformed through the Box–Cox function of order zero. Introducing a penalization makes it possible to capture the unequal distribution of achievements within each country and, consequently, gives a more accurate picture of the differences among countries.
The empirical part illustrates our proposal by comparing the classical HDI with its penalized version, the pHDI, for 189 countries in 2019. The comparison between the two methods reveals that the new method does not upset the ranking provided by the geometric mean and has a greater impact on countries with poor performances. The method proposed here to penalize the geometric mean could be generalized to any member of the family of generalized means; this is left for further research.

Author Contributions

Conceptualization, F.M. and M.C.; Data curation, F.M. and M.C.; Methodology, F.M. and M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In this appendix, we prove Propositions 2 and 3 of Section 4.
Proof of Proposition 2.
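The first-order derivative of \( GM_i^{\pm} \) with respect to \( z_{ik} \) is:
\[ \frac{\partial GM_i^{\pm}}{\partial z_{ik}} = GM_i^{\pm} \left( \frac{\partial \ln \mu_{0,i}}{\partial z_{ik}} \pm \frac{\partial \tilde{S}^2_{0,i}}{\partial z_{ik}} \right). \tag{A1} \]
The derivative of \( \ln \mu_{0,i} \) with respect to \( z_{ik} \) is:
\[ \frac{\partial \ln \mu_{0,i}}{\partial z_{ik}} = \frac{1}{m z_{ik}}. \tag{A2} \]
The derivative of \( \tilde{S}^2_{0,i} \) with respect to \( z_{ik} \) is:
\[ \frac{\partial \tilde{S}^2_{0,i}}{\partial z_{ik}} = \frac{1}{m z_{ik}} \cdot \frac{2(m-1)}{m} \ln \frac{z_{ik}}{\mu_{0,i}}. \tag{A3} \]
Substituting (A2) and (A3) into (A1), we obtain:
\[ \frac{\partial GM_i^{\pm}}{\partial z_{ik}} = GM_i^{\pm} \, \frac{1}{m} \, \frac{1}{z_{ik}} \left[ 1 \pm \frac{2(m-1)}{m} \ln \frac{z_{ik}}{\mu_{0,i}} \right]. \tag{A4} \]
For \( m \geq 2 \), the first-order derivative of \( GM_i^{\pm} \) with respect to \( z_{ik} \) vanishes at the point:
\[ z_i^{\pm} = \mu_{0,i} \exp\left( \mp \frac{m}{2(m-1)} \right) \tag{A5} \]
and
\[ \frac{\partial GM_i^{-}}{\partial z_{ik}} \geq 0 \ \text{for } z_{ik} \leq z_i^{-}, \qquad \frac{\partial GM_i^{+}}{\partial z_{ik}} \geq 0 \ \text{for } z_{ik} \geq z_i^{+}. \tag{A6} \]
The second-order derivative of \( GM_i^{\pm} \) with respect to \( z_{ik} \) is:
\[ \frac{\partial^2 GM_i^{\pm}}{\partial z_{ik}^2} = \frac{GM_i^{\pm}}{(m z_{ik})^2} \left[ g(z_{ik})^2 - m\, g(z_{ik}) \pm \frac{2(m-1)^2}{m} \right], \tag{A7} \]
where:
\[ g(z_{ik}) = 1 \pm \frac{2(m-1)}{m} \ln \frac{z_{ik}}{\mu_{0,i}}. \tag{A8} \]
From (A7) and (A8), noting that \( g(z_i^{\pm}) = 0 \), we conclude that \( z_i^{-} \) and \( z_i^{+} \) are, respectively, a local maximum for \( GM_i^{-} \) and a local minimum for \( GM_i^{+} \).  □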
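As a worked instance of (A5) and (A8), the case m = 3 used for the HDI in Section 5.2 yields the scaling factor applied there. Since the normalized indicators never exceed 1, the assumption of Proposition 2 can only bind for a country whose geometric mean is below \( e^{-3/4} \).

```latex
% Worked case m = 3 (three HDI dimensions)
\frac{2(m-1)}{m}\bigg|_{m=3} = \frac{4}{3},
\qquad
g(z_i^{-}) = 1 - \frac{4}{3}\ln\frac{z_i^{-}}{\mu_{0,i}} = 0
\;\Longrightarrow\;
z_i^{-} = \mu_{0,i}\, e^{3/4} \approx 2.117\,\mu_{0,i},
\qquad
z_i^{+} = \mu_{0,i}\, e^{-3/4} \approx 0.472\,\mu_{0,i}.
% Hence z_i^{-} \ge 1 whenever \mu_{0,i} \ge e^{-3/4} \approx 0.472.
```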
Proof of Proposition 3.
By definition, the MRC of the penalized geometric mean (8) between the variables \( z_{ik} \) and \( z_{ih} \) is:
\[ MRC_{kh,i}^{\pm} = \frac{\partial GM_i^{\pm} / \partial z_{ik}}{\partial GM_i^{\pm} / \partial z_{ih}}. \tag{A9} \]
The result follows by substituting (A4) into (A9). □

References

1. OECD/European Union/EC-JRC. Handbook on Constructing Composite Indicators: Methodology and User Guide; OECD Publishing: Paris, France, 2008; Available online: https://www.oecd-ilibrary.org/economics/handbook-on-constructing-composite-indicators-methodology-and-user-guide_9789264043466-en (accessed on 28 February 2022).
2. Berger, R.L.; Casella, G. Deriving Generalized Means as Least Squares and Maximum Likelihood Estimates. Am. Stat. 1992, 46, 279–282.
3. Mazziotta, M.; Pareto, A. A non-compensatory approach for the measurement of the quality of life. In Quality of Life in Italy; Springer: Dordrecht, The Netherlands, 2012; pp. 27–40.
4. Ijiri, Y. Fundamental Queries in Aggregation Theory. J. Am. Stat. Assoc. 1971, 66, 766–782.
5. UNDP. Human Development Report 2010: The Real Wealth of Nations: Pathways to Human Development; UNDP: New York, NY, USA, 2010; Available online: http://hdr.undp.org/en/content/human-development-report-2010 (accessed on 28 February 2022).
6. UNDP. Human Development Report 2007/2008: Fighting Climate Change: Human Solidarity in a Divided World; UNDP: New York, NY, USA, 2007; Available online: https://hdr.undp.org/sites/default/files/reports/268/hdr_20072008_en_complete.pdf (accessed on 28 February 2022).
7. Box, G.E.P.; Cox, D.R. An analysis of transformations. J. R. Stat. Soc. Ser. B 1964, 26, 211–252.
8. Casadio Tarabusi, E.; Palazzi, P. An index for sustainable development. BNL Q. Rev. 2004, 229, 185–206.
9. Casadio Tarabusi, E.; Guarini, G. An unbalance adjustment method for development. Soc. Indic. Res. 2013, 112, 19–45.
10. El Gibari, S.; Gomez, T.; Ruiz, F. Building composite indicators using multicriteria methods: A review. J. Bus. Econ. 2019, 89, 1–24.
11. Lai, E.; Lundie, S.; Ashbolt, N.J. Review of multi-criteria decision aid for integrated sustainability assessment of urban water systems. Urban Water J. 2008, 5, 315–327.
12. Azapagic, A.; Perdan, S. An integrated sustainability decision-support framework Part II: Problem analysis. Int. J. Sustain. Dev. World Ecol. 2005, 12, 112–131.
13. Charnes, A.; Cooper, W.W.; Rhodes, E.L. Measuring the efficiency of decision making units. Eur. J. Oper. Res. 1978, 2, 429–444.
14. Diaz-Balteiro, L.; Gonzalez-Pachon, J.; Romero, C. Measuring systems sustainability with multicriteria methods: A critical review. Eur. J. Oper. Res. 2017, 258, 607–616.
15. Roy, B. The outranking approach and the foundations of ELECTRE methods. Theory Decis. 1991, 31, 49–73.
16. Brans, J.P.; Vincke, P.; Mareschal, B. How to select and how to rank projects: The PROMETHEE methods. Eur. J. Oper. Res. 1986, 24, 228–238.
17. Greco, S.; Ishizaka, A.; Tasiou, M.; Torrisi, G. On the methodological framework of composite indices: A review of the issues of weighting, aggregation, and robustness. Soc. Indic. Res. 2019, 141, 61–94.
18. Mazziotta, M.; Pareto, A. Un indicatore sintetico di dotazione infrastrutturale: Il metodo delle penalità per coefficiente di variazione. In Proceedings of the Lo Sviluppo Regionale nell’Unione Europea-Obiettivi, Strategie, Politiche, Atti della XXVIII Conferenza Italiana di Scienze Regionali, Bolzano, Italy, 28–28 September 2007; Available online: https://aisre.it/images/old_papers/Mazziotta-Pareto.pdf (accessed on 28 February 2022).
19. Mazziotta, M.; Pareto, A. On a generalized non-compensatory composite index for measuring socioeconomic phenomena. Soc. Indic. Res. 2016, 127, 983–1003.
20. Noorbakhsh, F. A modified human development index. World Dev. 1998, 26, 517–528.
21. Paul, S. A modified human development index and international comparison. Appl. Econ. Lett. 1996, 3, 677–682.
22. Jha, R.P.; Bhattacharyya, K.; Mishra, D.; Pedgaonkar, S.P. Health Adjusted Human Development Index: A Modified Measure of Human Development. Int. J. Health Sci. Res. 2017, 7, 207–220.
23. Prados de la Escosura, L. Improving human development: A long-run view. J. Econ. Surv. 2010, 24, 841–894.
24. Chakravarty, S.R. A generalized human development index. Rev. Dev. Econ. 2003, 7, 99–114.
25. Alkire, S.; Foster, J.E. Designing the Inequality-Adjusted Human Development Index; OPHI Working Paper Series; 2010; Volume WP37, Available online: https://www.ophi.org.uk/wp-content/uploads/ophi-wp37.pdf (accessed on 28 February 2022).
26. Mazziotta, C.; Mazziotta, M.; Pareto, A.; Vidoli, F. La sintesi di indicatori territoriali di dotazione infrastrutturale: Metodi di costruzione e procedure di ponderazione a confronto. Riv. Econ. Stat. Territ. 2010, 1, 7–33.
Figure 1. (a) Distribution of the Health indicator (LEI). (b) Distribution of the Education indicator (EI). (c) Distribution of the Economic indicator (II).
Figure 2. Distribution of the penalization factor.
Figure 3. Frequency histogram of HDI and pHDI.
Figure 4. Density distribution of HDI and pHDI.
Figure 5. Penalization factor vs. HDI ranking.
Figure 6. Scatter plot of the harmonic mean version (in green), the arithmetic mean version (in black) of the HDI and the pHDI values (in red) vs. the HDI values.
Figure 7. Scatter plot of the rankings obtained with the harmonic mean version of the HDI (in green), the arithmetic mean version of the HDI (in black), and the pHDI (in red) vs. the HDI ranking.
Table 1. Variables used to compute the HDI.
| Variable | Definition | Unit | Range |
|---|---|---|---|
| LE | Life expectancy at birth | years | 53.3–84.9 |
| EYS | Expected years of schooling | years | 5.0–22.0 |
| MYS | Mean years of schooling | years | 1.6–14.2 |
| GNIpc | GNI per capita (PPP international dollars) | dollars | 754.0–131,032.0 |
Table 2. Descriptive statistics for the variables used to compute the HDI.
| Variable | Mean | St. Dev. | Median | CV | Skewness | Kurtosis |
|---|---|---|---|---|---|---|
| LE | 72.71 | 7.39 | 74.0 | 10.16% | −0.55 | 0.41 |
| EYS | 13.33 | 2.94 | 13.2 | 22.06% | −0.11 | 0.08 |
| MYS | 8.73 | 3.08 | 9.0 | 35.28% | −0.31 | 0.99 |
| GNIpc | 20,219.76 | 21,229.08 | 12,707.0 | 104.99% | 1.76 | 3.90 |
Table 3. Descriptive statistics for the normalized indicators used to compute the HDI.
| Indicator | Mean | St. Dev. | Median | Min | Max | Range | Skewness | Kurtosis | CV |
|---|---|---|---|---|---|---|---|---|---|
| Health (LEI) | 0.81 | 0.11 | 0.83 | 0.51 | 1.00 | 0.49 | −0.55 | 0.41 | 13.58% |
| Education (EI) | 0.66 | 0.17 | 0.68 | 0.25 | 0.95 | 0.69 | 0.35 | 0.77 | 25.76% |
| Economic (II) | 0.71 | 0.17 | 0.73 | 0.31 | 1.00 | 0.69 | 0.24 | 0.89 | 23.94% |
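Table 4. Descriptive statistics for the HDI (geometric mean) and the pHDI (penalized geometric mean).
| Composite indicator | Mean | St. Dev. | Median | Min | Max | Range | Skewness | Kurtosis | CV |
|---|---|---|---|---|---|---|---|---|---|
| HDI | 0.72 | 0.15 | 0.74 | 0.39 | 0.96 | 0.56 | 0.32 | 0.92 | 20.83% |
| pHDI | 0.71 | 0.16 | 0.74 | 0.34 | 0.96 | 0.62 | 0.37 | 0.85 | 22.54% |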
Table 5. Countries with the largest differences in ranking position.
| Country | HDI rank | pHDI rank | Difference (HDI rank − pHDI rank) |
|---|---|---|---|
| Maldives | 95 | 110 | −15 |
| Syrian Arab Republic | 151 | 161 | −10 |
| Qatar | 45 | 53 | −8 |
| Lebanon | 92 | 100 | −8 |
| Cuba | 70 | 77 | −7 |
| Armenia | 81 | 75 | 6 |
| Mongolia | 99 | 93 | 6 |
| Lesotho | 165 | 159 | 6 |
| Guinea-Bissau | 177 | 170 | 7 |
| Nigeria | 161 | 153 | 8 |

Share and Cite

MDPI and ACS Style

Mariani, F.; Ciommi, M. Aggregating Composite Indicators through the Geometric Mean: A Penalization Approach. Computation 2022, 10, 64. https://doi.org/10.3390/computation10040064

AMA Style

Mariani F, Ciommi M. Aggregating Composite Indicators through the Geometric Mean: A Penalization Approach. Computation. 2022; 10(4):64. https://doi.org/10.3390/computation10040064

Chicago/Turabian Style

Mariani, Francesca, and Mariateresa Ciommi. 2022. "Aggregating Composite Indicators through the Geometric Mean: A Penalization Approach" Computation 10, no. 4: 64. https://doi.org/10.3390/computation10040064

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
