Abstract

The purpose of this paper is to assess the statistical characterization of weighted networks in terms of the generalization of the relevant parameters, namely, average path length, degree distribution, and clustering coefficient. Although the degree distribution and the average path length admit straightforward generalizations, for the clustering coefficient several different definitions have been proposed in the literature. We examined the different definitions and identified the similarities and differences between them. In order to elucidate the significance of different definitions of the weighted clustering coefficient, we studied their dependence on the weights of the connections. For this purpose, we introduce the relative perturbation norm of the weights as an index to assess the weight distribution. This study revealed new interesting statistical regularities in terms of the relative perturbation norm useful for the statistical characterization of weighted graphs.

1. Introduction

Complex systems may also [1] emerge from a large number of interdependent and interacting elements. Networks have proven to be effective models of natural or man-made complex systems, where the elements are represented by the nodes and their interactions by the links. Typical well-known examples include communication and transportation networks, social networks, and biological networks [25].

Although the statistical analysis of the underlying topological structure has been very fruitful [25], it was limited due to the fact that in real networks the links may have different capacities or intensities or flows of information or strengths. For example, weighted links can be used for the Internet to represent the amount of data exchanged between two hosts in the network. For scientific collaboration networks, the weight depends on the number of coauthored papers between two authors. For airport networks, it is either the number of available seats on direct flight connections between airports and or the actual number of passengers that travel from airport to . For neural networks, the weight is the number of junctions between neurons and for transportation networks it is the Euclidean distance between two destinations.

The diversification of the links is described in terms of weights on the links. Therefore, the statistical analysis has to be extended from graphs to weighted complex networks. If all links are of equal weight, the statistical parameters used for unweighted graphs are sufficient for the statistical characterization of the network. Therefore, the statistical parameters of the weighted graphs should reduce to the corresponding parameters of the conventional graphs if all weights are put equal to unity.

Complex graphs are characterized by three main statistical parameters, namely, the degree distribution, the average path length, and the clustering coefficient. We will briefly mention the definitions for clarity and for a better understanding of the proposed extensions of these parameters for weighted graphs.

The structure of a network with nodes is represented by an binary matrix , known as adjacency matrix, whose element equals 1, when there is a link joining node to node and 0 otherwise ().

In the case of undirected networks with no self-links (links connecting a node to itself), the adjacency matrix is symmetric () and all elements of the main diagonal equal 0 ().

The degree of a node is defined as the number of its neighbors, that is, the number of links incident to node :

where the elements of the adjacency matrix and the neighborhood of node .

The degree distribution is the probability that some node has connections to other nodes. Many real complex networks have been found to be described by a power law degree, , with .

The characteristic path length of a network is defined as the average of the shortest path lengths between any two nodes:

where is the shortest path length between and , defined as the minimum number of links traversed to get from node to node .

In many real networks, it is found that the existence of a link between nodes and and between nodes and enhances the probability that node will also be connected to node . This tendency of the neighbors of any node to connect to each other is called clustering and is quantified by the clustering coefficient , which is the fraction of triangles in which node participates, to the maximum possible number of such triangles:

where is the actual number of triangles in which node participates, that is, the actual number of links between the neighbors of node , and is the maximum possible number of links, when the subgraph of neighbors of node is completely connected.

The clustering coefficient equals 1 if node is the center of a fully interconnected cluster, and equals 0 if the neighbors of node are not connected to each other.

In order to characterize the network as a whole, we usually consider the average clustering coefficient over all the nodes. We may also consider the average clustering coefficient over the node degree .

Studies of real complex networks have shown that their connection topology is neither completely random nor completely regular, but lies between these extreme cases. Many real networks share features of both extreme cases. For example, the short average path length, typical of random networks, comes along with large clustering coefficient, typical of regular lattices. The coexistence of these attributes defines a distinct class of networks, interpolating between regular lattices and random networks, known today as small world networks [36]. Another class of networks emerges when the degree distribution is a power law (scale free) distribution, which signifies the presence of a nonnegligible number of highly connected nodes, known as hubs. These nodes, with very large degree compared to the average degree , are critical for the network’s robustness and vulnerability. These networks are known today as scale-free networks [24, 7].

The purpose of this paper is to assess the statistical characterization of weighted networks in terms of proper generalizations of the relevant parameters, namely, average path length, degree distribution, and clustering coefficient. After reviewing the definitions of the weighted average path length, weighted degree distribution and weighted clustering coefficient in Section 2, we compare them in Section 3. Although the degree distribution and the average path length admit straightforward generalizations, for the clustering coefficient several different definitions have been proposed. In order to elucidate the significance of different definitions of the weighted clustering coefficient, we studied their dependence on the weights of the connections in Section 4, where we introduce the relative perturbation norm as an index to assess the weight distribution. This study revealed new interesting statistical regularities in terms of the relative perturbation norm useful for the statistical characterization of weighted graphs.

2. Statistical Parameters of Weighted Networks

The weights of the links between nodes are described by an matrix . The weight is 0 if the nodes and are not linked. We will consider the case of symmetric positive weights (), with no self-links ().

In order to compare different networks or different kinds of weights, we usually normalize the weights in the interval [], by dividing all weights by the maximum weight. The normalized weights are .

The statistical parameters for weighted networks are defined as follows.

The node degree , which is the number of links attached to node , is extended directly to the strength or weighted degree, which is the sum of the weights of all links attached to node :

The strength of a node takes into account both the connectivity as well as the weights of the links.

The degree distribution is also extended for the weighted networks to the strength distribution , which is the probability that some node’s strength equals s. Recent studies indicate power law [810].

There are two different generalizations of the characteristic path length in the literature, applicable to transportation and communication networks. In the case of transportation networks, the weighted shortest path length between the nodes and is defined as the smallest sum of the weights of the links throughout all possible paths from node to node [11, 12]:

where is a path from node to node and is the class of paths from to .

The weight describes physical distances and/or cost usually involved in transportation networks. The capacity, intensity, strength, or efficiency of the connection is inversely proportional to the weight.

However, this definition is not suitable for communication networks, where the efficiency of the communication channel between two nodes is proportional to the weight. The shortest path length in case of communication networks is defined as the smallest sum of the inverse weights of the links throughout all possible paths from node to node :

The weighted characteristic path length for both cases is the average of all shortest path lengths and it is calculated by formula (1.2).

We found in the literature seven proposals for the definitions of the weighted clustering coefficient, which we will review.

(i) Zhang and Horvath’s [13] definition is as follows: The weights in this definition are normalized. The idea of the generalization is the substitution of the elements of the adjacency matrix by the weights in the nominator of formula (1.3); as for the denominator, the upper limit of the nominator is obtained in order to normalize the coefficient between 0 and 1. The definition originated from gene coexpression networks.

As shown by Kalna and Higham [14], an alternative formula that may apply for this definition is

(ii) Lopez-Fernandez et al.’s [15] definition is The weights in this definition are not normalized. The idea of the generalization is the substitution of the number of links that exist between the neighbors of node in formula (1.3) by the weight of the link between the neighbors and . The definition originated from an affiliation network for committers (or modules) of free, open source software projects.

(iii) Onnela et al.’s [16] definition is The weights in this definition are normalized. The quantity is called “intensity” of the triangle . The concept for this generalization is to substitute the total number of the triangles in which node participates by the intensity of the triangle, which is geometric mean of the links’ weights.

(iv) Barrat et al.’s [8] definition is The weights in this definition are not normalized. The idea of the generalization is the substitution of the elements of the adjacency matrix in formula (1.3), by the average of the weights of the links between node and its neighbors and with respect to normalization factor which ensures that . This definition was used for airport and scientific collaboration networks.

(v) Serrano et al.’s [17] definition is where has been named “disparity.”

The weights in this definition are not normalized. This formula is used for the generalization of the average clustering coefficient with degree , which has a probabilistic interpretation just as the unweighted clustering coefficient.

(vi) Holme et al.’s [18] definition is The only difference between formulas (2.4) and (2.10) is that (2.10) is divided by .

We will not discuss this definition in the comparison because the essence of the comparison is already addressed by definition (2.4).

(vii) Li et al.’s [19] definition of the weighted clustering coefficient is another version of the Lopez-Fernandez proposal (2.6).

3. The Relation between the Different Weighted Clustering Coefficients

(1) All definitions reduce to the clustering coefficient (1.3), when the weights are replaced by the adjacency matrix elements.

(2) All weighted clustering coefficients reduce to 0 when there are no links between the neighbors of node , that is, when .

(3) In the other extreme, all weighted clustering coefficients take the value 1 when all neighbors of node are connected to each other. Formulas (2.4) and (2.6) take the value 1 if the weights between the neighbors of the node are 1, independently of the weights of the other links. Formula (2.7) takes the value 1, if and only if all the weights are equal to 1. Formulas (2.8) and (2.9) take the value 1 for all fully connected graphs, independently of all the weights.

These calculations are presented in Appendix 1.

(4) We calculated the values of the weighted clustering coefficients of node participating in a fully connected triangle. Formulas (2.4) and (2.6) take the value of the weight of the link between neighbors and of node . Formula (2.7) becomes equal to the intensity of the triangle for all nodes of the triangle. Formulas (2.8) and (2.9) take the value 1 for all fully connected graphs, independently of all weights.

These calculations are presented in Appendix 2.

4. The Dependence of the Weighted Clustering Coefficients on the Weights

In order to understand the meaning of the different proposals or definitions (2.4), (2.6), (2.7), (2.8), and (2.9) of the weighted clustering coefficient, we will examine their dependence on the weights, without alteration of the topology of the graph. We simply examine the values of these definitions for different distributions of weights, substituting the nonzero elements of the adjacency matrix by weights normalized between 0 and 1.

A way to distinguish and compare different weight distributions over the same graph is in terms of the relative perturbation norm , which gives the percentage of the perturbation of the adjacency matrix introduced by the weights. For simplicity, we considered the norm.

Although the weight distributions of certain real weighted networks have been found to be inhomogeneous like the exponential [810, 12, 19], we will examine the dependence of the weighted clustering coefficient with respect to the relative perturbation norms for the exponential as well as for the uniform and normal distributions, for different graphs.

We have examined many networks from 20 up to 300 nodes with different topologies that were generated by the networks software PAJEK [20]. The weights examined are randomly generated numbers following uniform, normal, or exponential distributions with several parameter values, so that the percentages of the perturbations scale from 0–90% increasing by 10% at each perturbation. All simulations gave rise to the same results, Figures 3 and 4, representing the typical trends of random and scale free networks, Figures 1 and 2. The related statistical trend analysis is reported in Appendix 3.

Observations on the dependence of the weighted clustering coefficients on the relative perturbation norm are defined as follows.

(1) For the weighted clustering coefficients (2.4), (2.6), and (2.7), there is a clear linear dependence of their values, in terms of the relative perturbation norm of the weighted network. It is remarkable that no dependence is observed on the values of weights on specific links. The Zhang and Horvath (2.4), Lopez-Fernandez et al. (2.6), and Onnela et al. (2.7) weighted clustering coefficients follow the same trend, decaying smoothly as the relative perturbation norm increases. More specifically the trends of Zhang and Horvath (2.4) and Lopez-Fernandez et al. (2.6) almost coincide, as expected (Section 3), while the trend of Onnela et al. (2.8) varies slightly from the other two.

(2) The resulting linear models indicate that the values of the weighted clustering coefficients decrease by 10% of the value of the unweighted clustering coefficient, as the relative perturbation norm increases by 10%.

(3) The dependence of the weighted clustering coefficients (2.8) and (2.9) on the weights is not significant. The weighted clustering coefficients of Barrat et al. (2.8) and Serrano et al. (2.9) do not change (variations appear after the first two decimal digits), regardless of the size of the network or the distribution of the weights. As mentioned in Section 3, these coefficients are independent of the weights when the graph is completely connected. We observe here that the weighted clustering coefficients (2.8) and (2.9) are independent of the weights for graphs that are not completely connected. We conjecture therefore that this is a general fact.

5. Concluding Remarks

(1) From the observed dependence of the values of all five weighted clustering coefficients on the relative perturbation norm, we conjecture that the proposed relative perturbation norm is a reliable index of the weight distribution. The meaning of the decaying trend of weighted clustering coefficients defined by Zhang and Horvath (2.4), Lopez-Fernandez et al. (2.6), and Onnela et al. (2.7), with respect to the increase of the relative perturbation norm, is quite natural. The clustering decreases almost linearly as the weights “decrease.”

(2) We presented in Appendices 1 and 2 the calculations demonstrating that all definitions reduce to the clustering coefficient (1.3), when the weights are replaced by the adjacency matrix elements. The values of the weighted clustering coefficients of node participating in a fully connected triangle are presented for completeness because we did not found them in the literature.

Appendices

A. Calculations on the Weighted Clustering Coefficient

The definitions (2.4)–(2.9) reduce to the clustering coefficient (1.3), when the weights are replaced by the adjacency matrix elements.

(1) Zhang and Horvath’s is [13]

The proof is presented by the authors.

For example, for a fully connected network with four nodes,

(2) Lopez-Fernandez et al.’s is [15]

this formula can be expressed as

It is obvious that the formula reduces to the unweighted (1.3), when are substituted by .

(3) Onnela et al.’s [16]

reduces to the unweighted definition (1.3), when are substituted by :

(4) Barrat et al.’s [8]

reduces to the unweighted definition (1.3), when and are substituted by the adjacency matrix elements:

(5) Serrano et al.’s [17] formula can be expressed as

It is obvious that the formula reduces to the unweighted (1.3), when are substituted by .

B. The Values of the Weighted Clustering Coefficients of Some Node Participating in A Fully Connected Triangle

We calculate the weighted clustering coefficient of node 1.

(1) Zhang and Horvath’s is [13]

(2) Lopez-Fernandez et al.’s is [15]

(3) Onnela et al.’s is [16]

(4) Barrat et al.’s is [8]

Degree of node 1:

Strength of node 1:

since

We also prove that Barrat et al.’s definition for the weighted clustering coefficient is independent of all weights for all fully connected networks;

For a fully connected network: , and , so

since

(5) Serrano et al.’s is [17]

C. Trend Analysis. Linear Regression

The trend analysis was performed with SPSS [21], using the least squares method. We also tried quadratic and cubic fitting but the nonlinear regression gave zero for the coefficients corresponding to the quadratic and cubic terms for all cases of networks and weighted clustering coefficients.

We observe the presence of decreasing linear correlation between the weighted clustering coefficients and the relative perturbation norm for all distributions (homogeneous and symmetric (uniform, normal) and inhomogeneous and nonsymmetric (exponential)).

Acknowledgments

We would like to thank Professor D. Kandylis from the Medical School of Aristotle University of Thessaloniki who showed to us the significance of weighted networks in cognitive processes. We also thank Drs. M. A. Serrano, M. Boguñá, and R. Pastor-Satorras who pointed out their work to us. Finally, we would like to thank the referees for their constructive critisism which improved both the content and the presentation of the paper.