• Open Access

Statistics of Shared Components in Complex Component Systems

Andrea Mazzolini, Marco Gherardi, Michele Caselle, Marco Cosentino Lagomarsino, and Matteo Osella
Phys. Rev. X 8, 021023 – Published 20 April 2018

Abstract

Many complex systems are modular. Such systems can be represented as “component systems,” i.e., sets of elementary components, such as LEGO bricks in LEGO sets. The bricks found in a LEGO set reflect a target architecture, which can be built following a set-specific list of instructions. In other component systems, instead, the underlying functional design and constraints are not obvious a priori, and their detection is often a challenge of both scientific and practical importance, requiring a clear understanding of component statistics. Importantly, some quantitative invariants appear to be common to many component systems, most notably a common broad distribution of component abundances, which often resembles the well-known Zipf’s law. Such “laws” affect in a general and nontrivial way the component statistics, potentially hindering the identification of system-specific functional constraints or generative processes. Here, we specifically focus on the statistics of shared components, i.e., the distribution of the number of components shared by different system realizations, such as the common bricks found in different LEGO sets. To account for the effects of component heterogeneity, we consider a simple null model, which builds system realizations by random draws from a universe of possible components. Under general assumptions on abundance heterogeneity, we provide analytical estimates of component occurrence, which quantify exhaustively the statistics of shared components. Surprisingly, this simple null model can positively explain important features of empirical component-occurrence distributions obtained from large-scale data on bacterial genomes, LEGO sets, and book chapters. Specific architectural features and functional constraints can be detected from occurrence patterns as deviations from these null predictions, as we show for the illustrative case of the “core” genome in bacteria.
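The null model described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes a pure power-law (Zipf-like) abundance distribution, draws with replacement, and equal-sized realizations. All function names and parameter values (`zipf_weights`, `sample_realizations`, `exponent`, and so on) are hypothetical choices made for the example.

```python
import random
from collections import Counter


def zipf_weights(n_components, exponent=1.0):
    """Unnormalized Zipf-like abundances: weight of rank r is r**(-exponent)."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_components + 1)]


def sample_realizations(n_realizations, size, weights, rng):
    """Build each realization by `size` random draws (with replacement, an
    assumption of this sketch) and keep only the set of distinct components."""
    population = list(range(len(weights)))
    return [
        set(rng.choices(population, weights=weights, k=size))
        for _ in range(n_realizations)
    ]


def occurrences(realizations):
    """Fraction of realizations in which each observed component appears."""
    counts = Counter()
    for realization in realizations:
        counts.update(realization)
    n = len(realizations)
    return {component: k / n for component, k in counts.items()}


def expected_occurrence(weights, size):
    """Probability that a component appears at least once in `size`
    independent draws: 1 - (1 - f_i)**size, with f_i its normalized abundance."""
    total = sum(weights)
    return [1.0 - (1.0 - w / total) ** size for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = zipf_weights(1000)
    realizations = sample_realizations(200, 200, weights, rng)
    occ = occurrences(realizations)
    core = [c for c, o in occ.items() if o == 1.0]
    print(f"components present in every realization: {len(core)}")
```

Comparing the empirical `occurrences` with `expected_occurrence` gives a baseline; in the spirit of the abstract, components whose occurrence departs strongly from that baseline would be candidates for system-specific constraints.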

  • Received 27 July 2017
  • Revised 29 January 2018

DOI:https://doi.org/10.1103/PhysRevX.8.021023

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.

Physics Subject Headings (PhySH)

Interdisciplinary Physics; Physics of Living Systems

Authors & Affiliations

Andrea Mazzolini (1), Marco Gherardi (2,3), Michele Caselle (1), Marco Cosentino Lagomarsino (2,3,4), and Matteo Osella (1,*)

  • (1) Physics Department and INFN, University of Turin, via P. Giuria 1, 10125 Turin, Italy
  • (2) Sorbonne Universités, UPMC Univ Paris 06, UMR 7238, Computational and Quantitative Biology, 4 Place Jussieu, Paris, France
  • (3) CNRS, UMR 7238, Paris, France
  • (4) FIRC Institute of Molecular Oncology (IFOM), 20139 Milan, Italy

  • (*) To whom correspondence should be addressed: mosella@to.infn.it

Popular Summary

Many complex systems in very different contexts—from biology to linguistics—can be broken down to clearly defined basic building blocks or components. Books, for example, can be seen as sets of words, and genomes can be seen as sets of genes. Analysis of the component usage in these systems can reveal interesting quantitative laws, some of which are specific to that system, while others are shared across diverse systems. A common theoretical framework for component systems is needed to understand such similarities and differences between systems.

This work focuses on the statistics of shared components and asks the following basic questions: How many components (e.g., words or genes) are common to all realizations (e.g., books or genomes)? How many are, instead, very specific? What are the system-level features that set the probability of sharing components? Such questions are central in evolutionary genomics, and we extend them here to general component systems, using the examples of texts and LEGO toys.

Solving a model based on random sampling of components, we show that several universal aspects of the statistics of shared components are a direct consequence of the heterogeneity of component usage and of system parameters such as the size of realizations and the size of the component vocabulary. While this simple model can capture general properties of empirical systems, deviations from its predictions can be used to highlight system-specific architectural or functional constraints. We show the validity of this approach for detecting the core genome in prokaryotic genomes.
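As a rough worked version of the random-sampling argument, assume (for illustration only; these symbols do not appear in the text above) that each of R realizations is built from M independent draws with replacement, with component i drawn with probability f_i. The expected number of components found in exactly k realizations then follows from a binomial argument:

```latex
% o_i: probability that component i appears in a single realization
o_i = 1 - (1 - f_i)^{M} \approx 1 - e^{-M f_i}

% Expected number of components present in exactly k of the R realizations
\langle N(k) \rangle = \sum_{i} \binom{R}{k}\, o_i^{\,k}\, (1 - o_i)^{R-k}

% Special case k = R: components common to all realizations
\langle N(R) \rangle = \sum_{i} o_i^{\,R}
```

Under these assumptions, the components shared by all realizations are dominated by those with M f_i much larger than 1, while rare components (M f_i much smaller than 1) mostly populate the low-k end of the distribution.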

Bridging different areas of research in complex systems can open the way for developing and applying statistical null models in different contexts. We show, for example, how a modeling approach rooted in quantitative linguistics can shed light on the dynamics of genome evolution.

Issue

Vol. 8, Iss. 2 — April - June 2018

Reuse & Permissions

It is not necessary to obtain permission to reuse this article or its components as it is available under the terms of the Creative Commons Attribution 4.0 International license. This license permits unrestricted use, distribution, and reproduction in any medium, provided attribution to the author(s) and the published article's title, journal citation, and DOI are maintained. Please note that some figures may have been included with permission from other third parties. It is your responsibility to obtain the proper permission from the rights holder directly for these figures.
