Gene family innovation, conservation and loss on the animal stem lineage

Version of Record

Accepted for publication after peer review and revision.

Download
Cite
Share
CommentOpen annotations (there are currently 0 annotations on this page).

Version of Record published: July 3, 2018 (This version)
Accepted Manuscript updated: June 15, 2018 (Go to version)
Accepted Manuscript published: May 31, 2018 (Go to version)
Accepted: May 26, 2018
Received: December 10, 2017

1. Related to
Genome Evolution: We are not so special

Zachary R Lewis, Casey W Dunn

Insight Jul 3, 2018
Further reading

Abstract
eLife digest
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

Choanoflagellates, the closest living relatives of animals, can provide unique insights into the changes in gene content that preceded the origin of animals. However, only two choanoflagellate genomes are currently available, providing poor coverage of their diversity. We sequenced transcriptomes of 19 additional choanoflagellate species to produce a comprehensive reconstruction of the gains and losses that shaped the ancestral animal gene repertoire. We identified ~1944 gene families that originated on the animal stem lineage, of which only 39 are conserved across all animals in our study. In addition, ~372 gene families previously thought to be animal-specific, including Notch, Delta, and homologs of the animal Toll-like receptor genes, instead evolved prior to the animal-choanoflagellate divergence. Our findings contribute to an increasingly detailed portrait of the gene families that defined the biology of the Urmetazoan and that may underpin core features of extant animals.

https://doi.org/10.7554/eLife.34226.001

eLife digest

All animals, from sea sponges and reef-building corals to elephants and humans, share a single common ancestor that lived over half a billion years ago. This single-celled predecessor evolved the ability to develop into a creature made up of many cells with specialized jobs. Reconstructing the steps in this evolutionary process has been difficult because the earliest animals were soft-bodied and microscopic and did not leave behind fossils that scientists can study.

Though their bodies have since disintegrated, many of the instructions for building the first animals live on in genes that were passed on to life forms that still exist. Scientists are trying to retrace those genes back to the first animal by comparing the genomes of living animals with their closest relatives, the choanoflagellates. Choanoflagellates are single-celled, colony-forming organisms that live in waters around the world. Comparisons with choanoflagellates may help scientists identify which genes were necessary to help animals evolve and diversify into so many different species. So far, 1,000 animal and two choanoflagellate genomes have been sequenced. But the gene repertoires of most species of choanoflagellates have yet to be analyzed.

Now, Richter et al. have cataloged the genes of 19 more species of choanoflagellates. This added information allowed them to recreate the likely gene set of the first animal and to identify genetic changes that occurred during animal evolution. The analyses showed that modern animals lost about a quarter of the genes present in their last common ancestor with choanoflagellates and gained an equal number of new genes. Richter et al. identified several dozen core animal genes that were gained and subsequently preserved throughout animal evolution. Many of these are necessary so that an embryo can develop properly, but the precise roles of some core genes remain a mystery. Most other genes that emerged in the first animals have been lost in at least one living animal.

The study of Richter et al. also showed that some very important genes in animals, including genes essential for early development and genes that help the immune system detect pathogens, predate animals. These key genes trace back to animals’ last common ancestor with choanoflagellates and may have evolved new roles in animals.

https://doi.org/10.7554/eLife.34226.002

Introduction

The biology of the first animal, the ‘Urmetazoan,’ has fascinated and confounded biologists for more than a century (Dujardin, 1841; James-Clark, 1867; Haeckel, 1869; Haeckel, 1873; Haeckel, 1874; Kent, 1880; Leadbeater and McCready, 2000). What features defined the biology of the Urmetazoan, and which of those features represent animal innovations? Despite the fact that the first animals originated over 600 million years ago (Douzery et al., 2004; Hedges et al., 2004; Peterson et al., 2004; Narbonne, 2005; Knoll, 2011), features of their genomes can be reconstructed through phylogenetically-informed comparisons among extant animals, their closest living relatives, the choanoflagellates, and other closely related lineages (King, 2004; Rokas, 2008; Richter and King, 2013; Grau-Bové et al., 2017; Sebé-Pedrós et al., 2017).

Although close to 1000 genomes of animals have been sequenced (NCBI Resource Coordinators, 2017), only two choanoflagellate genomes have been previously published (King et al., 2008; Fairclough et al., 2013). These two choanoflagellates are the strictly unicellular Monosiga brevicollis and the emerging model choanoflagellate Salpingoeca rosetta, which differentiates into a number of sexual and asexual cell types, ranging from single cells to multicellular rosette colonies (Fairclough et al., 2010; Dayel et al., 2011; Levin and King, 2013). The M. brevicollis and S. rosetta genomes revealed that many genes critical for animal biology, including p53, Myc, cadherins, C-type lectins, and diverse tyrosine kinases, evolved before the divergence of animals and choanoflagellates (King et al., 2008; Fairclough et al., 2013), whereas many other genes essential for animal biology, including components of the Wnt, Notch/Delta, Hedgehog, TGF-β, and innate immune pathways (e.g., Toll-like receptors) have not been detected in choanoflagellates, and therefore have been considered textbook examples of animal innovations.

Nonetheless, M. brevicollis and S. rosetta are relatively closely related (Carr et al., 2017), leaving the bulk of choanoflagellate diversity unexplored. Moreover, both species have demonstrably experienced gene loss, as some genes conserved among animals and non-choanoflagellates are apparently missing from M. brevicollis and S. rosetta. Examples include RNAi pathway components, which are present across eukaryotes (Shabalina and Koonin, 2008), the cell adhesion protein β-integrin, and T-box and Runx transcription factor families, which have been detected in the filasterean Capsaspora owczarzaki (Sebé-Pedrós and Ruiz-Trillo, 2010; Sebé-Pedrós et al., 2010; Sebé-Pedrós et al., 2011; Sebé-Pedrós et al., 2013a; Ferrer-Bonet and Ruiz-Trillo, 2017). Gene loss can lead to false negatives during ancestral genome reconstruction, and the phenomenon in choanoflagellates parallels that of animals, where two species selected for early genome projects, Drosophila melanogaster and Caenorhabditis elegans, were later found to have lost numerous genes (e.g., Hedgehog and NF-κΒ in C. elegans and fibrillar collagens in both C. elegans and D. melanogaster) that are critical for animal development and otherwise conserved across animal diversity (C. elegans Sequencing Consortium, 1998; Aspöck et al., 1999; Gilmore, 1999; Rubin et al., 2000).

To counteract the impact of gene loss in M. brevicollis and S. rosetta, and gain a more complete picture of the Urmetazoan gene catalog, we analyzed the protein coding genes of 19 previously unsequenced species of choanoflagellates representing each major known lineage (Carr et al., 2017). By comparing their gene catalogs with those of diverse animals and other phylogenetically relevant lineages, we have greatly expanded and refined our understanding of the genomic heritage of animals. This more comprehensive data set revealed that ~372 gene families that were previously thought to be animal-specific actually evolved prior to the divergence of choanoflagellates and animals, including gene families required for animal development (e.g., Notch/Delta) and immunity (e.g., Toll-like receptors). We find that an additional ~1944 gene families evolved along the animal stem lineage, many of which likely underpin unique aspects of animal biology. Although most of these animal-specific genes were subsequently lost from one or more species, 39 core animal-specific genes are conserved in all animals within our data set, likely because of their importance to core features of animal biology.

Results

The phylogenetic distribution of animal and choanoflagellate gene families

To reconstruct the genomic landscape of animal evolution, we first cataloged the protein coding potential of nineteen diverse choanoflagellate species by sequencing and assembling their transcriptomes (Figure 1, Figure 1—figure supplement 1, Supplementary file 1). Because most of these species were previously little-studied in the laboratory, two important stages of this project were the establishment of growth conditions optimized for each species and the development of improved protocols for isolating choanoflagellate mRNA for cDNA library construction and sequencing (Supplementary file 1, Materials and methods). After performing de novo transcriptome assembly and filtering for cross-contamination, we predicted a catalog of between 18,816–61,053 unique protein-coding sequences per species. [These counts likely overestimate the true numbers of underlying protein-coding genes, as they may include multiple alternative splice variants for any given gene and redundant contigs resulting from intra-species polymorphisms or sequencing artifacts (Grabherr et al., 2011; Haas et al., 2013)].

Figure 1 with 3 supplements see all

Download asset Open asset

Representative choanoflagellates analyzed in this study.

Choanoflagellates have diverse morphologies, including single cells, multicellular colonies, and the production in some lineages of extracellular structures. (a) *Diaphanoeca grandis*, within a silica-based extracellular structure called a ‘lorica’. (b) *Acanthoeca spectabilis*, within lorica. (c) *Codosiga hollandica*, with a basal organic stalk. (d) A rosette colony of *Salpingoeca rosetta*; image courtesy of Mark Dayel. (e) *Salpingoeca dolichothecata*, within an organic extracellular structure called a ‘theca’. (f) *Mylnosiga fluctuans*, with no extracellular structure. (g) *Didymoeca costata*, within lorica. (h) *Salpingoeca helianthica*, within theca. (h’) A rosette colony of *S. helianthica*. All scale bars represent 5 µm. Prey bacteria are visible in most panels as small black dots. Images of all choanoflagellate species sequenced in this study can be found in Figure 1—figure supplement 1.

https://doi.org/10.7554/eLife.34226.003

Using multiple independent metrics, we found that the new choanoflagellate transcriptomes approximate the completeness of choanoflagellate genomes for the purposes of cataloging protein-coding genes. For example, by comparing the S. rosetta genome with its transcriptome, we found that 93% of S. rosetta genes predicted from the genome were represented in its transcriptome with coverage over at least 90% of their length (Figure 1—figure supplement 2a). Furthermore, compared with the genomes of M. brevicollis and S. rosetta, which contain 83 and 89%, respectively, of a benchmark set of conserved eukaryotic genes [BUSCO; (Simão et al., 2015)], each of the new choanoflagellate transcriptomes contains between 88–96% (Supplementary file 2).

We also investigated the phylogenetic diversity of the choanoflagellate species we sequenced, finding it comparable to that of animals: the average phylogenetic distance between pairs of choanoflagellates was slightly larger than the phylogenetic distance between the mouse Mus musculus and the sponge Amphimedon queenslandica (Figure 2—figure supplement 1). Next, we used subsets of our data to reconstruct the phylogeny of choanoflagellates. We found that the positions of two species lying on long terminal branches (Salpingoeca dolichothecata and Codosiga hollandica) were poorly supported or recovered at inconsistent locations (Materials and methods). Therefore, to avoid basing our comparative genomics efforts on a potentially incorrect phylogeny, and because the focus of our study was on reconstructing large-scale patterns of gene family evolution between animals and choanoflagellates, we designed our analyses to be independent of species relationships within either group. [For display purposes only, we relied on a consensus of previously published phylogenies (Philippe et al., 2009; Burki et al., 2016; Carr et al., 2017)].

We next compared the choanoflagellate gene catalogs with those of diverse animals and phylogenetically relevant outgroups (Supplementary file 3) to identify orthologous gene families and determine the ancestry of genes present in animals (see Materials and methods for rationale underlying inferences of gene family orthology). In summary, two features that distinguish our analyses from prior reconstructions of ancestral animal gene content are (1) the additional breadth and depth provided by 19 phylogenetically diverse and newly-sequenced choanoflagellates and (2) a probabilistic and phylogenetically-informed approach designed to avoid the artificial inflation of ancestral gene content resulting from methods that rely on binary decisions for gene family presence or absence in each species while remaining independent of currently unresolved or contentious species relationships (Figure 2—figure supplement 2, Materials and methods).

By grouping gene families by their phylogenetic distribution on a heat map, we were able to visualize and infer their evolutionary history, as well as their presence or absence in each species analyzed (Figure 2, Figure 2—figure supplement 3, Figure 2—figure supplement 4, Supplementary file 4; Supplementary file 5). Several notable observations emerged from this visualization. First, the origins of animals, choanoflagellates, and choanozoans [the monophyletic group composed of animals and choanoflagellates (Brunet and King, 2017)] were each accompanied by the evolution of distinct sets of gene families (i.e., synapomorphies), some of which likely underpin their unique biological features. Second, the numbers of gene families gained on the animal and choanoflagellate stem lineages are roughly equivalent (~1944 and ~2,463, respectively), indicating that the specific functions of novel gene families, not their quantity, were critical to the very different phenotypes each clade went on to have. Finally, although different sets of gene families can reliably be inferred to have been present in the last common ancestor of each group, gene family loss was rampant during animal and choanoflagellate diversification. [After these analyses were complete, several additional genomes from early-branching holozoans and animals became available. Incorporating them post hoc into the heat map did not substantially change any of the above observations (Figure 2—figure supplement 5; Materials and methods)].

Figure 2 with 12 supplements see all

Download asset Open asset

The evolution of gene families in animals, choanoflagellates and their eukaryotic relatives.

Top, a consensus phylogeny (Philippe et al., 2009; Burki et al., 2016; Carr et al., 2017) of the species whose gene contents were compared. Each colored node represents the last common ancestor of a group of species. Bottom, a heat map of the 13,358 orthologous gene families inferred to have been present in at least one of six nodes representing common ancestors of interest: Ureukaryote, Uropisthokont, Urholozoan, Urchoanozoan, Urchoanoflagellate, and Urmetazoan (the full heat map for all gene families is shown in Figure 2—figure supplement 4). Each row represents a gene family. Gene families are sorted by their presence in each group of species, indicated by colored bars and boxes (eukaryotes, opisthokonts, holozoans, choanozoans, choanoflagellates and animals) and subsequently clustered within groups by uncentered Pearson correlation.

https://doi.org/10.7554/eLife.34226.007

Figure 2—source data 1 Phylogenetic trees (in phyloXML format) used to test for contamination of choanoflagellate transcriptomes with animal sequences. Seven protein-coding genes are represented: actin, alpha tubulin, beta tubulin, EF-1A, EFL, HSP70 and HSP90. Each tree includes the top three hits to each species in our database. Branches with bootstrap support below 50 are collapsed.: https://doi.org/10.7554/eLife.34226.020
Download elife-34226-fig2-data1-v3.txt

Differential retention and loss of ancestral gene families in extant animals and choanoflagellates

While the phenomenon of gene loss has been well documented in the evolution of animals and other eukaryotes (Wolf and Koonin, 2013; Albalat and Cañestro, 2016; O'Malley et al., 2016), it has been unclear which extant animals retained the most gene families from the Urmetazoan. Using the Urchoanozoan and Urmetazoan gene family catalogs reconstructed in this study, we ranked extant species based on their conservation of ancestral gene families (Figure 3, Figure 3—figure supplement 1). Compared with other animals in our study, the cephalochordate Branchiostoma floridae retains the most gene families that evolved along the animal stem lineage and also the most gene families with pre-choanozoan ancestry [extending prior observations that B. floridae preserved a comparatively large portion of the gene content of the last common ancestor of chordates (Louis et al., 2012)]. Among the non-bilaterian animal lineages, the cnidarian Nematostella vectensis most completely retains the Urmetazoan genetic toolkit [consistent with previous findings of conservation between N. vectensis and bilaterians (Putnam et al., 2007; Sullivan and Finnerty, 2007)], followed by the sponge Oscarella pearsei. Importantly, B. floridae, N. vectensis, and O. pearsei each retain different subsets of the Urmetazoan gene catalog, as only two thirds (67%) of the genes retained in any one of these species are found in all three species. In contrast, the more rapidly evolving ecdysozoans C. elegans, Pristionchus pacificus and Tetranychus urticae, as well as the ctenophore Mnemiopsis leidyi, retain the fewest ancestral gene families, suggesting widespread gene family loss in these lineages, although the draft nature of some of their genome assemblies and high rates of sequence evolution may artificially inflate counts of missing genes.

Figure 3 with 3 supplements see all

Download asset Open asset

Gene family retention in animals, choanoflagellates, and *Capsaspora owczarzaki*.

(a) Consensus phylogenetic tree (Philippe et al., 2009; Burki et al., 2016; Carr et al., 2017) with gene family retention. Gene families are divided by their origin in the last common ancestor in different groups: holozoan, choanozoan, choanoflagellate, or animal. Colors correspond to nodes indicated in the phylogenetic tree. Bars represent the sum of presence probabilities for gene families with each origin. (Note that a small sum of probabilities is assigned to certain species of choanoflagellates for animal-specific gene families, and vice versa. This is an expected result, as, in our method, every gene family is assigned a finite probability of presence in every species, producing a small background signal when summed over the approximately two thousand animal- or choanoflagellate-specific gene families. Variation in this background signal across species is due to species-specific effects on BLAST scores arising from database size and other factors, as well as intrinsically noisy scores assigned to weak BLAST hits.) (**b–c**) Ranked order of gene family retention for (b) animals and (c) choanoflagellates, similar to (a), but with the addition of gene families originating in the last common ancestor of Opisthokonts and of eukaryotes. Gene families originating within choanoflagellates are not included, in order to focus only on those gene families potentially shared with animals.

https://doi.org/10.7554/eLife.34226.021

Of the 21 choanoflagellates in our analysis, S. dolichothecata [which, despite the shared genus name, is not closely related to S. rosetta (Carr et al., 2017)] retains the most choanozoan-specific gene families, and therefore may be relatively more informative for comparative genomic studies of animal origins than other choanoflagellate species (Figure 3, Figure 3—figure supplement 1). Notably, the two choanoflagellate species with previously-sequenced genomes, M. brevicollis and S. rosetta, are among the choanoflagellates that have retained the fewest ancestral gene families. Thus, they are less representative of Urchoanoflagellate gene content than are most choanoflagellate species we sequenced. Indeed, several key gene families that were previously thought to be absent from choanoflagellates (due to their absence in M. brevicollis and S. rosetta) are conserved in S. dolichothecata and other choanoflagellates: the ancient ribonucleases Argonaute and Dicer, which are required for RNAi across eukaryotes (Jinek and Doudna, 2009), and holozoan gene families previously found in C. owczarzaki that are important for the regulation of animal development, including the transcription factors Churchill and Runx (Sebé-Pedrós et al., 2011) and a diagnostic domain for integrin β (Sebé-Pedrós et al., 2010) (Figure 2—figure supplement 6, Figure 3—figure supplement 2, Supplementary file 6, Materials and methods). These findings of lineage-specific gene family loss in certain animals, M. brevicollis and S. rosetta echo the more general observation that the criteria used to select species for genome sequencing also frequently select for those with streamlined genomes [e.g., (Gu et al., 2005)].

Animal-specific gene families: innovation and loss

Gene families that originated on the stem lineage leading to animals are more likely to function in pathways or processes that distinguish animals from other eukaryotic groups. We identified ~1944 such animal-specific gene families (Supplementary file 5), including well-known developmental receptors, signaling proteins and transcription factors such as TGF-β, Hedgehog, Pax and Sox [consistent with previous reports (Srivastava et al., 2010; Riesgo et al., 2014)]. Notably, we detected many animal-specific gene families with no known function; 203 gene families (12% of total) lack any Pfam domain, and a further 50 (3%) are annotated only with Pfam domains of unknown function. The biochemical activities of these uncharacterized animal-specific gene families remain to be discovered, along with their roles in animal development and evolution.

We next sought to characterize the extent to which the ~1944 gene families that originated on the animal stem lineage were subsequently retained in the 21 animal genomes we analyzed (Table 1). We found only 39 gene families that are universally conserved in all 21 animal genomes in our study; we refer to these as core animal-specific gene families. This count of core animal-specific gene families is likely to be an underestimate due to methodological tradeoffs in the genome-scale analysis that we used to identify gene families (see Materials and methods). By reducing the stringency of the requirement for conservation, we identified a total of 153 gene families that were missing in no more than two animals from our data set (i.e., approximately 10%; Figure 3—figure supplement 3), leaving ~1791 gene families that, despite being specific to animals, were lost in three or more extant lineages. In addition, recent studies in organisms not included in our genomic data set – myxozoans, a parasitic lineage of cnidarians, and glass sponges, which develop into syncytial larvae and adults – indicate that even among the 39 core animal-specific genes, some appear to be dispensable in animals with dramatically derived body plans (Chang et al., 2015; Schenkelaars et al., 2017).

Table 1

Core animal-specific gene families that are present in all animals in this study

Representative gene names and annotations are based on a consensus from each gene family (Materials and methods). Gene families are ordered by pathway/function. *: missing in myxozoans, a lineage of parasitic cnidarians (Chang et al., 2015), ** missing in myxozoans and in glass sponges (Schenkelaars et al., 2017), two animal lineages with derived body plans.

https://doi.org/10.7554/eLife.34226.025

Gene family ID	Representative gene name(s)	Pathway/Function
6201	A-kinase anchor protein 17B	gene regulation
8693	interleukin enhancer-binding factor 2	gene regulation
5720	lethal(2) giant larvae	gene regulation
5290	mediator subunit	gene regulation
6241	mediator subunit	gene regulation
7805	MEX3 B/C	gene regulation
6675	nuclear factor 1 A/B	gene regulation
3849	T-box transcription factor TBX 2/3	gene regulation
6532	catenin beta	Wnt
4891	catenin delta	Wnt
6254	dishevelled 1–3	Wnt**
3570	frizzled 1/2/5/7/8	Wnt*
441	low-density lipoprotein receptor (LRP) 1/2/4/5/6	Wnt*
5637	transcription factor 7 (TCF/LEF)	Wnt
6000	transcription factor COE 1–4	Wnt
6831	fermitin 1–3	cell-cell adhesion
804	hemicentin/obscurin/titin	cell-cell adhesion
4442	integrin alpha 2/5/8	cell-cell adhesion
7164	laminin gamma 1–3	cell-cell adhesion
8024	vinculin	cell-cell adhesion
4993	calcium-dependent secretion activator	synapse/vesicle
476	metabotropic glutamate receptor 1–8	synapse/vesicle
7929	receptor-type tyrosine-protein phosphatase-like N	synapse/vesicle
8406	kinase suppressor of Ras 2	MAPK/JNK
7174	MAPK 7	MAPK/JNK
6010	MAPK 8–10	MAPK/JNK
495	disintegrin and metalloprotease (ADAM)	metalloprotease
4051	tetraspanin 5/14/17/33	metalloprotease
2737	caspase 3/7/9	apoptosis
5675	calumenin	calcium ion
6842	cyclin T 1/2	cell cycle
621	dystonin/desmoplakin/plectin	cytoskeleton
7916	phosphorylase b kinase	glycan
6512	heparan-sulfate 6-O-sulfotransferase 1–3	heparan sulfate
7146	inositol monophosphatase 3	inositol
6163	protein kinase C iota/zeta	PI3K
6251	small G protein signaling modulator 1–2	Rab GTP
6366	MAP kinase-activating death domain protein 9–11	TNF
8587	BTB/POZ domain-containing protein 9	ubiquitin

Focusing on the 39 core animal-specific gene families, we asked whether they might participate in pathways known to be critical for animal biology. Indeed, this set of genes includes seven from the Wnt pathway (including Frizzled, Dishevelled, TCF/LEF and β-catenin), five involved in cell-cell adhesion (including integrin α, laminin, and vinculin), and other well-known animal gene families such as JNK, caspases, and metabotropic glutamate receptors. The 39 core animal gene families also include several that are less well characterized or whose specific contributions to animal origins and animal biology are not immediately obvious, such as two subunits of the transcription-regulating Mediator complex (Malik and Roeder, 2010) and the ubiquitination-associated BTB/POZ domain-containing protein 9 (DeAndrade et al., 2012). For comparison, choanoflagellates have a similarly small set of 75 gene families (out of ~2463 choanoflagellate-specific gene families) that are conserved in all 21 choanoflagellate species that we sampled; 27% of these gene families encode Pfam domains related to kinase signaling (including protein kinases, phosphatases and adapters; Supplementary file 7).

While novel features of animal biology might have evolved with the emergence of new gene families, the loss of ancient genes also influenced animal origins. We detected ~1645 gene families that evolved prior to the choanoflagellate-animal divergence that were retained in choanoflagellates and lost entirely from animals. These include gene families in pathways necessary for the biosynthesis of the amino acids leucine, isoleucine, valine, methionine, histidine, lysine and threonine (Figure 2—figure supplement 7, Supplementary file 8). The shikimic acid pathway, which is required for the synthesis of the aromatic amino acids tryptophan and phenylalanine, and other aromatic compounds, was also lost along the animal stem lineage [although subsequently regained in cnidarians through horizontal gene transfer from bacteria (Fitzgerald and Szmant, 1997; Starcevic et al., 2008)]. We thus demonstrate that components of the biosynthesis pathways for nine amino acids that are essential in animals (Payne and Loomis, 2006; Guedes et al., 2011) were lost on the animal stem lineage, and not prior to the divergence of choanoflagellates and animals. The SLN1 two-component osmosensing system, which has been shown in fungi to regulate acclimation to changes in environmental salinity (Posas et al., 1996), is also conserved in choanoflagellates but absent in animals. [Although these amino acid synthesis and osmosensing pathway components were retained in choanoflagellates, several other gene families involved in diverse biosynthetic pathways were instead lost on the choanoflagellate stem lineage (Supplementary file 9)]. Together, the ensemble of animal gene family losses reflects the substantial changes in metabolism and ecology that likely occurred during early animal evolution.

Choanozoan-specific gene families: innovation and loss

In addition to the set of gene families that evolved on the animal stem lineage, those that originated on the holozoan and choanozoan stem lineages also contributed to the genomic heritage of animals. Our increased sampling of choanoflagellate diversity allowed us to ask whether gene families previously thought to have been animal innovations, due to their absence from M. brevicollis, S. rosetta and other outgroups, may in fact have evolved before the divergence of animals and choanoflagellates. We found that ~372 gene families previously thought to be restricted to animals have homologs in one or more of the 19 newly sequenced choanoflagellates (Supplementary file 5; see Supplementary file 10 for a list of pathways with components gained or lost on the Choanozoan stem lineage).

Within this set of genes are the Notch receptor and its ligand Delta/Serrate/Jagged (hereafter Delta), which interact to regulate proliferation, differentiation and apoptosis during animal developmental patterning (Artavanis-Tsakonas et al., 1999). Intact homologs of these important signaling proteins have never previously been detected in non-animals, although some of their constituent protein domains were previously found in M. brevicollis, S. rosetta and C. owczarzaki (King et al., 2008; Suga et al., 2013). In our expanded choanoflagellate data set, we detected a clear Notch homolog in Mylnosiga fluctuans with the prototypical EGF (epidermal growth factor), Notch, transmembrane and Ank (ankyrin) domains in the canonical order, while five other choanoflagellates contain a subset of the typical protein domains of Notch proteins (Figure 2—figure supplement 6, Figure 2—figure supplement 8). Similarly, the choanoflagellate species S. dolichothecata expresses a protein containing both of the diagnostic domains of animal Delta (MNNL and DSL, both of which were previously thought to be animal-specific) and four other choanoflagellate species express one of the two diagnostic domains of Delta, but not both. The distributions of Notch and Delta in choanoflagellates suggest that they were present in the Urchoanozoan and subsequently lost from most (but not all) choanoflagellates, although it is formally possible that they evolved convergently through shuffling of the same protein domains in animals and in choanoflagellates. A similar portrait emerges for the cadherins Flamingo and Protocadherin (Chae et al., 1999; Usui et al., 1999; Frank and Kemler, 2002) that were previously thought to be animal-specific, but are found in a subset of choanoflagellates within our data set (Figure 2—figure supplement 6, Materials and methods).

We also found evidence that numerous gene families and pathways that originated in animals arose through shuffling of more ancient protein domains and genes that were already present in the Urchoanozoan (Ekman et al., 2007; King et al., 2008; Grau-Bové et al., 2017). For example, the new choanoflagellate gene catalogs confirm that several signature animal signaling pathways, such as Hedgehog, Wnt, JNK, JAK-STAT, and Hippo, are composed of a mixture of gene families that were present in the Urchoanozoan and others that evolved later on the animal stem lineage or within animals (Snell et al., 2006; Adamska et al., 2007; Hausmann et al., 2009; Richards and Degnan, 2009; Srivastava et al., 2010; Sebé-Pedrós et al., 2012; Babonis and Martindale, 2017) (Supplementary file 8). For another animal signaling pathway, TGF-β, the critical ligand, receptor and transcription factor gene families are composed of animal-specific domain architectures (Heldin et al., 1997; Munger et al., 1997), although all three contain constituent protein domains that evolved on the choanozoan stem lineage (Figure 2—figure supplement 9).

The pre-animal origins of the animal innate immunity pathway

One surprise from our analyses was the existence in choanoflagellates of genes required for innate immunity in animals. Although innate immunity is a feature of both animal and plant biology, the receptors and pathways used by these evolutionarily distant organisms are thought to have evolved independently (Ausubel, 2005). The animal immune response is initiated, in part, when potential pathogens stimulate (either directly or indirectly) the Toll-like receptors (TLRs), which have previously been detected only in animals (Leulier and Lemaitre, 2008). Importantly, although TLRs are found in nearly all bilaterians and cnidarians, they are absent from placozoans, ctenophores, and sponges [proteins with similar, but incomplete domain architectures have been detected in sponges (Miller et al., 2007; Riesgo et al., 2014)] and were therefore thought to have evolved after the origin of animals.

We found that 14 of 21 sequenced choanoflagellates encode clear homologs of animal TLRs (Figure 4a,c), implying that TLRs first evolved on the Urchoanozoan stem lineage (Materials and methods). Animal and choanoflagellate TLRs are composed of an N-terminal signal peptide, multiple leucine rich repeat (LRR) domains, a transmembrane domain, and an intracellular Toll/interleukin-1 receptor/resistance (TIR) domain. In the canonical TLR signaling pathway, the interaction of the intracellular TIR domain of Toll-like receptors with TIR domains on adapter proteins (e.g., MyD88) initiates one of a number of potential downstream signaling cascades that ultimately lead to activation of the NF-κB transcription factor (Janeway and Medzhitov, 2002).

Figure 4 with 2 supplements see all

Download asset Open asset

Evolution of the TLR signaling pathway.

Components of the canonical TLR pathway (a) and a potential choanoflagellate Kinase TLR signaling pathway (b), with their canonical domain architectures and colored by their inferred ancestral origin (blue = choanozoan ancestry, purple = metazoan ancestry, green = choanoflagellate ancestry, and red = holozoan ancestry). Question marks denote steps of the signaling pathway and/or interaction partners that are hypothesized, but untested. (c) Presence of receptors, adapters, kinases and the transcription factor NF-κB in animals, choanoflagellates and *C. owczarzaki*.

https://doi.org/10.7554/eLife.34226.026

To investigate whether the Urchoanozoan TLR might have activated a downstream signaling pathway that resembled the canonical TLR pathway in animals, we searched for TLR adapters, downstream kinases and NF-κB in choanoflagellates (Figure 4c, Supplementary file 6). While many choanoflagellates encode NF-κB, we found no evidence for two critical Death domain-containing proteins involved in the TLR-dependent activation of NF-κB: the adapter protein MyD88 (Wiens et al., 2005; Gauthier et al., 2010) and the downstream kinase IRAK (Song et al., 2012). However, we did detect two new classes of choanoflagellate-specific proteins that pair kinase domains directly with LRR and/or TIR domains, potentially bypassing the need to recruit kinases into multi-protein signaling complexes (Figure 4b): TLR-like proteins with an intracellular kinase domain positioned between the transmembrane domain and TIR domain (which we provisionally term ‘Kinase TLRs’) and proteins encoding TIR and kinase domains, but lacking a transmembrane domain (which we provisionally term ‘Kinase TIRs’). In addition, we detected homologs of the TIR-containing adapter SARM1, a multi-functional protein that can trigger both NF-κB-dependent and NF-κB-independent responses (Couillault et al., 2004; Sethman and Hawiger, 2013; Liu et al., 2014). Choanoflagellate SARM1 homologs contain a conserved glutamic acid residue that is necessary for SARM1 NADase activity in animals (Essuman et al., 2017) (Figure 4—figure supplement 1). Finally, although we did not detect most animal cytosolic innate immune sensors in choanoflagellates, including the LRR-containing NLR family, ALRs, MAVS, MDA5 and RIG-I, we did find evidence for both cGAS and STING in diverse choanoflagellates [as previously reported in M. brevicollis (Wu et al., 2014)]. Thus, critical components of the animal innate immune pathway, including both extracellular and intracellular pattern sensing receptors, predate animal origins.

Discussion

Our increased sampling of choanoflagellates reveals how the Urmetazoan genome evolved as a mosaic of old, new, rearranged, and repurposed protein domains, genes and pathways. We have identified ~8418 gene families that were present in the Urmetazoan [consistent with recent findings (Simakov and Kawashima, 2017)], about 75% of which were also present in the Urchoanozoan and the remainder of which evolved on the animal stem lineage (Supplementary file 5). The patchwork ancestry of the Urmetazoan genome is illustrated by the fact that many gene families responsible for animal development, immunity and multicellular organization evolved through shuffling of protein domains that first originated in the choanozoan stem lineage together with ancient or animal-specific domains (e.g. the TGF-β ligand and receptor; Figure 2—figure supplement 6, Figure 2—figure supplement 9). In addition, other gene families found in the Urchoanozoan were subsequently combined into new pathways in the animal stem lineage along with newly evolved genes (e.g., the TLR and Hedgehog pathways; Figure 4, Supplementary file 8). Moreover, the history of the Urmetazoan genome is not simply one of innovation and co-option, as ~1,645 Urchoanozoan genes were lost on the animal stem lineage, including genes for the synthesis of nine essential amino acids [Figure 2—figure supplement 7, Supplementary file 8; (Payne and Loomis, 2006; Guedes et al., 2011; Erives and Fassler, 2015)]. A study based on similar methodology that incorporated two of the 21 choanoflagellate species analyzed here (M. brevicollis and S. rosetta) was recently published and found a similar pattern of gene innovation on the animal stem lineage, while also identifying many of the same core animal-specific genes (Paps and Holland, 2018).

The origin of multicellularity in animals provided novel niches for bacteria to exploit, requiring the first animals to evolve new mechanisms for mediating interactions with pathogenic and commensal bacteria. In addition, the progenitors of animals interacted with bacteria – both as prey and pathogens – and the roots of animal innate immunity clearly predate animal origins. We have found that choanoflagellates express TLRs, transmembrane receptors that trigger the animal innate immune response, as well as its canonical downstream signaling target, NF-κB, suggesting that both existed in the Urchoanozoan (Figure 4a,c). Like modern choanoflagellates and sponges, the Urchoanozoan likely preyed upon bacteria (McFall-Ngai et al., 2013; Richter and King, 2013), and bacterial cues can induce life history transitions in choanoflagellates (Alegado et al., 2012; Woznica et al., 2016; 2017), although the mechanisms by which choanoflagellates capture bacteria and sense bacterial cues are unknown. We hypothesize that the core TLR/NF-κB pathway functioned in prey sensing, immunity, or more complex processes in the Urchoanozoan that subsequently formed the basis of a self-defense system in animals. Because critical pathway members linking TLR and NF-κB appear to be animal innovations (e.g., MyD88), the animal signaling pathway may have evolved to diversify downstream signaling processes to tailor responses in a multicellular context. This pathway diversification may have included the evolution of roles in development, as TLRs have been implicated in both NF-κB-dependent and NF-κB-independent developmental signaling (in addition to their functions in immunity) in bilaterians and in the cnidarian N. vectensis (Leulier and Lemaitre, 2008; Brennan et al., 2017). The uncharacterized choanoflagellate-specific Kinase TLRs and Kinase TIRs (Figure 4b,c) may function as part of a streamlined signaling pathway that mediates responses to extracellular cues, including bacteria, although further research will be required to test this hypothesis.

Our study provides a detailed view of the changes in gene content that laid the foundation for animal origins. Innovations in gene and protein regulation in the Urmetazoan also likely contributed to animal evolution, as features of animal phosphoproteome remodeling, gene co-regulation and alternative splicing (but not animal promoter types and enhancers) have been found in C. owczarzaki (Sebé-Pedrós et al., 2013b; 2016a; 2016b) and in the holozoan Creolimax fragrantissima (de Mendoza et al., 2015). Multicellularity has also evolved independently in a number of other eukaryotic lineages, including slime molds, brown algae, fungi, and chlorophytes. In each transition to multicellularity, the underlying genomic changes are distinct from those that occurred on the animal stem lineage. For example, in the social amoeba Dictyostelium discoideum, many novel gene families are involved in extracellular sensing (Glöckner et al., 2016), similar to the marked increase in signal transduction gene families found in the multicellular brown alga Ectocarpus siliculosus (Cock et al., 2010). Gene innovations in multicellular ascomycete fungi are enriched for functions related to endomembrane organelles (Nguyen et al., 2017) and the gene complement of the multicellular green alga Volvox carteri is largely distinguished from its unicellular relative Chlamydomonas reinhardtii by expansions or contractions within gene families, rather than the evolution of new families (Prochnik et al., 2010). Through our analyses of genomes and transcriptomes representing the full breadth of choanoflagellate and animal diversity, we have provided a genome-scale overview of the gene families whose invention or co-option distinguished the Urmetazoan from all other organisms and therefore may have provided a basis for the evolution of unique mechanisms regulating development, homeostasis and immunity in animals.

Materials and methods

The sections on Quality trimming, Error correction, De novo transcriptome assembly, Identification and removal of cross-contamination, Prediction of amino acid sequences from assembled transcripts and elimination of redundant transcripts, and Measurement of expression levels and elimination of noise transcripts were described in (Peña et al., 2016). They are repeated here for convenience and clarity (with modifications to the text but not to the underlying methods).

Share this article

Cite this article

Representative choanoflagellates analyzed in this study.

The evolution of gene families in animals, choanoflagellates and their eukaryotic relatives.

Figure 2—source data 1

Gene family retention in animals, choanoflagellates, and Capsaspora owczarzaki.

Core animal-specific gene families that are present in all animals in this study

Evolution of the TLR signaling pathway.

Author details

Daniel J Richter

Contribution

Competing interests

Parinaz Fozouni

Contribution

Competing interests

Michael B Eisen

Contribution

Competing interests

Nicole King

Contribution

For correspondence

Competing interests

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Further reading