Metagenomics of the Water Column in the Pristine Upper Course of the Amazon River

Rohit Ghai; Francisco Rodŕíguez-Valera; Katherine D. McMahon; Danyelle Toyama; Raquel Rinke; Tereza Cristina Souza de Oliveira; José Wagner Garcia; Fernando Pellon de Miranda; Flavio Henrique-Silva

doi:10.1371/journal.pone.0023785

Abstract

River water is a small percentage of the total freshwater on Earth but represents an essential resource for mankind. Microbes in rivers perform essential ecosystem roles including the mineralization of significant quantities of organic matter originating from terrestrial habitats. The Amazon river in particular is famous for its size and importance in the mobilization of both water and carbon out of its enormous basin. Here we present the first metagenomic study on the microbiota of this river. It presents many features in common with the other freshwater metagenome available (Lake Gatun in Panama) and much less similarity with marine samples. Among the microbial taxa found, the cosmopolitan freshwater acI lineage of the actinobacteria was clearly dominant. Group I Crenarchaea and the freshwater sister group of the marine SAR11 clade, LD12, were found alongside more exclusive and well known freshwater taxa such as Polynucleobacter. A metabolism-centric analysis revealed a disproportionate representation of pathways involved in heterotrophic carbon processing, as compared to those found in marine samples. In particular, these river microbes appear to be specialized in taking up and mineralizing allochthonous carbon derived from plant material.

Citation: Ghai R, Rodŕíguez-Valera F, McMahon KD, Toyama D, Rinke R, Cristina Souza de Oliveira T, et al. (2011) Metagenomics of the Water Column in the Pristine Upper Course of the Amazon River. PLoS ONE 6(8): e23785. https://doi.org/10.1371/journal.pone.0023785

Editor: Purification Lopez-Garcia, Université Paris Sud, France

Received: April 15, 2011; Accepted: July 25, 2011; Published: August 19, 2011

Copyright: © 2011 Ghai et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by Petróleo Brasileiro S.A. - Petrobras, as part of a research agreement (#0050.0045454.08-4) with the Federal University of São Carlos, SP, Brazil, within the context of the Cognitus Project. FHS is recipient of a Research Productivity Scholarship from The National Council for Research and Development (CNPq #305655/2009-4). DT and RR were recipients of a master scholarship from the Brazilian Federal Agency for the Support and Evaluation of Graduate Education (CAPES). This work was also supported by projects MAGYK (BIO2008-02444), MICROGEN (Programa CONSOLIDER-INGENIO 2010 CSD2009-00006), METAGHALO (BIO2009-10138) and HALOMET (CGL2010-19303) from the Spanish Ministerio de Ciencia e Innovación, DIMEGEN (PROMETEO/2010/089) from the Generalitat Valenciana. RG was supported by a Juan de la Cierva scholarship, from the Spanish Ministerio de Ciencia e Innovación. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: José Wagner Garcia is employed by Noosfera Projetos Especiais Ltda, São Paulo, Brazil. Fernando Pellon de Miranda is employed by Petróleo Brasileiro S.A - Petrobras, Brasil. Francisco Rodriguez-Valera is an academic editor at PLoS.

Introduction

River water, around which civilizations flourish, is only a very small percentage (0.006%) of the total freshwater on earth, and a miniscule 0.0002% of the total water in the hydrosphere [1]. Even so, freshwater habitats like rivers, streams, lakes and wetlands, provide invaluable ecosystem services to human populations in the form of drinking water, recreation, and fisheries. They also play a previously underestimated but surprisingly important role in the oxidation, storage, and release of terrestrial carbon, thereby affecting global carbon budgets [2], [3], [4]. The Amazon river basin is the largest river basin in the world, comprising ∼40% of the total area of the continent of South America. The Amazon river itself is by far the largest river in the world in terms of volumetric discharge (6.3 trillion m³/year), with a length of 6280 km, a catchment area the size of ∼7000 square kilometres, and runoff amounting to nearly 15% of the total runoff of all the rivers in the world together. The tropical rainforest surrounding the river is an extraordinarily diverse ecosystem, boasting thousands of plant and animal species, many endemic to it, with several regions still untouched by anthropogenic pressure. However, such a situation may not continue for long since the extent of pristine waters in these regions might be dramatically reduced in the future. This, of course, has serious consequences for the macrofauna and flora, the most visible components of the ecosystem. However, the less visible, microscopic component of this habitat has been largely ignored. We have practically no information on the major microbial species that dominate the Amazon River.

Freshwater habitats harbor microbial taxa distinct from those routinely detected in marine and terrestrial ecosystems [5], [6]. Notably, surveys targeting 16S rRNA genes in freshwater lakes and reservoirs (reviewed in [7]) have identified several “cosmopolitan” and prevalent freshwater lineages of Betaproteobacteria [8], [9], [10], [11], [12] and Actinobacteria [13], [14], [15], [16], [17]. Other commonly recovered lineages belong to the Bacteroidetes, Verrucomicrobia, and Alphaproteobacteria. Cyanobacteria and Gammaproteobacteria are retrieved frequently in particular types of lakes (e.g. highly productive or polluted). Much has been learnt about the primary factors determining lake bacterioplankton community composition, such as trophic status [18], [19], pH [20], [21], landscape position [22], and retention time [20]. Comparatively fewer studies have been conducted on lotic systems, and these frequently focused on rivers or streams much smaller than the Amazon. Many have specifically targeted biofilm communities [23]. Pelagic or suspended bacterioplankton in large flowing rivers tends to comprise freshwater taxa routinely detected in lakes [24], [25], [26], but very little is known about their traits or functions. Though there have been instances of 16 s rRNA based surveys of river waters [27], surprisingly, only three metagenomic studies exploring the functional capabilities of freshwater microbes have been published to date, representing tropical lakes Gatun (Panama) [28], and Samsonvale (Australia) [29] and the temperate eutrophic Lac du Bourget (France) [30]. Apart from the large sequencing effort for the Lake Gatun sample, the other studies were relatively small (0.15 Mb Samsonvale, 11 Mb Lac du Bourget).

This is the first study to present the analysis of a large (375 Mb) freshwater metagenomic dataset. Besides, we provide a first glimpse into the functional characteristics of microbiota of the water column of the Solimões-Amazon (henceforth referred to as the Amazon) river in its upper reaches (about 420 km upstream from Manaus), a region relatively pristine with little (if any) human impact and more than 1300 km from the nearest ocean (the Atlantic). Total DNA from a single sample from the mid water column, which provides us with a snapshot of this part of the river, was directly 454 pyrosequenced providing 375 Mb of sequence data. Here we present the analysis of this dataset.

Results and Discussion

Sample Description

The source of the Amazon river is in the Andes mountains. Several rivers gradually merge together (mainly the Rio Ucayali and Rio Marañón from the Peruvian Andes) to form the Amazon proper. The river and its tributaries, including the floodplains, cover about 300,000 km². The floodplains are of fundamental importance to the flora and fauna of the region. Three types of streams are usually distinguished in the Amazon hydrological basin; clearwater, white-water and blackwater [31]. Clearwater streams are low in suspended sediment and dissolved carbon, whereas white-water streams have high sediment concentrations. Both types have a near-neutral pH. Blackwater streams, on the other hand, are low in suspended sediment but are rich in dissolved organic matter and have a lower pH (about 4–5). The sample analyzed here is a typical example of mainstem white water. The sample was collected at a site nearly 400 km upstream from Manaus, Brazil, at the end of the dry season (September, average rainfall for the month 50 mm) and was taken from mid stream and mid waters (8 m depth with the bottom at 15 m) (Figure S1). This site is upstream from the Coari Terminal of Petrobras (Brazil's state-controlled petroleum company), and still in a pristine condition. Physicochemical parameters of the sample are provided in Table S1. One full plate of 454 sequencing yielded nearly one million reads (n = 1153502), with an average read length of 325 bases. This amounts to a total of 375 Mb, the largest reported so far for any freshwater habitat. The next largest dataset from a freshwater metagenome is that of Lake Gatun, a freshwater lake near the Panama Canal that was sampled during the Global Ocean Survey (GOS) [28], and the only freshwater sample in that collection (total 315 Mb). We note, however, that the Lake Gatun sample comprised the 0.1–0.8 µm size fraction from the near surface (2 m deep) while our Amazon sample constituted the 0.2–5 µm fraction and was taken from mid depth (8 m).

Comparative characteristics of freshwater metagenomic datasets

The GC content plot of metagenomic datasets is often characteristic with widely different values and shape depending on the habitat. Two extremes are marine oligotrophic water samples with a marked low GC peak (∼32%) and soil with a distinctly higher GC peak (∼65%) (Figure 1). The GC% of the Amazon dataset, along with Gatun had a distinctive bimodal distribution with almost equally sized peaks of low and high GC% (∼45% and 65%) (Figure 1). The low GC peak of ca. 45% might be a freshwater hallmark since it was found in both freshwater metagenomes, but its difficult to make a generalization based on only two samples. However, in both estuary samples from the GOS datasets two peaks were again observed, one coinciding with the purely marine waters (∼35%) and the other (∼45%) with the low GC% peak of the Amazon and Gatun datasets.

Download:

Figure 1. GC content of diverse metagenomic datasets.

The GC% profile of the Amazon reads is shown (red) in comparison to a freshwater datasets (Lake Gatun), an estuary (Chesapeake Bay), a typical marine sample (Mediterranean Deep Chlorophyll Maximum) and a soil metagenome (Waseca County Soil).

https://doi.org/10.1371/journal.pone.0023785.g001

Salinity has been shown to be a critical environmental factor in determining community structure of the microbiota [32]. We compared the sequences retrieved here with other aquatic metagenomic datasets with different salinities using all versus all comparisons of reads with BLASTN. The most similar sample to the Amazon metagenome was the freshwater sample from Lake Gatun, followed by two estuary samples (Figure 2 and Figure S2). As a reference we included in the comparison metagenomes from hypersaline ponds and the divergence between the two freshwater datasets was similar to that found between a 19% (5 times seawater salinity) and 37% (10 times seawater salinity). We know that these two environments still share many microbes at least at the level of genus [33]. A closer comparison to the Lake Gatun dataset revealed that the Amazon reads could overlap with nearly 55% of the Lake Gatun sequences (using Amazon metagenome reads as queries, 50% query coverage and with %identity of >90%), indicating a high amount of similar sequences in these two habitats, i.e. extremely closely related microbes inhabiting both locations. Reference genomes were used to recruit reads common to both datasets, to determine which organisms might be common to both systems (Table S2).

Download:

Figure 2. Metagenomic dataset similarity to each other.

The tree shows the relatedness of diverse metagenomic datasets based on Jaccard distance derived from all versus all comparisons of reads with BLASTN. Datasets of different salinities were chosen for the comparison. Freshwater: Amazon, Lake Gatun; Marine: Mediterranean DCM, Sargasso Sea GS000a, HOTS 4000 m; Hypersaline: Punta Cormoran 6% (salinity) and Santa Pola salterns of three different salinities 19%, 32% and 37%). Waseca county soil dataset was used as an outgroup to construct the tree (see methods).

https://doi.org/10.1371/journal.pone.0023785.g002

Microbiota of the Amazon

To analyze the community structure of the mainstem Amazon we have used the results from the direct 454 pyrosequencing and classified the 16S rRNA gene fragments recovered from the metagenome (722 sequences) (Figure 3). The advantage of this approach is that it is free from any amplification (PCR) or cloning bias [34]. The most frequently recovered 16S rRNA gene sequences were affiliated with the Actinobacteria and Proteobacteria (Alpha, Beta and Gamma) (Figure 3). We also attempted to bin all 1.1 million pyrosequencing reads with the MG-RAST server and found very similar results (∼49% % of reads could be assigned to a taxon) (Figure 3 and Table 1). Finally, the recruitment of genomes of representative strains that were detected was analyzed to assess the coverage and the overall similarity of the relatives in the Amazon dataset (Figure S3).

Download:

Figure 3. Phylogenetic profile of the Amazon metagenome.

Phylogenetic profile of the Amazon metagenome is shown using two different approaches, one, using the 16 s rRNA sequences gathered from the metagenome, and the other using all the reads and comparing against sequenced microbial genomes (using the MG-RAST server).

https://doi.org/10.1371/journal.pone.0023785.g003

Download:

Table 1. Sequenced genomes with most hits to the Amazon metagenome (using the MG-RAST server).

https://doi.org/10.1371/journal.pone.0023785.t001

All approaches confirmed the presence of large numbers of Actinobacteria (20–25% of reads). Although this was not unexpected since they have been found in freshwaters of all kinds (still and flowing), [7], [13], [14], [17]. Actinobacterial genomes recruited smaller numbers individually but when pooled together they recruited the most (20% of binned reads). Actually, the available actinobacterial genomes recruited at a very low similarity (60% mean percentage identity in translated sequence comparisons) indicating a very distant relationship with the ones found in the Amazon. Cosmopolitan lineages of freshwater Actinobacteria have remained difficult to culture and thus no complete genomes are yet available. In the absence of complete genome information, we performed phylogenetic analysis of the actinobacterial rRNA reads in context of nearly full-length rRNA sequences from freshwater Actinobacteria recovered previously using PCR-based 16S rRNA gene cloning to determine more specifically which acintobacterial clades were present in the metagenome sample. A total of 135 reads were identified as containing partial 16S rRNA gene sequences related to Actinobacteria, and these were inserted into a maximum likelihood phylogenetic tree constructed with nearly full-length 16S rRNA sequences (Figure 4). Based on this analysis, most were affiliated with the acI lineage (73%) and with the acIV lineage (17%). A few were also assigned to the acII, acTH2, acVII, and acSTL lineages. Therefore, we infer that most of the metagenomic reads binned as Actinobacteria are derived either from acI or acIV, both of which are cosmopolitan freshwater taxa found in many different types of lakes, reservoirs, and rivers. It has been proposed that these have a photoheterotrophic lifestyle since rhodopsins affiliated to actinobacteria have been described (actinorhodopsins) in Lake Gatun [35]. In our dataset as well, the abundance of actinorhodopsins was evident (71 out of a total of 125).

Download:

Figure 4. Phylogenetic affiliation of the Actinobacterial reads of the Amazon metagenome.

Most of the 16S rRNA reads affiliated with the freshwater acI and acIV lineages. Phylogenetic reconstruction was conducted by maximum likelihood (RAxML) with near full length (>1300 nt) reference 16S rRNA gene sequences from a manually curated alignment [32] and highly variable positions masked. Metagenomic reads were added without altering tree topology using maximum parsimony criterion and a 50% base frequency filter in the ARB software package. Bootstrap values are indicated above nodes with greater than 60% support and the scale bar represents 10 base substitutions per 100 nt positions.

https://doi.org/10.1371/journal.pone.0023785.g004

The individual reads adscribed to Actinobacteria were low GC (Figure 5), as has been recently described for the abundant freshwater actinobacteria [36]. It is interesting that the only large freshwater metagenome available in databases (Lake Gatun) also contained a massive fraction (close to 40%) of actinobacterial reads, explaining a large part of the sequence overlap between the two environments (∼47% of the overlap). A soil metagenome (Waseca County Soil Sample) and the deep sea HOTS 4000 m sample showed only the presence of high GC Actinobacteria. The other important taxa that contributed prominently to the low GC reads in the Amazon dataset were the Crenarchaea and Bacteroidetes (Figure S4).

Download:

Figure 5. GC% of actinobacterial reads in diverse metagenomic datasets.

The GC% of all reads assigned to actinobacteria in different datasets is shown. Datasets shown are the Amazon metagenome (red), Lake Gatun (black), HOTS 4000 m (deep blue), Waseca County Soil (Brown). The inset shows the % of actinobacterial reads in each dataset.

https://doi.org/10.1371/journal.pone.0023785.g005

Betaproteobacteria were the next most dominant group. They were represented by a diverse collection of organisms, with members of the Polynucleobacter (Pnec) and Methylophilaceae (betIV) lineages comprising 9% and 22% of betaproteobacterial rRNA reads, respectively. In addition, the Polynucleobacter sp. QLW-P1DMWA-1 genome recruited more reads than any other available bacterial genome, although at low similarity (Table 1 and Figure S3). The presence of Polynucleobacter relatives is not surprising as it is among the most well known and widely distributed bacterium that has been isolated from both lentic and lotic habitats all over the world [8], [11].

Even though archaea accounted for only about 6% of the 16S rRNA detected, the genome that recruited the most reads (3% of binned reads) was Nitrosopumilus maritimus, an ammonia oxidizing crenarchaeon isolated originally from a marine fish tank [37] (Table 1 and Figure S3). Crenarchaea were originally thought to be constrained to harsh environmental niches but several surveys have now demonstrated the ubiquity of crenarchaea in aquatic systems [38]. A few studies done in freshwater systems have demonstrated the role of group 1 crenarchaeota in ammonia-oxidation [39], [40]. We examined the abundance of archaeal reads in diverse metagenomic datasets and they always comprised a small percentage (less than 6%) of the total (Figure S5). For instance, the deep ocean HOTS 4000 m sample [41] which was the highest percentage of archaeal reads we found in the datasets we examined, contained 5.8% archaeal reads. Besides, only the Amazon and Lake Gatun datasets were found to contain more Crenarchaea than Euryarchaea, while the opposite was observed for a soil metagenome and the Mediterranean Deep Chlorophyll Maximum, and a selected sample from Sargasso Sea and the deep ocean HOTS 4000 m sample had nearly equal amounts of both taxa (Figure S5). It is clear that crenarchaea are numerically dominant in freshwater systems than in marine and thus probably have a greater role to play in nutrient cycling in both lentic and lotic habitats, as some previous PCR studies of 16S rRNA genes already indicated [42], [43], [44].

Assembly of the entire dataset using stringent cutoffs to avoid chimeric assemblies (see methods) yielded a total of 65 contigs larger than 3 kb. Of these, 16 were clearly crenarchaeal and 7 were of alphaproteobacterial origin. The taxonomic affiliation of the remaining contigs could not be ascertained. In the crenarchaeal contigs, the majority of the predicted genes had highest similarities to N. maritimus and to a lesser extent, to Cenarchaeum symbiosium or to uncultured crenarchaotes. All these contigs were low GC%, with the mean GC% of the genes ∼39%, fitting well with the known low GC% of N. maritimus genome (34%), but not with C. symbiosum (57%). The assembly of these contigs directly from the metagenomic data alone provides convincing evidence that freshwater crenarchaeotes are indeed abundant organisms in the sample and may not be very diverse (compared to others that might be abundant but do not assemble). Remarkably, one of the contigs ascribed to crenarchaea was syntenic to the genomic region in the N. maritimus genome containing the important genes for ammonia-oxidation (amoA, amoB and amoC genes) (Figure 6). A search for ammonium monoxygenases (all subunits), using several representative archaeal proteins always retrieved better hits (% similarity ranging from 85% to 98%) than the bacterial homologs that yielded lower similarity levels (% similarity ranging from 45% to 82%), indicating that ammonia-oxidation in this microbial community is a function primarily performed by archaea. Other approaches have shown previously that in marine, freshwater and soil environments archaeal oxidation of ammonia to nitrite has been shown to be dominant over bacterial contribution [40], [43], [45].

Download:

Figure 6. Synteny of assembled crenarchaeal contig from the Amazon metagenome to the Nitrosopumilus maritimus genomic region containing the genes for ammonia oxidation.

The Amazon contig is shown on top and the N. maritimus genomic region is shown below. The colors indicate the level of %identity (using TBLASTX) between the sequences.

https://doi.org/10.1371/journal.pone.0023785.g006

An unexpected result from the taxonomic analysis was the presence of a close relative of Acinetobacter baumannii, a gammaproteobacterium often isolated from nosocomial infections. The genome of a human isolate, strain ATCC 17978, recruited a high number of reads (2.38% of binned reads) at very high % identity levels (Table 1 and Figure S3). Acinetobacters are routinely isolated from soil and water samples [46], [47]. However, they are not generally detected using PCR-based 16S rRNA gene surveys of freshwater [6], [7] suggesting they are usually present only at low abundance compared to typical freshwater organisms. Thus, the presence of an Acinetobacter species sharing so much genetic content with a previously sequenced human isolate in our sample might be the result of contamination during sample manipulation. However, closer inspection of the metagenomic reads affiliating with Acinetobacter 16S rRNA genes (72% of gammaproteobacteria-affiliated 16S rRNA reads) showed that nearly half the reads had the highest similarity to several other non-baumanii Acinetobacter species making it difficult to definitively conclude the identity of the microbes recruiting the reads. Besides, 16S rRNA gene PCR amplification in a total of four independent samples taken from the river mainstem, at this and other locations revealed large numbers of 16S rRNA that could be assigned to the genus Acinetobacter, but not the species A. baumannii. (data not shown). In any case, the natural habitat (reservoir) of nosocomial A. baumannii has been elusive and there has been recent evidence pointing towards a freshwater origin, e.g. it has been proven that freshwater amoeba can aid the survival of A. baumannii by inclusion in cysts [48]. Therefore, the presence in river waters of A. baumanii genotypes that might be eventually involved in nosocomial outbreaks is not out of the question. In fact, the ‘outbreak’ of a similarly unlikely nosocomial pathogen has been observed in a pristine South American lagoon system [49].

The genome of the marine, alpha-proteobacterial SAR11 strain Candidatus Pelagibacter ubique HTCC1062 also recruited a substantial number of reads from the Amazon dataset (Table 1 and Figure S3) and the Lake Gatun metagenomic dataset (Table S3). The SAR11 lineage is the most dominant bacterioplankton in the oceanic waters and the existence of a freshwater sister group (LD12 or alfV-A) has been known for some time [6], [7]. Genomic sequence information for LD12 is scarce, apart from 16S rRNA gene sequences. However, phylogenetic studies have indicated that there are distinct differences between the freshwater and the marine groups [5]. Both appear to have adapted independently to their niches with almost no mixing despite enormous population sizes. Although the Candidatus Pelagibacter ubique HTCC1062 genome recruited many reads from both the Amazon and Lake Gatun datasets, the Alphaproteobacteria in these systems are clearly distinct from the marine SAR11 lineage, sharing 64% mean protein sequence identity with the Pelagibacter genome. Notably, nearly one-third of rhodopsins found in our dataset (41 out of 125) were binned as alphaproteobacterial, indicating that this freshwater clade, like the marine SAR11, might also use rhodopsins for photoheterotrophy.

Seven contigs contained several genes most similar to Candidatus Pelagibacter species, supporting the abundant presence of the freshwater group LD12 in the freshwater sample. These contigs were also low GC% (mean GC% was 31%), fitting well with the low GC% of known Candidatus Pelagibacter genomes (∼30%). It has been hypothesized that low GC is an adaption by the marine SAR11 clade to counter the low availability of nitrogen in the oligotrophic ocean waters [50]. However, the GC content of the Amazon dataset reads with best hits to this genome was also very low (∼32%), suggesting that the LD12 group also has a low GC content. This section of the Amazon river is unlikely to be nitrogen limited (implying that there may be other factors involved in determining GC content than nitrogen availability alone).

Signatures of heterotrophy

To explore functional differences between flowing freshwater and marine microbial communities, we compared the Amazon sample to selected, representative marine metagenomic datasets using STAMP [51]. We chose the dataset of the Mediterranean Deep Chlorophyll Maximum (DCM), that was generated in a similar fashion [34], and three warm water sample datasets from a latitude similar to the Amazon from the GOS dataset; GS15, coastal sample from the Caribbean Sea, Off Key West Florida; GS16, a coastal sea sample from and GS26, and open ocean sample 134 miles from the Galapagos Islands. There are fundamental ecosystem-level differences between these marine and the freshwater dataset of the Amazon. For example, the DCM is located in an oligotrophic, phosphate poor environment and has an abundant photosynthetic cyanobacterial population, while phytoplankton production has been found to be limited in the Amazon mainstem, due to turbidity [52]. The DCM is a more stable and a more or less closed system with very low amount of external inputs in the form of organic matter. The three other marine samples are typical oligotrophic sea surface samples as well. The Amazon, however, is a more dynamic, physically mobile system, with continuous terrestrial organic matter inputs.

One of the most striking differences between the Amazon and the marine datasets was a strong signature of microbial heterotrophy in the Amazon (File S1). Marine samples contained more reads from photosystems (I and II) and carotenoid biosynthesis (for primary production), and also of uptake systems for choline and betaine (compatible solutes). The latter are hallmarks of osmotically unbalanced habitats, including seawater. The Amazon sample, on the other hand, was enriched in pathways required for degrading many diverse carbon sources including aromatic compounds, (e.g. phenylacetate, benzoate) and amino acids (e.g. Histidine, Leucine, Isoleucine, Arginine). Formate hydrogenase, the Entner-Doudoroff pathway for glycolysis, and TCA cycle genes were markedly over-represented in the Amazon as compared to the GOS samples. In addition, polyhydroxybutyrate metabolism genes were also more abundant indicating extensive use of this important carbon storage material by the freshwater microbiota. These results are consistent with the Amazon being a net heterotrophic ecosystem. Indeed, the bulk of the organic matter in the river originates from the surrounding forest [53] and others have measured high rates of microbial respiration, even in the absence of high primary production [54].

Closer inspection of the pathways involved in carbon metabolism revealed that microbes in the Amazon are likely processing organic matter fixed by terrestrial plants, providing further evidence for allochthony (File S1). For example, phenylacetate and phenylpropanoids are aromatic secondary metabolites produced by plants [55]. Some of the latter are precursors for the formation of lignin, an important structural polymer in plant cell walls. Lignin is a highly refractory compound and resistant to degradation, but can be degraded by basidiomycete fungi (e.g. white rot) and bacteria. Degradation leads to the formation of ferulic, vanillic and protocatechiuc acids, all phenylpropanoids [56]. These can be used as a substrate by several bacteria as the sole source of carbon [57]. Several key proteins involved in the uptake (a vanillate transporter vanK), and subsequent conversion of ferulic acid to 3-carbon metabolites (Feruloyl-CoA synthetase, phenylpropanoid dioxygenase) were identified as abundant in the Amazon sample. Enzymes in a related pathway, also leading to the transport and production of 3-carbon metabolites from benzoic acid, were also identified. The genes conferring ability to assimilate alkanesulfonates were also overrepresented in the Amazon dataset. Soil is a major reservoir for alkanesulfonates derived originally from plant material [58].

Although it is tempting to conclude that the above characteristics of Amazon microbes can be attributed to the freshwater habitat more generally, we found many of the carbon-processing features were unique even in comparison to Lake Gatun. The Amazon sequences were also comparatively more enriched in genes for aromatic compound degradation, amino acid degradation, polyhydroxybutyrate metabolism, and alkanesulfonate assimilation. This indicates that these represent strategies utilized by the microbial community of the Amazon specifically. However, among the important pathways enriched in the Amazon sample relative to the marine samples, that were also found in Lake Gatun were genes in the Entner-Duodoroff pathway, archaeal Embden-Meyerhoff Pathway and gluconeogenesis, and glycerate metabolism (File S1). This implies fundamental differences in carbon metabolism pathways between freshwater and marine environments. We also did not find any evidence of differences in light-dependent metabolic genes (e.g. photolyases, rhodopsins) except for photosystem II genes which were underrepresented in comparison to Lake Gatun.

We also found genes involved in resistance to cobalt-zinc and cadmium overrepresented in the Amazon alone, compared to all these other datasets, and the majority of these genes could be attributed to betaproteobacteria, that are much more abundant in the Amazon (23% of classifiable reads) than in Lake Gatun (9% of classifiable reads).

To further infer unique functional features of the Amazon river as compared to Lake Gatun, we annotated all the PFAM domains found in the predicted proteins of the two datasets, and identified domains that were more frequent in one dataset versus the other. Among the domains that were markedly more frequent in the Amazon were a number of phage-protein related domains (e.g. coat proteins, terminases, portal proteins) (Table S4 and Table S5), indicating the presence of some Amazon-dataset specific phages. One particularly interesting feature was the overrepresentation of the Tannase domain in the Amazon dataset. This domain is found in proteins involved in degradation of aromatic compounds (e.g. tannins) [59], indicating that much this specific activity is much more common in the Amazon. This is another piece of evidence for the importance of terrestrial organic matter in Amazon carbon cycle. Analysis of the taxonomic distribution of the metagenomic reads revealed this function to be broadly distributed amongst Acidobacteria and Proteobacteria (alpha, beta and gamma) to nearly equal amounts.

However, the most frequently found protein domain in the Amazon dataset, and which was also overrepresented compared to Lake Gatun, was the Bug domain (Table S4), which is found in proteins that are extracytoplasmic tripartite tricarboxylic acid receptors [60]. The substrate binding specificity of these receptors is largely uncharacterized, but citrate, glutamate, and aspartate have been co-crystallized with Bug proteins [61]. This domain was found mainly in betaproteobacterial reads (83% of all reads with the Bug domain), which is consistent with previous studies searching for Bug genes in finished genomes [60]. Their abundance in the dataset indicates that tricarboxylic acids may be a common source of carbon in the flowing river or may simply reflect the abundance of betaproteobacteria in the Amazon (23% of all reads) as compared to Lake Gatun (9% of all reads).

Concluding Remarks

Much less is known about freshwater bacteria than marine ones, or those that impact human health directly (e.g. pathogenic bacteria). This is evidenced by the paucity of genomic sequences available for freshwater bacteria, for instance, Polynucleobacter necessarius is the only cosmopolitan freshwater bacteria that has one strain genome sequenced. This can also be attributed to the hurdles in culturing, which has been the main starting point of nearly all sequenced bacterial genomes. Even metagenomic studies, which provide a culture-free alternative, have been heavily biased towards marine systems. Freshwaters play a previously underestimated but surprisingly important role in the oxidation, storage, and release of terrestrial carbon, thereby affecting global carbon budgets [2], [3], [4]. Our results confirm the importance of heterotrophic metabolism in this large river. Most of the information that has been gathered about the freshwater microbes has been through 16S rRNA gene cloning studies and limited fosmid end sequencing [30], but our results should encourage additional efforts to better characterize the freshwater metagenome in both rivers and lakes, particularly with respect to carbon metabolism.

Importantly, we confirm by a metagenomic approach that actinobacteria are primary members of freshwater ecosystems. The GC content of these relatively unknown actinobacteria does not conform to the generally held views that free living actinobacteria have high GC content [36]. Also, the results indicate that the genomic sequence fragments that we have recovered from these actinobacteria do not correspond to any particular actinobacterial species, but to a multitude, indicating that they are quite divergent from all known actinobacteria.

Apart from actinobacteria, the genomic fragments of the freshwater sister group of Candidatus Pelagibacter (LD12) was also found to be quite abundant and distinct from the marine species. Another surprising finding was that maximum number of reads could be assigned not to a bacterial genome, but to the genome of N. maritimus, a crenarchaeon. These reads likely belong to a freshwater relative of this marine microbe. Moreover, it appears that crenarchaea in general seem to be much more abundant in freshwaters than in marine systems. These observations taken together indicate the near non existence of closely related genome sequences of microbes belonging to this dataset, and point to the high degree of uniqueness of this sample. This is an even more pressing reason to conduct further meta-'omics based work in these previously neglected ecosystems.

Materials and Methods

Sample collection and processing

The sample was collected in accordance with the Brazilian law (IN n° 154/2007 IBAMA, Brazilian Institute of Environment and Renewable Natural Resources). The sample was collected from a depth of 8 m from the Solimões-Amazon River (03°56′11.08″S, 63°10′14.45″W) with a 10 liter Niskin bottle on 17 September, 2008, at 2:00 pm. River depth and width at this location were 15 m and 2100 m respectively, and the sample was taken 600 m away from the shore. Some physico-chemical parameters are shown in Table S1.

Four liters of water were filtered in 3 independent Sterivex filters. Each sample was sequentially filtered through a AP20 glass fiber filter (142 mm of diameter); 5 µm pore size polycarbonate filter (142 mm of diameter) and 0.22 µm-pore size Sterivex GV filters (Durapore, Millipore) using a peristaltic pump. Sterivex filters (retaining the 0.2–5 µm-diameter planktonic cells) were filled up with lysis buffer (40 mM EDTA, 50 mM Tris/HCl, 0.75 M sucrose) and stored at −20°C until DNA extraction. The solution was harvested from the filters and DNA was extracted using the Metagenomic DNA from Environmental Water Samples Kit (EPICENTRE, WI, USA). DNA integrity was checked by agarose gel electrophoresis and quantified spectrophotometrically in a NanoDrop ND 1000 instrument (Thermo Scientific, DE, USA). Five micrograms was used for sequencing in the 454 Sequencing GS FLX Titanium platform (Laboratório Nacional de Computação Científica, Petrópolis, Rio de Janeiro, Brazil).

Annotation and assembly

The raw metagenomic data were initially processed using CLC Genomics Workbench 3.5, where sequences were quality trimmed. Sequences less than 60 bp were discarded. Assembly was performed using the CLC Genomics Workbench Assembler (>95% identity and >50% of read length). Gene prediction was performed on the assembled sequences using the program MGA (Noguchi et al., 2006). Protein sequences were annotated by using the best blast hit against the NCBI NR database. The entire unassembled dataset was also annotated using the MG-RAST server [62].

Community structure using all reads. For taxonomy, the dataset was compared using BLAST [63] to a combined database containing the NCBI- NT database and whole genome shotgun assembly data for 1000 draft microbial genomes from NCBI (http://www.ncbi.nlm.nih.gov/lproks.cgi). The data were analyzed using MEGAN [64]. The taxonomic analysis was also performed using the MG-RAST server using a cutoff of minimum alignment length ∼50 and an evalue cutoff of 1e-5.

Community structure using rRNA

Ribosomal rRNA gene sequences in the metagenomic data were identified by comparison against the RDP [65] using BLASTN. Only alignments longer than 100 bp were considered. Sequences were considered identified to the genus level if they had a %identity >95% with a well identified existing sequence that was not annotated as “uncultured” or “unidentified”. The best named hit to the metagenomic rRNA sequence was considered to assign the sequence to a high-level taxonomic group. Otherwise only higher taxonomic levels were used. Sequences that matched only those annotated as “unidentified” or “uncultured” and those with <90% identity to the best matching hit were considered unidentified.

Recruitment plots

Fragment recruitment of the Amazon dataset was performed against all complete and draft microbial genomes using BLASTN. The criteria for counting a hit were minimum %identity of 95% and minimum alignment of 50 bp. Data was plotted using R (http://cran.r-project.org). For selected genomes, recruitment was also performed using TBLASTX (evalue cutoff 1e-5 and minimum alignment length 50).

Comparison with GOS dataset

The Amazon data were compared against the entire Global Ocean Sampling (GOS) expedition data [28] using BLASTN, and a hit was counted using the criteria of minimum 95% identity and alignment length of at least 50 bases.

Clustering with selected metagenomic datasets

All vs all comparison was performed using BLASTN with all selected datasets. Only blast hits with >70% identity and >100 bp length were considered. The Jaccard distance D between two datasets (A and B) can be computed as followswhere AB_s is the total bit score of all the common hits between datasets A and B, AA_s and BB_s are the total bit scores of the comparisons of the datasets A and B to themselves. The tree was built using the neighbor program in the PHYLIP package.

Functional profile comparison

Comparison of functional profiles was performed using STAMP [51]. The minimum %identity was 70% and the minimum alignment length was ∼100. Statistical significance of the differences between samples was assessed by the Two-sided Fisher's Exact test and Storey's FDR method was used for multiple test correction The most important metabolic categories were selected by filtering by q-value (0.05), and using only those categories that had at least 100 sequences and more than 2-fold ratio between the proportions.

Pfam Domain analysis

Orf prediction in the metagenomic reads was performed using FragGeneScan [66] and HMMER3 package was used to identify Pfam domains in translated protein sequences >60 aa in length. In the comparisons to detect more frequent protein domains in dataset X versus dataset Y, the abundance ratio is computed as % of domain in dataset X/% of domain in dataset Y.

Accession numbers

Author Contributions

Conceived and designed the experiments: FRV FHS. Performed the experiments: DT RR TCSdO JWG FPdM FHS. Analyzed the data: RG FRV KDM. Wrote the paper: RG FRV KDM.

References

1. Shiklomanov I, Rodda J, editors. (2003) World water resources at the beginning of the 21st century.
2. Cole JJ, Prairie YT, Caraco NF, McDowell WH, Tranvik LJ, et al. (2007) Plumbing the global carbon cycle: Integrating inland waters into the terrestrial carbon budget. Ecosystems 10: 171–184.
- View Article
- Google Scholar
3. Downing JA, Prairie YT, Cole JJ, Duarte CM, Tranvik LJ, et al. (2006) The global abundance and size distribution of lakes, ponds, and impoundments. Limnology and Oceanography 51: 2388–2397.
- View Article
- Google Scholar
4. Downing JA, Cole JJ, Middelburg JJ, Striegl RG, Duarte CM, et al. (2008) Sediment organic carbon burial in agriculturally eutrophic impoundments over the last century. Global Biogeochemical Cycles 22: 10.
- View Article
- Google Scholar
5. Logares R, Brate J, Heinrich F, Shalchian-Tabrizi K, Bertilsson S (2010) Infrequent transitions between saline and fresh waters in one of the most abundant microbial lineages (SAR11). Mol Biol Evol 27: 347–357.
- View Article
- Google Scholar
6. Zwart G, Crump BC, Agterveld MPKV, Hagen F, Han SK (2002) Typical freshwater bacteria: an analysis of available 16S rRNA gene sequences from plankton of lakes and rivers. Aquatic Microbial Ecology 28: 141–155.
- View Article
- Google Scholar
7. Newton RJ, Jones SE, Eiler A, McMahon KD, Bertilsson S (2011) A guide to the natural history of freshwater lake bacteria. Microbiol Mol Biol Rev 75: 14–49.
- View Article
- Google Scholar
8. Hahn MW (2003) Isolation of strains belonging to the cosmopolitan Polynucleobacter necessarius cluster from freshwater habitats located in three climatic zones. Appl Environ Microbiol 69: 5248–5254.
- View Article
- Google Scholar
9. Wu QL, Hahn MW (2006) Differences in structure and dynamics of Polynucleobacter communities in a temperate and a subtropical lake, revealed at three phylogenetic levels. FEMS Microbiol Ecol 57: 67–79.
- View Article
- Google Scholar
10. Hahn MW, Lang E, Brandt U, Lunsdorf H, Wu QL, et al. (2010) Polynucleobacter cosmopolitanus sp. nov., free-living planktonic bacteria inhabiting freshwater lakes and rivers. Int J Syst Evol Microbiol 60: 166–173.
- View Article
- Google Scholar
11. Jezberova J, Jezbera J, Brandt U, Lindstrom ES, Langenheder S, et al. (2010) Ubiquity of Polynucleobacter necessarius ssp. asymbioticus in lentic freshwater habitats of a heterogeneous 2000 km area. Environ Microbiol 12: 658–669.
- View Article
- Google Scholar
12. Jezbera J, Jezberova J, Brandt U, Hahn MW (2011) Ubiquity of Polynucleobacter necessarius subspecies asymbioticus results from ecological diversification. Environ Microbiol 13: 922–931.
- View Article
- Google Scholar
13. Glockner FO, Zaichikov E, Belkova N, Denissova L, Pernthaler J, et al. (2000) Comparative 16S rRNA analysis of lake bacterioplankton reveals globally distributed phylogenetic clusters including an abundant group of actinobacteria. Appl Environ Microbiol 66: 5053–5065.
- View Article
- Google Scholar
14. Warnecke F, Amann R, Pernthaler J (2004) Actinobacterial 16S rRNA genes from freshwater habitats cluster in four distinct lineages. Environ Microbiol 6: 242–253.
- View Article
- Google Scholar
15. Allgaier M, Bruckner S, Jaspers E, Grossart HP (2007) Intra- and inter-lake variability of free-living and particle-associated Actinobacteria communities. Environ Microbiol 9: 2728–2741.
- View Article
- Google Scholar
16. Hahn MW (2009) Description of seven candidate species affiliated with the phylum Actinobacteria, representing planktonic freshwater bacteria. Int J Syst Evol Microbiol 59: 112–117.
- View Article
- Google Scholar
17. Jezbera J, Sharma AK, Brandt U, Doolittle WF, Hahn MW (2009) ‘Candidatus Planktophila limnetica’, an actinobacterium representing one of the most numerically important taxa in freshwater bacterioplankton. Int J Syst Evol Microbiol 59: 2864–2869.
- View Article
- Google Scholar
18. Jones SE, Newton RJ, McMahon KD (2009) Evidence for structuring of bacterial community composition by organic carbon source in temperate lakes. Environmental Microbiology 11: 2463–2472.
- View Article
- Google Scholar
19. Yannarell AC, Kent AD, Lauster GH, Kratz TK, Triplett EW (2003) Temporal patterns in bacterial communities in three temperate lakes of different trophic status. Microb Ecol 46: 391–405.
- View Article
- Google Scholar
20. Lindstrom ES, Kamst-Van Agterveld MP, Zwart G (2005) Distribution of typical freshwater bacterial groups is associated with pH, temperature, and lake water retention time. Appl Environ Microbiol 71: 8201–8206.
- View Article
- Google Scholar
21. Newton RJ, Jones SE, Helmus MR, McMahon KD (2007) Phylogenetic ecology of the freshwater Actinobacteria acI lineage. Appl Environ Microbiol 73: 7169–7176.
- View Article
- Google Scholar
22. Yannarell AC, Triplett EW (2005) Geographic and environmental sources of variation in lake bacterial community composition. Appl Environ Microbiol 71: 227–239.
- View Article
- Google Scholar
23. Rubin MA, Leff LG (2007) Nutrients and other abiotic factors affecting bacterial communities in an Ohio River (USA). Microb Ecol 54: 374–383.
- View Article
- Google Scholar
24. Crump BC, Armbrust EV, Baross JA (1999) Phylogenetic analysis of particle-attached and free-living bacterial communities in the Columbia river, its estuary, and the adjacent coastal ocean. Appl Environ Microbiol 65: 3192–3204.
- View Article
- Google Scholar
25. Lemke MJ, Lienau EK, Rothe J, Pagioro TA, Rosenfeld J, et al. (2009) Description of freshwater bacterial assemblages from the upper parana river floodpulse system, Brazil. Microb Ecol 57: 94–103.
- View Article
- Google Scholar
26. Sekiguchi H, Watanabe M, Nakahara T, Xu B, Uchiyama H (2002) Succession of bacterial community structure along the Changjiang River determined by denaturing gradient gel electrophoresis and clone library analysis. Appl Environ Microbiol 68: 5142–5150.
- View Article
- Google Scholar
27. Winter C, Hein T, Kavka G, Mach RL, Farnleitner AH (2007) Longitudinal changes in the bacterial community composition of the Danube River: a whole-river approach. Appl Environ Microbiol 73: 421–431.
- View Article
- Google Scholar
28. Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, et al. (2007) The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol 5: e77.
- View Article
- Google Scholar
29. Pope PB, Patel BKC (2008) Metagenomic analysis of a freshwater toxic cyanobacteria bloom. Fems Microbiology Ecology 64: 9–27.
- View Article
- Google Scholar
30. Debroas D, Humbert JF, Enault F, Bronner G, Faubladier M, et al. (2009) Metagenomic approach studying the taxonomic and functional diversity of the bacterial community in a mesotrophic lake (Lac du Bourget–France). Environ Microbiol 11: 2412–2424.
- View Article
- Google Scholar
31. Sioli H (1950) Das Wasser im Amazonasgebiet. Forschungen und Fortschritte 26: 274–280.
- View Article
- Google Scholar
32. Lozupone CA, Knight R (2007) Global Patterns in Bacterial Diversity. Proc Natl Acad Sci U S A 104: 11436–11440.
- View Article
- Google Scholar
33. Oren A (2002) Halophilic Microorganisms and their Environments. Kluwer Academic Publishers.
34. Ghai R, Martin-Cuadrado AB, Molto AG, Heredia IG, Cabrera R, et al. (2010) Metagenome of the Mediterranean deep chlorophyll maximum studied by direct and fosmid library 454 pyrosequencing. Isme J 4: 1154–1166.
- View Article
- Google Scholar
35. Sharma AK, Zhaxybayeva O, Papke RT, Doolittle WF (2008) Actinorhodopsins: proteorhodopsin-like gene sequences found predominantly in non-marine environments. Environ Microbiol 10: 1039–1056.
- View Article
- Google Scholar
36. Ghai R, McMahon KD, Rodriguez-Valera F (2011) Breaking a Paradigm: Cosmopolitan and Abundant Freshwater Actinobacteria are Low GC. in press.
37. Konneke M, Bernhard AE, de la Torre JR, Walker CB, Waterbury JB, et al. (2005) Isolation of an autotrophic ammonia-oxidizing marine archaeon. Nature 437: 543–546.
- View Article
- Google Scholar
38. Callieri C, Corno G, Caravati E, Rasconi S, Contesini M, et al. (2009) Bacteria, archaea, and crenarchaeota in the epilimnion and hypolimnion of a deep holo-oligomictic lake. Appl Environ Microbiol 75: 7298–7300.
- View Article
- Google Scholar
39. Liu Z, Huang S, Sun G, Xu Z, Xu M (2010) Diversity and abundance of ammonia-oxidizing archaea in the Dongjiang River, China. Microbiol Res.
- View Article
- Google Scholar
40. Rogers DR, Casciotti KL (2010) Abundance and diversity of archaeal ammonia oxidizers in a coastal groundwater system. Appl Environ Microbiol 76: 7938–7948.
- View Article
- Google Scholar
41. Konstantinidis KT, Braff J, Karl DM, DeLong EF (2009) Comparative metagenomic analysis of a microbial community residing at a depth of 4,000 meters at station ALOHA in the North Pacific subtropical gyre. Appl Environ Microbiol 75: 5345–5355.
- View Article
- Google Scholar
42. Auguet JC, Barberan A, Casamayor EO (2010) Global ecological patterns in uncultured Archaea. ISME Journal 4: 182–190.
- View Article
- Google Scholar
43. Lliros M, Gich F, Plasencia A, Auguet JC, Darchambeau F, et al. (2010) Vertical distribution of ammonia-oxidizing crenarchaeota and methanogens in the epipelagic waters of Lake Kivu (Rwanda-Democratic Republic of the Congo). Appl Environ Microbiol 76: 6853–6863.
- View Article
- Google Scholar
44. Urbach E, Vergin KL, Larson GL, Giovannoni SJ (2007) Bacterioplankton communities of Crater Lake, OR: dynamic changes with euphotic zone food web structure and stable deep water populations. Hydrobiologia 574: 161–177.
- View Article
- Google Scholar
45. Agogue H, Brink M, Dinasquet J, Herndl GJ (2008) Major gradients in putatively nitrifying and non-nitrifying Archaea in the deep North Atlantic. Nature 456: 788–791.
- View Article
- Google Scholar
46. Jung J, Baek JH, Park W (2010) Complete genome sequence of the diesel-degrading Acinetobacter sp. strain DR1. J Bacteriol 192: 4794–4795.
- View Article
- Google Scholar
47. Peleg AY, Seifert H, Paterson DL (2008) Acinetobacter baumannii: emergence of a successful pathogen. Clin Microbiol Rev 21: 538–582.
- View Article
- Google Scholar
48. Cateau E, Verdon J, Fernandez B, Hechard Y, Rodier MH (2011) Acanthamoeba sp. promotes the survival and growth of Acinetobacter baumanii. FEMS Microbiol Lett 319: 19–25.
- View Article
- Google Scholar
49. Piccini C, Conde D, Alonso C, Sommaruga R, Pernthaler J (2006) Blooms of single bacterial species in a coastal lagoon of the southwestern Atlantic Ocean. Appl Environ Microbiol 72: 6560–6568.
- View Article
- Google Scholar
50. Giovannoni SJ, Tripp HJ, Givan S, Podar M, Vergin KL, et al. (2005) Genome streamlining in a cosmopolitan oceanic bacterium. Science 309: 1242–1245.
- View Article
- Google Scholar
51. Parks DH, Beiko RG (2010) Identifying biologically relevant differences between metagenomic communities. Bioinformatics 26: 715–721.
- View Article
- Google Scholar
52. Fisher T (1979) Plankton and primary production in aquatic systems of the Central American Basin. Comparative Biochemistry and Physiology 62A: 31–38.
- View Article
- Google Scholar
53. Hedges J, Cowie G, Richey J, Quay P, Benner R, et al. (1994) Origins and Processing of Organic Matter in the Amazon River as Indicated by Carbohydrates and Amino Acids. Limnology and Oceanography 39: 743–761.
- View Article
- Google Scholar
54. Richey J, Hedges J, Devol A, Quay P, Victoria R, et al. (1990) Biogeochemistry of Carbon in the Amazon River. Limnology and Oceanography 35: 352–371.
- View Article
- Google Scholar
55. Vogt T (2010) Phenylpropanoid biosynthesis. Mol Plant 3: 2–20.
- View Article
- Google Scholar
56. Narbad A, Gasson MJ (1998) Metabolism of ferulic acid via vanillin using a novel CoA-dependent pathway in a newly-isolated strain of Pseudomonas fluorescens. Microbiology 144(Pt 5): 1397–1405.
- View Article
- Google Scholar
57. Merkens H, Beckers G, Wirtz A, Burkovski A (2005) Vanillate metabolism in Corynebacterium glutamicum. Curr Microbiol 51: 59–65.
- View Article
- Google Scholar
58. Autry AR, Fitzgerald JW (1990) Sulfonate-S - A major form of forest soil organic sulfur. Biology and Fertility of Soils 10: 50–56.
- View Article
- Google Scholar
59. Hatamoto O, Watarai T, Kikuchi M, Mizusawa K, Sekine H (1996) Cloning and sequencing of the gene encoding tannase and a structural study of the tannase subunit from Aspergillus oryzae. Gene 175: 215–221.
- View Article
- Google Scholar
60. Antoine R, Jacob-Dubuisson F, Drobecq H, Willery E, Lesjean S, et al. (2003) Overrepresentation of a Gene Family Encoding Extracytoplasmic Solute Receptors in Bordetella. Journal of Bacteriology 185: 1470–1474.
- View Article
- Google Scholar
61. Fischer M, Zhang QY, Hubbard RE, Thomas GH (2010) Caught in a TRAP: substrate-binding proteins in secondary transport. Trends Microbiol 18: 471–478.
- View Article
- Google Scholar
62. Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, et al. (2008) The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9: 386.
- View Article
- Google Scholar
63. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
- View Article
- Google Scholar
64. Huson DH, Auch AF, Qi J, Schuster SC (2007) MEGAN analysis of metagenomic data. Genome Res 17: 377–386.
- View Article
- Google Scholar
65. Cole JR, Wang Q, Cardenas E, Fish J, Chai B, et al. (2009) The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37: D141–145.
- View Article
- Google Scholar
66. Rho M, Tang H, Ye Y (2010) FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res 38: e191.
- View Article
- Google Scholar

[ref1] 1. Shiklomanov I, Rodda J, editors. (2003) World water resources at the beginning of the 21st century.

[ref2] 2. Cole JJ, Prairie YT, Caraco NF, McDowell WH, Tranvik LJ, et al. (2007) Plumbing the global carbon cycle: Integrating inland waters into the terrestrial carbon budget. Ecosystems 10: 171–184.
View Article
Google Scholar

[3] View Article

[4] Google Scholar

[ref3] 3. Downing JA, Prairie YT, Cole JJ, Duarte CM, Tranvik LJ, et al. (2006) The global abundance and size distribution of lakes, ponds, and impoundments. Limnology and Oceanography 51: 2388–2397.
View Article
Google Scholar

[6] View Article

[7] Google Scholar

[ref4] 4. Downing JA, Cole JJ, Middelburg JJ, Striegl RG, Duarte CM, et al. (2008) Sediment organic carbon burial in agriculturally eutrophic impoundments over the last century. Global Biogeochemical Cycles 22: 10.
View Article
Google Scholar

[9] View Article

[10] Google Scholar

[ref5] 5. Logares R, Brate J, Heinrich F, Shalchian-Tabrizi K, Bertilsson S (2010) Infrequent transitions between saline and fresh waters in one of the most abundant microbial lineages (SAR11). Mol Biol Evol 27: 347–357.
View Article
Google Scholar

[12] View Article

[13] Google Scholar

[ref6] 6. Zwart G, Crump BC, Agterveld MPKV, Hagen F, Han SK (2002) Typical freshwater bacteria: an analysis of available 16S rRNA gene sequences from plankton of lakes and rivers. Aquatic Microbial Ecology 28: 141–155.
View Article
Google Scholar

[15] View Article

[16] Google Scholar

[ref7] 7. Newton RJ, Jones SE, Eiler A, McMahon KD, Bertilsson S (2011) A guide to the natural history of freshwater lake bacteria. Microbiol Mol Biol Rev 75: 14–49.
View Article
Google Scholar

[18] View Article

[19] Google Scholar

[ref8] 8. Hahn MW (2003) Isolation of strains belonging to the cosmopolitan Polynucleobacter necessarius cluster from freshwater habitats located in three climatic zones. Appl Environ Microbiol 69: 5248–5254.
View Article
Google Scholar

[21] View Article

[22] Google Scholar

[ref9] 9. Wu QL, Hahn MW (2006) Differences in structure and dynamics of Polynucleobacter communities in a temperate and a subtropical lake, revealed at three phylogenetic levels. FEMS Microbiol Ecol 57: 67–79.
View Article
Google Scholar

[24] View Article

[25] Google Scholar

[ref10] 10. Hahn MW, Lang E, Brandt U, Lunsdorf H, Wu QL, et al. (2010) Polynucleobacter cosmopolitanus sp. nov., free-living planktonic bacteria inhabiting freshwater lakes and rivers. Int J Syst Evol Microbiol 60: 166–173.
View Article
Google Scholar

[27] View Article

[28] Google Scholar

[ref11] 11. Jezberova J, Jezbera J, Brandt U, Lindstrom ES, Langenheder S, et al. (2010) Ubiquity of Polynucleobacter necessarius ssp. asymbioticus in lentic freshwater habitats of a heterogeneous 2000 km area. Environ Microbiol 12: 658–669.
View Article
Google Scholar

[30] View Article

[31] Google Scholar

[ref12] 12. Jezbera J, Jezberova J, Brandt U, Hahn MW (2011) Ubiquity of Polynucleobacter necessarius subspecies asymbioticus results from ecological diversification. Environ Microbiol 13: 922–931.
View Article
Google Scholar

[33] View Article

[34] Google Scholar

[ref13] 13. Glockner FO, Zaichikov E, Belkova N, Denissova L, Pernthaler J, et al. (2000) Comparative 16S rRNA analysis of lake bacterioplankton reveals globally distributed phylogenetic clusters including an abundant group of actinobacteria. Appl Environ Microbiol 66: 5053–5065.
View Article
Google Scholar

[36] View Article

[37] Google Scholar

[ref14] 14. Warnecke F, Amann R, Pernthaler J (2004) Actinobacterial 16S rRNA genes from freshwater habitats cluster in four distinct lineages. Environ Microbiol 6: 242–253.
View Article
Google Scholar

[39] View Article

[40] Google Scholar

[ref15] 15. Allgaier M, Bruckner S, Jaspers E, Grossart HP (2007) Intra- and inter-lake variability of free-living and particle-associated Actinobacteria communities. Environ Microbiol 9: 2728–2741.
View Article
Google Scholar

[42] View Article

[43] Google Scholar

[ref16] 16. Hahn MW (2009) Description of seven candidate species affiliated with the phylum Actinobacteria, representing planktonic freshwater bacteria. Int J Syst Evol Microbiol 59: 112–117.
View Article
Google Scholar

[45] View Article

[46] Google Scholar

[ref17] 17. Jezbera J, Sharma AK, Brandt U, Doolittle WF, Hahn MW (2009) ‘Candidatus Planktophila limnetica’, an actinobacterium representing one of the most numerically important taxa in freshwater bacterioplankton. Int J Syst Evol Microbiol 59: 2864–2869.
View Article
Google Scholar

[48] View Article

[49] Google Scholar

[ref18] 18. Jones SE, Newton RJ, McMahon KD (2009) Evidence for structuring of bacterial community composition by organic carbon source in temperate lakes. Environmental Microbiology 11: 2463–2472.
View Article
Google Scholar

[51] View Article

[52] Google Scholar

[ref19] 19. Yannarell AC, Kent AD, Lauster GH, Kratz TK, Triplett EW (2003) Temporal patterns in bacterial communities in three temperate lakes of different trophic status. Microb Ecol 46: 391–405.
View Article
Google Scholar

[54] View Article

[55] Google Scholar

[ref20] 20. Lindstrom ES, Kamst-Van Agterveld MP, Zwart G (2005) Distribution of typical freshwater bacterial groups is associated with pH, temperature, and lake water retention time. Appl Environ Microbiol 71: 8201–8206.
View Article
Google Scholar

[57] View Article

[58] Google Scholar

[ref21] 21. Newton RJ, Jones SE, Helmus MR, McMahon KD (2007) Phylogenetic ecology of the freshwater Actinobacteria acI lineage. Appl Environ Microbiol 73: 7169–7176.
View Article
Google Scholar

[60] View Article

[61] Google Scholar

[ref22] 22. Yannarell AC, Triplett EW (2005) Geographic and environmental sources of variation in lake bacterial community composition. Appl Environ Microbiol 71: 227–239.
View Article
Google Scholar

[63] View Article

[64] Google Scholar

[ref23] 23. Rubin MA, Leff LG (2007) Nutrients and other abiotic factors affecting bacterial communities in an Ohio River (USA). Microb Ecol 54: 374–383.
View Article
Google Scholar

[66] View Article

[67] Google Scholar

[ref24] 24. Crump BC, Armbrust EV, Baross JA (1999) Phylogenetic analysis of particle-attached and free-living bacterial communities in the Columbia river, its estuary, and the adjacent coastal ocean. Appl Environ Microbiol 65: 3192–3204.
View Article
Google Scholar

[69] View Article

[70] Google Scholar

[ref25] 25. Lemke MJ, Lienau EK, Rothe J, Pagioro TA, Rosenfeld J, et al. (2009) Description of freshwater bacterial assemblages from the upper parana river floodpulse system, Brazil. Microb Ecol 57: 94–103.
View Article
Google Scholar

[72] View Article

[73] Google Scholar

[ref26] 26. Sekiguchi H, Watanabe M, Nakahara T, Xu B, Uchiyama H (2002) Succession of bacterial community structure along the Changjiang River determined by denaturing gradient gel electrophoresis and clone library analysis. Appl Environ Microbiol 68: 5142–5150.
View Article
Google Scholar

[75] View Article

[76] Google Scholar

[ref27] 27. Winter C, Hein T, Kavka G, Mach RL, Farnleitner AH (2007) Longitudinal changes in the bacterial community composition of the Danube River: a whole-river approach. Appl Environ Microbiol 73: 421–431.
View Article
Google Scholar

[78] View Article

[79] Google Scholar

[ref28] 28. Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, et al. (2007) The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol 5: e77.
View Article
Google Scholar

[81] View Article

[82] Google Scholar

[ref29] 29. Pope PB, Patel BKC (2008) Metagenomic analysis of a freshwater toxic cyanobacteria bloom. Fems Microbiology Ecology 64: 9–27.
View Article
Google Scholar

[84] View Article

[85] Google Scholar

[ref30] 30. Debroas D, Humbert JF, Enault F, Bronner G, Faubladier M, et al. (2009) Metagenomic approach studying the taxonomic and functional diversity of the bacterial community in a mesotrophic lake (Lac du Bourget–France). Environ Microbiol 11: 2412–2424.
View Article
Google Scholar

[87] View Article

[88] Google Scholar

[ref31] 31. Sioli H (1950) Das Wasser im Amazonasgebiet. Forschungen und Fortschritte 26: 274–280.
View Article
Google Scholar

[90] View Article

[91] Google Scholar

[ref32] 32. Lozupone CA, Knight R (2007) Global Patterns in Bacterial Diversity. Proc Natl Acad Sci U S A 104: 11436–11440.
View Article
Google Scholar

[93] View Article

[94] Google Scholar

[ref33] 33. Oren A (2002) Halophilic Microorganisms and their Environments. Kluwer Academic Publishers.

[ref34] 34. Ghai R, Martin-Cuadrado AB, Molto AG, Heredia IG, Cabrera R, et al. (2010) Metagenome of the Mediterranean deep chlorophyll maximum studied by direct and fosmid library 454 pyrosequencing. Isme J 4: 1154–1166.
View Article
Google Scholar

[97] View Article

[98] Google Scholar

[ref35] 35. Sharma AK, Zhaxybayeva O, Papke RT, Doolittle WF (2008) Actinorhodopsins: proteorhodopsin-like gene sequences found predominantly in non-marine environments. Environ Microbiol 10: 1039–1056.
View Article
Google Scholar

[100] View Article

[101] Google Scholar

[ref36] 36. Ghai R, McMahon KD, Rodriguez-Valera F (2011) Breaking a Paradigm: Cosmopolitan and Abundant Freshwater Actinobacteria are Low GC. in press.

[ref37] 37. Konneke M, Bernhard AE, de la Torre JR, Walker CB, Waterbury JB, et al. (2005) Isolation of an autotrophic ammonia-oxidizing marine archaeon. Nature 437: 543–546.
View Article
Google Scholar

[104] View Article

[105] Google Scholar

[ref38] 38. Callieri C, Corno G, Caravati E, Rasconi S, Contesini M, et al. (2009) Bacteria, archaea, and crenarchaeota in the epilimnion and hypolimnion of a deep holo-oligomictic lake. Appl Environ Microbiol 75: 7298–7300.
View Article
Google Scholar

[107] View Article

[108] Google Scholar

[ref39] 39. Liu Z, Huang S, Sun G, Xu Z, Xu M (2010) Diversity and abundance of ammonia-oxidizing archaea in the Dongjiang River, China. Microbiol Res.
View Article
Google Scholar

[110] View Article

[111] Google Scholar

[ref40] 40. Rogers DR, Casciotti KL (2010) Abundance and diversity of archaeal ammonia oxidizers in a coastal groundwater system. Appl Environ Microbiol 76: 7938–7948.
View Article
Google Scholar

[113] View Article

[114] Google Scholar

[ref41] 41. Konstantinidis KT, Braff J, Karl DM, DeLong EF (2009) Comparative metagenomic analysis of a microbial community residing at a depth of 4,000 meters at station ALOHA in the North Pacific subtropical gyre. Appl Environ Microbiol 75: 5345–5355.
View Article
Google Scholar

[116] View Article

[117] Google Scholar

[ref42] 42. Auguet JC, Barberan A, Casamayor EO (2010) Global ecological patterns in uncultured Archaea. ISME Journal 4: 182–190.
View Article
Google Scholar

[119] View Article

[120] Google Scholar

[ref43] 43. Lliros M, Gich F, Plasencia A, Auguet JC, Darchambeau F, et al. (2010) Vertical distribution of ammonia-oxidizing crenarchaeota and methanogens in the epipelagic waters of Lake Kivu (Rwanda-Democratic Republic of the Congo). Appl Environ Microbiol 76: 6853–6863.
View Article
Google Scholar

[122] View Article

[123] Google Scholar

[ref44] 44. Urbach E, Vergin KL, Larson GL, Giovannoni SJ (2007) Bacterioplankton communities of Crater Lake, OR: dynamic changes with euphotic zone food web structure and stable deep water populations. Hydrobiologia 574: 161–177.
View Article
Google Scholar

[125] View Article

[126] Google Scholar

[ref45] 45. Agogue H, Brink M, Dinasquet J, Herndl GJ (2008) Major gradients in putatively nitrifying and non-nitrifying Archaea in the deep North Atlantic. Nature 456: 788–791.
View Article
Google Scholar

[128] View Article

[129] Google Scholar

[ref46] 46. Jung J, Baek JH, Park W (2010) Complete genome sequence of the diesel-degrading Acinetobacter sp. strain DR1. J Bacteriol 192: 4794–4795.
View Article
Google Scholar

[131] View Article

[132] Google Scholar

[ref47] 47. Peleg AY, Seifert H, Paterson DL (2008) Acinetobacter baumannii: emergence of a successful pathogen. Clin Microbiol Rev 21: 538–582.
View Article
Google Scholar

[134] View Article

[135] Google Scholar

[ref48] 48. Cateau E, Verdon J, Fernandez B, Hechard Y, Rodier MH (2011) Acanthamoeba sp. promotes the survival and growth of Acinetobacter baumanii. FEMS Microbiol Lett 319: 19–25.
View Article
Google Scholar

[137] View Article

[138] Google Scholar

[ref49] 49. Piccini C, Conde D, Alonso C, Sommaruga R, Pernthaler J (2006) Blooms of single bacterial species in a coastal lagoon of the southwestern Atlantic Ocean. Appl Environ Microbiol 72: 6560–6568.
View Article
Google Scholar

[140] View Article

[141] Google Scholar

[ref50] 50. Giovannoni SJ, Tripp HJ, Givan S, Podar M, Vergin KL, et al. (2005) Genome streamlining in a cosmopolitan oceanic bacterium. Science 309: 1242–1245.
View Article
Google Scholar

[143] View Article

[144] Google Scholar

[ref51] 51. Parks DH, Beiko RG (2010) Identifying biologically relevant differences between metagenomic communities. Bioinformatics 26: 715–721.
View Article
Google Scholar

[146] View Article

[147] Google Scholar

[ref52] 52. Fisher T (1979) Plankton and primary production in aquatic systems of the Central American Basin. Comparative Biochemistry and Physiology 62A: 31–38.
View Article
Google Scholar

[149] View Article

[150] Google Scholar

[ref53] 53. Hedges J, Cowie G, Richey J, Quay P, Benner R, et al. (1994) Origins and Processing of Organic Matter in the Amazon River as Indicated by Carbohydrates and Amino Acids. Limnology and Oceanography 39: 743–761.
View Article
Google Scholar

[152] View Article

[153] Google Scholar

[ref54] 54. Richey J, Hedges J, Devol A, Quay P, Victoria R, et al. (1990) Biogeochemistry of Carbon in the Amazon River. Limnology and Oceanography 35: 352–371.
View Article
Google Scholar

[155] View Article

[156] Google Scholar

[ref55] 55. Vogt T (2010) Phenylpropanoid biosynthesis. Mol Plant 3: 2–20.
View Article
Google Scholar

[158] View Article

[159] Google Scholar

[ref56] 56. Narbad A, Gasson MJ (1998) Metabolism of ferulic acid via vanillin using a novel CoA-dependent pathway in a newly-isolated strain of Pseudomonas fluorescens. Microbiology 144(Pt 5): 1397–1405.
View Article
Google Scholar

[161] View Article

[162] Google Scholar

[ref57] 57. Merkens H, Beckers G, Wirtz A, Burkovski A (2005) Vanillate metabolism in Corynebacterium glutamicum. Curr Microbiol 51: 59–65.
View Article
Google Scholar

[164] View Article

[165] Google Scholar

[ref58] 58. Autry AR, Fitzgerald JW (1990) Sulfonate-S - A major form of forest soil organic sulfur. Biology and Fertility of Soils 10: 50–56.
View Article
Google Scholar

[167] View Article

[168] Google Scholar

[ref59] 59. Hatamoto O, Watarai T, Kikuchi M, Mizusawa K, Sekine H (1996) Cloning and sequencing of the gene encoding tannase and a structural study of the tannase subunit from Aspergillus oryzae. Gene 175: 215–221.
View Article
Google Scholar

[170] View Article

[171] Google Scholar

[ref60] 60. Antoine R, Jacob-Dubuisson F, Drobecq H, Willery E, Lesjean S, et al. (2003) Overrepresentation of a Gene Family Encoding Extracytoplasmic Solute Receptors in Bordetella. Journal of Bacteriology 185: 1470–1474.
View Article
Google Scholar

[173] View Article

[174] Google Scholar

[ref61] 61. Fischer M, Zhang QY, Hubbard RE, Thomas GH (2010) Caught in a TRAP: substrate-binding proteins in secondary transport. Trends Microbiol 18: 471–478.
View Article
Google Scholar

[176] View Article

[177] Google Scholar

[ref62] 62. Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, et al. (2008) The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9: 386.
View Article
Google Scholar

[179] View Article

[180] Google Scholar

[ref63] 63. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
View Article
Google Scholar

[182] View Article

[183] Google Scholar

[ref64] 64. Huson DH, Auch AF, Qi J, Schuster SC (2007) MEGAN analysis of metagenomic data. Genome Res 17: 377–386.
View Article
Google Scholar

[185] View Article

[186] Google Scholar

[ref65] 65. Cole JR, Wang Q, Cardenas E, Fish J, Chai B, et al. (2009) The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37: D141–145.
View Article
Google Scholar

[188] View Article

[189] Google Scholar

[ref66] 66. Rho M, Tang H, Ye Y (2010) FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res 38: e191.
View Article
Google Scholar

[191] View Article

[192] Google Scholar

Metagenomics of the Water Column in the Pristine Upper Course of the Amazon River

Metagenomics of the Water Column in the Pristine Upper Course of the Amazon River

Correction

Figures

Abstract

Introduction

Results and Discussion

Sample Description

Comparative characteristics of freshwater metagenomic datasets

Microbiota of the Amazon

Signatures of heterotrophy

Concluding Remarks

Materials and Methods

Sample collection and processing

Annotation and assembly

Community structure using rRNA

Recruitment plots

Comparison with GOS dataset

Clustering with selected metagenomic datasets

Functional profile comparison

Pfam Domain analysis

Accession numbers

Supporting Information

Figure S1.

Figure S2.

Figure S3.

Figure S4.

Figure S5.

Table S1.

Table S2.

Table S3.

Table S4.

Table S5.

File S1.

Acknowledgments

Author Contributions

References