A global survey of arsenic-related genes in soil microbiomes

Dunivin, Taylor K.; Yeh, Susanna Y.; Shade, Ashley

doi:10.1186/s12915-019-0661-5

Research article
Open access
Published: 30 May 2019

BMC Biology volume 17, Article number: 45 (2019)

Abstract

Background

Environmental resistomes include transferable microbial genes. One important resistome component is resistance to arsenic, a ubiquitous and toxic metalloid that can have negative and chronic consequences for human and animal health. The distribution of arsenic resistance and metabolism genes in the environment is not well understood. However, microbial communities and their resistomes mediate key transformations of arsenic that are expected to impact both biogeochemistry and local toxicity.

Results

We examined the phylogenetic diversity, genomic location (chromosome or plasmid), and biogeography of arsenic resistance and metabolism genes in 922 soil genomes and 38 metagenomes. To do so, we developed a bioinformatic toolkit that includes BLAST databases, hidden Markov models and resources for gene-targeted assembly of nine arsenic resistance and metabolism genes: acr3, aioA, arsB, arsC (grx), arsC (trx), arsD, arsM, arrA, and arxA. Though arsenic-related genes were common, they were not universally detected, contradicting the common conjecture that all organisms have them. From major clades of arsenic-related genes, we inferred their potential for horizontal and vertical transfer. Different types and proportions of genes were detected across soils, suggesting microbial community composition will, in part, determine local arsenic toxicity and biogeochemistry. While arsenic-related genes were globally distributed, particular sequence variants were highly endemic (e.g., acr3), suggesting dispersal limitation. The gene encoding arsenic methylase arsM was unexpectedly abundant in soil metagenomes (median 48%), suggesting that it plays a prominent role in global arsenic biogeochemistry.

Conclusions

Our analysis advances understanding of arsenic resistance, metabolism, and biogeochemistry, and our approach provides a roadmap for the ecological investigation of environmental resistomes.

Background

Microbial communities drive global biogeochemical cycles through diverse functions. The biogeography of functional genes can help to predict and manage the influence of microbial communities on biogeochemical cycling [1]. These trait-based analyses require that the functional genes are well-characterized from both evolutionary and genetic perspectives [2]. The arsenic resistance and metabolism genes exemplify a suite of well-characterized functional genes that have consequences for biogeochemistry. Arsenic is a toxic metalloid that, upon exposure, can have negative effects on all life, including humans, livestock, and microorganisms. The toxicity and mobility of arsenic depend, in part, on its oxidation state: the trivalent arsenite is more mobile and more toxic than the pentavalent arsenate [3]. The toxicity of methylated arsenic species varies with oxidation state and number of methyl groups (monomethyl, dimethyl, trimethyl). Pentavalent methylarsenicals are progressively less toxic than inorganic arsenate, while trivalent methylarsenicals are progressively more toxic than inorganic arsenite, with the exception of trimethylarsine, which is the least toxic arsenic species [4, 5]. Additionally, volatilization of arsenic can occur through methylation [6], which has varied impacts. Methylated forms of arsenic can be released to new areas through air [7], captured during bioremediation [8], or accumulate in crops such as rice [9]. Microbial transformations of arsenic can have consequences for arsenic speciation and methylation; therefore, they impact arsenic ecotoxicity and the fate of arsenic in the environment.

Arsenic biogeochemical cycling by microbial communities is both an ancient [10, 11] and a contemporary [3, 12] phenomenon. Changes to the methylation or oxidation state of arsenic alter biogeochemical cycling of arsenic, and microbes have evolved a variety of mechanisms to carry out these functions. Arsenic-related genes are generally separated into two categories: resistance and metabolism [13]. Arsenic resistance, or detoxification, is encoded by the ars operon [14]. The ars operon protects the cell from arsenic but does not detoxify arsenic itself in the environment. This operon includes arsenite efflux (ArsB, Acr3), which is potentially preceded by cytoplasmic arsenate reduction with either glutaredoxin (ArsC (grx)) or thioredoxin (ArsC (trx)) [14]. Arsenic metabolisms include methylation (ArsM), oxidation (AioAB, ArxAB), and dissimilatory reduction (ArrAB) [13]. While these genetic determinants of arsenic detoxification and metabolism are well-characterized, the full scope of arsenic detoxification and metabolism gene distribution, diversity, and interspecies transfer is unknown [15,16,17].

Microbial arsenic resistance is reportedly widespread in the environment. Arsenic-resistant organisms have been found in sites with low arsenic concentrations (< 7 ppm) [18, 19], and it has been speculated that nearly all organisms have arsenic resistance genes [20]. While the number of identified microorganisms with arsenic resistance genes continues to grow [13], the number of microorganisms without arsenic resistance genes is unclear. Furthermore, though the complete arsenic biogeochemical cycle has been detected in the environment [10], the relative contributions of genes encoding detoxification and metabolism remain unknown [11]. A global, biogeographic perspective of environmental arsenic-related genes would improve understanding of their ecology. This information would expand foundational knowledge of arsenic detoxification and metabolism, including local and global abundances, gene diversity, dispersal across different environments, and representations over the microbial tree of life.

Knowledge gaps concerning the diversity of microbial arsenic-related genes are driven, in part, by numerous inconsistencies in nomenclature and detection methods. Though public microbial metagenome and genome data continue to surge, there are several practical hurdles to achieving a robust, global assessment of microbial arsenic-related genes from this wealth of data. First, tools to detect these genes rely on imperfect annotation [15] and widely vary in nomenclature [21]. Next, the use of different reference databases [12, 22,23,24,25] and normalization techniques [25, 26] complicates comparisons between studies. To overcome these hurdles, we developed an open-access toolkit to examine arsenic resistance and metabolism genes in microbial sequence datasets. This toolkit allowed us to probe genomic and metagenomic datasets simultaneously to investigate arsenic-related genes in soil microbiomes. We first asked whether arsenic-related genes are universal in soil-associated microorganisms. Next, we tested the hypothesis that genes encoding arsenic detoxification are more abundant than those encoding arsenic metabolism. We also tested the hypothesis that arsenic resistance genes with redundant function (i.e., acr3 and arsB; arsC (grx) and arsC (trx)) would have complementary environmental abundances. We then asked whether estimations of arsenic-related gene abundance are biased by cultivation efforts, as cultivation is often a research emphasis because cultivable, arsenic-resistant microorganisms can be used in bioremediation [17]. Finally, we tested the hypothesis that sequence variants of arsenic-related genes are endemic, not cosmopolitan.

Results

A bioinformatic toolkit for detecting and quantifying arsenic-related genes

We developed a toolkit to improve investigations of microbial arsenic-related genes (Fig. 1a, b) [14, 31,32,33,34,35]. We selected nine genes (acr3, aioA, arsB, arsC (grx), arsC (trx), arsD, arsM, arrA, and arxA) because they are markers of arsenic detoxification and metabolism [21, 25] and because their genetic underpinnings are well established. Seed sequences (high-quality and full-length sequences) for each gene of interest were collected and used to construct BLAST databases [30], functional gene (FunGene) databases [27], hidden Markov models (HMMs [36]), and gene resources for gene-targeted assembly (Xander [28]) (Fig. 1a). Altogether, this toolkit relies on consistent references and nomenclature and can search both amino acid and nucleotide sequence data.

To demonstrate the utility of our toolkit, we performed an analysis of arsenic-related genes in soil-associated genomes and metagenomes. We used HMMs for marker genes for arsenic detoxification and metabolism to search RefSoil+ genomes, a set of complete chromosomes and plasmids from cultivable soil microorganisms [37]. Additionally, we used a gene-targeted assembler [28] to test 38 public soil metagenomes from Brazil, Canada, Malaysia, Russia, and the USA for arsenic resistance and metabolism genes (Additional file 1). Ultimately, these data serve as a broad baseline of arsenic detoxification and metabolism genes in soil.

Phylogenetic distributions and genomic locations of arsenic-related genes

We asked whether arsenic resistance and metabolism genes were universal in RefSoil+ organisms [37]. Of the 922 RefSoil+ genomes spanning 25 phyla (Fig. 2b; Additional file 2), 14.3% (132 genomes) did not contain any tested arsenic-related genes. Of the 25 phyla in RefSoil+, two phyla (Chlamydiae and Crenarchaeota) did not have any of these genes. These phyla, however, had few RefSoil+ representatives (three and nine, respectively), so other members of these phyla may have arsenic detoxification and metabolism genes. Supporting this hypothesis, a Crenarchaeota isolate was previously reported to oxidize arsenic [38]. Nonetheless, these data suggest that arsenic-related genes are widespread, but not universal, even among cultivable soil organisms (Fig. 2).
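The proportion reported above follows from a per-genome presence/absence table. The sketch below is illustrative only: the genome identifiers, the toy hits, and the function name fraction_without_genes are hypothetical and are not taken from the toolkit. The final line recomputes the reported percentage from the two counts given in the text (132 of 922 genomes).

```python
from typing import Dict, Set


def fraction_without_genes(genome_hits: Dict[str, Set[str]]) -> float:
    """Return the fraction of genomes in which no tested gene was detected."""
    if not genome_hits:
        raise ValueError("no genomes supplied")
    empty = sum(1 for genes in genome_hits.values() if not genes)
    return empty / len(genome_hits)


# Hypothetical genome IDs mapped to the arsenic-related genes detected in each.
toy_hits = {
    "genome_A": {"acr3", "arsC (grx)"},
    "genome_B": set(),
    "genome_C": {"arsM"},
    "genome_D": set(),
}
toy_fraction = fraction_without_genes(toy_hits)  # 2 of 4 toy genomes have no hits

# Counts reported in the text: 132 of 922 RefSoil+ genomes had no tested gene.
reported_percent = round(100 * 132 / 922, 1)
```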

We next asked whether 16S rRNA gene phylogeny was predictive of arsenic genotypes using a test for phylogenetic signal (Blomberg’s K [39]). No phylogenetic signal was observed for plasmid-borne sequences or genes encoding arsenic metabolisms (aioA, arrA, arxA); however, relatively few RefSoil+ microorganisms tested positive for these genes. Despite their phylogenetic breadth (Additional files 3, 4, 5, 6, and 7), chromosomally encoded acr3, arsB, arsC (grx), arsC (trx), and arsM were similar between phylogenetically related organisms (false discovery rate adjusted p < 0.01; Fig. 2a).
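Because one test was run per gene, the p-values above were adjusted for false discovery rate. This section does not state which adjustment procedure was used; the sketch below assumes the Benjamini-Hochberg procedure, a common choice, and uses hypothetical p-values rather than values from this study.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases (monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, one per gene tested (not values from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]
toy_adjusted = benjamini_hochberg(toy_p)
```

A gene would then be reported as showing phylogenetic signal only if its adjusted value falls below the chosen threshold (0.01 in the text).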

Phylogenetic diversity of arsenic-related genes: insights into vertical and horizontal transfer

Arsenite efflux pumps

We examined the phylogenetic diversity of distinct genes encoding arsenite efflux pumps, acr3 and arsB, for soil-associated microorganisms (Fig. 3, Additional files 3 and 4). Gene acr3 is separated into two clades: acr3(1) and acr3(2) [40]. Clade acr3(1) is typically composed of Proteobacterial sequences while acr3(2) is typically composed of Firmicutes and Actinobacterial sequences [21, 40, 41]. Though RefSoil+ genomes were mostly composed of acr3(2) sequences from Proteobacteria (Fig. 3a; Additional file 3), we observed greater taxonomic diversity than previously reported for this clade [21, 40, 41]. Surprisingly, there were deep branches in acr3(2) that belonged to Bacteroidetes, Euryarchaeota, Firmicutes, Fusobacteria, and Verrucomicrobia. Similarly, acr3(1) contained closely related acr3 sequences present in a diverse array of phyla (10 out of 25). Both clades had sequences present on plasmids (6.1%). Plasmid-borne arsB sequences were only present in Proteobacteria and Deinococcus-Thermus strains (Fig. 3b; Additional file 4). Sequences from Actinobacteria, Proteobacteria, and Firmicutes were each present in two distinct phylogenetic groups, and previous studies also observed separation of arsB sequences based on phylum [40, 41]. Interestingly, our genome-centric analysis revealed that microorganisms with multiple copies of arsB did not harbor identical copies. For example, seven Bacillus subtilis subsp. subtilis strains had two copies of arsB, with one from each of the two clades (Additional file 4).

Cytoplasmic arsenate reductases

Cytoplasmic arsenate reductase (ArsC (trx)) was phylogenetically widespread in RefSoil+ microorganisms (Fig. 4a; Additional file 5). While some arsC (trx) sequences were plasmid-borne, the majority were chromosomally encoded. Similarly, plasmid-encoded arsC (grx) made up 4.6% of RefSoil+ hits (Fig. 4b; Additional file 6). Notably, several Proteobacteria strains have multiple copies of arsC (grx) with distinct sequences. It is possible that this is the result of an early gene duplication event or HGT of a second arsC (grx).

Arsenic metabolisms

arsM was relatively uncommon in RefSoil+ microorganisms (5.2%) (Fig. 2). In the RefSoil+ database, arsM was observed in Euryarchaeota as well as several bacterial phyla: Acidobacteria, Actinobacteria, Armatimonadetes, Bacteroidetes, Chloroflexi, Cyanobacteria, Firmicutes, Gemmatimonadetes, Nitrospirae, Proteobacteria, and Verrucomicrobia (Fig. 5; Additional file 7). Notably, only one RefSoil+ microorganism, Rubrobacter radiotolerans (NZ_CP007516.1), had a plasmid-borne arsM.

Arsenic metabolism genes aioA, arrA, and arxA were phylogenetically conserved (Fig. 6). Genes encoding arsenite oxidases aioA and arxA were restricted to Proteobacteria. aioA sequences clustered into two clades based on class-level taxonomy: all Alphaproteobacteria sequences clustered separately from Gamma- and Betaproteobacteria sequences. The gene encoding dissimilatory arsenate reduction arrA was also phylogenetically conserved in RefSoil+ strains, with strains from Proteobacteria clustering separately from Firmicutes (Fig. 6).

Cultivation bias and environmental distributions of arsenic-related genes

To gain a cultivation-dependent perspective of the abundances of arsenic-related genes in soils, we used inferred environmental abundances of RefSoil microorganisms [42, 43]. The environmental abundance of RefSoil microorganisms, which are cultivable, soil-associated microorganisms, was previously estimated by comparing 16S rRNA gene sequences in RefSoil with those in soil metagenomes [42]. We used this estimated abundance of cultivable microorganisms along with arsenic-related gene information from this study (Fig. 2) to estimate the environmental abundances of arsenic-related genes from the cultivated bacteria. Arsenic metabolism genes (aioA, arrA, arsM, arxA) were predicted to be less common in the environment compared with arsenic detoxification genes (acr3, arsB, arsC (grx), arsC (trx), and arsD) (Fig. 7a; Mann-Whitney U test p < 0.01). Despite similar distributions of acr3 and arsB in RefSoil+ (Fig. 2b), acr3 was more abundant in most soil orders (Fig. 7a; Mann-Whitney U test p < 0.05). For genes encoding cytoplasmic arsenate reductases, arsC (grx) was more abundant than arsC (trx) (Mann-Whitney U test p < 0.01).
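One simple way to combine organism abundances with gene presence, as described above, is an abundance-weighted sum of gene copies. The sketch below is an assumption about the general shape of such a calculation, not the authors' exact procedure; the organism names, abundances, copy numbers, and the function name inferred_gene_abundance are all hypothetical.

```python
from typing import Dict


def inferred_gene_abundance(
    organism_abundance: Dict[str, float],
    gene_copies: Dict[str, Dict[str, int]],
) -> Dict[str, float]:
    """Sum each gene's copy number weighted by its host's relative abundance."""
    totals: Dict[str, float] = {}
    for organism, abundance in organism_abundance.items():
        # Organisms with no detected genes contribute nothing.
        for gene, copies in gene_copies.get(organism, {}).items():
            totals[gene] = totals.get(gene, 0.0) + abundance * copies
    return totals


# Hypothetical relative abundances of three cultivable organisms in one soil.
toy_abundance = {"org_1": 0.5, "org_2": 0.25, "org_3": 0.25}
# Hypothetical gene copy numbers per genome; org_3 has no detected genes.
toy_copies = {
    "org_1": {"acr3": 1},
    "org_2": {"acr3": 1, "arsB": 2},
}
toy_totals = inferred_gene_abundance(toy_abundance, toy_copies)
```

Estimates of this kind inherit any error in the 16S rRNA gene-based abundance estimates and cover only cultivated organisms, which is why the cultivation-independent analysis is presented alongside them.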

To gain a cultivation-independent perspective of the abundances of arsenic-related genes, we examined their normalized abundance from soil metagenomes (Fig. 7b). An undetected gene does not confirm absence, so we present a conservative estimate that only includes metagenomes testing positive for a gene. Arsenic detoxification genes (acr3, arsB, arsC (grx), arsC (trx), and arsD) were more abundant than arsenic metabolism genes (aioA, arrA, arsM, and arxA) (Mann-Whitney U test p < 0.01; Fig. 7b). Genes encoding arsenite efflux pumps differed in their abundance with acr3 being more abundant than arsB (Mann-Whitney U test p < 0.01). We also observed differences in cytoplasmic arsenate reductases: arsC (grx) was more abundant than arsC (trx) (Mann-Whitney U test p < 0.01).
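The group comparisons above use the Mann-Whitney U test, a rank-based test that makes no assumption of normality. The sketch below computes the U statistic with midranks for ties and a two-sided p-value from the normal approximation with continuity correction. It is a minimal illustration, not the software used in the study; the abundance values are hypothetical, and for small samples an exact test from a statistics package would be preferable.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted(
        (value, group) for group, sample in enumerate((x, y)) for value in sample
    )
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Variance of U with the standard correction for ties.
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = max((abs(u_x - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    return u_x, min(math.erfc(z / math.sqrt(2)), 1.0)


# Hypothetical normalized abundances for two gene groups (not study data).
toy_detox = [0.31, 0.28, 0.40, 0.35, 0.29]
toy_metabolism = [0.05, 0.11, 0.02, 0.08, 0.30]
toy_u, toy_p = mann_whitney_u(toy_detox, toy_metabolism)
```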

We explored cultivation bias of arsenic-related genes with a case study comparing cultivation-dependent (lawn growth on the standard medium TSA50) and cultivation-independent communities from the same soil. Genes in the ars operon (acr3, arsB, arsD, and arsC (trx)) were elevated in the cultivation-dependent metagenome (Fig. 7c). Additionally, arsenic metabolism genes were either not detected (aioA, arrA, arxA) or present in low abundance (arsM) in the cultivation-dependent sample; however, all four of these arsenic metabolism genes were detected in the cultivation-independent sample. Though this is a single-case study of cultivation-dependent and cultivation-independent methods, these results recapitulate the general discrepancies between RefSoil+ genomes and soil metagenomes (Fig. 7b). This bias has important implications for studies focusing on arsenic bioremediation because cultivation-dependent studies could misestimate the potential of microbiomes for arsenic detoxification and metabolism in situ.

Arsenic-related gene endemism

Arsenic-related genes are globally distributed, but their biogeography is poorly understood. Broadly, arsenic-related genes had comparable abundance among different soils (Fig. 7a, b). The relative distributions of distinct arsenic detoxification and metabolism mechanisms in one site, however, are relevant for predicting the impact of microbial communities on the fate of arsenic. To understand site-specific distributions, we explored soil metagenomes from Brazil, Canada, Malaysia, Russia, and the USA (Additional file 1). These 16 sites had differences in community membership (Additional file 9) and arsenic-related gene content (Fig. 8a). Geographic location was not predictive of arsenic-related gene content (Mantel’s r = 0.03493; p > 0.05). Soils had different distributions of arsenic-related genes and therefore differed in their potential impact on the biogeochemical cycling of arsenic. While arsC (grx) and arsM dominated most samples, their relative proportions varied greatly (Fig. 8a). RefSoil+ data suggest that arsM can be found in Verrucomicrobia (100%, n = 2), which is of particular importance for soil metagenomes since Verrucomicrobia are often underestimated with cultivation-dependent methods [44]. The mangrove sample had the most even proportions of arsenic-related genes (Fig. 8a). This distribution was driven by a high abundance of arsC (trx) and arrA.
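The Mantel statistic reported above measures the correlation between two distance matrices (here, geographic distance and dissimilarity in arsenic-related gene content), with significance assessed by permuting site labels. The sketch below is a minimal permutation Mantel test using Pearson correlation and a one-sided p-value. The distance values, the permutation count, and the seed are hypothetical choices for illustration; the distance measures and settings used in the study are not given in this section.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _upper_triangle(matrix: Matrix, order: Sequence[int]) -> List[float]:
    """Flatten the upper triangle after relabelling rows and columns by order."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def mantel_test(
    dist_a: Matrix, dist_b: Matrix, permutations: int = 999, seed: int = 0
) -> Tuple[float, float]:
    """Return (Mantel r, one-sided permutation p-value)."""
    identity = list(range(len(dist_a)))
    fixed = _upper_triangle(dist_b, identity)
    observed = _pearson(_upper_triangle(dist_a, identity), fixed)
    rng = random.Random(seed)
    at_least_as_large = 1  # the observed labelling counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute site labels of the first matrix only
        if _pearson(_upper_triangle(dist_a, order), fixed) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)


# Hypothetical distances among four sites (not study data).
toy_geo = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
toy_gene = [
    [0.0, 0.2, 0.9, 0.4],
    [0.2, 0.0, 0.5, 0.7],
    [0.9, 0.5, 0.0, 0.3],
    [0.4, 0.7, 0.3, 0.0],
]
toy_r, toy_p = mantel_test(toy_geo, toy_gene)
```

An r near zero with a large p-value, as reported in the text, indicates that sites closer together were not detectably more similar in gene content than sites farther apart.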

We further examined the arsenic resistance gene abundance at individual sites. We did not include arr and arx in this analysis due to limited available data. For each gene, the abundance varied greatly, but replicates within one site had similar abundances (Fig. 8b). The majority of arsenic-related gene sequences (99.3%) were endemic and only found in one to two sites, but 24 sequences were detected in three or more sites (Fig. 8c; Additional file 10). The majority (70.8%) of cosmopolitan sequences belonged to arsC (grx). This analysis suggests that arsenic-related genes acr3, arsB, arsC (trx), arsD, arsM, and aioA are generally endemic.
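The endemic versus cosmopolitan split described above depends only on the number of distinct sites in which each sequence variant was detected. The sketch below applies the thresholds stated in the text (one to two sites is endemic; three or more is cosmopolitan) to hypothetical sequence and site identifiers; the function name and input layout are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def classify_by_site_count(
    observations: Iterable[Tuple[str, str]], max_endemic_sites: int = 2
) -> Dict[str, str]:
    """Label each sequence by how many distinct sites it was detected in."""
    sites_per_sequence: Dict[str, Set[str]] = defaultdict(set)
    for sequence_id, site in observations:
        sites_per_sequence[sequence_id].add(site)  # replicates collapse per site
    return {
        sequence_id: "endemic" if len(sites) <= max_endemic_sites else "cosmopolitan"
        for sequence_id, sites in sites_per_sequence.items()
    }


# Hypothetical (sequence, site) detections; repeated pairs mimic replicates.
toy_observations = [
    ("acr3_var1", "site_A"),
    ("acr3_var1", "site_A"),
    ("arsC_grx_var7", "site_A"),
    ("arsC_grx_var7", "site_B"),
    ("arsC_grx_var7", "site_C"),
    ("arsM_var3", "site_B"),
    ("arsM_var3", "site_D"),
]
toy_labels = classify_by_site_count(toy_observations)
toy_endemic_fraction = sum(
    1 for label in toy_labels.values() if label == "endemic"
) / len(toy_labels)
```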

Discussion

A bioinformatic toolkit for detecting and quantifying arsenic-related genes

We developed a toolkit for detecting arsenic-related genes from sequence data that supports a variety of applications (Fig. 1a): arsenic-related genes can be detected in amino acid sequences from completed genomes (HMMs [29], BLAST [30]), nucleotide sequences in draft genomes (BLAST), and metagenomes and metatranscriptomes (Xander [28]). Because each tool relies on the same seed sequences, there is consistency and opportunity for comparison between sequence datasets that were generated from different sources. Primers already exist for several arsenic-related genes, namely aioA [45, 46], acr3 [41], arsB [41], arsC (grx) [47], arsC (trx) [48], arsM [9], and arrA [49,50,51]. The FunGene [27] databases complement these primers: they can be used for testing primer breadth, designing new primers, and browsing sequences.

The toolkit is scalable for additional mechanisms for arsenic resistance and other functional genes of interest (e.g., methylarsenite oxidase (ArsH), C-As lyase (ArsI), trivalent organoarsenical efflux permease (ArsP), organoarsenical efflux permease (ArsJ) [20]), or redox transformations of elements involved in arsenic biogeochemical cycling (e.g., nitrate reductase (NarG) and sulfate reductase (DsrAB) [3, 20]). This toolkit serves as both a resource and an example workflow for developing similar toolkits to examine functional genes, beyond arsenic-related genes, in microbial sequence datasets.

Phylogenetic diversity and distribution of arsenic-related genes

It has been conjectured that nearly all organisms have arsenic resistance genes [20], and though this assumption has propagated in the literature, it had never been explicitly quantified. Our data suggest that arsenic detoxification and metabolism genes are widespread, but not universal, in RefSoil+ microorganisms (Fig. 2). It is possible that the 132 organisms without detected genes have untested or novel arsenic-related genes; nonetheless, these nine well-characterized genes were not universally detected. Additionally, phylogeny was predictive of the presence of acr3, arsB, arsC (grx), arsC (trx), and arsM. This correlation suggests that taxonomy is predictive of arsenic genotype despite documented potential for HGT [19, 40, 48, 52, 53]. This result could be explained by ancient rather than contemporary HGT, as seen with arsM [53] and arsC (grx) [48]. Therefore, we next assessed evidence for HGT by examining the phylogenetic congruence and genomic location (e.g., chromosome or plasmid) of arsenic-related gene sequences.

Horizontal transfer of arsenic-related genes has been well documented [19, 40, 48, 52,53,54,55] and is an important consideration for understanding the propagation and taxonomic identity of arsenic-related genes. We examined the phylogenetic diversity of arsenic-related genes in RefSoil+ microorganisms, including plasmids and chromosomes, and compared them with the 16S rRNA gene taxonomy.

Efflux pumps

While known acr3 sequences separate into two clades [21, 40, 41], plasmid-borne acr3 sequences were present across clades, suggesting a potential for transfer across unrelated taxa. Therefore, studies assigning taxonomy to acr3 in the absence of host information should take clade membership into account and proceed with caution. Despite their functional redundancy as arsenite efflux pumps, acr3 and arsB differ markedly in their diversity. As compared with acr3, arsB was less diverse and more phylogenetically conserved (Fig. 3b; Additional file 4). This observation is in agreement with previous reports comparing the diversity of arsB to acr3 [40, 41]. Multiple, phylogenetically distinct copies of arsB were present in some RefSoil+ organisms, which could be due to an early gene duplication and subsequent diversification or to an early transfer event. Therefore, despite relatively lower sequence variation, this arsB phylogeny suggests an interesting evolutionary history that could be investigated further.