Overview of experiment design
The synthetic microbial community DNA (sequins), which consists of 83 artificial sequences with the proportion of mass varying from 0.01 to 6.9% (Additional file 1: Table S1), was serially diluted to generate positive samples with 5000 pg, 500 pg, 50 pg, 5 pg, and 0.5 pg DNA as well as negative controls. The samples were then subjected to WGA-based and non-WGA-based library constructions. For the latter, libraries were prepared using five commercial kits capable of handling low-biomass samples, including two kits applying sonication-based fragmentation, i.e., Nugen Ovation Ultralow System V2 Kit (Son_N) and QIAGEN QIAseq Ultralow Input Library Kit (Son_Q); two kits applying endonuclease digestion, i.e., NEBNext Ultra II FS DNA Library Prep Kit (End_N) and QIAGEN QIAseq FX DNA Library Kit (End_Q); and one kit applying Tn5 tagmentation, i.e., Vazyme TruePrep DNA Library Prep Kit (Tn5_V) (Additional file 1: Table S2). Two experimental replicates were performed by different operators to evaluate the reproducibility of the methods. Shotgun sequencing was performed on all libraries, obtaining a median of 16.6 million reads per library (range 7.3–23.2 million, Additional file 2: Fig. S1a). The data from each library was separated into two parts, the synthetic sequins and the contaminants, which were used to evaluate library construction methods and in silico decontamination methods (Fig. 1, see rarefaction curves in Additional file 2: Fig. S1b and c).
WGA substantially changed the microbial composition with sub-nanogram templates
As expected, WGA elevated the DNA amount from 0.5–5000 pg to 18–34 μg. However, the proportion of reads originating from sequins was less in the WGA-based libraries than that in the non-WGA-based libraries when the template DNA was 50 pg or lower (median 86.0% vs. 99.2%, p < 0.01, Wilcoxon test). Meanwhile, the non-designated reads, including microbial and non-microbial contaminating reads and unclassified reads, increased markedly with the reduced amount of template DNA for WGA (Additional file 2: Fig. S2a). Besides, compared to the non-WGA-based libraries, the composition of sequins estimated from the WGA-based libraries was more distorted, especially with low template input, reflected by larger Jensen-Shannon distance (JSD) to the theoretical composition of sequins (p < 0.01, Wilcoxon test, Fig. 2a), poorer correlation between the input mass of each sequins component and corresponding read numbers (Fig. 2b), as well as forming a different cluster from theoretical sequins and non-WGA libraries on the principal coordinates analysis (PCoA) plot and heatmap (Additional file 2: Fig. S2b and c). In addition, the JSD between the sequins composition of experimental replicates of WGA-based libraries was greater than that of non-WGA-based libraries, indicating poor reproducibility of WGA in terms of component enrichment (Additional file 2: Fig. S2d).
Interestingly, we found that the fragment length was positively correlated with the magnitude of WGA enrichment bias (p < 0.001, linear regression, Fig. 2c), which was represented by the ratio between the abundance of each sequins component in WGA-based libraries and its mean abundance in non-WGA-based libraries. It suggested that the efficiency of WGA might be higher for longer template DNA. This is further supported by Nanopore sequencing data from a sputum sample, where the read lengths of the two dominant taxa rose from 323 to 2703 bp and from 381 to 3606 bp following WGA (Fig. 2d) respectively for Acinetobacter baumannii and Corynebacterium striatum. Meanwhile, the relative abundance of C. striatum, whose DNA template was longer, increased from 15.2 to 86.5% when WGA was implemented (Fig. 2e). Furthermore, we found that WGA treatment led to uneven genome coverage, particularly for 5 pg and 0.5 pg templates (Additional file 2: Fig. S2e and f) and that WGA biased toward DNA fragments with lower GC content (Additional file 2: Fig. S2g and h), which was in line with previous reports [5,6,7].
Collectively, WGA introduced a substantial amount of contaminants and showed poor fidelity for sub-nanogram templates, making it unsuitable for metagenomic profiling of low-biomass samples.
Evaluation of non-WGA-based DNA library preparation methods
We next evaluated the quality of libraries prepared using sub-nanogram templates without the implementation of WGA. First, the proportion of reads originating from sequins were higher than 90% for all methods using 500 pg and 5000 pg templates (94.81–99.77% with median 98.68%), but decreased dramatically when the template load became lower except for Tn5_V, with 93.7%, 78.21%, 66.29%, 1.64%, and 0.06% of total reads assigned to sequins for Tn5_V, End_N, End_Q, Son_N, and Son_Q, respectively, when the template load was 0.5 pg (Fig. 3a). Second, the library conversion rate (the rate between actual library quantity and theoretical library quantity) was the highest using End_N and End_Q, followed by Tn5_V, Son_N, and Son_Q (Fig. 3b). Third, the sonication-based libraries had lower complexity (i.e., higher duplication rate, the ratio of the number of duplicates over the number of total mapped reads) than the other methods (Fig. 3c). Moreover, the insertion size of library constructed with Tn5_V showed the most optimal distribution with a peak range of 292–371 bp, which was probably attributed to the stricter size selection applied in the protocol. In contrast, End_Q, End_N, and Son_Q showed smaller insertion sizes (peak 193–244 bp, 143–210 bp, 139–349 bp, respectively), while Son_N showed a larger insertion size with higher variance (range 181–634 bp, Additional file 2: Fig. S3a).
To test the fidelity of various library preparation methods, we calculated the JSD between the expected sequins composition and the measured profiles, which included both sequins and contaminating microbes. The endonuclease-based data showed the least deviation from the expected composition, followed by the tagmentation-based data, whereas the sonication-based data showed significantly higher deviation for samples with 50 pg or less input. After removing contaminants and only comparing the composition of sequins, we found similar results, with the exception that the performance of Son_N significantly improved (Fig. 3d). The fidelity might be associated with GC content bias, as there was a considerable shift of GC distribution of the reads toward lower GC values compared to the theoretical distribution, especially for the tagmentation-based method (Additional file 2: Fig. S3b). Besides, there was a weak positive correlation between the abundance bias (ratio between measured abundance and the theoretical value) and fragment length for all methods (p values < 0.001, linear regression), and the tagmentation-based method showed the highest coefficient of determination (R2 = 0.14, Additional file 2: Fig. S3c). Meanwhile, no correlation was observed between the length and GC content of sequins components (p = 0.76), suggesting that the fragment length may independently influence the fidelity. In addition, the reproducibility of all methods decreased gradually with reduced input, and endonuclease- and tagmentation-based methods performed better than sonication-based methods (Fig. 3e).
Summarizing three crucial metrics, i.e., fidelity of designated reads, the proportion of designated reads, and reproducibility, when there was 5000 or 500 pg input material, the performance of all methods was equally well except that the tagmentation-based method had lower fidelity; when there was 50 or 5 pg input, endonuclease-based methods showed better and more balanced performance on all three metrics than other methods; when there was 0.5 pg input, the tagmentation-based method and one endonuclease-based method (Enz_N) performed better than the other endonuclease-based method (Enz_Q), followed by sonication-based methods (Fig. 3f). Of note, the tagmentation-based method had a significantly higher proportion of designated reads than other methods with 0.5 pg input, suggesting its attractive advantage on samples with very low biomass.
Characteristics of contaminants introduced by library preparation
Contaminating reads originated from background DNA increased as the input load decreased, and the microbial reads accounted for 3.06–45.97% of total reads when the input load was 0.5 pg (Fig. 3a). The PERMANOVA analysis revealed that library preparation kits explained the largest variance in contaminating microbial composition (R2 = 36.4%, p < 0.001, Additional file 2: Fig. S4a), suggesting that a large fraction of the contaminants was kit specific. Surprisingly, input amount appeared as the second significant explaining factor in the PERMANOVA analysis (R2 = 15.6%, p < 0.001), suggesting that there were some endogenous contaminants in the sequins. We found that the abundance (normalized to total contaminating reads) of five genera, i.e., Escherichia, Gammaretrovirus, Citrobacter, Mastadenovirus, and Shigella was positively correlated with the input amount (p.adj < 0.05 and R2 > 0.55 for at least four kits, linear regression, Additional file 2: Fig. S4b), suggesting that they mainly originated from the sequins material and were thus filtered out in the following analyses. The variance explained by the input amount in the PERMANOVA analysis after the filtration reduced to 8.7% (p < 0.001).
We quantified the absolute amount of contaminating DNA using sequins as a spike-in control. Sonication-based methods had the highest amount of contaminants, with 15.59 pg for Son_Q (median) and 2.32 pg for Son_N, whereas other methods had fewer contaminants (0.05 pg, 0.04 pg, and 0.01 pg for End_N, End_Q, and Tn5_V, respectively, Fig. 4a). The most abundant contaminating genus of each kit had a relative abundance lower than 0.06% in libraries with 500 pg and 5000 pg input (Fig. 4b, Additional file 2: Fig. S4c), indicating that contamination was not a major concern when the input load was high. As the input decreased to 0.5 pg, the relative abundance of the highest contaminating genus increased to 21.4%, 14.2%, 1.3%, 1.4%, and 1.0% for Son_N, Son_Q, Tn5_V, End_N, and End_Q, respectively, and the proportion of remaining contaminants (out of all contaminating reads) was 91.1%, 87.1%, 76.0%, 65.9%, and 51.5% when setting a relative abundance threshold of 0.1% (Fig. 4b, Additional file 2: Fig. S4c).
Concerning the contaminating microbial composition for each kit, two sonication-based kits (Son_N and Son_Q), End_Q, End_N, and Tn5_V formed four distinct clusters on the PCoA plot (PERMANOVA R2 = 0.52, p < 0.001, Fig. 4c). The JSD between contaminating compositions of experimental replicates generally increased when the input amount was higher, suggesting greater randomness with a reduced contaminating fraction (Fig. 4d). We detected 494 core contaminating genera (with relative abundance higher than 0.1% in at least half of the libraries for each kit) that accounted for more than 89% of contaminating microbial reads in each kit (Fig. 4e, Additional file 1: Table S3). Of these genera, 110 have been found to be contaminants in previous studies [10,11,12,13,14, 21, 22]. Sixty-three core genera were shared by all five kits and accounted for more than 55% of contaminating reads in each kit. End_Q, Son_N, Tn5_V, and End_N had 120, 24, 21, and 12 unique contaminating genera respectively, while Son_Q had no unique ones (Fig. 4e). There were 32 high-abundance contaminants (greater than 1% in at least half of the libraries in each kit), which varied depending on the kit used (Fig. 4f). Among them, many genera are frequently identified in soil and water, e.g., Acinetobacter, Bradyrhizobium, and Methylobacterium. Of note, some genera are commensal microorganisms or potential pathogens that reside in human skin and mucosa, e.g., Cutibacterium, Staphylococcus, Corynebacterium, Acinetobacter, Streptococcus, and Klebsiella, with relative abundances up to 20.4%, 9.4%, 4.1%, 3.2%, 1.8%, and 0.3%, respectively, which could result in erroneous conclusions in human microbiome studies and pathogen detection. Finally, we found that the genome coverage of contaminating microbes was stochastic (Additional file 2: Fig. S4d), making it difficult to distinguish between contaminants and actual signals.
In silico decontamination methods recovered the real microbial community from high-quality libraries
Three types of most widely used decontamination methods, including filtering taxa with relative abundances less than fivefold or tenfold of that in negative controls (referred to as fivefold-NC and tenfold-NC), the Decontam method [17] (frequency mode, prevalence mode, and either mode), and the SourceTracker method [18], were employed to eliminate the impact of contamination. An optimum threshold for the Decontam method was determined as the one that resulted in the best recall with the premise of 100% precision (Additional file 2: Fig. S5a-e). True positives, false negatives, true negatives, and false positives were defined as contaminants accurately detected, contaminants missed, sequins detected as actual signals, and sequins misclassified as contaminants, respectively.
For Tn5_V, Enz_N, and Enz_Q libraries, which had a relatively low level of microbial contaminants (maximum < 10% of total reads, Fig. 3a), the JSDs between the decontaminated profile and the actual signal for Decontam-either, Decontam-frequency, and 5/tenfold-NC were small (< 0.05, Fig. 5a), with the most abundant contaminating genera having a relative abundance of 0.32% in the decontaminated profile, indicating that the above methods were able to recover the real microbial community from contaminated data. The recall for 5/tenfold-NC was close to 1, whereas the recall for Decontam-either and Decontam-frequency declined dramatically with the increased input load (Fig. 5b); however, there was little fidelity loss as the false-negative contaminants had low abundances.
For Son_Q and Son_N libraries that suffered heavy microbial contaminants (0.06–30.89% with a median of 16.69% and 0.06–45.97% with a median of 6.86%, respectively, Fig. 3a), Decontam-either, Decontam-frequency, and 5/tenfold-NC performed equally well when the input loads were 500 pg and 5000 pg, with the JSD between the decontaminated profile and the actual signal close to zero (Fig. 5a). However, the decontamination efficacy of the three methods decreased as the input load became lower despite of high recalls. Among them, Decontam-either was the most effective, followed by Decontam-frequency and 5/tenfold-NC (Fig. 5a, b). Interestingly, although Son_N and Son_Q libraries had similar levels of microbial contaminants when the input load was 0.5 pg, the decontamination methods performed much better on the former data (e.g., JSD for Decontam-either, 0.08 vs. 0.73, Fig. 5a), which could be due to significantly higher consistency of contaminants in Son_N libraries (Fig. 4d, e). Of note, although the best performing method Decontam-either removed at least 95.8% of the contaminating reads for Son_Q libraries with 0.5 pg and 5 pg inputs, there was still a large difference between the decontaminated profile and the actual signal (JSD > 0.5), highlighting the limitation of the current decontamination algorithms.
In addition, Decontam-prevalence and SourceTracker showed poor efficacy of decontamination for most libraries, owing to their low recalls and in part the low precisions of SourceTracker in some libraries (Fig. 5b, Additional file 2: Fig. S5f). Rarefying data to the same sequencing depth (7.3 million reads) did not change the above results.