
Table 2 Challenges of studying human gut virome and possible solutions

From: Studying the gut virome in the metagenomic era: challenges and perspectives

Steps

Challenges

Possible solutions

Nucleic acid extraction

• Existence of active and silent fractions of viromes

• Total nucleic acid isolation protocols (TNAI):

+ Allow characterization of the microbiome along with virome potential, giving a holistic picture of all components of the microbiome

+ High-throughput

– Inflate false-positive hits from bacteria in the subsequent data analysis

• Viral-like particle (VLP) isolation protocols:

+ Ensure true positives on viruses due to physical removal of bacteria by filtration

– Give a low-concentration output [79] that may complicate the genomic library preparation step

– Usually require multiple time-consuming steps of VLP and nucleic acid precipitation [78, 80]

• Combination of TNAI and VLP isolation protocol approaches [81]

Genomic library preparation

• Limited amount of viral genetic material available

• Use of more sensitive genomic library preparation kits

• Multiple displacement amplification (MDA) may lead to overrepresentation of circular ssDNA viruses [82] and underrepresentation of viruses with extreme GC content [83]

• Restricted use of MDA

• Studying RNA viruses requires additional effort due to the relative instability of RNA genetic material:

- Use of reverse transcriptase to convert RNA to cDNA

- Restricted usage of RNase in protocols handling both DNA and RNA viruses [84]

- May require a separate isolation protocol (a consequence of the previous point) and, therefore, more starting material

• Metatranscriptomics approaches

• Use of reverse transcription step

• Studying ssDNA viruses requires additional effort:

- Some of the whole-genome amplification (WGA) techniques that precede the genomic library preparation procedure might introduce biases into the representation of ssDNA viruses [77, 82, 85]

- The majority of current genomic library preparation procedures cannot handle ssDNA genomes due to the use of dsDNA adapters

- ssDNA viruses have been shown to have higher mutation rates than dsDNA viruses [86], which increases the microdiversity of the metagenome and limits reference-based approaches

• Use of ssDNA adapters in the adapter-ligation reaction at the genomic library preparation step [77]

• Selection of an appropriate cut-off for coverage is complicated

• Studies report discoveries of a huge number of viruses at a depth of 1–15 × 10^6 reads per sample [60, 78, 79, 80]
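To make the coverage cut-off problem concrete, the following minimal Python sketch estimates the mean depth a single viral genome would receive at a given sequencing depth. The function name, the `genome_fraction` parameter, and all numbers in the usage line are hypothetical illustrations, not values from the cited studies.

```python
def expected_coverage(n_reads, read_length, genome_length, genome_fraction=1.0):
    """Estimate mean per-base depth for one genome.

    n_reads         -- total reads in the sample
    read_length     -- read length in bases
    genome_length   -- length of the genome of interest in bases
    genome_fraction -- assumed fraction of reads originating from that genome
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if not 0.0 <= genome_fraction <= 1.0:
        raise ValueError("genome_fraction must be between 0 and 1")
    # Total sequenced bases from this genome, divided by its length.
    return n_reads * genome_fraction * read_length / genome_length


# Hypothetical example: 1 million 150-base reads, of which 0.1% come from
# a 40 kb phage genome.
depth = expected_coverage(1_000_000, 150, 40_000, genome_fraction=0.001)
print(f"expected mean depth: {depth:.2f}x")
```

Under these assumptions a low-abundance genome receives only a few-fold coverage even at a depth of one million reads, which is one reason a single cut-off is hard to choose.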

Quality control

• Removal of bacterial sequences is complicated by the viral signals from prophages (both cryptic and inducible) carried by bacterial genomes

• Use of tools for identification of prophages in bacterial genomes [87, 88, 89], though some are limited to known prophages. The combination of multiple methods has been shown to enrich the set of detected prophages [90] and therefore prevent their concurrent removal with bacterial sequences.
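One way to combine several prophage predictors is to take the union of their predicted regions per contig, so that a region flagged by any tool is protected from removal. The sketch below assumes each tool's output has already been parsed into `(contig, start, end)` tuples; the tool names and coordinates are hypothetical, and this is not the procedure of the cited work [90].

```python
from collections import defaultdict


def merge_prophage_calls(calls_by_tool):
    """Merge overlapping prophage regions predicted by several tools.

    calls_by_tool -- dict mapping a tool name to a list of
                     (contig, start, end) tuples
    Returns a dict mapping each contig to a sorted list of
    (start, end, supporting_tools) tuples.
    """
    by_contig = defaultdict(list)
    for tool, calls in calls_by_tool.items():
        for contig, start, end in calls:
            by_contig[contig].append((start, end, tool))

    merged = {}
    for contig, regions in by_contig.items():
        regions.sort()
        out = []
        for start, end, tool in regions:
            if out and start <= out[-1][1]:
                # Overlaps the previous region: extend it and record the tool.
                prev_start, prev_end, tools = out[-1]
                out[-1] = (prev_start, max(prev_end, end), tools | {tool})
            else:
                out.append((start, end, {tool}))
        merged[contig] = out
    return merged


# Hypothetical predictions from two tools on one bacterial contig.
calls = {
    "tool_a": [("contig_1", 100, 500)],
    "tool_b": [("contig_1", 400, 900), ("contig_1", 2000, 2500)],
}
print(merge_prophage_calls(calls))
```

Keeping the set of supporting tools for each region makes it possible to apply a stricter rule later, for example requiring agreement between at least two tools.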

Data analysis

• Existing databases do not fully represent viral diversity [91]

• Use of de novo assembly approaches

• Rapid evolution and diversity of viral genomes limit reference-based approaches

• Use of reference databases that include both cultured viruses and computationally identified viral contigs [25, 92]

• Use of a protein-based search

• Use of a profile hidden Markov model based on protein domains allows the identification of remote homologs [93]
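Protein-based searches on nucleotide contigs generally start from translated sequences. As a rough illustration, the Python sketch below translates a contig in all six reading frames using the standard genetic code; the resulting protein strings could then be passed to a protein or profile-HMM search tool. It is a simplified sketch (no gene prediction, ambiguous codons become `X`), and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate a DNA string in frame 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return translations of the three forward and three reverse frames."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    return [
        translate(strand[offset:])
        for strand in (seq, reverse_complement)
        for offset in range(3)
    ]


for frame in six_frame_translations("ATGGCCTAA"):
    print(frame)
```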

• The de novo assembly approach is sensitive to biases introduced during genomic library preparation and sequencing:

- Low DNA input for genomic library preparation decreases the percentage of reads that map back to the corresponding assemblies [94, 95]

- Use of a DNA amplification step might affect the distribution of read coverage [94, 96]

- Shifts in GC content during genomic library preparation [97] affect the completeness of genomes and cause assembly fragmentation

• Adjustment of the assembly pipeline according to the applied genomic library preparation procedure [96]: use of modes suitable for an uneven distribution of read coverage, such as single-cell SPAdes [98, 99] preceded by read de-duplication [96], or Velvet-SC [100]

• Use of genomic library preparation protocols without any amplification procedure (needs high DNA input, probably not applicable for viromics) [101, 102]
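The read de-duplication step mentioned above as a precursor to assembly can be sketched as removal of exact duplicate sequences. The minimal Python example below keeps the first occurrence of each sequence; it assumes single-end reads already loaded as `(read_id, sequence)` pairs and ignores quality scores, read pairing, and reverse-complement duplicates, so it is not a reproduction of the procedure in [96].

```python
def deduplicate_reads(reads):
    """Keep the first read for every distinct sequence.

    reads -- iterable of (read_id, sequence) tuples
    Returns a list of (read_id, sequence) tuples in input order.
    """
    seen = set()
    unique = []
    for read_id, sequence in reads:
        key = sequence.upper()  # treat soft-masked bases as identical
        if key not in seen:
            seen.add(key)
            unique.append((read_id, sequence))
    return unique


# Hypothetical reads; r2 is an exact duplicate of r1.
reads = [("r1", "ACGTACGT"), ("r2", "ACGTACGT"), ("r3", "TTGACCAA")]
print(deduplicate_reads(reads))
```

Collapsing duplicates in this way reduces the coverage spikes that amplification can produce, at the cost of also discarding genuine identical reads from highly abundant genomes.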

• Reproducibility of assembly results when combining different assemblers is complicated by technical challenges [103, 104] and the possible appearance of chimeric assemblies [104]
