Comparing whole-genome shotgun sequencing and DNA metabarcoding approaches for species identification and quantification of pollen species mixtures
Abstract
Molecular identification of mixed-species pollen samples has applications in many fields of research. To date, such identification has primarily been carried out via amplicon sequencing, but whole-genome shotgun (WGS) sequencing of pollen DNA has potential advantages, including (1) more genetic information per sample and (2) the potential for better quantitative matching. In this study, we tested the performance of WGS sequencing and publicly available reference sequences in identifying species and quantifying their relative abundance in pollen mock communities. Using mock communities previously analyzed with DNA metabarcoding, we sequenced approximately 200 Mbp for each sample using Illumina HiSeq and MiSeq. Taxonomic identifications were based on the Kraken k-mer identification method, with reference libraries constructed from full-genome and Sequence Read Archive data from the NCBI database. We found WGS to be a reliable method for taxonomic identification of pollen, with nearly 100% of the species in mixtures identified, but it generated higher rates of false positives (reads not identified to the correct taxon at the required taxonomic level) than rbcL and ITS2 amplicon sequencing. For quantification of relative species abundance, WGS data provided a stronger correlation between pollen grain proportion and sequence read proportion, but diverged more from a 1:1 relationship, likely due to the higher rate of false positives. Currently, a limitation of WGS-based pollen identification is the poor representation of plant diversity in publicly available genome databases. As databases improve and costs drop, we expect genomic methods to become the methods of choice for species identification and quantification of mixed-species pollen samples.
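The distinction drawn above between a strong correlation and agreement with a 1:1 relationship can be illustrated with a minimal sketch. The numbers below are hypothetical and are not data from the study; the function names are illustrative only. The point is that read proportions can track pollen grain proportions perfectly (correlation of 1) while still systematically under- or over-representing them (regression slope different from 1).

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ols_slope_intercept(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


# Hypothetical values (NOT from the study): the proportion of pollen grains
# of one species in four mock communities, and the proportion of reads
# assigned to that species. Reads are exactly half the grain proportion,
# as could happen if many reads were assigned to other taxa.
grain_prop = [0.1, 0.2, 0.3, 0.4]
read_prop = [0.05, 0.10, 0.15, 0.20]

r = pearson_r(grain_prop, read_prop)
slope, intercept = ols_slope_intercept(grain_prop, read_prop)

print(f"correlation r = {r:.3f}")
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

In this toy case the correlation is perfect, yet the slope is 0.5 rather than 1, so the read proportions are well ordered but not a direct estimate of grain proportions.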
Citation
Bell, K. L. et al. Comparing whole‐genome shotgun sequencing and DNA metabarcoding approaches for species identification and quantification of pollen species mixtures. Ecol. Evol. (2021)