• The impacts of reference database selection, indicator threshold determination and target data preparation in the sequence data analysis of eDNA monitoring -- taking fish as the target in the middle Yangtze River

    Subjects: Biology >> Ecology submitted time 2024-01-23

    Abstract: In metabarcoding-based eDNA monitoring, the analysis and annotation of eDNA sequencing data are the foundation for obtaining accurate and reliable monitoring results. The selection of reference databases, the determination of analysis and annotation indicator thresholds, and the preparation of target data are the most critical technical steps in this analysis and annotation. To clarify the impacts of these three technical aspects and to provide scientific support for standardizing eDNA monitoring technology, this study used two sets of COI gene sequence data from eDNA monitoring in the middle reach of the Yangtze River. It designed three sets of experiments to test the impacts on the annotation results of 1) different reference databases and species annotation algorithms, 2) different OTU clustering sequence similarity thresholds and species annotation classification confidence thresholds (sequence consistency and sequence coverage), and 3) different richness of target sequence data for each species. The results showed the following. 1) With the BLAST algorithm, the species annotated against three versions of the NCBI nt database were largely consistent (72%-78%), as were those annotated against two local reference sequence libraries (91%-96%), while the species annotated against all five reference libraries were 52%-68% consistent. Species annotated by the RDP Classifier against the nt databases covered over 95% of the species annotated by BLAST and increased the number of species by 151%-443%, but most of the additional species were misannotated. Species annotated by the RDP Classifier against the local reference libraries covered 66%-85% of the species annotated by BLAST, and several results were resolved only to the family or genus level.
2) Setting the OTU clustering sequence similarity threshold to 0.999 yielded 154%-209% more OTUs and 240%-490% more annotated fish OTUs than setting it to 0.99. Changing the classification confidence threshold (BLAST algorithm) from 0.8 to 0.99 had little effect on species composition, with over 94% consistency, but a threshold of 0.7 produced a significant difference. 3) With an OTU clustering sequence similarity threshold of 0.999 and a classification confidence threshold of 0.9, annotation with multiple-sequence target data yielded the largest numbers of fish species and OTUs and the highest species annotation accuracy (81.49%); compared with annotation with single-sequence target data, this increased fish species by 7%, OTUs by 215% and accuracy by 5%. In eDNA sequencing data analysis and annotation, accuracy can be improved by establishing and improving local reference databases, optimizing the OTU clustering sequence similarity and species annotation classification confidence thresholds (sequence consistency and sequence coverage), and increasing the richness of target sequence data. However, owing to the limitations of species annotation algorithms, species annotation errors and omissions may persist in eDNA sequencing data analysis and annotation. Consequently, the species annotation accuracy of eDNA monitoring based on the COI gene would remain below 85%.
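The abstract reports that the OTU clustering similarity threshold and the classification confidence threshold (applied to sequence consistency and coverage) shape the annotation results, and it compares results by their percentage consistency. The sketch below illustrates these three ideas on toy data. It is not the authors' pipeline: the greedy centroid clustering, the position-by-position identity on pre-aligned equal-length reads, the use of one threshold for both identity and coverage, the Jaccard index as the consistency measure, and every name (`greedy_cluster`, `annotate`, `Hit`, `jaccard`, species `sp_A`...) are hypothetical assumptions.

```python
from dataclasses import dataclass


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length reads (toy metric)."""
    if len(a) != len(b):
        raise ValueError("toy metric expects equal-length aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: list[str], threshold: float) -> list[str]:
    """Greedy centroid clustering: a read joins an existing OTU if it matches that OTU's
    centroid at >= threshold; otherwise it founds a new OTU. Returns the centroids."""
    centroids: list[str] = []
    for s in seqs:
        if not any(identity(s, c) >= threshold for c in centroids):
            centroids.append(s)
    return centroids


@dataclass(frozen=True)
class Hit:
    """Minimal record of one reference hit for an OTU (hypothetical fields)."""
    otu: str
    species: str
    identity: float  # fraction of identical positions in the alignment, 0-1
    coverage: float  # fraction of the query covered by the alignment, 0-1


def annotate(hits: list[Hit], threshold: float) -> dict[str, str]:
    """Drop hits whose identity OR coverage falls below the threshold, then give each OTU
    the species of its best remaining hit; OTUs with no passing hit stay unannotated."""
    best: dict[str, Hit] = {}
    for h in hits:
        if h.identity < threshold or h.coverage < threshold:
            continue
        current = best.get(h.otu)
        if current is None or (h.identity, h.coverage) > (current.identity, current.coverage):
            best[h.otu] = h
    return {otu: h.species for otu, h in best.items()}


def jaccard(a, b) -> float:
    """Shared species / all species of two annotation results (assumed consistency metric)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Toy data (hypothetical, not from the study)
base = "ACGT" * 25                     # 100 bp read
variant = base[:10] + "T" + base[11:]  # one mismatch -> 99% identity
print(len(greedy_cluster([base, variant], 0.99)))   # 1
print(len(greedy_cluster([base, variant], 0.999)))  # 2

hits = [
    Hit("OTU1", "sp_A", 0.99, 0.98),
    Hit("OTU1", "sp_B", 0.95, 0.98),
    Hit("OTU2", "sp_C", 0.85, 0.90),
    Hit("OTU3", "sp_D", 0.72, 0.95),
]
strict = annotate(hits, 0.9)  # {'OTU1': 'sp_A'}
loose = annotate(hits, 0.8)   # {'OTU1': 'sp_A', 'OTU2': 'sp_C'}
print(jaccard(strict.values(), loose.values()))  # 0.5
```

With 100 bp toy reads, a single mismatch gives 99% identity, so the two reads merge into one OTU at 0.99 but split into two OTUs at 0.999, the same direction as the effect the study reports. Real pipelines rely on proper alignment and dedicated clustering and search tools rather than this toy identity.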

  • The small-scale temporal and spatial heterogeneity of eDNA monitoring and suggestions for duplicated eDNA sampling in large rivers

    Subjects: Biology >> Ecology submitted time 2023-07-06

    Abstract: The design of duplicate samples is the first key step in standardizing eDNA monitoring procedures. Previous studies have examined how many duplicate samples should be collected. However, whether duplicates should be collected at a series of sites in space or at consecutive moments in time has not been carefully discussed, although this question is very important for eDNA monitoring practice. To address it, this study took the Wuhan section of the Yangtze River as a case study. It collected 16 eDNA samples day by day from June 27 to July 14, 2022 (temporal group) and 16 eDNA samples across a transect of the river on June 28 and July 12, 2022 (spatial group), then analyzed the species detected in these samples to identify the temporal and spatial heterogeneity of eDNA monitoring and to provide suggestions for designing duplicate samples in eDNA monitoring of large rivers. The results showed that for bacteria and metazoa, the spatial group samples detected more species in total than the temporal group samples, and the spatial heterogeneity of the detected species was greater than the temporal heterogeneity. For fungi, algae and protozoa, the opposite was true. Therefore, we suggest that spatial duplicate sampling should be prioritized when monitoring environmental bacteria and aquatic metazoa in large rivers, and temporal duplicate sampling should be prioritized when monitoring fungi, algae and protozoa. In addition, the sampling time should be chosen carefully for spatial duplicate sampling, and the sampling site should be chosen carefully for temporal duplicate sampling.
Moreover, more duplicate samples may be needed when monitoring focuses on a finer taxonomic subdivision.
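The comparison of the spatial and temporal groups rests on two quantities: the total number of species that a group of duplicates detects, and how much the species lists differ between duplicates. A minimal sketch, assuming each sample is reduced to a set of detected species and that heterogeneity is measured as the mean pairwise Jaccard dissimilarity (the abstract does not name its metric). The function names, the decision rule and the toy species sets are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pooled_richness(samples: list[set[str]]) -> int:
    """Number of distinct species detected across a group of duplicate samples."""
    return len(set().union(*samples))


def jaccard_dissimilarity(a: set[str], b: set[str]) -> float:
    """1 - shared/union: 0 for identical species lists, 1 for disjoint ones."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def heterogeneity(samples: list[set[str]]) -> float:
    """Mean Jaccard dissimilarity over all pairs of samples (needs at least 2 samples)."""
    return mean(jaccard_dissimilarity(a, b) for a, b in combinations(samples, 2))


def preferred_replication(spatial: list[set[str]], temporal: list[set[str]]) -> str:
    """Hypothetical rule mirroring the abstract's reasoning: favour the dimension whose
    duplicates detect more species in total; ties go to temporal."""
    return "spatial" if pooled_richness(spatial) > pooled_richness(temporal) else "temporal"


# Toy species lists (hypothetical, not data from the study)
spatial = [{"s1", "s2"}, {"s2", "s3"}, {"s3", "s4"}]         # points across a transect
temporal = [{"s1", "s2"}, {"s1", "s2"}, {"s1", "s2", "s3"}]  # successive days

print(pooled_richness(spatial), pooled_richness(temporal))                  # 4 3
print(round(heterogeneity(spatial), 3), round(heterogeneity(temporal), 3))  # 0.778 0.222
print(preferred_replication(spatial, temporal))                             # spatial
```

In this toy case the spatial duplicates both detect more species and disagree more with one another, which is the pattern the study reports for bacteria and metazoa. Comparing groups of equal size (16 samples each in the study) keeps the pooled richness of the two groups comparable.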