Your conditions: 杨海乐
  • The impacts of reference database selection, indicator threshold determination and target data preparation in the sequence data analysis of eDNA monitoring -- taking fish as the target in middle Yangtze River

    Subjects: Biology >> Ecology submitted time 2024-01-23

    Abstract: In metabarcoding-based eDNA monitoring, the analysis and annotation of eDNA sequencing data are the foundation for obtaining accurate and reliable monitoring results. The selection of reference databases, the determination of analysis and annotation thresholds, and the preparation of target data are the most critical technical steps in this analysis. To clarify the impacts of these three steps and to provide scientific support for the standardization of eDNA monitoring, this study used two sets of COI gene sequence data from eDNA monitoring in the middle reach of the Yangtze River and designed three sets of experiments to test 1) the impacts of different reference databases and species annotation algorithms on the annotation results, 2) the impacts of different OTU clustering sequence similarity thresholds and species annotation classification confidence thresholds (sequence consistency and sequence coverage) on the annotation results, and 3) the impacts of the richness of target sequence data for each species on the annotation results. The results showed the following. 1) Under the Blast algorithm, the species annotated against three versions of the NCBI nt library were generally consistent (72%~78%); those annotated against two local sequence reference libraries were also generally consistent (91%~96%); and the species annotated against all five reference libraries were 52%~68% consistent. With the RDP Classifier algorithm, the species annotated against the nt libraries covered over 95% of the species annotated by Blast and included 151%~443% more species, but most of the additional species were misannotated. The species annotated by the RDP Classifier against the local sequence reference libraries covered 66%~85% of the species annotated by Blast, and several results were annotated only to the family or genus level. 2) When the OTU clustering sequence similarity threshold was set to 0.999, 154%~209% more OTUs were obtained than at 0.99, and 240%~490% more annotated fish OTUs were obtained. The classification confidence threshold (Blast algorithm) had little effect on species composition when it varied from 0.8 to 0.99 (over 94% consistency), but the species composition differed significantly when it was set to 0.7. 3) When the OTU clustering sequence similarity threshold was 0.999 and the classification confidence threshold was 0.9, multiple-sequence data annotation yielded the largest numbers of fish species and OTUs and the highest species annotation accuracy (81.49%), an increase of 7% in fish species, 215% in OTUs and 5% in accuracy compared with single-sequence data annotation. In eDNA sequencing data analysis and annotation, accuracy can be improved by establishing and improving local reference databases, optimizing the OTU clustering sequence similarity and species annotation classification confidence thresholds (sequence consistency and sequence coverage), and increasing the richness of target sequence data. However, because of the limitations of species annotation algorithms, problems such as species annotation errors and omissions may persist in eDNA sequencing data analysis and annotation. Consequently, the species annotation accuracy of eDNA monitoring (based on the COI gene) would always be lower than 85%.
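The threshold experiment in the abstract above can be pictured with a small sketch. It is not the authors' pipeline: it assumes that "sequence consistency" corresponds to the identity fraction of a hit, that both thresholds are applied as simple cut-offs, and that "consistency" between two annotation results is the share of species they have in common (shared / union). The `Hit` record, the function names and the toy hits are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One candidate match of an OTU against a reference sequence (hypothetical record)."""
    otu: str
    species: str
    identity: float  # matching fraction of the alignment, 0..1 (assumed meaning of "consistency")
    coverage: float  # fraction of the query covered by the alignment, 0..1


def annotate(hits: Iterable[Hit], min_identity: float, min_coverage: float) -> Dict[str, str]:
    """For each OTU, keep the best hit that passes both thresholds."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.identity < min_identity or h.coverage < min_coverage:
            continue  # hit fails the classification confidence thresholds
        cur = best.get(h.otu)
        if cur is None or (h.identity, h.coverage) > (cur.identity, cur.coverage):
            best[h.otu] = h
    return {otu: h.species for otu, h in best.items()}


def consistency(species_a: Iterable[str], species_b: Iterable[str]) -> float:
    """Shared species divided by all species found in either result (assumed definition)."""
    a, b = set(species_a), set(species_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy data, invented purely for illustration.
hits: List[Hit] = [
    Hit("otu1", "sp_A", 0.99, 0.95),
    Hit("otu1", "sp_B", 0.92, 0.99),
    Hit("otu2", "sp_C", 0.75, 0.90),
    Hit("otu3", "sp_D", 0.95, 0.60),
]
strict = annotate(hits, min_identity=0.9, min_coverage=0.9)
loose = annotate(hits, min_identity=0.7, min_coverage=0.9)
overlap = consistency(strict.values(), loose.values())
```

With these toy hits, lowering the identity threshold admits one extra OTU, so the two species lists no longer agree fully. This is the kind of sensitivity the abstract reports for a threshold of 0.7.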

  • Watershed biological information flow driven by natural runoff in Shaliu River Basin on Qinghai-Tibet Plateau indicated by environmental microbes

    Subjects: Biology >> Ecology submitted time 2023-07-06

    Abstract: The collection, transport and transformation of sediments, nutrients, organic matter, energy and information are key topics in studies of ecosystem processes. However, there is no systematic literature on watershed information flow (WIF) in watershed ecology. To promote research on WIF, we proposed the concept of watershed biological information flow (WBIF), drawing on the concept of biological information flow, and defined it as the path, processes and control of biological information transport, exchange, interaction and feedback among different spaces and systems along with watershed ecosystem processes. We proposed that WBIF research should focus on 1) the WBIF between land and river, tributary and main stream, upstream and downstream, and different patches, 2) the periodic fluctuation and trend drift of the WBIF, and 3) the impacts of geomorphologic and hydrologic conditions and human activities on the WBIF. We conducted a case study of the WBIF in the Shaliu River basin, indicated by the environmental microbes in riverine water and riparian soil and detected with environmental DNA technology. The Shaliu River is one of the main rivers flowing into Qinghai Lake and has a relatively simple watershed ecosystem. The river hosts a simple aquatic ecosystem with low biodiversity and a migratory fish, Gymnocypris przewalskii, which migrates between the river and the lake. The land is dominated by grassland, with limited human activity. To reveal the essential features of the WBIF driven by natural runoff, we compared the bacterial communities (indicated by operational taxonomic units (OTUs)) of upstream riverine water samples with those of downstream riverine water samples, and those of riverine water samples with those of adjacent riparian soil samples. Results showed that (1) the WBIF from soil to water was driven by surface flow and subsurface flow and filtered by environmental change. Its transport efficiency was 62.76% under rainy conditions and 44.16% under sunny conditions; correspondingly, its transport capacity was 68.49% and 56.82%, and its environmental attenuation was 8.38% and 22.38%, respectively. (2) The WBIF from upstream to downstream was driven by river flow and attenuated during transport. Its basic integrated transport efficiency was 97.41% per kilometer, within which the transport capacity was 99.42% per kilometer, the proportion of noneffective WBIF was 43.46%, and the half-life distance of noneffective WBIF was 14.52 kilometers. (3) As the transport efficiency of the WBIF was mainly constrained by its transport capacity, precipitation gave rise to surface flow, which enhanced the power of erosion and transport and ultimately increased the transport capacity and efficiency of the WBIF. (4) The WBIF increased the detectable biodiversity of the sink aquatic ecosystem, but this increase was limited rather than cumulative along the river.
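To make the per-kilometer and half-life figures above more concrete, the following sketch shows how such quantities behave under one simple reading. It assumes, without confirmation from the abstract, that a per-kilometer rate compounds multiplicatively with distance and that a "half-life distance" describes exponential decay. The function names are hypothetical, and the sketch does not attempt to reproduce how the authors combined these quantities.

```python
import math


def remaining_after(per_km_rate: float, distance_km: float) -> float:
    """Fraction still carried after distance_km if per_km_rate is retained each kilometer."""
    return per_km_rate ** distance_km


def half_life_distance(per_km_rate: float) -> float:
    """Distance at which a multiplicatively compounding per-km rate leaves one half."""
    return math.log(0.5) / math.log(per_km_rate)


def remaining_from_half_life(half_life_km: float, distance_km: float) -> float:
    """Fraction left after distance_km under exponential decay with the given half-life."""
    return 0.5 ** (distance_km / half_life_km)


# Values quoted in the abstract, used only as inputs to the assumed model.
capacity_10km = remaining_after(0.9942, 10.0)              # transport capacity compounded over 10 km
noneffective_30km = remaining_from_half_life(14.52, 30.0)  # noneffective WBIF left after 30 km
```

Under this reading, a rate close to 100% per kilometer still produces noticeable attenuation over tens of kilometers, which matches the abstract's point that the downstream increase in detectable biodiversity is limited rather than cumulative.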
     

  • Simulating the impacts of parallel samples on the estimations of upstream-to-downstream watershed biological information flow

    Subjects: Biology >> Ecology submitted time 2023-07-06

    Abstract: Watershed biological information flow (WBIF) is defined as the path, processes and control of biological information transport, exchange, interaction and feedback among different spaces and systems along with watershed ecosystem processes, and can be partly described as the land-to-river and upstream-to-downstream transport of biological information (including organisms, nucleic acids, peptides and other biomarkers), driven by the hydrologic processes of watershed systems. The WBIF serves as a label for the transport of organic matter and energy. It integrates the ecological processes of environmental DNA (eDNA), including its origin, state, transport and fate, and makes it possible to monitor and assess the species composition of river systems using eDNA. WBIF estimation is key to studying watershed ecosystem processes and monitoring riverine biodiversity. However, in practice, the number of parallel samples at each sampling site is always limited, and how parallel samples affect WBIF estimation is unknown. Based on the principles of sampling surveys, we hypothesized that the number of parallel samples would not affect the accuracy of WBIF estimation but would affect its precision. We then transformed this hypothesis into a set of formulas and tested it with a series of simulations. Results showed that the number of parallel samples (detection efficiency) affected both the accuracy and the precision of WBIF estimation. The optimal WBIF estimate was lower than the actual WBIF under all conditions. As the number of parallel samples (detection efficiency) increased, the optimal WBIF estimate gradually approached the actual WBIF, and the range of WBIF estimates gradually narrowed around the actual WBIF. In other words, more parallel samples (higher detection efficiency) led to higher accuracy and precision of WBIF estimation. In addition, the actual WBIF also affected both the accuracy and the precision of WBIF estimation: a larger actual WBIF led to higher accuracy and precision. The relative numbers of biological information types in upstream and downstream samples affected accuracy and precision as well: both were higher when upstream samples contained more biological information types than downstream samples. We therefore suggest that, in studies of watershed ecosystem processes and in riverine biodiversity monitoring, the relationship between the number of parallel samples and detection efficiency be assessed; a suitable number of parallel samples be estimated from the reliability target for WBIF estimation; the sampling program be designed with that number of parallel samples; the WBIF be estimated from all parallel samples at each sampling site; and, finally, the estimated WBIF be re-evaluated according to the posterior probability of the WBIF under different conditions. This work provides a framework and a methodological reference for such post-evaluation.
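The qualitative finding above (the estimate sits below the actual WBIF and approaches it as parallel samples are added) can be reproduced with a deliberately simplified model. This is not the authors' set of formulas. It assumes that each biological information type is detected independently with the same per-sample probability, that only types present upstream are considered, and that the estimate is the share of upstream-detected types that are also detected downstream. All names and parameter values are hypothetical.

```python
import random


def detection_prob(p_single: float, k: int) -> float:
    """Probability that a present type is detected in at least one of k parallel samples."""
    return 1.0 - (1.0 - p_single) ** k


def expected_estimate(actual_wbif: float, p_single: float, k: int) -> float:
    """Expected estimate under the assumed model: actual flow times downstream detection."""
    return actual_wbif * detection_prob(p_single, k)


def simulate_estimate(actual_wbif: float, p_single: float, k: int,
                      n_types: int = 1000, seed: int = 0) -> float:
    """Monte Carlo version: share of upstream-detected types also detected downstream."""
    rng = random.Random(seed)
    q = detection_prob(p_single, k)
    upstream_detected = 0
    shared = 0
    for _ in range(n_types):
        transported = rng.random() < actual_wbif   # type actually reaches the downstream site
        seen_up = rng.random() < q                 # detected in the upstream samples
        seen_down = transported and rng.random() < q
        if seen_up:
            upstream_detected += 1
            if seen_down:
                shared += 1
    return shared / upstream_detected if upstream_detected else 0.0


# Expected estimate for an assumed actual WBIF of 0.8 and 1 to 8 parallel samples.
curve = [expected_estimate(0.8, 0.5, k) for k in range(1, 9)]
```

In this toy model the expected estimate equals the actual WBIF multiplied by the downstream detection probability, so it is always below the actual value and rises toward it as the number of parallel samples grows.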
     

  • The small-scale temporal and spatial heterogeneity of eDNA monitoring and suggestions for duplicated eDNA sampling in large river

    Subjects: Biology >> Ecology submitted time 2023-07-06

    Abstract: The design of duplicated samples is the first key step in standardizing the processes of eDNA monitoring. Previous work has studied how many duplicated samples should be taken. However, whether duplicated samples should be taken at a series of sites in space or at successive moments in time has not been carefully discussed, although this question is very important for eDNA monitoring practice. To address it, this work carried out a case study in the Wuhan section of the Yangtze River: 16 eDNA samples were collected day by day from June 27 to July 14, 2022 (temporal group samples), and 16 eDNA samples were collected across the transect of the Yangtze River on June 28 and July 12, 2022 (spatial group samples). The species detected in these eDNA samples were then analyzed to identify the temporal and spatial heterogeneity of eDNA monitoring, so as to provide suitable suggestions for setting duplicated samples in eDNA monitoring practice in large rivers. The results showed that, for bacteria and metazoa, the total number of species detected in the spatial group samples was greater than that detected in the temporal group samples, and the spatial heterogeneity of the detected species was greater than the temporal heterogeneity. For the three groups of fungi, algae and protozoa, the opposite was true. Therefore, we suggest that spatial duplicated sampling be given priority when monitoring environmental microorganisms and aquatic metazoa in large rivers, and that temporal duplicated sampling be given priority when monitoring fungi, algae and protozoa. At the same time, attention should be paid to the choice of sampling time when taking spatial duplicated samples, and to the choice of sampling point when taking temporal duplicated samples. Moreover, more duplicated samples may be needed when monitoring focuses on a narrower taxonomic subdivision.
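The comparison above rests on two quantities per sample group: the total number of species detected and the heterogeneity among samples. The abstract does not say how heterogeneity was measured. The sketch below assumes one common choice, the mean pairwise Jaccard dissimilarity between the species sets of the samples. The function names and the toy groups are hypothetical.

```python
from itertools import combinations
from typing import List, Set


def jaccard_dissimilarity(a: Set[str], b: Set[str]) -> float:
    """1 minus (shared species / species in either sample); 0 means identical samples."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def mean_pairwise_dissimilarity(samples: List[Set[str]]) -> float:
    """Average Jaccard dissimilarity over all pairs of samples in a group."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_dissimilarity(a, b) for a, b in pairs) / len(pairs)


def total_richness(samples: List[Set[str]]) -> int:
    """Number of distinct species detected across all samples in a group."""
    return len(set().union(*samples))


# Toy groups, invented purely for illustration.
temporal = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}]
spatial = [{"a", "b"}, {"c", "d"}, {"a", "d"}]
prefer_spatial = (
    total_richness(spatial) > total_richness(temporal)
    and mean_pairwise_dissimilarity(spatial) > mean_pairwise_dissimilarity(temporal)
)
```

In this toy case the spatial group has both higher richness and higher heterogeneity, which is the pattern the abstract reports for bacteria and metazoa.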
     

  • A framework for standardizing the processes of eDNA monitoring and an accessible vision of the future

    Subjects: Biology >> Ecology submitted time 2023-03-28

    Abstract: Environmental DNA (eDNA) is DNA extracted from any type of environmental sample (e.g. water, soil, sediment, air or a mixture). It is a mixture of DNA originating from different species and individuals, as distinct from a pure DNA sample extracted from a particular organism. eDNA monitoring refers to the process of 1) extracting a DNA sample from an environmental sample, 2) using specific species-specific primers or metabarcoding primers to amplify and sequence the eDNA sample, 3) clustering the operational taxonomic units (OTUs) and identifying their taxa against reference databases, 4) calculating the relative abundance of each OTU or species and other biodiversity indexes, and 5) analyzing the corresponding ecosystem structure, processes or function. Through eDNA monitoring, a specific species (or other taxonomic unit) at the sampling site can be identified, and biological information about the species composition, community structure, ecosystem processes and ecological function of the research area can be collected. eDNA monitoring has been applied to monitoring and early warning of specific species, investigating and assessing biodiversity, detecting and analyzing community structure and function, studying and quantifying ecosystem processes, and so on. It can work in any type of environment where unidentified DNA traces exist, such as terrestrial, aquatic and air environments, body surfaces and the inner surfaces of organisms. As an emerging tool that documents species presence without direct observation, allows sensitive and efficient detection, uses sampling and analysis approaches that are easy to standardize, covers comprehensive taxonomic groups, relies less on taxonomic expertise and can be audited by third-party researchers, eDNA monitoring is a promising general method for species monitoring, community function prediction and ecosystem process analysis. Moreover, the scope of eDNA monitoring covers all environmental conditions and all biological taxa. However, to realize this vision, ten crucial links need to be standardized at both the general level and the specific level. 1) Design of duplicated samples for a region with specific environmental conditions. The number of duplicated samples can generally be determined using species accumulation curves. 2) Design of sampling time for a region with specific environmental conditions. The sampling interval can generally be determined by quantifying the degradation rate or the retention time of eDNA from different taxonomic groups under the specific environmental conditions. 3) Design of sampling sites for a region with specific environmental conditions. The distance between sampling sites can generally be determined by quantifying the effective transport distance or the spatial heterogeneity of eDNA from different taxonomic groups under the specific environmental conditions. 4) Design of sampling method. Different study areas, objects and aims have different optimal sample types (water, soil, sediment or other samples). Different duplicated samples should not be combined, otherwise some rare species may be missed because their signals are too weak. 5) Pretreatment of samples. Pretreatment mainly refers to the filtration of water samples. It is suggested that water samples be filtered with finer millipore glass fiber filters. Large particles should not be removed by prefiltering water samples, otherwise some species signals could be removed. 6) Storage of samples. It is suggested that samples be stored at -20 °C or -80 °C, except water samples, which should be kept cool in an ice bath and filtered as soon as possible. 7) Choice of primers. Metabarcoding primers for the 16S rRNA gene are widely used for detecting bacteria and archaea. Metabarcoding primers for the ITS and 18S rRNA genes are widely used for detecting fungi. Metabarcoding primers for the mitochondrial CO1, 12S rRNA and Cyt b genes are widely used for detecting metazoans. Metagenomics is another option for identifying species. 8) Experimental processes of DNA extraction, amplification, sequencing and analysis. As these processes are increasingly handled by commercial biolabs, a set of general experimental parameters is needed. 9) Taxonomic identification of OTUs. Good reference databases, either comprehensive reference databases or local customizable reference databases, are required. 10) Post-evaluation of results. Post-evaluation mainly considers whether the number of duplicated samples is sufficient, whether the sampling frequency is suitable, whether the spatial distance between sampling sites is suitable, and whether the taxonomic identification of OTUs is accurate. So far, there is no theoretical difficulty in standardizing these ten crucial links. The main work now is the accumulation of datasets and knowledge. Some studies supporting standardization have been carried out, and some standardization work has been organized both at home and abroad. We expect that the accumulation of crucial datasets and knowledge on eDNA monitoring in hotspot regions could be completed within the next several years, after which eDNA monitoring could become routine work, even long-term basic work, in these regions. As eDNA monitoring can produce comprehensive and standard datasets, the long time series datasets produced by such long-term basic work could be used to detect variations in biodiversity (especially hidden biodiversity) and to study the dynamics and evolution of ecosystem structure, processes, function and health. Moreover, we expect a series of datasets with high quality, rigour, availability and transparency to support open science, data-intensive scientific discovery and ecosystem management.
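Link 1) in the abstract above mentions species accumulation curves as a way to judge how many duplicated samples are enough. The sketch below shows one standard way to build such a curve: averaging the cumulative species count over random orderings of the samples. The function name, the number of permutations and the toy samples are assumptions for illustration, not parameters from the abstract.

```python
import random
from typing import List, Set


def accumulation_curve(samples: List[Set[str]], n_permutations: int = 200,
                       seed: int = 0) -> List[float]:
    """Mean cumulative species count after 1, 2, ..., n samples over random sample orders."""
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    order = list(samples)
    for _ in range(n_permutations):
        rng.shuffle(order)
        seen: Set[str] = set()
        for i, sample in enumerate(order):
            seen |= sample          # add the species detected in this sample
            totals[i] += len(seen)  # cumulative richness after i + 1 samples
    return [t / n_permutations for t in totals]


# Toy detections for four duplicated samples, invented purely for illustration.
duplicates = [{"a", "b"}, {"b", "c"}, {"a", "b", "c", "d"}, {"a"}]
curve = accumulation_curve(duplicates)
```

When the curve flattens, additional duplicated samples add few new species. A curve that is still rising at the last sample suggests that the number of duplicates is not yet sufficient, which is one of the checks listed under post-evaluation in link 10).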

  • Quantifying the spatial resolution of eDNA monitoring: a case study in Middle Yangtze River in mean-flow period

    Subjects: Biology >> Ecology submitted time 2023-03-28

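    Abstract: The middle Yangtze River is an extremely important free-flowing reach of the Yangtze River and provides critical habitat for aquatic organisms such as the Chinese sturgeon and the Yangtze finless porpoise. Routine and systematic eDNA (environmental DNA) monitoring is of great significance for assessing and conserving aquatic biodiversity in this region. The fact that the spatial resolution of eDNA monitoring has not been quantified limits the implementation of routine eDNA monitoring in the middle Yangtze River. To quantify this spatial resolution, we explored and established a quantification method based on a black-box model, simplified processes and probabilistic expression. In June 2020 (mean-flow period), 30 sampling transects were set up in the middle Yangtze River at intervals of about 30 km. eDNA samples were collected and subjected to high-throughput sequencing (16S rRNA gene amplicon sequencing for prokaryotes and mitochondrial COI gene amplicon sequencing for eukaryotes). Following the analytical framework of watershed biological information flow, we calculated the quantitative characteristics of the biological information transport detectable by eDNA and determined the (series of) spatial resolution values of eDNA monitoring together with their credibility and coverage. The results showed that, in the mean-flow period in the middle Yangtze River, the transport capacity of prokaryotic biological information detectable by eDNA was 99.91%/km, the proportion of biological information transported in non-living form (not as living individuals) was 23.83%, and the half-life distance of this non-living transport was 48.45 km. For eukaryotes, the eDNA transport capacity was 99.85%/km, the proportion of biological information transported in non-living form was 67.93%, and the half-life distance of this non-living transport was 30.00 km. There was a trade-off between the credibility and the coverage of the spatial resolution of eDNA monitoring. For prokaryotes, the balance point between credibility and coverage was at 39 km, with a characteristic value of about 86%; for eukaryotes, the balance point was at 28 km, with a characteristic value of about 65%. We suggest that the monitoring spatial resolution be chosen according to the monitoring purpose: monitoring aimed at the species composition within a river-reach unit can prioritize coverage at the expense of credibility, whereas monitoring aimed at the spatial structure of biodiversity can prioritize credibility at the expense of coverage. For prokaryotes, the spatial resolution corresponding to a coverage above 90% was 27 km (credibility 84.18%). For eukaryotes, the spatial resolution corresponding to a coverage above 90% was 6 km (credibility 41.38%), and that corresponding to a coverage above 80% was 13 km (credibility 50.64%). For prokaryotes, the spatial resolution corresponding to a credibility above 90% was 58 km (coverage 82.30%). For eukaryotes, the spatial resolution corresponding to a credibility above 90% was 78 km (coverage 38.61%), and that corresponding to a credibility above 80% was 50 km (coverage 49.70%). This study provides a quantitative reference for setting eDNA monitoring transects in the middle Yangtze River and a methodological reference for estimating the resolution of eDNA monitoring in other rivers or river reaches.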