The aggregation of preferential and probabilistic datasets to investigate Italian forests: a case-study through the LifeWatch infrastructure.
R. Canullo; S. Chelli
2021-01-01
Abstract
The LifeWatch infrastructure mainly aims to aggregate and analyze big data in ecology, offering the public virtual services and tools to boost scientific research and dissemination (see https://www.lifewatch.eu). In this context, the inter-university centre PlantData began collecting and organising published and original datasets on plant diversity at the national level to obtain a standardized database from different sources. These sources applied different data collection methods for different purposes. Evaluating data bias is therefore useful to improve data aggregation and to inform users of the big data infrastructure about methodological limitations. To this end, a target dataset derived from a heterogeneous collection of vegetation plots in Italian forests was compared with a probabilistic reference dataset to evaluate its potential gaps and limitations. We assembled a dataset composed of four regional databases, with about 18,000 vegetation plots from the whole of Italy, mainly collected by preferential sampling. This dataset was compared with the ICP-Forest dataset on plant diversity of Italian forests, composed of 201 vegetation plots that followed a probabilistic sampling design. We compared the two datasets in terms of broad forest types and their occupied ranges, as identified by a multivariate regression tree. We obtained three forest types: the warm temperate forest, the cool temperate forest, and the montane forest. The preferential dataset was spatially representative of the forest types in the study area because of the large number of plots it includes. The large number of plots is also reflected in the taxonomical and ecological representativeness of the dataset, through the occurrence of rare species. However, the preferential dataset showed oversampling in the warm temperate area, suggesting a preferential accumulation of data for the (sub-)Mediterranean forests, while the coniferous forest of the Alps was undersampled.