The aggregation of preferential and probabilistic datasets to investigate Italian forests: a case-study through the LifeWatch infrastructure.

R. Canullo; S. Chelli
2021-01-01

Abstract

The LifeWatch infrastructure mainly aims to aggregate and analyze big data in ecology, offering the public virtual services and tools to boost scientific research and dissemination (see https://www.lifewatch.eu). In this context, the inter-university centre PlantData began collecting and organising published and original datasets on plant diversity at the national level to obtain a standardized database from different sources. The source datasets were collected with different methods and for different purposes. Evaluating data bias is therefore useful to improve data aggregation and to inform users of the big data infrastructure about methodological limitations. To this end, a target dataset derived from a heterogeneous collection of vegetation plots in Italian forests was compared with a probabilistic reference dataset to evaluate its potential gaps and limitations. We assembled the target dataset from four regional databases, with about 18,000 vegetation plots from the whole of Italy, mainly collected by preferential sampling. It was compared with the ICP-Forest dataset on plant diversity of Italian forests, composed of 201 vegetation plots that followed a probabilistic sampling design. We compared the two datasets in terms of broad forest types and their occupied ranges, which resulted from a multivariate regression tree. We obtained three forest types: the warm temperate forest, the cool temperate forest, and the montane forest. The preferential dataset was spatially representative of the forest types in the study area owing to the large number of plots included. This large number of plots is also reflected in the taxonomical and ecological representativeness of the dataset through the occurrence of rare species. However, the preferential dataset showed oversampling in the warm temperate area, suggesting a preferential accumulation of data for the (sub-)Mediterranean forests, while the coniferous forest of the Alps was undersampled.

Use this identifier to cite or link to this document: https://hdl.handle.net/11581/460754