The four “Sister Projects” AD4GD, FAIRiCUBE, USAGE and B-Cubed, financed by DG research and collaborating under the GDDS action group with the goal of developing the GDDS, have joined forces to pool their understanding of the requirements for the GDDS and to propose concrete steps forward. As part of this work, we have been investigating our requirements for metadata systems, both for data discovery and for understanding the details needed for further processing, such as provenance and data quality. To this end, we held a virtual metadata workshop on September 17th to gain a better overview of both the requirements and the technologies being used for this work across the Sister Projects.
At present, two of the four projects (AD4GD and USAGE) use the spatial metadata specification from ISO 19115, one project (FAIRiCUBE) uses the new SpatioTemporal Asset Catalog (STAC), and B-Cubed has focused on the functionality provided by the EBV GeoBon portal as well as the DarwinCore-based metadata systems provided by GBIF. Because of this diversity in approaches, a harmonized view across the projects is currently not possible, which illustrates a potential gap in the GDDS. After deliberation, we decided to investigate DCAT, and especially GeoDCAT, as a foundational structure that would allow harmonization across these different approaches to metadata provision.
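To make the harmonization idea more concrete, the following is a minimal sketch of how records from two of the approaches named above might be projected onto a shared, DCAT-flavoured structure. It is an illustration under stated assumptions, not the mapping used by any of the projects: the function names are hypothetical, the ISO 19115 input is assumed to be an already-flattened dictionary (real ISO 19115 records are nested XML), and only a handful of Dublin Core terms commonly used with DCAT are covered.

```python
from typing import Any


def stac_item_to_dcat(item: dict[str, Any]) -> dict[str, Any]:
    """Project a STAC Item (as a dict) onto a small DCAT-style record.

    Hypothetical helper: covers only a few common fields.
    """
    props = item.get("properties", {})
    return {
        "@type": "dcat:Dataset",
        "dct:identifier": item.get("id"),
        "dct:title": props.get("title"),
        "dct:description": props.get("description"),
        "dct:temporal": props.get("datetime"),
        "dct:spatial": item.get("bbox"),
    }


def iso19115_to_dcat(record: dict[str, Any]) -> dict[str, Any]:
    """Project a flattened ISO 19115 record onto the same structure.

    Assumption: 'record' is a simplified, flattened view with the keys
    used below; this is not the ISO 19115 XML encoding itself.
    """
    return {
        "@type": "dcat:Dataset",
        "dct:identifier": record.get("fileIdentifier"),
        "dct:title": record.get("title"),
        "dct:description": record.get("abstract"),
        "dct:temporal": record.get("temporalExtent"),
        "dct:spatial": record.get("boundingBox"),
    }


# Illustrative inputs (invented values, for demonstration only).
stac_example = {
    "id": "item-1",
    "bbox": [16.2, 48.1, 16.6, 48.3],
    "properties": {"title": "Example grid", "datetime": "2024-01-01T00:00:00Z"},
}
iso_example = {
    "fileIdentifier": "rec-1",
    "title": "Example survey",
    "abstract": "Illustrative record.",
    "boundingBox": [16.2, 48.1, 16.6, 48.3],
}

harmonized = [stac_item_to_dcat(stac_example), iso19115_to_dcat(iso_example)]
```

Once both records share the same keys, a catalogue could list and filter them together, which is the kind of harmonized view that is currently missing. A real GeoDCAT mapping would need to handle many more elements and proper RDF serialization.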
While most existing metadata cover basic concepts well, especially for gridded products, some deficits have been identified. During the analysis of the metadata provided by the four GDDS Sister Projects, the following areas were identified as requiring additional work:
- Details on the cell components: especially when dealing with gridded data beyond raw satellite data, details on what exactly is provided in the data payload become sparse. Insights from terrestrial geospatial systems and recent developments on Observable Property models would be a valuable addition.
- Data Provenance: too often, data is provided without concise provenance information detailing underlying source data and processing steps applied. Integration of the W3C PROV-O Ontology would help to bridge this gap.
- Data Quality: to provide dependable outputs, source data must be well vetted to ensure that it is of the necessary quality. Data quality metrics are often not provided together with the available data. Both structures and examples would be most welcome.
We will be continuing this work during our upcoming GDDS Requirements Workshop on September 30th – October 1st in Vienna, Austria.
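As an illustration of the provenance gap noted in the list above, the sketch below shows one way a derived dataset, its source data and the processing step could be described with W3C PROV-O terms in a JSON-LD-style structure. This is a hedged example only: the function name and all identifiers are hypothetical, and it uses just a few PROV-O properties (`prov:used`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`) rather than a complete provenance model.

```python
import json
from typing import Any


def provenance_record(
    output_id: str,
    source_ids: list[str],
    activity_id: str,
    activity_label: str,
) -> dict[str, Any]:
    """Build a small PROV-O description as a JSON-LD-style dict.

    Hypothetical helper: links one output entity to its sources via a
    single processing activity.
    """
    sources = [{"@id": s} for s in source_ids]
    return {
        "@context": {
            "prov": "http://www.w3.org/ns/prov#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        },
        "@graph": [
            {
                "@id": activity_id,
                "@type": "prov:Activity",
                "rdfs:label": activity_label,
                "prov:used": sources,
            },
            {
                "@id": output_id,
                "@type": "prov:Entity",
                "prov:wasGeneratedBy": {"@id": activity_id},
                "prov:wasDerivedFrom": sources,
            },
        ],
    }


# Invented identifiers, for demonstration only.
record = provenance_record(
    output_id="urn:example:dataset:gridded-product",
    source_ids=["urn:example:dataset:source-a", "urn:example:dataset:source-b"],
    activity_id="urn:example:activity:regridding",
    activity_label="Regridding and aggregation",
)
print(json.dumps(record, indent=2))
```

A record of this kind could sit alongside the discovery metadata so that users can trace which source data and processing steps lie behind a product.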