Grooming the Metadata Workhorse

IMPACT Unofficial
3 min read · Aug 25, 2021

Here’s an exciting story about metadata. Seriously. IMPACT’s Analysis and Review of the Common Metadata Repository (ARC) team, led by Jeanné le Roux, recently published an article in Data Science Journal that explores the ways in which high-quality metadata powers the discovery of Earth observation data and the importance of quality assessments to make that happen.

The paper presents the framework that the ARC project team developed to assess the quality of the metadata records describing NASA’s collection of approximately 8,000 Earth observation data products. To facilitate search and discovery of these data products on the Internet, it is the descriptive metadata records, rather than the data itself, that are indexed. These records therefore need to be in good shape and include all the relevant information needed to accurately find and access the data being described. The ARC framework provides a set of quality criteria, based on correctness, completeness, and consistency, by which the team systematically evaluates a large number of metadata records. By applying this framework, the team identifies opportunities to improve the metadata records. This feedback is then provided to the people responsible for maintaining the records, who can choose to update the metadata accordingly.

The ARC framework process

The process enabled by the metadata assessment framework incorporates both automated checks and targeted manual assessments, leveraging the strengths of both computers and humans. The automated checks performed by the machine quickly catch systemic issues in the metadata records and flag possible areas of concern. The assessments conducted by the ARC team members consider each metadata record as a cohesive unit and place an emphasis on whether each record conveys information that is helpful to users of discipline-specific data centers and to users of global catalogs.
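To make the split between machine and human work concrete, here is a minimal sketch of the kind of automated check described above. It is not the ARC team's tooling: the element names (`title`, `abstract`, `start_date`, `end_date`) and the functions `flag_record` and `triage` are hypothetical, chosen only for illustration. The idea is that the machine raises flags and a reviewer then looks at the flagged records as a whole.

```python
from datetime import date

# Hypothetical element names; not taken from any specific metadata standard.
REQUIRED_ELEMENTS = ("title", "abstract", "start_date", "end_date")


def flag_record(record):
    """Return a list of human-readable flags for one metadata record."""
    flags = []
    # Flag elements that are absent or contain only whitespace.
    for name in REQUIRED_ELEMENTS:
        value = record.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            flags.append(f"missing or empty element: {name}")
    # Flag a temporal range that runs backwards.
    start, end = record.get("start_date"), record.get("end_date")
    if isinstance(start, date) and isinstance(end, date) and start > end:
        flags.append("start_date is later than end_date")
    return flags


def triage(records):
    """Return only the records that raised flags, keyed by record id."""
    flagged = {}
    for record_id, record in records.items():
        flags = flag_record(record)
        if flags:
            flagged[record_id] = flags
    return flagged
```

Calling `triage` on a dictionary of records returns only the ones that need a reviewer's attention, each with the reasons it was flagged.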

At each step the framework incorporates the three fundamental focus areas of ARC’s assessment process. Correctness increases the extent to which the metadata reliably and accurately describes the data. Completeness results in a data object being described as robustly as possible to give it increased responsiveness to users’ search parameters. Consistency normalizes the semantic concepts and information in the same manner across multiple described data objects in order to make users’ searches more fruitful.
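Two of these focus areas lend themselves to simple measurements. The sketch below is an illustration only, not the ARC framework's actual scoring: `completeness` reports the share of available elements a record populates, and `inconsistent_values` groups spellings of one element that differ only by case or spacing across records. The function names and the `platform` element used in the usage note are assumptions.

```python
from collections import defaultdict


def completeness(record, available_elements):
    """Fraction of available elements that carry a non-empty value."""
    if not available_elements:
        return 0.0
    populated = sum(
        1 for name in available_elements
        if record.get(name) not in (None, "", [], {})
    )
    return populated / len(available_elements)


def inconsistent_values(records, element):
    """Group spellings of one element that differ only by case or spacing."""
    variants = defaultdict(set)
    for record in records:
        value = record.get(element)
        if isinstance(value, str) and value.strip():
            # Normalize case and collapse runs of whitespace to build the key.
            key = " ".join(value.lower().split())
            variants[key].add(value)
    # Keep only the keys that were written in more than one way.
    return {key: sorted(seen) for key, seen in variants.items() if len(seen) > 1}
```

For example, running `inconsistent_values` on records whose hypothetical `platform` element reads "Terra" in one record and "TERRA" in another would group the two spellings under a single key, which is the kind of variation a consistency review aims to normalize.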

The ARC metadata review process

At its essence, the ARC framework is a recognition that metadata is the underappreciated and often overlooked workhorse that drives productive data search and discovery. At the beginning of the ARC project, initial assessments demonstrated that most metadata records use only about 40 percent of the available elements in a given standard. These records may meet minimum syntactic requirements, but do not utilize all available and applicable concepts that may provide greater search context for a user. The growing trend of data search across collection silos points to the need for frameworks such as the one developed by the ARC team to help with the hard work of normalizing descriptions across multiple heterogeneous data types. As Ms. le Roux points out, metadata quality matters regardless of the domain in which the data resides:

Cataloging data sets via metadata is not unique to the Earth sciences. The framework and lessons learned from the ARC process can be more broadly applied to other science disciplines as well.

The increasing interdisciplinary use of data argues for empowering metadata aggregators to make changes to metadata and then communicate those changes back to the discipline-specific data centers, so that metadata quality is maintained across the domain. In the end, it is high-quality metadata, and not the data itself, that increases the discoverability of data and the likelihood that data will be accessible and usable.

The article “Improving Discovery and Use of NASA’s Earth Observation Data Through Metadata Quality Assessments” is available online through the Data Science Journal.

More information about IMPACT can be found at NASA Earthdata and the IMPACT project website.



This is the unofficial blog of the Interagency Implementation and Advanced Concepts Team.