AGU is almost here. Below are previews of three presentations you will not want to miss.
Phenomena Portal: Large-Scale Visual Exploration of Atmospheric Phenomena
IN028 — Applying Artificial Intelligence Tools and Services on Earth System Science Data II
The Earth science community is receiving a growing influx of remote sensing data due to recent advancements in sensor technology, which enables research at a larger scale than ever before. Unfortunately, traditional data processing techniques do not scale well to these new, high-volume data sources. State-of-the-art machine learning (ML) pipelines have overcome similar limitations in other fields but remain underused within the physical sciences community. Moreover, ML relies on labeled data, which is sparse in the Earth science domain because ML adoption is still in its early stages within the Earth and atmospheric science communities.
To address these issues, IMPACT developed the Phenomena Portal, a visual exploration tool that uses ML to detect various atmospheric phenomena on a global scale. This allows the Earth and atmospheric science communities to view trends in occurrences of phenomena, identify potential relationships between them, and analyze spatiotemporal patterns. These detections can also serve as initial labeled data for ML research on the respective phenomena. The tool also incorporates feedback from subject matter experts to further improve model detection accuracy, thereby supporting a human-in-the-loop workflow. Muthukumaran R. will present an overview of the ML model development and cloud deployment. He will also discuss the capabilities of the user interface for displaying the detections.
Es2Vec: Earth Science Metadata Suggestions and Analogical Reasoning
IN030 — Knowledge and Knowledge Graph Induction in Geoscience
As the volume of text-based Earth science research grows, it is increasingly possible to discover latent relationships in the literature. However, traditional methodologies are restricted by limited computational capabilities and intractable problem spaces. Advancements in natural language processing (NLP) have allowed IMPACT to use a comprehensive Earth science corpus to create a domain-specific word vector model, Es2Vec, which we have used to surface latent relationships between Earth science concepts, to generate improved keyword tags, and to explore the area of analogical reasoning.
By using Es2Vec with cosine proximity and domain filtering, we have successfully predicted a wide range of relationships, such as synonyms for common phenomena and the instruments most associated with particular authors. Additionally, IMPACT built a tool that uses Es2Vec to recommend keyword tags for dataset abstracts, potentially improving dataset search and discovery. Finally, Es2Vec has outperformed general word embeddings in surfacing complex word-pair relationships present in the corpus, such as property-instrument pairs (e.g., temperature and thermometer). Success in identifying these word pairs is key to ultimately predicting analogous causal relationships.
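To make the idea of cosine proximity with domain filtering more concrete, here is a minimal sketch in Python. It does not use Es2Vec itself: the three-dimensional vectors, the `INSTRUMENTS` filter set, and the function names are all hypothetical and chosen only for illustration, whereas a real word vector model would have hundreds of dimensions learned from a corpus. The analogy step uses the common vector-offset approach, which is an assumption here and not necessarily the method IMPACT applied.

```python
import math

# Hypothetical toy word vectors (NOT taken from Es2Vec); illustration only.
VECTORS = {
    "temperature": [0.9, 0.1, 0.0],
    "thermometer": [0.9, 0.1, 1.0],
    "pressure":    [0.1, 0.9, 0.0],
    "barometer":   [0.1, 0.9, 1.0],
    "radiometer":  [0.5, 0.2, 1.0],
}

# Hypothetical domain filter: only instrument terms may be returned.
INSTRUMENTS = {"thermometer", "barometer", "radiometer"}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_vec, allowed, exclude=()):
    """Return the allowed word whose vector is closest to query_vec."""
    candidates = [w for w in allowed if w not in exclude]
    return max(candidates, key=lambda w: cosine(query_vec, VECTORS[w]))


def analogy(a, b, c, allowed):
    """Answer 'a is to b as c is to ?' using the vector offset b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    return most_similar(target, allowed, exclude={a, b, c})


# Which instrument is closest to the property "temperature"?
nearest_instrument = most_similar(VECTORS["temperature"], INSTRUMENTS)

# temperature : thermometer :: pressure : ?
analogous_instrument = analogy("temperature", "thermometer",
                               "pressure", INSTRUMENTS)
```

With these toy vectors, the nearest instrument to "temperature" is "thermometer", and the analogy query returns "barometer". Restricting candidates to a domain-specific set is what keeps unrelated but nearby words out of the results.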
In this presentation, Carson Davis will demonstrate the capabilities of Es2Vec in surfacing domain-specific insights from an Earth science corpus.
Automated Metadata Scoring Approaches for Earth Observation Data
IN047 — Recent Advancements in Earth Science Data Discovery and Metadata Stewardship Practices
The Common Metadata Repository (CMR) contains metadata records that describe NASA’s Earth observation data products, which are archived across twelve data centers known as Distributed Active Archive Centers (DAACs). To ensure that NASA’s data is discoverable, accessible, and usable, the Analysis and Review of CMR (ARC) team, located at Marshall Space Flight Center, assesses the quality of these metadata records. The ARC team uses a combination of automated and manual methods to check metadata records for dimensions of quality such as completeness, correctness, and consistency.
In addition to these quality assessments, the team is exploring metadata scoring methods in order to provide normalized results across the twelve DAACs. One method is to use automated metrics to assess metadata fields and then provide a numeric score, or grade, based on the analysis. The ARC team has proposed two approaches for implementing this process and is currently exploring both. Jenny Wood will describe ARC’s two proposed methodologies in more detail, along with the pros and cons of using these metadata scoring methods.
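As a rough illustration of what an automated metric that produces a numeric score or grade could look like, the sketch below computes a weighted completeness score for a single record. It is not either of ARC's two proposed approaches, which are not described here. The field names, weights, and grade cutoffs are all hypothetical, and completeness is only one of the quality dimensions mentioned above.

```python
# Hypothetical field names and weights, chosen only for illustration.
FIELD_WEIGHTS = {
    "title": 2.0,
    "abstract": 2.0,
    "temporal_extent": 1.0,
    "spatial_extent": 1.0,
    "doi": 1.0,
}

# Hypothetical grade cutoffs on a 0-100 scale, highest first.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def completeness_score(record, weights=FIELD_WEIGHTS):
    """Return a 0-100 score: weighted share of fields that are non-empty."""
    earned = sum(
        weight
        for field, weight in weights.items()
        if str(record.get(field) or "").strip()  # missing or blank earns 0
    )
    return 100.0 * earned / sum(weights.values())


def letter_grade(score):
    """Map a numeric score to a letter grade using GRADE_CUTOFFS."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"


# Example record with a blank temporal extent and no DOI.
example_record = {
    "title": "Example dataset",
    "abstract": "A short description.",
    "temporal_extent": "",
    "spatial_extent": "global",
}
example_score = completeness_score(example_record)
example_grade = letter_grade(example_score)
```

Because the score is a share of the total possible weight, records from different data centers land on the same 0 to 100 scale, which is one simple way to get normalized results.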