Even More AGU Previews!
AGU Fall 2020 is fast approaching, and our team is hard at work preparing a number of presentations. We have been posting a series of presentation previews (the first two are here and here). Below are three more presentations we will be sharing with you during the conference.
Improved Data Communication, Understanding and Discovery Using Algorithm Theoretical Basis Documents
Session IN047 — Recent Advancements in Earth Science Data Discovery and Metadata Stewardship Practices
Scientists and data repositories depend on the effective communication of the scientific and physical theories used to derive Earth observation datasets from raw instrument data. Understanding these theories is essential to understanding and properly using the data. The NASA Earth science data community communicates this information through Algorithm Theoretical Basis Documents (ATBDs). However, ATBDs lack a formal, standardized structure, so they often contain too little information to understand the algorithm. The non-standard structures also make it hard to efficiently parse a document’s content for the desired information. Additionally, science teams typically provide ATBDs in human-readable but not machine-readable formats, which makes it difficult for modern information processing technologies to process the documents.
The Algorithm Publication Tool (APT) reconceptualizes ATBD content as metadata, or descriptions of the products they represent, to streamline authoring and dynamic updating, encourage consistent information across ATBDs, and promote human and machine parsing of information in order to simplify data understanding. The APT addresses the issues described above by providing a simplified cloud-based template for ATBD authoring, review, and publication. Data users can easily search and discover ATBDs using the APT’s centralized repository. Brad Baker describes IMPACT’s effort to re-envision ATBDs as metadata and demonstrates how the tool supports data and information discovery.
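To make the idea of "ATBDs as metadata" a little more concrete, here is a minimal Python sketch. It is not the APT's actual schema or API: the `AtbdRecord` class, its field names, the section names, and the example records are all hypothetical and chosen only for illustration. The point is that once ATBD content is held in a consistent structure, searching it and checking it for missing information become simple operations.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class AtbdRecord:
    """Hypothetical, simplified ATBD record (not the APT's real schema)."""
    title: str
    version: str
    data_products: List[str]
    # Section name -> section text, e.g. "scientific_theory".
    sections: Dict[str, str] = field(default_factory=dict)


def search(records, term):
    """Return records whose title or any section text mentions the term."""
    needle = term.lower()
    return [
        r for r in records
        if needle in r.title.lower()
        or any(needle in text.lower() for text in r.sections.values())
    ]


def missing_sections(record, required):
    """List the required sections that are absent or empty in a record."""
    return [name for name in required if not record.sections.get(name, "").strip()]


# Made-up example records, for illustration only.
records = [
    AtbdRecord(
        title="Example Soil Moisture Retrieval",
        version="1.0",
        data_products=["EXAMPLE_SM_L2"],
        sections={
            "scientific_theory": "Relates brightness temperature to soil moisture.",
            "algorithm_inputs": "Calibrated brightness temperature.",
        },
    ),
    AtbdRecord(
        title="Example Aerosol Optical Depth",
        version="2.1",
        data_products=["EXAMPLE_AOD_L2"],
        sections={"scientific_theory": "Inverts top-of-atmosphere reflectance."},
    ),
]

REQUIRED = ["scientific_theory", "algorithm_inputs"]  # assumed section names

hits = search(records, "soil moisture")
gaps = {r.title: missing_sections(r, REQUIRED) for r in records}
as_json = json.dumps([asdict(r) for r in records], indent=2)  # machine-readable form
```

In this sketch, `gaps` flags the second record as lacking an `algorithm_inputs` section, the kind of consistency check that is hard to run against free-form documents.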
Labeling and Managing Image Data for Machine Learning in the Earth Sciences
Session IN009 — Solving Training Data Bottlenecks for Artificial Intelligence/Machine Learning in Earth Science
While machine learning techniques for image classification have been around for a long time, storing and managing the vast number of images required as training data is still a problem for scientists. This is especially true for the field of Earth science, where only recently have experts begun using machine learning techniques for image-based phenomena classification.
Image Labeler, a fast and scalable cloud-based tagging platform for Earth science images, seeks to improve upon existing methods of managing images and their associated metadata. One common existing method is maintaining categorized folders of images on a local machine, a process that can be cumbersome and difficult to scale. The Image Labeler platform facilitates rapid development of image-based Earth science phenomena training datasets by allowing scientists to upload their existing imagery as well as extract new samples from open satellite imagery services made available through NASA’s Global Imagery Browse Service (GIBS). Image Labeler also supports GeoTIFF data, with capabilities such as displaying GeoTIFFs on an interactive map, drawing shapefiles over them, and tagging them with additional metadata. This allows scientists to perform spatiotemporal subsetting with geographic information and develop training data more quickly.
Built using modern web technologies, Image Labeler includes additional capabilities such as team collaboration for large-scale image tagging projects. Users can download their data in a machine-learning-ready format, allowing scientists to spend time on experimentation rather than on the collection of training data. In this presentation, Prasanna Koirala demonstrates how Image Labeler seeks to become a one-stop image data management solution for machine learning applications in Earth science.
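As a rough illustration of the folder-based approach mentioned above, and of why a tabular manifest is easier to extend, here is a short Python sketch. It does not represent Image Labeler's API or export format: the function names, file paths, metadata fields, and CSV layout are all assumptions made for the example. Labels are read from the parent folder name, then merged with per-image metadata that a folder hierarchy alone cannot hold.

```python
import csv
import io
from pathlib import PurePosixPath


def labels_from_folders(paths):
    """Derive one label per image from the name of its parent folder."""
    return [
        {"image": p.name, "label": p.parent.name}
        for p in map(PurePosixPath, paths)
    ]


def attach_metadata(rows, metadata):
    """Merge per-image metadata (e.g. date, bounding box) into the label rows."""
    return [{**row, **metadata.get(row["image"], {})} for row in rows]


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text, leaving missing fields empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical folder layout: one folder per phenomenon.
paths = [
    "training/smoke/img_001.png",
    "training/smoke/img_002.png",
    "training/dust/img_003.png",
]

# Hypothetical metadata, known for only one image.
metadata = {
    "img_001.png": {"date": "2020-09-10", "bbox": "-124.0,36.0,-119.0,41.0"},
}

rows = attach_metadata(labels_from_folders(paths), metadata)
manifest = to_csv(rows, ["image", "label", "date", "bbox"])
```

The folder layout allows exactly one label per image and no further attributes, while the manifest can grow new columns (dates, bounding boxes, annotator names) without moving any files.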
Data Store Alternatives for the Multi-Mission Algorithm and Analysis Platform (MAAP)
Session IN042 — Lessons Learned on Supporting Analysis Ready Data (ARD) with Analytics Optimized Data Stores/Services (AODS) in Collaborative Analysis Platforms
In an era of continually advancing technologies, huge amounts of Earth observation data are being collected from a variety of sources and sensors, in different formats and with varying quality. As a result, storing, processing, and sharing these data become increasingly challenging as the data volume grows exponentially. The Multi-Mission Algorithm and Analysis Platform (MAAP), a cloud-based collaborative system developed by the National Aeronautics and Space Administration (NASA) and the European Space Agency (ESA), has been launched to support global terrestrial carbon dynamics research by bringing together relevant data, algorithms, and computing capabilities so that data can be shared and processed more easily. In the future, the MAAP will support several high data volume satellite missions, including the NASA-ISRO SAR mission (NISAR) and ESA’s Biomass mission.
These satellite data, in combination with other heterogeneous data collected from airborne and field campaigns, require new and innovative solutions for both managing and using data. Improving the data store component of the MAAP is therefore critical to meeting those requirements. With this goal in mind, our study first provides an overview of the data store implementations used by the Earth observation community, the techniques used to create these stores, and the systems in which those approaches are applied. It then evaluates the various implementation options against MAAP’s criteria to identify the most balanced data store solution, one that stores and accesses Earth observation data efficiently and allows efficient exploration and analysis with minimal additional user effort.