What to See at the American Geophysical Union Fall Meeting
IMPACT is energetically preparing for this year’s American Geophysical Union (AGU) Fall Meeting in Chicago. Below are teasers for in-person presentations you should definitely make time in your schedule to attend. You won’t be disappointed!
Technical Architecture and Design Strategies for Consolidating NASA Airborne and Field Data in the Catalog of Archived Suborbital Earth Science Investigations (CASEI)
Monday, 12 December 2022 at 10:00 AM
Until recently, airborne and field data and their associated metadata were typically accessible only by sifting through a variety of websites, publications, and disparate data discovery tools. The Catalog of Archived Suborbital Earth Science Investigations (CASEI) increases the data’s findability, accessibility, and reusability, making the data more FAIR. The Airborne Data Management Group (ADMG) has worked closely with technical experts to develop well-defined data models that drive cloud-based end-user portals for accessing airborne data while simultaneously providing curation interfaces for efficient inventory maintenance. In this presentation, we will explore the technical strategies that were used to standardize airborne and field metadata content in CASEI, allow for synchronous curation, provide effective data and information management, and connect science users effectively with needed data and information.
The Science Discovery Engine: An Open Science Success Story
Monday, 12 December 2022 at 5:50 PM
NASA is committed to building an inclusive open science community over the next decade and is championing the new Open-Source Science Initiative (OSSI) to foster that community. One component of the OSSI cyberinfrastructure is the Science Discovery Engine (SDE). The goal of the SDE is to enable the discovery of data, software, and documentation across the five divisions of NASA’s Science Mission Directorate (SMD): Astrophysics, Biological and Physical Sciences, Earth Science, Heliophysics, and Planetary Science. The SDE increases accessibility to the wealth of NASA’s open science data and information. In this presentation, we will present our collaborative work to date to build the SDE and our vision for empowering open-source science in the future.
Using Large Scale Language Models for Enabling NLP Abilities Within Earth Sciences
Monday, 12 December 2022 at 5:55 PM
Natural language processing (NLP) is used to solve a variety of tasks including summarization, text classification, keyword tagging, and sentiment analysis. The field of Earth science informatics has been applying these tasks to advance the field. Large language models (LLMs) are rich in associations between domain-specific concepts. We collaborated with the team at IBM Watson to create such a model, called Bidirectional Encoder Representations from Transformers for Earth science (BERT-E). In this presentation, we showcase BERT-E’s performance on the downstream tasks and discuss its usefulness for multiple other NLP problems within Earth science with minimal downstream training. We also present our NLP datasets that can be used to validate LLMs.
An Overview of NASA’s Catalog of Archived Suborbital Earth Science Investigations (CASEI): Supporting FAIR and Open Access to Airborne and Field Data
Wednesday, 14 December 2022 at 2:45 PM
The Catalog of Archived Suborbital Earth Science Investigations (CASEI) is a unique inventory of NASA’s airborne and field Earth science data (observations of our planet not taken from space). NASA assigns a variety of archive centers responsibility for these data; however, browsing, searching, and locating the holdings for observations relevant to a particular interest can be an onerous process, especially if the user is not already very familiar with the original data collection effort. CASEI is removing these barriers to access and facilitating a more intuitive, holistic data discovery, search, and access experience for NASA’s airborne and field Earth observations. This presentation will provide a summary of the motivations for and the development of the CASEI system. Particular attention will be given to how CASEI facilitates discovery and reuse of these lesser-known NASA data, supporting the open science vision and enhancing the return on investments made to collect these unique and varied observations.
Improving Commercial Data Discoverability and Access for NASA’s Commercial Smallsat Data Acquisition (CSDA) Program
Thursday, 15 December 2022 at 11:30 AM
The National Aeronautics and Space Administration (NASA) Commercial Smallsat Data Acquisition (CSDA) Program was established to identify, evaluate, and acquire data from commercial satellite companies that support and complement NASA’s Earth science missions and research goals. CSDA has been developing a data system to provide scalable, efficient, continuous, and repeatable data management processes for all commercial data acquired by NASA. This includes providing approved science investigators with access to commercial data through NASA- and vendor-operated interfaces. This presentation will provide an update on CSDA Program activities, including new data availability, development of CSDA user interfaces and supporting technologies, challenges in managing large-volume diverse datasets, and long-term data preservation activities to retain data for scientific reproducibility.
Flooding Assessment Using Computer Vision on Smallsat Imagery
Thursday, 15 December 2022 at 5:45 PM
Rapid and accurate post-flood mapping of affected areas is crucial in quantifying the damage to human life and property. Computer vision-based flood mapping can supplement traditional remote sensing by providing a faster, simplified, yet accurate workflow. The emergence of commercial small satellites in recent years has transformed Earth observation by providing high spatial and temporal resolution data that can be leveraged in disaster mapping. The objective of the current study is to demonstrate the advantages of the computer vision-based workflow as an alternative to traditional remote sensing techniques for assessing flood damage using high-resolution smallsat imagery. For this presentation, we will showcase an approach to map the extent of the 2022 Yellowstone floods using high-resolution before- and after-flood imagery from Planet. We will present the analysis framework and compare our results to other remote sensing products (e.g., Landsat) to showcase the effectiveness of our approach. Our methodology could be extended to detect and assess flood-related damage in urban environments as well.
Using High-Resolution Planet Data to Retrieve Aerosol Properties for the 2020 Wildfire Events
Thursday, 15 December 2022 at 10:20 AM
Atmospheric aerosols with diameters less than 2.5 µm (PM2.5) are an important contributor to urban air pollution. While anthropogenic emissions are the primary source of PM2.5 pollution in urban areas, the episodic occurrence of smoke from wildfires adds to urban pollution, causing exceedances. This study provides an overview of a machine learning algorithm used to detect aerosols in very high resolution Planet data, followed by retrieval of aerosol properties using radiative transfer modeling. This talk will focus on the record-setting 2020 wildfire season in California. In order to identify the aerosols in the PlanetScope scenes collected for this season, we examined the feasibility of three machine learning (ML) classifiers: a minimum distance classifier, a maximum likelihood classifier, and a random forest classifier. We also compared Planet retrievals to the MODIS Level 2 AOD product and found correlation, RMSE, accuracy and bias of 0.67, 0.75 and 0.11, respectively.
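To give a sense of how the simplest of the three classifiers named above works, here is a minimal sketch of a minimum distance classifier: each class is summarized by the mean of its training vectors, and a pixel is assigned to the class whose mean is nearest. The class labels, band values, and function names are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math


def class_means(samples):
    """Compute the mean feature vector for each class label.

    samples: dict mapping label -> list of equal-length feature vectors.
    """
    means = {}
    for label, vectors in samples.items():
        n = len(vectors)
        # zip(*vectors) groups the values band by band
        means[label] = [sum(band) / n for band in zip(*vectors)]
    return means


def classify(pixel, means):
    """Assign the pixel to the class whose mean is nearest (Euclidean distance)."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))


# Hypothetical training pixels (illustrative reflectance values only)
training = {
    "smoke": [[0.30, 0.28, 0.27], [0.34, 0.30, 0.29]],
    "clear": [[0.08, 0.10, 0.06], [0.10, 0.12, 0.08]],
}

means = class_means(training)
label = classify([0.29, 0.27, 0.25], means)
print(label)
```

Maximum likelihood and random forest classifiers follow the same train-then-assign pattern but model each class with more than a single mean vector.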
A cloud-native workflow for publishing, discovering, processing, and visualizing geospatial data
Thursday, 15 December 2022 at 2:55 PM
NASA is adopting analysis-ready, cloud-optimized (ARCO) data to support multiple initiatives, including open science. The Visualization, Exploration and Data Analysis (VEDA) project aims to provide a cloud-native data system that leverages ARCO. Some of the challenges in building such a system are standardized ARCO data publication, processing, and discovery. VEDA addresses these challenges by creating a standard lightweight workflow for ARCO data transformation, cataloging using SpatioTemporal Asset Catalogs (STACs), and application programming interfaces (APIs) for data dissemination. VEDA also leverages past NASA activities for data processing and visualization. The paper will discuss the technical details and showcase use cases that leverage VEDA.
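For readers unfamiliar with STAC, a catalog entry (an Item) is a GeoJSON Feature with a few required fields. The sketch below builds a minimal Item as a plain Python dictionary. It is an illustration of the general STAC Item layout, not VEDA's actual workflow; the item ID, bounding box, date, and asset URL are all hypothetical.

```python
import json
from datetime import datetime, timezone


def make_stac_item(item_id, bbox, acquired, asset_href):
    """Build a minimal STAC Item (a GeoJSON Feature) as a plain dict."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        # Footprint polygon derived from the bounding box (closed ring)
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": acquired.isoformat()},
        "links": [],
        "assets": {
            "data": {
                "href": asset_href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                "roles": ["data"],
            }
        },
    }


# Hypothetical values for illustration only
item = make_stac_item(
    "example-scene-001",
    [-111.2, 44.1, -109.8, 45.1],
    datetime(2022, 6, 13, tzinfo=timezone.utc),
    "https://example.com/data/example-scene-001.tif",
)
print(json.dumps(item, indent=2))
```

Because each Item carries its own footprint, timestamp, and asset links, a catalog of such records can be searched by space and time without opening the underlying files.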
Deploying a Self-Supervised Learning Based Model to Search Events Across Space and Time
Thursday, 15 December 2022 at 3:45 PM
A self-supervised learning (SSL) based model was introduced to the Earth science domain as a method to generate labeled data for supervised learning. SSL was used to capture the feature representations of optical satellite images (primarily Moderate Resolution Imaging Spectroradiometer, MODIS, True Color). The proof-of-concept (POC) demonstrated that models utilizing the learned representations along with 5% of available labels were able to significantly outperform a normal machine learning model trained from scratch with 5% of available labels. Moving from a POC to an operational system is not always easy. For example, searching for similar representations of an image among many thousands of images will become prohibitively expensive unless careful consideration is given to cost and compute optimizations. In this presentation, we will discuss the architecture design, complexities, and lessons learned while scaling and deploying the SSL pipeline with 21 years of indexed satellite imagery.
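The cost concern above is easy to see in a brute-force version of the search. The following minimal sketch compares a query embedding against every stored embedding using cosine similarity, so the work grows linearly with the size of the index. The tile IDs, vectors, and function names are hypothetical; the abstract does not specify the similarity measure or the index used by the deployed pipeline.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query, index, k=2):
    """Return the IDs of the k stored embeddings most similar to the query.

    Every entry in the index is scored, which is what makes this
    approach expensive for large archives.
    """
    scored = [
        (cosine_similarity(query, vec), image_id)
        for image_id, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:k]]


# Hypothetical three-dimensional embeddings (real ones are much longer)
index = {
    "tile_a": [0.9, 0.1, 0.0],
    "tile_b": [0.8, 0.2, 0.1],
    "tile_c": [0.0, 0.1, 0.9],
}

results = top_k_similar([1.0, 0.1, 0.0], index, k=2)
print(results)
```

Operational systems commonly replace this exhaustive scan with an approximate nearest-neighbor index, trading a small amount of accuracy for much lower query cost.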
Quality Assessment of the Harmonized Landsat/Sentinel-2 (HLS) Version 2.0 Data
Thursday, 15 December 2022 at 10:00 AM
The Harmonized Landsat/Sentinel-2 (HLS) project, a NASA-USGS collaboration, is operationally producing surface reflectance data for all non-Antarctic land at least once every 3 days with a 30 m pixel size. To make the data comparable as if they came from the same sensor, the harmonization uses the same atmospheric correction code (LaSRC, developed by USGS and GSFC/University of Maryland), the same cloud/shadow masking code (Fmask), a view angle effect correction with a BRDF model, a subtle adjustment of the Sentinel-2 bandpasses to the Landsat ones, and the gridding of the HLS surface reflectance and quality assessment data to a common spatial reference system. HLS data products are processed from L1T or L1C input data via cloud processing by the NASA IMPACT team and distributed by the LP-DAAC. This paper demonstrates the effect of each of the successive harmonization steps on deriving the surface reflectance as an intrinsic property of the land surface and assesses the quality of the surface reflectance at a few globally distributed testing sites. We also discuss preliminary research into a merged “best pixel” composite HLS product that minimizes cloud cover and provides a regular temporal revisit.
Should we build Foundation AI models for Earth Science?
Friday, 16 December 2022 at 5:55 PM
Foundation models are AI models pre-trained on a comprehensive data set and used for various downstream tasks. The model captures emergent behavior within the modeled system from the data and is part of a growing trend in AI pushing for homogenization for a specific domain. Language models such as BERT or GPT-3 are the best-known examples of these models. Creating a foundation model includes data curation, training, and validation. Given the extensive data archive in Earth Science, foundation models hold the promise to have a meaningful impact and broad utility. For example, a foundation model built on Modern-Era Retrospective analysis for Research and Applications (MERRA) could benefit many atmospheric and weather prediction applications. Similarly, a foundation model using optical remote sensing data such as Harmonized Landsat Sentinel (HLS) can assist many land use/land cover applications. Are foundation models the ultimate instantiation of the Digital Twin concept? Since significant resources are required to build foundation models, how should we create consortiums with the private sector to tackle this? This presentation will raise these questions for discussion in this proposed session.