The International Geoscience and Remote Sensing Symposium (IGARSS) brings together scientists, researchers, practitioners, and policymakers from around the globe to share their knowledge, insights, and latest advances in geoscience and remote sensing. IMPACT is bringing presentations to IGARSS 2023 that highlight cutting-edge research, advances in satellite technology, and novel remote sensing applications. In the run-up to the event, this post describes several of these presentations so you can plan to attend.
Wed, 19 Jul, 16:21–16:33 Pacific Time (UTC -7)
Open science aims to share scientific resources transparently, but managing them collaboratively and verifying their authenticity presents challenges. This presentation examines the use of blockchain technology to tackle these issues. Blockchain, an open-source, decentralized technology, enables the secure and transparent exchange and verification of information. It records information in a sequence of blocks, each linked to the one before it; once a network of computers validates a block, it becomes part of an immutable record.
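To make the "interconnected blocks" idea concrete, here is a minimal single-node sketch in Python. It is an illustration under stated assumptions, not the system presented in the talk: each block stores the SHA-256 hash of its predecessor, so editing any earlier record breaks the chain. The `Block` and `Ledger` names and the event payloads are hypothetical, and the network validation step described above is deliberately omitted.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field


@dataclass
class Block:
    index: int
    timestamp: float
    data: dict
    prev_hash: str
    hash: str = field(init=False)

    def __post_init__(self) -> None:
        # Seal the block: its hash is fixed at creation time.
        self.hash = self.compute_hash()

    def compute_hash(self) -> str:
        # SHA-256 over a canonical JSON encoding of every field except the hash.
        payload = json.dumps(
            {"index": self.index, "timestamp": self.timestamp,
             "data": self.data, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Hypothetical single-node, append-only chain (no network, no consensus)."""

    def __init__(self) -> None:
        # The genesis block has no predecessor, so it links to an all-zero hash.
        self.chain = [Block(0, time.time(), {"genesis": True}, "0" * 64)]

    def append(self, data: dict) -> Block:
        prev = self.chain[-1]
        block = Block(prev.index + 1, time.time(), data, prev.hash)
        self.chain.append(block)
        return block

    def is_valid(self) -> bool:
        # Every stored hash must still match its block's current contents...
        if any(b.hash != b.compute_hash() for b in self.chain):
            return False
        # ...and every block must still point at its predecessor's hash.
        return all(cur.prev_hash == prev.hash
                   for prev, cur in zip(self.chain, self.chain[1:]))


ledger = Ledger()
ledger.append({"event": "publish", "dataset": "example-dataset-v1"})
ledger.append({"event": "cite", "dataset": "example-dataset-v1"})
print(ledger.is_valid())  # True
ledger.chain[1].data["dataset"] = "tampered"  # edit history in place
print(ledger.is_valid())  # False: block 1's hash no longer matches its contents
```

On its own, a single copy of this chain could be rewritten by recomputing every later hash; in a real blockchain, that is what independent validation by many computers prevents.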
The motivations for implementing blockchain in open science are diverse. Blockchain enhances transparency and reproducibility by storing all data and outcomes on the blockchain, where they can be easily verified. It also incentivizes scientists and researchers by supporting secure, open data sharing and by rewarding contributors. Furthermore, blockchain enables accurate tracking and verification of science credits, including citations and non-traditional contributions. Because the record is accessible and tamper-proof, a researcher’s achievements can be precisely assessed and recognized.
To demonstrate blockchain in practice for open science, an AWS-hosted blockchain was used to synchronize data between two organizations, one of which acted as the authoritative data provider. The blockchain was configured to support tracking, copying, citing, and downloading data, and a dedicated dashboard was developed for monitoring. These use cases illustrate how blockchain can manage and verify scientific resources in an open science setting.
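A monitoring dashboard like the one described above ultimately summarizes the events recorded on the ledger. The Python sketch below shows one plausible way to do that. The event schema, the organization names, and the helper functions are all hypothetical and were not taken from the demonstration. It tallies tracked events per dataset, such as downloads and citations, and flags copies that did not originate from the authoritative provider.

```python
from collections import Counter, defaultdict

# Hypothetical event records, as they might be read back from the shared ledger.
AUTHORITATIVE_ORG = "provider"  # assumption: one org is the source of truth

EVENTS = [
    {"org": "provider", "event": "publish", "dataset": "ds-1"},
    {"org": "partner", "event": "copy", "dataset": "ds-1", "source": "provider"},
    {"org": "partner", "event": "download", "dataset": "ds-1"},
    {"org": "partner", "event": "cite", "dataset": "ds-1"},
    {"org": "provider", "event": "download", "dataset": "ds-1"},
]


def credit_summary(events):
    """Count each event type (publish, copy, cite, download) per dataset."""
    summary = defaultdict(Counter)
    for e in events:
        summary[e["dataset"]][e["event"]] += 1
    return summary


def unverified_copies(events, authority=AUTHORITATIVE_ORG):
    """Return copy events whose source is not the authoritative provider."""
    return [e for e in events
            if e["event"] == "copy" and e.get("source") != authority]


summary = credit_summary(EVENTS)
print(dict(summary["ds-1"]))  # {'publish': 1, 'copy': 1, 'download': 2, 'cite': 1}
print(unverified_copies(EVENTS))  # []
```

Counting citations and downloads this way is how a ledger could feed the credit-tracking motivation described earlier. The counts are only as trustworthy as the underlying chain.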
Wed, 19 Jul, 15:57–16:09 Pacific Time (UTC -7)
This talk will cover the basics of building cloud-native data systems to support open science. Because open science work is distributed and remote sensing data archives are large, cloud-native technologies are necessary to reach new scales of science. The presentation will cover cloud-native data formats such as COG, Zarr, GeoParquet, and COPC, along with complementary libraries that let users access only the bytes they need, reducing network data transfer and supporting users on low-bandwidth connections. Data discovery will also be emphasized: most geospatial data providers, including NASA, AWS, Google Earth Engine, and Microsoft Planetary Computer, have adopted STAC (SpatioTemporal Asset Catalog) as an open data cataloging standard. Because these catalogs share one standard, users can search across multiple catalogs with the same tools, bringing them closer to discovering all relevant data.
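The value of a shared catalog standard is that the same query works against every provider. The offline Python sketch below mimics the spatial and temporal filtering of a STAC API item search. It uses a bounding box and a `start/end` datetime interval and runs over two hypothetical in-memory catalogs. The item IDs, coordinates, and catalog names are invented for illustration. In practice, a client such as pystac-client sends equivalent `bbox` and `datetime` parameters to each provider's STAC API endpoint instead of filtering locally.

```python
from datetime import datetime

# Two hypothetical catalogs holding minimal STAC-like Items (GeoJSON Features
# with a bbox and a datetime property). Real Items also carry geometry, assets, links.
CATALOGS = {
    "provider-a": [
        {"id": "scene-001", "bbox": [-122.5, 37.0, -121.5, 38.0],
         "properties": {"datetime": "2023-07-01T18:30:00Z"}},
        {"id": "scene-002", "bbox": [10.0, 45.0, 11.0, 46.0],
         "properties": {"datetime": "2023-07-02T10:00:00Z"}},
    ],
    "provider-b": [
        {"id": "tile-xyz", "bbox": [-123.0, 37.5, -122.0, 38.5],
         "properties": {"datetime": "2023-06-15T19:00:00Z"}},
    ],
}


def parse_ts(value: str) -> datetime:
    # STAC uses RFC 3339 timestamps; map a trailing "Z" to an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def bbox_intersects(a, b) -> bool:
    # Boxes are [min_lon, min_lat, max_lon, max_lat]; antimeridian crossing is ignored.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(catalogs, bbox, interval):
    """Return (catalog, item id) pairs matching a bbox and a closed "start/end" interval."""
    start, end = (parse_ts(s) for s in interval.split("/"))
    hits = []
    for name, items in catalogs.items():
        for item in items:
            when = parse_ts(item["properties"]["datetime"])
            if bbox_intersects(item["bbox"], bbox) and start <= when <= end:
                hits.append((name, item["id"]))
    return hits


print(search(CATALOGS, [-122.6, 37.5, -122.2, 37.9],
             "2023-06-01T00:00:00Z/2023-07-31T23:59:59Z"))
# [('provider-a', 'scene-001'), ('provider-b', 'tile-xyz')]
```

This sketch handles only closed intervals. The STAC API also allows open-ended intervals written with `..`, which it does not parse.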
For data access, the presentation will highlight formats and tools including pangeo-forge, Zarr, Cloud-Optimized GeoTIFF (COG), Kerchunk, xpublish, Cloud-Optimized Point Clouds (COPC), and GeoParquet. Cloud-optimized formats such as COG carry internal metadata that lets clients read subsets of a file over the network without downloading the entire file. As geospatial archives continue to grow, data providers need to optimize access by using different storage temperatures: infrequently accessed data can be kept in archival formats on lower-cost storage, while cloud-optimized formats allow quick access, potentially with caching for ephemeral data storage. Data providers should design data formats and cataloging, from acquisition to distribution, to support cloud-optimized access. Community-based capacity-building initiatives such as NASA Openscapes and Project Pythia play a crucial role in unlocking the potential of the cloud.
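The key idea behind reading "subsets of data over the network" is a small index at the start of the file that records where each chunk lives. A client reads that index first and then fetches only the byte ranges it needs. A COG does this with the TIFF `TileOffsets` and `TileByteCounts` tags. The Python sketch below uses a much simpler, hypothetical layout: an 8-byte length, a JSON index, then the tile payloads. It is meant to show the access pattern, not the TIFF format. An in-memory stream stands in for object storage, where each `read_range` call would correspond to an HTTP `Range` request.

```python
import io
import json
import struct


def write_tiled(tiles: dict) -> bytes:
    """Pack tiles behind a small offset index (hypothetical format, not TIFF)."""
    index, body, offset = {}, b"", 0
    for name, payload in tiles.items():
        index[name] = [offset, len(payload)]  # offset relative to data start
        body += payload
        offset += len(payload)
    header = json.dumps(index).encode("utf-8")
    # Layout: 8-byte little-endian index length, the JSON index, then tile data.
    return struct.pack("<Q", len(header)) + header + body


class RangeReader:
    """Wrap a seekable stream and count bytes fetched, like HTTP Range GETs."""

    def __init__(self, stream):
        self.stream = stream
        self.bytes_read = 0

    def read_range(self, start: int, length: int) -> bytes:
        self.stream.seek(start)
        data = self.stream.read(length)
        self.bytes_read += len(data)
        return data


def read_tile(reader: RangeReader, name: str) -> bytes:
    # 1) Read the fixed-size length prefix, 2) the index, 3) only the tile.
    (index_len,) = struct.unpack("<Q", reader.read_range(0, 8))
    index = json.loads(reader.read_range(8, index_len))
    offset, length = index[name]
    return reader.read_range(8 + index_len + offset, length)


tiles = {f"tile-{i}": bytes([i]) * 1000 for i in range(16)}
blob = write_tiled(tiles)
reader = RangeReader(io.BytesIO(blob))
tile = read_tile(reader, "tile-5")
print(len(blob), reader.bytes_read)  # the reader fetched only a small fraction
```

Here the reader makes three small requests. Production readers often fetch the header speculatively in a single larger request to cut round trips, which matters more on high-latency or low-bandwidth links.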
Wed, 19 Jul, 15:57–16:09 Pacific Time (UTC -7)
Searching for Earth science phenomena in large archives of Earth observation satellite imagery is complex and requires organizing and categorizing the images. Manual tagging is time-consuming and impractical given the growing volume and rate of satellite data acquisition. Previous attempts to automate tagging have used machine learning (ML) algorithms, but these require substantial time, computational resources, and labeled data. In addition, re-indexing the data whenever a new phenomenon needs to be searched is computationally expensive. We propose an alternative data-driven framework that eliminates the need for manual indexing, labeling, or specialized ML classifiers.
Our method uses self-supervised learning (SSL) techniques to obtain feature vectors for satellite image search and retrieval. An approximate nearest neighbor (ANN) algorithm is then used to index these vectors and retrieve images with similar features, which indicate similar Earth science phenomena. What sets our approach apart is the integration of this method with various cloud services, enabling efficient search through millions of images in a short time. To demonstrate the framework, we developed a web interface to search 21 years of daily global satellite imagery. In this paper, we present our progress in implementing embedding-based search in remote sensing, along with the potential benefits, challenges, and limitations of the current framework.
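The abstract does not name the ANN algorithm or library used. The stdlib-only Python sketch below therefore substitutes one well-known ANN technique, random-hyperplane locality-sensitive hashing for cosine similarity, to show the general "embed, index, retrieve" pattern. The random vectors are hypothetical stand-ins for SSL feature vectors, and the class and image IDs are invented. Each vector is hashed into buckets by the signs of its projections onto random hyperplanes. A query then compares exact cosine similarity only against the images that share a bucket, instead of against the whole archive.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    # Assumes non-zero vectors (true for typical learned embeddings).
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class HyperplaneLSH:
    """Hypothetical ANN index: random-hyperplane LSH with several hash tables."""

    def __init__(self, dim, n_planes=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # Each table maps a vector to the sign pattern of n_planes projections.
        self.tables = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
            for _ in range(n_tables)
        ]
        self.buckets = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = {}

    def _key(self, planes, v):
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in planes)

    def add(self, item_id, v):
        self.vectors[item_id] = v
        for planes, bucket in zip(self.tables, self.buckets):
            bucket[self._key(planes, v)].append(item_id)

    def query(self, v, k=3):
        candidates = set()
        for planes, bucket in zip(self.tables, self.buckets):
            candidates.update(bucket.get(self._key(planes, v), []))
        if not candidates:  # no bucket matched: fall back to exhaustive search
            candidates = set(self.vectors)
        # Re-rank only the candidates by exact cosine similarity.
        ranked = sorted(candidates, key=lambda i: cosine(v, self.vectors[i]), reverse=True)
        return ranked[:k]


rng = random.Random(42)
embeddings = {f"img-{d:03d}": [rng.gauss(0.0, 1.0) for _ in range(16)] for d in range(200)}
index = HyperplaneLSH(dim=16, seed=1)
for image_id, vec in embeddings.items():
    index.add(image_id, vec)
print(index.query(embeddings["img-007"], k=3))  # "img-007" is ranked first
```

More planes per table make buckets smaller and queries faster but can miss true neighbors. More tables recover recall at the cost of memory. At the scale of millions of images, systems typically rely on a dedicated ANN library such as FAISS rather than a hand-rolled index like this one.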