AI-Ready: Making petabytes of data more discoverable and usable

IMPACT Unofficial
Mar 29, 2021

NASA embraces open science. IMPACT works to enable open data for NASA tools such as Worldview, which gives users access to over 450 terabytes of satellite imagery. Open data is critical to research. Before embarking on a scientific study of a particular phenomenon, such as wildfires, scientists need to collect numerous examples of that phenomenon. Locating these examples requires searching through 197 million square miles of satellite imagery each day across more than 20 years of data. Such an effort can produce a valuable trove of data, but manually searching the data is cumbersome and laborious.

Making large amounts of data more discoverable and usable for specific parameter extraction is a hard problem. A question such as “Can we use new techniques, such as self-supervised learning, to tackle our data discovery problem?” contains a number of hidden questions:

• Can we find a needle in a haystack?
• Can we teach a machine to search fine-grained data without labels?
• Can we get artificial intelligence (AI) to present examples to a human when it gets confused?
• Can we scale up the search from gigabytes to terabytes to petabytes?
• Can we learn to represent rare events?
• Can we create tools that make it simple to ingest the data?
• Can we teach AI to focus on the interesting parts?
• Can we search several years of data covering the entire planet in under a second?

To tackle these questions, IMPACT embraced an open science approach and partnered with the SpaceML initiative, an international AI accelerator for citizen scientists. SpaceML is a branch of Frontier Development Lab in partnership with NASA, the SETI Institute, and Trillium Technologies Inc. It engages early-career research engineers and connects them with mentors who are senior machine learning and software engineering experts. Current participants range from high school graduates and graduate students to industry professionals, and include contributors from non-traditional computer science academic backgrounds, among them two high school teachers transitioning their careers to data science.

Anirudh Koul, the driving force behind SpaceML’s Worldview Search, explains the impetus behind this initiative:

Each contributor is motivated by the impact they can have on the planet. And when determination finds opportunity and guidance, hard problems start to crack open. Reducing the time of manual data curation from several months to hours or even minutes opens new avenues of scientific exploration previously considered impractical. By making it available in open source as another tool in scientists’ toolbox, we hope to accelerate the process of making scientific discoveries.

This collaborative partnership with SpaceML envisions a generalizable package of machine learning operations (MLOps) components and workflows that can be used not only by Earth science tools and applications such as Worldview, but also by other teams working on datasets from sources ranging from the Hubble Space Telescope to the NASA Solar Dynamics Observatory. Users would not need to understand programming or even machine learning (ML) to benefit from MLOps. Further, the collaboration embraces the goal of developing the underlying ML components from technology readiness level (TRL) 3, the point of sound software engineering, to TRL 9, a flight-ready and deployed solution.

The Worldview Image Search Pipeline

James Parr, the director of the Frontier Development Lab, explains the value of the Worldview image search pipeline this way:

We’re realizing that deploying mature machine learning outcomes for one specific use requires a similar cost and effort as building tools for multiple scenarios. So why not instead make a generalizable toolbox for space AI applications that makes it easy for others to adapt to their specific problem? SpaceML is the expression of this idea.

Example of an image search using the current MLOps prototype
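
One common way to think about a query-by-example image search is as a nearest-neighbor lookup over image embeddings: each image tile is mapped to a vector by a trained model, and tiles whose vectors point in a similar direction to the query's are returned first. The sketch below illustrates only that ranking step and is not the SpaceML implementation. The tile ids, the three-dimensional vectors, and the function names `cosine_similarity` and `search` are all hypothetical, chosen for illustration; a real system would obtain embeddings from a model and would likely use an approximate nearest-neighbor index rather than a full scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (tile_id, score) pairs most similar to the query, best first."""
    scored = [(tile_id, cosine_similarity(query, emb)) for tile_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings (assumption): real ones would come from a trained
# model and have far more than three dimensions.
tile_index = {
    "tile_a": [0.9, 0.1, 0.0],
    "tile_b": [0.0, 1.0, 0.0],
    "tile_c": [0.8, 0.2, 0.1],
    "tile_d": [0.0, 0.0, 1.0],
}

# The query vector stands in for the embedding of the example image.
results = search([1.0, 0.0, 0.0], tile_index, k=2)
for tile_id, score in results:
    print(f"{tile_id}: {score:.3f}")
```

A linear scan like this compares the query against every stored vector, so its cost grows with the size of the archive. It is meant to show the ranking idea, not a design that would answer a planet-scale query in under a second.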

The result of this effort is a set of open science tools that simplify the use of NASA’s Earth science archive for machine learning. By partnering with SpaceML, IMPACT also inspires a new generation of ML engineers to apply their ingenuity to making a difference to life on Earth.

Worldview Search technical contributors include Rudy Venguswamy, Ajay Krishnan, Tarun Narayanan, Jenessa Peterson, Daniela Fragoso, Kai Priester, Nathan Hilton, Stefan Pessolano, Surya Ambardar, Aaron Banze, Mike Levy, Abhigya Sodani, Fernando Lisboa, Shivam Verma, Suhas Kotha, Deep Patel, Erin Gao, Rajeev Godse, Sarah Chen, Esther Cao, Yujeong Zoe Lee, Mandeep Khokhar, Sumanth Ramesh, Walker Stevens, Subhiksha Muthukrishnan, Navya Reddy Sandadi, Leo Silverberg, Satyarth Praveen, Sherin Thomas, Dharini Chandrasekaran, Udara Weerasinghe, Meher Anand Kasam, Siddha Ganju and Anirudh Koul.

The GitHub tools are gradually being released publicly. Those released so far are listed at the bottom of this page:

More recent details on the project are available at this page:

More information about IMPACT can be found at NASA Earthdata and the IMPACT project website.

This is the unofficial blog of the Interagency Implementation and Advanced Concepts Team.