Empowering Open Science with the Science Discovery Engine
In 2018, the NASA Science Mission Directorate (SMD) declared a long-term commitment to championing open science through its Strategy for Data Management and Computing, 2019–2024. The Open Source Science Initiative (OSSI) emerged from this strategic plan, and researchers are excited about the potential to integrate and share information more easily. One major recommendation from the scientific community was for the SMD to develop a capability to “support discovery and access to complex scientific data across Divisions” that enables open science (Strategy 1.3).
A team of researchers and developers began formulating a strategy in early 2020 to meet this ambitious SMD objective. Two years and close to 1,000,000 documents, datasets, and tools later, the Science Discovery Engine (SDE) search capability is ready for launch. The primary SDE development group operates within IMPACT, and many team members are associated with NASA’s Marshall Space Flight Center (MSFC). They have collaborated with several external partners to construct and refine features of the tool. The Enterprise Data Platform (EDP) and Mission Cloud Platform (MCP) teams within NASA’s Office of the Chief Information Officer (OCIO) assisted in deploying the powerful search capabilities of the SDE. To ensure broad representation of NASA science efforts, the SDE team also coordinated with a working group composed of members from all NASA science focus areas (Astrophysics, Biological and Physical Sciences, Earth Science, Heliophysics, and Planetary Science). The working group continues to help identify content for potential inclusion in the SDE and provides guidance on future project development. In addition, the SDE team works with Sinequa, the developer of the intelligent search platform, and Left Right Mind, a digital design consulting firm, to craft user-centered web interfaces.
The beta version of the SDE was revealed on the SMD website on December 9. Attendees at the American Geophysical Union (AGU) Fall Meeting in Chicago (December 12–16, 2022) then had the opportunity to hear SDE team presentations and witness the application’s capabilities.
Constructing the SDE is a key step in NASA’s process of establishing and encouraging open science practices. The tool provides an infrastructure for vast quantities of NASA science information to be available and searchable in a single location, making it easier for science community members to collaborate and accelerate their work. Open data and information from all NASA science focus areas can be searched, filtered, and accessed.
The SDE provides quick access to a wide range of SMD science content.
Compiling and organizing information included in the SDE presents many challenges. First, the SDE team works to identify relevant data and information from a vast network of resources across SMD. Then, the team considers how to develop useful categories for encompassing such a wide range of topics. Depending on the specificity of a query, thousands of search results may be generated that include links to datasets, models, images, videos, software, or data analysis tools. To refine search processes, the SDE team developed an SMD vocabulary extraction workflow that leveraged over 50 glossaries, thesauri, and keywords across the SMD to generate term lists such as platforms, instruments, and missions. These lists are then used to create SMD-relevant filtering options to allow for guided exploration in the SDE.
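To make the idea of turning glossaries into filtering options more concrete, here is a minimal Python sketch. It is not the SDE team's actual workflow, which this article does not describe in detail. The sample glossary entries, the function names `build_term_lists` and `tag_document`, and the simple whole-word matching are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample glossaries: each is a list of (category, term) pairs.
# The real SMD glossaries, thesauri, and keyword lists are not shown here.
SAMPLE_GLOSSARIES = [
    [("missions", "Landsat 8"), ("instruments", "MODIS"), ("platforms", "Terra")],
    [("missions", "landsat 8"), ("instruments", "HMI"),
     ("missions", "Solar Dynamics Observatory")],
]


def build_term_lists(glossaries):
    """Merge glossary entries into one de-duplicated term list per category."""
    merged = defaultdict(dict)
    for glossary in glossaries:
        for category, term in glossary:
            # Normalize case and whitespace so "landsat 8" and "Landsat 8"
            # count as the same term; keep the first spelling encountered.
            key = " ".join(term.lower().split())
            merged[category].setdefault(key, " ".join(term.split()))
    return {
        category: sorted(terms.values(), key=str.lower)
        for category, terms in merged.items()
    }


def tag_document(text, term_lists):
    """Return, per category, the terms that appear in the text as whole words."""
    lowered = text.lower()
    tags = {}
    for category, terms in term_lists.items():
        matches = [
            term for term in terms
            if re.search(r"(?<!\w)" + re.escape(term.lower()) + r"(?!\w)", lowered)
        ]
        if matches:
            tags[category] = matches
    return tags


if __name__ == "__main__":
    term_lists = build_term_lists(SAMPLE_GLOSSARIES)
    print(term_lists)
    print(tag_document("MODIS imagery from the Terra satellite", term_lists))
```

In this sketch, the merged term lists play the role of the filter values (missions, instruments, platforms), and tagging each document with the terms it mentions is what would let a user narrow thousands of results to a single instrument or mission. A production workflow would likely need synonym handling and curation by subject-matter experts, which this example leaves out.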
Nearly a million scientific products are searchable within the SDE.
Kaylin Bugbee leads SDE team operations as a NASA research scientist and member of the OSSI team. Her expertise lies in data stewardship, informatics, and open science practices. When asked to describe the most important and exciting aspects of the SDE from her perspective, she responded:
To me, the most exciting thing about the SDE is how it makes the rich wealth of NASA’s open science data and information more accessible to an ever-growing community of users. This increased accessibility will open new pathways to scientific discovery and encourage more people to make use of the wealth of the open science data and information NASA provides.
Kaylin also explained how consolidating NASA’s science content in the SDE will assist researchers:
Before the SDE, information about science at NASA was spread out over 128 unique sources. These sources included websites, data repositories, code repos and document archives. For data specifically, over 84,000 science data products were found at 30+ different repositories, making it a challenge for new scientists to find data they may not be familiar with. The SDE will make the scientific process more efficient by decreasing the amount of time required to search for data and information.
Now that the SDE is available to the broader scientific community, Kaylin and the SDE team hope that it will quickly become a go-to source for reliable, accessible science information. They anticipate that the application will foster significant collaboration and innovation within and across science disciplines. According to Kaylin, the SDE team will continue expanding and refining the SDE:
This is only the beginning for the SDE. While we have brought in over 128 science information sources into the SDE, we plan to bring in more data and content from the five science topic areas in the coming months. We also plan to add enhanced features to the user interface and to further develop the SDE application programming interface (API).
You can try out the SDE for yourself and provide feedback about the search engine’s functionality via the SDE website.
More information about IMPACT can be found on the NASA Earthdata and IMPACT project websites.