The Ethics of Large Language Models: Who Controls the Future of Open Science?
Kaylin Bugbee, Rahul Ramachandran
This blog post is a collaborative effort between Kaylin Bugbee, who leads the Science Discovery Engine project, one of NASA’s Open-Source Science Initiative efforts, and Rahul Ramachandran, who leads the AI foundation model effort within NASA/IMPACT. For some time, IMPACT has been studying the effects of large language models on science, particularly on the data lifecycle. However, with the introduction and widespread use of ChatGPT, the narrative has shifted significantly, prompting a deeper examination of the impact of generative large language models (LLMs) on NASA’s open science initiative.
What Are Large Language Models?
LLMs are machine learning models designed to process and generate human language. They use neural networks, a type of machine learning algorithm, to learn patterns from vast amounts of text and to make predictions based on those patterns. OpenAI’s GPT series, which includes GPT-2, GPT-3, and newer versions, is among the most well-known LLMs. GPT-3, for example, was trained on text drawn from sources such as books, articles, and web pages, with the web portion filtered down from roughly 45 terabytes of raw text. Google’s BERT (Bidirectional Encoder Representations from Transformers) and Facebook’s RoBERTa (Robustly Optimized BERT Pre-training Approach) are other notable LLMs trained on large text corpora. These models have various applications such as language translation, chatbots, text summarization, and generating human-like text for news articles, chat messages, and creative writing.
Despite the potential benefits of LLMs in speeding up tasks such as writing papers, grants, and code, there are concerns about their reliability, particularly their tendency to return false information. LLMs learn the statistical patterns of language in large databases of online text, which include untruths, biases, and outdated information. As a result, LLMs can be unreliable sources of accurate information, particularly for technical topics on which they have had limited training data. Moreover, the fixed context length of LLMs also poses a challenge to factual accuracy, since their output tends to become less coherent as the context grows longer. Therefore, it is essential to be aware of the limitations of LLMs and to validate their outputs before using them for critical tasks.
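To make the point about statistical patterns concrete, here is a minimal sketch of a toy bigram model in Python. It is not how GPT-3 or any real LLM is built; the corpus and the function names are hypothetical and chosen only for illustration. The sketch shows that a model which simply learns which word tends to follow which will repeat whatever is most common in its training text, whether or not it is true.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for each word, which words follow it in the training text."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts


def most_likely_next(model, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus: the false statement appears more often than the true one.
corpus = [
    "the moon is cheese",
    "the moon is cheese",
    "the moon is rock",
]

model = train_bigram_model(corpus)

# The model completes "is" with the most frequent continuation, not the correct one.
print(most_likely_next(model, "is"))
```

Real LLMs are far more sophisticated than this, but the sketch illustrates why frequency in the training text is not the same as accuracy, and why outputs need to be validated.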
What Is ChatGPT?
ChatGPT is a conversational agent built on a large language model from the GPT-3 family (specifically, a sibling of InstructGPT, fine-tuned with human feedback) to generate human-like responses to natural language inputs. As a killer application of this family of models, ChatGPT has the potential to revolutionize the way we interact with computers and digital services. It can handle complex language structures and generate text that is coherent and contextually relevant. It can be used for various purposes, such as customer service, language translation, and even creative writing, and its responses can be steered toward specific tasks and domains. In the scientific domain, ChatGPT can also help with writing manuscripts and generating code. Overall, ChatGPT represents a significant advancement in natural language processing and has the potential to enhance human-computer interaction in numerous ways.
While ChatGPT has potential, it also has several known shortcomings. First, generative LLMs are prone to producing errors or creating misinformation. Second, because LLMs are often trained on historical documents, their responses can incorporate bias and outdated ideas. ChatGPT’s training corpus is restricted to documents up to 2021, and the model is not allowed to browse the Internet to get around this restriction. As the technology continues to evolve, some of these issues may be resolved, and LLM-based conversational agents like ChatGPT will likely become even more powerful and prevalent in various fields.
Implications of Large Language Models to Science
The emergence of LLM-based tools like ChatGPT has given researchers and scientists a new aid for editing manuscripts, writing or checking code, and brainstorming ideas. ChatGPT has already made its debut in the scientific literature, and the use of AI-generated text has become a topic of debate, with some publishers grappling with the question of whether it is appropriate to list ChatGPT as an author. While some publishers have prohibited the use of text generated by ChatGPT in scientific papers, treating it as scientific misconduct, others have yet to create policies regarding the use of AI tools in published literature. The use of LLM-based generative tools has also raised concerns about their reliability and tendency to return false information, emphasizing the need for human oversight and for scientists to recommit to careful attention to detail in order to maintain trust in science.
To address these concerns, researchers have suggested measures to enforce honest use, transparency in use, and detection and watermarking of AI-generated content. Detection tools for AI-generated content can help flag the use of LLMs. However, as language models become more sophisticated, these tools will not be infallible, and the future of generative AI will depend on the ethical choices made by researchers.
Implications of These Generative Models to Open Science
Open science is defined as a collaborative culture enabled by technology that empowers the open sharing of data, information, and knowledge within the scientific community and the wider public to accelerate scientific research and understanding. Open science adheres to a number of principles:
- Transparent Science ensures that the scientific process and results are visible, accessible, and understandable.
- Accessible Science makes scientific data, tools, software, documentation, and publications accessible to everyone.
- Inclusive Science welcomes participation and collaboration in the process of science from people and organizations with diverse backgrounds. This includes public engagement in the scientific process.
- Reproducible Science ensures that the scientific process and results are open so that they can be independently verified and validated.
The goal of open science is to accelerate the time to actionable science. While technology alone cannot achieve all of the goals of open science, it can aid in streamlining and optimizing the various steps involved in conducting research, from idea generation to data collection and analysis, to publication and dissemination of findings. New technologies, such as collaborative platforms, cloud computing, artificial intelligence, and high-throughput experimentation can help speed up data collection and analysis.
How Will Large Language Models Impact Open Science Positively?
LLMs have the potential to enable some of the key goals and principles of open science. By improving the productivity of scientists, LLMs can accelerate the time to actionable science. This includes making background research and literature reviews more efficient and streamlining the editing of both manuscripts and code. LLMs also have the potential to promote scientific accessibility and inclusivity, providing greater equity of access to scientific information for individuals who have an interest in science but lack specialized research expertise, including citizen scientists, educators, students, and the general public. By allowing users to ask questions in simpler language and receive information that is easier to understand, LLMs facilitate comprehension and encourage engagement with scientific material. Additionally, LLMs have the potential to overcome bias that is inherent in both humans and existing search engines. Humans tend to favor the outputs of known researchers or collaborators, while LLMs can eliminate this bias and make science more equitable by providing the most relevant information, regardless of renown. Similarly, the media often practices ‘cafeteria science’, the act of selecting research results based on what will generate the most clicks or align with a given group’s worldview. In addition, existing search engines for scientific publications have built-in bias, as they depend on citation metrics or page rank to surface content that receives more views rather than the content that most closely matches the original query.
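The difference between ranking by popularity and ranking by match to the query can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the paper titles and citation counts are made up, the word-overlap score is a crude stand-in for relevance, and no real search engine or LLM works exactly this way.

```python
def relevance_score(query, text):
    """Fraction of query words that appear in the text (a crude relevance proxy)."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rank_by_citations(papers):
    """Order papers by citation count, most cited first."""
    return sorted(papers, key=lambda paper: paper["citations"], reverse=True)


def rank_by_relevance(papers, query):
    """Order papers by how well their titles match the query, best match first."""
    return sorted(
        papers,
        key=lambda paper: relevance_score(query, paper["title"]),
        reverse=True,
    )


# Hypothetical records: titles and citation counts are invented for this example.
papers = [
    {"title": "A survey of remote sensing", "citations": 950},
    {"title": "Soil moisture retrieval from radar over croplands", "citations": 12},
    {"title": "Deep learning for image classification", "citations": 4000},
]
query = "soil moisture radar"

top_cited = rank_by_citations(papers)[0]["title"]
top_relevant = rank_by_relevance(papers, query)[0]["title"]

print("Top by citations:", top_cited)
print("Top by relevance:", top_relevant)
```

In this toy data, the citation-based ranking puts a highly cited but unrelated paper first, while the relevance-based ranking surfaces the little-cited paper that actually matches the query.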
How Will Large Language Models Impact Open Science Negatively?
Many of the downsides of using LLMs for science have been discussed elsewhere. For open science more specifically, LLMs are problematic for several reasons. First, LLMs, in their current iteration, are opaque and incapable of reproducing results. LLMs do not provide attribution for the content they create or the information they provide to the researcher, which is a requirement of the open science principle of transparency. At a systemic level, this is also problematic because science is built on a culture of receiving credit for publications and achievements. If scientists do not receive credit for their work, this could undermine open science efforts by discouraging them from openly sharing their work in the future.
More broadly, scientists are required to justify each fact or finding, which also supports transparency and reproducibility. Yet the current version of ChatGPT is not required to justify the purported facts it shares. This makes it difficult for a user to know when errors have occurred or misinformation has been provided. ChatGPT ‘speaks’ in an authoritative voice that makes it difficult for novices to detect what could be BS.
LLMs and conversational agents such as ChatGPT are now a permanent part of our technological landscape, and their usage and potential misuse cannot be prevented. To address ethical concerns surrounding the use of these tools, the scientific community needs to define clear policies on their proper use, such as those being considered by the NASA-sponsored AGU AI/ML Ethics workshop. Best practices for citing the use of these tools, along with guidance on developing and openly sharing prompts, will help demonstrate proper use and make workflows more transparent.
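One way to share prompts openly is to keep a small machine-readable record alongside the work. The following Python sketch assumes a hypothetical record format: the `PromptRecord` class and its field names are illustrative only and do not follow any established standard or policy. The prompt text is abbreviated.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PromptRecord:
    """Hypothetical record describing one use of an LLM tool in a workflow."""
    tool: str           # name of the tool that was used
    prompt: str         # the prompt given to the tool
    purpose: str        # why the tool was used
    human_edited: bool  # whether a person revised the output afterwards


def to_json(record):
    """Serialize the record so it can be published with a paper or blog post."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


record = PromptRecord(
    tool="ChatGPT",
    prompt="Convert the following bullets to a scientific paragraph ...",
    purpose="Generate a first draft from author-written key points",
    human_edited=True,
)

print(to_json(record))
```

A record like this could be published next to a manuscript or code repository so that readers can see which tool was used, what it was asked to do, and whether a person reviewed the result.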
It is clear that technological advancements are outpacing the culture shift towards open science, which poses new questions on how to build LLMs that align with scientific ethics. One solution is to use only curated peer-reviewed material to generate responses, although this approach also has its own challenges. Making training data openly available can improve transparency, but it is not always possible, especially since high-quality journals remain closed to the public. Pre-print archives cannot be leveraged either, as they are not peer-reviewed, leading to the potential for misinformation to infiltrate the model.
Furthermore, there is a need to provide attribution to the literature, data, and code leveraged by LLMs. Science is built on a culture of receiving credit for work done and discoveries made. Unless this model fundamentally changes, technological solutions should, as a scientific best practice, provide full attribution and citations for each generated response. Doing so helps maintain scientific integrity and improves the use of LLMs in research. As LLM technology continues to advance, it is imperative that ethical considerations remain at the forefront of technological development in natural language processing.
Overall, while LLMs and tools like ChatGPT have tremendous potential, it is essential to ensure that their use aligns with scientific ethics and that proper guidelines are established to govern their ethical use.
Key points for this blog were first ideated, written and then fed to ChatGPT to generate a draft that was then manually edited.
Prompt used: “Convert the following bullets to a scientific paragraph so most of the citations to other academic papers are kept, the text minimizes the use of jargon, the text grammar is correct, spelling errors are fixed, and the text is in active voice and has a clear sentence structure.”
 Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
 Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171–4186).
 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., … & Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
 Ramachandran, R., Bugbee, K., & Murphy, K. (2021). From open data to open science. Earth and Space Science, 8, e2020EA001562. https://doi.org/10.1029/2020EA001562
 Merton, R. K. (1968). The Matthew effect in science. Science, 159(3810), 56–63. https://doi.org/10.1126/science.159.3810.56.
 Bergstrom, C. T., & West, J. D. (2020). Calling bullshit: The art of skepticism in a data-driven world. New York: Random House.
 West, J. D., & Bergstrom, C. T. (2021). Misinformation in and about science. Proceedings of the National Academy of Sciences of the United States of America, 118. https://doi.org/10.1073/pnas.1912444117