Large language models such as ChatGPT are becoming more relevant in the workforce as tools for increasing efficiency. In the several months since it became publicly accessible, ChatGPT has become the hot tool in corporate America. It can automate various writing tasks, such as drafting social media posts, emails, and even code. By helping users quickly get the data they need, ChatGPT has so far revolutionized how work gets done in the office.
However, researchers have found that several accessibility issues make it a less-than-ideal tool for the average academic researcher or small business owner. Commercial language models like ChatGPT cannot be deployed locally or used with sensitive, private data. Language models that can be deployed locally or used with confidential information, meanwhile, carry training costs in the millions. This creates serious accessibility issues akin to the digital divide, which Iowa State researchers are working to solve.
Qi Li, Assistant Professor of Computer Science, is developing an approach that automatically finds high-quality prompts using moderately sized pre-trained language models, such as BERT, that can be deployed locally and used with sensitive, private data. Li wants her research to close the accessibility gap, allowing academic researchers and small business practitioners to obtain high-quality annotations at minimal cost and without additional privacy concerns, thus democratizing language models. Li has been honored with a Faculty Early Career Development (CAREER) Award for her outstanding research on information extraction from scientific documents. The award recognizes promising early-career faculty members who demonstrate the potential to become influential academic role models in both research and education.
“The [research] tackles a variety of problems drawn from different information extraction settings, which will lead to new principles, methods, and technologies for machine learning, data mining, and natural language processing,” says Li. “The information extraction results will benefit many domains, specifically life science domains such as biomedicine, animal science, and agronomy, all of which involve processing massive unlabeled textual data. The project will speed up literature understanding and the curation process and promote new scientific discoveries.”