On May 15, 2023, Dr. Bui Hai Hung, CEO of VinAI, spoke at "ChatGPT and Beyond," the second InnovaTalk webinar hosted by the VinFuture Foundation. Before an audience of over 230 people, including domestic and foreign scholars and experts, VinAI's CEO represented the Vietnamese science and technology community and shared insights into the challenges of developing Large Language Models (LLMs) for low-resource languages.
VinAI features in InnovaTalk #2
Artificial intelligence (AI) has revolutionized the way we live, work, and communicate. The latest phenomenon, ChatGPT, developed by OpenAI, rapidly gained popularity, reaching 100 million users within just two months of launch. While ChatGPT impresses with its ability to give high-quality answers on complex topics, it has also raised concerns about hallucination: the generation of made-up facts delivered in a professional-sounding manner. The InnovaTalk webinar discussed both these vulnerabilities and possible solutions.
Joining three world-renowned AI experts, Dr. Bui Hai Hung delivered insights about ChatGPT, the capabilities and limitations of Large Language Models (LLMs), and the implications for the development and control of AI-driven technology in the future.
Dr. Bui, well known for his experience at Google DeepMind, Adobe Research, the Natural Language Understanding Lab at Nuance, and the AI Center at SRI International, represented the Vietnamese science and technology community before the more than 230 attendees. He offered clarifications and insights into the opportunities and challenges of developing LLMs for low-resource languages.
ChatGPT is not ready for half of the world’s population
From Dr. Bui Hai Hung's perspective, the rapid adoption of ChatGPT highlights how unprepared it is to serve the more than half of the global population who do not use English as their first language. As the founder and CEO of VinAI, he places great importance on low-resource languages, that is, non-mainstream languages that attract limited commercial interest, expertise, time, and investment.
Dr. Bui’s analysis revealed at least 22 countries with a minimum of 50 million native speakers that fall into the low-resource language category. In terms of GDP, these populations make up at least 40% of the world’s total. “We are faced with the reality that technologies like ChatGPT are not yet prepared for them,” stated Dr. Bui.
The lack of high-quality pre-trained LLMs and publicly available large-scale, high-quality corpora for low-resource languages is a significant issue. Insufficient resources and limited knowledge exacerbate the problem of hallucination, leading to the generation of made-up facts and unnatural texts.
Dr. Bui then used Vietnamese as an example to illustrate ChatGPT's weaker performance in low-resource languages. He presented two instances in which ChatGPT failed to answer questions correctly: one about the popular novel "Tắt Đèn" by the writer Ngô Tất Tố, and one about the popular song "Cây đàn sinh viên" performed by the singer Mỹ Tâm. These are facts that many Vietnamese people would know, yet ChatGPT gave completely incorrect answers. For example, it claimed that Trịnh Công Sơn wrote "Cây đàn sinh viên" in 1958, when in fact the song was composed by the musician Quoc An in 2001.
Opportunities and challenges of Large Language Models (LLMs) for low-resource languages
Dr. Bui's team at VinAI has developed Vietnamese-specific LLMs to address this problem. Although significant training and fine-tuning are required, this experiment has demonstrated the potential to tackle one of the critical challenges facing LLMs for low-resource languages. The underlying cause of such errors may simply be a lack of locally and culturally specific training data.
According to Dr. Bui, low-resource LLMs must possess emergent capabilities in in-context learning, instruction following, and step-by-step reasoning while remaining computationally efficient. This will open up new markets and use cases in local regions and make LLMs more responsible and trustworthy for users.
“Low-resource LLMs are one of the keys to democratizing LLM technology to the world, and Vietnamese is a great test,” said Dr. Bui. He hopes that major tech companies will pay greater attention to non-mainstream languages and join the effort to make the benefits of AI technology accessible to people regardless of their geographical location.
Utilizing AI to better humanity
During the panel discussion on the risks of human-like chatbots in accuracy- and safety-critical areas such as healthcare (e.g., medical diagnoses) and finance (e.g., financial transactions), Dr. Bui emphasized that LLMs should be trained to say "I don't know" as a last resort rather than fabricate answers that could have serious consequences.
Answering a student's question about whether the training process for LLMs could be shortened by correcting ChatGPT in real time, Dr. Bui said that such a process might be achievable by fine-tuning the model, but that it would require significant engineering effort. In his view, the most important task is not adding more data but improving the computational efficiency of the current technology with the available resources.
As one of the pioneers in AI research, development, and application in Vietnam, VinAI is constantly searching for and producing the best solutions, applying artificial intelligence with high accuracy to bring a safer, more convenient, and economical experience for individuals and organizations.