While ChatGPT and Bard remain in the spotlight, a group of researchers in Singapore is hard at work developing a large language model (LLM) trained primarily on Southeast Asian data.
Called SEA-LION, this artificial intelligence (AI) model is designed as an alternative to mainstream LLMs, tailored for Southeast Asia. The generative AI model is trained on data from 11 local languages, including Indonesian, Vietnamese, and Thai, with special attention to local culture and traditions.
The project is primarily funded by the Singaporean government and aims to boost AI adoption among businesses and individual users in the region. Earlier attempts to use OpenAI’s ChatGPT produced unclear output because of differences between the model’s training language and regional dialects.
“We are not trying to compete with the big LLMs,” said Leslie Teo, senior director of AI products at AI Singapore. “We’re trying to complement them so they can better represent us.”
Mainstream LLMs are typically trained on English data, but despite the language’s reach, almost 50% of the world’s population has yet to tap into the full potential of generative AI chatbots. To address this gap, governments are scrambling to assemble local-language datasets and design bespoke chatbots that complement existing services.
“We also need regional LLMs because they support technology independence,” said Nuurrianti Jalli, assistant professor at Oklahoma State University. “Less reliance on Western LLMs could provide better privacy for local residents and better align with national and regional interests.”
SEA-LION is expected to have an immediate impact on people in Southeast Asia, especially local companies pivoting to AI. Paul Condylis, associate vice president of data science at Indonesian startup Tokopedia, says the model will be an essential tool for connecting with customers and for improving and personalizing their experience.
Southeast Asia, like North America and Europe, has built an impressive reputation for embracing emerging technologies. Alongside AI, the region is opening its doors to blockchain applications in finance, logistics, tourism, gaming, and entertainment.
Disadvantages of regional LLMs
Although regional LLMs have been praised for their localization, experts have pointed out that their use is rife with bias and censorship. There are also clear concerns that local AI systems lack sufficient information about global worldviews and may present a “historical revisionist view.”
“These models may fail to surface important socio-political issues such as human rights violations, corruption and legitimate criticism of political power,” Jalli said.
Some point to the use of regional LLMs by authoritarian governments to crack down on dissent and suppress minorities. To ensure that LLMs reflect people’s cultural nuances and remain neutral in their output, experts promote the use of high-quality training data that is free of bias and anti-democratic tendencies.
For artificial intelligence (AI) to function properly within the law and succeed in the face of growing challenges, it must integrate enterprise blockchain systems that ensure the quality and ownership of data input. This makes it possible to keep data secure while also guaranteeing its immutability. Check out CoinGeek’s coverage of this emerging technology to learn more about why enterprise blockchain is the backbone of AI.