
A team of Japanese researchers is using Fujitsu's Fugaku supercomputer to train Fugaku-LLM, a large language model specifically adapted to the Japanese language and culture.

Large language models, such as OpenAI's GPT-4, are primarily developed by US companies and optimized for English. Existing models often struggle with the intricacies of Japanese language and culture, such as confusing rare characters or not applying cultural communication norms appropriately, the researchers say.

To address this, a team of researchers from Tokyo Institute of Technology, Tohoku University, Fujitsu, RIKEN, Nagoya University, and the companies CyberAgent and Kotoba Technologies is developing Fugaku-LLM. The model is designed to conduct natural dialogues that consider Japanese polite language and other features of the language.

A distinctive aspect of Fugaku-LLM is that about 60 percent of its training data is in Japanese, with the rest consisting of English, mathematical, and code data. Unlike models built by continually training existing English models on Japanese, Fugaku-LLM has learned much of its information directly in Japanese, according to the research team.


The model was trained on the Japanese supercomputer Fugaku, which uses CPUs developed by Fujitsu instead of GPUs. Training used 13,824 of Fugaku's compute nodes and 380 billion tokens; the resulting model has 13 billion parameters.

The research team claims that Fugaku-LLM is the best open model developed in Japan with its own data, achieving a score of 9.18 on humanities and social sciences tasks in the Japanese MT-Bench benchmark provided by Stability AI.

The language models and source code of Fugaku-LLM are available on Hugging Face, GitHub, and the Fujitsu Research Portal for research and commercial purposes, provided users comply with the Apache 2.0 license.

The Japanese government and companies such as NEC, Fujitsu, and SoftBank are investing hundreds of millions of dollars in developing their own language models. They want to promote research in their own country with more culturally sensitive models and become less dependent on large U.S. technology companies.

Whether that works out remains to be seen. OpenAI recently released a Japanese-optimized version of GPT-4, which is already being used in projects with the Japanese government.

Summary
  • An interdisciplinary Japanese research team has used Fujitsu's Fugaku supercomputer to train the Fugaku-LLM large language model specifically for the Japanese language and culture.
  • Unlike existing models, which are mainly optimized for English, Fugaku-LLM has learned much of its information directly in Japanese, considering subtleties such as Japanese polite language.
  • With 13 billion parameters and a high benchmark score for humanities tasks, Fugaku-LLM is the best open model developed in Japan using its own data, according to the research team.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.