A team of Japanese researchers is using Fujitsu's Fugaku supercomputer to train Fugaku-LLM, a large language model specifically adapted to the Japanese language and culture.
Large language models such as OpenAI's GPT-4 are primarily developed by US companies and optimized for English. According to the researchers, existing models often struggle with the intricacies of Japanese language and culture, for example confusing rare characters or failing to apply cultural communication norms appropriately.
To address this, a team of researchers from Tokyo Institute of Technology, Tohoku University, Fujitsu, RIKEN, Nagoya University, and the companies CyberAgent and Kotoba Technologies is developing Fugaku-LLM. The model is designed to conduct natural dialogues that consider Japanese polite language and other features of the language.
A distinctive aspect of Fugaku-LLM is that about 60 percent of its training data is Japanese, with the remainder consisting of English as well as mathematical and code data. Unlike models that take an existing English model and continue training it on Japanese, Fugaku-LLM has learned much of its information directly in Japanese, according to the research team.
The model was trained on the Japanese supercomputer Fugaku, which uses CPUs developed by Fujitsu instead of GPUs. Training drew on 13,824 compute nodes and 380 billion tokens; the resulting model has 13 billion parameters.
The research team claims that Fugaku-LLM is the best open model developed in Japan with its own data, achieving a score of 9.18 on the humanities and social sciences tasks of the Japanese MT-Bench provided by Stability AI.
The language models and source code of Fugaku-LLM are available on Hugging Face, GitHub, and the Fujitsu Research Portal for research and commercial purposes, as long as users comply with the Apache 2.0 license.
The Japanese government and companies such as NEC, Fujitsu, and SoftBank are investing hundreds of millions of dollars in developing their own language models. They aim to promote domestic research with more culturally sensitive models and to become less dependent on large US technology companies.
Whether that works out remains to be seen. OpenAI recently released a Japanese-optimized version of GPT-4, which is already being used in projects with the Japanese government.