To train large language models on Fugaku, the researchers developed distributed training methods, including porting the deep learning framework Megatron-DeepSpeed to Fugaku to optimize the performance of Transformers there. They accelerated the dense matrix multiplication library used by Transformers, optimized communication performance by combining three types of parallelization techniques, and accelerated the collective communication library on the Tofu interconnect D.
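The announcement does not name the three parallelization techniques, but Megatron-DeepSpeed combines data, tensor (model), and pipeline parallelism. The sketch below is a minimal, illustrative view of how a fixed pool of nodes could be factored across those three dimensions; the specific split shown is an assumption, not the project's actual configuration.

```python
# Minimal sketch (not the project's actual configuration): factoring a fixed
# pool of workers into the three parallel dimensions that Megatron-DeepSpeed
# combines -- data, tensor (model), and pipeline parallelism.

def world_size(data_parallel: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Total number of workers implied by a 3D parallel configuration."""
    return data_parallel * tensor_parallel * pipeline_parallel

# Illustrative numbers only: 13,824 nodes could, for example, be covered by
# 216 data-parallel replicas, tensor-parallel width 4, and pipeline depth 16.
dp, tp, pp = 216, 4, 16
assert world_size(dp, tp, pp) == 13_824

# Each worker then belongs to one group per dimension:
#   - its tensor-parallel group shares the weight matrices of a single layer,
#   - its pipeline group owns a contiguous slice of layers,
#   - its data-parallel group averages gradients over different mini-batches.
```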
Fugaku-LLM has 13 billion parameters (2) and is larger than the 7-billion-parameter models that have been developed widely in Japan.
Fugaku-LLM was trained on proprietary Japanese data collected by CyberAgent, along with English and other data. The source code of Fugaku-LLM is available on GitHub (4), and the model is available on Hugging Face (5).
Background
In recent years, the development of large language models (LLMs) has been active, especially in the United States.
Therefore,
Role of each institution/company
Fujitsu: Acceleration of computation and communication (acceleration of collective communication on Tofu interconnect D, performance optimization of pipeline parallelization) and implementation of pre-training and post-training fine-tuning
RIKEN: Distributed parallelization and communication acceleration of large-scale language models (acceleration of collective communication on Tofu interconnect D)
CyberAgent: Provision of training data
Kotoba Technologies: Porting of deep learning framework to Fugaku
GPUs (7) are the common choice of hardware for training large language models. However, there is a global shortage of GPUs due to the large investment from many countries in training LLMs. Under such circumstances, it is important to show that large language models can be trained using Fugaku, which uses CPUs instead of GPUs. The CPUs used in Fugaku are Japanese CPUs manufactured by Fujitsu, and they play an important role in revitalizing Japanese semiconductor technology.
By extracting the full potential of Fugaku, this study succeeded in increasing the speed of matrix multiplication by a factor of 6 and the speed of communication by a factor of 3. To maximize distributed training performance on Fugaku, the deep learning framework Megatron-DeepSpeed was ported to Fugaku, and the dense matrix multiplication library was accelerated for Transformers. For communication, the researchers optimized performance for Fugaku by combining three types of parallelization techniques and accelerated the collective communication on the Tofu interconnect D. The knowledge gained from these efforts can be utilized in the design of the next-generation computing infrastructure after Fugaku and will greatly enhance Japan's advantage in the field of AI.
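As a rough illustration of why a faster dense matrix multiplication library pays off so directly, the sketch below shows how a Transformer layer's projection work reduces to dense GEMMs. The dimensions are placeholders and are not Fugaku-LLM's actual configuration.

```python
# Illustrative sketch: the bulk of Transformer compute is dense matrix
# multiplication (GEMM), which is the operation the team accelerated on
# Fugaku's CPUs. Dimensions below are placeholders, not Fugaku-LLM's.
import numpy as np

seq_len, d_model, d_ff = 1024, 2048, 8192
x = np.random.randn(seq_len, d_model).astype(np.float32)

w_qkv = np.random.randn(d_model, 3 * d_model).astype(np.float32)  # Q, K, V projections
w_out = np.random.randn(d_model, d_model).astype(np.float32)      # attention output projection
w_up = np.random.randn(d_model, d_ff).astype(np.float32)          # MLP up-projection
w_down = np.random.randn(d_ff, d_model).astype(np.float32)        # MLP down-projection

# Each step below is a dense GEMM; the attention softmax itself is omitted,
# and the first d_model columns of `qkv` stand in for the attention result.
qkv = x @ w_qkv
attn = qkv[:, :d_model] @ w_out
mlp = np.maximum(attn @ w_up, 0.0) @ w_down

# Rough FLOP count of the projection GEMMs in one layer (forward pass only),
# showing why a 6x faster GEMM kernel translates almost directly into faster
# training steps.
flops = 2 * seq_len * d_model * (3 * d_model + d_model + 2 * d_ff)
print(f"~{flops / 1e12:.2f} TFLOPs of dense matrix multiplication per layer")
```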
2. An easy-to-use, open, and secure large language model with 13 billion parameters
In 2023, many large language models were developed by Japanese companies, but most of them have fewer than 7 billion parameters. Since the performance of large language models generally improves as the number of parameters increases, the 13-billion-parameter model the research team developed is likely to be more powerful than other Japanese models. Although larger models have been developed outside of Japan, they also require more computational resources to use, so a 13-billion-parameter model offers a good balance between performance and ease of use.
In addition, most models developed by Japanese companies employ continual learning (8), in which open models developed outside of Japan are further trained on Japanese data. In contrast, Fugaku-LLM was trained from scratch using the team's own data, so the entire training process can be understood, which is superior in terms of transparency and safety.
Fugaku-LLM was trained on 380 billion tokens using 13,824 nodes of Fugaku, with about 60% of the training data being Japanese, combined with English, mathematics, and code. Unlike models built by continually training existing open models, Fugaku-LLM learned much of its information directly from Japanese data. On the Japanese MT-Bench (3), it is the best model among open models that are produced in Japan and trained with original data.
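A back-of-the-envelope view of the training budget described above: only the 380-billion-token total, the roughly 60% Japanese share, and the 13,824-node count come from the announcement; the split of the remaining data is a placeholder assumption.

```python
# Back-of-the-envelope sketch of the stated training-data budget.
total_tokens = 380e9
nodes = 13_824

mixture = {
    "Japanese (stated, ~60%)": 0.60,
    "English, mathematics, code (assumed remainder)": 0.40,
}
for name, share in mixture.items():
    print(f"{name}: ~{share * total_tokens / 1e9:.0f}B tokens")

# Spread over the whole run, roughly 27.5 million tokens per node, which gives
# a sense of why data-parallel throughput and fast collective communication
# matter at this scale.
print(f"Tokens per node: ~{total_tokens / nodes / 1e6:.1f}M")
```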
The results from this research are being made public through GitHub (4) and Hugging Face (5) so that other researchers and engineers can use them to develop large language models.
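For readers who want to try the published model, a minimal usage sketch with the Hugging Face transformers library is shown below. The repository id is an assumption based on the project name; check the official Hugging Face page for the exact id and license terms.

```python
# Minimal sketch for loading the published model with Hugging Face transformers.
# The repository id below is an assumption based on the project name -- verify
# the exact id and license on the official Hugging Face page before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Fugaku-LLM/Fugaku-LLM-13B"  # assumed repo id, verify before use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "スーパーコンピュータ「富岳」とは"  # "What is the supercomputer Fugaku?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```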
In the future, as more researchers and engineers participate in improving the models and their applications, the efficiency of training will be improved, leading to next-generation innovative research and business applications, such as the linkage of scientific simulation and generative AI, and social simulation of virtual communities with thousands of AIs.
Acknowledgement
This research was supported by the Fugaku policy-supporting proposal "Development of Distributed Parallel Training for Large Language Models Using Fugaku" (proposal number: hp230254).
[1] Large language model: A model of the probability with which text appears; given a context (query), it can predict the text (response) that follows.
[2] Parameter: A measure of the size of a neural network. The more parameters, the higher the performance of the model, but the more data is required for training.
[3] Japanese MT-Bench: A benchmark test provided by Stability AI.
[4] GitHub: A platform used to publish source code.
[5] Hugging Face: A platform used to publish AI datasets and models.
[6] ChatGPT: A large language model developed by OpenAI.
[7] GPU: Originally produced as an accelerator for graphics, but recently also used to accelerate deep learning.
[8] Continual learning: A method for performing additional training on a large language model that has already been trained. Used for training language models in different languages or domains.
About Fujitsu
Fujitsu's purpose is to make the world more sustainable by building trust in society through innovation. As the digital transformation partner of choice for customers in over 100 countries, our 124,000 employees work to resolve some of the greatest challenges facing humanity. Our range of services and solutions draw on five key technologies: Computing, Networks, AI, Data & Security, and Converging Technologies, which we bring together to deliver sustainability transformation.
Press Contacts
Public and Investor Relations Division
Inquiries (https://bit.ly/3rrQ4mB)