To train large language models on Fugaku, the researchers developed distributed training methods, including porting the deep learning framework Megatron-DeepSpeed to Fugaku to optimize the performance of Transformers there. They accelerated the dense matrix multiplication library used by Transformers, optimized communication performance by combining three types of parallelization techniques, and accelerated the collective communication library on the Tofu interconnect D.
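The announcement does not name the three parallelization techniques, but Megatron-DeepSpeed combines data, tensor (model), and pipeline parallelism. The sketch below is a minimal, illustrative view of how a fixed pool of nodes could be factored across those three dimensions; the specific split shown is an assumption, not the project's actual configuration.

```python
# Minimal sketch (not the project's actual configuration): factoring a fixed
# pool of workers into the three parallel dimensions that Megatron-DeepSpeed
# combines -- data, tensor (model), and pipeline parallelism.

def world_size(data_parallel: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Total number of workers implied by a 3D parallel configuration."""
    return data_parallel * tensor_parallel * pipeline_parallel

# Illustrative numbers only: 13,824 nodes could, for example, be covered by
# 216 data-parallel replicas, tensor-parallel width 4, and pipeline depth 16.
dp, tp, pp = 216, 4, 16
assert world_size(dp, tp, pp) == 13_824

# Each worker then belongs to one group per dimension:
#   - its tensor-parallel group shares the weight matrices of a single layer,
#   - its pipeline group owns a contiguous slice of layers,
#   - its data-parallel group averages gradients over different mini-batches.
```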
Fugaku-LLM has 13 billion parameters (2) and is larger than the 7-billion-parameter models that have been developed widely in Japan.
Fugaku-LLM was trained on proprietary Japanese data collected by CyberAgent, along with English and other data. The source code of Fugaku-LLM is available on GitHub (4), and the model is available on Hugging Face (5).
Background
In recent years, the development of large language models (LLMs) has been active, especially in the United States.
Therefore,
Role of each institution/company
Fujitsu: Acceleration of computation and communication (acceleration of collective communication on Tofu interconnect D, performance optimization of pipeline parallelization) and implementation of pre-training and post-training fine-tuning
RIKEN: Distributed parallelization and communication acceleration of large-scale language models (acceleration of collective communication on Tofu interconnect D)
CyberAgent: Provision of training data
Kotoba Technologies: Porting of deep learning framework to Fugaku
GPUs (7) are the common choice of hardware for training large language models. However, there is a global shortage of GPUs due to the large investment from many countries in training LLMs. Under such circumstances, it is important to show that large language models can be trained using Fugaku, which uses CPUs instead of GPUs. The CPUs used in Fugaku are Japanese CPUs manufactured by Fujitsu, and they play an important role in revitalizing Japanese semiconductor technology.
By extracting the full potential of Fugaku, this study succeeded in increasing the speed of matrix multiplication by a factor of 6 and the speed of communication by a factor of 3. To maximize distributed training performance on Fugaku, the deep learning framework Megatron-DeepSpeed was ported to Fugaku, and the dense matrix multiplication library was accelerated for Transformers. For communication, the researchers optimized performance for Fugaku by combining three types of parallelization techniques and accelerated the collective communication on the Tofu interconnect D. The knowledge gained from these efforts can be utilized in the design of the next-generation computing infrastructure after Fugaku and will greatly enhance Japan's advantage in the field of AI.
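As a rough illustration of why a faster dense matrix multiplication library pays off so directly, the sketch below shows how a Transformer layer's projection work reduces to dense GEMMs. The dimensions are placeholders and are not Fugaku-LLM's actual configuration.

```python
# Illustrative sketch: the bulk of Transformer compute is dense matrix
# multiplication (GEMM), which is the operation the team accelerated on
# Fugaku's CPUs. Dimensions below are placeholders, not Fugaku-LLM's.
import numpy as np

seq_len, d_model, d_ff = 1024, 2048, 8192
x = np.random.randn(seq_len, d_model).astype(np.float32)

w_qkv = np.random.randn(d_model, 3 * d_model).astype(np.float32)  # Q, K, V projections
w_out = np.random.randn(d_model, d_model).astype(np.float32)      # attention output projection
w_up = np.random.randn(d_model, d_ff).astype(np.float32)          # MLP up-projection
w_down = np.random.randn(d_ff, d_model).astype(np.float32)        # MLP down-projection

# Each step below is a dense GEMM; the attention softmax itself is omitted,
# and the first d_model columns of `qkv` stand in for the attention result.
qkv = x @ w_qkv
attn = qkv[:, :d_model] @ w_out
mlp = np.maximum(attn @ w_up, 0.0) @ w_down

# Rough FLOP count of the projection GEMMs in one layer (forward pass only),
# showing why a 6x faster GEMM kernel translates almost directly into faster
# training steps.
flops = 2 * seq_len * d_model * (3 * d_model + d_model + 2 * d_ff)
print(f"~{flops / 1e12:.2f} TFLOPs of dense matrix multiplication per layer")
```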
2. An easy-to-use, open, and secure large language model with 13 billion parameters
In 2023, many large language models were developed by Japanese companies, but most of them have fewer than 7 billion parameters. Since the performance of large language models generally improves as the number of parameters increases, the 13-billion-parameter model the research team developed is likely to be more powerful than other Japanese models. Although larger models have been developed outside of Japan, they also require more computational resources to use, so a 13-billion-parameter model offers a good balance between performance and ease of use.
In addition, most models developed by Japanese companies employ continual learning (8), in which open models developed outside of Japan are further trained on Japanese data. In contrast, Fugaku-LLM was trained from scratch using the team's own data, so the entire training process can be understood, which is superior in terms of transparency and safety.
Fugaku-LLM was trained on 380 billion tokens using 13,824 nodes of Fugaku, with about 60% of the training data being Japanese, combined with English, mathematics, and code. Unlike models built by continually training existing open models, Fugaku-LLM learned much of its information directly from Japanese data. On the Japanese MT-Bench (3), it is the best model among open models that are produced in Japan and trained with original data.
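A back-of-the-envelope view of the training budget described above: only the 380-billion-token total, the roughly 60% Japanese share, and the 13,824-node count come from the announcement; the split of the remaining data is a placeholder assumption.

```python
# Back-of-the-envelope sketch of the stated training-data budget.
total_tokens = 380e9
nodes = 13_824

mixture = {
    "Japanese (stated, ~60%)": 0.60,
    "English, mathematics, code (assumed remainder)": 0.40,
}
for name, share in mixture.items():
    print(f"{name}: ~{share * total_tokens / 1e9:.0f}B tokens")

# Spread over the whole run, roughly 27.5 million tokens per node, which gives
# a sense of why data-parallel throughput and fast collective communication
# matter at this scale.
print(f"Tokens per node: ~{total_tokens / nodes / 1e6:.1f}M")
```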
The results from this research are being made public through GitHub (4) and Hugging Face (5) so that other researchers and engineers can use them to develop large language models.
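For readers who want to try the published model, a minimal usage sketch with the Hugging Face transformers library is shown below. The repository id is an assumption based on the project name; check the official Hugging Face page for the exact id and license terms.

```python
# Minimal sketch for loading the published model with Hugging Face transformers.
# The repository id below is an assumption based on the project name -- verify
# the exact id and license on the official Hugging Face page before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Fugaku-LLM/Fugaku-LLM-13B"  # assumed repo id, verify before use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "スーパーコンピュータ「富岳」とは"  # "What is the supercomputer Fugaku?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```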
In the future, as more researchers and engineers participate in improving the models and their applications, the efficiency of training will be improved, leading to next-generation innovative research and business applications, such as the linkage of scientific simulation and generative AI, and social simulation of virtual communities with thousands of AIs.
Acknowledgement
This research was supported by the Fugaku policy-supporting proposal "Development of Distributed Parallel Training for Large Language Models Using Fugaku" (proposal number: hp230254).
[1] Large language model: A model of the probability with which text appears; given a context (query), it can predict the text (response) that follows.
[2] Parameter: A measure of the size of a neural network. The more parameters, the higher the performance of the model, but the more data is required for training.
[3] Japanese MT-Bench: A benchmark test provided by Stability AI.
[4] GitHub: A platform used to publish source code.
[5] Hugging Face: A platform used to publish AI datasets and models.
[6] ChatGPT: A large language model developed by OpenAI.
[7] GPU: Originally produced as an accelerator for graphics, but recently also used to accelerate deep learning.
[8] Continual learning: A method for performing additional training on a large language model that has already been trained. Used for training language models in different languages or domains.
About Fujitsu
Fujitsu's purpose is to make the world more sustainable by building trust in society through innovation. As the digital transformation partner of choice for customers in over 100 countries, our 124,000 employees work to resolve some of the greatest challenges facing humanity. Our range of services and solutions draw on five key technologies: Computing, Networks, AI, Data & Security, and Converging Technologies, which we bring together to deliver sustainability transformation.
Press Contacts
Public and Investor Relations Division
Inquiries (https://bit.ly/3rrQ4mB)