SuperOffload Unveiled: Revolutionizing Large Language Model Training on NVIDIA's Grace Hopper Superchips

SuperOffload unlocks the full potential of NVIDIA's Superchips. It's a game-changer for training larger, more complex language models.


Researchers have unveiled SuperOffload, a groundbreaking system that significantly enhances the training of large language models (LLMs) on NVIDIA's Grace Hopper Superchips. This innovation, developed by Xinyu Lian, Masahiro Tanaka, Olatunji Ruwase, and Minjia Zhang, represents a substantial stride towards more efficient and powerful AI development.

SuperOffload unlocks the full potential of Superchips, processors that integrate a GPU and a CPU in a single package. It achieves up to a 2.5x throughput improvement over state-of-the-art offloading systems, thanks to its adaptive weight offloading and a highly optimised Adam implementation.
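To make the offloading idea concrete, the sketch below is an illustration only, not SuperOffload's actual implementation: the function names and state layout are assumptions. It keeps FP32 master weights and Adam moments in CPU memory and runs the update on the CPU, so only the refreshed weights travel back to the GPU over the chip-to-chip link.

```python
# Illustrative sketch (assumed names and layout, not SuperOffload's code):
# FP32 master weights and Adam moments live in CPU (Grace) memory, the
# update runs on the CPU, and only updated weights return to the GPU.
import torch

def init_state(gpu_param):
    # FP32 master copy plus Adam moments, resident in pinned CPU memory.
    master = gpu_param.detach().to("cpu").float().pin_memory()
    return {"step": 0,
            "master": master,
            "m": torch.zeros_like(master),
            "v": torch.zeros_like(master)}

def cpu_adam_step(gpu_param, state, lr=1e-4, betas=(0.9, 0.999), eps=1e-8):
    # Gradient is computed on the GPU, then moved to CPU memory.
    grad = gpu_param.grad.detach().to("cpu", non_blocking=True).float()

    state["step"] += 1
    m, v = state["m"], state["v"]
    m.mul_(betas[0]).add_(grad, alpha=1 - betas[0])            # first moment
    v.mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])  # second moment

    bias_c1 = 1 - betas[0] ** state["step"]
    bias_c2 = 1 - betas[1] ** state["step"]
    update = (m / bias_c1) / ((v / bias_c2).sqrt() + eps)

    state["master"].add_(update, alpha=-lr)                    # FP32 master weights on CPU
    # Only the updated weights travel back to the GPU over NVLink-C2C.
    gpu_param.data.copy_(state["master"].to(gpu_param.dtype), non_blocking=True)
```

Keeping optimizer state off the GPU is what frees HBM for larger models; the tight NVLink-C2C link between the Grace CPU and Hopper GPU is what makes the round trip cheap enough to be practical.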

The system enables the training of significantly larger models. It can handle a 25 billion parameter model on a single Superchip, surpassing the capacity of GPU-only solutions by a factor of seven. Furthermore, with ZeRO-style data parallelism, SuperOffload allows for training a 50 billion parameter model using only four Superchips, a 2.5x increase over existing parallel training methods.
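For readers unfamiliar with ZeRO-style data parallelism, the snippet below shows a generic DeepSpeed-style ZeRO-3 configuration with CPU offloading. It is a minimal sketch of the general approach only; the article does not specify the exact options SuperOffload adds, so the values here are illustrative assumptions.

```python
# Hedged sketch: a generic ZeRO-3 configuration with CPU offloading in
# DeepSpeed's config format. Illustrates ZeRO-style data parallelism
# combined with offloading; not SuperOffload's specific settings.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # partition parameters, gradients, and optimizer states
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}

# Typical usage (assuming a PyTorch `model` is already defined):
# import deepspeed
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config)
```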

SuperOffload-Ulysses supports long-sequence training, achieving 55% model FLOPs utilisation (MFU) while training a 13 billion parameter model with sequences of up to one million tokens on eight GH200 Superchips. This capability opens up new possibilities for training complex, context-rich models.
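MFU is simply the ratio of the FLOPs a training run actually sustains to the hardware's peak. A back-of-the-envelope estimate using the common 6 x parameters x tokens-per-second approximation is sketched below; the throughput and peak-FLOPs numbers are placeholders rather than figures from the paper, and the attention FLOPs that become significant at million-token sequence lengths are omitted.

```python
# Hedged sketch: estimate Model FLOPs Utilization (MFU) with the common
# 6 * N * tokens/sec approximation. All numeric inputs below are assumed
# placeholders, not results reported for SuperOffload-Ulysses.
def estimate_mfu(tokens_per_sec, n_params, n_gpus, peak_flops_per_gpu):
    achieved_flops = 6 * n_params * tokens_per_sec  # model FLOPs per second
    peak_flops = n_gpus * peak_flops_per_gpu        # aggregate hardware peak
    return achieved_flops / peak_flops

# Example with assumed values for a 13B-parameter model on 8 GPUs:
print(estimate_mfu(tokens_per_sec=5_000, n_params=13e9, n_gpus=8,
                   peak_flops_per_gpu=989e12))
```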

SuperOffload optimises LLM training on Superchips by efficiently utilising the combined resources of the Hopper GPU, the Grace CPU, and the NVLink-C2C interconnect that links them. Evaluation alongside established offloading methods has demonstrated its potential as a superior approach for large-scale training, paving the way for more efficient and powerful AI development and the training of larger, more complex models.
