NVIDIA’s new Blackwell architecture has set unprecedented benchmarks in the latest MLPerf Inference v4.1, according to the NVIDIA Technical Blog. The platform, introduced at NVIDIA GTC 2024, features a superchip built from 208 billion transistors and uses the TSMC 4NP process tailored for NVIDIA, making it the largest GPU ever built.
NVIDIA Blackwell Shines in MLPerf Inference Debut
In its inaugural round of MLPerf Inference submissions, NVIDIA’s Blackwell architecture delivered remarkable results on the Llama 2 70B LLM benchmark, achieving up to 4x higher tokens per second per GPU compared to the previous-generation H100 GPU. This performance leap was enabled by the new second-generation Transformer Engine, which leverages Blackwell Tensor Core technology and TensorRT-LLM innovations.
According to the MLPerf results, Blackwell’s FP4 Transformer Engine executed roughly 50% of the workload in FP4, reaching a delivered math throughput of 5.2 petaflops. The Blackwell-based submissions were in the closed division, meaning the models were unmodified yet still met strict accuracy requirements.
NVIDIA H200 Tensor Core GPU’s Outstanding Performance
The NVIDIA H200 GPU, an upgrade to the Hopper architecture, also delivered exceptional results across all benchmarks. The H200, equipped with HBM3e memory, showed significant improvements in memory capacity and bandwidth, benefiting memory-sensitive applications.
For example, the H200 achieved notable performance gains on the Llama 2 70B benchmark, with a 14% improvement over the previous round purely through software enhancements in TensorRT-LLM. Additionally, the H200’s performance rose by a further 12% when its thermal design power (TDP) was raised to 1,000 watts.
Jetson AGX Orin’s Big Leap in Edge AI
NVIDIA’s Jetson AGX Orin demonstrated impressive performance improvements in generative AI at the edge, achieving up to 6.2x more throughput and 2.4x better latency on the GPT-J 6B-parameter LLM benchmark. This was made possible through numerous software optimizations, including the use of INT4 Activation-aware Weight Quantization (AWQ) and in-flight batching.
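To illustrate the quantization idea behind optimizations like AWQ, here is a minimal NumPy sketch of symmetric per-channel INT4 weight quantization. It is illustrative only: real AWQ additionally rescales salient weight channels based on activation statistics before quantizing, which this sketch omits, and all function names here are hypothetical.

```python
import numpy as np

def quantize_int4_per_channel(w):
    """Symmetric per-output-channel INT4 quantization of a weight matrix.

    Each row (output channel) gets its own scale so that its largest
    weight maps to the edge of the INT4 range. Illustrative sketch only;
    AWQ proper also uses activation-aware channel scaling.
    """
    # Symmetric INT4 stores integers in [-8, 7]; scale to +/-7 so the
    # positive and negative ranges are treated symmetrically.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero channels
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate FP32 weights from INT4 codes and scales."""
    return q.astype(np.float32) * scale

# Usage: quantize a small random weight matrix and check the error bound.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16)).astype(np.float32)
q, scale = quantize_int4_per_channel(w)
w_hat = dequantize_int4(q, scale)
max_err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```

Storing 4-bit codes plus one scale per channel is what cuts memory traffic, which is why such schemes pay off on bandwidth-limited edge devices like Jetson AGX Orin.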
The Jetson AGX Orin platform is uniquely positioned to run complex models such as GPT-J, vision transformers, and Stable Diffusion at the edge, providing real-time, actionable insights from sensor data such as images and videos.
Conclusion
In summary, NVIDIA’s Blackwell architecture has set new standards in MLPerf Inference v4.1, achieving up to 4x the performance of its predecessor, the H100. The H200 GPU continues to deliver top-tier performance across multiple benchmarks, while Jetson AGX Orin showcases significant advancements in edge AI.
NVIDIA’s continuous innovation across the technology stack ensures it remains at the forefront of AI inference performance, from large-scale data centers to low-power edge devices.
Image source: Shutterstock