NVIDIA’s new Blackwell architecture has set unprecedented benchmarks in the latest MLPerf Inference v4.1, according to the NVIDIA Technical Blog. The platform, introduced at NVIDIA GTC 2024, features a superchip built from 208 billion transistors and uses the TSMC 4NP process tailored for NVIDIA, making it the largest GPU ever built.
NVIDIA Blackwell Shines in MLPerf Inference Debut
In its inaugural round of MLPerf Inference submissions, NVIDIA’s Blackwell architecture delivered remarkable results on the Llama 2 70B LLM benchmark, achieving up to 4x higher tokens per second per GPU compared to the previous-generation H100 GPU. This performance leap was enabled by the new second-generation Transformer Engine, which leverages Blackwell Tensor Core technology and TensorRT-LLM innovations.
According to the MLPerf results, Blackwell’s FP4 Transformer Engine executed roughly 50% of the workload in FP4, reaching a delivered math throughput of 5.2 petaflops. The Blackwell-based submissions were in the closed division, meaning the models were unmodified yet still met strict accuracy requirements.
NVIDIA H200 Tensor Core GPU’s Outstanding Performance
The NVIDIA H200 GPU, an upgrade to the Hopper architecture, also delivered exceptional results across all benchmarks. The H200, equipped with HBM3e memory, showed significant improvements in memory capacity and bandwidth, benefiting memory-sensitive applications.
For example, the H200 achieved notable performance gains on the Llama 2 70B benchmark, with a 14% improvement over the previous round purely through software enhancements in TensorRT-LLM. Additionally, the H200’s performance rose by a further 12% when its thermal design power (TDP) was raised to 1,000 watts.
Jetson AGX Orin’s Big Leap in Edge AI
NVIDIA’s Jetson AGX Orin demonstrated impressive performance improvements in generative AI at the edge, achieving up to 6.2x more throughput and 2.4x better latency on the GPT-J 6B-parameter LLM benchmark. This was made possible through numerous software optimizations, including the use of INT4 Activation-aware Weight Quantization (AWQ) and in-flight batching.
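To illustrate the quantization idea behind optimizations like AWQ, here is a minimal NumPy sketch of symmetric per-channel INT4 weight quantization. It is illustrative only: real AWQ additionally rescales salient weight channels based on activation statistics before quantizing, which this sketch omits, and all function names here are hypothetical.

```python
import numpy as np

def quantize_int4_per_channel(w):
    """Symmetric per-output-channel INT4 quantization of a weight matrix.

    Each row (output channel) gets its own scale so that its largest
    weight maps to the edge of the INT4 range. Illustrative sketch only;
    AWQ proper also uses activation-aware channel scaling.
    """
    # Symmetric INT4 stores integers in [-8, 7]; scale to +/-7 so the
    # positive and negative ranges are treated symmetrically.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero channels
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate FP32 weights from INT4 codes and scales."""
    return q.astype(np.float32) * scale

# Usage: quantize a small random weight matrix and check the error bound.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16)).astype(np.float32)
q, scale = quantize_int4_per_channel(w)
w_hat = dequantize_int4(q, scale)
max_err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```

Storing 4-bit codes plus one scale per channel is what cuts memory traffic, which is why such schemes pay off on bandwidth-limited edge devices like Jetson AGX Orin.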
The Jetson AGX Orin platform is uniquely positioned to run complex models such as GPT-J, vision transformers, and Stable Diffusion at the edge, providing real-time, actionable insights from sensor data such as images and videos.
Conclusion
In summary, NVIDIA’s Blackwell architecture has set new standards in MLPerf Inference v4.1, achieving up to 4x the performance of its predecessor, the H100. The H200 GPU continues to deliver top-tier performance across multiple benchmarks, while Jetson AGX Orin showcases significant advancements in edge AI.
NVIDIA’s continuous innovation across the technology stack ensures it remains at the forefront of AI inference performance, from large-scale data centers to low-power edge devices.
Image source: Shutterstock