NVIDIA Posts Big AI Numbers In MLPerf Inference v3.1 Benchmarks With Hopper H100, GH200 Superchips & L4 GPUs

NVIDIA has published its official MLPerf Inference v3.1 performance benchmarks, run on the world's fastest AI GPUs such as the Hopper H100, GH200 & L4.
NVIDIA Dominates The AI Landscape With Hopper & Ada Lovelace GPUs, Strong Performance Showcased In MLPerf v3.1
Today, NVIDIA is releasing its first performance results for the MLPerf Inference v3.1 benchmark suite, which covers a range of industry-standard benchmarks for AI use cases. The workloads span Recommendation, Natural Language Processing, Large Language Models, Speech Recognition, Image Classification, Medical Imaging, and Object Detection.
The two new benchmarks are DLRM-DCNv2 and GPT-J 6B. The first is a larger, multi-hot dataset representation of real-world recommenders; it uses a new cross-layer algorithm to deliver better recommendations and has twice the parameter count of the previous version. GPT-J, on the other hand, is a small-scale LLM whose base model is open source and was released in 2021. This workload is designed for summarization tasks.
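For context, here is a minimal sketch of what a summarization query against the open-source GPT-J 6B base model might look like using the Hugging Face transformers library. The model ID is the public EleutherAI release; the "TL;DR:" prompt style and generation settings are illustrative assumptions, not the MLPerf reference harness itself.

```python
# Minimal sketch: summarization with the open-source GPT-J 6B base model
# via Hugging Face transformers. The prompt format and generation settings
# are illustrative assumptions, not the MLPerf harness.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6b"  # open-source base model released in 2021
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" assumes the accelerate package and a GPU are available
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

article = (
    "NVIDIA published MLPerf Inference v3.1 results covering recommenders, "
    "LLMs, speech recognition, image classification and medical imaging."
)
prompt = f"{article}\n\nTL;DR:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=48, do_sample=False)

# Decode only the newly generated tokens (the summary itself).
summary = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```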

NVIDIA also showcases a conceptual real-life workload pipeline for an application that uses a range of AI models to fulfill a given query or task. All of these models will be available on the NGC platform.
In terms of performance, the NVIDIA H100 was tested across the entire MLPerf v3.1 Inference suite (Offline) against competitors from Intel (Habana Labs), Qualcomm (Cloud AI 100), and Google (TPUv5e). NVIDIA delivered leadership performance across all workloads.

To make things a little more interesting, the company states that these benchmarks were run about a month ago, since MLPerf requires at least one month between submission and publication of the final results. Since then, NVIDIA has introduced a new technology known as TensorRT-LLM, which further boosts performance by up to 8x, as we detailed here. We can expect NVIDIA to submit MLPerf benchmarks with TensorRT-LLM soon as well.
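TensorRT-LLM had not shipped when these results were submitted, so the following is only a hedged sketch of what running the same GPT-J model through the high-level Python LLM API that TensorRT-LLM later exposed could look like; the exact API surface is an assumption based on NVIDIA's published examples and may differ between releases.

```python
# Hedged sketch: GPT-J inference through TensorRT-LLM's high-level Python
# LLM API. The library compiles the model into an optimized TensorRT engine
# under the hood; details follow NVIDIA's published examples and may vary
# by release.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="EleutherAI/gpt-j-6b")  # builds/loads a TensorRT engine
params = SamplingParams(max_tokens=48)

prompts = ["NVIDIA led all MLPerf Inference v3.1 workloads.\n\nTL;DR:"]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```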
Coming back to the benchmarks, NVIDIA's GH200 Grace Hopper Superchip also made its first MLPerf submission, yielding a 17% improvement over the H100 GPU. This performance gain comes primarily from the higher VRAM capacity (96 GB HBM3 vs. 80 GB HBM3) and 4 TB/s of bandwidth.

The Hopper GPU inside the GH200 uses the same core configuration as the H100, but one key factor behind the boosted performance is the automatic power steering between the Grace CPU and the Hopper GPU. Since the Superchip platform integrates power delivery for both the CPU and GPU on the same board, customers can essentially shift power from the CPU to the GPU and vice versa in any given workload. This extra headroom lets the GPU clock higher and run faster. NVIDIA also mentioned that the Superchip here was running in its 1000W configuration.
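The CPU-to-GPU power steering on GH200 happens automatically in firmware rather than through user code, but as a rough illustration, the GPU side of such a power budget can be observed from software with NVML. The sketch below uses the pynvml bindings and assumes a single visible GPU; it is only a way to watch the effect, not the steering mechanism itself.

```python
# Minimal sketch: reading GPU power draw and the enforced power limit with
# NVML (via the pynvml bindings). On GH200 the CPU<->GPU power steering is
# automatic; this only shows how the GPU side of the budget can be observed.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes GPU index 0

draw_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0          # mW -> W
limit_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(handle) / 1000.0  # mW -> W

print(f"current draw: {draw_w:.0f} W / enforced limit: {limit_w:.0f} W")
pynvml.nvmlShutdown()
```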
In its debut on the MLPerf industry benchmarks, the NVIDIA GH200 Grace Hopper Superchip ran all data center inference tests, extending the leading performance of NVIDIA H100 Tensor Core GPUs. The overall results showed the exceptional performance and versatility of the NVIDIA AI platform from the cloud to the network's edge.
The GH200 links a Hopper GPU with a Grace CPU in one superchip. The combination provides more memory, more bandwidth, and the ability to automatically shift power between the CPU and GPU to optimize performance. Separately, H100 systems that pack eight H100 GPUs delivered the highest throughput on every MLPerf inference test in this round.
Grace Hopper Superchips and H100 GPUs led across all of MLPerf's data center tests, including inference for computer vision, speech recognition and medical imaging, in addition to the more demanding use cases of recommendation systems and the large language models (LLMs) used in generative AI. Overall, the results continue NVIDIA's record of demonstrating performance leadership in AI training and inference in every round since the launch of the MLPerf benchmarks in 2018.
via NVIDIA
The NVIDIA L4 GPU, which is based on the Ada Lovelace GPU architecture, also made a strong entry in MLPerf v3.1. It was not only able to run all workloads but did so very efficiently, running up to 6x faster than modern x86 CPUs (dual-socket Intel Xeon 8380) at a 72W TDP in an FHFL form factor. The L4 GPU also offered up to a 120x uplift in video/AI tasks such as decoding, inferencing and encoding. Finally, the NVIDIA Jetson Orin received up to an 84% performance boost thanks to software updates, showing NVIDIA's commitment to taking its software stack to the next level.