NVIDIA GB200 NVL72
The GB200 NVL72 connects 36 Grace CPUs and 72 Blackwell GPUs in a liquid-cooled, rack-scale design. Its 72-GPU NVLink domain acts as a single massive GPU, delivering 30X faster real-time inference for trillion-parameter LLMs.
Starting from $3.99/hr
Available at cost-effective pricing
Launch your AI products faster with on-demand GPUs and a global network of data center partners
Bare metal
Exclusive access to a physical machine, providing maximum performance and control
from $3.99/hr
- No noisy neighbours
- NVIDIA Spectrum-X local networking
- 300 Gbps external connectivity
- NVMe SSD storage
Looking to scale? Please contact us for enterprise solutions.
Speak with an expert
The NVIDIA GB200 NVL72 is perfect for a wide range of workloads
Deploying AI-based workloads on CUDO Compute is easy and cost-effective. Follow our AI-related tutorials to get started.
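As a quick first step on a freshly provisioned instance, you can confirm the GPUs are visible to your framework. The sketch below assumes a standard CUDA-enabled PyTorch install on the instance; it is not a CUDO-specific API.

```python
# Minimal sketch: confirm the provisioned GPUs are visible from PyTorch.
# Assumes a CUDA-enabled PyTorch install on the instance.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA devices visible - check drivers and runtime.")

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.0f} GB")
```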
Specifications
| Specification | Value |
| --- | --- |
| Starting from | $3.99/hr |
| Architecture | NVIDIA Blackwell |
| Configuration | 36 Grace CPUs : 72 Blackwell GPUs |
| FP4 Tensor Core | 1,440 PFLOPS |
| FP8/FP6 Tensor Core | 720 PFLOPS |
| INT8 Tensor Core | 720 POPS |
| FP16/BF16 Tensor Core | 360 PFLOPS |
| TF32 Tensor Core | 180 PFLOPS |
| FP32 | 6,480 TFLOPS |
| FP64 | 3,240 TFLOPS |
| FP64 Tensor Core | 3,240 TFLOPS |
| GPU Memory | Up to 13.5 TB HBM3e |
| GPU Memory Bandwidth | 576 TB/s |
| NVLink Bandwidth | 130 TB/s |
| CPU Core Count | 2,592 Arm® Neoverse V2 cores |
| CPU Memory | Up to 17 TB LPDDR5X |
| CPU Memory Bandwidth | Up to 18.4 TB/s |
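For a rough per-GPU view, the rack-level figures above can be divided by the 72 GPUs in the NVLink domain. The snippet below is a back-of-the-envelope sketch that assumes an even split across GPUs and uses only the numbers from the table.

```python
# Back-of-the-envelope: derive approximate per-GPU figures from the
# rack-level GB200 NVL72 totals above (assumes an even split across
# the 72 Blackwell GPUs in the NVLink domain).
GPUS_PER_RACK = 72

rack_totals = {
    "FP4 Tensor Core (PFLOPS)": 1_440,
    "FP8/FP6 Tensor Core (PFLOPS)": 720,
    "FP16/BF16 Tensor Core (PFLOPS)": 360,
    "GPU memory, HBM3e (TB)": 13.5,
}

for metric, total in rack_totals.items():
    print(f"{metric}: {total / GPUS_PER_RACK:.3g} per GPU")
```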
Use cases
Supercharging Next-Generation AI and Accelerated Computing
GB200 NVL72 introduces cutting-edge capabilities, including a second-generation Transformer Engine that enables FP4 AI. Coupled with fifth-generation NVIDIA NVLink, it delivers 30X faster real-time inference performance for trillion-parameter language models.
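To illustrate the kind of GPU-to-GPU communication a large NVLink domain accelerates, here is a minimal, generic sketch of an NCCL all-reduce via torch.distributed, the collective at the heart of tensor-parallel inference. It assumes a multi-GPU node launched with torchrun and is not a GB200-specific API.

```python
# Sketch: collective communication across GPUs using NCCL via
# torch.distributed. On NVLink/NVSwitch-connected GPUs, NCCL routes the
# all-reduce over NVLink automatically. Launch with, e.g.:
#   torchrun --nproc_per_node=<num_gpus> this_script.py
import os
import torch
import torch.distributed as dist

torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
dist.init_process_group(backend="nccl")
rank = dist.get_rank()

# Each rank contributes a partial result (as in tensor-parallel inference)...
partial = torch.full((1024,), float(rank), device="cuda")
# ...and the all-reduce sums the partials across all GPUs in the job.
dist.all_reduce(partial, op=dist.ReduceOp.SUM)

if rank == 0:
    print("all-reduce complete, sample value:", partial[0].item())
dist.destroy_process_group()
```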
Energy-Efficient Infrastructure
Liquid-cooled GB200 NVL72 racks reduce a data center’s carbon footprint and energy consumption. Liquid cooling increases compute density, reduces the amount of floor space used, and facilitates high-bandwidth, low-latency GPU communication with large NVLink domain architectures.
Massive-Scale Training
GB200 NVL72 includes a faster second-generation Transformer Engine featuring FP8 precision, enabling a remarkable 4X faster training for large language models at scale.
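FP8 training is typically accessed through NVIDIA's Transformer Engine library for PyTorch. The sketch below assumes the transformer-engine package is installed and an FP8-capable GPU is available, and it uses the library's default scaling recipe.

```python
# Minimal sketch of FP8 training with NVIDIA Transformer Engine
# (assumes the transformer-engine package and an FP8-capable GPU).
import torch
import transformer_engine.pytorch as te

layer = te.Linear(1024, 1024, bias=True).cuda()
optimizer = torch.optim.AdamW(layer.parameters(), lr=1e-4)

inp = torch.randn(32, 1024, device="cuda")

# fp8_autocast runs the layer's matmuls in FP8 with the default recipe;
# the backward pass is taken outside the context, per the library's usage.
with te.fp8_autocast(enabled=True):
    out = layer(inp)
loss = out.float().pow(2).mean()
loss.backward()
optimizer.step()
```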
Browse alternative GPU solutions for your workloads
Access a wide range of performant NVIDIA and AMD GPUs to accelerate your AI, ML & HPC workloads
An NVIDIA preferred partner for compute
We're proud to be an NVIDIA preferred partner for compute, offering the latest GPUs and high-performance computing solutions.
Also trusted by our other key partners:
Pricing & reservation enquiry
Enquire today about access to test the GB200 NVL72 GPU Cloud, or reserve it on CUDO Compute for as long as you need, with contracts tailored to suit your requirements.
Get started today or speak with an expert...
Available Mon-Fri 9am-5pm UK time