GPU

NVIDIA L40

Integrated Memory (VRAM)
Capacity: 48 GB (GDDR6, 384-bit)
Bandwidth: 864 GB/s

123 Token/s
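
The memory figures above can be sanity-checked with simple arithmetic. The sketch below derives the per-pin data rate implied by the 384-bit bus and 864 GB/s bandwidth, and applies one common rule of thumb for a bandwidth-bound tokens-per-second estimate. The source does not say how the 123 Token/s figure was obtained; the 7 GB model size is an assumption chosen here because it happens to reproduce that number, and the function names are illustrative.

```python
def per_pin_data_rate_gbps(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gbit/s) implied by total bandwidth and bus width."""
    bytes_per_transfer = bus_width_bits / 8  # 384 bits -> 48 bytes per transfer
    return bandwidth_gb_s / bytes_per_transfer


def bandwidth_bound_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode rate if each token requires reading all weights once."""
    return bandwidth_gb_s / model_size_gb


data_rate = per_pin_data_rate_gbps(864, 384)
# ASSUMPTION: 7 GB of model weights (not stated in the source).
tokens = bandwidth_bound_tokens_per_s(864, 7)
print(f"{data_rate:.0f} Gbit/s per pin, ~{tokens:.0f} tokens/s")
```

Under these assumptions the estimate is an upper bound; real decode throughput also depends on batch size, KV-cache traffic, and kernel efficiency.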

Vector Compute
FP64
1.41 T
FP32
90.52 T
FP16
90.52 T
BF16
INT32
INT8
X

NVIDIA L40 General-Purpose Floating-Point performance (Vector Performance / Scalar Performance)

FP64: 1.41 TFLOPS

FP32: 90.52 TFLOPS

FP16: 90.52 TFLOPS
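
These vector figures are consistent with the usual peak-throughput formula: core count × 2 FLOPs per clock (one fused multiply-add) × maximum clock. A minimal sketch follows, using the core count and maximum frequency listed under Hardware Specs. The 1/64 FP64 ratio is inferred from the numbers above (1.41 / 90.52) rather than stated in the source.

```python
CUDA_CORES = 18176        # vector (CUDA) cores, from Hardware Specs
MAX_CLOCK_HZ = 2490e6     # upper end of the listed core frequency range
FLOPS_PER_FMA = 2         # one fused multiply-add counts as two FLOPs


def peak_tflops(cores: int, clock_hz: float, flops_per_clock: int = FLOPS_PER_FMA) -> float:
    """Theoretical peak throughput in TFLOPS."""
    return cores * flops_per_clock * clock_hz / 1e12


fp32 = peak_tflops(CUDA_CORES, MAX_CLOCK_HZ)
fp64 = fp32 / 64  # ASSUMPTION: 1/64 rate, inferred from the listed figures
print(f"FP32: {fp32:.2f} TFLOPS, FP64: {fp64:.2f} TFLOPS")
```

These are theoretical peaks at the maximum clock; sustained throughput is typically lower.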

Matrix Compute
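FP64: X
FP32: X
FP16: 181.03 T (362.07 T with sparsity)
FP8: 362.07 T (724.13 T with sparsity)
TF32: 90.52 T (181.03 T with sparsity)
BF16: 181.03 T (362.07 T with sparsity)
INT16: X
INT8: 362.07 T (724.13 T with sparsity)
INT4: 724.13 T (1448.26 T with sparsity)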

NVIDIA L40 AI performance (Tensor Performance / Matrix Performance)

FP16: 181.03 TFLOPS, with sparsity: 362.07 TFLOPS

FP8: 362.07 TFLOPS, with sparsity: 724.13 TFLOPS

TF32: 90.52 TFLOPS, with sparsity: 181.03 TFLOPS

BF16: 181.03 TFLOPS, with sparsity: 362.07 TFLOPS

INT8: 362.07 TOPS, with sparsity: 724.13 TOPS

INT4: 724.13 TOPS, with sparsity: 1448.26 TOPS
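
The tensor figures above follow a regular pattern: relative to TF32, FP16 and BF16 run at 2×, FP8 and INT8 at 4×, and INT4 at 8×, and each sparsity figure is double the dense one. The sketch below regenerates the list from a single base rate. The multipliers are read off the listed numbers rather than taken from an official specification, and the unrounded TF32 base is assumed to equal 18176 cores × 2 × 2.49 GHz, which happens to match the listed 90.52.

```python
# ASSUMPTION: unrounded dense TF32 rate, chosen to match the listed 90.52 T.
TF32_DENSE_T = 18176 * 2 * 2490e6 / 1e12

# Throughput multipliers relative to dense TF32, inferred from the listed figures.
MULTIPLIER = {"TF32": 1, "FP16": 2, "BF16": 2, "FP8": 4, "INT8": 4, "INT4": 8}


def tensor_rates(fmt: str) -> tuple[float, float]:
    """Return (dense, with-sparsity) throughput in T(FL)OPS for a format."""
    dense = TF32_DENSE_T * MULTIPLIER[fmt]
    return round(dense, 2), round(dense * 2, 2)


for fmt in MULTIPLIER:
    dense, sparse = tensor_rates(fmt)
    print(f"{fmt}: {dense} dense, {sparse} with sparsity")
```

Working from the unrounded base avoids the small rounding drift that appears when doubling the already-rounded 90.52.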

Hardware Specs
The NVIDIA L40 is a 5 nm chip with 76.3 billion transistors, launched by NVIDIA in 2022. It has 48 GB of on-board GDDR6 memory with bandwidth up to 864 GB/s, 18176 general-purpose ALUs (CUDA cores/shader cores), and 568 matrix cores (Tensor cores).
Process Node: 5 nm
Launch Year: 2022

Vector (CUDA) Cores: 18176
Matrix (Tensor) Cores: 568
Core Frequency: 735 ~ 2490 MHz
Cache: 96 MB
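
One way to relate the compute and bandwidth figures is a roofline-style estimate: the arithmetic intensity (FLOPs per byte of memory traffic) above which a kernel stops being bandwidth-bound. The sketch below is illustrative only; it uses the peak numbers listed on this page, and the function names are hypothetical.

```python
PEAK_FP32_FLOPS = 90.52e12         # vector FP32 peak from this page
PEAK_FP16_TENSOR_FLOPS = 181.03e12 # dense FP16 tensor peak from this page
MEM_BANDWIDTH_BYTES_S = 864e9      # memory bandwidth from this page


def ridge_point(peak_flops: float, bandwidth_bytes_s: float) -> float:
    """Arithmetic intensity (FLOP/byte) where compute and memory limits meet."""
    return peak_flops / bandwidth_bytes_s


def attainable_tflops(intensity: float, peak_flops: float, bandwidth_bytes_s: float) -> float:
    """Roofline bound: the lower of the compute peak and bandwidth x intensity."""
    return min(peak_flops, intensity * bandwidth_bytes_s) / 1e12


print(f"FP32 ridge point: {ridge_point(PEAK_FP32_FLOPS, MEM_BANDWIDTH_BYTES_S):.1f} FLOP/byte")
print(f"FP16 tensor ridge point: {ridge_point(PEAK_FP16_TENSOR_FLOPS, MEM_BANDWIDTH_BYTES_S):.1f} FLOP/byte")
# A kernel doing 10 FLOPs per byte is bandwidth-bound under this model.
print(f"At 10 FLOP/byte: {attainable_tflops(10, PEAK_FP32_FLOPS, MEM_BANDWIDTH_BYTES_S):.2f} TFLOPS")
```

This simple model ignores the 96 MB cache, which can raise effective bandwidth for working sets that fit in it.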