The machine-learning-performance beef between Intel and Nvidia has stepped up a notch, with the GPU giant calling out Chipzilla for spreading misleading benchmark results.
Intel is desperate to overtake Nvidia in the deep-learning stakes by claiming its 64-bit x86 chips are more capable than Nv’s at neural-network number-crunching tasks. This clash of the titans led to a war of words erupting last week when Intel boasted that its top-end two-socket Xeon Platinum 9282 data-center processor was faster than Nvidia’s Tesla V100 GPU at running ResNet-50, a popular convolutional neural network used at the heart of object-recognition and similar computer vision workloads.
The comparison, as we explained at the time, was rather apples versus pears. Now, Nvidia has punched back with a sarcastic blog post of its own.
“It’s not every day that one of the world’s leading tech companies highlights the benefits of your products,” Paresh Kharya, Nvidia’s director of product marketing, chuckled in his retaliatory piece. “Intel did just that last week, comparing the inference performance of two of their most expensive CPUs to NVIDIA GPUs.”
Intel claimed its high-end processors could churn through 7,878 images per second on the ResNet-50 model. So, yes, the two-socket Xeon is slightly faster than the 7,844 images per second scored by Nvidia’s Tesla V100, and the 4,944 images per second by its newer T4 chip. Bear in mind, however, that Intel is performing inference on these images at INT8 precision (8-bit integer), a lower precision than Nvidia’s mixed FP16 and FP32, meaning the Xeons are doing much less work.
It’s thus an unfair comparison, Nvidia argued. Intel also compared two of its hunky Xeon Platinum 9282 processors to just a single Tesla V100 and a single T4. Plus, we note, the 9282 has only just launched, and is not exactly in customers’ hands, whereas the V100 and T4 were launched in 2017 and 2018 respectively. Unavailable silicon versus available silicon.
Also, if you look at other properties, such as energy efficiency and performance per processor, Nvidia’s Tesla V100 and T4 come out ahead, Nvidia argued back. The two-socket Xeon Platinum 9282 pair crunched through 10 images per second per Watt, while the V100 came in at 22 images per second per Watt, and the T4 was even more efficient at 71 images per second per Watt.
Intel also scored lower for performance per processor at 3,939 images per second: Nvidia said it won here with its V100 at 7,844 images per second, and the T4 can manage 4,944 images per second.
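For readers who want to check the per-processor arithmetic, here is a minimal sketch in Python. It uses only the throughput figures and processor counts quoted above; the dictionary layout and the `per_processor` helper are our own illustrative names, not anything from Intel or Nvidia.

```python
# ResNet-50 inference throughput as quoted in the article (images per second).
# Processor counts are also from the article: Intel's score used two Xeons,
# each Nvidia score used a single GPU.
results = {
    "2x Xeon Platinum 9282": {"images_per_sec": 7878, "processors": 2},
    "Tesla V100": {"images_per_sec": 7844, "processors": 1},
    "T4": {"images_per_sec": 4944, "processors": 1},
}


def per_processor(images_per_sec: float, processors: int) -> float:
    """Normalize a system-level throughput figure to a single processor."""
    return images_per_sec / processors


# Hypothetical helper output: throughput per chip for each system.
per_chip = {
    name: per_processor(r["images_per_sec"], r["processors"])
    for name, r in results.items()
}

for name, value in per_chip.items():
    print(f"{name}: {value:,.0f} images/sec per processor")
```

Dividing by the processor count is the whole trick: the headline number favors the two-socket Xeon box, the per-chip number does not.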
Nvidia’s GPUs are also the cheaper option, if the estimates of the Xeon Platinum 9282’s price – up to $50,000 for a single chip – are correct. A standalone Tesla V100, on the other hand, can be purchased on Amazon for $5,999 and a T4 is going for even less at under $3,000 a pop.
It gets worse
Nvidia resumed bashing Intel by going a step further. ResNet-50 is small fry: it only has 25 million parameters. Why not use BERT, a language model with 340 million parameters, as a benchmark?
“A measure of the complexity of AI models is the number of parameters they have,” Nvidia’s Kharya noted. “Parameters in an AI model are the variables that store information the model has learned. On an advanced model like BERT, a single NVIDIA T4 GPU is 56x faster than a dual-socket CPU server and 240x more power-efficient.”
Obviously, it’s all tit-for-tat marketing fluff. In Nvidia’s aforementioned BERT comparison, Kharya pitted two of Intel’s Xeon Gold 6240 chips against a T4 GPU, rather than the Xeon Platinum 9282. The 6240 is not the best x86 processor for performing inference, and has only 18 CPU cores compared to the 56 on the 9282.
Here are the TDP numbers: the pair of Xeon Gold 6240s musters 300 Watts to Nvidia’s 70 Watts for its T4. The Xeon Gold 6240 setup also has an energy efficiency of 0.007 sentences per second per Watt, compared to the T4’s 1.7 sentences per second per Watt.
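As a rough sanity check, the ratio of those two rounded efficiency figures can be worked out in a few lines of Python. This is only a sketch using the numbers quoted above; the variable names are ours, and because the inputs are rounded the result will only land in the neighborhood of Nvidia’s claimed 240x.

```python
# Rounded BERT energy-efficiency figures as quoted in the article.
xeon_efficiency = 0.007  # dual Xeon Gold setup
t4_efficiency = 1.7      # single T4 GPU

# TDP figures as quoted in the article, in Watts.
xeon_tdp_watts = 300
t4_tdp_watts = 70

# How many times more efficient the T4 is, and how much more
# power the Xeon setup is rated to draw.
efficiency_ratio = t4_efficiency / xeon_efficiency
tdp_ratio = xeon_tdp_watts / t4_tdp_watts

print(f"T4 efficiency advantage: about {efficiency_ratio:.0f}x")
print(f"Xeon setup TDP relative to T4: about {tdp_ratio:.1f}x")
```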
Nvidia continued with the low blows, and added one more comparison: a recommender system known as Neural Collaborative Filtering, or NCF. Nvidia said its T4 GPU has 12x more performance and 24x higher energy efficiency than CPUs. But, again, take the numbers with a pinch of salt as it’s pitting a Xeon Gold 6140, an older processor model launched in 2017 and slightly less powerful than the Xeon Gold 6240 model, against a T4.
A single Xeon Gold 6140 has a processor TDP of 150 Watts, more than double the T4’s 70 Watts. Its performance efficiency is also lower at 19 samples per second per Watt, whereas the T4 clocks in at 397 samples per second per Watt.
“We want to let our customers and ecosystem know about the exceptional AI workload performance they can achieve on their general-purpose Intel Xeon CPU,” came Intel’s less than snappy retort to Nvidia’s goading, in an email to The Register on Tuesday night. “It’s exciting to us that a general-purpose architecture can be highly performant on so many workloads, including one as important as deep learning.” ®