Nvidia Taps Memory, Switch for AI
SAN JOSE, Calif. — At its annual GTC event, Nvidia announced system-level enhancements to boost the performance of its GPUs in training neural networks and a partnership with ARM to spread its technology into inference jobs.
Nvidia offered no details of its roadmap, presumably for 7-nm graphics processors in 2019 or later. It has some breathing room, given that AMD is just getting started in this space, Intel is not expected to ship its Nervana accelerator until next year, and Graphcore — a leading startup — has gone quiet. A few months ago, both Intel and Graphcore were expected to release production silicon this year.
The high-end Tesla V100 GPU from Nvidia is now available with 32 GBytes of memory, double the HBM2 DRAM capacity it offered at its launch last May. In addition, the company announced NVSwitch, a 100-W chip made in a TSMC 12-nm FinFET process. It sports 18 NVLink 2.0 ports and can link 16 GPUs to a shared memory space.
With the DGX-2, Nvidia becomes the first company to build one of the muscular training systems expected to draw 10 kW of power and deliver up to 2 petaflops of performance. The system packs 12 NVSwitch chips and 16 GPUs into a 10U chassis that can support two Intel Xeon host processors, InfiniBand or Ethernet networks, and up to 60 solid-state drives.
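A back-of-the-envelope sketch shows why the switch fabric matters. The per-link figure below (~25 GBytes/s per direction for NVLink 2.0) is a commonly cited public number, not one given in this article; the other values come from the specs above.

```python
# Rough bandwidth arithmetic for a DGX-2-class system.
# ASSUMPTION: ~25 GB/s per direction per NVLink 2.0 link (public figure,
# not stated in the article). Other numbers are from the text.

NVLINK_GBPS = 25       # GB/s per direction per NVLink 2.0 link (assumed)
LINKS_PER_GPU = 6      # NVLink ports on a Volta-class GPU
GPUS = 16              # GPUs in the DGX-2

# Each GPU can drive all six of its links into the NVSwitch fabric at
# once, so per-GPU injection bandwidth is:
per_gpu = NVLINK_GBPS * LINKS_PER_GPU       # 150 GB/s
total_injection = per_gpu * GPUS            # 2,400 GB/s across the box

print(f"Per-GPU fabric bandwidth: {per_gpu} GB/s")
print(f"Aggregate injection bandwidth: {total_injection / 1000:.1f} TB/s")
```

Under these assumptions, every GPU sees the full fabric at once rather than hopping through neighboring GPUs, which is the point of putting a switch between them.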
Cray, Hewlett Packard Enterprise, IBM, Lenovo, Supermicro, and Tyan said that they will start shipping systems with the 32-GB chips by June. Oracle plans to use the chip in a cloud service later in the year.
Claims of performance gains from the added memory, the new interconnect, and software optimizations ranged widely. Nvidia said that it trained a FAIRSeq translation model in two days, an eight-fold speedup over a September test that used eight GPUs with 16 GBytes of memory each. Separately, SAP said that it eked out a 10% gain in image recognition using a ResNet-152 model.
Intel aims to leapfrog Nvidia next year with a production Nervana chip sporting a dozen 100-Gbit/s links, compared with the six 25-GByte/s NVLinks on Nvidia’s Volta. The Nervana chip’s non-coherent memory will allow more flexibility in creating large clusters of accelerators, including torus networks, although it will be more difficult to program.
To ease the coding job, Intel has released its nGraph compiler as open source. It aims to turn models from third-party AI frameworks such as Google’s TensorFlow into code that can run on Intel’s Xeon, Nervana, and, eventually, FPGA chips.
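The article gives no details of nGraph’s interface, but the idea of a framework-neutral graph compiler can be sketched with a toy example. All names below are hypothetical illustrations, not nGraph’s actual API: a framework emits a small graph of ops, and a backend-specific lowering table maps each op onto a kernel the target device supports.

```python
# Toy sketch of a graph-compiler bridge. HYPOTHETICAL names throughout;
# this is NOT the nGraph API, only the general idea it embodies.

# A framework-neutral intermediate representation: a list of ops.
graph = [("matmul", "x", "w"), ("add", "t0", "b"), ("relu", "t1")]

# Per-backend lowering tables mapping IR ops to device kernels
# (kernel names here are made up for illustration).
BACKENDS = {
    "xeon":    {"matmul": "cpu_gemm", "add": "cpu_add", "relu": "cpu_relu"},
    "nervana": {"matmul": "nnp_matmul", "add": "nnp_add", "relu": "nnp_relu"},
}

def lower(graph, target):
    """Translate each IR op into the target backend's kernel name."""
    table = BACKENDS[target]
    return [table[op] for op, *args in graph]

print(lower(graph, "xeon"))     # kernels for a Xeon target
print(lower(graph, "nervana"))  # same graph, Nervana kernels
```

The same front-end graph retargets to a new accelerator by swapping the lowering table, which is why one compiler can serve Xeon, Nervana, and eventually FPGAs.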
The code, running on a prototype accelerator, is being fine-tuned by Intel and a handful of data center partners. The company aims to announce details of its plans at a developer conference in late May, though production chips are not expected until next year. At that point, Nvidia will be under pressure to field a next-generation part to keep pace with an Intel roadmap that calls for annual accelerator upgrades.
“The existing Nervana product will really be a software-development vehicle. It was built on a 28-nm process before Intel bought the company, and it’s not competitive with Nvidia’s 12-nm Volta design,” said Kevin Krewell, a senior analyst with Tirias Research.
Volta’s added memory and NVSwitch “keeps Nvidia ahead of the competition. We're all looking forward to the next process shrink, but, as far as production shipping silicon goes, Volta still has no peer,” he added.
Among startups, Wave Computing is expected to ship its first training systems for data centers and developers this year. New players are still emerging.
Startup SambaNova Systems debuted last week with $56 million from investors, including Google’s parent Alphabet. Co-founder Kunle Olukotun’s last startup, Afara Websystems, designed what became the Niagara server processor of Sun Microsystems, now Oracle.