Tenstorrent has unveiled its next-gen Wormhole processor for AI workloads that promises to deliver decent performance at a low price. The company currently offers two PCIe add-on cards carrying one or two Wormhole processors as well as TT-LoudBox and TT-QuietBox workstations aimed at software developers. Today’s entire release is aimed at developers rather than those who will deploy the Wormhole cards for their commercial workloads.
“It’s always rewarding to get more of our products into the hands of developers. Providing development systems with our Wormhole™ board helps developers scale and work on multi-chip AI software,” said Jim Keller, CEO of Tenstorrent. “In addition to this launch, we are delighted that the release and power-up of our second generation, Blackhole, is going very well.”
Each Wormhole processor integrates 72 Tensix cores (each featuring five RISC-V cores and supporting various data formats) with 108 MB of SRAM to deliver 262 FP8 TFLOPS at 1 GHz with a thermal design power of 160 W. A single-chip Wormhole n150 card integrates 12 GB of GDDR6 memory with a bandwidth of 288 GB/s.
Wormhole processors offer flexible scalability to meet the needs of different workloads. In a standard workstation configuration with four Wormhole n300 cards, the processors can merge to operate as a single unit, appearing as a large, unified network of Tensix cores to the software. This configuration allows the accelerators to work on the same workload, be distributed among four developers, or run up to eight separate AI models simultaneously. A key feature of this scalability is that it operates natively without the need for virtualization. In data center environments, Wormhole processors will scale both inside a machine using PCIe or outside a single machine using Ethernet.
Performance-wise, Tenstorrent’s single-chip Wormhole n150 card (72 Tensix cores at 1GHz, 108MB SRAM, 12GB GDDR6 at 288GB/s) is capable of 262 FP8 TFLOPS at 160W, while the dual-chip Wormhole n300 card (128 Tensix cores at 1GHz, 192MB SRAM, 24GB GDDR6 aggregated at 576GB/s) can deliver up to 466 FP8 TFLOPS at 300W (according to Tom’s Hardware).
To put that 466 FP8 TFLOPS figure at 300W into context, let’s compare it to what AI market leader Nvidia has to offer at this thermal design power. Nvidia’s A100 doesn’t support FP8, but it does support INT8, and its peak performance is 624 TOPS (1,248 TOPS with sparsity). In contrast, Nvidia’s H100 does support FP8, and its peak performance is a whopping 1,670 TFLOPS (3,341 TFLOPS with sparsity) at 300W, which is far ahead of Tenstorrent’s Wormhole n300.
There’s a big catch, though. Tenstorrent’s Wormhole n150 is listed at $999, while the n300 is available at $1,399. In contrast, an Nvidia H100 card can retail for $30,000, depending on quantities. Of course, we don’t know whether four or eight Wormhole processors can actually deliver the performance of a single H100, and they would do so at a TDP of 600W or 1,200W, respectively.
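To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python that uses only the peak figures and list prices quoted above. The $30,000 H100 price is the approximate retail figure from this article, and the function names (`efficiency`, `cards_to_match`) are made up for illustration. Peak FP8 TFLOPS is a theoretical ceiling, so this says nothing about real-world throughput.

```python
import math

# Peak figures and prices as quoted in the article (not measured results).
CARDS = {
    "Wormhole n150": {"fp8_tflops": 262, "tdp_w": 160, "price_usd": 999},
    "Wormhole n300": {"fp8_tflops": 466, "tdp_w": 300, "price_usd": 1399},
    # Assumption: ~$30,000 retail price, which varies with quantity.
    "Nvidia H100": {"fp8_tflops": 1670, "tdp_w": 300, "price_usd": 30000},
}


def efficiency(card):
    """Return peak TFLOPS per watt and per $1,000 for one card."""
    return {
        "tflops_per_watt": card["fp8_tflops"] / card["tdp_w"],
        "tflops_per_kusd": card["fp8_tflops"] / card["price_usd"] * 1000,
    }


def cards_to_match(card, target):
    """Count how many `card`s are needed to reach `target`'s peak FP8 rate,
    assuming (optimistically) perfect linear scaling across cards."""
    count = math.ceil(target["fp8_tflops"] / card["fp8_tflops"])
    return {
        "count": count,
        "total_tdp_w": count * card["tdp_w"],
        "total_price_usd": count * card["price_usd"],
    }


if __name__ == "__main__":
    for name, card in CARDS.items():
        eff = efficiency(card)
        print(f"{name}: {eff['tflops_per_watt']:.2f} TFLOPS/W, "
              f"{eff['tflops_per_kusd']:.0f} TFLOPS per $1,000")
    print(cards_to_match(CARDS["Wormhole n300"], CARDS["Nvidia H100"]))
```

On these paper numbers, the n300 offers roughly six times the peak FP8 throughput per dollar of the H100, while the H100 offers about 3.6 times the peak throughput per watt. Matching one H100 on peak FP8 would take four n300 cards (eight processors) at 1,200W and $5,596, assuming perfect scaling.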
In addition to the cards, Tenstorrent offers developers two pre-built workstations with four n300 cards inside: the cheaper Xeon-based TT-LoudBox with active cooling and the premium EPYC-powered TT-QuietBox with liquid cooling.
Sources: Tenstorrent, Tom’s Hardware