Tenstorrent in the Real World: Benchmarks, Customers, and the Inference Bet That's Starting to Pay Off

This is the closing Part 6 of the AI Inference Hardware series.

Yashwanth

Introduction

Across five pieces, we have built up a picture:

  • An inference market growing from USD 106 billion in 2025 to a projected USD 255 billion by 2030.
  • An NVIDIA-dominated landscape with CUDA lock-in and pricing pressure.
  • An AMD that has caught up on silicon but still trails on software.
  • A Tenstorrent that is betting on architectural divergence - open RISC-V, open software, mesh MIMD instead of SIMT, Ethernet scale-out instead of NVSwitch, GDDR6 + distributed SRAM instead of HBM - to break the GPU paradigm for inference.

The question for this final piece is the only one buyers actually care about:

Is it working?

What do the published benchmarks say? Who is buying or licensing? What is the real-world impact on user-facing inference workloads?

The Benchmark Scorecard

Let's be honest about what the data shows. As of mid-2026, independent third-party benchmark coverage of Tenstorrent hardware is sparse - far sparser than NVIDIA's MLPerf submissions or AMD's published numbers. Most published Tenstorrent benchmarks come from one of three sources:

  • Tenstorrent's own marketing and engineering blog posts
  • Academic papers with microbenchmarks
  • Independent technology blogs, analyst notes, and trade press (Spheron, Moor Insights & Strategy, EE Times)

Within those constraints, these are the headline numbers worth taking seriously:

1. Galaxy Blackhole on DeepSeek R1 671B

Demonstrated at TT-Deploy in 2026:

  • 350+ tokens per second per user across 16 Galaxy units (512 Blackhole chips total)
  • Batch size 32, ~4-second time-to-first-token on 100K context
  • Prefill and decode running on the same hardware
  • Cost claim: USD 6 per million tokens vs an implied USD 30 per million on NVIDIA, which is the basis for the claimed 5x TCO advantage
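
The demo figures above imply a few derived quantities that are worth making explicit. The following is a minimal sketch, assuming that all 32 batch slots decode at the quoted per-user rate (the demo write-up does not say so); the variable names are illustrative only.

```python
# Derived quantities from the TT-Deploy demo figures quoted above.
# Assumption: every one of the 32 concurrent users sustains the per-user rate.
GALAXY_UNITS = 16
TOTAL_CHIPS = 512
TOKENS_PER_SEC_PER_USER = 350   # quoted as "350+", so treat as a lower bound
BATCH_SIZE = 32
TT_COST_PER_M = 6.0             # USD per million tokens (claimed)
NVIDIA_COST_PER_M = 30.0        # USD per million tokens (implied)

chips_per_galaxy = TOTAL_CHIPS // GALAXY_UNITS
aggregate_tokens_per_sec = TOKENS_PER_SEC_PER_USER * BATCH_SIZE
cost_ratio = NVIDIA_COST_PER_M / TT_COST_PER_M

print(f"chips per Galaxy: {chips_per_galaxy}")
print(f"aggregate decode rate: at least {aggregate_tokens_per_sec} tok/s")
print(f"claimed cost ratio: {cost_ratio:.1f}x")
```

Under that assumption the 512-chip cluster delivers at least 11,200 tokens per second in aggregate, which is the number to hold against the cost claim when comparing with other systems.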

The architectural-fit story matters here: DeepSeek R1 is an MoE model, and the mesh MIMD architecture is well suited to expert routing because it avoids the warp-divergence cost a SIMT GPU pays.

2. Wormhole Galaxy on Llama 70B

  • ~4,000-5,000 tokens/sec at batch 32 on TT-Metal.
  • 8x H100 SXM5 node running vLLM: 2,500-3,500 tok/s at the same batch size.
  • Caveat: the Wormhole numbers come from controlled single-model benchmark runs without a production serving layer, while the H100 numbers include vLLM's PagedAttention, continuous batching, queue management, and routing overhead.
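
Because both sides are quoted as ranges, the size of the Wormhole advantage is itself a range. A small sketch of the bounds, using only the figures above (the helper name is hypothetical):

```python
# Bounds on the Wormhole Galaxy vs 8x H100 throughput ratio, using only the
# ranges quoted above. The two setups are not like-for-like (see the caveat).
wormhole_tok_s = (4000, 5000)   # TT-Metal, batch 32, no serving layer
h100_tok_s = (2500, 3500)       # 8x H100 SXM5 with vLLM, batch 32

def ratio_bounds(a, b):
    """Return (min, max) of a/b when a and b each vary over a closed range."""
    a_lo, a_hi = a
    b_lo, b_hi = b
    return a_lo / b_hi, a_hi / b_lo

low, high = ratio_bounds(wormhole_tok_s, h100_tok_s)
print(f"Wormhole/H100 throughput ratio: {low:.2f}x to {high:.2f}x")
```

At the low end the ratio is only about 1.14x, small enough that adding a production serving layer to the Wormhole side could plausibly erase it; at the high end it is 2x.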

3. TTS Workload (Academic)

  • Published academic benchmark on "Lightning V2 on Tenstorrent."
  • 4x lower cost per inference vs NVIDIA L40S.
  • Attributable to distributed dataflow architecture, distributed on-chip SRAM, 1:1 thread-to-core mapping.

4. Single-Chip Headroom

  • Blackhole at 745 TFLOPS FP8 sits roughly in the A100 class on raw compute.
  • Tenstorrent's lead: cost per FLOP at rack scale.
  • Blackhole boards cost USD 999 (p100) to USD 1,399 (p150) vs USD 25,000-30,000 per H100 SXM, while raw FLOP throughput per chip stays within the same order of magnitude.
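
To put a rough number on the board-level comparison, the sketch below divides peak FP8 throughput by price. Two things here are assumptions rather than figures from this article: the H100 SXM dense FP8 rate of about 1,979 TFLOPS (from NVIDIA's public datasheet), and the choice to apply the 745 TFLOPS Blackhole figure to both board prices, since the text does not say which board it describes.

```python
# Rough peak FP8 throughput per dollar at the board level.
# Assumptions (not from this article): H100 SXM dense FP8 of ~1,979 TFLOPS;
# the 745 TFLOPS Blackhole figure is applied to both board prices.
BLACKHOLE_TFLOPS = 745
BLACKHOLE_PRICES = (999, 1399)     # USD, p100 and p150
H100_TFLOPS = 1979                 # assumed, dense FP8
H100_PRICES = (25_000, 30_000)     # USD

def tflops_per_dollar(tflops, price_usd):
    """Peak TFLOPS bought per US dollar of board price."""
    return tflops / price_usd

blackhole = [tflops_per_dollar(BLACKHOLE_TFLOPS, p) for p in BLACKHOLE_PRICES]
h100 = [tflops_per_dollar(H100_TFLOPS, p) for p in H100_PRICES]

# Least and most favourable pairings for Blackhole.
worst_case = min(blackhole) / max(h100)
best_case = max(blackhole) / min(h100)
print(f"Blackhole advantage in peak TFLOPS per dollar: "
      f"{worst_case:.1f}x to {best_case:.1f}x")
```

Under these assumptions the range works out to roughly 7x to 11x. It covers peak silicon throughput only and says nothing about host systems, networking, power, or achieved utilization.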

5. Galaxy Blackhole Rack Specification

  • USD 110,000 base configuration
  • 23 PFLOPS of FP8 compute
  • 6.2 GB on-chip SRAM at 2.9 PB/s aggregate
  • 1 TB DRAM at 16 TB/s aggregate
  • 56 x 800G Ethernet ports for 11.2 TB/s of scale-out bandwidth (counting both directions)
  • 4-Galaxy supercluster starts at USD 440,000
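
The rack figures can be cross-checked with simple arithmetic. This sketch assumes 32 chips per rack, inferred from the 512 chips across 16 Galaxy units in the DeepSeek demo; the variable names are illustrative.

```python
# Sanity-check arithmetic on the Galaxy Blackhole rack figures quoted above.
PORTS = 56
PORT_GBPS = 800                 # gigabits per second per port
RACK_PRICE_USD = 110_000
RACK_PFLOPS_FP8 = 23
CHIPS_PER_RACK = 32             # assumption: 512 chips / 16 Galaxy units

one_way_tb_s = PORTS * PORT_GBPS / 8 / 1000    # terabytes/s, one direction
both_ways_tb_s = 2 * one_way_tb_s
usd_per_pflops = RACK_PRICE_USD / RACK_PFLOPS_FP8
tflops_per_chip = RACK_PFLOPS_FP8 * 1000 / CHIPS_PER_RACK
supercluster_usd = 4 * RACK_PRICE_USD

print(f"Ethernet: {one_way_tb_s:.1f} TB/s one way, {both_ways_tb_s:.1f} TB/s both ways")
print(f"USD per FP8 PFLOPS: {usd_per_pflops:,.0f}")
print(f"implied FP8 TFLOPS per chip: {tflops_per_chip:.1f}")
print(f"4-Galaxy supercluster: USD {supercluster_usd:,}")
```

The 11.2 TB/s figure only matches when both directions are counted; one-way bandwidth is 5.6 TB/s. The implied per-chip rate of about 719 TFLOPS is slightly below the 745 TFLOPS single-chip figure, which may simply reflect rounding of the rack total.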

For comparison, an NVIDIA GB200 NVL72 system reportedly carries a list price north of USD 3 million, though a direct apples-to-apples FLOPS comparison is complicated by precision differences.

6. ASPLOS Microbenchmarks (2025)

  • Single Tensix core sustained nearly its theoretical 32-element-per-cycle throughput.
  • Mandelbrot parallel test on all Tensix cores: 22.4x speedup over a single-core CPU.
  • Cache-free architecture did not hinder single-core performance for regular memory access patterns.

The pattern across these numbers is consistent: chip for chip, Tenstorrent is not the fastest accelerator on most workloads, but it is genuinely competitive on cost per token and energy per token for workloads where its architectural fit is good - large MoE models, long-context inference, and data-movement-dominated patterns.

Who Is Actually Buying

The Tenstorrent customer roster has grown noticeably in 2025 and 2026, and falls into three buckets.

Bucket 1: IP Licensees Building Their Own Silicon

This is the largest source of Tenstorrent's revenue today.

1.1 LG Electronics

  • Licensed both the Tensix AI core IP and the Ascalon CPU IP.
  • Initial deal: smart TV chiplets.
  • 2024 expanded partnership: system-on-chips across LG's product line, including automotive and on-device AI products.
  • LG CEO William Cho: "Tenstorrent is bringing the industry's best AI and RISC-V technology to this collaboration."
  • LG also participated in Tenstorrent's Series D round.

1.2 Hyundai Motor Group

  • Invested in Tenstorrent.
  • Committed to using its designs in future Hyundai, Kia, and Genesis vehicles.
  • Hyundai Mobis elected Tenstorrent COO Keith Witek to its board - the first time the Korean supplier has appointed someone from the AI semiconductor industry as a non-standing director.
  • Strategic logic: future vehicles will run extensive on-device AI for autonomous driving, in-cabin experience, robotics; Hyundai wants control over that silicon roadmap.

1.3 Japan's LSTC

  • Leading-edge Semiconductor Technology Center, backed by the Japanese government and partnered with Rapidus on 2nm manufacturing.
  • Selected Tenstorrent's RISC-V and chiplet designs for an AI accelerator project.
  • Strategically significant: Japan's compute sovereignty program is one of the most concrete national efforts to build an alternative to NVIDIA.

1.4 Others

  • SingularityNet - Swiss AI consortium.
  • UnsungFields - Japan-focused partnership.

Bucket 2: Sovereign AI and Government-Aligned Compute

2.1 Tenstorrent x Infinia (UAE)

  • Formalized at Abu Dhabi Finance Week 2025.
  • Sovereign AI systems in the GCC region.
  • Positions Tenstorrent silicon as the foundation for compute infrastructure the UAE wants to operate independently of US hyperscaler clouds.

2.2 CHASSIS Program

  • Research initiative on chiplet-based systems.

2.3 Cyprus

  • Ongoing arrangements targeting sovereign compute.

The common argument across these deals is that the EU AI Act and several national AI programs call for auditable, open-source compute stacks in certain workload categories, and that Tenstorrent's open stack can meet that bar in ways NVIDIA's proprietary stack cannot.

Bucket 3: Developers and Edge

Smallest revenue bucket today but strategically significant for ecosystem development.

  • Blackhole p100 at USD 999
  • Blackhole p150 at USD 1,399
  • TT-QuietBox developer workstation at USD 11,999
  • Razer partnership (CES 2026) - Thunderbolt-attached compact AI accelerator for laptops

Strategic bet: a developer who starts on a USD 999 Blackhole card and contributes patches to TT-Forge or TT-Metal becomes a proof point for enterprise procurement evaluating the platform.

Where the Impact Actually Lands on User-Facing Workloads

Three workload patterns are both increasingly common and a good fit for Tenstorrent's architecture.

Pattern 1: Long-Context Reasoning Models in Production

  • DeepSeek R1, GPT-OSS 120B, Llama 3 70B, and the broader reasoning-model category all consume increasing amounts of context.
  • Agentic systems often pass tens or hundreds of thousands of tokens of prompt history.
  • The KV-cache for these workloads at production concurrency dominates VRAM use.
  • Tenstorrent's distributed SRAM and mesh-pooled DRAM are well suited to this pattern.
  • The DeepSeek R1 demonstration at 350+ tokens/sec/user supports the claim.
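
To see why the KV-cache dominates memory at long context, the sketch below sizes it for one of the models named above. The model shape (80 layers, 8 KV heads, head dimension 128) is an assumption taken from the publicly released Llama 3 70B configuration, not from this article, and the function name is illustrative.

```python
# KV-cache sizing sketch. Model shape values are assumptions taken from the
# publicly released Llama 3 70B configuration, not from this article.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Bytes needed to hold keys and values for `batch` sequences of `seq_len`."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # 2 = K and V
    return per_token * seq_len * batch

LLAMA3_70B = dict(layers=80, kv_heads=8, head_dim=128)

one_user = kv_cache_bytes(**LLAMA3_70B, seq_len=100_000, batch=1)
batch_32 = kv_cache_bytes(**LLAMA3_70B, seq_len=100_000, batch=32)

print(f"{one_user / 1e9:.1f} GB per 100K-token sequence (FP16)")
print(f"{batch_32 / 1e12:.2f} TB for 32 concurrent sequences (FP16)")
```

With FP16 keys and values this comes to about 32.8 GB per 100K-token sequence and about 1.05 TB for 32 concurrent sequences, before counting model weights. That is on the order of the 1 TB DRAM quoted for a single Galaxy Blackhole rack; a quantized KV-cache would shrink it proportionally.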

For a production team running a reasoning agent that costs USD 30/M tokens on NVIDIA, a 5x TCO advantage would mean cutting the inference bill from USD 300,000/month to USD 60,000/month at constant traffic of about 10 billion tokens per month - money that can fund the engineering investment required to operationalize the platform.

Pattern 2: Mixture-of-Experts Inference at Scale

  • MoE models (DeepSeek V4-class, Mixtral derivatives) are increasingly the default for frontier inference.
  • They deliver better quality per token, but they are also workloads where SIMT GPUs leave performance on the table because of expert-routing divergence.
  • The MIMD architecture of Tensix cores, with each core executing its own instruction stream, maps onto MoE routing naturally.
  • As MoE becomes the dominant inference pattern at the frontier, the architectural fit story grows stronger.
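
The routing behavior behind this argument can be illustrated with a toy top-k router. This is a sketch only: the scores are random rather than produced by a learned gate, and the token, expert, and top-k counts are illustrative assumptions, not the configuration of any DeepSeek or Mixtral model.

```python
# Toy top-k expert routing, to show that per-expert work is data-dependent.
# All sizes here are illustrative assumptions, not a real model's config.
import random
from collections import Counter

def route_tokens(num_tokens, num_experts, top_k, seed=0):
    """Send each token to its top_k highest-scoring experts (random scores)."""
    rng = random.Random(seed)
    load = Counter()
    for _ in range(num_tokens):
        scores = [rng.random() for _ in range(num_experts)]
        ranked = sorted(range(num_experts), key=scores.__getitem__, reverse=True)
        load.update(ranked[:top_k])
    return load

load = route_tokens(num_tokens=256, num_experts=16, top_k=2)
counts = [load[e] for e in range(16)]
print("tokens per expert:", counts)
print("busiest / idlest expert:", max(counts), "/", min(counts))
```

Each expert receives a different number of tokens per batch, so the work per expert is uneven and only known at run time. Whether independent instruction streams per core turn that into a measurable advantage depends on the kernels, which is exactly what independent benchmarks would need to show.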

Pattern 3: Edge and On-Device Inference

This is the LG and Hyundai play:

  • Smart TVs running on-device LLMs for content understanding, voice assistants, personalization.
  • Vehicles running multimodal models for in-cabin experience and ADAS.
  • Tenstorrent's RISC-V Tensix IP is the only commercially licensable accelerator IP at this scale that lets a system integrator design its own SoC with a high-performance AI block - without paying NVIDIA's pricing or living inside NVIDIA's software constraints.

The ARM-style IP licensing model is uniquely well-fitted to the edge.

How This Changes the Buyer's Calculus

1. Standard Workloads at Moderate Scale

  • Right answer in 2026 remains NVIDIA H100 or H200, served with vLLM or TensorRT-LLM, on a neo-cloud at USD 2.10-2.60/GPU-hour.
  • Ecosystem maturity gap is real - paying the NVIDIA premium is paying for risk reduction.

2. Very High Volume (500M+ tokens/day)

  • With hardware procurement authority and an in-house optimization team, the calculus shifts.
  • Tenstorrent's Galaxy Blackhole at USD 110,000/rack with claimed 5x TCO advantage on DeepSeek-class workloads becomes worth a serious POC investment.
  • Engineering cost of writing kernels in TT-Metalium or extending TT-Forge is real but bounded.
  • Inference cost savings at scale: millions of dollars annually.
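
The "millions of dollars annually" figure can be checked at the 500M tokens/day threshold. This sketch takes the claimed USD 6 and implied USD 30 per million tokens at face value and ignores the engineering cost of adopting the platform; the function name is illustrative.

```python
# Annual inference spend at a given daily volume, using the per-million-token
# costs claimed earlier in the article. Engineering cost is not included.
def annual_cost_usd(tokens_per_day, usd_per_million, days=365):
    """Yearly spend in USD for a constant daily token volume."""
    return tokens_per_day / 1_000_000 * usd_per_million * days

TOKENS_PER_DAY = 500_000_000

nvidia = annual_cost_usd(TOKENS_PER_DAY, 30.0)
tenstorrent = annual_cost_usd(TOKENS_PER_DAY, 6.0)
savings = nvidia - tenstorrent

print(f"NVIDIA:      USD {nvidia:,.0f} per year")
print(f"Tenstorrent: USD {tenstorrent:,.0f} per year")
print(f"Difference:  USD {savings:,.0f} per year")
```

At exactly 500M tokens per day the difference is about USD 4.4 million per year, provided the 5x claim survives contact with a production serving stack.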

3. Sovereign AI Buyers

  • Government compute programs, regulated industries with data residency requirements, hyperscalers building custom silicon.
  • Tenstorrent occupies a defensible position no other player matches at this maturity level.
  • The combination of an open RISC-V ISA, a fully open-source software stack, multi-foundry chip supply (Samsung today, 2nm talks underway), and an IP licensing model gives these buyers structural control they cannot get from NVIDIA or AMD.

4. System Integrators

  • Building AI-enabled products - automotive, consumer electronics, robotics, edge appliances.
  • Tenstorrent's IP licensing offer is the closest thing to ARM-for-AI that exists today.
  • LG, Hyundai, and Japan-LSTC deals show this is real revenue and an expanding wedge.

What to Watch For Next

Three things will determine whether Tenstorrent's bet pays off over the next 18 months.

1. Software Ecosystem Maturity

  • A production-grade equivalent to vLLM - with PagedAttention, continuous batching, OpenAI-compatible API - running natively on Tenstorrent hardware would be a significant unlock.
  • TT-Forge is improving rapidly (800+ models tested in CI).
  • Developer hub and bounty programs are funding community contributions.
  • Gap to the NVIDIA serving stack is the single biggest practical barrier.

2. Independent Benchmark Coverage

  • MLPerf inference submissions on Tenstorrent Galaxy Blackhole would matter enormously.
  • Today, most published numbers come from Tenstorrent's own benchmarking.
  • Third-party validation against vLLM-on-NVIDIA at production-realistic SLAs would either confirm the cost advantage story or expose where it falls short.
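
As an illustration of what "production-realistic SLAs" could mean in practice, the sketch below checks a set of benchmark request records against a latency and throughput target. The record fields, thresholds, and sample values are all hypothetical; a real harness such as MLPerf Inference defines its own scenarios and metrics.

```python
# Sketch of an SLA check over benchmark request logs. The record fields,
# thresholds, and sample data are hypothetical.
import math

def percentile(values, pct):
    """Nearest-rank percentile (0 < pct <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def meets_sla(requests, max_p95_ttft_s, min_p50_tok_s):
    """True if p95 time-to-first-token and median per-user decode rate pass."""
    ttft = [r["ttft_s"] for r in requests]
    rate = [r["output_tokens"] / r["decode_s"] for r in requests]
    return (percentile(ttft, 95) <= max_p95_ttft_s
            and percentile(rate, 50) >= min_p50_tok_s)

sample = [
    {"ttft_s": 0.8, "output_tokens": 400, "decode_s": 2.0},
    {"ttft_s": 1.2, "output_tokens": 600, "decode_s": 2.5},
    {"ttft_s": 4.1, "output_tokens": 300, "decode_s": 3.0},
]
print(meets_sla(sample, max_p95_ttft_s=4.0, min_p50_tok_s=150))
```

The point of framing results this way is that a system can post a strong average tokens-per-second figure and still fail on tail latency, which a single headline number hides.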

3. Cloud Availability

  • Tenstorrent Galaxy hardware is not yet on public cloud marketplaces as of mid-2026.
  • Path to broad developer adoption runs through cloud providers.
  • Neo-clouds offering Wormhole or Blackhole instances at per-hour pricing would dramatically lower friction of evaluation.
  • The TT-Deploy initiative announced in 2026 points in this direction.

Conclusion

The architectural thesis Tenstorrent is testing - that the post-GPU era of AI compute is mesh MIMD on open RISC-V with open software - is not yet proven. NVIDIA's continued execution on Blackwell and Rubin, AMD's MI355X and MI400 roadmap, and the maturation of the CUDA and ROCm ecosystems all argue that the GPU paradigm has years of headroom left.

But Tenstorrent's bet is increasingly defensible:

  • The architecture is well-fitted to where inference workloads are heading.
  • The open stack is uniquely positioned for sovereign and regulated procurement.
  • The IP licensing business is generating real revenue.
  • The chip-and-IP hybrid model is genuinely differentiated.

As of mid-2026, the company is no longer "interesting in theory." It is a serious second-tier player with named enterprise IP customers, a tangible production benchmark on DeepSeek R1, and a strategic position in the sovereign AI conversation that NVIDIA cannot easily occupy.

For buyers willing to look past the current ecosystem gap, that combination matters. For everyone else, it is worth watching closely - because if the cost-per-token claims hold up under independent scrutiny, the inference market will look meaningfully different in 2027 than it does today.

Sources: Tenstorrent TT-Deploy newsroom post (2026), wccftech coverage of the Galaxy Blackhole launch, Sacra company analysis (April 2026), Moor Insights & Strategy Tenstorrent inference analyst note (June 2026), Spheron Tenstorrent vs NVIDIA comparison (April 2026), Tom's Hardware coverage of Blackhole product launches and the p150 firmware revision, The Logic profile on Tenstorrent's sovereign AI customer strategy, KED Global and Korea Economic Daily coverage of LG and Hyundai partnerships, DCD coverage of the Hyundai-Kia-Samsung funding round, Tekedia coverage of the Series D round, EE Times interviews with Jim Keller on edge IP strategy, "Rewriting TTS Inference Economics: Lightning V2 on Tenstorrent" (academic paper), ASPLOS 2025 Tenstorrent Blackhole microbenchmarking paper, Tenstorrent official documentation and GitHub repositories.

GPU NET

