Tenstorrent: The Jim Keller Bet to Break NVIDIA's Inference Monopoly

If you are going to challenge NVIDIA, the first question is: what could you possibly be doing differently? CUDA has 18 years of compounding investment. The H100 to B200 to Rubin roadmap is on track. AMD is the closest thing to a credible second source and is still climbing. Tenstorrent's answer is not 'do GPUs better.' It is 'GPUs are the wrong architecture for what's coming, the software is closed, and the moat is built on a moat - let's pull every leg out at once.' This is Part 3 of the AI Inference Hardware series, walking through the Jim Keller pedigree, the four-part architectural thesis, the chip-and-IP hybrid business model, and the sovereign AI play.

Yashwanth

Introduction

If you are going to challenge NVIDIA, the first question is: what could you possibly be doing differently?

  • CUDA has 18 years of compounding investment.
  • The H100 → B200 → Rubin roadmap is on track.
  • AMD is the closest thing to a credible second source and is still climbing.
  • Cerebras, Groq, SambaNova, Graphcore - every one of these companies has tried, and each has carved out a niche without dislodging the incumbent.

Tenstorrent's answer is not "do GPUs better." It is:

GPUs are the wrong architecture for what's coming, the software is closed, and the moat is built on a moat - let's pull every leg out at once.

That answer has attracted more than USD 700M in funding at a roughly USD 2.6B valuation, with backers including Jeff Bezos (through Bezos Expeditions), Samsung Catalyst Fund, Hyundai Motor Group, Fidelity, and Eclipse Ventures. It has also attracted an engineering roster that reads like a reunion tour for the best silicon designers of the last twenty years.

The Jim Keller Pedigree

Jim Keller is one of the most consequential CPU architects of the modern era. His track record reads like an industry index:

  • AMD's K8 architecture (Athlon 64)
  • Apple A4 and A5 chips
  • AMD's Zen architecture, which revived the company
  • Tesla's original FSD chip
  • A stint at Intel shaping its chip strategy

He joined Tenstorrent as president and CTO in early 2021 and became CEO in January 2023, anchoring the company's strategic direction around RISC-V and open architecture.

The company itself was founded in 2016 by Ljubisa Bajic, Ivan Hamer, and Milos Trajkovic. Bajic - a former NVIDIA senior architect and AMD IC designer - was clear from the start about the thesis. As he told Design News in an early interview:

"GPUs are essentially at the end of their evolutionary curve. They've done a great job; they've pushed the field to the point where it is now. But in order to make any kind of order of magnitude type jumps, GPUs are going to have to go."

That framing has been the throughline. Tenstorrent is not trying to win by building a faster GPU. It is trying to win by arguing that the GPU paradigm is wrong for the shape of AI workloads coming next, especially Mixture-of-Experts models, agentic inference, and long-context reasoning.

Other Key People

  • Wei-Han Lien - Chief architect of Tenstorrent's Ascalon RISC-V CPU IP. Worked at NexGen → AMD → PA Semi → Apple (contributed to A6, A7, arguably M1).
  • Raja Koduri - Formerly head of Intel's discrete GPU effort, earlier lead at AMD Radeon. Joined Tenstorrent's board.

The Four-Part Thesis

What Tenstorrent is actually betting on can be summarized as four interlocking architectural choices, each of which contrasts deliberately with NVIDIA and AMD.

1. RISC-V Instead of Proprietary ISAs

  • RISC-V is an open instruction set architecture that originated at UC Berkeley and is now maintained by the nonprofit RISC-V International. It is royalty-free, with no single national or corporate owner.
  • Tenstorrent uses RISC-V for everything - the "baby" cores inside each Tensix compute tile, the larger Ascalon application cores, and the SiFive-licensed general-purpose cores in earlier designs.
  • Strategic argument: an open ISA is a precondition for the rest of the stack being open.
  • Practical argument: RISC-V's modular design enables architectural customization (five RISC-V processors per Tensix core, each handling a distinct pipeline stage) in ways x86 or proprietary architectures simply cannot.

2. Tensix Cores Instead of SIMT GPU Streaming Multiprocessors

A Tensix core is fundamentally different from a CUDA SM. Each contains:

  • 5 small RISC-V cores for control and instruction dispatch
  • Dedicated matrix engine (FPU) and vector engine (SFPU)
  • Pack and unpack units
  • 1.5 MB of local SRAM

Key differences from GPUs:

  • No SIMT execution model.
  • No warp scheduling.
  • No hardware multithreading hiding latency behind context switching.
  • Each Tensix core executes its own instruction stream cooperatively, with explicit data movement between tiles over a mesh network-on-chip.
  • Deliberate absence of cache hierarchies provides deterministic, consistent memory access patterns.

The trade-off: harder to program than CUDA, but much more transparent and predictable.
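To make "explicit data movement" concrete, here is a minimal Python sketch of the programming model described above. It is a toy, not Tenstorrent's API: the `Core` and `Mesh` classes, the 4 x 4 grid, the 32 x 32 two-byte tile, and the X-then-Y routing rule are all assumptions chosen for illustration. Only the 1.5 MB SRAM figure comes from the text.

```python
from dataclasses import dataclass, field

SRAM_BYTES = 1536 * 1024       # 1.5 MB of local SRAM per core (figure from the text)
TILE_BYTES = 32 * 32 * 2       # assumed tile: 32 x 32 values, 2 bytes each


@dataclass
class Core:
    """One compute tile with private SRAM and no cache in front of it."""
    x: int
    y: int
    sram: dict = field(default_factory=dict)   # buffer name -> bytes

    def store(self, name: str, data: bytes) -> None:
        # Software must check capacity itself; nothing spills to a cache.
        used = sum(len(b) for n, b in self.sram.items() if n != name)
        if used + len(data) > SRAM_BYTES:
            raise MemoryError(f"core ({self.x}, {self.y}) is out of SRAM")
        self.sram[name] = data


class Mesh:
    """A width x height grid of cores joined by a network-on-chip."""

    def __init__(self, width: int, height: int) -> None:
        self.cores = {(x, y): Core(x, y)
                      for x in range(width) for y in range(height)}

    @staticmethod
    def hops(src: tuple, dst: tuple) -> int:
        # Assumed routing: travel along X, then along Y (Manhattan distance).
        return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

    def send(self, src: tuple, dst: tuple, name: str) -> int:
        """Copy a named buffer from src to dst and return the hop count."""
        self.cores[dst].store(name, self.cores[src].sram[name])
        return self.hops(src, dst)


mesh = Mesh(width=4, height=4)
mesh.cores[(0, 0)].store("activations", bytes(TILE_BYTES))
cost = mesh.send((0, 0), (3, 2), "activations")
print(f"moved {TILE_BYTES} bytes in {cost} hops")   # moved 2048 bytes in 5 hops
```

The point of the sketch is the shape of the contract: the programmer names the source, the destination, and the buffer, so the cost of a transfer is known before it runs. On a cached GPU the same transfer cost depends on what happens to be resident at the time.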

3. Mesh-Based Scale-Out via Ethernet, Not Proprietary Fabric

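  • NVIDIA's NVLink and InfiniBand: expensive, switch-heavy interconnects. NVLink is proprietary; InfiniBand is nominally an industry standard, but NVIDIA is its dominant supplier.
  • AMD's Infinity Fabric: the same proprietary shape.
  • Tenstorrent integrates Ethernet directly onto each chip: 400 Gbps on Wormhole, 800 Gbps over QSFP-DD on Blackhole.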
  • A processor scales out by connecting to another over a passive QSFP-DD cable. No switches.
  • The on-chip mesh and the off-chip Ethernet fabric form a single logical network - a cluster of chips can be programmed as one large mesh of Tensix cores.
  • The company calls this TT-Fabric and claims it delivers a 10x total cost of ownership (TCO) advantage for AI data center design.
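The "single logical network" idea can be sketched as simple coordinate arithmetic. The Python below is a hypothetical illustration, not TT-Fabric's actual addressing scheme: it assumes chips are laid out in a 2D grid, that each chip holds an 8 x 8 block of cores (a made-up size), and that traffic follows a shortest Manhattan path. The names `Location`, `locate`, and `route_cost` are invented for this example.

```python
from typing import NamedTuple


class Location(NamedTuple):
    chip: tuple   # (column, row) of the chip in the board-level grid
    core: tuple   # (x, y) of the core inside that chip


def locate(gx: int, gy: int, cores_per_chip: tuple) -> Location:
    """Split a global core coordinate into a chip and a local core."""
    width, height = cores_per_chip
    return Location(chip=(gx // width, gy // height),
                    core=(gx % width, gy % height))


def route_cost(src: tuple, dst: tuple, cores_per_chip: tuple) -> dict:
    """Count on-chip hops and chip-to-chip Ethernet crossings separately."""
    a = locate(*src, cores_per_chip)
    b = locate(*dst, cores_per_chip)
    # Every chip boundary on a shortest path is one Ethernet link crossing.
    links = abs(a.chip[0] - b.chip[0]) + abs(a.chip[1] - b.chip[1])
    total = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    return {"mesh_hops": total - links, "ethernet_links": links}


CHIP = (8, 8)   # hypothetical cores per chip, not a real product figure
print(locate(10, 3, CHIP))
print(route_cost((1, 1), (10, 3), CHIP))
```

In this model the programmer addresses core (10, 3) without caring that it lives on a second chip; the only visible difference is that one of the eleven steps crosses a cable instead of on-die wiring.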

4. Open-Source Software, Top to Bottom

The full stack is Apache 2.0 licensed on GitHub:

  • TT-Metalium - Low-level SDK; OpenCL-like C++ interface giving direct access to RISC-V cores, NoC, matrix and vector engines.
  • TT-NN - Higher-level operator library with a PyTorch-like Python API.
  • TT-Forge - MLIR-based compiler that ingests PyTorch, JAX, or ONNX and lowers through TT-NN and TT-Metalium to the hardware.

No encrypted APIs. No black boxes. The same compiler an engineer at Tenstorrent uses to optimize Llama 3 70B is the compiler a developer at any customer can fork, audit, and modify.

The Chip-and-IP Hybrid Business Model

Tenstorrent does something unusual in the chip industry: it sells its own silicon and also licenses the IP for customers to build their own chips.

Most companies pick one:

  • ARM licenses IP, doesn't make chips.
  • NVIDIA makes chips, doesn't license IP for competitive use.
  • Tenstorrent does both, and most bookings to date have come from IP deals rather than chip sales.

The IP Customer List

1. LG Electronics

  • Licensed both the Tensix AI core IP and the Ascalon CPU IP.
  • Initial deal targeted smart TV chiplets.
  • Expanded 2024 partnership covers system-on-chips across LG's product line.
  • LG also participated in Tenstorrent's Series D funding round.

2. Hyundai Motor Group

  • Invested in Tenstorrent and committed to using its designs in future Hyundai, Kia, and Genesis vehicles.
  • Hyundai Mobis added Tenstorrent COO Keith Witek to its board - the first time the Korean supplier had appointed someone from the AI semiconductor industry as a non-standing director.

3. Japan's LSTC (Government-Backed)

  • Leading-edge Semiconductor Technology Center selected Tenstorrent's RISC-V and chiplet designs for a 2nm AI accelerator project.
  • Strategically significant: Japan's compute sovereignty program is one of the most concrete national efforts to build an alternative to NVIDIA.

4. Others

  • SingularityNET - Swiss AI consortium.
  • UnsungFields - Japan-focused partnership.
  • Razer - CES 2026 deal on a Thunderbolt-attached compact AI accelerator for laptops.

The Series D Round

  • USD 693 million at a USD 2.6 billion valuation, December 2024.
  • Co-led by Samsung Securities and Seoul-based AFW Partners.
  • Jeff Bezos, LG Electronics, and Hyundai Motor Group also participated.
  • As much a validation of the IP business as the chip business.

The Sovereign AI Play

The other distinctive piece of Tenstorrent's strategy is geographic and political, not just technical. The company has explicitly oriented its sales effort toward governments and enterprises in Europe, the Middle East, and Asia that want compute outside the NVIDIA + US hyperscaler stack.

This is the "sovereign AI" wedge:

  • The EU AI Act and several national AI programs put a premium on auditability, which favors open, inspectable compute stacks for certain workload categories.
  • UAE - through Tenstorrent's collaboration with Infinia at Abu Dhabi Finance Week.
  • Japan - through LSTC and the UnsungFields partnership.
  • Cyprus - through ongoing arrangements.
  • Parts of the European Commission's compute infrastructure procurement programs.

Common requirement: compute that is not dependent on a single foreign supplier.

For these customers, "openness" is not an ideological preference. It is a procurement criterion. A fully open-source software stack on a fully open ISA on chips fabricated through multiple foundries - Samsung today, with talks underway with multiple 2nm fabs - passes that criterion in ways NVIDIA's proprietary stack cannot.

"I'm targeting organizations that want to own their own computing roadmap." - David Bennett, Chief Customer Officer, Tenstorrent

That sentence is the cleanest summary of who Tenstorrent is selling to.

What Tenstorrent Is Not (Yet)

Honesty matters here, especially because this is a research-grade piece and not marketing.

Tenstorrent is not, today, a drop-in NVIDIA replacement for most production inference teams:

  • Spheron's April 2026 deep-dive concluded that the Tenstorrent software stack is "early in the development cycle" for Blackhole.
  • Most verified model support and TT-Metalium documentation target the older Wormhole generation.
  • There is no production equivalent to vLLM's OpenAI-compatible endpoint with PagedAttention and continuous batching.
  • Most HuggingFace models not in Tenstorrent's verified list require manual kernel rewrites - not a one-day task.

Direct Competitors Are Also Active

  • Cerebras - 1,500+ tokens/sec on DeepSeek-R1-Distill-Llama-70B (claimed 57x faster than GPUs).
  • Groq - Built a low-latency inference business around its LPU architecture.
  • SambaNova - 360+ tokens/sec on the same Llama-70B distill model.
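For a sense of scale, the quoted throughput figures can be turned into response times with a few lines of Python. This is back-of-the-envelope arithmetic under stated assumptions: it treats the "1,500+" and "360+" figures as exact, assumes the 57x claim is measured against the same throughput metric, and uses a hypothetical 1,000-token response. None of the derived numbers appear in the vendors' own material.

```python
cerebras_tps = 1500      # tokens/sec, lower bound quoted above
sambanova_tps = 360      # tokens/sec, lower bound quoted above
claimed_speedup = 57     # Cerebras's claimed advantage over GPUs

# GPU baseline implied by the claim, assuming the same throughput metric.
implied_gpu_tps = cerebras_tps / claimed_speedup


def seconds_for(tokens: int, tokens_per_sec: float) -> float:
    """Time to generate a response at a steady decode rate."""
    return tokens / tokens_per_sec


response_tokens = 1000   # hypothetical response length
for name, tps in [("Cerebras", cerebras_tps),
                  ("SambaNova", sambanova_tps),
                  ("Implied GPU baseline", implied_gpu_tps)]:
    print(f"{name}: {tps:.1f} tok/s, "
          f"{seconds_for(response_tokens, tps):.1f} s per response")
```

Under those assumptions the implied GPU baseline is roughly 26 tokens/sec, which puts a 1,000-token answer at well under a second on Cerebras versus roughly 38 seconds on the baseline. That gap is the number Tenstorrent is declining to chase.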

Why Tenstorrent's Bet Is Different

It is not chasing the highest single-benchmark number. It is building a fully open, general-purpose, scale-out platform - chips, IP, and software - that gives customers control over their compute roadmap in a way no other player offers at this scale.

The bet:

  • Inference workloads continue to evolve toward MoE, agentic systems, and long-context reasoning.
  • An architecture purpose-built for explicit data movement and cooperative compute will outpace one retrofitted from graphics.
  • Buyers who value openness and supply diversity will pay attention.

Coming Next

In Part 4, we dig into the actual chips - Grayskull, Wormhole, Blackhole, the Galaxy server, and Galaxy Blackhole - and what the open published benchmarks say about how they compare to traditional GPUs.

Sources: Sacra company analysis of Tenstorrent (April 2026), Tom's Hardware coverage of Jim Keller and RISC-V partnerships, EE Times interviews, The Logic profile on Tenstorrent's sovereign AI strategy, Design News interview with Ljubisa Bajic, Tekedia funding round coverage, KED Global LG partnership coverage, DCD coverage of the Hyundai investment, Moor Insights & Strategy analyst note, Spheron Tenstorrent vs NVIDIA comparison (April 2026), Tenstorrent official documentation at docs.tenstorrent.com.
