Tenstorrent: The Jim Keller Bet to Break NVIDIA's Inference Monopoly

If you are going to challenge NVIDIA, the first question is: what could you possibly be doing differently? CUDA has 18 years of compounding investment. The H100 to B200 to Rubin roadmap is on track. AMD is the closest thing to a credible second source and is still climbing. Tenstorrent's answer is not 'do GPUs better.' It is 'GPUs are the wrong architecture for what's coming, the software is closed, and the moat is built on a moat - let's pull every leg out at once.' This is Part 3 of the AI Inference Hardware series, walking through the Jim Keller pedigree, the four-part architectural thesis, the chip-and-IP hybrid business model, and the sovereign AI play.

Yashwanth

June 11, 2026

Introduction

If you are going to challenge NVIDIA, the first question is: what could you possibly be doing differently?

CUDA has 18 years of compounding investment.
The H100 → B200 → Rubin roadmap is on track.
AMD is the closest thing to a credible second source and is still climbing.
Cerebras, Groq, SambaNova, Graphcore - every one of these companies has tried, and each has carved out a niche without dislodging the incumbent.

Tenstorrent's answer is not "do GPUs better." It is:

GPUs are the wrong architecture for what's coming, the software is closed, and the moat is built on a moat - let's pull every leg out at once.

That answer has attracted USD 700M+ in funding at a roughly USD 2.6B valuation, with backers including Jeff Bezos (through Bezos Expeditions), Samsung Catalyst Fund, Hyundai Motor Group, Fidelity, and Eclipse Ventures. It has also attracted a roster of engineering talent that, if you squint, looks like a reunion tour for the best silicon designers of the last twenty years.

The Jim Keller Pedigree

Jim Keller is one of the most consequential CPU architects of the modern era. His track record reads like an industry index:

AMD's K8 architecture (Athlon 64 and Opteron)
Apple A4 and A5 chips
AMD's Zen architecture that revived the company
Tesla's original FSD chip
A stint at Intel shaping their chip strategy

He joined Tenstorrent as president and CTO in early 2021 and became CEO in 2023, anchoring the company's strategic direction around RISC-V and open architecture.

The company itself was founded in 2016 by Ljubisa Bajic, Ivan Hamer, and Milos Trajkovic. Bajic - a former NVIDIA senior architect and AMD IC designer - was clear from the start about the thesis. As he told Design News in an early interview:

"GPUs are essentially at the end of their evolutionary curve. They've done a great job; they've pushed the field to the point where it is now. But in order to make any kind of order of magnitude type jumps, GPUs are going to have to go."

That framing has been the throughline. Tenstorrent is not trying to win by building a faster GPU. It is trying to win by arguing that the GPU paradigm is wrong for the shape of AI workloads coming next, especially Mixture-of-Experts models, agentic inference, and long-context reasoning.

Other Key People

Wei-Han Lien - Chief architect of Tenstorrent's Ascalon RISC-V CPU IP. Worked at NexGen → AMD → PA Semi → Apple (contributed to A6, A7, arguably M1).
Raja Koduri - Formerly head of Intel's discrete GPU effort, earlier lead at AMD Radeon. Joined Tenstorrent's board.

The Four-Part Thesis

What Tenstorrent is actually betting on can be summarized as four interlocking architectural choices, each of which contrasts deliberately with NVIDIA and AMD.

1. RISC-V Instead of Proprietary ISAs

RISC-V is an open instruction set architecture that originated at UC Berkeley and is now maintained by RISC-V International, a non-profit standards body. It is royalty-free, with no single national or corporate owner.
Tenstorrent uses RISC-V for everything - the "baby" cores inside each Tensix compute tile, the larger Ascalon application cores, and the SiFive-licensed general-purpose cores in earlier designs.
Strategic argument: an open ISA is a precondition for the rest of the stack being open.
Practical argument: RISC-V's modular design enables architectural customization (five RISC-V processors per Tensix core, each handling a distinct pipeline stage) in ways that x86 and other proprietary ISAs do not permit without a license from the owner.

2. Tensix Cores Instead of SIMT GPU Streaming Multiprocessors

A Tensix core is fundamentally different from a CUDA SM. Each contains:

5 small RISC-V cores for control and instruction dispatch
Dedicated matrix engine (FPU) and vector engine (SFPU)
Pack and unpack units
1.5 MB of local SRAM

Key differences from GPUs:

No SIMT execution model.
No warp scheduling.
No hardware multithreading hiding latency behind context switching.
Each Tensix core executes its own instruction stream cooperatively, with explicit data movement between tiles over a mesh network-on-chip.
Deliberate absence of cache hierarchies provides deterministic, consistent memory access patterns.

The trade-off: harder to program than CUDA, but much more transparent and predictable.
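The no-cache, explicit-movement model described above can be illustrated with a toy Python sketch. This is not Tenstorrent's API: `ToyCore`, `store`, `load`, and `noc_send` are hypothetical names invented for illustration, and only the 1.5 MB SRAM figure comes from the list above. The point is the programming contract: data is visible to a core only after someone explicitly moved it there, and a miss is an error rather than a slow path through a cache hierarchy.

```python
from dataclasses import dataclass, field

SRAM_BYTES = 1536 * 1024  # 1.5 MB of local SRAM per core, as stated above


@dataclass
class ToyCore:
    """Hypothetical stand-in for one compute tile: private SRAM, no cache."""
    name: str
    sram: dict = field(default_factory=dict)  # buffer key -> bytes

    def used(self) -> int:
        # Total bytes currently resident in this core's local SRAM.
        return sum(len(buf) for buf in self.sram.values())

    def store(self, key: str, data: bytes) -> None:
        # Capacity is enforced explicitly; nothing spills to another level.
        others = self.used() - len(self.sram.get(key, b""))
        if others + len(data) > SRAM_BYTES:
            raise MemoryError(f"{self.name}: local SRAM full")
        self.sram[key] = data

    def load(self, key: str) -> bytes:
        # A miss means the program never moved the data here.
        if key not in self.sram:
            raise KeyError(f"{key!r} was never moved to {self.name}")
        return self.sram[key]


def noc_send(src: ToyCore, dst: ToyCore, key: str) -> int:
    """Explicitly copy one buffer between cores; returns bytes moved."""
    data = src.load(key)
    dst.store(key, data)
    return len(data)


a, b = ToyCore("core_0_0"), ToyCore("core_0_1")
a.store("activations", bytes(4096))
moved = noc_send(a, b, "activations")
```

In this toy model, reading `"weights"` on core `b` fails immediately because nothing ever sent it there. That is the predictability the trade-off buys, and also the extra work it puts on the programmer.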

3. Mesh-Based Scale-Out via Ethernet, Not Proprietary Fabric

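NVIDIA's NVLink and InfiniBand: NVLink is proprietary, InfiniBand is nominally an industry standard but effectively single-vendor, and both are expensive, switch-heavy interconnects.
AMD's Infinity Fabric: follows the same proprietary model.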
Tenstorrent integrates 400 Gbps Ethernet (Wormhole) or 800G QSFP-DD (Blackhole) directly onto each chip.
A processor scales out by connecting to another over a passive QSFP-DD cable. No switches.
The on-chip mesh and the off-chip Ethernet fabric form a single logical network - a cluster of chips can be programmed as one large mesh of Tensix cores.
The company calls this TT-Fabric and presents it as a 10x TCO advantage for AI data center design.
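One way to picture "a single logical network" is as a global grid of core coordinates laid over several chips. The sketch below is a minimal illustration under stated assumptions, not TT-Fabric's actual addressing scheme: the 8x8 cores-per-chip figure and the `locate` and `link_type` helpers are hypothetical. It shows how a global coordinate splits into a chip index and a local core index, and how a hop between neighbouring cores is either an on-chip NoC hop or a chip-to-chip Ethernet hop.

```python
# Hypothetical cores per chip, chosen only to make the arithmetic easy to follow.
CORES_X, CORES_Y = 8, 8


def locate(gx: int, gy: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Split a global core coordinate into (chip index, local core index)."""
    chip = (gx // CORES_X, gy // CORES_Y)
    local = (gx % CORES_X, gy % CORES_Y)
    return chip, local


def link_type(a: tuple[int, int], b: tuple[int, int]) -> str:
    """Classify the link between two global core coordinates."""
    chip_a, _ = locate(*a)
    chip_b, _ = locate(*b)
    return "on-chip NoC" if chip_a == chip_b else "chip-to-chip Ethernet"


inside = link_type((6, 0), (7, 0))    # both cores sit on chip (0, 0)
crossing = link_type((7, 0), (8, 0))  # neighbour lives on chip (1, 0)
```

From the programmer's side both hops use the same coordinate space; only the physical link underneath differs.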

4. Open-Source Software, Top to Bottom

The full stack is Apache 2.0 licensed on GitHub:

TT-Metalium - Low-level SDK; OpenCL-like C++ interface giving direct access to RISC-V cores, NoC, matrix and vector engines.
TT-NN - Higher-level operator library with a PyTorch-like Python API.
TT-Forge - MLIR-based compiler that ingests PyTorch, JAX, or ONNX and lowers through TT-NN and TT-Metalium to the hardware.

No encrypted APIs. No black boxes. The same compiler an engineer at Tenstorrent uses to optimize Llama 3 70B is the compiler a developer at any customer can fork, audit, and modify.

The Chip-and-IP Hybrid Business Model

Tenstorrent does something unusual in the chip industry: it sells its own silicon and also licenses the IP for customers to build their own chips.

Most companies pick one:

ARM licenses IP, doesn't make chips.
NVIDIA makes chips, doesn't license IP for competitive use.
Tenstorrent does both, and most bookings to date have come from IP deals rather than chip sales.

The IP Customer List

1. LG Electronics

Licensed both the Tensix AI core IP and the Ascalon CPU IP.
Initial deal targeted smart TV chiplets.
Expanded 2024 partnership covers system-on-chips across LG's product line.
LG also participated in Tenstorrent's Series D funding round.

2. Hyundai Motor Group

Invested in Tenstorrent and committed to using its designs in future Hyundai, Kia, and Genesis vehicles.
Hyundai Mobis added Tenstorrent COO Keith Witek to its board - first time the Korean supplier had appointed someone from the AI semiconductor industry as a non-standing director.

3. Japan's LSTC (Government-Backed)

Leading-edge Semiconductor Technology Center selected Tenstorrent's RISC-V and chiplet designs for a 2nm AI accelerator project.
Strategically significant: Japan's compute sovereignty program is one of the most concrete national efforts to build an alternative to NVIDIA.

4. Others

SingularityNet - Swiss AI consortium.
UnsungFields - Japan-focused partnership.
Razer - CES 2026 deal on a Thunderbolt-attached compact AI accelerator for laptops.

The Series D Round

USD 693 million at a USD 2.6 billion valuation, December 2024.
Co-led by Samsung Securities and Seoul-based AFW Partners.
Jeff Bezos, LG Electronics, and Hyundai Motor Group also participated.
As much a validation of the IP business as the chip business.

The Sovereign AI Play

The other distinctive piece of Tenstorrent's strategy is geographic and political, not just technical. The company has explicitly oriented its sales effort toward governments and enterprises in Europe, the Middle East, and Asia that want compute outside the NVIDIA + US hyperscaler stack.

This is the "sovereign AI" wedge:

Regulation such as the EU AI Act and several national AI programs place growing weight on auditability and transparency, which favors open, inspectable compute stacks for certain workload categories.
UAE - through Tenstorrent's collaboration with Infinia at Abu Dhabi Finance Week.
Japan - through LSTC and the UnsungFields partnership.
Cyprus - through ongoing arrangements.
Parts of the European Commission's compute infrastructure procurement programs.

Common requirement: compute that is not dependent on a single foreign supplier.

For these customers, "openness" is not an ideological preference. It is a procurement criterion. A fully open-source software stack on a fully open ISA on chips fabricated through multiple foundries - Samsung today, with talks underway with multiple 2nm fabs - passes that criterion in ways NVIDIA's proprietary stack cannot.

"I'm targeting organizations that want to own their own computing roadmap." - David Bennett, Chief Customer Officer, Tenstorrent

That sentence is the cleanest summary of who Tenstorrent is selling to.

What Tenstorrent Is Not (Yet)

Honesty matters here, especially because this is a research-grade piece and not marketing.

Tenstorrent is not, today, a drop-in NVIDIA replacement for most production inference teams:

Spheron's April 2026 deep-dive concluded that the Tenstorrent software stack is "early in the development cycle" for Blackhole.
Most verified model support and TT-Metal documentation targets the older Wormhole generation.
There is no production equivalent to vLLM's OpenAI-compatible endpoint with PagedAttention and continuous batching.
Most HuggingFace models not in Tenstorrent's verified list require manual kernel rewrites - not a one-day task.

Direct Competitors Are Also Active

Cerebras - 1,500+ tokens/sec on DeepSeek-R1-Distill-Llama-70B (claimed 57x faster than GPUs).
Groq - Built a low-latency inference business around its LPU architecture.
SambaNova - 360+ tokens/sec on the same Llama-70B distill model.
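The figures above can be cross-checked with a few lines of arithmetic. This is a rough sketch using only the numbers quoted in the list: the "implied GPU baseline" is simply what Cerebras's 57x claim works out to, not a measured result, and the vendors' test conditions may differ.

```python
cerebras_tps = 1500       # tokens/sec, lower bound quoted above
cerebras_speedup = 57     # Cerebras's claimed multiple over GPUs
sambanova_tps = 360       # tokens/sec, lower bound quoted above

# GPU throughput implied by the 57x claim (not an independent measurement).
implied_gpu_baseline = cerebras_tps / cerebras_speedup

# Ratio between the two vendors' quoted figures on the same distill model.
cerebras_vs_sambanova = cerebras_tps / sambanova_tps

print(f"Implied GPU baseline: {implied_gpu_baseline:.1f} tokens/sec")
print(f"Cerebras vs SambaNova: {cerebras_vs_sambanova:.1f}x")
```

The implied baseline comes out at roughly 26 tokens/sec, which reads like a single-stream, latency-oriented GPU figure rather than aggregate batched throughput. That is worth keeping in mind when comparing headline multiples.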

Why Tenstorrent's Bet Is Different

It is not chasing the highest single-benchmark number. It is building a fully open, general-purpose, scale-out platform - chips, IP, and software - that gives customers control over their compute roadmap in a way no other player offers at this scale.

The bet:

Inference workloads continue to evolve toward MoE, agentic systems, and long-context reasoning.
An architecture purpose-built for explicit data movement and cooperative compute will outpace one retrofitted from graphics.
Buyers who value openness and supply diversity will pay attention.

Coming Next

In Part 4, we dig into the actual chips - Grayskull, Wormhole, Blackhole, the Galaxy server, and Galaxy Blackhole - and what the open published benchmarks say about how they compare to traditional GPUs.

Sources: Sacra company analysis of Tenstorrent (April 2026), Tom's Hardware coverage of Jim Keller and RISC-V partnerships, EE Times interviews, The Logic profile on Tenstorrent's sovereign AI strategy, Design News interview with Ljubisa Bajic, Tekedia funding round coverage, KED Global LG partnership coverage, DCD coverage of the Hyundai investment, Moor Insights & Strategy analyst note, Spheron Tenstorrent vs NVIDIA comparison (April 2026), Tenstorrent official documentation at docs.tenstorrent.com.
