
MediaTek Dimensity 9600: How Google TPU v7 Ironwood Is Quietly Rewiring Next-Gen Mobile AI Efficiency
Smartphone chips are no longer just tiny CPUs and GPUs squeezed onto a single die. In 2025 they have become miniature data centers, juggling camera pipelines, gaming workloads, connectivity, and increasingly heavy on-device AI. That is why Google’s new Ironwood TPU v7 and its unexpected partnership with MediaTek matter far beyond the walls of Google’s data centers. What starts in the cloud rarely stays there. The lessons learned from building a cutting-edge AI accelerator are already flowing into MediaTek’s upcoming flagship mobile platform, the Dimensity 9600, with efficiency as the headline promise.
On paper, Ironwood is positioned as the first truly credible Application-Specific Integrated Circuit (ASIC) challenger to NVIDIA’s Blackwell-class GPUs for AI workloads. Under the surface, however, it also serves as a live, massively scaled experiment in power delivery, memory bandwidth, and interconnect design – exactly the kind of expertise that translates beautifully into smartphone silicon, where every milliwatt counts. MediaTek’s role in this TPU generation is therefore more than a footnote; it is an R&D shortcut that would have been nearly impossible to buy with money alone.
Why Google’s Ironwood TPU v7 Is a Big Deal
For years, NVIDIA has dominated AI compute with its GPU stacks, turning CUDA-powered platforms into the default choice for training and running neural networks. TPUs, by contrast, are laser-focused ASICs tuned for specific AI workloads. They are harder to design but reward that effort with outstanding performance per watt when you hit the sweet spot of their targeted use cases. Google’s Ironwood TPU v7 arrives right when the AI world is shifting from gigantic, exploratory models to more efficient, deployed systems where inference cost and latency matter more than raw training throughput.
That context explains why Ironwood’s competitiveness on inference is so important. Early figures indicate that, on real-world inference tasks, Ironwood lands very close to NVIDIA’s latest GPUs in raw performance while pulling ahead on total cost of ownership. Hardware purchase price, operating power, cooling, and data center floor space are all part of that TCO equation. The cheaper it becomes to deploy inference at scale, the more AI products Google – and its customers – can ship, and the more pressure there is to push similar efficiency wins into consumer hardware such as the Dimensity 9600.
Inside the Ironwood TPU v7 Architecture
Ironwood’s design revolves around a dual-chiplet package that behaves like a single, tightly integrated accelerator. Each chiplet houses a sophisticated collection of functional blocks designed to attack different aspects of AI computation:
- TensorCore with systolic array architecture: This is the heart of the TPU, built to execute dense matrix multiplications – the core math operation behind modern neural networks. A systolic array streams data through a grid of compute units, dramatically cutting the number of expensive memory reads and writes from high-bandwidth memory (HBM). The result is more throughput per joule and far better utilization of the available memory bandwidth (a concrete dataflow sketch follows this list).
- Vector Processing Unit (VPU): Neural networks are full of element-wise operations – activation functions like ReLU, GELU, and their relatives, plus normalizations and other transforms. The VPU is tuned for this kind of vector math, cleaning up everything that does not fit neatly into pure matrix multiplication.
- Matrix Multiply Units (MXUs): In TPU parlance, the MXUs are the systolic-array engines that live inside the TensorCore and do the heavy matrix lifting at the supported precisions. Together with the VPU, they give Ironwood the flexibility to handle a wide range of AI model architectures without falling back to inefficient general-purpose compute.
- Two SparseCores per chiplet: Sparsity is a crucial optimization for modern models, where many weights or features can be ignored without hurting accuracy. SparseCores are built to thrive on irregular, data-dependent memory access patterns, which are common when processing embeddings. These embeddings compress massive categorical feature sets – like entire vocabularies or giant ID lists – into dense vector spaces, and they are critical to recommendation systems, language models, and search (a sketch of that gather pattern also follows the list).
- 96 GB of HBM per chiplet: Each chiplet is fed by its own pool of high-bandwidth memory, for 192 GB per package, allowing models with large parameter counts or extensive embeddings to stay resident without constantly spilling to slower system memory.
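To make the TensorCore bullet concrete, here is a minimal Python sketch of an output-stationary systolic array computing C = A @ B. It is a textbook illustration, not Ironwood’s actual microarchitecture: each operand is fetched once and then streamed through a grid of accumulating cells along diagonal wavefronts, instead of being re-read from HBM for every multiply.

```python
# Illustrative sketch of output-stationary systolic matmul (not Google's design).
import numpy as np

def systolic_matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    n = A.shape[0]                      # square matrices, for brevity
    C = np.zeros((n, n))
    # Skewed schedule: at step t, cell (i, j) consumes A[i, k] and B[k, j]
    # with k = t - i - j -- the classic diagonal wavefront.
    for t in range(3 * n - 2):          # total wavefront steps
        for i in range(n):
            for j in range(n):
                k = t - i - j           # operand index arriving at cell (i, j)
                if 0 <= k < n:
                    C[i, j] += A[i, k] * B[k, j]
    return C

A, B = np.random.rand(4, 4), np.random.rand(4, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

The triple loop models the wavefront in time; real hardware evaluates every cell in parallel each cycle, which is where the throughput-per-joule win comes from.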
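Likewise for the SparseCore bullet, a small sketch of the embedding gathers involved (the table size, dimensions, and batch shape below are made-up numbers): each lookup is a data-dependent read into a huge table, exactly the irregular access pattern that wide matrix engines handle poorly and gather-oriented hardware handles well.

```python
# Illustrative embedding lookup: irregular, data-dependent gathers.
import numpy as np

VOCAB, DIM = 1_000_000, 128            # e.g. a vocabulary or giant ID list
table = np.random.randn(VOCAB, DIM).astype(np.float32)

def embed(ids: np.ndarray) -> np.ndarray:
    # One gather per ID: the addresses depend on the data itself,
    # so the access pattern cannot be tiled or prefetched like a matmul.
    return table[ids]

batch_ids = np.random.randint(0, VOCAB, size=(32, 8))  # 32 rows x 8 features
vecs = embed(batch_ids)                # shape (32, 8, 128)
pooled = vecs.mean(axis=1)             # typical pooling before dense layers
print(pooled.shape)                    # (32, 128)
```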
All of this would not deliver its promised performance without a serious interconnect story. Ironwood’s two chiplets are joined by a die-to-die (D2D) link that is around six times faster than a traditional one-dimensional inter-chip interconnect (ICI), helping the package act like a unified accelerator. At the system level, 64 of these chips are wired together via ICI inside a single rack, with each chip enjoying roughly 1.2 TB/s of bidirectional ICI bandwidth. This 64-chip assembly is what Google calls a cube.
Scale that up, and multiple cubes are connected through an Optical Circuit Switch (OCS) network to form a superpod. A full Ironwood superpod tops out at 144 cubes, or 9,216 chips in total. That is an enormous sea of matrix engines, vector units, and SparseCores, stitched together so models can be sharded or pipelined across thousands of accelerators. It is cloud-scale AI infrastructure built explicitly for Google and its customers – and MediaTek has been involved right in the middle of that effort.
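Tallying the figures above gives a feel for the scale (a rough back-of-the-envelope based on the numbers quoted in this article; the aggregate bandwidth line is a naive per-chip sum, not a published figure):

```python
# Superpod arithmetic from the figures quoted above.
chips_per_cube  = 64
cubes_per_pod   = 144
hbm_per_chip_gb = 192        # 2 chiplets x 96 GB
ici_bw_tbs      = 1.2        # bidirectional TB/s, per chip

chips = chips_per_cube * cubes_per_pod
print(chips)                                   # 9216 chips per superpod
print(chips * hbm_per_chip_gb / 1024, "TB")    # 1728.0 TB of HBM pod-wide
print(chips_per_cube * ici_bw_tbs, "TB/s")     # 76.8 TB/s naive ICI sum per cube
```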
MediaTek’s Unexpected but Crucial Role
Historically, Google leaned heavily on Broadcom for its TPU designs, working shoulder-to-shoulder on multiple generations of the accelerator. With Ironwood, that script shifted. Reports from early 2025 revealed that MediaTek was handed responsibility for designing Ironwood’s input/output (I/O) modules, the logic that mediates communication between the TPU, memory, and surrounding peripherals. It is not glamorous on a spec sheet, but it is a highly sensitive part of the chip where latency, power, signal integrity, and reliability must all be balanced.
This partnership marks a subtle but meaningful vote of confidence from Google. I/O paths are the veins and arteries of a modern accelerator; any inefficiency there cascades into wasted cycles and wasted power. UBS analysts estimate that the collaboration around Ironwood could generate roughly 4 billion dollars’ worth of business for MediaTek, but the strategic upside may be even bigger: first-hand experience building a cornerstone component of one of the world’s most advanced AI ASICs.
From Cloud ASIC to Dimensity 9600: What Really Transfers
A data center TPU and a smartphone application processor (AP) are, of course, not interchangeable. One is a purpose-built ASIC optimized for rack-scale AI workloads; the other is a highly integrated SoC that must juggle cameras, connectivity, gaming, and UI responsiveness under a strict power and thermal cap. MediaTek cannot simply lift Ironwood blocks and drop them into the Dimensity 9600.
What does translate, however, is know-how. Working on Ironwood’s I/O modules gives MediaTek deep insight into how to move data quickly and efficiently while still keeping tight control over power. That same philosophy can be reused in at least three key areas of the Dimensity 9600 design:
- Smarter power gating: By borrowing techniques from TPU-scale I/O, MediaTek can refine how individual blocks in the Dimensity 9600 are shut off when idle. More granular and aggressive power gating for I/O controllers, memory interfaces, and AI accelerators means slashing leakage without hurting responsiveness.
- Finer-grained voltage scaling: Data center chips rely heavily on sophisticated voltage and frequency scaling to stay inside power envelopes while running close to their performance limits. The same principles can be applied to the Dimensity 9600’s cores and on-chip fabrics, letting each block sip exactly the voltage it needs for the moment’s workload instead of using a one-size-fits-all setting (see the governor sketch after this list).
- Refined clock gating and timing strategies: The better you understand real-world data paths, the more aggressively you can gate clocks or slow down specific domains when full speed is not required. That directly improves battery life and thermals in smartphones, especially under sustained AI loads like real-time translation or on-device generative models.
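A toy governor makes the second bullet concrete. This is an illustrative sketch rather than MediaTek’s implementation, and the operating-point table is invented: dynamic power scales roughly as P ≈ α·C·V²·f, so the cheapest voltage/frequency point that still meets demand wins, and an idle domain is power-gated outright.

```python
# Illustrative per-domain DVFS pick (hypothetical table, not MediaTek's).
from dataclasses import dataclass

@dataclass
class OperatingPoint:
    freq_mhz: float
    volt: float

# Hypothetical voltage/frequency table for one clock domain, sorted by frequency.
OPP_TABLE = [
    OperatingPoint(500, 0.55),
    OperatingPoint(1000, 0.65),
    OperatingPoint(2000, 0.80),
    OperatingPoint(3000, 1.00),
]

def pick_opp(required_mhz: float, cap_nf: float = 1.0):
    """Return (operating point, estimated dynamic power) or None to power-gate."""
    if required_mhz == 0:
        return None                      # idle: gate the domain, save leakage
    for p in OPP_TABLE:
        if p.freq_mhz >= required_mhz:   # lowest point that meets demand
            break
    else:
        p = OPP_TABLE[-1]                # demand exceeds the table: clamp to max
    return p, cap_nf * p.volt ** 2 * p.freq_mhz   # P ~ C * V^2 * f

print(pick_opp(0))     # None -> domain gated
print(pick_opp(800))   # runs at 1000 MHz / 0.65 V instead of flat-out
```

Serving an 800 MHz demand at 1000 MHz / 0.65 V instead of 3000 MHz / 1.00 V cuts the V²·f term by roughly a factor of seven, which is the whole game in a thermally capped phone.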
These changes become even more relevant when you consider MediaTek’s recent flagship architectures, which have moved away from traditional clusters of ultra-low-power efficiency cores. Without those tiny cores to offload background tasks, the burden falls on power management, clock gating, and the scheduler to keep big and mid cores from wasting energy. Any tricks learned in the crucible of Ironwood – where every watt is multiplied thousands of times across a superpod – are invaluable in that context.
MediaTek’s Broader AI Ambition
MediaTek is reportedly also developing its own dedicated AI chips beyond smartphone SoCs. In that arena, the TPU knowledge transfer is even more direct: everything from systolic array design trade-offs to SparseCore-style handling of irregular memory access can inform future accelerators aimed at edge servers, automotive platforms, or specialized consumer devices.
At the same time, the Dimensity 9600 remains the company’s most visible flagship. It is the chip that will end up in premium Android phones, where consumers will test MediaTek’s claims about sustained performance, AI features, and battery efficiency. Expect the company to highlight not only headline AI TOPS numbers but also real-life experiences: longer 4K recording without overheating, faster on-device image enhancement, snappier voice assistants that work offline, and generative tools that do not instantly drain the battery.
What This Means for Users and the Wider Industry
For everyday users, the Google–MediaTek partnership is unlikely to appear in marketing slogans, but its impact could be very tangible. A more efficient Dimensity 9600 means phones that hold performance longer during gaming or AI-heavy photo processing, while still delivering better battery life. It also opens the door for richer on-device AI assistants, personalization models that never leave your handset, and advanced camera pipelines that run sophisticated neural networks frame after frame.
For the industry, the move is a clear signal that the AI hardware race is fragmenting. NVIDIA’s Blackwell GPUs are still the training workhorses in many scenarios, but specialized ASICs like Ironwood are carving out a powerful niche in inference and vertically integrated stacks. MediaTek, by embedding TPU-inspired efficiency into its mobile roadmap, is positioning itself as a serious alternative to Qualcomm at the high end, especially in regions where value and power efficiency are more important than brand prestige.
The Road Ahead for Dimensity 9600
Until the Dimensity 9600 ships and independent benchmarks arrive, much of this story is about potential rather than proof. Still, the direction is unmistakable. By helping Google build Ironwood’s I/O foundation, MediaTek has earned a seat at the table where some of the world’s most demanding AI hardware decisions are made. The techniques refined there – from brutal, rack-scale power budgeting to intricate interconnect timing – are precisely the kind of expertise that can turn a good mobile SoC into an exceptional one.
If MediaTek successfully distills those lessons into the Dimensity 9600 and its successors, we are likely to see a new generation of Android flagships that feel faster, stay cooler, and last longer, all while running far more AI locally. In other words, the path from Google’s TPU superpods to your next smartphone might be shorter than it looks.