Cracks in NVIDIA’s Moat

Compute is a high-certainty demand with extremely clear metrics. Unlike consumer products from Apple or Tesla, where decisions involve many factors (brand, sentiment, ecosystem stickiness) and customers can be "irrational," large B2B customers are extremely rational and cold-blooded: as soon as a better alternative or a credible competitor exists, they will switch without hesitation, or at minimum proactively cultivate a second supplier to maintain bargaining leverage. NVIDIA's monopoly is not indestructible: it is simply early enough, large enough, and ecosystem-deep enough. This is a clearly marked lane; challengers will keep appearing, and the moment NVIDIA slows down it will be caught. This article addresses four questions: (1) why challengers suddenly appeared → (2) where the impact will land → (3) what else accelerates the process → (4) when it happens, and how to watch for it.

A Different Perspective — why NVDA isn’t invincible

NVIDIA HGX B200 baseboard · 8 Blackwell GPUs + NVLink Switch
NVIDIA HGX B200 baseboard · 8 Blackwell GPUs + NVLink Switch — the de facto industry standard for today’s AI training market. Blackwell launch sold out through mid-2026; FY26 data-center revenue $215.9B (+89% YoY), Q4 $62.3B. This is NVIDIA’s brightest moment — and the starting line for the challenger pack.
Image: Wikimedia Commons / NVIDIA · Public Domain.

NVIDIA's moat is real (CUDA, NVLink, locked-up HBM supply, CoWoS capacity, 25 years of ecosystem accumulation), but the real bedrock of every B2B story is customer rationality. This is the biggest difference from Apple or Tesla. Apple sells 200M iPhones a year to consumers, for whom emotion and brand make up a large fraction of the decision; the same holds for Tesla. But GPUs are sold to OpenAI / Anthropic / Microsoft / Google / Meta / ByteDance, and those buyers are the world's most rational, cold-blooded engineering teams: they look at tokens per dollar, tokens per watt, and total cost per training run, not the logo and not the brand.

This means NVIDIA’s monopoly has a built-in instability: the moment it slows down, large customers will immediately cultivate a second supplier. Look at what’s already happened — Google’s in-house TPU (now sold externally), Amazon’s in-house Trainium, Microsoft’s in-house Maia, Meta’s in-house MTIA, and OpenAI committing $20B to Cerebras + co-developing an inference chip with Broadcom. Every one of them is actively “growing a second supplier.” This isn’t conspiracy; it’s the ABCs of B2B procurement.

| Metric | Value | Note |
| --- | --- | --- |
| FY26 data-center revenue | $215.9B | +89% YoY · Q4 $62.3B · ~89% of total revenue · four CSPs combined 61% of quarterly revenue (NVIDIA Newsroom) |
| Non-GAAP gross margin | ~75% | Q4 75% · guidance mid-range 75% · B200 per-card margin ~84% (ASP $35–40K · BOM cost ~$6.4K) |
| AI accelerator market share | 80–90% | Significant variance across sources · 2024 peak 87% · 2026 expected ~75% |
| Cost per training run | $50M–$500M | Magnitude for very large models · error cost extremely high → no one wants to bet on immature hardware → hence the training-market fortress |

NVIDIA's monopoly is not invincible: the moat is real, but it is just the sum of three things (early enough, large enough, far-reaching ecosystem), not a structurally insurmountable position. Once NVIDIA slows down, challengers will swarm in, and customer rationality only accelerates that process.

— The question here is not “whether,” but “when” and “how to watch”

Architectural Convergence — why challengers suddenly appeared

[Figure · two panels]
Left · Traditional DL Era · "A Hundred Flowers" · 2012–2017 · one best architecture per task: LSTM (1997 · RNN / sequence), AlexNet (2012 · CNN starting point), GAN (2014 · generative adversarial), ResNet (2015 · residual connections), U-Net (2015 · image segmentation), BERT (2018 · Encoder-only)
Right · Decoder-only Era · Convergence · 2020–2026: GPT / Llama / Gemini / Claude / DeepSeek all share the decoder-only Transformer backbone (dominant from 2017 onward); variants are only in attention grouping: MHA → MQA → GQA → MLA
The six images on the left represent the “one best architecture per task” diversity from 1997–2018 — RNN (LSTM), CNN (AlexNet / ResNet / U-Net), generative models (GAN), early Transformer encoders (BERT, encoder-only); the right side represents the convergence post-2020 — the industry settled on a single decoder-only Transformer backbone (GPT / Llama / Gemini / Claude / DeepSeek same family), with variants reduced to fine-tuning of attention grouping (MHA → MQA → GQA → MLA). This is unprecedented stability in ML history, and its significance parallels x86’s victory back in the day: for the first time, the target of hardware optimization has become certain.
Architecture diagrams from Wikimedia Commons (AlexNet · ResNet residual block · BERT · LSTM · U-Net · GAN · Full GPT Architecture), CC BY-SA 4.0 / CC0 Public Domain.

Challengers didn’t appear from nowhere; there’s a frequently overlooked structural change behind them: model architecture is converging. The CNN era was a hundred flowers — AlexNet / VGG / ResNet / Inception / MobileNet / EfficientNet — each task with its own “best architecture”; hardware didn’t know which to optimize for, and general-purpose GPUs swept the field. The Transformer era is the opposite: from the original 2017 paper to 2026 frontier models, the backbone has barely changed — Pre-LayerNorm, RoPE, RMSNorm, SwiGLU, Residual, KV Cache are all old techniques from years ago.

Today’s “innovation” is essentially limited to a few attention-grouping variants (MHA → MQA → GQA → MLA), with GQA now the de facto default for open-source large models; Mamba / SSM and other challengers have ultimately appeared in “hybrid architecture” form rather than as replacements for Transformer — a NeurIPS 2025 paper even proves that Mamba and Transformer belong to the same complexity class and are essentially equivalent. This is unprecedented stability in ML history. Its significance parallels x86’s win — once architecture converges, the target of hardware optimization becomes certain for the first time.
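To make the convergence concrete, the sketch below (minimal NumPy, with illustrative shapes rather than any specific model's configuration) shows that MHA, MQA and GQA are one and the same attention kernel, parameterized only by the number of K/V heads; that is exactly the kind of fixed target a dedicated chip can optimize for.

```python
# A minimal sketch: MHA, MQA and GQA differ only in how many K/V heads are kept.
#   n_kv_heads == n_heads      -> MHA
#   n_kv_heads == 1            -> MQA
#   1 < n_kv_heads < n_heads   -> GQA (each K/V head is shared by a group of Q heads)
import numpy as np

def grouped_attention(q, k, v, n_heads, n_kv_heads):
    """q: [seq, n_heads*d], k/v: [seq, n_kv_heads*d]; single sequence, no masking."""
    seq, d = q.shape[0], q.shape[1] // n_heads
    q = q.reshape(seq, n_heads, d)
    k = k.reshape(seq, n_kv_heads, d).repeat(n_heads // n_kv_heads, axis=1)  # share K within each group
    v = v.reshape(seq, n_kv_heads, d).repeat(n_heads // n_kv_heads, axis=1)  # share V within each group
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(d)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)                                # softmax over keys
    return np.einsum("hqk,khd->qhd", weights, v).reshape(seq, n_heads * d)

# The same function covers all three variants:
#   grouped_attention(q, k, v, n_heads=32, n_kv_heads=32)  # MHA
#   grouped_attention(q, k, v, n_heads=32, n_kv_heads=8)   # GQA
#   grouped_attention(q, k, v, n_heads=32, n_kv_heads=1)   # MQA
```

Fewer K/V heads also directly shrink the KV cache, which is why the MHA → MQA → GQA → MLA progression matters as much for memory sizing as for compute.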

Hardware Design Freedom Unlocked by Convergence

  1. Operators can be fully fixed. Cutting from hundreds of operators down to a few core ones (MatMul, Softmax, RoPE, LayerNorm) is enough; all non-essential circuitry can be removed. The cost of GPU generality is many transistors wasted on paths that will never be invoked by LLMs.
  2. Dataflow can be planned at compile time. Transformer compute graphs are statically regular, with no need for the GPU’s complex dynamic scheduling — GPUs waste many transistors on warp schedulers, memory coalescing, and other circuitry that adapts to “unknown workloads.”
  3. Memory hierarchy can be tailored. Parameter size, KV-cache size, and activation size are all known in advance. A challenger can design SRAM/HBM/DRAM hierarchies directly for "7B model + 32K context" rather than reserving buffers for "arbitrary workloads" like GPUs do (see the worked example after this list).
  4. Numeric precision can be aggressive. int4 / int8 is already inference norm; FP4 / FP6 is hot research. GPUs still have to retain FP32 / FP64 circuits for the scientific computing market — entirely redundant for LLMs.
  5. Extreme: bake weights into silicon. Taalas’s approach pushes “architectural convergence” to the logical extreme — etching the 32 layers of Llama 3.1 8B weights as physical transistors directly on the chip. The chip is the model itself, can’t be changed, but perf/watt is claimed to be 1000× better than a GPU cluster. See the “Training-Inference Split” section.
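As a worked example for point 3 above, here is a back-of-the-envelope KV-cache sizing for the "7B model + 32K context" target; the layer count, head count, head dimension and precision below are assumed Llama-7B-class values, not any vendor's published spec.

```python
# Rough KV-cache sizing for a fixed "7B model + 32K context" design target.
# All numbers are illustrative assumptions (Llama-7B-class shapes), not vendor specs.
n_layers    = 32
n_kv_heads  = 32        # MHA; a GQA model with 8 KV heads would cut this 4x
head_dim    = 128
seq_len     = 32_768    # 32K context
bytes_per_e = 2         # fp16/bf16; an int8 KV cache would halve this

kv_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_e  # factor 2 = K and V
print(f"KV cache per sequence: {kv_bytes / 2**30:.1f} GiB")              # ~16 GiB at these settings
```

Because every one of these numbers is known at design time, a challenger can size on-chip SRAM and HBM for exactly this footprint instead of reserving headroom for arbitrary workloads; switching to GQA (8 KV heads) or an int8 KV cache shrinks the figure by a further 4× or 2× respectively.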

The Moat Wasn’t Breached — It Was Bypassed

CUDA’s thickness advantage was premised on “needing to support hundreds of operators” — once models converge, that premise disappears.

Analogy: Office's moat used to be its thousands of features; today people write docs in Notion and touch fewer than 50 of them. The moat wasn't breached; it was bypassed.

This is why so many AI chip startups suddenly appeared in 2024–2026: it isn’t that capital suddenly grew, it’s that the design target finally stabilized, fundamentally raising the win rate of the bet. Layered on top of that, with AI coding tools like Claude Code / Codex becoming pervasive, the cost structure of software migration is being rewritten — CUDA’s moat is being doubly eroded.

Training-Inference Split — two markets, two logics

Google TPU v4 card · central ASIC + 4 HBM stacks · liquid-cooled package
Google TPU v4 — central ASIC + 4 HBM stacks, liquid-cooled. Measured at 4.7× better price-performance and 67% lower energy on inference workloads. Anthropic / Meta / Midjourney have already moved part of their inference workloads to TPU; this is not a "possible future," it has already happened.
Image: Wikimedia Commons / Google · CC BY 4.0.

Lumping training and inference together leads to a severe misjudgment of NVIDIA's true position. Compute allocation is flipping dramatically: in 2023 training:inference was roughly 2:1, in 2025 it is 1:1, in 2026 it will be 1:2, and by 2029 it approaches 1:4. Lenovo executives at CES 2026 said flatly that the "future is 20% training, 80% inference"; Jensen himself acknowledges the inference market will eventually be "about a billion times" larger than training. Over a single AI system's lifecycle, inference accounts for 80–90% of total cost, training only 10–20%.

This means: NVIDIA’s true money-printing market (training) is becoming the relatively smaller one; the truly massive market (inference) is precisely where its moat is thinnest.

Training vs Inference · Scissors of Compute Share

2023–2029 estimates · % of total AI compute

2025 is the watershed — inference exceeds training for the first time. By 2029, training is only ~20%, inference ~80%. NVIDIA’s dominance in training is unassailable in the short term, but every inference workload characteristic works against it.

Training Market: Fortress, NVIDIA Unassailable Short Term

  1. Compute extremely concentrated. Tens of thousands of cards collaborating in a single training run, with extreme demands on NVLink / InfiniBand; non-GPU paths lack this complete ecosystem layer.
  2. Customers extremely concentrated. Only about ten players globally do frontier pre-training (OpenAI, Anthropic, xAI, Google, Meta, Microsoft, ByteDance, Mistral, DeepSeek, Cohere); decision paths are short, and each of them talks to Jensen directly.
  3. Error cost extremely high. One training run costs hundreds of millions of dollars and takes half a year. No one will gamble on immature hardware — “nobody got fired for buying NVIDIA” is the 2026 version of “nobody got fired for buying IBM.”
  4. Heaviest software stack. CUDA + NCCL + cuDNN + Megatron + DeepSpeed + various fused kernels — full-stack migration cost is enormous. Anthropic took over a year to migrate training workloads to TPU.
  5. Model architecture still evolves on the training side. MoE, long context, test-time scaling, sparse activation, mixed-precision training — training still needs hardware flexibility, which is the last comfort zone for GPU generality.
  6. Conclusion. In the training market, NVIDIA likely retains 80%+ share through 2028. Google TPU is the only competitor that has demonstrated training capability, but it has only been opened to a handful of customers (Anthropic + Apple AFM training), which is not yet open market competition.

Inference Market: Soft Spot, Every Characteristic Works Against NVIDIA

  1. Can be deployed in a distributed way. Single card or small clusters suffice; NVLink advantage is completely irrelevant. Cerebras / Groq / various in-house ASICs all hold their ground here.
  2. Compute pattern is highly regular. The same attention + FFN invoked millions of times — perfect for an ASIC. GPU generality is pure waste here.
  3. Low precision suffices. int8 / int4 will run; FP4 is experimentally usable. The FP32 / FP64 circuits retained on GPUs are silicon wasted for nothing.
  4. Memory wall matters more than compute wall. This is the natural advantage of alternative architectures (wafer-scale Cerebras, in-memory computing D-Matrix, near-memory Etched / MatX): they are all designed around the premise that memory bandwidth equals performance, and GPUs are not (a rough throughput-ceiling calculation follows this list).
  5. Customers are fragmented and cost-sensitive. They look at tokens / dollar and tokens / watt, not peak compute. Google TPU delivers 4.7× price-performance and 67% lower energy on inference — Anthropic, Meta, and Midjourney have already migrated parts of their inference workloads.
  6. Model has converged. Hardware can fully optimize for the now-fixed Transformer (see “Architectural Convergence” section).
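A rough sketch of point 4: in single-stream decode, every generated token has to stream the full set of weights from memory once, so the throughput ceiling is simply bandwidth divided by model size in bytes. The numbers below are assumptions (an 8B-parameter fp16 model on roughly HBM3-class bandwidth) and ignore KV-cache traffic and batching.

```python
# Why the memory wall dominates LLM decode: at batch size 1, every generated token
# streams all weights from memory once, so throughput is bandwidth-bound, not FLOP-bound.
# Figures are illustrative assumptions, not vendor specs.
params          = 8e9      # 8B-parameter model
bytes_per_param = 2        # fp16/bf16 weights; int4 would cut this to 0.5
mem_bandwidth   = 3.35e12  # ~3.35 TB/s, roughly HBM3-class

bytes_per_token = params * bytes_per_param
max_tok_per_s   = mem_bandwidth / bytes_per_token
print(f"Bandwidth-bound ceiling: ~{max_tok_per_s:.0f} tokens/s per sequence")  # ~209
```

This is why designs that keep weights in SRAM or compute in/near memory can post very high single-stream token rates, and why GPUs lean on large batch sizes to recover efficiency; for latency-sensitive inference, large batches are exactly what customers do not want.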

“The perception that GPUs were the only answer was NVIDIA’s biggest moat. Now that perception is collapsing.”

— Andrew Feldman, Cerebras founder · 2026

Extreme Path · Taalas: Model Vendors Could Directly Become Hardware Players

  1. Technical principle · weights = physical transistors. The 32 layers of Llama 3.1 are etched as physical transistors on the chip. No HBM, no loading weights from memory. An input vector enters, the electrical signal flows through the transistors corresponding to each layer, then to the next, until a token is produced. The whole chip = the model itself, unchangeable.
  2. Performance · 17,000 tok/s · 1000× perf/watt vs GPU cluster. 17,000 tokens/sec on Llama 3.1 8B; a single 250W air-cooled card matches an entire GPU cluster. Taalas uses an automated design flow that compresses the weights-to-silicon cycle to about 2 months by swapping only the top metal mask; and because the Transformer architecture has been stable for 7–8 years without major change, the risk that the silicon becomes obsolete before it pays off is substantially lower.
  3. If it works · model vendors → hardware players. OpenAI sells GPT-Edition inference cards, Anthropic sells Claude-Edition inference boxes, bypassing both cloud vendors and NVIDIA. Enterprise deployment shifts from “buy GPU + download weights + configure CUDA” to “buy a box, plug it in.” Privacy and IP are solved simultaneously — weights physically reside in the customer’s data center, and extracting them is equivalent to reverse-engineering a chip. OpenAI’s $20B commitment to Cerebras procurement + cooperation with Broadcom on in-house inference chips is an early signal of exactly this logic.

Challenger Landscape (2026)

Basis: funding, customers, approach · data sources: Fortune, CNBC, Digitimes, TrendForce (2026 Q1).

| Category | Player | Approach / Advantage | Customers / Status | Key numbers |
| --- | --- | --- | --- | --- |
| Hyperscaler in-house | Google · TPU | ASIC · train + inference dual stack | Internal + Anthropic + Apple AFM training | v4/v5p · inference 4.7× price-performance |
| Hyperscaler in-house | Amazon · Trainium / Inferentia | ASIC · inference focus | Anthropic · AWS internal | Trainium2 ramping |
| Hyperscaler in-house | Microsoft · Maia | ASIC · in-house cloud inference | Azure internal + OpenAI partial | Mass production since 2024 |
| Hyperscaler in-house | Meta · MTIA | ASIC · recs + LLM inference | Meta internal | MTIA v2 deployed |
| In-house vs GPU scissors | TrendForce 2026 shipment growth | ASIC +44.6% vs GPU +16.1% | Structural · not cyclical | 2027 ASIC = 15%+ of data center |
| Independent vendor | Cerebras · WSE | Wafer-scale · in-package | OpenAI reportedly procured $20B · IPO in progress | Mid-May IPO valuation $30B+ |
| Independent vendor | Groq · LPU | Low-latency token-batch | Already acquired by NVIDIA for $20B IP / team (2025-12) | The acquisition itself proves the threat |
| Independent vendor | D-Matrix | In-memory compute | Microsoft-backed | 2026 $5B-class funding |
| Independent vendor | SambaNova | Reconfigurable dataflow | Intel reportedly signed acquisition term sheet | Acquisition in progress |
| Independent vendor | Etched / MatX / Ayar Labs | Near-memory / optical interconnect | 2026 each took $5B-class funding | Silicon photonics trend |
| Independent vendor | Furiosa / Positron / Taalas | Korean + US newcomers · model-in-silicon | Taalas: Llama 3.1 8B @ 17K tok/s | 1000× perf/watt |
| Geopolitical · China | Huawei Ascend · Alibaba PPU · Cambricon · Biren | Export controls force maturation | China AI-chip market ~$50B/yr (Jensen estimate) | 2025 domestic substitution acceleration |
| Geopolitical · Europe | Fractile · Axelera · Euclyd · Optalysys | Photonics / analog compute | Early stage | Longer horizon |
| Startup funding total | 2026 global AI-chip startups | Capital has bet "NVIDIA can't win forever" | 2026 cumulative funding | $8.3B |

Data sources: Fortune (2026-01), CNBC (2026-04-17), Digitimes, TrendForce 2026 Q1. Cerebras mid-May IPO timing is publicly disclosed; OpenAI-Cerebras $20B procurement intent is via media reports. NVIDIA itself has invested $4B as a hedge (photonic computing, analog / physical computing).

Political Risk — independent variable

US Capitol · west facade · the final decision venue for NVIDIA political risk
US Capitol — the final arena for NVIDIA's relationship with the China market. China has historically accounted for 20–25% of NVIDIA's data-center revenue, and Jensen himself estimates China's AI-chip market at $50B/yr: a market he is unwilling to give up. But with geopolitics layered on top of tech competition, he is highly likely to lose the both-sides position entirely.
Image: Wikimedia Commons / Public Domain.

NVIDIA's political risk is widely underestimated. On Dwarkesh Patel's podcast on 2026-04-15, Jensen repeatedly made statements that read like pledges of loyalty and got into a dispute with the host: asked about the national-security risks of selling advanced chips to China, he lost his composure and replied, "You're not talking to somebody who woke up a loser." Critics argue he repeatedly dodged direct answers on the national-security questions, emphasizing only that "selling more US technology is a good thing," and came across as overly utilitarian. Earlier, on the Bg2 podcast, he called the "China hawk" label "a badge of shame," prompting commentary that he was "auditioning for a Global Times editorial."

Fundamental Differences vs Apple · Tesla

  1. Apple · supply-chain exposure · contributing to China. Apple's China exposure is its manufacturing supply chain plus a share of sales, and India / Vietnam are diversifying it. The money flow runs China manufacturing → Apple procurement → supply-chain workers benefit, so the narrative can be framed as "contributing to China." Apple's brand also carries built-in universal values, which earns it a friendly posture in Congressional questioning.
  2. Tesla · market + factory · contributing industrial capability. The Tesla Shanghai factory is a template for the Chinese EV supply chain; Cybertruck / Model Y are sold to Chinese consumers. Musk comes with built-in MAGA credentials — at most subject to internal Democratic / Republican controversy, but no external “pro-CCP” treason-class risk.
  3. NVIDIA · selling to China · strategic materials · suspicion of treason. NVIDIA is selling to China, and what it sells is officially classified as militarily critical technology; this is a qualitative difference, not a quantitative one. Layered on top are Jensen's ethnic-Chinese identity and statements repeatedly criticized as overly utilitarian. Parts of the US electorate harbor a natural distrust of an ethnic-Chinese tech giant they perceive as pro-CCP, and members of Congress will inevitably cater to that voter market.

Decoupling Script · Passive Rather Than Active

A landmark gesture such as publicly endorsing Taiwan independence and denouncing the CCP has low probability; the more likely path is passive ejection.

Jensen's instinct is merchant-style evasion rather than political stance-taking: after calling Taiwan a "country" in 2024, he proactively clarified the next day that "this is not a geopolitical statement," smoothing things over. The more likely decoupling script is therefore one imposed on him, not one he chooses.

So NVIDIA will inevitably be dealt with politically before Apple or Tesla; this is a second acceleration curve, independent of the challenger structure.

Timing and Signals — when & how to watch

NVIDIA Endeavor HQ · Santa Clara · triangular roof · aerial view
NVIDIA Endeavor HQ · Santa Clara · triangular-roof building — Blackwell is entirely sold out through mid-2026 and Q4 data-center revenue grew 75% YoY; this is the brightest moment. But between "decline almost certain" and "starting next year" there is still a long road, and the real inflection signal is not market share but gross margin.
Image: Wikimedia Commons / 2017 · CC BY-SA 4.0.

This is the easiest place in the entire analysis to misjudge. NVIDIA’s decline is almost certain, but between “almost certain” and “starting next year” there is still a long road. Supply-chain lock-in (HBM, CoWoS, ODM full-rack capacity pre-booked years out) + training-side software inertia + large customers unwilling to switch all at once — these three buffer layers are very thick. The hard part for shorts isn’t direction, it’s timing.

The pie is growing: four major cloud vendors combined 2026 capex approaches $600B; Goldman Sachs estimates 2025–2027 total capex of $1.15T. Even with all challengers in position, NVIDIA AI data-center share in 2028 likely remains above 60% — share down but absolute revenue could still grow.

Three-Phase Rhythm

  1. Phase 1 · 2026–2027 · Apparently still winning. Total revenue still grows, gross margin holds at 70–75%. Hyperscaler in-house ASICs capture 10–15% share, but training market expansion offsets inference decline. On the surface NVIDIA appears unscathed — this is the phase most easily fooled by appearances.
  2. Phase 2 · 2027–2028 · The real inflection window. The inference market exceeds training in absolute size; dollars flowing to non-NVIDIA inference hardware start to offset training growth, overall share visibly declines, and gross margin slides toward 65–70%. This is the real inflection window: non-GAAP gross margin breaking below 70% is a structural signal.
  3. Phase 3 · post-2028 · Fortress shaken. Google TPU, the AMD MI series, and possibly OpenAI's in-house silicon make substantive breakthroughs on the training side; if the "model-in-silicon" path succeeds, the entire inference hardware market is reshuffled and gross margin falls back below 60%. GPUs go from "allocation goods" back to "commodities."

Non-GAAP Gross Margin · Three-Phase Trajectory

2024–2030 estimates · historical + three-phase neutral expectation

For reference: in the 2022 gaming-GPU downcycle, gross margin fell from 64% to 56% within about a year, a benchmark for how fast a cyclical supply-demand reversal can move. The structural inflection will be slower but harder to recover from. The real alarm is non-GAAP gross margin breaking below 70% and failing to recover.

Primary Alarms · Tracking Three Non-GAAP Gross Margin Thresholds

  1. Tier-1 alarm · gross margin < 70% · challengers gain meaningful pricing leverage. This is a structural signal, not a cyclical fluctuation. It means major customers are negotiating with alternatives in hand, and the ~35% price premium of Blackwell Ultra becomes hard to sustain (the sketch after this list turns the thresholds into a simple check).
  2. Tier-2 alarm · gross margin < 65% · entering a pricing-power-loss spiral. Even scale effects can no longer compensate; two to three consecutive quarter-on-quarter declines confirm the trend.
  3. Tier-3 alarm · B-series / Rubin price cuts to clear inventory · GPUs back to commodities. From "allocation goods" back to "commodities." Once price cuts to clear inventory appear, the structural supply-demand reversal is complete.
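As a tracking aid, the first two thresholds reduce to a trivial check; the function and the sample quarterly series below are hypothetical illustrations, not reported figures.

```python
# Minimal sketch of the tiered gross-margin alarm; thresholds follow the article,
# the quarterly series is purely hypothetical.
def margin_alarm(non_gaap_gm: float) -> str:
    if non_gaap_gm < 0.65:
        return "Tier-2 alarm: pricing-power-loss spiral"
    if non_gaap_gm < 0.70:
        return "Tier-1 alarm: challengers have meaningful pricing leverage"
    return "No alarm: monopoly structure intact"

for quarter, gm in {"Q1": 0.75, "Q2": 0.72, "Q3": 0.69, "Q4": 0.64}.items():  # hypothetical prints
    print(quarter, f"{gm:.0%}", "->", margin_alarm(gm))
```

Tier-3 is event-based (price cuts to clear inventory) rather than a margin threshold, so it stays outside the check.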

Secondary Signals

  1. Hyperscaler in-house ASIC share. Whether “in-house ASIC share” in capex breaks 25%. Currently Google + Amazon combined ~15%; watch 2027 H2 data.
  2. Inference product-line pricing pressure. Pricing on inference-optimized SKUs such as the B200 NVL will loosen earlier than on training flagships.
  3. Model vendors’ in-house / procured non-GPU inference hardware actual shipments. Deals like OpenAI-Cerebras $20B should be judged by actual delivery, not announcement; the tape-out pace of OpenAI’s in-house silicon (with Broadcom) is a key node.
  4. Cerebras post-IPO valuation performance. Market pricing of alternative paths. If post-IPO valuation holds at $30B+, the public market endorses the challenger narrative; if it breaks below $15B, the market still believes the NVIDIA monopoly.
  5. China market actual revenue magnitude. A lagging indicator of geopolitical progress. If export controls tighten further and China's revenue share falls from the 20–25% range to below 5%, that is the marker of the political script materializing.
  6. Sovereign AI-class legislation. Any legislation at the US Congress level that requires “pre-approval for sensitive compute shipments” is a signal that the political script is accelerating.

Synthesis — the inflection signal

Cerebras Systems · Sunnyvale Arques Avenue · HQ of a challenger
Cerebras Systems HQ · Sunnyvale · 1237 E. Arques Avenue — challengers don't need to defeat NVIDIA outright; they only need to give major customers a second option. OpenAI has reportedly signed a $20B procurement letter of intent, and the mid-May IPO values Cerebras at $30B+. This is the textbook sample of a major customer cultivating a second supplier.
Image: Wikimedia Commons / CC BY-SA 4.0.

The full chain of reasoning closes.

Six Structural Observations

  1. Architectural convergence frees up hardware design degrees of freedom. The Transformer is the new x86; CUDA wasn't breached, it was bypassed.
  2. Challenger landscape now formed. Hyperscaler ASIC + tier-1 startups + China domestic substitution + model vendors going to silicon; 2026 global startups have raised $8.3B.
  3. Training fortress holds but inference soft spot exposed. 2025 is the watershed, 2029 train:inference at 1:4. Every inference workload characteristic works against NVIDIA.
  4. Model vendors may step into the hardware layer. OpenAI investing $20B in Cerebras + Broadcom in-house inference chip + Taalas model-in-silicon — the entire AI value chain may be re-sliced.
  5. Political risk accelerates the process. NVDA's China exposure differs fundamentally from AAPL's or TSLA's: selling strategic materials versus contributing to a supply chain. Jensen's ethnic-Chinese background and statements repeatedly criticized as overly utilitarian mean that, in the logic of the Congressional voter market, he gets dealt with before Apple or Tesla.
  6. Time window longer than intuition suggests. HBM / CoWoS / ODM lock-in + training software inertia + large customers’ reluctance to switch wholesale — these three buffer layers are thick. The hard part for shorts isn’t direction, it’s timing.

Bottom Line

Dominance remains, and remains strong. But beyond the moat, challengers have moved from “catching up” to “bypassing.”

What really needs to be watched isn’t “who’s catching up” but “whether NVIDIA’s own inflection signal has appeared” — non-GAAP gross margin is that inflection signal.

Tracking cadence: follow non-GAAP gross margin quarter by quarter. As long as it stays above 70%, the monopoly structure stays stable; once it breaks below 70% and cannot recover, the inflection signal has been triggered.

Apple's real advantage in the AI era isn't "I can also build models," it's "I don't need to build models" (see /zh/signals/apple-ai). NVIDIA's real danger isn't that someone is chasing it; it's that it cannot afford to slow down, because the rationality of B2B customers will accelerate the whole process the moment it does. These two are two sides of the same structural observation.

— Contrarian research · 2026-04-26
