Research

I want to feel this world the way a child does, and share my curiosity.

112 tags

2026-06-10

What Happens When the U.S. Sanctions You?

Three parts: the nine-tier intensity ladder of U.S. sanctions — what each tier does to you, with cases; three archetypal sanctioned countries (Iran, North Korea, Russia) and how ordinary people, officials and merchants fare in each; and 11 China-related cases — ZTE, Huawei, Jinhua, Bank of Kunlun, Carrie Lam, the three telecoms, Shandong teapot refineries, Tencent/CATL, and Shanghai Heiying's Zhou Shuai.
- US Sanctions
- OFAC
- SDN List
- Entity List
2026-06-08

The Compute Metrics Landscape: From FLOPs to MFU, Every Number Explained Through Three Generations of Flagship GPUs

Lowercase FLOPs is a count, uppercase FLOPS is a rate; 1 MAC ≈ 2 FLOPs; the vast majority of LLM workloads are bound by bandwidth, not compute. This note works through five categories — compute volume, compute throughput, memory access, efficiency, and deployment — giving several real worked examples per metric, all cross-referenced against the real specs of A100→H100→B200→Rubin, with hand-drawn Rooflines for three GPU generations.
- GPU
- Roofline
- FLOPS
- MFU
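
As a rough illustration of the summary above (FLOPs as a count, FLOPS as a rate, 1 MAC ≈ 2 FLOPs, and the bandwidth-bound claim), here is a minimal Roofline sketch. The peak throughput and bandwidth are round, hypothetical numbers chosen for easy arithmetic, not the specs of any GPU in the note, and the function names are my own.

```python
def matmul_flops(m: int, k: int, n: int) -> int:
    """FLOPs (a count) for an (m x k) @ (k x n) matmul: m*k*n MACs, 2 FLOPs each."""
    return 2 * m * k * n


def matmul_bytes(m: int, k: int, n: int, bytes_per_elem: int = 2) -> int:
    """Lower bound on memory traffic: read both inputs once, write the output once."""
    return bytes_per_elem * (m * k + k * n + m * n)


def arithmetic_intensity(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved; the x-axis of a Roofline plot."""
    return matmul_flops(m, k, n) / matmul_bytes(m, k, n, bytes_per_elem)


def attainable_flops_per_s(intensity: float, peak: float, bandwidth: float) -> float:
    """Roofline: throughput is capped by compute or by bandwidth, whichever is lower."""
    return min(peak, intensity * bandwidth)


# Hypothetical accelerator (assumed round numbers, not a real spec sheet).
PEAK_FLOPS_PER_S = 300e12       # FLOPS, a rate
BANDWIDTH_BYTES_PER_S = 2e12    # bytes per second
RIDGE = PEAK_FLOPS_PER_S / BANDWIDTH_BYTES_PER_S  # FLOPs/byte where the two limits meet

# Decode-like step: one token against a 4096 x 4096 weight matrix (16-bit elements).
decode_ai = arithmetic_intensity(1, 4096, 4096)
# Prefill-like step: 4096 tokens against the same weight matrix.
prefill_ai = arithmetic_intensity(4096, 4096, 4096)

if __name__ == "__main__":
    for name, ai in [("decode", decode_ai), ("prefill", prefill_ai)]:
        rate = attainable_flops_per_s(ai, PEAK_FLOPS_PER_S, BANDWIDTH_BYTES_PER_S)
        bound = "bandwidth" if ai < RIDGE else "compute"
        print(f"{name}: {ai:.2f} FLOPs/byte, {rate / 1e12:.1f} TFLOPS attainable, {bound}-bound")
```

Under these assumptions the single-token case sits far to the left of the ridge point, which is the sense in which decode-heavy LLM workloads are limited by bandwidth and not by compute.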
2026-06-03

Apple AI Inference Architecture Research

With WWDC 2026 approaching, this note maps Apple's two local AI inference paths (Neural Engine vs. GPU + Neural Accelerator), traces compute and bandwidth evolution across recent silicon generations, and benchmarks them against Nvidia across four dimensions — compute, memory capacity, bandwidth, and precision. The verdict: Apple's edge is "local, power-efficient, and capable of running large models (especially MoEs)" — not peak throughput on dense large models.
- Apple
- Apple Silicon
- Local Inference
- Neural Engine
2026-06-01

Data Center Power: Supply Side and Demand Side

Almost overnight, AI has made electricity the most expensive bottleneck in data centers. On one side is "where the power comes from": utility grids, behind-the-meter fuel cells, gas turbines, and nuclear. On the other is "where the power goes": the breakdown across GPUs, other server components, cooling, and networking. This note is my structured survey of both the supply and demand sides.
- data center
- power
- fuel cell
- gas turbine
2026-06-01

23 Flagship AI Data Centers — An Overview

Epoch AI's Frontier Data Centers database cross-references satellite imagery and public permit filings to track the world's largest AI data centers. All 13 operational sites plus 10 under construction or planned are laid out here — owner, tenant, capacity, power source, capital structure, and build pace. A snapshot of who is moving mountains.
- data centers
- AI infrastructure
- Epoch AI
- hyperscale
2026-05-21

A Tour of NVIDIA's GPU Programming Stack — From PTX to CuTe DSL

PTX, CUDA C++, CUTLASS, CuTe, Triton, CuTe DSL — every entry point in NVIDIA's GPU programming ecosystem converges on the same exit (PTX → SASS). Arranged as a continuous ladder from "convenient" to "extreme" — default to stock libraries, Triton in Python, CUTLASS in C++, PTX as a patch — the whole landscape collapses into one mental model.
- GPU
- CUDA
- Triton
- CUTLASS
2026-05-21

Storage in CPUs and GPUs: Types, Fabrication, and Design Rationale

A complete tour of the memory hierarchy from register file to hard disk — covering the circuit principles, manufacturing processes, and hardware homes of SRAM / DRAM / HBM / Flash / HDD, and the single physical cause-and-effect chain behind "why some are fast, some expensive, some volatile, and some high-capacity."
- storage
- memory
- SRAM
- DRAM
2026-05-18

Low-Precision Data Formats in Large Language Models

One article that lays out every floating-point and integer format used in LLM training and inference at the bit level — definitions, dynamic ranges, where they live in a Transformer, hardware support from V100 to Rubin, and the precision choices behind Llama 4 and DeepSeek-V3.
- LLM
- data formats
- quantization
- FP8
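
To make "dynamic range at the bit level" concrete, here is a small sketch that derives the largest finite value of a few floating-point formats from their exponent and mantissa widths. The helper name and the `ieee_like` flag are my own; the E4M3 branch assumes the common FP8 convention in which the top exponent code stays usable and only the all-ones mantissa is reserved for NaN.

```python
def max_finite(exp_bits: int, man_bits: int, ieee_like: bool = True) -> float:
    """Largest finite value of a binary float with the given field widths."""
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_like:
        # Top exponent code is reserved for inf/NaN, so the largest usable code is 2^e - 2.
        max_exp = (2 ** exp_bits - 2) - bias
        max_significand = 2 - 2 ** -man_bits
    else:
        # E4M3-style (assumed): top exponent code is usable, but the all-ones
        # mantissa at that exponent encodes NaN, so drop one mantissa step.
        max_exp = (2 ** exp_bits - 1) - bias
        max_significand = 2 - 2 * 2 ** -man_bits
    return max_significand * 2.0 ** max_exp


# (exponent bits, mantissa bits, IEEE-like special values?)
FORMATS = {
    "FP32": (8, 23, True),
    "BF16": (8, 7, True),
    "FP16": (5, 10, True),
    "FP8 E5M2": (5, 2, True),
    "FP8 E4M3": (4, 3, False),
}

if __name__ == "__main__":
    for name, (e, m, ieee) in FORMATS.items():
        print(f"{name:9s} exp={e} man={m} max finite = {max_finite(e, m, ieee):.6g}")
```

The comparison shows the trade-off: BF16 keeps FP32's exponent width and therefore its range, while FP16 and the FP8 variants give up range for mantissa bits or storage.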
2026-05-16

Cracks in NVIDIA's Moat — A Bear-Case Memo on NVDA

Four structural reasons to be short NVDA: a moat with a clearly defined target, a CUDA stack being eroded by AI coding, a specialization path that forces NVIDIA to carry compatibility debt while challengers travel light, and a position at the center of a geopolitical vortex. Layered on top are one timing constraint (HBM / CoWoS / rack-level ODM capacity locked through 2027 gives a 12–24-month buffer) and an eventual mean reversion of the market-cap inversion: Apple's and Google's ecosystem ceilings sit structurally above that of one chip.
- NVIDIA
- AI
- GPU
- Semiconductors
2026-05-12

The AI Inference Chip Spectrum — Seven Gradients from General GPU to Model-Etched Silicon

From NVIDIA GPUs to Taalas model-etched silicon, the 2026 AI inference chip landscape forms a seven-gradient spectrum from general to specialized. Each step to the right brings a 3-10× speedup at the cost of flexibility. The photonic compute path is blocked by the diffraction limit; the interconnect path has scaled.
- AI Inference
- Chip Architecture
- NVIDIA
- TPU
2026-05-11

CPU vs GPU Distributed Computing — Two Engineering Implementations of the Same BSP Theory

MapReduce's Reduce and NCCL's AllReduce both descend from MPI; Spark stage and PyTorch DDP step both descend from BSP. Same vocabulary, divergent engineering constraints — let's lay out fault tolerance, communication granularity, sync frequency, and programming model side by side.
- distributed
- GPU
- NCCL
- MPI
2026-05-11

A Decade of GPU Architecture Evolution and the Parallel Expansion of the CUDA Programming Model

From Pascal to Rubin, compute grew 2380× over a decade while CUDA core throughput grew only 10×; hardware complexity was absorbed by Tensor Cores, the programming model expanded from Thread to five layers, while torch.matmul did not change by a single line.
- GPU
- CUDA
- NVIDIA
- Tensor Core
2026-05-11

A Layer Map of the Modern Data Engineering Ecosystem

Organized into six layers (source, ingestion, processing, storage, query, and serving), this note unpacks 16 mainstream open-source and commercial projects in the current data stack: their history, design, users, and typical usage.
- Data Engineering
- Data Stack
- Kafka
- dbt
2026-05-09

CoreWeave and Nebius — Two Divergent Paths for GPU Clouds

Side-by-side view of two listed Neocloud companies — CoreWeave, born of crypto mining, with Q1 2026 revenue of $2.1B and backlog of $99.4B; Nebius, restructured from Yandex, with ARR of $1.2B and a $17.4B Microsoft deal. Founding stories, teams, business, capital structure, and the essential differences from Hyperscalers.
- GPU
- Cloud
- CoreWeave
- Nebius
2026-04-25

Apple's Gatekeeper Distribution Power

Apple uses its gatekeeper distribution power to run multi-vendor AI sourcing, netting +$18B/yr inbound; the Mac becomes the household compute hub, and smart glasses in 2027 stake out the next entry point.
- Apple
- AI
- Google
- LLM
2026-04-22

The Eight Schools of Quant Trading

A field guide to the eight schools of quant — HFT, StatArb, CTA, macro, factor, ML, event-driven, crypto — 11 chapters and 40+ leading firms cross-referenced.
- Quant
- Hedge Funds
- Investing
- Crypto
2026-02-27

China's Industrial Hardware Goes Domestic

From DJI to Bambu to Hypershell — the systematic Chinese playbook of compressing industrial-grade equipment by 2-3 orders of magnitude, moving military and lab hardware into the living room.
- DJI
- Consumer Electronics
- China
- Robotics
2025-10-15

LLM Inference Walkthrough — Tensor Shapes and Core Formulas Across the Whole Pipeline

Using Llama 3 8B as the reference, this note walks the entire inference path (token ID → embedding → Transformer → sampling), writing out every shape transition and every key formula.
- LLM
- Transformer
- inference
- KV Cache
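
A minimal sketch of the kind of shape bookkeeping the summary above describes. The hyperparameters are the Llama 3 8B values as I understand them from the published model config (an assumption on my part, not taken from the note), and the `Config`, `shapes`, and `kv_cache_bytes_per_token` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    # Assumed Llama 3 8B hyperparameters.
    vocab_size: int = 128256
    hidden: int = 4096
    n_layers: int = 32
    n_heads: int = 32       # query heads
    n_kv_heads: int = 8     # grouped-query attention: 4 query heads share each KV head
    head_dim: int = 128
    ffn_hidden: int = 14336


def shapes(cfg: Config, batch: int, seq: int) -> dict:
    """Tensor shapes at the main stages of one forward pass over `seq` tokens."""
    return {
        "token_ids": (batch, seq),
        "embedding": (batch, seq, cfg.hidden),
        "q": (batch, cfg.n_heads, seq, cfg.head_dim),
        "k": (batch, cfg.n_kv_heads, seq, cfg.head_dim),
        "v": (batch, cfg.n_kv_heads, seq, cfg.head_dim),
        "attn_scores": (batch, cfg.n_heads, seq, seq),
        "ffn_up": (batch, seq, cfg.ffn_hidden),
        "logits": (batch, seq, cfg.vocab_size),
    }


def kv_cache_bytes_per_token(cfg: Config, bytes_per_elem: int = 2) -> int:
    """K and V, for every layer and KV head, stored at 16-bit precision by default."""
    return 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * bytes_per_elem


if __name__ == "__main__":
    cfg = Config()
    for name, shape in shapes(cfg, batch=1, seq=16).items():
        print(f"{name:12s} {shape}")
    print("KV cache per token:", kv_cache_bytes_per_token(cfg) // 1024, "KiB")
```

With these assumed values the KV cache costs 128 KiB per token at 16-bit precision, so cache size grows linearly with context length and batch size.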
