I want to feel this world the way a child does, and share my curiosity.
Three parts: the nine-tier intensity ladder of U.S. sanctions — what each tier does to you, with cases; three archetypal sanctioned countries (Iran, North Korea, Russia) and how ordinary people, officials and merchants fare in each; and 11 China-related cases — ZTE, Huawei, Jinhua, Bank of Kunlun, Carrie Lam, the three telecoms, Shandong teapot refineries, Tencent/CATL, and Shanghai Heiying's Zhou Shuai.
Lowercase FLOPs is a count, uppercase FLOPS is a rate; 1 MAC ≈ 2 FLOPs; the vast majority of LLM workloads are bound by bandwidth, not compute. This note works through five categories — compute volume, compute throughput, memory access, efficiency, and deployment — giving several real worked examples per metric, all cross-referenced against the real specs of A100→H100→B200→Rubin, with hand-drawn Rooflines for three GPU generations.
With WWDC 2026 approaching, this note maps Apple's two local AI inference paths (Neural Engine vs. GPU + Neural Accelerator), traces compute and bandwidth evolution across recent silicon generations, and benchmarks them against Nvidia across four dimensions — compute, memory capacity, bandwidth, and precision. The verdict: Apple's edge is "local, power-efficient, and capable of running large models (especially MoEs)" — not peak throughput on dense large models.
Overnight, AI has made electricity the most expensive bottleneck in data centers. On one side is "where the power comes from": utility grids, behind-the-meter fuel cells, gas turbines, and nuclear. On the other is "where the power goes": the breakdown across GPUs, other server components, cooling, and networking. This note is my structured survey of the supply and demand sides of this space.
Epoch AI's Frontier Data Centers database cross-references satellite imagery and public permit filings to track the world's largest AI data centers. All 13 operational sites plus 10 under construction or planned are laid out here — owner, tenant, capacity, power source, capital structure, and build pace. A snapshot of who is moving mountains.
PTX, CUDA C++, CUTLASS, CuTe, Triton, CuTe DSL — every entry point in NVIDIA's GPU programming ecosystem converges on the same exit (PTX → SASS). Arranged as a continuous ladder from "convenient" to "extreme" — default to stock libraries, Triton in Python, CUTLASS in C++, PTX as a patch — the whole landscape collapses into one mental model.
A complete tour of the memory hierarchy from register file to hard disk — covering the circuit principles, manufacturing processes, and hardware homes of SRAM / DRAM / HBM / Flash / HDD, and the single physical cause-and-effect chain behind "why some are fast, some expensive, some volatile, and some high-capacity."
One article that lays out every floating-point and integer format used in LLM training and inference at the bit level — definitions, dynamic ranges, where they live in a Transformer, hardware support from V100 to Rubin, and the precision choices behind Llama 4 and DeepSeek-V3.
Four structural reasons to be short NVDA: a moat with a clearly defined target, a CUDA stack being eroded by AI coding, a specialization path that forces NVIDIA to carry compatibility debt while challengers travel light, and a position at the center of a geopolitical vortex. Layered on top are one timing constraint (HBM / CoWoS / rack-level ODM capacity locked through 2027 gives a 12–24-month buffer) and an eventual mean reversion of the market-cap inversion: Apple's and Google's ecosystem ceiling is structurally above that of one chip.
From NVIDIA GPUs to Taalas model-etched silicon, the 2026 AI inference chip landscape forms a seven-step spectrum from general to specialized. Each step to the right brings a 3–10× speedup at the cost of flexibility. The photonic compute path is blocked by the diffraction limit; the interconnect path has scaled.
MapReduce's Reduce and NCCL's AllReduce both descend from MPI; a Spark stage and a PyTorch DDP step both descend from BSP. Same vocabulary, divergent engineering constraints: fault tolerance, communication granularity, sync frequency, and programming model, laid out side by side.
From Pascal to Rubin, compute grew 2380× over a decade while CUDA core throughput grew only 10×. Hardware complexity was absorbed by Tensor Cores and the programming model expanded from Thread to five layers, yet torch.matmul did not change by a single line.
Organized into six layers (source, ingestion, processing, storage, query, serve), this note unpacks 16 mainstream open-source and commercial projects in the current data stack: their history, design, users, and typical usage.
A side-by-side view of two listed Neocloud companies: CoreWeave, born of crypto mining, with Q1 2026 revenue of $2.1B and a backlog of $99.4B; and Nebius, restructured from Yandex, with ARR of $1.2B and a $17.4B Microsoft deal. Covers founding stories, teams, business, capital structure, and the essential differences from Hyperscalers.
Apple uses its gatekeeper distribution power to run multi-vendor AI sourcing, netting +$18B/yr inbound; the Mac becomes the household compute hub, and smart glasses in 2027 stake out the next entry point.
A field guide to the eight schools of quant — HFT, StatArb, CTA, macro, factor, ML, event-driven, crypto — 11 chapters and 40+ leading firms cross-referenced.
From DJI to Bambu to Hypershell — the systematic Chinese playbook of compressing industrial-grade equipment by 2-3 orders of magnitude, moving military and lab hardware into the living room.
Using Llama 3 8B as the reference, this note walks the entire inference path (token ID → embedding → Transformer → sampling), writing out every shape transition and every key formula.