Mixture of Insights.

A long-running notebook about building systems and taking them apart: models, tools, infrastructure, failures, and the judgment behind technical work.

Featured

Start Here

Jun 10, 2026 · 12 min read

Post-training is a data problem

PPO, GRPO, and DPO are commoditized. In my engineering iterations, the only variable that structurally improved alignment was the synthetic data engine.

Jun 10, 2026 · 11 min read

A control plane for renting GPUs

The orchestration mess around ephemeral, rented GPUs is the actual bottleneck of model iteration. Here is my bet with ORBIT: treat a run as a reproducible artifact, splitting control from execution.

Jun 10, 2026 · 14 min read

When the GPU isn't an NVIDIA

The whole LLM stack assumes CUDA. The hardware in front of you is often an Intel iGPU or just a CPU. Getting a real, low-latency autoregressive TTS model to stream there means rebuilding the parts you usually pip-install (the decode loop, the KV cache, the batching scheduler) on OpenVINO.

Series

Post-Training in Practice

From data engines to GRPO, reward hacking, DPO and self-play — the math for why each method works, and why the data usually outweighs the optimizer.

ORBIT — orchestrating training on rented GPUs

Make a training run a reproducible artifact, not a shell session: a declarative control plane reconciled against a disposable execution plane.

Shipping a TTS model on OpenVINO

Rebuilding the CUDA serving stack (a paged KV cache, cache quantization, continuous batching) on an Intel iGPU, derived from the bandwidth math up.

Hardening a rooted Android device against app detection

How a non-privileged app detects a rooted custom ROM, channel by channel — and the two walls (verified boot, hardware attestation) that userspace cannot move.