Mixture of Insights

Notes on LLM post-training, RL, agents, and the systems underneath, by Wang Tong.
https://mixtureofinsights.com/

Auditing from the app's eyes
https://mixtureofinsights.com/blog/04-auditing-from-the-apps-eyes/
You can't tell what a normal app sees from adb shell: shell has privileges an app never does. Three lenses for looking through the app's eyes, and the blind spot of each.
Wed, 10 Jun 2026 00:00:00 GMT

What you can and can't hide
https://mixtureofinsights.com/blog/05-what-you-can-and-cant-hide/
The full map of how a non-privileged app detects a rooted custom ROM, what closes each channel, and the two walls that nothing in userspace will move.
Wed, 10 Jun 2026 00:00:00 GMT

DPO when I can't afford RLHF
https://mixtureofinsights.com/blog/dpo-when-you-cant-afford-rlhf/
RLHF is powerful and heavy: a reward model, an online rollout loop, instability. DPO gets most of the way with a fraction of the machinery. The derivation, the gradient that explains why I use it, and the catch, which is always the data.
Wed, 10 Jun 2026 00:00:00 GMT

Cold-start, then climb
https://mixtureofinsights.com/blog/cold-start-then-climb/
Pure RL from a base model on a hard task mostly produces high-variance garbage, and the policy-gradient math says exactly why. The fix I use is a two-stage recipe: a small SFT cold-start to give the policy a shape, then GRPO to climb. The recipe, the math, and the failure modes that actually bite.
Wed, 10 Jun 2026 00:00:00 GMT

How Qwen3-TTS makes a frame of sound
https://mixtureofinsights.com/blog/how-qwen3-tts-makes-a-frame/
A TTS model isn't one graph; it's a small pipeline of graphs with wildly different compute shapes. The key design move in porting Qwen3-TTS to OpenVINO is cutting it at the seams: a talker graph for long-context attention, a cached subcode graph for the rest of each multi-codebook frame, and a chunked streaming decoder.
Wed, 10 Jun 2026 00:00:00 GMT

The logcat leak
https://mixtureofinsights.com/blog/03-the-logcat-leak/
You hid the packages and the features. Then you notice fifteen apps quietly holding READ_LOGS, reading the whole device log, where every stray Magisk and lineage string is sitting in plain text.
Wed, 10 Jun 2026 00:00:00 GMT

A control plane for renting GPUs
https://mixtureofinsights.com/blog/a-control-plane-for-renting-gpus/
The orchestration mess around ephemeral, rented GPUs is the actual bottleneck of model iteration. Here is my bet with ORBIT: treat a run as a reproducible artifact, splitting control from execution.
Wed, 10 Jun 2026 00:00:00 GMT

A task-agnostic core, and plugins that earn their keep
https://mixtureofinsights.com/blog/orbit-a-task-agnostic-core/
I designed ORBIT's execution core to be completely oblivious to the tasks it runs. By pushing task-specific logic up into plugins, I prevented new tasks from mutating and breaking the executor.
Wed, 10 Jun 2026 00:00:00 GMT

Paged-KV, U8, and batching where vLLM isn't
https://mixtureofinsights.com/blog/paged-kv-batching-without-vllm/
You have the model graphs. Now serve them: long-context, concurrent, inside an iGPU's memory budget, with none of vLLM's machinery. Four decisions that compose: paged-KV over fixed buckets, a U8 cache, full-context generation, and online batching that lives in the scheduler so one IR set serves everyone.
Wed, 10 Jun 2026 00:00:00 GMT

The bundle is the contract
https://mixtureofinsights.com/blog/orbit-the-bundle-is-the-contract/
When a rented machine evaporates, the only evidence left is what you collected. I enforced a strict directory contract for bundles to ensure exact dependency provenance and runtime observability.
Wed, 10 Jun 2026 00:00:00 GMT

When the GPU isn't an NVIDIA
https://mixtureofinsights.com/blog/when-the-gpu-isnt-an-nvidia/
The whole LLM stack assumes CUDA. The GPU in front of you is often an Intel iGPU or a CPU. Getting a real, low-latency autoregressive TTS to stream there means rebuilding the parts you usually pip-install (the decode loop, the KV cache, the batching scheduler) on OpenVINO.
Wed, 10 Jun 2026 00:00:00 GMT

What am I actually rewarding?
https://mixtureofinsights.com/blog/what-are-you-rewarding/
RL doesn't optimize what I want; it optimizes exactly what I wrote down. The gap between the two is reward hacking, and closing it is most of the real work. Verifiers vs reward models, and how a constraint reward earned its +12%.
Wed, 10 Jun 2026 00:00:00 GMT

Self-play, and the games my models teach themselves
https://mixtureofinsights.com/blog/self-play-and-the-games-models-teach-themselves/
There's no dataset of good game-play. But in a game with a clear outcome, I manufacture one: I let a strong sampler play out games, filter by who won, and the transcripts become the strategy data. How the data engine, the verifier, and emergent strategy all meet, grounded in my GAME pipeline.
Wed, 10 Jun 2026 00:00:00 GMT

Post-training is a data problem
https://mixtureofinsights.com/blog/post-training-is-a-data-problem/
PPO, GRPO, and DPO are commoditized. In my engineering iterations, the only variable that structurally improved alignment was the synthetic data engine.
Wed, 10 Jun 2026 00:00:00 GMT

The Google Wallet Wall
https://mixtureofinsights.com/blog/01-the-google-wallet-wall/
Play Integrity passes STRONG. Google Wallet still refuses to add a card. Here is why (proven, not guessed) and why it can't be forced onto an unlocked device.
Tue, 09 Jun 2026 00:00:00 GMT

StockMask: a stock illusion without touching a single app
https://mixtureofinsights.com/blog/02-stockmask/
HideMyApplist hides package names. Apps still detected the custom ROM. The fix was a 200-line module that filters system_server responses by who's asking, never injecting into the app itself.
Tue, 09 Jun 2026 00:00:00 GMT

Neovim: yank to the system clipboard (OSC 52)
https://mixtureofinsights.com/blog/nvim-yank-osc52/
How I make Neovim's yank reach the system clipboard over SSH / WSL, utilizing Neovim ≥ 0.10's native OSC 52 support.
Sun, 16 Jun 2024 00:00:00 GMT