<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Mixture of Insights</title><description>Notes on LLM post-training, RL, agents, and the systems underneath, by Wang Tong.</description><link>https://mixtureofinsights.com/</link><item><title>Auditing from the app&apos;s eyes</title><link>https://mixtureofinsights.com/blog/04-auditing-from-the-apps-eyes/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/04-auditing-from-the-apps-eyes/</guid><description>You can&apos;t tell what a normal app sees from adb shell — shell has privileges an app never does. Three lenses for looking through the app&apos;s eyes, and the blind spot of each.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>What you can and can&apos;t hide</title><link>https://mixtureofinsights.com/blog/05-what-you-can-and-cant-hide/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/05-what-you-can-and-cant-hide/</guid><description>The full map of how a non-privileged app detects a rooted custom ROM, what closes each channel, and the two walls that nothing in userspace will move.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>DPO when I can&apos;t afford RLHF</title><link>https://mixtureofinsights.com/blog/dpo-when-you-cant-afford-rlhf/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/dpo-when-you-cant-afford-rlhf/</guid><description>RLHF is powerful and heavy — a reward model, an online rollout loop, instability. DPO gets most of the way with a fraction of the machinery. The derivation, the gradient that explains why I use it, and the catch — which is always the data.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Cold-start, then climb</title><link>https://mixtureofinsights.com/blog/cold-start-then-climb/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/cold-start-then-climb/</guid><description>Pure RL from a base model on a hard task mostly produces high-variance garbage — and the policy-gradient math says exactly why. The fix I use is a two-stage recipe: a small SFT cold-start to give the policy a shape, then GRPO to climb. The recipe, the math, and the failure modes that actually bite.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>How Qwen3-TTS makes a frame of sound</title><link>https://mixtureofinsights.com/blog/how-qwen3-tts-makes-a-frame/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/how-qwen3-tts-makes-a-frame/</guid><description>A TTS model isn&apos;t one graph — it&apos;s a small pipeline of graphs with wildly different compute shapes. The key design move in porting Qwen3-TTS to OpenVINO is cutting it at the seams: a talker graph for long-context attention, a cached subcode graph for the rest of each multi-codebook frame, and a chunked streaming decoder.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>The logcat leak</title><link>https://mixtureofinsights.com/blog/03-the-logcat-leak/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/03-the-logcat-leak/</guid><description>You hid the packages and the features. Then you notice fifteen apps quietly holding READ_LOGS — reading the whole device log, where every stray Magisk and lineage string is sitting in plain text.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>A control plane for renting GPUs</title><link>https://mixtureofinsights.com/blog/a-control-plane-for-renting-gpus/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/a-control-plane-for-renting-gpus/</guid><description>The orchestration mess around ephemeral, rented GPUs is the actual bottleneck of model iteration. Here is my bet with ORBIT: treat a run as a reproducible artifact, splitting control from execution.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>A task-agnostic core, and plugins that earn their keep</title><link>https://mixtureofinsights.com/blog/orbit-a-task-agnostic-core/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/orbit-a-task-agnostic-core/</guid><description>I designed ORBIT&apos;s execution core to be completely oblivious to the tasks it runs. By pushing task-specific logic up into plugins, I prevented new tasks from mutating and breaking the executor.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Paged-KV, U8, and batching where vLLM isn&apos;t</title><link>https://mixtureofinsights.com/blog/paged-kv-batching-without-vllm/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/paged-kv-batching-without-vllm/</guid><description>You have the model graphs. Now serve them — long-context, concurrent, inside an iGPU&apos;s memory budget, with none of vLLM&apos;s machinery. Four decisions that compose: paged-KV over fixed buckets, a U8 cache, full-context generation, and online batching that lives in the scheduler so one IR set serves everyone.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>The bundle is the contract</title><link>https://mixtureofinsights.com/blog/orbit-the-bundle-is-the-contract/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/orbit-the-bundle-is-the-contract/</guid><description>When a rented machine evaporates, the only evidence left is what you collected. I enforced a strict directory contract for bundles to ensure exact dependency provenance and runtime observability.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>When the GPU isn&apos;t an NVIDIA</title><link>https://mixtureofinsights.com/blog/when-the-gpu-isnt-an-nvidia/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/when-the-gpu-isnt-an-nvidia/</guid><description>The whole LLM stack assumes CUDA. The GPU in front of you is often an Intel iGPU or a CPU. Getting a real, low-latency autoregressive TTS to stream there means rebuilding the parts you usually pip-install — the decode loop, the KV cache, the batching scheduler — on OpenVINO.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>What am I actually rewarding?</title><link>https://mixtureofinsights.com/blog/what-are-you-rewarding/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/what-are-you-rewarding/</guid><description>RL doesn&apos;t optimize what I want — it optimizes exactly what I wrote down. The gap between the two is reward hacking, and closing it is most of the real work. Verifiers vs reward models, and how a constraint reward earned its +12%.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Self-play, and the games my models teach themselves</title><link>https://mixtureofinsights.com/blog/self-play-and-the-games-models-teach-themselves/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/self-play-and-the-games-models-teach-themselves/</guid><description>There&apos;s no dataset of good game-play. But in a game with a clear outcome, I manufacture one — I let a strong sampler play out games, filter by who won, and the transcripts become the strategy data. How the data engine, the verifier, and emergent strategy all meet, grounded in my GAME pipeline.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Post-training is a data problem</title><link>https://mixtureofinsights.com/blog/post-training-is-a-data-problem/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/post-training-is-a-data-problem/</guid><description>PPO, GRPO, and DPO are commoditized. In my engineering iterations, the only variable that structurally improved alignment was the synthetic data engine.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>The Google Wallet Wall</title><link>https://mixtureofinsights.com/blog/01-the-google-wallet-wall/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/01-the-google-wallet-wall/</guid><description>Play Integrity passes STRONG. Google Wallet still refuses to add a card. Here is why — proven, not guessed — and why it can&apos;t be forced onto an unlocked device.</description><pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate></item><item><title>StockMask: a stock illusion without touching a single app</title><link>https://mixtureofinsights.com/blog/02-stockmask/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/02-stockmask/</guid><description>HideMyApplist hides package names. Apps still detected the custom ROM. The fix was a 200-line module that filters system_server responses by who&apos;s asking — never injecting into the app itself.</description><pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Neovim: yank to the system clipboard (OSC 52)</title><link>https://mixtureofinsights.com/blog/nvim-yank-osc52/</link><guid isPermaLink="true">https://mixtureofinsights.com/blog/nvim-yank-osc52/</guid><description>How I make Neovim&apos;s yank reach the system clipboard over SSH / WSL — utilizing Neovim ≥ 0.10&apos;s native OSC 52 support.</description><pubDate>Sun, 16 Jun 2024 00:00:00 GMT</pubDate></item></channel></rss>