Milkwave · Technical reference
How MILKWAVE is built: four distinct layers work together to take raw audio, extract musical meaning from it, and drive a real-time visual renderer at frame rate.
Each layer has a single responsibility. They communicate over named pipes and HTTP calls to a local sidecar process, with no shared state and no tight coupling.
Audio enters as raw PCM. By the time it reaches the renderer it has been transformed into a set of per-frame beat uniforms — structured data the GLSL shader can act on directly.
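As a rough illustration of what "per-frame beat uniforms" could look like, here is a minimal Python sketch. The class name `BeatUniforms`, its field names, the `advance` helper, and the `decay` factor are all assumptions made for this example; they are not taken from the MILKWAVE source.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BeatUniforms:
    """Hypothetical per-frame payload; the field names are illustrative only."""
    bass: float    # low-band energy, assumed normalised to 0..1
    mid: float     # mid-band energy
    treble: float  # high-band energy
    beat: float    # 1.0 on a detected onset, decaying toward 0.0 between onsets


def advance(prev: BeatUniforms, bass: float, mid: float, treble: float,
            onset: bool, decay: float = 0.9) -> BeatUniforms:
    """Build the next frame's uniforms from new band energies and an onset flag."""
    # Assumed behaviour: the beat pulse snaps to 1.0 on an onset, else fades.
    beat = 1.0 if onset else prev.beat * decay
    return BeatUniforms(bass=bass, mid=mid, treble=treble, beat=beat)


def as_uniform_dict(u: BeatUniforms) -> dict:
    """Flatten to name -> float, a shape a renderer could upload as uniforms."""
    return asdict(u)
```

A frame loop would call `advance` once per rendered frame and hand the resulting dictionary to whatever code sets the shader's uniforms.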
Each technology was chosen for a specific reason: performance-critical paths use compiled languages, and the AI layer uses a local model to avoid network latency and keep data on-device.
The core signal-processing constants are fixed at design time. They represent a deliberate balance between frequency resolution, temporal resolution, and latency.
| Parameter | Value | Notes |
|---|---|---|
| FFT_SIZE | 2048 samples | Frequency resolution: ~21.5 Hz per bin at 44.1 kHz |
| SAMPLE_RATE | 44,100 Hz | CD-quality audio capture via WASAPI loopback |
| BEAT_HOP_SIZE | 512 samples | ~11.6 ms hop; sets the onset detection update rate |
| Frequency bands | 3 | Bass · Mid · Treble, IIR-filtered from FFT output |
| Preset library | 148,000 files | .milk community presets, indexed by SPECTRAFORGE |
| Index time | <1 second | Zig pipeline vs 30–45 s Python baseline |
| Preset format | GLSL / .milk | MilkDrop shader format — runs unmodified on ModernGL |
| IPC | Named pipe + HTTP | Launcher ↔ renderer · renderer ↔ AI sidecar |
| Capture method | WASAPI loopback | System audio — no cable routing or input switching required |
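
The derived figures in the table follow directly from the three signal-processing constants. The sketch below reproduces that arithmetic and shows one way the three bands could be mapped onto FFT bins and smoothed. The band edges in `BAND_EDGES_HZ`, the `band_bins` and `smooth` helpers, and the smoothing factor `alpha` are assumptions for illustration; the source does not state the actual crossover frequencies or filter coefficients.

```python
import math

# Constants taken from the table above.
FFT_SIZE = 2048
SAMPLE_RATE = 44_100
BEAT_HOP_SIZE = 512

bin_width_hz = SAMPLE_RATE / FFT_SIZE         # about 21.5 Hz per bin
hop_ms = 1000 * BEAT_HOP_SIZE / SAMPLE_RATE   # about 11.6 ms between updates
window_ms = 1000 * FFT_SIZE / SAMPLE_RATE     # about 46.4 ms of audio per FFT
overlap = 1 - BEAT_HOP_SIZE / FFT_SIZE        # 0.75: consecutive windows share 75%
num_bins = FFT_SIZE // 2 + 1                  # 1025 bins for a real-input FFT

# ASSUMED crossover frequencies; not specified in the source.
BAND_EDGES_HZ = {
    "bass": (20, 250),
    "mid": (250, 4000),
    "treble": (4000, 20000),
}


def band_bins(lo_hz: float, hi_hz: float) -> tuple:
    """Return the inclusive (first, last) FFT bin indices covering a band."""
    first = math.ceil(lo_hz / bin_width_hz)
    last = min(math.floor(hi_hz / bin_width_hz), num_bins - 1)
    return first, last


def smooth(prev: float, current: float, alpha: float = 0.3) -> float:
    """One-pole IIR low-pass: move a fraction alpha toward the new value."""
    return prev + alpha * (current - prev)
```

The trade-off the constants encode is visible here: a larger `FFT_SIZE` would narrow `bin_width_hz` but lengthen `window_ms`, while a smaller `BEAT_HOP_SIZE` would update onset detection more often at the cost of more FFTs per second.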