Milkwave · Technical reference

UNDER THE HOOD
Architecture · DSP pipeline · Stack · Constants

How MILKWAVE is built — four distinct layers working together to take raw audio, extract musical meaning from it, and drive a real-time visual renderer at frame rate.

2048
FFT frame size
44.1k
Sample rate Hz
512
Beat hop size
148k
Preset library
<1s
Library index time
01 System architecture

FOUR LAYERS

Each layer has a single responsibility. They communicate over named pipes and a local HTTP sidecar — no shared state, no tight coupling.

1
Control
LAUNCHER
C# · WPF · Windows
Desktop remote and preset browser. Handles scene selection, preset navigation, and inter-process control of the renderer via named pipe and HTTP. The only UI the user sees directly.
2
Render
RENDERER
Python · PyQt6 · ModernGL · OpenGL / GLSL
Executes .milk preset shaders in real time via OpenGL. Beat uniforms — bass level, BPM, transient flags — are injected into each shader frame from the DSP layer, making the visuals physically react to the music.
3
Intelligence
AI SIDECAR
Python · FastAPI · Local LLM
A lightweight HTTP service running alongside the renderer. Receives audio feature data, classifies the current musical mood, queries the SPECTRAFORGE preset index, and instructs the renderer to switch scenes at the right moments.
4
DSP hot path
DSP ENGINE
Zig · C ABI · ctypes
A native DLL that replaces Python for the audio analysis hot path. Runs a 2048-point FFT, IIR band filters, and onset/beat detection at 44.1kHz with a 512-sample hop. Drops into the Python renderer via ctypes — no rewrite needed. Preset library indexing completes in under one second versus 30–45 seconds in pure Python.
02 DSP + AI pipeline

SIGNAL FLOW

Audio enters as raw PCM. By the time it reaches the renderer it has been transformed into a set of per-frame beat uniforms — structured data the GLSL shader can act on directly.

Capture
AUDIO IN
WASAPI loopback captures system audio — whatever is playing, no routing changes needed
44.1 kHz PCM
DSP engine
FFT
2048-point FFT produces the full frequency spectrum. IIR filters split bass, mid, and treble bands
Zig native DLL
Detection
BEAT
Onset detection on a 512-sample hop. Outputs BPM, transient flags, and per-band energy levels
hop 512
AI layer
CLASSIFY
Local LLM reads the audio features, determines musical mood, and queries SPECTRAFORGE for a matching preset
FastAPI sidecar
Renderer
GLSL OUT
Selected .milk shader runs with live beat uniforms injected per frame — bass, BPM, transients warp the visuals in real time
OpenGL
03 Technology stack

BUILT WITH

Each technology was chosen for a specific reason — performance-critical paths use compiled languages, the AI layer uses a local model to avoid latency and keep data on-device.

Systems · DSP
ZIG
Native DSP engine. Chosen for predictable, zero-overhead performance in the audio hot path — FFT, IIR filtering, and beat detection at 44.1kHz without garbage collection pauses.
Desktop · Control
C# / WPF
Windows launcher and remote UI. Handles the preset browser, scene controls, and IPC to the renderer. WPF gives native Windows feel without a heavyweight framework.
Rendering · Glue
PYTHON
PyQt6 window host and ModernGL renderer. Orchestrates the OpenGL context, loads .milk presets as GLSL shaders, and injects beat uniforms per frame from the Zig DSP DLL via ctypes.
Graphics
GLSL
The .milk preset format compiles directly to GLSL fragment shaders. Each preset is a self-contained visual program — the beat uniforms are the only runtime input it receives.
AI · Sidecar
FASTAPI
Lightweight HTTP service bridging the renderer to the AI layer. Receives audio feature payloads, runs mood classification, and returns preset selections with no perceptible latency.
Intelligence
LOCAL LLM
Runs on-device via Ollama. Classifies audio mood from feature vectors and selects presets from the SPECTRAFORGE tag index. On-device keeps response times low and audio data private.
04 Constants & configuration

DSP CONSTANTS

The core signal-processing constants are fixed at design time. They represent a deliberate balance between frequency resolution, temporal resolution, and latency.

ParameterValueNotes
FFT_SIZE2048 samplesFrequency resolution — 21.5 Hz bins at 44.1kHz
SAMPLE_RATE44,100 HzCD-quality audio capture via WASAPI loopback
BEAT_HOP_SIZE512 samples~11.6ms hop — onset detection update rate
Frequency bands3Bass · Mid · Treble — IIR filtered from FFT output
Preset library148,000 files.milk community presets, indexed by SPECTRAFORGE
Index time<1 secondZig pipeline vs 30–45s Python baseline
Preset formatGLSL / .milkMilkDrop shader format — runs unmodified on ModernGL
IPCNamed pipe + HTTPLauncher ↔ renderer · renderer ↔ AI sidecar
Capture methodWASAPI loopbackSystem audio — no cable routing or input switching required
hub