MILKWAVE — Technical

Milkwave · Technical reference

UNDER THE HOOD

Architecture · DSP pipeline · Stack · Constants

How MILKWAVE is built — four distinct layers working together to take raw audio, extract musical meaning from it, and drive a real-time visual renderer at frame rate.

2048

FFT frame size

44.1k

Sample rate Hz

512

Beat hop size

148k

Preset library

<1s

Library index time

01 System architecture

FOUR LAYERS

Each layer has a single responsibility. They communicate over named pipes and a local HTTP sidecar — no shared state, no tight coupling.

Control

LAUNCHER

C# · WPF · Windows

Desktop remote and preset browser. Handles scene selection, preset navigation, and inter-process control of the renderer via named pipe and HTTP. The only UI the user sees directly.

Render

RENDERER

Python · PyQt6 · ModernGL · OpenGL / GLSL

Executes .milk preset shaders in real time via OpenGL. Beat uniforms — bass level, BPM, transient flags — are injected into each shader frame from the DSP layer, making the visuals physically react to the music.

Intelligence

AI SIDECAR

Python · FastAPI · Local LLM

A lightweight HTTP service running alongside the renderer. Receives audio feature data, classifies the current musical mood, queries the SPECTRAFORGE preset index, and instructs the renderer to switch scenes at the right moments.

DSP hot path

DSP ENGINE

Zig · C ABI · ctypes

A native DLL that replaces Python for the audio analysis hot path. Runs a 2048-point FFT, IIR band filters, and onset/beat detection at 44.1kHz with a 512-sample hop. Drops into the Python renderer via ctypes — no rewrite needed. Preset library indexing completes in under one second versus 30–45 seconds in pure Python.

02 DSP + AI pipeline

SIGNAL FLOW

Audio enters as raw PCM. By the time it reaches the renderer it has been transformed into a set of per-frame beat uniforms — structured data the GLSL shader can act on directly.

Capture

AUDIO IN

WASAPI loopback captures system audio — whatever is playing, no routing changes needed

44.1 kHz PCM

›

DSP engine

FFT

2048-point FFT produces the full frequency spectrum. IIR filters split bass, mid, and treble bands

Zig native DLL

›

Detection

BEAT

Onset detection on a 512-sample hop. Outputs BPM, transient flags, and per-band energy levels

hop 512

›

AI layer

CLASSIFY

Local LLM reads the audio features, determines musical mood, and queries SPECTRAFORGE for a matching preset

FastAPI sidecar

›

Renderer

GLSL OUT

Selected .milk shader runs with live beat uniforms injected per frame — bass, BPM, transients warp the visuals in real time

OpenGL

03 Technology stack

BUILT WITH

Each technology was chosen for a specific reason — performance-critical paths use compiled languages, the AI layer uses a local model to avoid latency and keep data on-device.

Systems · DSP

ZIG

Native DSP engine. Chosen for predictable, zero-overhead performance in the audio hot path — FFT, IIR filtering, and beat detection at 44.1kHz without garbage collection pauses.

Desktop · Control

C# / WPF

Windows launcher and remote UI. Handles the preset browser, scene controls, and IPC to the renderer. WPF gives native Windows feel without a heavyweight framework.

Rendering · Glue

PYTHON

PyQt6 window host and ModernGL renderer. Orchestrates the OpenGL context, loads .milk presets as GLSL shaders, and injects beat uniforms per frame from the Zig DSP DLL via ctypes.

Graphics

GLSL

The .milk preset format compiles directly to GLSL fragment shaders. Each preset is a self-contained visual program — the beat uniforms are the only runtime input it receives.

AI · Sidecar

FASTAPI

Lightweight HTTP service bridging the renderer to the AI layer. Receives audio feature payloads, runs mood classification, and returns preset selections with no perceptible latency.

Intelligence

LOCAL LLM

Runs on-device via Ollama. Classifies audio mood from feature vectors and selects presets from the SPECTRAFORGE tag index. On-device keeps response times low and audio data private.

04 Constants & configuration

DSP CONSTANTS

The core signal-processing constants are fixed at design time. They represent a deliberate balance between frequency resolution, temporal resolution, and latency.

Parameter	Value	Notes
FFT_SIZE	2048 samples	Frequency resolution — 21.5 Hz bins at 44.1kHz
SAMPLE_RATE	44,100 Hz	CD-quality audio capture via WASAPI loopback
BEAT_HOP_SIZE	512 samples	~11.6ms hop — onset detection update rate
Frequency bands	3	Bass · Mid · Treble — IIR filtered from FFT output
Preset library	148,000 files	.milk community presets, indexed by SPECTRAFORGE
Index time	<1 second	Zig pipeline vs 30–45s Python baseline
Preset format	GLSL / .milk	MilkDrop shader format — runs unmodified on ModernGL
IPC	Named pipe + HTTP	Launcher ↔ renderer · renderer ↔ AI sidecar
Capture method	WASAPI loopback	System audio — no cable routing or input switching required