Glad to share the first operational revision; the QB64 source and the Windows executable (64-bit, built with -O3 -mavx2) are in the attached file.
To play a WAV/MP3, just pass the file name as a parameter on the command line. Use quotes if the name contains spaces.
My i5-7200U struggled big time when I raised the FFT size from 1024 to 4096 samples; at 8192 the tearing is nasty.
For the next revisions, my idea is to precalculate the magnitudes for all the 8192-sample chunks and then JUST draw them.
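A minimal sketch of that precalculation idea, written in Python rather than QB64 and not taken from the attached source. It assumes non-overlapping chunks and uses a plain recursive radix-2 FFT; the names `fft` and `precompute_magnitudes` and the small synthetic test signal are made up for illustration.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def precompute_magnitudes(samples, chunk):
    """Return one list of chunk // 2 magnitudes per non-overlapping chunk.

    Assumption: chunks do not overlap; a real visualizer may slide the
    window by less than a full chunk per frame.
    """
    frames = []
    for start in range(0, len(samples) - chunk + 1, chunk):
        spectrum = fft(samples[start:start + chunk])
        # Only the first half of the spectrum is unique for real input.
        frames.append([abs(c) for c in spectrum[:chunk // 2]])
    return frames

# Synthetic signal (assumption): 1000 Hz sine sampled at 8000 Hz.
RATE, CHUNK = 8000, 64
signal = [math.sin(2 * math.pi * 1000 * i / RATE) for i in range(CHUNK * 4)]

# All the heavy work happens once, before playback starts.
frames = precompute_magnitudes(signal, CHUNK)

# At draw time each frame is just a lookup; here we only find its peak bin.
peak_bin = max(range(CHUNK // 2), key=lambda k: frames[0][k])
```

With the table built up front, the draw loop only has to index `frames` by playback position, so the per-frame cost no longer depends on the FFT size.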
Also, I ran into a nasty noise-like problem in the decoded stream (bug?!): many sequences with unnaturally high values make the visual output blink badly. What is this, anyone?!
EDIT, 2021-Dec-29:
The bug is fixed; revision 2 is now fully functional. I hate it when I don't have enough time to finish something. Anyway, found it: the problem was reading the decompressed stream at offsets not divisible by 2, duh.
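To illustrate why an odd offset produces such spikes, here is a small Python sketch. It assumes the decoded stream is signed 16-bit little-endian PCM, which the post does not state explicitly; `read_samples` is a hypothetical helper, not a function from the attached source.

```python
import struct

# Four quiet samples, packed as signed 16-bit little-endian PCM (assumption).
samples = [100, -100, 50, -50]
raw = struct.pack("<4h", *samples)

def read_samples(buf, offset, count):
    """Decode `count` signed 16-bit little-endian samples starting at byte `offset`."""
    return list(struct.unpack_from("<%dh" % count, buf, offset))

# Even offset: each value is built from its own low and high byte.
aligned = read_samples(raw, 0, 4)

# Odd offset: each value pairs the high byte of one sample with the
# low byte of the next, so quiet audio decodes as huge values.
misaligned = read_samples(raw, 1, 3)

print(aligned)     # [100, -100, 50, -50]
print(misaligned)  # [-25600, 13055, -12800]
```

Keeping every read offset a multiple of the sample size (2 bytes here, or 4 for a 16-bit stereo frame) avoids the problem.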
I tested e-pi-ALTES with Japanese narration (Sekirei, Ninja Resurrection) and the visualization is as it should be: the vocal range is well presented, with some nice bell shapes in the cyan zone (1 kHz to 5 kHz). To me, a frequency-magnitude visualizer that fails to show a RICH presence in the vocal range is of no use.
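For a rough feel of how many FFT bins fall into that 1 kHz to 5 kHz zone, here is a small Python calculation. It assumes a 44100 Hz sample rate, which the post does not state, and `bin_range` is a hypothetical helper for illustration only.

```python
import math

SAMPLE_RATE = 44100  # assumption: CD-quality audio

def bin_range(fft_size, low_hz, high_hz, rate=SAMPLE_RATE):
    """Return (first, last) FFT bin whose centre frequency lies in [low_hz, high_hz].

    Bin k of an N-point FFT is centred at k * rate / N Hz.
    """
    first = math.ceil(low_hz * fft_size / rate)
    last = math.floor(high_hz * fft_size / rate)
    return first, last

for size in (128, 256, 512, 1024, 2048, 4096, 8192):
    first, last = bin_range(size, 1000, 5000)
    print(size, "samples:", round(SAMPLE_RATE / size, 2), "Hz per bin, bins",
          first, "to", last, "=", last - first + 1, "bins")
```

Under this assumption the zone covers only bins 3 to 14 at 128 samples but bins 186 to 928 at 8192 samples, which is why the larger chunks resolve the vocal range in much finer detail.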
Visualizing the male vocal of the demon advisor Mori:
You can press Alt+Enter to toggle between fullscreen and windowed mode. On my i5-7200U the transform speed is 25 frames per second for the largest FFT chunk, 8192 samples; the chunk size can be 128, 256, 512, 1024, 2048, 4096, or 8192:
As always, the source code is in the attached package, along with 32-bit and 64-bit executables for Windows.
Enjoy!