Why QVocalWriter Exists
The idea for this app started taking shape about six months ago.
I usually publish a blog post every month about what I’ve been working on recently. That month, I didn’t feel like writing the post. Instead, I sat down with my phone and dictated it as an audio recording.
My first idea was to hand that audio file directly to ChatGPT and ask it to turn it into a monthly blog post. That didn’t work—ChatGPT wouldn’t touch the raw audio. So I tried another approach: using a local AI model called Whisper, also from OpenAI, to transcribe the recording first.
At the time, I used Whisper through its official Python library: I installed it with pip and wrote a tiny script to generate a transcript. The result was… not great. I was speaking Norwegian, with a dialect Whisper clearly didn’t understand very well. The transcription jumped unpredictably between Norwegian, Swedish, Danish, and even Turkish—sometimes multiple times within a single sentence. It was basically unusable.
Then I added one more line to the program, explicitly telling Whisper that the language was Norwegian. That helped a lot. The output became readable, although it arrived as one giant paragraph—a big textual “blob.”
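For reference, that tiny script looked roughly like this. This is a minimal sketch, assuming the openai-whisper package; the file name, model size, and helper function are illustrative, not the actual code:

```python
# Minimal sketch of a Whisper transcription script, assuming the
# openai-whisper package (pip install openai-whisper). The file name,
# model size, and helper are hypothetical.

def transcribe_args(path, language=None):
    """Build the keyword arguments for Whisper's transcribe() call."""
    args = {"audio": path}
    if language is not None:
        # Forcing the language (e.g. "no" for Norwegian) stops Whisper
        # from re-detecting it per segment and drifting into Swedish,
        # Danish, or Turkish.
        args["language"] = language
    return args

def transcribe(path, language=None):
    import whisper  # heavy import, so kept inside the function
    model = whisper.load_model("small")
    result = model.transcribe(**transcribe_args(path, language))
    return result["text"]

# Usage: print(transcribe("recording.m4a", language="no"))
```

The one-line fix described above corresponds to the `language="no"` argument: without it, Whisper auto-detects the language, which is exactly what produced the multilingual mess.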
I copied that text into ChatGPT and asked it to turn it into a blog post. That part worked surprisingly well. After some processing, I got back a nicely formatted blog post that closely resembled my previous ones. It was written in Markdown, the language was cleaned up, the structure made sense, and the whole thing was translated into English.
For the next several months, I went back to using the keyboard. The audio-based workflow felt too cumbersome: record audio, run a command-line tool to generate a transcript, copy and paste it into ChatGPT, then explain what I wanted it to do. I’m simply too lazy for that kind of process.
Still, I liked the idea.
So I kept thinking about it: how could I simplify this workflow enough that I’d actually want to use it?
The obvious answer was to write a program that did it for me.
In late November and early December, I sat down and built a proof of concept for what eventually became QVocalWriter. Within a few days, I had a working prototype. By the end of December—after working on it on and off—I had a program that actually worked.
One thing that surprised me was how good the local models are. On my workstation, they run quite fast. The small and some medium models perform well on my laptop, which I didn’t expect. I’ve tried running local models using LM Studio before, and that felt painfully slow—but LM Studio is an Electron app, and Electron applications tend to consume a lot of resources.
QVocalWriter is written in C++ using Qt. It’s a traditional desktop application that runs local language models directly. I maintain separate lists of models: one for transcription, one for rewriting, and one for translation. This lets me use much larger and more powerful models on my workstation, while still having a perfectly usable setup on my laptop.
For short-form tasks—like writing a Reddit post, a LinkedIn update, or an email—this workflow is now 100% local. When it comes to longer documents like blog posts, the local models are powerful enough to handle formatting and restructuring correctly, but they tend to shorten the text a bit too aggressively.
So my current workflow looks like this: I use QVocalWriter to transcribe the audio, then press a button to copy the result. From there, I paste it into ChatGPT and ask it to format the final blog post. That’s good enough for now.
It is tempting to let the program call ChatGPT or other cloud-based models directly, so I could choose between local and cloud models inside the app. But I’m not sure I want to do that. It would create an incentive to rely on cloud models, whereas I find local models far more interesting—both technically and philosophically.
Anyway, that’s the background story of the app.
Over the next few days, I plan to continue this blog series and share more experiences from building QVocalWriter. Based on how ChatGPT behaved during this project, I suspect that not many people are using large language models directly from C++ applications. I ran into several surprises along the way—but I’ll get back to those later.
For now, I’ll just say this: it’s been a genuinely fun project to work on, and I hope others will find it interesting too.