#webdev #javascript #typescript #productivity

I built a voice journaling app in one night — here's what I learned

Jacobo · March 10, 2026 · 5 min read

Every few months I go through the same cycle: buy a nice notebook, write in it for three days, stop, feel vaguely guilty about the empty pages for six months. Typed apps are worse — you open them and immediately start performing for some imagined future reader. Your sentences get clean and corporate. You start writing “I felt somewhat frustrated with the scope of the project” instead of “god this meeting was a waste of my life.”

The actual problem isn't willpower. It's that typing creates a feedback loop with your editor brain. You're simultaneously feeling something and writing about it, and your editor ruins both.

Voice is different. When you talk, you're not editing. You're just talking.

That's the whole insight behind Voice Journal.

The core loop

The flow is simple: hit record → talk freely for 30 seconds to 5 minutes → stop → get back a structured entry with transcription, detected mood, themes, and a single insight about your patterns.

Three technologies make this work:

  • MediaRecorder API — captures audio in the browser, no native app required
  • Whisper — transcribes with genuinely impressive accuracy
  • Claude — extracts structure from the raw transcription

The recording setup is about 40 lines:

const startRecording = async () => {
  // Prompt for microphone access; this throws if the user denies it.
  const stream = await navigator.mediaDevices
    .getUserMedia({ audio: true });

  const recorder = new MediaRecorder(stream, {
    mimeType: 'audio/webm'
  });
  const chunks: BlobPart[] = [];

  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = async () => {
    const blob = new Blob(chunks, { type: 'audio/webm' });
    await processEntry(blob);
    // Release the mic so the browser's recording indicator goes away.
    stream.getTracks().forEach(t => t.stop());
  };

  recorder.start();
  setRecorder(recorder); // React state, so the stop button can reach it
};
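The `processEntry` function called above isn't shown in the post; here's a minimal sketch of what it might look like. The endpoint names (`/api/transcribe`, `/api/analyze`) and the response shapes are my assumptions, not the app's actual API:

```typescript
// Hypothetical backend routes — illustrative names, not from the post.
const TRANSCRIBE_URL = '/api/transcribe'; // proxies to Whisper
const ANALYZE_URL = '/api/analyze';       // calls Claude with the prompt below

// Package the recorded audio as multipart form data, which is
// what a Whisper-backed endpoint typically expects.
const buildAudioForm = (blob: Blob): FormData => {
  const form = new FormData();
  form.append('file', blob, 'entry.webm');
  return form;
};

// Transcribe the audio, then feed the transcript to the analysis step.
const processEntry = async (blob: Blob) => {
  const transcribeRes = await fetch(TRANSCRIBE_URL, {
    method: 'POST',
    body: buildAudioForm(blob),
  });
  const { transcription } = await transcribeRes.json();

  const analyzeRes = await fetch(ANALYZE_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ transcription }),
  });
  return analyzeRes.json(); // assumed shape: { mood, themes, insight }
};
```

Keeping the API keys behind these two routes (rather than calling Whisper and Claude from the browser) is the main reason to have a backend at all here.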

The insight extraction prompt is where I spent most of my time. The naive version — “extract mood and themes” — gives you garbage: output overfitted to the words, not the meaning.

const prompt = `You are analyzing a personal voice 
journal entry. The person was NOT writing — they 
were speaking freely.

Transcription:
${transcription}

Extract:
1. MOOD: one word, lowercase
2. THEMES: 2-4 keywords
3. INSIGHT: one sentence. NOT a summary. A pattern 
   observation. Reference subtext, not just what 
   was said explicitly.

Return JSON only.`;

The “NOT a summary” instruction was the unlock. Without it, the AI just paraphrased the transcription. Insight means pattern, not recap.
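Even with “Return JSON only,” models sometimes wrap the object in prose or markdown fences, so the response is worth parsing defensively. A sketch — the field shape matches the prompt above, but the helper and its error handling are mine:

```typescript
interface EntryAnalysis {
  mood: string;
  themes: string[];
  insight: string;
}

// Pull the first JSON object out of the model's reply and validate its shape.
const parseAnalysis = (raw: string): EntryAnalysis => {
  // Grab everything from the first '{' to the last '}', ignoring any
  // surrounding prose or code fences the model may have added.
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) throw new Error('No JSON object in model output');
  const data = JSON.parse(match[0]);
  if (
    typeof data.mood !== 'string' ||
    !Array.isArray(data.themes) ||
    typeof data.insight !== 'string'
  ) {
    throw new Error('Unexpected analysis shape');
  }
  return {
    mood: data.mood.toLowerCase(), // enforce the "one word, lowercase" rule
    themes: data.themes.map(String),
    insight: data.insight,
  };
};
```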

What surprised me

Whisper is scary good. I expected 80% accuracy and planned to add a “correct transcript” step. Didn't need it. The transcriptions are clean enough that downstream AI analysis works on the first pass.

People talk longer than you'd expect. I designed around 30-second entries. Testing it myself, I kept going too — turns out when you're not typing, you don't stop at a paragraph break. There is no paragraph break. You just keep talking.

The insight layer is the whole product. The transcription is table stakes. The mood and themes are nice. But that one-sentence insight — “You mentioned wanting to slow down 3 times this week” — that's the thing that makes people want to come back. That's the part that feels alive.

What I'd build next

Cross-entry pattern detection is the obvious next step. Right now each entry is analyzed in isolation. The interesting stuff happens when you look across 30 days and surface: “You're most restless on Sunday nights.”
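A first pass at cross-entry detection doesn't even need a model — counting recurring themes across a window of entries gets you surprisingly far. A sketch, assuming entries are stored with the themes extracted earlier (the `JournalEntry` shape and function name are mine):

```typescript
interface JournalEntry {
  date: string;     // ISO date, e.g. '2026-03-10'
  themes: string[];
}

// Count how many entries each theme appears in. Themes that recur are
// candidates for an insight like "you mentioned X three times this week".
const recurringThemes = (
  entries: JournalEntry[],
  minCount = 2,
): Array<{ theme: string; count: number }> => {
  const counts = new Map<string, number>();
  for (const entry of entries) {
    // De-duplicate within an entry so one rambling session doesn't dominate.
    for (const theme of new Set(entry.themes)) {
      counts.set(theme, (counts.get(theme) ?? 0) + 1);
    }
  }
  return [...counts.entries()]
    .filter(([, count]) => count >= minCount)
    .map(([theme, count]) => ({ theme, count }))
    .sort((a, b) => b.count - a.count);
};
```

The day-of-week version (“restless on Sunday nights”) would group by `new Date(entry.date).getDay()` and compare mood distributions per bucket — same idea, one more dimension.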

I'd also add a weekly summary email — nothing pushy, just a quiet digest of your emotional weather from the past week. And offline mode — there's something philosophically right about a private journal not hitting a server.

Try it yourself →

If you've ever bought a journal you didn't keep, or typed in a journaling app and felt like you were writing a LinkedIn post about your own feelings — this was made for you.

voice-journal-jade.vercel.app →