Manual meeting note-taking consumed significant time while missing key action items and decisions requiring follow-up.
Built a meeting intelligence platform that transcribes live audio and generates actionable summaries with LLM-based NLP models. Implemented chunked, near-real-time speech-to-text processing with Python and Flask, reducing the time spent on manual note-taking.
Intelligent meeting transcription and summarization reducing manual documentation overhead.
Live meeting audio is captured in the browser with the MediaRecorder API and sent to a Python Flask backend in 30-second chunks. The Whisper API transcribes each chunk with speaker diarization hints. Transcripts accumulate in a session store, and at meeting end an OpenAI summarization pipeline processes them in two passes: the first extracts action items with a structured prompt, and the second generates a narrative summary. Both outputs are returned to the frontend as a formatted report.
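The flow above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `MeetingSession`, `handle_chunk`, and `finish_meeting` are hypothetical names, and the `transcribe` and `complete` callables stand in for the Whisper and OpenAI calls so the sketch runs without network access or API keys. The prompt strings are placeholders as well.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class MeetingSession:
    """Hypothetical in-memory session store: chunk transcripts keyed by index."""
    chunks: Dict[int, str] = field(default_factory=dict)

    def add_chunk(self, index: int, text: str) -> None:
        # Uploads can arrive out of order, so store by chunk index
        # rather than appending to a list.
        self.chunks[index] = text.strip()

    def full_transcript(self) -> str:
        # Join in chunk order, skipping chunks that transcribed to nothing.
        ordered = (self.chunks[i] for i in sorted(self.chunks))
        return " ".join(text for text in ordered if text)


def handle_chunk(session: MeetingSession, index: int, audio: bytes,
                 transcribe: Callable[[bytes], str]) -> None:
    """Transcribe one 30-second chunk and add it to the session."""
    session.add_chunk(index, transcribe(audio))


def finish_meeting(session: MeetingSession,
                   complete: Callable[[str], str]) -> Dict[str, str]:
    """Run the two passes at meeting end: action items, then summary."""
    transcript = session.full_transcript()
    # Placeholder prompts (assumptions); the real structured prompt is longer.
    action_items = complete(
        "Extract the action items from this meeting transcript:\n\n" + transcript
    )
    summary = complete(
        "Write a narrative summary of this meeting transcript:\n\n" + transcript
    )
    return {"action_items": action_items, "summary": summary}
```

In a Flask app, `handle_chunk` would sit behind the upload route and `finish_meeting` behind an end-of-meeting route. Passing the model calls in as callables also makes the pipeline easy to test with fakes.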
Audio quality is the ceiling for any speech-to-text pipeline — no amount of prompt engineering compensates for poor input audio. I also found that action item extraction is a harder NLP task than summary generation: humans rely heavily on tone and implicit commitment signals that LLMs miss without explicit instructions to look for them.
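To illustrate what "explicit instructions" can look like, here is a hedged sketch of an action-item prompt that names implicit commitment phrases, plus a defensive parser for the model's reply. The prompt wording, the JSON field names (`task`, `owner`, `due`), and both function names are assumptions made for this example, not the prompt used in the project.

```python
import json
from typing import Dict, List

# Hypothetical instruction text; the example phrases are illustrative only.
ACTION_ITEM_INSTRUCTIONS = (
    "You are extracting action items from a meeting transcript.\n"
    "Include implicit commitments, not only explicit assignments. "
    "Phrases such as \"I'll take that\", \"let me look into it\", or "
    "\"we should circle back\" usually signal a commitment.\n"
    "Reply with a JSON array only. Each element is an object with the keys "
    "\"task\", \"owner\", and \"due\". Use null when the owner or due date "
    "is not stated.\n\n"
    "Transcript:\n"
)


def build_action_item_prompt(transcript: str) -> str:
    """Prepend the explicit instructions to the accumulated transcript."""
    return ACTION_ITEM_INSTRUCTIONS + transcript


def parse_action_items(raw: str) -> List[Dict[str, str]]:
    """Parse the model reply, tolerating malformed or partial output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # The model did not return valid JSON.
    if not isinstance(data, list):
        return []
    items = []
    for entry in data:
        # Keep only entries that actually describe a task.
        if isinstance(entry, dict) and entry.get("task"):
            items.append({
                "task": str(entry["task"]),
                "owner": str(entry.get("owner") or "unassigned"),
                "due": str(entry.get("due") or "unspecified"),
            })
    return items
```

Requesting a fixed JSON shape keeps the extraction pass machine-checkable, and a parser that returns an empty list on bad output lets the report still render when the model drifts from the requested format.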