AI-Powered-Meeting-Assistant

01/2024 — 02/2024

Meeting intelligence, Actionable summaries

The Problem

Manual note-taking consumed significant time in meetings and still missed key action items and decisions that required follow-up.

The Solution

Built a meeting intelligence platform that transcribes live audio and generates actionable summaries using OpenAI language models. Real-time speech-to-text processing runs on a Python and Flask backend, reducing the time spent on manual note-taking.

Impact

Automated meeting transcription and summarization reduced manual documentation overhead.

Architecture

Live audio from the meeting is captured via the browser's MediaRecorder API and sent to a Python Flask backend in 30-second chunks. The Whisper API transcribes each chunk with speaker diarization hints. Transcripts are accumulated in a session store and processed by an OpenAI summarization pipeline at meeting end in two passes: the first extracts action items with a structured prompt, and the second generates a narrative summary. Both outputs are returned to the frontend as a formatted report.
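The accumulate-then-summarize flow described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `MeetingSession`, `build_report`, and the two injected callables are hypothetical names, and the callables stand in for the OpenAI calls so the sketch runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MeetingSession:
    """Hypothetical in-memory session store for one meeting's chunk transcripts."""

    chunks: dict[int, str] = field(default_factory=dict)

    def add_chunk(self, index: int, text: str) -> None:
        # Key by chunk index so a late or retried upload lands in the right slot.
        self.chunks[index] = text.strip()

    def full_transcript(self) -> str:
        # Join the non-empty chunks in chunk order.
        ordered = (self.chunks[i] for i in sorted(self.chunks))
        return " ".join(text for text in ordered if text)


def build_report(
    session: MeetingSession,
    extract_action_items: Callable[[str], list[str]],
    summarize: Callable[[str], str],
) -> dict:
    """Run the two end-of-meeting passes and bundle both outputs."""
    transcript = session.full_transcript()
    # Pass 1: action items (assumed to wrap a structured-prompt LLM call).
    action_items = extract_action_items(transcript)
    # Pass 2: narrative summary (assumed to wrap a second LLM call).
    summary = summarize(transcript)
    return {"action_items": action_items, "summary": summary}
```

In a real deployment the Flask route that receives each audio chunk would call `add_chunk` with the transcribed text, and `build_report` would run once when the meeting ends. Keeping the model calls behind plain callables also makes the pipeline easy to test with stubs.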

Key Challenges

Whisper transcription accuracy dropped significantly with overlapping speakers and background noise. Added a pre-processing step that uses the Web Audio API to apply noise suppression before audio chunks are sent, which noticeably improved transcription accuracy on noisy calls.
Processing 30-second audio chunks created visible gaps in the live transcript display. Solved by overlapping consecutive chunks by 3 seconds and deduplicating the overlap region by comparing the word-level timestamps in Whisper's verbose output.
Action item extraction was unreliable when meeting topics shifted mid-conversation. Switched from a single end-of-meeting prompt to a rolling summarization approach that summarizes each 10-minute window and then merges the window summaries, which preserved context across topic changes.
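The overlap deduplication and the rolling summarization described above might look something like the sketch below. It rests on several assumptions: each word is a dict with `word`, `start`, and `end` keys (seconds relative to its chunk), each chunk carries its start offset within the meeting, and `summarize` and `merge` are hypothetical stand-ins for the LLM calls. The function names and the `tolerance` parameter are illustrative only.

```python
from typing import Callable


def merge_chunks(chunks: list[tuple[float, list[dict]]], tolerance: float = 0.05) -> list[dict]:
    """Merge overlapping chunks into one word list on the meeting timeline.

    `chunks` is a list of (chunk_start_seconds, words) in meeting order.
    """
    merged: list[dict] = []
    last_end = 0.0
    for chunk_start, words in chunks:
        for w in words:
            # Convert chunk-relative timestamps to absolute meeting time.
            abs_start = chunk_start + w["start"]
            abs_end = chunk_start + w["end"]
            # A word that starts before the last kept word ended falls in the
            # overlap region and was already emitted by the previous chunk.
            if merged and abs_start < last_end - tolerance:
                continue
            merged.append({"word": w["word"], "start": abs_start, "end": abs_end})
            last_end = abs_end
    return merged


def rolling_summary(
    words: list[dict],
    summarize: Callable[[str], str],
    merge: Callable[[list[str]], str],
    window_seconds: float = 600.0,
) -> str:
    """Summarize each fixed-length window, then merge the partial summaries."""
    windows: dict[int, list[str]] = {}
    for w in words:
        # Bucket each word by the window its start time falls into.
        windows.setdefault(int(w["start"] // window_seconds), []).append(w["word"])
    partials = [summarize(" ".join(windows[k])) for k in sorted(windows)]
    return merge(partials)
```

Deduplicating on timestamps rather than on text avoids dropping a word that is legitimately repeated, at the cost of trusting that the timestamps from adjacent chunks line up to within `tolerance`.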

Key Learnings

Audio quality is the ceiling for any speech-to-text pipeline: no amount of prompt engineering compensates for poor input audio. I also found that action item extraction is a harder NLP task than summary generation, because humans rely heavily on tone and implicit commitment signals that LLMs miss unless they are explicitly instructed to look for them.

Technologies

Python, Flask, Speech-to-Text, NLP, Whisper API, OpenAI

Links

View Source Code