Multimodal-AI RAG App

05/2023 — 07/2023
-15% retrieval latency | Multi-format support

The Problem

Organizations needed to query multi-format documents with fast retrieval and contextual reasoning across different file types.

The Solution

Developed a full-stack RAG application with FastAPI and React.js. Implemented a hybrid search pipeline using Pinecone and ChromaDB, reducing retrieval latency by 15% through parallel store queries and optimized metadata filtering. Integrated the Google Gemini API for contextual reasoning.

Impact

-15% retrieval latency for multi-format document queries with hybrid search.

Architecture

User-uploaded files (PDF, image, text) are routed through a file-type dispatcher: PyMuPDF handles native PDFs, and Tesseract OCR handles scanned images. Extracted text is chunked with overlap, embedded via Google Gemini embeddings, and stored in both Pinecone (cloud vector search) and ChromaDB (local hybrid retrieval). At query time, both stores are queried in parallel using asyncio, the results are merged with Reciprocal Rank Fusion, and the top-k chunks are passed to Gemini for contextual answer generation.
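The query-time flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `query_pinecone` and `query_chroma` are hypothetical stand-ins that return fixed ID lists instead of calling the real clients, and the RRF constant `k = 60` is a commonly used default assumed here, not a value taken from the project.

```python
import asyncio
from collections import defaultdict


# Hypothetical stand-ins for the two vector stores. Real code would call the
# Pinecone and ChromaDB clients here instead of returning fixed lists.
async def query_pinecone(query: str, top_k: int) -> list[str]:
    await asyncio.sleep(0.05)  # simulate network latency
    return ["chunk-a", "chunk-b", "chunk-c"][:top_k]


async def query_chroma(query: str, top_k: int) -> list[str]:
    await asyncio.sleep(0.05)  # simulate local lookup latency
    return ["chunk-b", "chunk-d", "chunk-a"][:top_k]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each list adds 1 / (k + rank) to an ID's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


async def retrieve(query: str, top_k: int = 3) -> list[str]:
    # Both stores are awaited concurrently, so total latency is roughly the
    # slower of the two calls rather than their sum.
    pinecone_hits, chroma_hits = await asyncio.gather(
        query_pinecone(query, top_k),
        query_chroma(query, top_k),
    )
    return reciprocal_rank_fusion([pinecone_hits, chroma_hits])[:top_k]


if __name__ == "__main__":
    print(asyncio.run(retrieve("example question")))
```

Because RRF works on ranks rather than raw similarity scores, the two stores do not need comparable score scales, which is one reason it suits merging results from different backends.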

Key Challenges

  • Running Pinecone and ChromaDB queries sequentially was the main latency bottleneck. Parallelizing both queries with asyncio.gather() and merging results using Reciprocal Rank Fusion reduced p95 retrieval latency by 15% — the single biggest optimization in the project.
  • Multi-format parsing was inconsistent: PyMuPDF returned empty text for PDF pages that consisted of embedded images. Built a fallback pipeline that measures page-level text density and automatically routes low-density pages through Tesseract OCR.
  • Gemini's context window filled up quickly with large documents. Implemented a token budget cap per query that scores and ranks retrieved chunks by relevance, truncating lowest-scoring chunks first while preserving the most semantically relevant context.
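The OCR fallback from the second challenge could look something like the sketch below. It assumes "text density" can be approximated by the count of non-whitespace characters on a page. The threshold `MIN_CHARS_PER_PAGE` and the `ocr_page` callback are hypothetical names introduced for illustration, and the real pipeline may measure density differently.

```python
from typing import Callable

# Assumed threshold: pages with fewer non-whitespace characters than this are
# treated as scanned or image-only. A real value would need tuning.
MIN_CHARS_PER_PAGE = 50


def needs_ocr(page_text: str, min_chars: int = MIN_CHARS_PER_PAGE) -> bool:
    """Return True when a page's native text layer looks too sparse."""
    dense_chars = sum(1 for ch in page_text if not ch.isspace())
    return dense_chars < min_chars


def extract_pages(
    native_texts: list[str],
    ocr_page: Callable[[int], str],
) -> list[str]:
    """Keep native text where it is dense enough; otherwise fall back to OCR.

    native_texts holds the text layer for each page, and ocr_page is a
    hypothetical callback that runs OCR on the page at the given index.
    """
    return [
        ocr_page(index) if needs_ocr(text) else text
        for index, text in enumerate(native_texts)
    ]
```

In a real pipeline, the native text would plausibly come from PyMuPDF's `page.get_text()` and the callback would wrap a Tesseract call such as `pytesseract.image_to_string` on a rendered page image. Passing OCR in as a callback keeps the routing decision testable without either library installed.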
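The token budget cap from the third challenge might be sketched as follows. The `Chunk` type and the four-characters-per-token estimate are assumptions made for illustration; a production version would count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # relevance score from retrieval; higher is better


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. A real system would
    # ask the model's tokenizer for an exact count.
    return max(1, len(text) // 4)


def fit_to_budget(chunks: list[Chunk], max_tokens: int) -> list[Chunk]:
    """Keep the highest-scoring chunks whose combined size fits max_tokens."""
    kept: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = estimate_tokens(chunk.text)
        if used + cost > max_tokens:
            break  # every remaining chunk scores lower, so drop the rest
        kept.append(chunk)
        used += cost
    return kept
```

Stopping at the first chunk that does not fit, rather than skipping it and trying smaller ones, means lower-scoring chunks are always dropped before higher-scoring ones. The cost is that some of the budget may go unused.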

Key Learnings

Retrieval quality dominates RAG performance: a well-ranked retrieval pipeline with a smaller model beats a poorly ranked one with a larger model every time. I also learned that hybrid search requires careful weight tuning per document type, since the RRF weights that work well for legal PDFs don't generalize to technical diagrams or spreadsheets.

Technologies

FastAPI, React.js, Pinecone, ChromaDB, Google Gemini, Python, TypeScript