Legal Q&A systems lacked domain-specific accuracy and data privacy compliance for sensitive legal queries.
Fine-tuned Mistral 7B using QLoRA, achieving 20% increase in domain-specific accuracy. Architected a secure pipeline with AES-256 encryption and containerized deployment via Kubernetes. Optimized model inference for accurate legal responses.
+20% domain accuracy with production-ready legal Q&A and privacy compliance.
A Mistral 7B base model was fine-tuned using QLoRA with PEFT adapters on a curated legal Q&A dataset, reducing GPU memory requirements by 60% vs full fine-tuning. The trained model is served behind a FastAPI inference endpoint containerized with Docker and orchestrated via Kubernetes for horizontal scaling. All query inputs and outputs pass through an AES-256 encryption layer before storage to meet data privacy requirements.
Fine-tuning a 7B model with QLoRA is accessible but unforgiving — the dataset quality matters far more than training duration. Half the accuracy gains came from cleaning and deduplicating the training data, not from model architecture choices. I also learned that encryption at the application layer adds measurable latency that needs to be budgeted into SLA targets from day one.