- Move docling/markitdown services under services/ alongside new
unlimited-ocr and vision services
- Add Laravel app for email-to-markdown conversion and OCR frontend
- Add email export tooling and example emails/output
- Update docker-compose, Caddyfile, and frontend assets
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Docling: pass PdfPipelineOptions (TesseractCLI) to ImageFormatOption
to prevent RapidOCR/PP-OCRv6 being loaded for image files
MarkItDown: auto-fallback to plain conversion when Ollama returns 500
(OOM/crash) instead of propagating the error to the user
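
Sketch of the MarkItDown fallback described above. This is not the service's actual code: `UpstreamError`, `convert_with_fallback`, and the two converter callables are hypothetical names, and it assumes the Ollama failure surfaces as an exception carrying the HTTP status.

```python
from typing import Callable


class UpstreamError(Exception):
    """Hypothetical error carrying the HTTP status returned by the LLM backend."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(message or f"upstream returned {status_code}")
        self.status_code = status_code


def convert_with_fallback(
    path: str,
    convert_with_llm: Callable[[str], str],
    convert_plain: Callable[[str], str],
) -> tuple[str, bool]:
    """Return (markdown, used_llm); fall back to plain conversion on HTTP 500."""
    try:
        return convert_with_llm(path), True
    except UpstreamError as exc:
        if exc.status_code != 500:
            raise  # any other failure still surfaces to the caller
        # 500 from Ollama (e.g. OOM/crash): retry without the LLM
        return convert_plain(path), False
```

Returning the `used_llm` flag lets the caller tell the user that the result came from the plain path instead of failing the request.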
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Force TesseractCliOcrOptions for image formats (JPG/PNG/TIFF/BMP)
to prevent RapidOCR/PP-OCRv6 fallback on docling 2.107
- Add db/init.sql and db/init_docling.sql for database initialization
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FastAPI microservices: MarkItDown + Docling with async SQLAlchemy
- Caddy reverse proxy, same-origin (no CORS)
- Bootstrap 5 frontend with marked.js rendering
- LLM settings card: Ollama URL, model select populated from the API, cleanup model
- POST /cleanup endpoint that uses AI to tidy up Markdown
- GET /models fetches the model list from Ollama
- Runtime LLM re-init without restarting the container
- PYTHONDONTWRITEBYTECODE + .dockerignore
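
Sketch of how the GET /models item above might obtain its list. The commit does not say which Ollama endpoint is called; this assumes Ollama's `GET /api/tags`, which returns a JSON object with a `models` array. `model_names` and `fetch_models` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def model_names(payload: dict) -> list[str]:
    """Extract sorted model names from an Ollama /api/tags response body."""
    return sorted(m["name"] for m in payload.get("models", []) if "name" in m)


def fetch_models(ollama_url: str, timeout: float = 5.0) -> list[str]:
    """Query Ollama for its installed models (performs a network call)."""
    # Assumption: ollama_url is the base URL saved in the LLM settings card.
    with urlopen(f"{ollama_url.rstrip('/')}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Keeping the parsing in `model_names` separate from the network call makes it checkable without a running Ollama instance.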
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>