Cleanse Dark Data for the AI Age
Ilios (Ἴλιος), ancient Greek for "to cleanse", is a high-performance document-to-markdown conversion API that unlocks dark data trapped in documents.
Most organizational data sits inaccessible in PDFs, images, and legacy formats, invisible to modern AI systems. Before you can leverage vector databases, knowledge graphs, or RAG systems, your documents must be transformed into clean, structured markdown suitable for embeddings and ontological analysis.
Ilios solves this critical infrastructure gap by providing production-grade OCR powered by Mistral AI, running on the Bun runtime with native APIs for 2-10x faster file operations.
Deploy your own Ilios instance in under 60 seconds:
git clone https://github.com/tobalo/ilios.git
cd ilios
bun install
bun run db:push
bun run dev
Configure environment variables in .env based on .env.example.
The API will be available at http://localhost:1337.
Modern AI infrastructure demands clean, structured data. Yet 80% of enterprise data remains trapped in unstructured formats: PDFs, scanned documents, legacy archives.
Before you can put that data to work, you must first cleanse your documents into machine-readable formats.
1. Document Ingestion: Upload PDFs, images, or legacy formats
2. OCR Processing: Mistral AI extracts text with layout awareness
3. Markdown Output: Clean, structured format ready for downstream processing
4. Vector Pipeline: Feed to embedding models, chunk for RAG, or enrich for knowledge graphs
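As a rough illustration of step 4, the sketch below splits converted markdown into one chunk per heading, a common first pass before embedding. It is not part of Ilios itself: the `MarkdownChunk` type and `chunkByHeading` function are hypothetical names, and heading-based splitting is only one possible chunking strategy.

```typescript
// A chunk of markdown: the heading it falls under plus its body text.
interface MarkdownChunk {
  heading: string;
  body: string;
}

// Split markdown into one chunk per ATX heading (lines starting with 1-6 '#').
// Text before the first heading is kept under an empty heading.
// Limitation: '#' lines inside fenced code blocks are not special-cased.
function chunkByHeading(markdown: string): MarkdownChunk[] {
  const chunks: MarkdownChunk[] = [];
  let heading = "";
  let lines: string[] = [];

  // Emit the section collected so far, skipping completely empty ones.
  const flush = (): void => {
    const body = lines.join("\n").trim();
    if (heading !== "" || body !== "") chunks.push({ heading, body });
    lines = [];
  };

  for (const line of markdown.split(/\r?\n/)) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      flush();
      heading = (match[2] ?? "").trim();
    } else {
      lines.push(line);
    }
  }
  flush();
  return chunks;
}
```

Each chunk can then be passed to an embedding model along with its heading, which tends to preserve more context than fixed-size splitting.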
API key authentication is optional but strongly recommended for production.
When enabled via the API_KEY environment variable, all endpoints require authentication except
public paths (/health, /docs, /openapi.json).
Include your API key in the Authorization header with Bearer scheme:
Authorization: Bearer YOUR_API_KEY
Set multiple API keys (comma-separated) to support different teams or clients with isolated usage tracking:
API_KEY=team-alpha-key,team-beta-key,admin-master-key
Each key's operations are tracked independently in the database for billing and auditing. Usage endpoints automatically filter results to the authenticated key.
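The rules above (optional auth, public paths, Bearer scheme, comma-separated keys) can be sketched as a small check. This is an illustration under stated assumptions, not the Ilios implementation: `parseApiKeys` and `isAuthorized` are hypothetical names, and public paths are assumed to match exactly rather than by prefix.

```typescript
// Paths the documentation lists as public even when API_KEY is set.
const PUBLIC_PATHS = new Set(["/health", "/docs", "/openapi.json"]);

// Parse a comma-separated API_KEY value into the set of accepted keys.
// An unset or empty value yields an empty set (authentication disabled).
function parseApiKeys(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((key) => key.trim())
      .filter((key) => key.length > 0),
  );
}

// Decide whether a request may proceed, given its path and Authorization header.
function isAuthorized(
  path: string,
  authorization: string | null,
  keys: Set<string>,
): boolean {
  if (keys.size === 0) return true; // no keys configured: auth is off
  if (PUBLIC_PATHS.has(path)) return true; // public path (exact match assumed)
  const match = /^Bearer\s+(\S+)$/.exec(authorization ?? "");
  return match !== null && keys.has(match[1] ?? "");
}
```

A production check would likely also compare keys in constant time and record which key matched, so that usage can be attributed per key.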
Ilios provides both immediate and asynchronous processing:
POST /v1/convert
Synchronous OCR. Returns the markdown directly in the response. Optimized for files under 100 MB with in-memory processing, which suits real-time workflows.
POST /api/documents/submit
Upload a document for async OCR processing. Returns a document ID for status tracking. Supports files up to 1 GB.
GET /api/documents/status/:id
Check processing status of a submitted document. Returns progress and metadata.
GET /api/documents/:id
Retrieve the processed document content as JSON or markdown. To request markdown, add the query parameter ?format=markdown.
POST /v1/batch/submit
Asynchronous batch processing. Submit up to 100 files of up to 1 GB each. Returns a batch ID for progress tracking.
GET /v1/batch/status/:batchId
Track batch progress with detailed metrics: pending, processing, completed, and failed counts.
GET /v1/batch/download/:batchId
Download all results in JSONL format. Each line contains document metadata and markdown content.
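Since the batch download is JSONL, a client needs to parse it line by line rather than as a single JSON document. The sketch below shows one way to do that. The `parseJsonl` helper is a hypothetical name, and because the exact field names in each record are not documented here, records are typed generically.

```typescript
// Parse JSON Lines text into one object per non-empty line.
// Throws with the 1-based line number if a line is not valid JSON.
function parseJsonl(text: string): Record<string, unknown>[] {
  const records: Record<string, unknown>[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // tolerate blank lines and a trailing newline
    try {
      records.push(JSON.parse(line) as Record<string, unknown>);
    } catch {
      throw new Error(`Invalid JSON on line ${index + 1}`);
    }
  });
  return records;
}
```

For very large batches, reading the response as a stream and parsing each line as it arrives would avoid holding the whole download in memory.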
curl -X POST https://ilios.sh/v1/convert \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "[email protected]" \
-F "format=markdown"
curl -X POST https://ilios.sh/v1/batch/submit \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "[email protected]" \
-F "[email protected]" \
-F "priority=8"
curl https://ilios.sh/v1/batch/status/batch_abc123 \
-H "Authorization: Bearer YOUR_API_KEY"
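The status call above is usually repeated until the batch finishes. The following sketch polls until nothing is pending or processing. It assumes the status response exposes the four counts as numeric fields named `pending`, `processing`, `completed`, and `failed`; those field names, and the helper names `waitForBatch` and `statusFetcher`, are assumptions rather than documented API.

```typescript
// Counts named in the status description; the exact field names are assumed.
interface BatchCounts {
  pending: number;
  processing: number;
  completed: number;
  failed: number;
}

// Build the Authorization header in the Bearer scheme.
function authHeaders(apiKey: string): Record<string, string> {
  return { Authorization: `Bearer ${apiKey}` };
}

// A batch is finished once nothing is pending or processing.
function isBatchDone(counts: BatchCounts): boolean {
  return counts.pending === 0 && counts.processing === 0;
}

// Poll until the batch is done or attempts run out.
// fetchCounts is injected so the loop can be exercised without a network.
async function waitForBatch(
  fetchCounts: () => Promise<BatchCounts>,
  maxAttempts = 30,
  delayMs = 2000,
): Promise<BatchCounts> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const counts = await fetchCounts();
    if (isBatchDone(counts)) return counts;
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Batch still running after ${maxAttempts} attempts`);
}

// Fetcher for GET /v1/batch/status/:batchId (response shape assumed).
function statusFetcher(
  baseUrl: string,
  batchId: string,
  apiKey: string,
): () => Promise<BatchCounts> {
  return async () => {
    const url = `${baseUrl}/v1/batch/status/${encodeURIComponent(batchId)}`;
    const res = await fetch(url, { headers: authHeaders(apiKey) });
    if (!res.ok) throw new Error(`Status request failed: ${res.status}`);
    return (await res.json()) as BatchCounts;
  };
}
```

A call such as `waitForBatch(statusFetcher("https://ilios.sh", "batch_abc123", "YOUR_API_KEY"))` would resolve with the final counts, after which the failed count can be checked before downloading results.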
Real-world performance metrics from production workloads. Last updated: