ΙΛΙΟΣ

Cleanse Dark Data for the AI Age


What is Ilios?

Ilios (Ἴλιος), ancient Greek for "to cleanse", is a high-performance document-to-markdown conversion API that unlocks dark data trapped in documents.

Most organizational data sits inaccessible in PDFs, images, and legacy formats, invisible to modern AI systems. Before you can leverage vector databases, knowledge graphs, or RAG systems, your documents must be transformed into clean, structured markdown suitable for embeddings and ontological analysis.

Ilios fills this critical infrastructure gap by providing production-grade OCR powered by Mistral AI, running on the Bun runtime with native APIs for 2-10x faster file operations.

Features

Installation

Deploy your own Ilios instance in under 60 seconds:

git clone https://github.com/tobalo/ilios.git
cd ilios
bun install
bun run db:push
bun run dev

Configure environment variables in .env based on .env.example. The API will be available at http://localhost:1337.

Why Ilios Matters

Modern AI infrastructure demands clean, structured data. Yet 80% of enterprise data remains trapped in unstructured formats: PDFs, scanned documents, legacy archives.

The Dark Data Problem

Before you can:

...you must first cleanse your documents into machine-readable formats.

How Ilios Enables AI Readiness

1. Document Ingestion → Upload PDFs, images, or legacy formats
2. OCR Processing → Mistral AI extracts text with layout awareness
3. Markdown Output → Clean, structured format ready for downstream processing
4. Vector Pipeline → Feed to embedding models, chunk for RAG, or enrich for knowledge graphs
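Step 4 happens downstream of Ilios. As a rough illustration of "chunk for RAG", here is a minimal TypeScript sketch (TypeScript because Ilios runs on Bun) that splits returned markdown at headings and then caps chunk size at paragraph boundaries. `chunkMarkdown` and its `maxChars` parameter are hypothetical and not part of the Ilios API; real pipelines often chunk by tokens rather than characters.

```typescript
// Hypothetical helper, not part of Ilios: heading-aware markdown chunking.
function chunkMarkdown(markdown: string, maxChars = 1000): string[] {
  // Split before every ATX heading line (#, ##, ... ######) so each heading
  // stays attached to the section it introduces.
  const sections = markdown.split(/\n(?=#{1,6}\s)/);
  const chunks: string[] = [];

  for (const section of sections) {
    let current = "";
    // Grow the chunk paragraph by paragraph until adding one would exceed maxChars.
    for (const para of section.split(/\n{2,}/)) {
      const candidate = current ? `${current}\n\n${para}` : para;
      if (candidate.length > maxChars && current) {
        chunks.push(current.trim());
        current = para; // a single oversized paragraph still becomes one chunk
      } else {
        current = candidate;
      }
    }
    if (current.trim()) chunks.push(current.trim());
  }
  return chunks;
}
```

Each chunk can then be passed to an embedding model of your choice.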

Authentication

API key authentication is optional but strongly recommended for production. When enabled via the API_KEY environment variable, all endpoints require authentication except public paths (/health, /docs, /openapi.json).

Using API Keys

Include your API key in the Authorization header using the Bearer scheme:

Authorization: Bearer YOUR_API_KEY
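To make the policy above concrete, here is a minimal TypeScript sketch of the check a server could perform: no configured keys means authentication is off, the public paths (/health, /docs, /openapi.json) are always open, and every other request needs a matching Bearer token. `isAuthorized` and `extractBearerToken` are hypothetical names, and this is not Ilios's actual middleware.

```typescript
// Public paths listed in the Ilios docs; everything else requires a key when auth is on.
const PUBLIC_PATHS = new Set<string>(["/health", "/docs", "/openapi.json"]);

// Pull the token out of "Authorization: Bearer <token>" (scheme matched case-insensitively).
function extractBearerToken(header: string | null): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? match[1] : null;
}

// Hypothetical gate mirroring the documented behaviour.
function isAuthorized(path: string, authorization: string | null, allowedKeys: string[]): boolean {
  if (allowedKeys.length === 0) return true; // API_KEY unset: authentication disabled
  if (PUBLIC_PATHS.has(path)) return true;   // public endpoints never need a key
  const token = extractBearerToken(authorization);
  return token !== null && allowedKeys.includes(token);
}
```

A hardened version would typically compare keys in constant time to avoid timing side channels.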

Multi-Tenant ACL Support

Set multiple API keys (comma-separated) to support different teams or clients with isolated usage tracking:

API_KEY=team-alpha-key,team-beta-key,admin-master-key

Each key's operations are tracked independently in the database for billing and auditing. Usage endpoints automatically filter results to the authenticated key.
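A minimal sketch of the multi-tenant idea, assuming the comma-separated API_KEY value is split and trimmed, and that each usage row stores the key that produced it. The `UsageRecord` shape and the `usageForKey` helper are hypothetical; the real database schema may differ.

```typescript
// Hypothetical usage row; Ilios's actual schema may use different columns.
interface UsageRecord {
  apiKey: string;
  documentId: string;
  operation: string;
}

// Split API_KEY="a,b,c" into individual keys, ignoring stray spaces and empty entries.
function parseApiKeys(envValue: string | undefined): string[] {
  if (!envValue) return [];
  return envValue
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
}

// Usage endpoints only ever see rows belonging to the authenticated key.
function usageForKey(records: UsageRecord[], apiKey: string): UsageRecord[] {
  return records.filter((r) => r.apiKey === apiKey);
}
```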

API Usage

Ilios provides both immediate and asynchronous processing:

Immediate Conversion

POST /v1/convert

Synchronous OCR: the markdown comes back in the same response. Optimized for files under 100MB, which are processed in memory. Suited to real-time workflows.
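The following TypeScript sketch builds, but does not send, a multipart request for POST /v1/convert using the standard `FormData` and `Request` globals available in Bun. The `format=markdown` field comes from the curl example in this README; the name of the file field (`file`) is an assumption, so check the OpenAPI spec at /openapi.json before relying on it.

```typescript
// Build a POST /v1/convert request. Assumption: the upload field is named "file".
function buildConvertRequest(baseUrl: string, apiKey: string, data: Blob, filename: string): Request {
  const form = new FormData();
  form.append("file", data, filename);   // hypothetical field name
  form.append("format", "markdown");     // documented in the curl example
  return new Request(new URL("/v1/convert", baseUrl).toString(), {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form, // fetch sets the multipart Content-Type and boundary itself
  });
}
```

Passing the result to `fetch(req)` sends it; inspect the response to see whether markdown arrives as plain text or inside JSON.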

Document Submit

POST /api/documents/submit

Upload a document for asynchronous OCR processing. Returns a document ID for status tracking. Supports files up to 1GB.
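Given the two size limits (under 100MB for /v1/convert, up to 1GB for /api/documents/submit), a client can pick the endpoint from the file size. This is a small sketch; `chooseEndpoint` is hypothetical, and it assumes binary units (1MB = 1024 × 1024 bytes), which the docs do not specify.

```typescript
const MB = 1024 * 1024;              // assumption: binary megabytes
const SYNC_LIMIT_BYTES = 100 * MB;   // /v1/convert: files under 100MB
const ASYNC_LIMIT_BYTES = 1024 * MB; // /api/documents/submit: up to 1GB

// Route small files to the synchronous endpoint and large ones to the async queue.
function chooseEndpoint(sizeBytes: number): string {
  if (sizeBytes < SYNC_LIMIT_BYTES) return "/v1/convert";
  if (sizeBytes <= ASYNC_LIMIT_BYTES) return "/api/documents/submit";
  throw new RangeError(`File of ${sizeBytes} bytes exceeds the 1GB limit`);
}
```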

Document Status

GET /api/documents/status/:id

Check the processing status of a submitted document. Returns progress and metadata.
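A typical async client polls the status endpoint until the document is done. Below is a generic polling loop in TypeScript; the status values and the injected `fetchStatus` callback are assumptions (in practice it would call GET /api/documents/status/:id and read whatever status field the response carries).

```typescript
// Hypothetical status values; the real response may use different names.
type DocStatus = "pending" | "processing" | "completed" | "failed";

// Poll until a terminal state, waiting intervalMs between checks.
async function pollUntilDone(
  fetchStatus: () => Promise<DocStatus>,
  intervalMs = 2000,
  maxAttempts = 150,
): Promise<DocStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status === "completed" || status === "failed") return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Document still not finished after ${maxAttempts} checks`);
}
```

Injecting `fetchStatus` keeps the loop independent of HTTP details and easy to test with a fake.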

Retrieve Document

GET /api/documents/:id

Returns the processed document content as JSON, or as markdown when you pass the query parameter ?format=markdown.
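For reference, the status and retrieval paths above can be built with the standard `URL` API, which also URL-encodes the document ID. The helper names are hypothetical; only the paths and the `format=markdown` parameter come from this README.

```typescript
// GET /api/documents/status/:id
function statusUrl(baseUrl: string, id: string): string {
  return new URL(`/api/documents/status/${encodeURIComponent(id)}`, baseUrl).toString();
}

// GET /api/documents/:id, optionally with ?format=markdown
function documentUrl(baseUrl: string, id: string, format?: "markdown"): string {
  const url = new URL(`/api/documents/${encodeURIComponent(id)}`, baseUrl);
  if (format) url.searchParams.set("format", format);
  return url.toString();
}
```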

Batch Submit

POST /v1/batch/submit

Asynchronous batch processing. Submit up to 100 files (1GB each). Returns a batch ID for progress tracking.
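Checking the documented batch limits on the client avoids a wasted upload. Here is a minimal sketch; `validateBatch` and the `PendingFile` shape are hypothetical, and 1GB is assumed to mean 1024³ bytes.

```typescript
const MAX_BATCH_FILES = 100;
const MAX_FILE_BYTES = 1024 * 1024 * 1024; // 1GB, assuming binary units

interface PendingFile {
  name: string;
  size: number; // bytes
}

// Returns human-readable problems; an empty array means the batch fits the documented limits.
function validateBatch(files: PendingFile[]): string[] {
  const problems: string[] = [];
  if (files.length > MAX_BATCH_FILES) {
    problems.push(`Batch has ${files.length} files; the limit is ${MAX_BATCH_FILES}`);
  }
  for (const f of files) {
    if (f.size > MAX_FILE_BYTES) problems.push(`${f.name} exceeds 1GB`);
  }
  return problems;
}
```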

Batch Status

GET /v1/batch/status/:batchId

Track batch progress with detailed metrics: pending, processing, completed, and failed counts.
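The four counts are enough to derive overall progress. This sketch uses the metric names listed above; the assumption that they arrive as flat numeric fields, and the `batchProgress` helper itself, are hypothetical.

```typescript
// Counter names match the documented metrics; the response layout is an assumption.
interface BatchCounts {
  pending: number;
  processing: number;
  completed: number;
  failed: number;
}

function batchProgress(c: BatchCounts): { total: number; percentDone: number; finished: boolean } {
  const total = c.pending + c.processing + c.completed + c.failed;
  const done = c.completed + c.failed; // failed files are finished, just unsuccessfully
  return {
    total,
    percentDone: total === 0 ? 0 : Math.round((done / total) * 100),
    finished: total > 0 && done === total,
  };
}
```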

Batch Download

GET /v1/batch/download/:batchId

Download all results in JSONL format. Each line contains document metadata and markdown content.
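JSONL holds one JSON object per line, so the download can be parsed line by line. The sketch below assumes each record has `id` and `markdown` fields; those key names are hypothetical, so adjust them to the actual payload.

```typescript
// Assumed record shape; real key names may differ. Extra metadata is kept as unknown.
interface BatchResult {
  id: string;
  markdown: string;
  [key: string]: unknown;
}

function parseJsonl(text: string): BatchResult[] {
  return text
    .split("\n")
    .map((line) => line.trim())          // also strips \r from CRLF line endings
    .filter((line) => line.length > 0)   // tolerate blank lines and a trailing newline
    .map((line, i) => {
      try {
        return JSON.parse(line) as BatchResult;
      } catch {
        throw new Error(`Invalid JSON in record ${i + 1}`);
      }
    });
}
```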

Quick Examples

Convert a Single Document

curl -X POST https://ilios.sh/v1/convert \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "[email protected]" \
  -F "format=markdown"

Batch Processing

curl -X POST https://ilios.sh/v1/batch/submit \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "[email protected]" \
  -F "[email protected]" \
  -F "priority=8"

Check Status

curl https://ilios.sh/v1/batch/status/batch_abc123 \
  -H "Authorization: Bearer YOUR_API_KEY"

Performance Benchmarks

Real-world performance metrics from production workloads. Last updated: