Open Source · Apache 2.0 · EU-Sovereign · Air-Gap Ready · Opt-in for regulated deployments
From Latin codex: a book of laws, a manuscript of collected knowledge. An audit trail, a catalog, and a lineage record, all in one sovereign stack. The open-source EU alternative to Palantir Foundry / AIP for regulated sectors: government, KRITIS, healthcare, banking, and pharma.
The Concept
95 % of operators want a sovereign LLM gateway — that’s MoE Sovereign. The remaining 5 % operate in regulated sectors where AI deployments require documented risk classification, data lineage, approval workflows, and audit trails. That’s MoE Codex.
Discover, classify, and annotate datasets and knowledge assets. Every source is tracked, every schema versioned.
No data enters AI pipelines without authorization. Multi-step approval gates, reviewer assignments, and documented decisions.
End-to-end traceability from raw source to inference output. OpenLineage-compatible events for every pipeline run.
Continuous monitoring of knowledge graph health and statistical data drift. Alerts when models see distribution shifts.
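As an illustration of what a statistical drift check can look like, the sketch below computes a Population Stability Index (PSI) between a baseline sample and an incoming sample. The source does not say which drift statistic MoE Codex uses; PSI, the function name `population_stability_index`, the bin count, and the 0.2 alert threshold mentioned in the usage note are all assumptions chosen for the example.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float],
    incoming: Sequence[float],
    bins: int = 10,
    epsilon: float = 1e-6,
) -> float:
    """Return the PSI between a baseline sample and an incoming sample.

    Hypothetical helper for illustration; not part of any MoE Codex API.
    """
    if not baseline or not incoming:
        raise ValueError("both samples must be non-empty")

    lo, hi = min(baseline), max(baseline)
    # Equal-width buckets over the baseline range; fall back to a width
    # of 1.0 when the baseline is constant to avoid dividing by zero.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp so out-of-range values land in the edge buckets.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each share at epsilon so the logarithm stays defined.
        return [max(c / len(values), epsilon) for c in counts]

    expected = bucket_shares(baseline)
    actual = bucket_shares(incoming)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

A common rule of thumb treats a PSI above roughly 0.2 as a shift worth alerting on, but the right threshold depends on the dataset and should be tuned per deployment.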
“BVerfG 2023 ruled the Palantir-based Hessendata platform unconstitutional. Every EU authority deploying AI infrastructure now needs a documented sovereign alternative. MoE Codex provides the compliance layer by construction.”
Capabilities
Built for compliance-driven operators. Deployable alongside any MoE Sovereign instance.
Asset discovery, schema registry, tagging, and classification. Tracks every dataset from ingestion to model input. Integrates with OpenMetadata.
Multi-step authorization gates before data enters AI pipelines. Role-based reviewer assignment, timestamped decisions, and an irrefutable audit trail.
Every ETL run, NiFi pipeline, and AI inference emits OpenLineage events captured by Marquez. Trace any output back to its source data.
Git-style branches and commits for datasets. Roll back to any snapshot, isolate experiments, and ship reproducible knowledge bundles.
Continuous health monitoring of knowledge graph metrics. Statistical drift alerts when incoming data deviates from baselines. Prometheus-native metrics.
Graph-based query interface for compliance investigations. Link analysis, timeline views, and export-ready audit reports for regulators.
Apache NiFi for visual data flow design. Ingest from legacy sources, normalize, and route to Kafka, Postgres, or the AI pipeline — without writing code.
Garage, the S3-compatible object store by Deuxfleurs (French non-profit, AGPLv3). EU-origin, Rust-based, and a drop-in replacement for MinIO, which was archived in April 2026.
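To make the lineage capability more concrete, here is a minimal sketch of an OpenLineage-style run event built as a plain dictionary. The field names follow the public OpenLineage RunEvent structure; the `moe-codex` namespace, the `producer` URL, the dataset names, and the spec version in `schemaURL` are assumptions for illustration and should be checked against the spec version the lineage backend expects. Marquez usually accepts such events over HTTP on its lineage endpoint, which this sketch deliberately does not call.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional


def build_run_event(
    event_type: str,
    job_name: str,
    inputs: Iterable[str],
    outputs: Iterable[str],
    run_id: Optional[str] = None,
    namespace: str = "moe-codex",  # hypothetical namespace
) -> Dict:
    """Build an OpenLineage-style RunEvent as a JSON-serializable dict."""

    def dataset(name: str) -> Dict[str, str]:
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Placeholder producer URI; a real emitter identifies itself here.
        "producer": "https://example.org/moe-codex-sketch",
        # Assumed spec version; adjust to the version in use.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [dataset(n) for n in inputs],
        "outputs": [dataset(n) for n in outputs],
    }


# A START and a COMPLETE event share one runId so the backend can pair them.
run_id = str(uuid.uuid4())
start = build_run_event("START", "nightly-ingest", ["raw.orders"], [], run_id)
done = build_run_event("COMPLETE", "nightly-ingest", ["raw.orders"],
                       ["curated.orders"], run_id)
payload = json.dumps(done)
```

Tracing an output back to its source then amounts to following the `outputs` of one job to the `inputs` of the jobs that consumed them.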
System Design
MoE Codex is an opt-in layer that extends MoE Sovereign. Deploy it only where compliance obligations require it.
Deployment model: MoE Sovereign runs autonomously without MoE Codex. Operators who deploy Codex get the full catalog & compliance layer on top. Only endpoint and credentials change in the application layer — no code changes.
NiFi flows pull data from legacy sources, APIs, and file shares. Catalog entries are created automatically on ingestion.
Curators classify datasets in the catalog. Approval workflows route assets through multi-step authorization before AI pipeline access.
lakeFS creates immutable snapshots of approved datasets. Every pipeline run emits OpenLineage events to Marquez.
Drift metrics surface anomalies. Investigation explorer enables graph-based queries. Export audit reports for regulators on demand.
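The approval step in this flow can be pictured as a small ordered state machine: each gate must be decided in sequence, every decision is timestamped, and a dataset is authorized only when all gates have approved it. The sketch below is one way to model that under stated assumptions; the class names, step names, and reviewers are hypothetical and do not reflect the actual MoE Codex data model or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Decision:
    step: str
    reviewer: str
    approved: bool
    decided_at: str  # ISO 8601 timestamp in UTC


@dataclass
class ApprovalRequest:
    dataset: str
    steps: Tuple[str, ...]  # ordered gate names
    decisions: List[Decision] = field(default_factory=list)

    def is_rejected(self) -> bool:
        return any(not d.approved for d in self.decisions)

    def is_authorized(self) -> bool:
        # Authorized only when every gate has been decided and none rejected.
        return len(self.decisions) == len(self.steps) and not self.is_rejected()

    def next_step(self) -> Optional[str]:
        # No further step once the request is rejected or fully decided.
        if self.is_rejected() or len(self.decisions) >= len(self.steps):
            return None
        return self.steps[len(self.decisions)]

    def decide(self, step: str, reviewer: str, approved: bool) -> Decision:
        expected = self.next_step()
        if expected is None:
            raise ValueError("request is already closed")
        if step != expected:
            raise ValueError(f"expected step {expected!r}, got {step!r}")
        decision = Decision(step, reviewer, approved,
                            datetime.now(timezone.utc).isoformat())
        self.decisions.append(decision)  # append-only decision log
        return decision
```

In this sketch the decision list doubles as the audit record; a production system would persist it in append-only storage rather than in memory.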
Regulatory Fit
Several regulatory and legal drivers make a sovereign compliance layer mandatory for AI deployments in regulated EU sectors.
The EU AI Act has been in force since 1 August 2024; its high-risk obligations apply from 2 August 2026. MoE Codex provides the risk documentation, audit trail, and traceability required for Annex III high-risk AI systems.
NIS2 transposition requires risk management, incident reporting, and supply-chain accountability for essential entities. MoE Codex’s audit trail and lineage records directly address these obligations.
Under the GDPR, Data Protection Impact Assessments are mandatory for high-risk personal data processing. Catalog metadata and lineage records provide the inventory and flow documentation that Art. 35 GDPR requires.
Mappings to the OPS modules (Bausteine) of BSI IT-Grundschutz and C5-certified EU hosting (Hetzner, IONOS, STACKIT, OVHcloud) let KRITIS operators pursue BSI certification without US-cloud dependencies.
In 2023, the Federal Constitutional Court ruled that the statutory basis for automated data analysis, under which the Palantir-based Hessendata policing platform operated, was unconstitutional. MoE Codex is the documented, rights-compliant alternative for law enforcement AI use cases.
AWS, Azure, and GCP are subject to the US CLOUD Act regardless of EU region. Deployed on EU-jurisdiction hosting (see eu_sovereignty_charter), MoE Codex has zero CLOUD Act exposure.
| Concern | MoE Codex | Palantir Foundry/AIP | US-Cloud LLM APIs |
|---|---|---|---|
| Data leaves EU jurisdiction | ❌ never | 🟡 depends on contract | ✅ always |
| Subject to US CLOUD Act | ❌ no (EU host) | ✅ yes | ✅ yes |
| Source code auditable | ✅ Apache 2.0 | ❌ proprietary | ❌ proprietary |
| Air-gap deployment | ✅ documented | 🟡 contract option | ❌ impossible |
| Vendor lock-in risk | ❌ none (fork-able) | ✅ high | ✅ high |
| Per-token operator cost | €0 (own hardware) | metered + licence | metered |
Under the Hood
Every component is OSI-licensed, EU-deployable, and auditable. No BSL, SSPL, or ELv2 in the stack.
High-performance async Python API for catalog, approval, lineage, and versioning endpoints.
Visual ETL designer for data ingestion from legacy sources. Apache 2.0 licensed, containerized, no cloud dependency.
OpenLineage-compatible lineage backend. Every data movement is recorded, queryable, and exportable for auditors.
Git for data: branch, commit, and merge datasets. Immutable snapshots ensure reproducibility across compliance audits.
Garage, by Deuxfleurs (French non-profit). AGPLv3, Rust-based, MinIO drop-in. EU-authored object storage replacing EOL MinIO (archived 2026-04-25).
Reliable relational storage for catalog metadata, approval state, and lineage records.
Graph database for knowledge assets and investigation queries. Semantic link analysis and traversal for compliance exploration.
Valkey, the Linux Foundation fork of Redis (replaces SSPL/RSALv2-licensed Redis). Caching and rate limiting. Wire-protocol compatible.
Docker and Docker Compose. 4 cores and 8 GB RAM (as an add-on to MoE Sovereign). Scales from a single node to a Kubernetes cluster.
Only Apache 2.0, MIT, BSD, AGPLv3 (separate container), and LGPL. Automated audit via scripts/audit-licenses.sh in CI.
Recommended: Hetzner, IONOS, STACKIT, OVHcloud, Open Telekom Cloud. All BSI-C5 or SecNumCloud certified. AWS/Azure/GCP explicitly not recommended for sovereignty-critical deployments.
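The license policy above can be pictured as an allowlist check over a component inventory. The sketch below is not the contents of scripts/audit-licenses.sh, which the source does not show; the function name, the choice of SPDX identifiers to represent the listed license families, and the example inventory are assumptions for illustration.

```python
from typing import Dict

# SPDX identifiers standing in for the families named in the policy:
# Apache 2.0, MIT, BSD, AGPLv3, and LGPL.
ALLOWED_LICENSES = {
    "Apache-2.0",
    "MIT",
    "BSD-2-Clause",
    "BSD-3-Clause",
    "AGPL-3.0-only",
    "AGPL-3.0-or-later",
    "LGPL-2.1-only",
    "LGPL-2.1-or-later",
    "LGPL-3.0-only",
    "LGPL-3.0-or-later",
}


def find_violations(components: Dict[str, str]) -> Dict[str, str]:
    """Return the components whose license is not on the allowlist."""
    return {
        name: license_id
        for name, license_id in sorted(components.items())
        if license_id not in ALLOWED_LICENSES
    }


# Hypothetical inventory; a real audit would read this from package metadata.
inventory = {
    "component-a": "Apache-2.0",
    "component-b": "AGPL-3.0-only",
    "component-c": "SSPL-1.0",
    "component-d": "BUSL-1.1",
}
violations = find_violations(inventory)
```

A CI job built on this idea would fail the build whenever `violations` is non-empty, which keeps source-available licenses such as SSPL or BSL out of the stack.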
Ecosystem
MoE Codex is the compliance layer of the MoE Sovereign AI ecosystem, designed for regulated operators.
The core — distributed AI inference on your own hardware. API gateway, 15 expert templates, GraphRAG, MCP tools.
Federated knowledge exchange between sovereign MoE instances. Voluntary, bilateral, no central authority.
Full documentation: architecture, API reference, EU sovereignty charter, license compliance guide.
Source code, issues, and contribution guidelines. Apache 2.0. Fork-able without restriction.