The platform

A tokenized lake, end to end.

Storage, compute, catalog, and policy designed together — so tokenization is not an add-on, it is the substrate the rest of the system sits on.

Ingest

Stream from Kafka, Kinesis, or Pulsar, or batch-load from S3 or GCS. The ingest tier applies your tokenization policies before the data ever lands in storage.

  • Exactly-once semantics with an idempotent commit ledger
  • Schema evolution with backward-compatible reads
  • Backpressure-aware streaming to hot tier
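
To make the exactly-once bullet concrete, here is a minimal sketch of an idempotent commit ledger. It is an illustration under stated assumptions, not the rjbase.io implementation: `CommitLedger`, its in-memory storage, and the (source, partition, offset) key are all hypothetical, and a real ledger would be durable and committed atomically with the write.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLedger:
    """Hypothetical in-memory ledger; a real one would be durable."""
    committed: set = field(default_factory=set)
    rows: list = field(default_factory=list)

    def commit(self, source: str, partition: int, offset: int, row: dict) -> bool:
        """Apply a row once; return False if this key was already committed."""
        key = (source, partition, offset)
        if key in self.committed:
            return False  # redelivery: the effect is already recorded, so skip
        self.rows.append(row)
        self.committed.add(key)
        return True


ledger = CommitLedger()
first = ledger.commit("orders", 0, 42, {"id": 1})
retry = ledger.commit("orders", 0, 42, {"id": 1})  # simulated redelivery
```

The redelivered record is recognized by its key and dropped, so at-least-once delivery from the source still yields exactly one stored row.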

Vault

Sensitive values are replaced with deterministic tokens. The raw data lives in an HSM-backed vault, addressable only by short-lived capabilities issued by the policy engine.

  • Format-preserving encryption (FF3-1) and random tokens
  • Per-namespace key isolation with hardware roots of trust
  • Online key rotation without table rewrites
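
The sketch below illustrates what "deterministic" means for a token and why per-namespace keys matter. It deliberately swaps in keyed HMAC-SHA-256 as a stand-in and is not FF3-1: the output is neither format-preserving nor reversible, so a vault would have to store the token-to-value mapping separately. The `tokenize` function, the `tok_` prefix, and the demo keys are assumptions made for illustration.

```python
import hashlib
import hmac


def tokenize(value: str, namespace_key: bytes) -> str:
    """Return a deterministic token: same value and same key give the same token."""
    digest = hmac.new(namespace_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:32]


# Hypothetical per-namespace keys; in practice these would never leave an HSM.
keys = {"billing": b"billing-demo-key", "support": b"support-demo-key"}

a = tokenize("4111111111111111", keys["billing"])
b = tokenize("4111111111111111", keys["billing"])
c = tokenize("4111111111111111", keys["support"])
```

Here `a` equals `b`, so joins and group-bys still work on the tokenized column, while `c` differs because the other namespace uses a different key.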

Policy engine

A declarative policy graph — RBAC + ABAC + purpose-of-use — gates every read. Detokenization is never implicit; it is a policy decision with a written reason in the audit ledger.

  • Row-level scoping evaluated at plan time
  • Column masking with multiple reveal levels
  • Justification-required reveals with tamper-evident audit
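
A justification-required reveal with a tamper-evident audit trail can be sketched with a hash chain, where each entry commits to the one before it. This is one common technique, assumed here for illustration; the function names, the role check, and the entry fields are hypothetical and do not describe the product's policy graph.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry
FIELDS = ("user", "column", "reason", "prev")


def _digest(body: dict) -> str:
    """Hash an entry body in a stable key order."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def reveal(log: list, user: str, roles: set, column: str,
           reason: str, allowed_roles: set) -> bool:
    """Grant a reveal only with a permitted role and a non-empty written reason."""
    if not reason.strip() or not (roles & allowed_roles):
        return False
    prev = log[-1]["hash"] if log else GENESIS
    body = {"user": user, "column": column, "reason": reason, "prev": prev}
    log.append({**body, "hash": _digest(body)})
    return True


def verify(log: list) -> bool:
    """Recompute the chain; any edited or removed entry breaks it."""
    prev = GENESIS
    for entry in log:
        body = {k: entry[k] for k in FIELDS}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


log = []
granted = reveal(log, "ana", {"fraud-analyst"}, "card_number",
                 "chargeback case review", {"fraud-analyst"})
denied = reveal(log, "ana", {"fraud-analyst"}, "card_number",
                "", {"fraud-analyst"})  # no reason given, so no reveal
intact = verify(log)
```

Editing the reason on a recorded entry after the fact changes its hash, so `verify` returns False from that point on.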

Storage

Tiered, open-format, and pluggable. Hot partitions on NVMe, warm on local SSD, cold on your object store. One catalog, one query.

  • Iceberg-compatible table format · Parquet underneath
  • Z-ordering, bloom filters, and partition pruning by default
  • Bring your own bucket — we never own your data at rest

Compute

A vectorized SQL engine with adaptive query execution, plus a Python dataframe interface that compiles to the same plans.

  • Per-tenant compute pools that isolate workloads from noisy neighbors
  • Result caching with policy-aware invalidation
  • Autoscaling backed by spot-friendly schedulers
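
Policy-aware invalidation can be pictured as a result cache whose keys include the principal and the current policy version, so a cached result is never served under rules it was not computed for. The `PolicyAwareCache` class below is a hypothetical sketch of that idea, not the engine's actual cache.

```python
from typing import Any, Optional


class PolicyAwareCache:
    """Hypothetical result cache keyed by (query, principal, policy version)."""

    def __init__(self) -> None:
        self.policy_version = 1
        self._store: dict = {}

    def get(self, query: str, principal: str) -> Optional[Any]:
        """Return a cached result only if it was stored under the current policy."""
        return self._store.get((query, principal, self.policy_version))

    def put(self, query: str, principal: str, result: Any) -> None:
        self._store[(query, principal, self.policy_version)] = result

    def policy_changed(self) -> None:
        """Bump the version and drop every entry computed under the old policy."""
        self.policy_version += 1
        self._store.clear()


cache = PolicyAwareCache()
cache.put("SELECT count(*) FROM orders", "ana", 1200)
hit = cache.get("SELECT count(*) FROM orders", "ana")
other_user = cache.get("SELECT count(*) FROM orders", "bo")
cache.policy_changed()
after_change = cache.get("SELECT count(*) FROM orders", "ana")
```

Keying on the principal keeps one user's row-scoped result from being served to another, and the version bump forces a recompute after any policy edit.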

Catalog

A single source of truth for tables, schemas, lineage, and the policies attached to them. Compatible with the Iceberg REST catalog spec.

  • Time-travel queries on any tokenized table
  • End-to-end lineage from source topic to BI dashboard
  • Federated views across multiple lakes

Deployment

Run it your way.

rjbase.io runs as a managed service in our cloud, in your own VPC, or fully air-gapped on hardware you control. The control plane is the same; only the deployment target changes.

Managed

We run the control plane in our SOC 2 environment. Your data stays in your bucket.

BYOC

We deploy into your AWS, GCP, or Azure account. You see every API call.

On-prem

Install with a Helm chart on your own Kubernetes cluster. Air-gapped installs are supported.

Sovereign

Regional residency with HSM-backed roots of trust, for the most regulated workloads.

Now in private preview

Tokenize your data. Keep your governance.

Talk to our team about an evaluation cluster. We typically have engineering partnerships running within two weeks.