March 2026 • Prepared for Kaspa Core & Ecosystem Teams
The Problem
Kaspa's consensus layer handles 10 BPS (2000+ TPS) without issues. However, wallets and exchanges that depend on explorer infrastructure experience stale data, delayed confirmations, and API timeouts during sustained high throughput. The bottleneck is entirely in the off-chain indexing pipeline.
The current architecture funnels all reads and writes through a single PostgreSQL instance that cannot sustain the required throughput:
rusty-kaspa node ──wRPC──► simply-kaspa-indexer ──SQL INSERT──► PostgreSQL (SINGLE INSTANCE) ◄──SQL SELECT── kaspa-rest-server ──HTTP──► Wallets / Exchanges

▲ Write contention blocks reads. Reads slow down under write pressure.
▲ Single point of failure. No horizontal scaling path.
▲ PostgreSQL is not designed for sustained thousands of inserts/sec plus concurrent analytical reads.
Phase 1 — Deploy Now
1. PostgreSQL Read Replicas
Estimated impact: 40-60% reduction in API latency under load • Low effort
The Problem
The indexer's heavy INSERT/UPDATE workload and the REST API's SELECT queries compete for the same PostgreSQL instance. Under sustained high TPS, write-ahead log (WAL) flushing, index maintenance, and row locking cause read queries to queue behind writes.
The Fix
Deploy PostgreSQL streaming replication with one or more read replicas. Route all REST API traffic to replicas. The indexer writes exclusively to the primary.
Configure wal_level = replica and max_wal_senders = 5 on the primary
Set up streaming replication to 1-3 read replicas using pg_basebackup
Use pgBouncer or application-level routing to direct REST API connections to replicas
Set hot_standby = on on replicas to allow read queries during replication
Monitor replication lag — at sub-100ms async replication, explorer data is effectively real-time
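The steps above amount to a small configuration delta plus one bootstrap command. A sketch, with hostnames, user, and data directory as placeholders:

```ini
# postgresql.conf on the primary
wal_level = replica
max_wal_senders = 5

# postgresql.conf on each replica
hot_standby = on

# Seed a replica from the primary (run on the replica host):
#   pg_basebackup -h primary.example -U replicator -D /var/lib/postgresql/data -R -X stream
# -R writes primary_conninfo so streaming replication starts automatically;
# -X stream ships WAL alongside the base backup.
```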
Zero write contention on reads — REST API queries never compete with indexer inserts
Horizontal read scaling — add more replicas as exchange/wallet traffic grows
No code changes to indexer — completely transparent; the indexer writes to the primary as before
Built-in PostgreSQL feature — mature, battle-tested, no new dependencies
Note: Async replication introduces a small lag (typically <100ms). This is acceptable for explorer queries but should be monitored. If an exchange requires absolute consistency, they can query the primary for critical balance checks.
Phase 1 — Deploy Now
2. Redis Caching Layer
The Problem
Exchanges poll the same address endpoints repeatedly (every few seconds) to check for deposit confirmations. The REST API hits PostgreSQL for every request, even when the data hasn't changed since the last query.
The Fix
Add Redis as a caching layer between the REST API and PostgreSQL. Cache hot data with short TTLs appropriate to each query type.
Exchange/Wallet Request
│
▼
REST API ── cache hit? ──► Redis ── yes ──► return cached response (sub-ms)
│ │
│ no
│ │
│ ▼
└────────── cache miss ──────► PostgreSQL Replica
│
store in Redis with TTL
│
▼
return response
Recommended TTLs by Query Type
Endpoint                           TTL            Rationale
/addresses/{addr}/balance          3-5 seconds    Balances change with new blocks (~1 block per 100ms at 10 BPS)
/addresses/{addr}/transactions     5-10 seconds   Tx history is append-only; slight staleness is acceptable
/transactions/{txid}               60 seconds     Once confirmed, tx data is immutable
/blocks/{hash}                     300 seconds    Block data is immutable once indexed
/info/blockdag                     1-2 seconds    DAG info changes frequently but is lightweight
/info/health                       5 seconds      Health status; low polling frequency is fine
Implementation
Deploy Redis alongside the REST API server (single instance is sufficient to start)
Add cache middleware to kaspa-rest-server — FastAPI supports this via fastapi-cache2 with Redis backend
Key format: kaspa:{endpoint}:{params_hash} with per-endpoint TTL configuration
Add Cache-Control and X-Cache-Status headers so clients know if they got a cached response
Implement cache invalidation on new block notifications (optional, improves freshness)
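The key scheme and per-endpoint TTLs above can be sketched in Rust. The helper names `cache_key` and `ttl_for` are hypothetical (not part of kaspa-rest-server); only the standard library is used:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::Duration;

// Build a cache key of the form kaspa:{endpoint}:{params_hash}.
fn cache_key(endpoint: &str, params: &str) -> String {
    let mut h = DefaultHasher::new();
    params.hash(&mut h);
    format!("kaspa:{}:{:x}", endpoint, h.finish())
}

// Per-endpoint TTLs mirroring the table above (values are assumptions).
fn ttl_for(endpoint: &str) -> Duration {
    match endpoint {
        e if e.ends_with("/balance") => Duration::from_secs(4),
        e if e.ends_with("/transactions") => Duration::from_secs(8),
        e if e.starts_with("/transactions/") => Duration::from_secs(60),
        e if e.starts_with("/blocks/") => Duration::from_secs(300),
        "/info/blockdag" => Duration::from_secs(2),
        _ => Duration::from_secs(5),
    }
}

fn main() {
    let key = cache_key("/addresses/{addr}/balance", "kaspa:qr123");
    println!("{}", key);
    assert_eq!(ttl_for("/blocks/abc"), Duration::from_secs(300));
}
```

The same dispatch can live in middleware so handlers stay unaware of caching; invalidation on block notifications would simply delete the matching keys.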
Massive DB load reduction — most exchange polling hits the cache, not PostgreSQL
Sub-millisecond responses — cached responses are 100-1000x faster than DB queries
Protects the DB during spikes — traffic surges are absorbed by the cache; the DB stays stable
Simple to add — FastAPI cache middleware; minimal code changes
Phase 1 — Deploy Now
3. Exchange wRPC Integration Guide
Estimated impact: Removes explorer dependency for real-time operations • Low effort
The Problem
Exchanges use the REST API for everything — including real-time balance monitoring and deposit detection — when the node's wRPC interface can handle these natively with zero bottleneck.
The Fix
Publish integration documentation and reference code showing exchanges how to use a hybrid approach: wRPC for real-time operations, REST API only for historical queries.
═══════════════════════════════════════════════════════
REAL-TIME (via wRPC — no bottleneck, no explorer needed)
═══════════════════════════════════════════════════════
Exchange ──wRPC──► rusty-kaspa node
│
├── SubscribeUtxosChanged(addresses[])
│ Push notification when deposit arrives
│
├── GetBalanceByAddress(addr)
│ Current confirmed balance, real-time
│
└── GetUtxosByAddresses(addresses[])
Full UTXO set for withdrawal construction
═══════════════════════════════════════════════════════
HISTORICAL ONLY (via REST API — lighter load)
═══════════════════════════════════════════════════════
Exchange ──HTTP──► kaspa-rest-server
│
├── Transaction history for user display
├── Block confirmation depth verification
└── Address activity reports / auditing
What This Achieves
Deposit detection becomes instant — wRPC push notifications vs polling REST every few seconds
Balance queries bypass the explorer entirely — node handles these natively at any TPS
REST API load drops dramatically — historical queries are infrequent compared to real-time polling
No single point of failure — exchanges can run their own node, independent of api.kaspa.org uptime
Implementation
Write reference integration code in JavaScript/TypeScript (most exchange backends), Rust, Python, and Go
Document the wRPC subscription lifecycle: connect, subscribe, handle events, reconnect
Provide a Docker-based "exchange integration kit" with a pre-configured rusty-kaspa node + --utxoindex
Coordinate with major exchanges to pilot the hybrid approach
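The reconnect step of the subscription lifecycle is the part integrators most often get wrong. A minimal, hypothetical backoff helper (constants and the name `backoff_ms` are illustrative, not part of any Kaspa SDK):

```rust
// Hypothetical exponential backoff for wRPC reconnects:
// 500ms base, doubling per attempt, capped at 30s.
fn backoff_ms(attempt: u32) -> u64 {
    let base: u64 = 500;
    base.saturating_mul(2u64.saturating_pow(attempt)).min(30_000)
}

fn main() {
    // Lifecycle: connect -> subscribe -> handle events -> on disconnect,
    // wait backoff_ms(attempt), then reconnect and re-subscribe.
    for attempt in 0..5 {
        println!("retry in {} ms", backoff_ms(attempt));
    }
    assert_eq!(backoff_ms(0), 500);
}
```

On reconnect the client must re-issue SubscribeUtxosChanged with its full address list, since server-side subscription state is lost with the connection.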
Prerequisite: This requires exchanges to run (or connect to) a rusty-kaspa node with --utxoindex enabled. This adds ~20% memory overhead to the node but is manageable. Alternatively, ecosystem-run wRPC endpoints could be provided.
Phase 2 — Medium Term
4. REST API Rewrite in Rust
Estimated impact: 5-10x throughput improvement for API layer • Medium effort
The Problem
The current REST API is Python/FastAPI running behind gunicorn with Uvicorn workers. Python's GIL limits true concurrency, gunicorn workers have been observed deadlocking under load (hashrate population was removed for this reason), and the per-request overhead is high compared to native alternatives.
The Fix
Rewrite kaspa-rest-server in Rust using axum or actix-web. This aligns with the rest of the Kaspa stack (rusty-kaspa, simply-kaspa-indexer) and provides:
True async concurrency — tokio runtime handles thousands of concurrent connections without GIL contention
Connection pooling — sqlx or deadpool-postgres with efficient prepared statements
Memory efficiency — no Python interpreter overhead, ~10x lower memory per connection
Integrated Redis — native async Redis client (fred or redis-rs) for cache layer
Phase 2 — Medium Term
5. PostgreSQL Partitioning and Materialized Views
The Problem
As the blockchain grows, the transactions table becomes massive. Index maintenance on large tables is expensive — every INSERT requires updating B-tree indexes across the entire table. Queries that only need recent data still scan large index structures.
The Fix
A. Range Partitioning by DAA Score
Partition the transactions table into time-based chunks. The active partition (current DAA score range) is small and fast. Historical partitions are rarely written to and can be optimized for reads.
-- Partition transactions by DAA score ranges (100,000 scores ≈ 2.8 hours at 10 BPS)
CREATE TABLE transactions (
transaction_id TEXT NOT NULL,
block_daa_score BIGINT NOT NULL,
-- ... other columns
) PARTITION BY RANGE (block_daa_score);
-- Recent partition (active writes + most reads)
CREATE TABLE transactions_current
PARTITION OF transactions
FOR VALUES FROM (86400000) TO (86500000)
TABLESPACE fast_nvme;
-- Historical partitions (rarely written, optimized for reads)
CREATE TABLE transactions_2026_q1
PARTITION OF transactions
FOR VALUES FROM (85000000) TO (86400000)
TABLESPACE standard_ssd;
-- Auto-create new partitions via pg_partman
SELECT partman.create_parent(
p_parent_table := 'public.transactions',
p_control := 'block_daa_score',
p_interval := '100000',
p_type := 'range'
);
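With the 100,000-score interval passed to pg_partman above, the bounds of the partition owning a given DAA score follow from integer division. A hypothetical helper, consistent with the example boundaries in the DDL:

```rust
// Compute the [from, to) bounds of the range partition containing
// `daa_score`, for a fixed partition interval.
fn partition_bounds(daa_score: u64, interval: u64) -> (u64, u64) {
    let from = (daa_score / interval) * interval;
    (from, from + interval)
}

fn main() {
    // Score 86_432_100 lands in the partition [86_400_000, 86_500_000),
    // matching transactions_current in the SQL above.
    assert_eq!(partition_bounds(86_432_100, 100_000), (86_400_000, 86_500_000));
    println!("ok");
}
```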
B. Materialized Views for Common Queries
Pre-compute expensive aggregations that exchanges query frequently, refreshed incrementally by a background worker.
-- Pre-computed address balances (refreshed every block)
CREATE MATERIALIZED VIEW address_balances AS
SELECT
script_public_key AS address,
SUM(amount) AS balance,
COUNT(*) AS utxo_count,
MAX(block_daa_score) AS last_active_daa
FROM current_utxos
GROUP BY script_public_key;
CREATE UNIQUE INDEX ON address_balances (address);
-- Refresh incrementally (triggered by indexer after each batch)
REFRESH MATERIALIZED VIEW CONCURRENTLY address_balances;
Smaller active index — writes only update the current partition's indexes, not the entire history
Faster recent queries — most queries target recent data, scanning a small partition instead of the full table
Independent maintenance — VACUUM, REINDEX, and backups run per partition without locking the whole table
Storage tiering — move old partitions to cheaper storage; keep the active partition on NVMe
Phase 2 — Medium Term
6. WebSocket Push Notifications
Estimated impact: Eliminates polling, reduces API requests by 80%+ • Medium effort
The Problem
The REST API is entirely poll-based. Exchanges hit endpoints every 2-5 seconds asking "has anything changed?" The answer is usually "no" — but the database still has to execute the query. This creates massive unnecessary load.
The Fix
Add a WebSocket endpoint to the REST API (or as a separate service) that pushes events to subscribers. Clients subscribe to address events and receive notifications when new transactions are indexed.
CURRENT (poll-based):
Exchange ──GET /balance──► API ──SELECT──► DB (every 3 sec)
Exchange ──GET /balance──► API ──SELECT──► DB (every 3 sec)
Exchange ──GET /balance──► API ──SELECT──► DB (every 3 sec)
... 28,800 queries/day per address being monitored
PROPOSED (push-based):
Exchange ──WebSocket──► Notification Service ◄──LISTEN──► PostgreSQL
│
│ { "event": "tx_indexed", "address": "kaspa:qr...", "txid": "abc..." }
│
▼
Exchange receives push only when something changes
... ~50-200 events/day per address (only actual transactions)
Implementation
-- PostgreSQL: Notify on new indexed transaction
CREATE OR REPLACE FUNCTION notify_tx_indexed()
RETURNS TRIGGER AS $$
BEGIN
PERFORM pg_notify('tx_indexed', json_build_object(
'address', NEW.script_public_key,
'txid', NEW.transaction_id,
'amount', NEW.amount,
'daa_score', NEW.block_daa_score
)::text);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tx_indexed_trigger
AFTER INSERT ON transactions
FOR EACH ROW EXECUTE FUNCTION notify_tx_indexed();
// WebSocket server (Rust/axum) — accepts subscriptions, pushes events to clients
async fn ws_handler(ws: WebSocketUpgrade, State(ctx): State<AppState>) -> Response {
    ws.on_upgrade(|socket| async move {
        let (sender, mut receiver) = socket.split();
        let sender = Arc::new(Mutex::new(sender));
        // Client sends subscription: { "subscribe": ["kaspa:qr..."] }
        while let Some(Ok(msg)) = receiver.next().await {
            if let Ok(text) = msg.to_text() {
                if let Ok(sub) = serde_json::from_str::<Subscription>(text) {
                    ctx.subscriptions.add(sub.addresses, sender.clone());
                }
            }
        }
    })
}
// Background task: listen on the PostgreSQL NOTIFY channel, fan out to subscribers
async fn pg_listener(pool: PgPool, subs: Arc<SubscriptionManager>) -> Result<(), sqlx::Error> {
    let mut listener = PgListener::connect_with(&pool).await?;
    listener.listen("tx_indexed").await?;
    while let Ok(notification) = listener.recv().await {
        if let Ok(event) = serde_json::from_str::<TxEvent>(notification.payload()) {
            subs.notify(&event.address, &event).await;
        }
    }
    Ok(())
}
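The `SubscriptionManager` referenced above is left undefined. A simplified, synchronous sketch using only the standard library (a real implementation would use tokio channels and handle disconnected clients):

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};

// Hypothetical SubscriptionManager: maps address -> subscriber channels.
struct SubscriptionManager {
    subs: HashMap<String, Vec<Sender<String>>>,
}

impl SubscriptionManager {
    fn new() -> Self {
        Self { subs: HashMap::new() }
    }

    // Register one subscriber channel for each watched address.
    fn add(&mut self, addresses: Vec<String>, tx: Sender<String>) {
        for addr in addresses {
            self.subs.entry(addr).or_default().push(tx.clone());
        }
    }

    // Deliver an event payload to every subscriber of `address`.
    fn notify(&self, address: &str, payload: &str) {
        if let Some(senders) = self.subs.get(address) {
            for s in senders {
                let _ = s.send(payload.to_string());
            }
        }
    }
}

fn main() {
    let mut mgr = SubscriptionManager::new();
    let (tx, rx) = channel();
    mgr.add(vec!["kaspa:qr1".to_string()], tx);
    mgr.notify("kaspa:qr1", "{\"event\":\"tx_indexed\"}");
    assert_eq!(rx.recv().unwrap(), "{\"event\":\"tx_indexed\"}");
}
```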
Phase 3 — Architectural
7. Write-Optimized Storage Engine
Estimated impact: 10-100x write throughput, scales to 32+ BPS • High effort
The Problem
PostgreSQL uses a B-tree index structure optimized for balanced read/write workloads. Kaspa's indexer workload is 95%+ writes with bursty reads — a pattern PostgreSQL was not designed for. As BPS increases beyond 10, PostgreSQL will not scale, regardless of hardware.
The Fix
Adopt a tiered storage architecture with a write-optimized engine for ingestion and a read-optimized layer for queries.
Database Comparison for Kaspa's Workload
PostgreSQL (current)
  Write throughput: ~5-10K rows/sec with indexes
  Read pattern: excellent for complex joins
  Fit for Kaspa: struggles at sustained high TPS; index maintenance is the killer

ClickHouse
  Write throughput: ~500K-1M+ rows/sec
  Read pattern: excellent for analytical/aggregate queries; not ideal for point lookups
  Fit for Kaspa: best fit for tx history, block data, analytics — the append-only model matches blockchain data perfectly

ScyllaDB
  Write throughput: ~100K-500K rows/sec per node
  Read pattern: excellent for point lookups by key; no joins
  Fit for Kaspa: best fit for tx-by-ID lookups and address balance lookups; linear horizontal scaling
rusty-kaspa node
│
│ wRPC
▼
Indexer ──────────┬──────────────────────────┐
│ │
▼ ▼
      ClickHouse                ScyllaDB
 (analytical store)       (key-value store)
│ │
tx history by address tx by ID lookup
block analytics address balances
aggregate queries UTXO set snapshots
│ │
└────────────┬─────────────┘
│
▼
REST API (Rust)
routes queries to
appropriate store
│
▼
Wallets / Exchanges
Why This Works
ClickHouse uses an LSM-tree/MergeTree engine — inserts go to memory, batch-merged to disk. No index maintenance on write. Perfectly suited for append-only blockchain data.
ScyllaDB provides single-digit millisecond point lookups by primary key. "Get transaction by ID" and "get balance by address" are O(1) operations.
Both scale horizontally — add nodes to handle higher BPS. PostgreSQL cannot do this for writes.
Separation of concerns — each database handles the query pattern it's optimized for, instead of PostgreSQL trying to do everything.
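To make the ClickHouse side concrete, a hypothetical MergeTree table for transaction history (schema, types, and partitioning interval are assumptions, not a finalized design):

```sql
-- Hypothetical ClickHouse schema: append-only transaction history.
-- MergeTree buffers inserts in memory and batch-merges to disk,
-- so there is no per-row index maintenance on write.
CREATE TABLE transactions
(
    transaction_id    FixedString(64),
    block_daa_score   UInt64,
    script_public_key String,
    amount            UInt64
)
ENGINE = MergeTree
PARTITION BY intDiv(block_daa_score, 8640000)  -- ~10 days of DAA scores at 10 BPS
ORDER BY (script_public_key, block_daa_score); -- serves "history by address" scans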
Migration consideration: This is a significant architectural change. A pragmatic first step is TimescaleDB — it runs as a PostgreSQL extension, so the indexer requires minimal changes, while gaining hypertable partitioning and compression that can 5-10x write throughput. This buys time before a full database migration.
Phase 3 — Architectural
8. Node-Level Transaction Index
Estimated impact: Eliminates explorer dependency for tx lookups • High effort
The Problem
The rusty-kaspa node deliberately does not index transactions by ID or maintain address-to-transaction mappings. This keeps the node lean, but forces every service that needs this data through the explorer bottleneck. GitHub issue #610 requested transaction lookup by ID — it remains unimplemented.
The Fix
Add an optional --txindex flag to rusty-kaspa (similar to the existing --utxoindex) that maintains:
Transaction-by-ID index — maps txid → transaction data and containing block(s)
Address-to-transaction index — maps script_public_key → list of txids (optional, higher overhead)
This would be stored in the node's existing RocksDB instance (which is already LSM-tree based and write-optimized) and exposed via additional wRPC methods:
// New wRPC methods (proposed)
GetTransactionById(txid: Hash) -> TransactionInfo
Returns: transaction data + containing block + acceptance status
GetTransactionsByAddress(address: Address, limit: u32, offset: Hash?) -> Vec<TransactionInfo>
Returns: paginated transaction history for an address
SubscribeTransactionsByAddress(addresses: Vec<Address>) -> Stream<TransactionEvent>
Returns: push notifications for new transactions involving watched addresses
Impact on the Ecosystem
Exchanges could operate entirely on Track A — no explorer dependency whatsoever
The explorer becomes optional — useful for analytics and browsing, not critical infrastructure
Node operators choose their overhead — --txindex is opt-in, like --utxoindex
RocksDB handles the write pattern natively — LSM-tree, no B-tree index maintenance penalty
This requires a KIP. Adding transaction indexing to the node is a significant scope change. It should be proposed as a formal KIP with performance benchmarks, storage overhead analysis, and consensus from the core team. The storage cost is estimated at ~2-4x the UTXO index depending on address-level granularity.
In this architecture, the explorer is useful but not critical. Exchanges and wallets operate directly against the node for real-time operations. The explorer stack serves historical data and analytics with write-optimized storage that scales horizontally. No single PostgreSQL instance bottleneck.