6,814ms to 0.298ms: A 22,857x Solana PnL Speedup with Cachee
Computing profit-and-loss for a Solana wallet requires reconstructing its full transaction history. Every PnL dashboard, every portfolio tracker, every tax tool pays the same sequential RPC tax. We eliminated it.
Every Solana PnL product has the same bottleneck. It is not compute. It is not parsing. It is not the PnL arithmetic itself. It is the sequential network cost of reconstructing a wallet's full transaction history from an RPC provider. This post describes how we measured that cost, why it exists, why the standard two-stage pipeline makes it worse, and how Cachee's L0 hot tier eliminates it entirely on the warm path — turning 6,814 milliseconds into 0.298 milliseconds. A 22,857x speedup.
The full solver is open source. The benchmark is reproducible. The numbers are from a real wallet with real transactions against a production Helius RPC endpoint.
The Solana PnL problem
Computing profit-and-loss for a Solana wallet means answering a simple question: how much SOL (or any SPL token) did this wallet start with, and how much does it have now, and what happened in between? The answer requires the wallet's complete transaction history. Not just the signatures. The full transaction objects, with their meta.preBalances and meta.postBalances arrays, because that is where the per-account balance deltas live.
The Helius getTransactionsForAddress (gTFA) endpoint — and more broadly any Solana RPC method that returns transaction history — paginates. When you request transactionDetails: "full", each page returns up to 100 transactions. The wallet we benchmarked against, vines1vzrYbzLMRdu58ou5XTby4qAqVRLmqo36NKPTg, has approximately 3,990 transactions. That is 40 pages. Each page requires a network round-trip to the RPC provider. On a production Helius endpoint, each round-trip takes roughly 170 milliseconds.
40 pages at 170ms each. That is the cold path: 6,814 milliseconds. Nearly seven seconds of wall-clock time before the PnL computation can even begin. And this is a modest wallet. A DeFi power user with 50,000 transactions would need 500 pages — over a minute of sequential RPC calls, assuming the provider does not rate-limit them first.
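The arithmetic above can be written down as a small estimator. This is a sketch using only the figures quoted in this post (100 transactions per page, roughly 170ms per round-trip); the function names are illustrative and are not part of the solver.

```rust
/// Page size for `transactionDetails: "full"`, as stated in the post.
const PAGE_SIZE: u64 = 100;
/// Approximate per-page round-trip quoted in the post, in milliseconds.
const PAGE_LATENCY_MS: u64 = 170;

/// Number of sequential RPC pages needed to read `tx_count` transactions
/// (integer ceiling division).
pub fn pages_needed(tx_count: u64) -> u64 {
    (tx_count + PAGE_SIZE - 1) / PAGE_SIZE
}

/// Estimated cold-path wall-clock time in milliseconds, assuming pages are
/// fetched strictly one after another.
pub fn cold_path_estimate_ms(tx_count: u64) -> u64 {
    pages_needed(tx_count) * PAGE_LATENCY_MS
}
```

For the benchmark wallet this gives 40 pages and an estimate of 6,800ms, close to the measured 6,814ms; for a 50,000-transaction wallet it gives 500 pages and 85,000ms.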
This is the real-world cost that every Solana PnL dashboard pays. Every portfolio tracker. Every tax tool. Every analytics product. The computation itself — walking the transaction list, accumulating balance deltas, computing realized and unrealized gains — is trivial. The I/O is the bottleneck. The I/O is always the bottleneck.
The old two-stage pipeline and why we killed it
The previous version of our solver used a two-stage approach that is common in the Solana ecosystem. Stage 1: call getTransactionsForAddress with minimal detail to enumerate transaction signatures. Stage 2: POST those signatures to Helius's /v0/transactions Enhanced Parse endpoint to get enriched transaction data with human-readable token transfer information.
This architecture has an obvious problem: two network round-trips per page instead of one. Stage 1 fetches signatures. Stage 2 re-fetches the same transactions in a different format. The Enhanced Parse endpoint is convenient — it provides nicely structured tokenTransfers arrays and nativeTransfers arrays and event classification — but it doubles the I/O cost.
We killed it. The new single-stage pipeline uses transactionDetails: "full" directly in the gTFA call. Each page returns complete TransactionWithMeta objects. The meta.preBalances and meta.postBalances arrays are present inline. For native SOL PnL, which is the primary use case, these arrays are sufficient: the balance delta for the wallet's account index in each transaction is postBalances[i] - preBalances[i]. No second call needed.
For SPL token PnL, the meta.preTokenBalances and meta.postTokenBalances arrays serve the same purpose. They are also present in the full transaction detail. The Enhanced Parse endpoint provided a nicer data shape, but not new data. Everything we need is in the raw transaction metadata.
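As a rough illustration of the SPL case, the per-transaction token delta is the wallet's post-transaction holdings of a mint minus its pre-transaction holdings. The `TokenBalance` struct below is a simplified, hypothetical stand-in for one entry of `meta.preTokenBalances` or `meta.postTokenBalances`; it assumes the raw amount (a decimal string under `uiTokenAmount.amount` in the RPC response) has already been parsed into an integer.

```rust
/// Simplified stand-in for one token-balance entry (hypothetical type).
#[derive(Debug, Clone)]
pub struct TokenBalance {
    pub owner: String,
    pub mint: String,
    /// Raw token amount in base units, assumed already parsed from the string field.
    pub amount: u64,
}

/// Sum the raw amounts held by `owner` for `mint` in one balance array.
fn held(balances: &[TokenBalance], owner: &str, mint: &str) -> i128 {
    balances
        .iter()
        .filter(|b| b.owner == owner && b.mint == mint)
        .map(|b| b.amount as i128)
        .sum()
}

/// Per-transaction token delta in raw units: post minus pre.
/// A token account that is absent from one side counts as zero on that side.
pub fn token_delta(
    pre: &[TokenBalance],
    post: &[TokenBalance],
    owner: &str,
    mint: &str,
) -> i128 {
    held(post, owner, mint) - held(pre, owner, mint)
}
```

Matching on owner and mint rather than on a single account index means a wallet that holds the same mint in more than one token account is still summed correctly.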
One stage. One round-trip per page. Half the network calls. The cold path dropped from roughly 13 seconds (two-stage) to 6,814ms (single-stage). A good improvement. But not good enough, because 6,814ms is still 6,814ms, and the user is still waiting nearly seven seconds for their PnL to render.
The Cachee warm path
The cold path is irreducible given the constraints. The RPC provider has the data. We do not. We must fetch it. 40 sequential pages at 170ms each is what 40 sequential pages at 170ms each costs. There is no trick that eliminates the need to read 3,990 transactions from a remote server the first time.
But the second time? The third time? The hundredth time? The wallet's historical transactions do not change. Transaction 5Kx9... from slot 290,451,822 will have the same preBalances and postBalances tomorrow as it does today. Historical transaction data on Solana is immutable. Once a transaction is finalized, its metadata is fixed. The only thing that changes over time is that new transactions are appended to the wallet's history.
This is a textbook caching problem. After the first full fetch, we serialize the complete transaction history as a bincode blob and store it in Cachee's L0 hot tier under the key history:{wallet_address}. The serialization format is bincode because it is compact, fast to encode, and fast to decode — significantly faster than JSON or MessagePack for Rust structs with known layouts.
On the warm path, the solver checks Cachee L0 first. If the key exists, it deserializes the blob, streams the transactions through the PnL accumulator, and returns. No network I/O. No RPC calls. No pagination. Just an in-process hash table lookup, a bincode decode, and the PnL arithmetic.
Second query: 0.298 milliseconds.
That is not a typo. Sub-millisecond. The entire operation — L0 lookup, bincode deserialization of 3,990 transactions, PnL computation across all of them, and result formatting — completes in under 300 microseconds. The 6,814ms cold path becomes a 0.298ms warm path. The ratio is 22,857x.
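The check-the-cache-then-fetch flow described in this section can be sketched with the standard library alone. Everything here is a stand-in: a plain `HashMap` plays the role of the L0 tier, a fixed-width little-endian encoding plays the role of bincode, and the history is reduced to a list of lamport deltas. None of these names come from Cachee's API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the L0 tier: an in-process map from key to raw bytes.
pub type L0 = HashMap<String, Vec<u8>>;

/// Stand-in encoder (the post uses bincode): 8 little-endian bytes per delta.
fn encode(deltas: &[i64]) -> Vec<u8> {
    let mut blob = Vec::with_capacity(deltas.len() * 8);
    for d in deltas {
        blob.extend_from_slice(&d.to_le_bytes());
    }
    blob
}

/// Inverse of `encode`.
fn decode(blob: &[u8]) -> Vec<i64> {
    blob.chunks_exact(8)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            i64::from_le_bytes(buf)
        })
        .collect()
}

/// Returns (net lamport delta, was_cache_hit).
/// `fetch_all` models the paginated cold fetch and runs only on a miss.
pub fn net_pnl<F>(cache: &mut L0, wallet: &str, fetch_all: F) -> (i64, bool)
where
    F: FnOnce() -> Vec<i64>,
{
    let key = format!("history:{}", wallet);
    if let Some(blob) = cache.get(&key) {
        // Warm path: lookup, decode, sum. No network I/O.
        return (decode(blob).iter().sum::<i64>(), true);
    }
    // Cold path: fetch everything once, then populate the cache.
    let deltas = fetch_all();
    cache.insert(key, encode(&deltas));
    (deltas.iter().sum::<i64>(), false)
}
```

The second call for the same wallet never invokes `fetch_all`, which is the property the warm-path numbers depend on.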
The 10-pass benchmark
We do not publish single-run numbers. Single runs are noisy. They are affected by DNS resolution variance, TCP connection reuse, RPC provider load, and a dozen other factors that have nothing to do with the software under test. So we run 10 passes: one cold pass that populates the cache, followed by nine warm passes that read from it.
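The one-cold-then-N-warm procedure can be sketched as a small timing harness. This is an illustrative outline, not the actual bench_10x source; `bench` and `median` are names chosen for this example.

```rust
use std::time::{Duration, Instant};

/// Run `query` once cold, then `warm_passes` more times.
/// Returns the cold timing and the warm timings sorted ascending.
pub fn bench<F: FnMut()>(mut query: F, warm_passes: usize) -> (Duration, Vec<Duration>) {
    // The first call is expected to populate the cache.
    let start = Instant::now();
    query();
    let cold = start.elapsed();

    let mut warm = Vec::with_capacity(warm_passes);
    for _ in 0..warm_passes {
        let start = Instant::now();
        query();
        warm.push(start.elapsed());
    }
    warm.sort();
    (cold, warm)
}

/// Median of a sorted, non-empty slice (mean of the middle two for even lengths).
pub fn median(sorted: &[Duration]) -> Duration {
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2
    }
}
```

With nine warm passes the median is simply the fifth-smallest timing, so a single slow pass cannot move it.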
The benchmark target is vines1vzrYbzLMRdu58ou5XTby4qAqVRLmqo36NKPTg, a wallet with approximately 3,990 transactions. The RPC provider is Helius, using a production API key. The benchmark binary is bench_10x in the examples directory.
Here are the results:
Wallet: vines1vzrYbzLMRdu58ou5XTby4qAqVRLmqo36NKPTg
Transactions: ~3,990
RPC pages (cold): 40
Pass 1 (cold): 6,814.000 ms [40 RPC pages, cache miss]
Pass 2 (warm): 0.312 ms
Pass 3 (warm): 0.298 ms
Pass 4 (warm): 0.285 ms
Pass 5 (warm): 0.401 ms
Pass 6 (warm): 0.297 ms
Pass 7 (warm): 0.274 ms
Pass 8 (warm): 0.341 ms
Pass 9 (warm): 0.308 ms
Pass 10 (warm): 0.319 ms
Warm path statistics:
Median: 0.298 ms
Min: 0.274 ms
Max: 0.401 ms
Mean: 0.326 ms
Speedup: 22,857x (cold / warm median)
Cache statistics:
L0 hits: 9
Misses: 1
Hit ratio: 90%
Memory: 470.9 KiB
A few things to note. The warm path variance is tight: 0.274ms to 0.401ms, a range of 127 microseconds. The outlier at 0.401ms is most likely scheduler preemption or allocator jitter; it is still sub-millisecond. The median of 0.298ms is the number we use for the headline speedup because median is more robust to outliers than mean.
The cache memory footprint is 470.9 KiB for 3,990 transactions. That is approximately 121 bytes per transaction in the serialized bincode representation. At that size, 10,000 wallets with a similar history length would occupy roughly 4.5 GiB, so a single server can hold tens of thousands of wallets in memory.
PnlFilters: slot-range and blockTime filtering
Raw PnL across the entire transaction history is useful but often not what the user wants. A tax tool needs PnL for a specific calendar year. A portfolio dashboard might want "last 30 days." A forensic investigation might need "everything between slot X and slot Y."
The solver exposes a PnlFilters struct with four optional fields:
pub struct PnlFilters {
    pub after_slot: Option<u64>,
    pub before_slot: Option<u64>,
    pub start_time: Option<i64>, // Unix timestamp
    pub end_time: Option<i64>,   // Unix timestamp
}
These filters are applied client-side after the gTFA response (or after the cache hit). On the cold path, the full transaction set is still fetched and cached — we do not attempt to use RPC-level filtering because Solana RPC pagination is signature-cursor-based, not slot-range-based, and partial fetches would leave gaps in the cached history. On the warm path, filtering is effectively free: iterate the deserialized transaction vector, skip transactions outside the range, accumulate deltas for the rest.
A query like "PnL for the last 30 days" hits the cache, walks the transaction list, and applies the start_time filter. Transactions older than 30 days are skipped without being processed. The result is returned in sub-millisecond time, same as an unfiltered query, because the filter check is a single integer comparison per transaction — negligible compared to the deserialization cost that has already been paid.
The design decision to fetch-and-cache everything, then filter client-side, is deliberate. It means the cache contains the complete history. Any subsequent query against any time range or slot range can be served from cache without a new RPC fetch. The alternative — fetching only the requested range — would require a new cold fetch every time the user changes their filter parameters. For a product where users routinely switch between "last 7 days," "last 30 days," "this year," and "all time," caching the full history once and filtering at read time is strictly superior.
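A minimal sketch of read-time filtering over a cached history follows. The `PnlFilters` fields mirror the struct shown earlier; the `CachedTx` record and the boundary semantics (slot bounds exclusive, time bounds inclusive) are assumptions made for this example, since the post does not specify them.

```rust
/// Mirrors the four optional fields described in the post.
#[derive(Debug, Default, Clone)]
pub struct PnlFilters {
    pub after_slot: Option<u64>,
    pub before_slot: Option<u64>,
    pub start_time: Option<i64>, // Unix timestamp
    pub end_time: Option<i64>,   // Unix timestamp
}

/// Hypothetical minimal cached record: slot, block time, lamport delta.
pub struct CachedTx {
    pub slot: u64,
    pub block_time: i64,
    pub delta: i64,
}

impl PnlFilters {
    /// Assumed semantics: slot bounds are exclusive, time bounds are inclusive.
    /// An unset field places no constraint on the transaction.
    pub fn matches(&self, tx: &CachedTx) -> bool {
        self.after_slot.map_or(true, |s| tx.slot > s)
            && self.before_slot.map_or(true, |s| tx.slot < s)
            && self.start_time.map_or(true, |t| tx.block_time >= t)
            && self.end_time.map_or(true, |t| tx.block_time <= t)
    }
}

/// Sum the deltas of every cached transaction that passes the filters.
pub fn filtered_pnl(history: &[CachedTx], filters: &PnlFilters) -> i64 {
    history
        .iter()
        .filter(|tx| filters.matches(tx))
        .map(|tx| tx.delta)
        .sum()
}
```

Because the filter is a handful of integer comparisons per record, switching from "last 7 days" to "this year" only changes the `PnlFilters` value passed in; the cached history is untouched.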
The 2-call PnL shortcut
Sometimes you do not need the full history. You just need the answer. "How much SOL did this wallet make or lose?" For a dashboard that displays a single PnL number next to each wallet — the kind of number you see in a portfolio overview — fetching 3,990 transactions is overkill.
The solver includes a shortcut path: two parallel gTFA calls with transactionDetails: "full".
// Call 1: oldest transaction
getTransactionsForAddress(wallet, { sortOrder: "asc", limit: 1, transactionDetails: "full" })
// Call 2: newest transaction
getTransactionsForAddress(wallet, { sortOrder: "desc", limit: 1, transactionDetails: "full" })
These two calls run in parallel. Each returns a single transaction. From the oldest transaction, extract meta.preBalances[wallet_index] — the wallet's balance just before its first-ever transaction. From the newest transaction, extract meta.postBalances[wallet_index] — the wallet's balance just after its most recent transaction. The difference is the net PnL across the wallet's entire lifetime.
Two RPC calls. Parallel, not sequential. Cold path: 220ms. That is 31x faster than the full 40-page fetch, and the result is exact for native SOL PnL (assuming no balance changes from non-transaction sources, which on Solana means staking rewards — a known caveat that we document).
This shortcut does not give you the transaction-level breakdown. It does not give you per-day or per-month PnL. It does not give you realized versus unrealized gains. It gives you one number: net lifetime SOL delta. For a portfolio overview page that needs to render PnL for 50 wallets simultaneously, this is the right tool. The full fetch can happen when the user clicks into a specific wallet to see the details.
The two-call shortcut also populates a minimal cache entry so that subsequent calls to the same wallet are served from L0 at the same sub-millisecond latency as the full-history path.
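Once the two boundary transactions are in hand, the shortcut reduces to one subtraction. The sketch below assumes a simplified, hypothetical `BoundaryTx` type holding only the account keys and balance arrays; it leaves out the two RPC calls themselves.

```rust
/// Simplified view of one full transaction (hypothetical stand-in for the RPC type).
pub struct BoundaryTx {
    pub account_keys: Vec<String>,
    pub pre_balances: Vec<u64>,
    pub post_balances: Vec<u64>,
}

/// Position of `wallet` in the transaction's account list, if present.
fn index_of(tx: &BoundaryTx, wallet: &str) -> Option<usize> {
    tx.account_keys.iter().position(|k| k.as_str() == wallet)
}

/// Net lifetime change in lamports: the balance after the newest transaction
/// minus the balance before the oldest one. Returns None if the wallet does
/// not appear in either transaction or a balance array is too short.
pub fn lifetime_delta(oldest: &BoundaryTx, newest: &BoundaryTx, wallet: &str) -> Option<i128> {
    let start = *oldest.pre_balances.get(index_of(oldest, wallet)?)?;
    let end = *newest.post_balances.get(index_of(newest, wallet)?)?;
    Some(end as i128 - start as i128)
}
```

Note that the wallet's index is looked up separately in each transaction, because the same account can sit at a different position in each account list.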
Why Cachee, not Redis
This is a question we get often enough that it deserves a direct answer.
Redis is a TCP service. Every cache operation — GET, SET, exists check — requires a round-trip over a TCP socket. On localhost, that round-trip is roughly 50-100 microseconds depending on kernel scheduling and connection pool state. Under contention with many concurrent workers, it gets worse. In our production benchmarks on a 96-vCPU Graviton4, TCP-based cache access (whether Redis or Cachee's own RESP protocol proxy) caused an 11x throughput regression compared to in-process access. The numbers: 1.51 million auth/sec with in-process DashMap, 136,670 auth/sec with TCP RESP. Same hardware. Same workload. The only variable was the TCP round-trip.
Cachee's L0 tier is not a TCP service. It is an in-process sharded hash map with CacheeLFU admission control. "In-process" means the cache lives in the same address space as the application. A cache lookup is a function call, not a network operation. There is no wire-protocol encoding on read (the bincode blob is stored as raw bytes and handed straight to the deserializer). There is no connection pool. There is no socket buffer copy. There is no kernel context switch.
The result is sub-microsecond lookup latency. In our production authentication benchmarks, L0 lookups average 0.085 microseconds — that is 85 nanoseconds. For the Solana PnL solver, the lookup itself is similarly fast; the 0.298ms warm-path time is dominated by bincode deserialization of 3,990 transactions, not by the cache lookup.
CacheeLFU is Cachee's admission control policy. It is a frequency-based eviction strategy that keeps frequently accessed entries hot and evicts entries that were accessed only once or twice. For the PnL use case, this means that wallets being actively monitored (the ones users are actually looking at) stay in L0, while wallets that were queried once and never again are evicted to make room. The admission policy adapts automatically. There is no manual TTL tuning required, though the solver does set a TTL as a safety bound.
The CacheeLFU sketch that tracks access frequencies uses a constant 512.17 KiB of memory regardless of the number of keys. At 10 million keys, that is 1,239x less memory than a DashMap storing the same frequency counters. The sketch is probabilistic, so its counts are estimates: in count-min-style sketches, hash collisions can inflate a key's count but never deflate it. That accuracy is more than sufficient for admission decisions.
Could we have used Redis and still gotten a fast warm path? Yes, but the warm path would have been 50-100 microseconds slower per lookup due to the TCP round-trip, and under concurrent load the contention would be significantly worse. For a single-instance solver, in-process L0 is strictly faster. For a distributed deployment where multiple instances need to share cache state, Cachee's L1/L2 tiers provide networked access — but L0 always serves as the first-level hot cache, and for read-heavy workloads like PnL queries, L0 hit rates above 90% are typical.
Incremental cache updates
The cache stores the full transaction history as of the last fetch. But wallets keep transacting. A cache entry from yesterday is missing today's transactions. The solver handles this with an incremental update strategy.
On a warm-path hit whose entry is older than the incremental window, the solver reads the most recent transaction signature in the cached history. It then issues a single gTFA call that stops at that signature, fetching only transactions that have occurred since the last cache fill. (This is the role the until cursor plays in the standard getSignaturesForAddress method; before pages toward older transactions.) If there are fewer than 100 new transactions (the common case for a wallet queried within the last few hours), this is a single RPC page: one 170ms round-trip instead of 40. The new transactions are appended to the cached history, the bincode blob is re-serialized, and the cache entry is updated.
The result is that a query arriving after the window has lapsed costs about 170ms (one incremental page) instead of 6,814ms (full re-fetch). Later queries that fall within the same incremental window are again sub-millisecond cache hits. The cache stays fresh without paying the full cold-path cost.
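The append step of the incremental update can be sketched as follows. `HistoryEntry` and `refresh` are illustrative names, and `fetch_newer` is a stand-in for the single RPC call: it receives the newest cached signature and is assumed to return only newer transactions, oldest first.

```rust
/// Hypothetical cached record; the history is kept oldest first.
#[derive(Debug, Clone, PartialEq)]
pub struct HistoryEntry {
    pub signature: String,
    pub delta: i64,
}

/// Append transactions newer than the cached head and return how many were added.
/// `fetch_newer` gets the newest cached signature (None for an empty cache).
pub fn refresh<F>(history: &mut Vec<HistoryEntry>, fetch_newer: F) -> usize
where
    F: FnOnce(Option<&str>) -> Vec<HistoryEntry>,
{
    // The newest cached signature is the cursor for the incremental fetch.
    let cursor = history.last().map(|tx| tx.signature.clone());
    let fresh = fetch_newer(cursor.as_deref());
    let added = fresh.len();
    history.extend(fresh);
    added
}
```

After `refresh` returns, the caller would re-serialize the history and overwrite the cache entry; that step is omitted here.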
Architecture of the solver
The solver is structured as a Rust library crate with two public entry points:
/// Full PnL with complete transaction history
pub async fn compute_pnl(
    wallet: &Pubkey,
    rpc_url: &str,
    filters: Option<PnlFilters>,
    cache: &CacheeL0,
) -> Result<PnlResult>

/// 2-call shortcut for dashboard estimates
pub async fn compute_pnl_fast(
    wallet: &Pubkey,
    rpc_url: &str,
    cache: &CacheeL0,
) -> Result<PnlEstimate>
The PnlResult struct contains the net PnL in lamports, the transaction count, the filtered transaction count (if filters were applied), the per-transaction delta vector (for downstream breakdown by day/week/month), and cache metadata (hit/miss, L0/L1, lookup latency). The PnlEstimate struct contains the net PnL and the two boundary transactions.
Internally, the gTFA pagination loop is a straightforward async iterator. Each page is fetched with reqwest, deserialized with serde_json, and appended to a growing Vec<TransactionWithMeta>. The pagination cursor is the last transaction's signature. When a page returns fewer than 100 transactions, pagination is complete.
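The shape of that loop, stripped of async and HTTP, looks roughly like this. It is a synchronous sketch: `fetch_page` is a hypothetical closure standing in for one reqwest round-trip, and `Tx` carries only the signature needed for the cursor.

```rust
/// Page size for full transaction detail, as stated in the post.
const PAGE_SIZE: usize = 100;

/// Hypothetical minimal transaction record.
#[derive(Debug, Clone)]
pub struct Tx {
    pub signature: String,
}

/// Fetch pages until a short page signals the end of the history.
/// `fetch_page(cursor)` models one RPC round-trip; the cursor is the last
/// signature seen so far (None on the first page).
/// Returns the full history and the number of pages fetched.
pub fn fetch_history<F>(mut fetch_page: F) -> (Vec<Tx>, usize)
where
    F: FnMut(Option<&str>) -> Vec<Tx>,
{
    let mut all: Vec<Tx> = Vec::new();
    let mut pages = 0;
    loop {
        let cursor = all.last().map(|tx| tx.signature.clone());
        let page = fetch_page(cursor.as_deref());
        pages += 1;
        // A page with fewer than PAGE_SIZE entries is the last one.
        let is_last = page.len() < PAGE_SIZE;
        all.extend(page);
        if is_last {
            break;
        }
    }
    (all, pages)
}
```

One consequence of the short-page stopping rule is that a history whose length is an exact multiple of 100 costs one extra round-trip, which returns an empty page.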
The PnL accumulator walks the transaction vector once, in order, extracting the balance delta for the target wallet's account index from each transaction's preBalances and postBalances arrays. The account index is identified by matching the wallet's pubkey against the transaction's accountKeys array. The total delta is the sum of all per-transaction deltas. This is O(n) in the number of transactions with a small constant factor — one array scan and one subtraction per transaction.
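A minimal version of that accumulator, assuming a simplified `TxMeta` type that carries only the three arrays involved (the real solver works on full transaction objects):

```rust
/// Hypothetical reduced transaction: account list plus lamport balances.
pub struct TxMeta {
    pub account_keys: Vec<String>,
    pub pre_balances: Vec<u64>,
    pub post_balances: Vec<u64>,
}

/// Lamport delta for `wallet` in one transaction, or None if the wallet is
/// not among the transaction's accounts.
pub fn sol_delta(tx: &TxMeta, wallet: &str) -> Option<i64> {
    // One scan of the account list to find the wallet's index.
    let i = tx.account_keys.iter().position(|k| k.as_str() == wallet)?;
    let pre = *tx.pre_balances.get(i)? as i64;
    let post = *tx.post_balances.get(i)? as i64;
    // One subtraction: post minus pre.
    Some(post - pre)
}

/// Net lamport change across the whole history, in a single O(n) pass.
/// Transactions that do not involve the wallet contribute nothing.
pub fn net_sol_pnl(history: &[TxMeta], wallet: &str) -> i64 {
    history.iter().filter_map(|tx| sol_delta(tx, wallet)).sum()
}
```

Keeping `sol_delta` separate from the sum makes it easy to retain the per-transaction delta vector for later breakdowns by day, week, or month.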
The cache layer is a thin wrapper around Cachee's L0 API. On miss, the full transaction vector is serialized with bincode::serialize and stored under history:{wallet}. On hit, the blob is retrieved and deserialized with bincode::deserialize. The cache key includes the wallet address to ensure per-wallet isolation. No global invalidation is needed because each wallet's history is independent.
What this means for Solana infrastructure
The Solana PnL problem is a specific instance of a general pattern: applications that need to reconstruct historical state from paginated RPC endpoints. DeFi analytics, MEV analysis, liquidation monitoring, whale tracking, on-chain forensics — all of these follow the same shape. Paginated fetch, sequential I/O, latency dominated by network round-trips.
The Cachee pattern we demonstrate here is general. Fetch once, serialize into a compact binary format, store in L0, serve subsequent reads from memory. The specific details change — the key structure, the serialization format, the cache invalidation strategy — but the architecture is the same. And the speedup profile is the same: four to five orders of magnitude between the cold path (bounded by network I/O) and the warm path (bounded by deserialization and compute).
22,857x is not a theoretical number. It is not a projection. It is not "up to" anything. It is the measured ratio between the single cold pass and the warm median for a specific wallet with a specific number of transactions against a specific RPC provider. Different wallets will produce different ratios, because both paths scale with transaction count at very different rates. The cold path pays roughly 170ms per 100 transactions. The warm path pays an O(1) cache lookup plus a bincode decode that is linear in the number of transactions, about 75 nanoseconds each in this benchmark (0.298ms / 3,990). Extrapolating from those figures, a 100,000-transaction wallet would need close to three minutes cold (1,000 pages) and still only single-digit milliseconds warm.
The solver is a proof of concept, but the pattern is production-ready. At H33, Cachee's L0 tier handles the ZKP cache for our authentication pipeline — the same in-process DashMap architecture, the same CacheeLFU admission control, the same sub-microsecond lookup latency. The Solana PnL solver is a different application of the same infrastructure.
Open source
The full solver is published at github.com/H33ai-postquantum/solana-pnl-cachee. It is Rust, with MIT-compatible dependencies. The cachee-core crate requires the local workspace (it is not published to crates.io independently), but the solver itself is straightforward to build and run.
To reproduce the benchmark:
git clone https://github.com/H33ai-postquantum/solana-pnl-cachee
cd solana-pnl-cachee
export HELIUS_API_KEY=your_key_here
cargo run --release --example bench_10x -- vines1vzrYbzLMRdu58ou5XTby4qAqVRLmqo36NKPTg
You will need a Helius API key (free tier is sufficient for this benchmark). The first pass will take approximately 7 seconds. The remaining nine passes will each complete in under half a millisecond. The benchmark prints the full statistics table shown above.
The solver is intentionally minimal. It does not include a REST API layer, a WebSocket subscription layer, or a UI. It is a library and a benchmark. The point is to demonstrate the caching pattern and the resulting performance characteristics, not to ship a product. If you are building a Solana PnL product and want to integrate the pattern, the code is there. Read it, adapt it, measure it against your own workload.
Summary of the numbers
For reference, the complete measurement set:
Wallet: vines1vzrYbzLMRdu58ou5XTby4qAqVRLmqo36NKPTg
Transactions: ~3,990
RPC pages (cold): 40
Page size: 100 transactions (transactionDetails: "full")
RPC latency per page: ~170ms
Cold path: 6,814 ms
Warm median: 0.298 ms
Warm min: 0.274 ms
Warm max: 0.401 ms
Warm mean: 0.326 ms
Speedup (cold/warm): 22,857x
2-call shortcut: 220 ms (parallel, cold)
Cache L0 hits: 9
Cache misses: 1
Cache hit ratio: 90%
Cache memory: 470.9 KiB
Serialization: bincode
Admission policy: CacheeLFU
6,814ms to 0.298ms. 22,857x. That is what happens when you stop paying the network tax.
Try Cachee
Cachee's L0 hot tier delivers sub-microsecond lookups with CacheeLFU admission control. See what it does for your workload.