The cleaner the signal, the less it's yours

A structural limit of on-chain whale research — and why we publish it anyway

Swiss Whale Intelligence — Honest Research

We spent a year running on-chain whale signals as live trading rules on a real Kraken account. The point was not to get rich; it was to find out whether the thing we track has a tradable edge. It does not. Across 370 strictly-live trades the win rate was 74.9 % (three profitable trades out of four), and the equal-weight return still fell to −96 % before a single fee was charged. The average loss was four times the size of the average win, and that asymmetry alone sinks the account; fees only deepen a hole they did not dig. You can slide the fee to zero and the curve still bleeds.¹
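The arithmetic behind that asymmetry can be sketched in a few lines. This is an illustration, not the ledger calculation: it takes the 74.9 % win rate quoted above, assumes the average loss is exactly four times the average win, and measures everything in units of one average win. It gives a simple per-trade expectancy, not the compounded equal-weight return. The function names are ours.

```python
def expectancy(win_rate: float, loss_to_win: float) -> float:
    """Expected result per trade, in units of one average win."""
    return win_rate * 1.0 - (1.0 - win_rate) * loss_to_win


def breakeven_loss_to_win(win_rate: float) -> float:
    """Loss/win size ratio at which the expectancy is exactly zero."""
    return win_rate / (1.0 - win_rate)


WIN_RATE = 0.749     # win rate over the 370 live trades
LOSS_TO_WIN = 4.0    # assumption: "four times" taken as exactly 4

per_trade = expectancy(WIN_RATE, LOSS_TO_WIN)
limit = breakeven_loss_to_win(WIN_RATE)

print(f"expectancy per trade: {per_trade:+.3f} average wins")
print(f"break-even loss/win ratio: {limit:.2f}")
```

Under these assumptions a 74.9 % win rate breaks even only while the average loss stays below roughly three average wins. At four, each trade loses about a quarter of an average win in expectation, with no fee involved.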

That result is clean, and it killed a hypothesis we would have liked to keep. But the more interesting thing happened next. Trying to design an honest test of "do whale movements predict price" forced a question that turned out to be deeper than the original one: not "is there an edge?" but "can these claims even be tested fairly?" Working through that question produced a finding we have not seen anyone in this industry state plainly, for a reason that is itself the point.

Three things you want, and can't all have

A whale signal you would want to build on has three desirable properties. They sound independent. They are not.

Cleanliness. Is the trigger a point-in-time fact, or does it smuggle in hindsight? On-chain primitives — a coin's value, its age, a realized-profit ratio at the moment it moves — are clean: the blockchain at a given height is immutable, so a metric recomputed from it carries no look-ahead. Entity and cluster labels are not: "this address is Exchange X" is a judgement our pipeline makes on today's co-spend graph and applies backward. An address that looks like an exchange now may not have been identifiable as one at the time we are studying. For describing the present, fine. For a historical test, contamination.
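The difference between a retroactive label and a point-in-time one can be shown with a small sketch. It assumes each label carries the date on which it was first assigned; that field (`assigned_on`), the `Label` type, the address and the dates are all hypothetical, and nothing here describes how our pipeline actually stores labels.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Label:
    address: str
    entity: str
    assigned_on: date  # hypothetical: when the attribution was first made


def label_as_of(labels: list[Label], address: str, as_of: date) -> Optional[str]:
    """Return the entity label only if it already existed on `as_of`."""
    known = [lab for lab in labels
             if lab.address == address and lab.assigned_on <= as_of]
    if not known:
        return None  # the address was not yet identifiable at that time
    return max(known, key=lambda lab: lab.assigned_on).entity


# Illustrative data only.
labels = [Label("addr1", "Exchange X", date(2024, 3, 1))]

retroactive = labels[0].entity                                   # today's label, applied backward
point_in_time = label_as_of(labels, "addr1", date(2022, 6, 1))   # what was knowable then

print(retroactive, point_in_time)
```

A historical test that reads `retroactive` treats a 2022 transfer as an exchange deposit. A test that reads `point_in_time` sees an unlabeled address, which is all anyone could have seen at the time.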

Differentiation. Does the signal use the proprietary whale infrastructure — the clustering, the entity attribution, the years of indexed history — or is it a commodity number that Glassnode and CryptoQuant already publish? "Exchange netflow volume" is a commodity; anyone with a node computes it. "This specific dormant whale just woke" is differentiated; it needs the machinery.

Answerability. Is the event-generating process stationary and exogenous enough that a pre-registered test can ever return an informative verdict — rather than a permanent shrug? This is the axis we did not see until the data forced it. A signal can be clean and differentiated and still be structurally untestable, because the process that emits the events is unstable, or because the event and the outcome share a common driver.

Here is where the three candidates we evaluated actually sit:

Signal	Differentiation	Cleanliness	Answerability
Whale → exchange inflow (the iconic one)	high	medium (needs labels)	low
Dormant whale wakeup (old coins moving)	medium	high (label-free)	high
Continuous exchange netflow (commodity)	low	high	high

Read the columns and the diagonal jumps out. As differentiation rises, cleanliness and answerability fall. The most iconic, most moat-using signal is the least testable. The most testable, cleanest signal is the least proprietary.

Why the tension is structural, not bad luck

This is not an artifact of our particular data or a temporary gap in coverage. It follows from what makes a whale signal proprietary in the first place.

Differentiation in this domain is identity attribution — naming who moved. But naming is exactly what contaminates (labels are retroactive, not point-in-time) and what destabilizes the process (a named population is a specific set of actors, and specific actors migrate). We watched this happen in the data. The "whale deposits to exchange" signal — the retail-famous, most marketable claim we have — comes from a process whose annual event count swung 1 → 20 → 115 → 24 over four years, a coefficient of variation near 1.1. The species that emits the event is leaving: large holders increasingly move size off-chain through OTC desks and ETF custody. A pre-registered test on it could wait three years and still return an INCONCLUSIVE that cannot distinguish "no effect" from "the population that generated this event is gone." The very flow we'd most like to sell a verdict on is the one whose verdict the world has structurally removed from reach.
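The coefficient of variation quoted above can be reproduced from the four annual counts. The sketch assumes the population form (standard deviation divided by the mean, with no sample correction), which is the form that matches "near 1.1"; the helper name is ours.

```python
from statistics import mean, pstdev

annual_events = [1, 20, 115, 24]  # yearly event counts quoted in the text


def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean."""
    return pstdev(counts) / mean(counts)


cv = coefficient_of_variation(annual_events)
print(f"CV = {cv:.2f}")
```

With the sample standard deviation (`statistics.stdev`) the same counts give about 1.27, so the exact figure depends on that choice. Either way the spread is larger than the mean itself, which is what makes a fixed observation window so uninformative.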

Now look at the clean, label-free alternative: dormant wakeups, defined purely by a coin's size and age, with no attribution at all. Its process is roughly 3× more stationary (CV ≈ 0.4). We feared it would be endogenous, since old hands wake in rallies and capitulations and so event and outcome might share a price driver, but when measured, wakeup days carry only 1.09× baseline volatility: essentially exogenous. It is the answerable, clean choice. Its only deficit is on the differentiation axis: "old coins moving" is a less iconic story than "whales dumping on Coinbase." We bought answerability with a little glamour.
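One simple way to run an exogeneity check of this kind is to compare price movement on event days with movement on all other days. The sketch below is an assumption about the shape of such a check, not the study's method: it uses mean absolute daily return as the volatility proxy, takes non-event days as the baseline, and runs on made-up numbers.

```python
from statistics import mean


def volatility_ratio(returns, event_days):
    """Mean absolute return on event days divided by the mean on non-event days."""
    event = [abs(r) for i, r in enumerate(returns) if i in event_days]
    baseline = [abs(r) for i, r in enumerate(returns) if i not in event_days]
    return mean(event) / mean(baseline)


# Made-up daily returns, for illustration only (not the study's data).
toy_returns = [0.01, -0.02, 0.03, -0.01, 0.02, -0.03]
toy_event_days = {2, 4}  # hypothetical indices of wakeup days

ratio = volatility_ratio(toy_returns, toy_event_days)
print(f"event-day volatility = {ratio:.2f}x baseline")
```

A ratio close to 1 means event days look like any other day, which is the reading behind "essentially exogenous". A ratio well above 1 would suggest that events cluster in turbulent periods and share a driver with the outcome.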

The pattern generalizes. The clean, answerable signals are the ones anyone can compute, because purity and stability are properties of public chain primitives, and public primitives are not a moat. The differentiated signals are differentiated because they rest on proprietary judgement calls, and judgement calls are where contamination and non-stationarity enter. Rigor and differentiation are negatively correlated in on-chain whale research. That is the finding.

Why nobody writes this down

A finding that no competitor publishes is worth asking about. The reason here is simple: it indicts the product. If you sell "whale signals," you cannot afford to say in public that the proprietary part of your offering is, almost by construction, the least rigorously testable part — and that the part you can test rigorously is the part you don't own. The sentence is true and it is commercially radioactive for a signal vendor.

We can say it, and we gain by saying it, precisely because our product is not a signal. The moat, once you accept the trade-off above, cannot live in any single proprietary number — the data just showed those are the unanswerable ones. So we move the moat. It lives in the discipline applied to the whole space: pre-registration, publishing both outcomes including the null, showing the machinery that decides whether a finding is real, and refusing to dress a wide confidence interval as an insight. The honesty we apply to a commodity-clean metric is itself the thing that is not a commodity. When differentiated signals can't be tested rigorously and rigorous signals aren't differentiated, you stop competing on signals and compete on rigor.

What we actually do about it

Three concrete moves, each visible:

The flagship is chosen on answerability, not iconicity. Our pre-registered study tests dormant wakeups — the clean, stationary, exogenous process — not exchange inflows, even though inflows are the trophy. We wrote down why the trophy was disqualified, on its own data, before freezing.² The test is built as a darling-killer: its floor is set so that refuting our most flattering hypothesis is reachable, and confirming it would require an implausibly large effect. The likely outcome is "cannot say," and we have committed in advance to publishing that outcome as a first-class result.
The commodity results get published too, honestly. The fastest, best-powered version of any of these claims is a plain netflow-vs-price regression — pure commodity. We will run it and publish the pre-registered null on it, because "even the simplest, best-powered form of this famous claim does not survive registration" is a credibility statement no data vendor makes, exactly because it questions their own product.
Descriptive, never prescriptive. The Kraken experiment is the exhibit: a 75 % win rate that loses money is what an absent edge looks like dressed in a flattering metric. So we ship descriptive on-chain intelligence — what the data does and does not say — and not buy/sell signals. Including, especially, when the data says this does not work.

This essay is itself an instance of the strategy. It devalues our most marketable asset (the idea that proprietary whale attribution yields tradable foresight), and we publish it because we would rather be trusted than impressive. In a field whose entire commercial gravity pulls toward overclaiming, the durable position is the one that holds when the confidence interval is wide: not "here are the whale signals that work," but "here is why they're so hard to establish, and here is exactly what we did to avoid fooling you and ourselves."

¹ Source: trading_history_live, 370 strictly-live Kraken trades (Jul 2025 – Jun 2026), gross per-trade price returns, equal weight. Interactive version with the fee slider accompanies this piece; the full trade-by-trade ledger is public and SHA256-verifiable at /ledger/.
² Pre-registration v2 (dormant wakeups), frozen 2026-06-09 12:17 UTC at git commit fec427a, data cutoff block 952966, candidate cohort immutably snapshotted (4,483 unspent ≥500 BTC UTXOs); superseded design v1 (exchange inflows) retired on the record for non-stationarity. Both are published as part of this series.