Thompson Sampling is a Bayesian algorithm that solves one of the hardest problems in web3 advertising: deciding whether to show ads to audiences you know convert well, or test new audiences that might convert even better. Unlike static A/B tests that waste budget on equal splits, Thompson Sampling learns in real-time and shifts spend toward winners automatically. Top crypto ad networks including HypeLab, along with Google and Meta, use variants of this algorithm to maximize conversions.
Quick answer: Thompson Sampling maintains probability distributions for each audience segment's conversion rate. When allocating an impression, it samples from each distribution and picks the highest value. Segments with less data have wider distributions (more exploration). Segments with lots of data have narrow distributions clustered around true performance (more exploitation). The algorithm naturally transitions from exploration to exploitation as confidence grows.
Consider a crypto advertiser with budget to spend across dozens of publishers. Historical data shows DeFi power users on wallet dashboards like Zerion and Rainbow convert well. But what about NFT collectors on OpenSea or Blur? What about casual readers on CoinDesk or The Block? Maybe one of these unexplored segments converts even better. This is the explore-exploit dilemma that Thompson Sampling solves mathematically.
What Is the Explore-Exploit Problem in Advertising?
Imagine a slot machine with 10 levers. Each lever has a different (unknown) payout probability. You have 1,000 pulls. How do you maximize total payout?
Pulling only the lever that paid out most in your first few tries (exploitation) might miss the best lever. Pulling each lever equally regardless of results (exploration) wastes pulls on bad levers. The optimal strategy balances learning which levers are best while concentrating plays on the better ones.
In advertising, the levers are campaign-publisher-user segment combinations. The payout is conversion rate. The pulls are impressions. The question is identical: how do you allocate impressions to maximize conversions while learning which combinations work best?
The advertising parallel: A campaign targets 10 publisher segments. Each segment has an unknown conversion rate. You have 100,000 impressions to allocate. Showing all impressions to the historically best segment might miss a better one. Splitting evenly wastes impressions on bad segments. Thompson Sampling finds the optimal balance.
Why Do Simple Optimization Approaches Fail?
Most blockchain advertising platforms rely on basic optimization methods that leave money on the table. Here's why each approach falls short:
Greedy Exploitation
Always show ads to the best-performing segment so far. The problem is that noisy initial data can lock you onto a suboptimal segment. A segment with 2 conversions in 10 impressions (20%) looks better than one with 5 conversions in 100 impressions (5%), but the second estimate rests on ten times as much data and is far more reliable.
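To make the reliability gap concrete, here is a minimal sketch that compares the two segments from the example above. It assumes a uniform Beta(1, 1) prior and approximates a 95% credible interval by Monte Carlo sampling with Python's standard library. The helper name `credible_interval` is hypothetical.

```python
import random

def credible_interval(conversions, impressions, draws=20_000, seed=0):
    """Approximate 95% credible interval for a conversion rate.

    Assumes a uniform Beta(1, 1) prior, so the posterior is
    Beta(conversions + 1, failures + 1).
    """
    rng = random.Random(seed)
    failures = impressions - conversions
    samples = sorted(
        rng.betavariate(conversions + 1, failures + 1) for _ in range(draws)
    )
    return samples[int(0.025 * draws)], samples[int(0.975 * draws)]

small = credible_interval(2, 10)    # 2 conversions in 10 impressions
large = credible_interval(5, 100)   # 5 conversions in 100 impressions

print(f"2/10  -> {small[0]:.3f} to {small[1]:.3f}")
print(f"5/100 -> {large[0]:.3f} to {large[1]:.3f}")
```

The interval for the 10-impression segment comes out much wider than the one for the 100-impression segment, which is why a greedy rule that compares raw rates can end up chasing noise.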
Epsilon-Greedy
Exploit most of the time, but randomly explore X% of impressions. The problem is that exploration is undirected. You might waste impressions re-exploring segments you already know are bad. The epsilon parameter requires tuning and does not adapt to data.
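A minimal epsilon-greedy sketch shows where the undirected exploration comes from. The segment names, counts, and the function name `epsilon_greedy_choice` are illustrative assumptions, not part of any real platform.

```python
import random

def epsilon_greedy_choice(stats, epsilon=0.1, rng=random):
    """Pick a segment name from stats = {name: (conversions, impressions)}.

    With probability epsilon, pick uniformly at random (explore);
    otherwise pick the best observed conversion rate (exploit).
    """
    names = list(stats)
    if rng.random() < epsilon:
        return rng.choice(names)  # undirected: known-bad segments are included

    def observed_rate(name):
        conversions, impressions = stats[name]
        return conversions / impressions if impressions else 0.0

    return max(names, key=observed_rate)

# Hypothetical segment statistics.
stats = {"defi": (30, 1000), "nft": (5, 1000), "news": (12, 1000)}
rng = random.Random(42)
picks = [epsilon_greedy_choice(stats, epsilon=0.1, rng=rng) for _ in range(10_000)]
print({name: picks.count(name) for name in stats})
```

Even though "nft" already has 1,000 impressions of evidence that it converts poorly, it keeps receiving its share of the exploration traffic, and that share never shrinks unless someone lowers epsilon by hand.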
Fixed A/B Testing
Split traffic evenly until you have enough data, then switch to the winner. The problem is that you waste half your impressions on the loser during the test period. For crypto campaigns with limited budgets, this exploration cost is painful.
UCB (Upper Confidence Bound)
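Choose the segment with the highest upper confidence bound on its conversion rate. The problem is that UCB is deterministic: given the same data, it always picks the same segment. When conversion feedback arrives late or in batches, every impression in between goes to that one segment, and the standard algorithm assumes conversion rates do not change over time. Both are poor fits for the non-stationary environments common in web3 advertising.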
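The determinism is easy to see in a sketch of the textbook UCB1 index (observed rate plus an exploration bonus). This is the standard formula rather than any particular platform's variant, and the segment statistics are made up for illustration.

```python
import math

def ucb1_choice(stats):
    """Pick a segment from stats = {name: (conversions, impressions)} via UCB1."""
    total = sum(impressions for _, impressions in stats.values())
    best_name, best_score = None, float("-inf")
    for name, (conversions, impressions) in stats.items():
        if impressions == 0:
            return name  # try every segment once before scoring
        # Exploration bonus shrinks as a segment accumulates impressions.
        bonus = math.sqrt(2 * math.log(total) / impressions)
        score = conversions / impressions + bonus
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical statistics: "nft" has very little data, so its bonus is large.
stats = {"defi": (30, 1000), "nft": (1, 20), "news": (12, 1000)}
picks = [ucb1_choice(stats) for _ in range(5)]
print(picks)
```

Until the statistics are updated, every call returns the same segment. Thompson Sampling would instead spread those impressions across segments in proportion to how plausible each one is as the best.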
| Method | Exploration | Exploitation | Regret Growth | Best For |
|---|---|---|---|---|
| Greedy | None | Maximum | Linear | Stable, known audiences |
| Epsilon-Greedy | Random X% | Rest | Linear | Simple implementations |
| A/B Testing | 50/50 split | After test | Linear during test | One-time decisions |
| UCB | Confidence-based | Deterministic | Logarithmic | Stationary environments |
| Thompson Sampling | Uncertainty-based | Probabilistic | Logarithmic | Non-stationary, volatile markets |
The bottom line: Traditional optimization methods either explore too much (wasting budget) or exploit too early (missing better opportunities). Thompson Sampling provides a mathematically optimal balance that adapts automatically, reducing wasted spend by 15-30% compared to fixed A/B testing in volatile environments.
How Does Thompson Sampling Work?
Thompson Sampling takes a Bayesian approach. For each segment, maintain a probability distribution over its true conversion rate. When allocating an impression, sample from each segment's distribution and pick the segment with the highest sampled value.
The key insight: segments with high uncertainty (little data) have wide distributions, so their samples sometimes come out very high, triggering exploration. Segments with low uncertainty (lots of data) have narrow distributions, so their samples cluster around the true mean. This naturally balances exploration and exploitation.
Thompson Sampling algorithm:
1. For each segment, model conversion rate as Beta(successes + 1, failures + 1)
2. To allocate an impression, sample from each segment's Beta distribution
3. Show the ad to the segment with the highest sampled value
4. Observe outcome (conversion or not)
5. Update the chosen segment's distribution (increment its success or failure count)
Early on, when all segments have little data, samples vary widely and exploration happens naturally. As data accumulates, high-performing segments narrow to high values and get most traffic, while low performers narrow to low values and get little traffic. The algorithm automatically shifts from exploration to exploitation as confidence increases.
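The five steps can be sketched in a few lines of Python. This is a minimal Beta-Bernoulli version under stated assumptions: the class name `ThompsonSampler`, the segment names, and the simulated "true" conversion rates are all hypothetical, and a production system would add batching, persistence, and delayed-conversion handling.

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over named segments."""

    def __init__(self, segments, seed=None):
        self.rng = random.Random(seed)
        # [successes, failures] per segment
        self.counts = {name: [0, 0] for name in segments}

    def choose(self):
        """Steps 1-3: sample each posterior, return the segment with the top draw."""
        draws = {
            name: self.rng.betavariate(s + 1, f + 1)
            for name, (s, f) in self.counts.items()
        }
        return max(draws, key=draws.get)

    def update(self, name, converted):
        """Step 5: record the outcome for the segment that was shown."""
        self.counts[name][0 if converted else 1] += 1

# Simulated environment with hypothetical true conversion rates.
true_rates = {"defi": 0.10, "nft": 0.05, "news": 0.02}
sampler = ThompsonSampler(true_rates, seed=7)
env = random.Random(11)
shown = {name: 0 for name in true_rates}

for _ in range(20_000):
    segment = sampler.choose()
    converted = env.random() < true_rates[segment]  # step 4: observe outcome
    sampler.update(segment, converted)
    shown[segment] += 1

print(shown)
```

In a run like this, impressions start out spread across all three segments and then concentrate on the strongest one, without any explicit exploration schedule.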
What Are the Mathematical Properties of Thompson Sampling?
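Thompson Sampling achieves logarithmic regret, meaning the cumulative loss from suboptimal decisions grows slowly (logarithmically) with the number of impressions. For conversion-style (Bernoulli) outcomes in a stationary setting, this growth rate matches the Lai-Robbins lower bound, so it is asymptotically optimal among algorithms that must learn the true conversion rates from data.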
The Bayesian framework also provides natural uncertainty quantification. At any point, you can ask "what is the probability that segment A is actually the best?" and get a principled answer by comparing posterior distributions. This transparency is valuable for crypto advertisers who want to understand exactly how their budget is being allocated.
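One simple way to answer that question is Monte Carlo: draw from every segment's posterior many times and count how often each segment comes out on top. The sketch below assumes uniform Beta(1, 1) priors. The function name `probability_best` and the counts are illustrative.

```python
import random

def probability_best(counts, draws=20_000, seed=0):
    """Estimate P(segment has the highest true rate) by Monte Carlo.

    counts = {name: (successes, failures)}; uniform Beta(1, 1) prior assumed.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in counts}
    for _ in range(draws):
        sampled = {
            name: rng.betavariate(s + 1, f + 1) for name, (s, f) in counts.items()
        }
        wins[max(sampled, key=sampled.get)] += 1
    return {name: wins[name] / draws for name in counts}

# Hypothetical data: A and B are well measured, C has only 50 impressions.
posterior = probability_best({"A": (40, 960), "B": (30, 970), "C": (1, 49)})
print(posterior)
```

These probabilities are also, approximately, the share of impressions Thompson Sampling would send to each segment at this moment, since it picks a segment exactly when that segment's draw is the highest.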
Regret comparison across algorithms: Greedy exploitation has linear regret (it grows proportionally with impressions) whenever it locks onto a suboptimal segment. Epsilon-greedy with a fixed epsilon also has linear regret, because it keeps spending the same share of impressions on random exploration. UCB and Thompson Sampling have logarithmic regret in stationary settings. In academic benchmarks, Thompson Sampling achieves 10-25% lower cumulative regret than UCB in non-stationary environments, exactly the conditions found in blockchain advertising where market sentiment shifts rapidly.
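A small simulation can illustrate the gap between greedy exploitation and Thompson Sampling. It is a sketch under assumptions: the conversion rates are made up, regret is measured as expected regret (best rate minus the chosen segment's rate, summed over rounds), and results are averaged over 20 seeded runs to smooth out luck.

```python
import random

RATES = {"news": 0.02, "nft": 0.05, "defi": 0.10}  # hypothetical true rates
BEST = max(RATES.values())

def run(policy, rounds=5_000, seed=0):
    """Return the cumulative expected regret of a policy on a simulated bandit."""
    rng = random.Random(seed)
    counts = {name: [0, 0] for name in RATES}  # [successes, failures]
    regret = 0.0
    for _ in range(rounds):
        name = policy(counts, rng)
        converted = rng.random() < RATES[name]
        counts[name][0 if converted else 1] += 1
        regret += BEST - RATES[name]
    return regret

def greedy(counts, rng):
    # Try each segment once, then always take the best observed rate.
    for name, (s, f) in counts.items():
        if s + f == 0:
            return name
    return max(counts, key=lambda n: counts[n][0] / sum(counts[n]))

def thompson(counts, rng):
    # Sample each Beta posterior (uniform prior) and take the top draw.
    return max(
        counts,
        key=lambda n: rng.betavariate(counts[n][0] + 1, counts[n][1] + 1),
    )

runs = 20
greedy_regret = sum(run(greedy, seed=i) for i in range(runs)) / runs
thompson_regret = sum(run(thompson, seed=i) for i in range(runs)) / runs
print(f"greedy: {greedy_regret:.1f}  thompson: {thompson_regret:.1f}")
```

Greedy usually commits to whichever segment happens to convert first and pays the same penalty on every later impression, while Thompson Sampling's penalty flattens out once the posteriors separate.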
How Does HypeLab Apply Thompson Sampling to Budget Allocation?
HypeLab's web3 ad platform applies Thompson Sampling to campaign-publisher-user segment combinations across its premium inventory network. For a campaign nearing the end of its budget, the system allocates remaining impressions to maximize conversions automatically.
Each combination of campaign, publisher category, and user type is treated as a segment. User types might include: DeFi power users (high on-chain activity on Uniswap or Aave), NFT collectors (holdings on OpenSea or Magic Eden), casual readers (low crypto activity), and new wallets (recently created).
The Thompson Sampler maintains conversion statistics for each segment. When an impression opportunity arises, it samples from each segment's posterior and routes the impression to the highest-sampled segment that matches the targeting constraints.
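As a rough illustration of that routing step, the sketch below samples only the segments that pass a campaign's targeting filter. Everything here is hypothetical: the function name `route_impression`, the (publisher category, user type) keys, and the counts are stand-ins, not HypeLab's actual data model.

```python
import random

def route_impression(posteriors, eligible, rng=random):
    """Sample every eligible segment's posterior and return the top draw.

    posteriors: {segment: (successes, failures)}
    eligible:   set of segments allowed by the campaign's targeting
    """
    candidates = [seg for seg in posteriors if seg in eligible]
    if not candidates:
        return None  # nothing matches targeting; the caller skips the impression
    return max(
        candidates,
        key=lambda seg: rng.betavariate(
            posteriors[seg][0] + 1, posteriors[seg][1] + 1
        ),
    )

# Segments keyed by (publisher category, user type); counts are made up.
posteriors = {
    ("wallet_dashboard", "defi_power_user"): (48, 952),
    ("nft_marketplace", "nft_collector"): (9, 191),
    ("news", "casual_reader"): (4, 396),
}
eligible = {("nft_marketplace", "nft_collector"), ("news", "casual_reader")}
choice = route_impression(posteriors, eligible, rng=random.Random(3))
print(choice)
```

Filtering before sampling means an excluded segment can never win an impression, no matter how well it converts elsewhere.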
How Does Thompson Sampling Fit Into a Complete Ad Tech Stack?
Thompson Sampling is one component of HypeLab's multi-model crypto ad network infrastructure. It interacts with several other systems in the real-time bidding pipeline:
- PCTR (predicted click-through rate): Predicts click probability for specific impressions. Thompson Sampling allocates across segments; PCTR scores within segments.
- CVR scoring: Provides publisher-level quality signals. Thompson Sampling refines at the segment level within publisher categories.
- Lambda Pacer: Controls spending velocity. Thompson Sampling decides where to spend; pacing decides how fast.
The models are complementary. PCTR might say "this specific impression will get clicked." Thompson Sampling might say "but this campaign should explore gaming publishers more." The programmatic RTB auction weighs both signals to determine the final bid.
Ready to see smarter budget allocation in action? Launch a campaign on HypeLab and let Thompson Sampling find your highest-converting audiences automatically. Self-serve setup takes minutes, with real-time analytics via our BigQuery integration.
How Does Thompson Sampling Handle Changing Market Conditions?
Crypto user behavior changes rapidly with market conditions, new protocol launches, and shifting narratives. A segment that converted well during bull markets might underperform in bear markets. A web3 advertising platform needs algorithms that adapt without manual intervention. Thompson Sampling handles non-stationarity through discounting: older observations contribute less to the posterior than recent ones.
HypeLab implements time-weighted Thompson Sampling where conversion counts decay over time. A conversion from yesterday counts more than one from last month. This allows the algorithm to adapt to changing conditions without explicit detection of distribution shifts.
Non-stationarity handling:
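Full Thompson Sampling: 50 successes and 450 failures from all time, giving Beta(51, 451)
Time-weighted: 15 effective successes and 135 effective failures, weighted toward the last 14 days, giving Beta(16, 136)
Both posteriors imply a conversion rate of roughly 10%, but the time-weighted one rests on fewer effective observations, so it is wider and each new observation moves it further. That is why the time-weighted version responds faster to changes. If a segment starts converting better due to market shifts, the recent successes quickly raise its posterior and it starts receiving more traffic.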
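One common way to implement this kind of time weighting is exponential decay of the success and failure counts. The sketch below assumes a 14-day half-life, which is an illustrative choice and not a documented HypeLab setting. The class name `DiscountedSegment` is hypothetical.

```python
import random

class DiscountedSegment:
    """Beta posterior whose evidence decays exponentially with age."""

    def __init__(self, half_life_days=14.0):
        # Daily multiplier chosen so counts halve every half_life_days.
        self.decay_per_day = 0.5 ** (1.0 / half_life_days)
        self.successes = 0.0
        self.failures = 0.0

    def age(self, days=1.0):
        """Shrink old counts; call as time passes."""
        factor = self.decay_per_day ** days
        self.successes *= factor
        self.failures *= factor

    def record(self, converted):
        """Add one fresh, full-weight observation."""
        if converted:
            self.successes += 1.0
        else:
            self.failures += 1.0

    def sample(self, rng=random):
        # The +1 terms keep both Beta parameters positive after heavy decay.
        return rng.betavariate(self.successes + 1.0, self.failures + 1.0)

segment = DiscountedSegment(half_life_days=14.0)
segment.successes, segment.failures = 50.0, 450.0
segment.age(days=14)  # one half-life later, old evidence counts about half
print(segment.successes, segment.failures)
```

Decaying both counts by the same factor leaves the estimated conversion rate unchanged but widens the posterior, so a stale segment gradually becomes a candidate for re-exploration.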
What Challenges Arise When Implementing Thompson Sampling at Scale?
Building Thompson Sampling into a production crypto ad network requires addressing several practical engineering challenges:
Segment Granularity
Too many segments means sparse data and slow learning. Too few segments means missing important heterogeneity. HypeLab balances by using meaningful segments (publisher category × user type) rather than arbitrary divisions.
Cold Start
New campaigns have no segment-level data. HypeLab initializes with network-wide priors: segments that historically convert well start with optimistic priors, encouraging their selection until campaign-specific data accumulates.
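A simple way to encode a network-wide prior is to convert a historical conversion rate into Beta parameters, with a "strength" that says how many pseudo-impressions the prior is worth. The sketch below is an assumption about how this could be done. The function names, the 4% network rate, and the strength of 50 are all hypothetical.

```python
def prior_from_network(network_rate, strength=20.0):
    """Turn a network-wide conversion rate into Beta prior parameters.

    strength is the prior's pseudo-impression count: higher values mean
    the campaign needs more of its own data to move away from the prior.
    """
    if not 0.0 < network_rate < 1.0:
        raise ValueError("network_rate must be strictly between 0 and 1")
    alpha = network_rate * strength
    beta = (1.0 - network_rate) * strength
    return alpha, beta

def posterior_mean(prior, successes, failures):
    """Mean of the Beta posterior after adding campaign-specific outcomes."""
    alpha, beta = prior
    return (alpha + successes) / (alpha + beta + successes + failures)

prior = prior_from_network(0.04, strength=50.0)  # hypothetical 4% network rate
no_data = posterior_mean(prior, 0, 0)            # equals the prior mean
with_data = posterior_mean(prior, 30, 270)       # campaign data converts at 10%
print(no_data, with_data)
```

With no campaign data the estimate sits at the network rate. After 300 campaign impressions converting at 10%, the estimate has moved most of the way toward the campaign's own rate, and the prior's influence keeps shrinking as data accumulates.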
Computational Efficiency
Sampling from Beta distributions is computationally cheap. The main cost is maintaining segment statistics, which HypeLab handles through efficient data structures updated incrementally.
Multi-Armed Bandit vs Contextual Bandit
Pure Thompson Sampling treats segments as independent arms. Contextual bandits incorporate features to generalize across segments. HypeLab uses contextual approaches where segment-level features (publisher quality scores, user intent signals) inform the priors.
Why Is Thompson Sampling Essential for Web3 Advertising?
The crypto and web3 advertising landscape has unique characteristics that make Thompson Sampling particularly valuable compared to traditional digital advertising:
- Rapid market shifts: User behavior changes with market conditions, from Ethereum DeFi season to Solana meme coins to Base ecosystem growth. Adaptive exploration discovers new opportunities before competitors.
- Diverse audience segments: Crypto users range from DeFi degens trading on dYdX to NFT collectors on Blur to casual holders on Coinbase. A web3 ad platform must find the best segment for each campaign automatically.
- Constrained budgets: Many blockchain ad campaigns have smaller budgets than campaigns in traditional finance. Efficient exploration minimizes waste and maximizes ROI.
- Novel products: New protocols like EigenLayer or Monad lack historical data. Exploration is essential for campaigns promoting novel products to crypto-native audiences.
For advertisers, Thompson Sampling means their campaigns automatically discover high-converting segments without manual experimentation. A campaign might discover that gaming users convert better than expected, and shift budget accordingly.
For publishers, Thompson Sampling means campaigns find the right audiences on their sites. A gaming publisher might receive campaigns they would not have won based solely on historical averages, because exploration reveals their audience converts well for specific products. Learn more about how HypeLab's ML stack delivers results in our advertiser case studies.
Q: How is Thompson Sampling different from what other crypto ad networks use?
A: Most blockchain advertising platforms rely on simple rules-based allocation or basic A/B testing. Thompson Sampling is the same class of algorithm used by Google Ads and Meta, but adapted for the unique volatility of web3 audiences. HypeLab is one of the few crypto ad networks implementing this level of ML sophistication.
What Does the Future Hold for Adaptive Ad Optimization?
Thompson Sampling is part of HypeLab's evolving ML stack for the web3 advertising ecosystem. Combined with PCTR prediction, conversion rate scoring, and budget pacing, it creates a comprehensive system for optimal ad serving across premium crypto publisher inventory.
The explore-exploit balance is fundamental to any learning system. Static models cannot discover new opportunities. Pure exploration cannot capitalize on knowledge. Thompson Sampling provides the principled middle ground that production advertising systems need, and it's especially valuable in the fast-moving crypto market.
Google, Meta, and major ad platforms use variants of Thompson Sampling. HypeLab is bringing this sophistication to the crypto ad network space, where the dynamic environment makes adaptive learning even more valuable than in traditional digital advertising.
Key takeaways:
- Thompson Sampling balances exploration and exploitation mathematically, outperforming A/B tests and greedy optimization
- The algorithm adapts automatically to changing market conditions without manual intervention
- HypeLab implements time-weighted Thompson Sampling tuned for crypto's volatility
- Combined with PCTR, CVR scoring, and pacing, it forms a complete programmatic RTB stack
Next step: Create your free HypeLab account to access Thompson Sampling-powered budget allocation for your crypto campaigns. Self-serve setup, transparent reporting, and dual payment rails (crypto + credit card) make it easy to get started.
Frequently Asked Questions
Q: What is the explore-exploit tradeoff in advertising?
A: The explore-exploit tradeoff is choosing between showing ads to user segments with known good performance (exploit) versus trying new segments that might be even better (explore). Pure exploitation misses opportunities in unexplored segments. Pure exploration wastes budget on experiments. Thompson Sampling mathematically balances these competing needs.
Q: How is Thompson Sampling different from A/B testing?
A: A/B testing splits traffic evenly and evaluates afterward. Thompson Sampling continuously shifts traffic toward better-performing options while maintaining enough exploration to discover improvements. It learns and adapts in real-time rather than waiting for test completion. This reduces regret, which is the lost value from showing suboptimal ads during the learning period.
Q: Why does Thompson Sampling suit crypto advertising?
A: Crypto user behavior changes rapidly with market conditions, new protocols, and shifting narratives. A segment that performed well during DeFi summer might underperform during NFT season. Thompson Sampling continuously re-evaluates segment performance, naturally adapting to changes without requiring manual intervention or predefined test schedules.



