Technology & Product · 11 min read

How Does HypeLab Promote ML Models from Challenger to Champion?

HypeLab's champion-challenger framework trains 50 ML models every two weeks, tests them through three rigorous gates, and automatically rolls back underperformers to keep crypto ad predictions accurate and CTR climbing.

Joe Kim
Founder @ HypeLab

Why does this matter for crypto advertisers? Ad prediction models that go stale cost you money. When a model trained on January data serves March traffic, it misses new tokens, shifted sentiment, and changed user portfolios. HypeLab solves this with a champion-challenger framework that promotes only proven-better models to production, automatically rolls back underperformers, and keeps CTR predictions accurate as crypto markets evolve.

Quick answers:

  • How often does HypeLab retrain? Every two weeks, balancing freshness with stability.
  • How many models compete? 50 candidates per training run, evaluated on held-out data.
  • What if a new model is worse? Automatic rollback at 30% traffic if degradation is detected.
  • Who uses this system? Every campaign on HypeLab, from DeFi protocols like Uniswap to NFT marketplaces and prediction markets.

Every two weeks, HypeLab trains a new prediction model. Fresh data flows in, training kicks off, and 50 candidate models compete to become the next challenger. But a new model is not automatically better than the one currently serving crypto ad campaigns. Sometimes the champion holds its ground. Sometimes the challenger wins. The key is having a rigorous process to determine which is which.

HypeLab's champion-challenger framework ensures that only genuinely better models reach production. The framework has three gates: training selection, calibration testing, and production A/B testing. Each gate filters out models that would degrade performance. The result is a system where model quality improves monotonically over time, similar to how Netflix and Spotify continuously optimize their recommendation engines.

Why Does HypeLab Retrain Every Two Weeks?

Two weeks is the sweet spot for crypto ad networks. Crypto markets and user behavior shift faster than traditional finance but slower than hour-by-hour trading patterns. A model trained on January data captures January patterns. By mid-February, new tokens have launched, market sentiment has shifted, and user portfolios have changed. This cadence captures meaningful changes without creating deployment overhead.

Shorter cadences (daily, weekly) would mean constant model churn. Each deployment has risk. Each A/B test takes time. Running through the promotion pipeline every week would mean models barely finish testing before the next challenger arrives.

Longer cadences (monthly, quarterly) would mean stale models. Crypto user behavior in January might look nothing like April. A quarterly retrained model would miss DeFi summer on Uniswap, NFT winter on OpenSea, and prediction market spikes on Polymarket.

Training data volume: Each training run uses 200 million impressions from the past 8 weeks, with higher weight on recent data. Older data provides base patterns. Recent data captures current trends. The weighting scheme ensures the model stays fresh while retaining learned fundamentals.
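The recency weighting described above can be sketched as an exponential decay over impression age. The half-life value below is an illustrative assumption, not HypeLab's actual parameter; only the 8-week window comes from the article.

```python
WINDOW_WEEKS = 8        # training window described above
HALF_LIFE_WEEKS = 2.0   # assumed: an impression's weight halves every two weeks

def sample_weight(age_weeks: float) -> float:
    """Exponential-decay weight for an impression `age_weeks` old."""
    if age_weeks > WINDOW_WEEKS:
        return 0.0  # outside the 8-week training window
    return 0.5 ** (age_weeks / HALF_LIFE_WEEKS)

# Recent impressions dominate; week-8 data still contributes base patterns.
weights = {w: round(sample_weight(w), 3) for w in (0, 2, 4, 8)}
```

Under this scheme, today's impressions carry full weight, two-week-old impressions carry half, and anything past the window is dropped, matching the "fresh trends over base patterns" trade-off described above.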

How Does HypeLab Select the Best Model from Training?

HypeLab trains 50 candidate models with different hyperparameter configurations, then selects the best performer on held-out data. This ensemble approach ensures the winning model is not a lucky accident.

These 50 models train in parallel on our cloud ML platform using distributed computing infrastructure. Training takes 4-6 hours depending on data volume. When training completes, each model is evaluated on held-out validation data that was not used during training.

Evaluation metrics include:

  • Ranking accuracy: Measures how well the model separates clicks from non-clicks. Higher ranking accuracy means better discrimination between positive and negative examples.
  • Log loss: Measures calibration of predicted probabilities. Lower log loss means predictions are more accurate as probabilities, not just rankings.
  • Calibration error: The average absolute difference between predicted CTR buckets and actual CTR. Good models have calibration error under 5%.
  • Feature importance stability: If important features shift dramatically from the previous model, something might be wrong with the data pipeline.

The winning candidate is the model with the best combination of ranking accuracy (AUC) and calibration. Pure AUC optimization can produce models that rank well but predict probabilities poorly. HypeLab needs both because bid optimization and budget pacing depend on calibrated predictions.
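A minimal sketch of what a selection rule combining AUC and calibration might look like. The penalty weight and candidate records are hypothetical; HypeLab's actual scoring formula is not published.

```python
def selection_score(auc: float, calibration_error: float) -> float:
    """Combine ranking quality and calibration into one score.
    The 2.0 penalty weight is illustrative: it ensures a model with
    great AUC but poorly calibrated probabilities does not win."""
    return auc - 2.0 * calibration_error

candidates = [
    {"id": "cand_07", "auc": 0.81, "cal_err": 0.08},  # ranks well, badly calibrated
    {"id": "cand_23", "auc": 0.79, "cal_err": 0.02},  # balanced
]
winner = max(candidates, key=lambda c: selection_score(c["auc"], c["cal_err"]))
# cand_07 scores 0.81 - 0.16 = 0.65; cand_23 scores 0.79 - 0.04 = 0.75
```

Here the slightly lower-AUC candidate wins because its predicted probabilities are far more trustworthy, which is what bid optimization and budget pacing need.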

What Is Calibration Testing and Why Does It Matter?

Calibration testing validates that a model's predicted probabilities match real-world outcomes. The training selection winner faces this second gate against data from 2-3 days after training completed. This data was generated by the current production model and represents real-world conditions.

Why test on post-training data? Training data comes from the past. The feature pipeline might have changed. Publisher inventory might have shifted. A model that performs well on historical validation data might fail on current production conditions.

Calibration testing runs the challenger model's inference on recent impressions and compares predictions to actual outcomes. If the challenger predicts 1.5% CTR for a set of impressions that actually had 2.0% CTR, calibration is off. The model might still be better than the champion, but HypeLab wants to know about calibration drift before A/B testing starts.

Calibration test requirements:

  • Predicted/actual CTR ratio: within acceptable bounds
  • Ranking accuracy drop from validation: within calibration thresholds
  • Publisher segment calibration error: within calibration thresholds
  • Inference latency (p99): real-time

Models that fail calibration testing do not enter A/B testing. The ML team investigates why calibration degraded. Common causes include feature pipeline bugs, label delays, or distribution shift in the data.
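The core calibration check can be sketched as comparing mean predicted CTR to actual CTR on recent impressions. The ratio bounds below are illustrative thresholds, not HypeLab's real values.

```python
def calibration_check(pairs, ratio_bounds=(0.8, 1.25)):
    """Compare mean predicted CTR to actual CTR on recent impressions.
    `pairs` is a list of (predicted_ctr, clicked) tuples; the ratio
    bounds are assumed thresholds for this sketch."""
    predicted = sum(p for p, _ in pairs) / len(pairs)
    actual = sum(c for _, c in pairs) / len(pairs)
    ratio = predicted / actual if actual else float("inf")
    return {"predicted": predicted, "actual": actual, "ratio": ratio,
            "passes": ratio_bounds[0] <= ratio <= ratio_bounds[1]}

# The article's example: challenger predicts 1.5% CTR on traffic
# that actually clicked at 2.0%.
impressions = [(0.015, 1)] * 2 + [(0.015, 0)] * 98  # 2% actual CTR
result = calibration_check(impressions)
# ratio = 0.015 / 0.02 = 0.75, outside bounds: the calibration test fails
```

In production this comparison would also run per publisher segment and per CTR bucket, as the requirements above indicate, rather than on one global average.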

How Does HypeLab A/B Test Models in Production?

Models that pass calibration testing enter production A/B testing against the current champion, serving real impressions to advertisers running campaigns on HypeLab. This is the definitive test: which model performs better on live traffic from Web3 apps like Phantom, StepN, and DeFi platforms?

HypeLab uses five-phase progressive rollout for A/B testing. Traffic allocation starts at 3% for the challenger and increases through 10%, 20%, 40%, and finally 50%. Each phase has statistical guardrails that must pass before advancing.

The primary metric is CTR: click-through rate on served impressions. A model that produces higher CTR is delivering more value to advertisers (more clicks per dollar) and publishers (more engaged ads mean better user experience).

Secondary metrics include calibration (does the model's confidence match reality?), latency (does inference stay fast?), and error rates (does the model fail on edge cases?). A challenger could win on CTR but lose on calibration, which would raise questions about deployment.
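The five-phase rollout above can be sketched as a simple state machine: traffic only advances when the current phase's guardrails pass. Guardrail evaluation itself (the statistical tests) is out of scope for this sketch.

```python
PHASES = [0.03, 0.10, 0.20, 0.40, 0.50]  # challenger traffic share per phase

def next_phase(current: float, guardrails_passed: bool) -> float:
    """Advance challenger traffic one phase if guardrails pass;
    otherwise hold at the current allocation."""
    idx = PHASES.index(current)
    if not guardrails_passed or idx == len(PHASES) - 1:
        return current
    return PHASES[idx + 1]
```

The asymmetry is deliberate: advancing requires positive evidence at every step, while holding is the default, so a borderline challenger never accumulates traffic by inertia.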

Want to see this ML infrastructure work for your campaigns? Create a HypeLab account and launch your first crypto ad campaign in minutes. Every impression you serve benefits from this champion-challenger system.

What Happens If a Challenger Model Underperforms?

Automatic rollback protects advertiser campaigns from bad models. The most dangerous phase is around 30% traffic: enough to cause real damage if the model is bad, but not enough to have overwhelming statistical confidence. HypeLab has special safeguards at this phase.

If the challenger model shows statistically significant degradation at 30% traffic, automatic rollback triggers immediately. "Statistically significant degradation" means the Bayesian posterior probability of the challenger being worse than the champion exceeds 95%. No human intervention required; the system protects production performance automatically.

Rollback trigger: When P(challenger CTR < champion CTR) > 0.95 at any traffic level above 20%, automatic rollback executes. Traffic shifts to 100% champion within 60 seconds. Slack alert fires with details for post-mortem.
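The Bayesian rollback rule can be sketched with independent Beta-Bernoulli posteriors over each model's CTR and a Monte Carlo estimate of the posterior probability that the challenger is worse. This is an assumed formulation consistent with the rule stated above; HypeLab's exact model may differ.

```python
import random

def prob_challenger_worse(champ_clicks, champ_imps, chal_clicks, chal_imps,
                          draws=20000, seed=7):
    """Monte Carlo estimate of P(challenger CTR < champion CTR) under
    independent Beta(1 + clicks, 1 + non-clicks) posteriors."""
    rng = random.Random(seed)
    worse = 0
    for _ in range(draws):
        champ = rng.betavariate(1 + champ_clicks, 1 + champ_imps - champ_clicks)
        chal = rng.betavariate(1 + chal_clicks, 1 + chal_imps - chal_clicks)
        worse += chal < champ
    return worse / draws

def should_rollback(p_worse: float, traffic_share: float) -> bool:
    # The trigger stated above: P(worse) > 0.95 at any level above 20% traffic.
    return traffic_share > 0.20 and p_worse > 0.95

p = prob_challenger_worse(2000, 100000, 150, 10000)  # 2.0% vs 1.5% CTR
# With this much evidence p is near 1.0, so at 30% traffic rollback fires.
```

With enough impressions on both arms, a half-point CTR gap pushes the posterior probability well past the 0.95 threshold, which is why no human needs to adjudicate.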

This automatic rollback is non-negotiable. A human might hesitate, want to collect more data, or hope the trend reverses. The system does not hesitate. Protection of production performance is more important than giving a struggling model a chance to recover. For advertisers, this means their campaigns never suffer from a bad model deployment.

What Does HypeLab Learn When a Challenger Loses?

Not every challenger becomes champion, and that is valuable information. Sometimes the current model is genuinely better. Sometimes the new training data contained noise that degraded model quality. Sometimes a bug in preprocessing produced subtly wrong features.

When a challenger loses, HypeLab archives the model and logs the failure details. The ML team reviews why the model underperformed:

  • Data quality issues: Were there labeling delays? Bot traffic contamination? Publisher data pipeline failures?
  • Configuration problems: Did the winning candidate from training selection actually overfit to validation data?
  • Distribution shift: Did something change in the two weeks between training and A/B testing that made the training data unrepresentative?
  • Feature engineering: Are there new signals that should be added? Old signals that have become stale?

Losing challengers provide learning opportunities. HypeLab's model improvement comes not just from promoting winners but from understanding why losers lost.

How Does a Challenger Become the New Champion?

When a challenger passes all five phases of A/B testing with statistically significant improvement, it becomes the new champion. The promotion process:

  1. Shift 100% of traffic to the new champion
  2. Archive the old champion in the model registry (available for emergency rollback)
  3. Update model version tags in monitoring dashboards
  4. Log the promotion metrics: CTR improvement, calibration delta, latency comparison
  5. Notify the team via Slack with promotion summary

The old champion remains available for 30 days. If the new champion shows unexpected problems in the days after promotion, HypeLab can quickly roll back. This has happened twice in the past year: once due to a rare edge case the A/B test did not surface, once due to a data pipeline change that affected inference but not testing.
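The promote-and-archive steps above might look like the following sketch. The registry class is invented for illustration; a real deployment would use a model registry service rather than in-memory state.

```python
from dataclasses import dataclass, field

@dataclass
class ModelRegistry:
    """Toy in-memory registry standing in for a real model store."""
    champion: str = ""
    archive: list = field(default_factory=list)

    def promote(self, challenger: str) -> None:
        """Shift to the new champion; archive the old one for rollback."""
        if self.champion:
            self.archive.append(self.champion)  # retained 30 days
        self.champion = challenger

    def rollback(self) -> None:
        """Restore the most recently archived champion."""
        if self.archive:
            self.champion = self.archive.pop()

registry = ModelRegistry()
registry.promote("model_2024_03_01")
registry.promote("model_2024_03_15")
registry.rollback()  # champion is model_2024_03_01 again
```

Keeping the previous champion one `rollback()` call away is what makes the two emergency reversions described above a minutes-long operation instead of a retraining incident.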

What Visibility Do Teams Have into the ML Pipeline?

HypeLab's ML team has full visibility into every stage of the champion-challenger process. Dashboards show:

Training phase: Progress of the 50 candidate models, validation metrics as they complete, feature importance rankings, selected winner

Calibration phase: Predicted vs actual CTR plots, calibration error by publisher segment, latency distribution, pass/fail status

A/B testing phase: Current traffic split, cumulative metrics for both models, posterior probability of challenger winning, projected time to significance

The system is automated but not opaque. When decisions need human judgment, like interpreting an unusual A/B test result, the data is readily available. When decisions are routine, like advancing from 10% to 20% traffic after passing guardrails, automation handles it.

Why Should Crypto Advertisers Care About ML Model Freshness?

Crypto behavior changes faster than any other advertising vertical. DeFi yields shift on Aave and Compound. New chains like Base and Arbitrum launch. Market sentiment swings from fear to greed and back. A static prediction model trained once and deployed forever would quickly become stale, wasting advertiser budget on poorly targeted impressions.

HypeLab's champion-challenger framework ensures the prediction model stays fresh without risking production stability. Every two weeks, there is an opportunity for improvement. Bad models get caught by training selection, calibration testing, or A/B testing. Good models promote smoothly. In a typical quarter, roughly two-thirds of challengers earn promotion, each delivering measurable CTR improvements for advertisers.

For advertisers, this means predictions that reflect current reality, not January patterns in March. For publishers, this means ads that match their audience's current interests. For HypeLab, this means a system that improves itself while protecting against regression.

How HypeLab compares to other Web3 ad networks: Most crypto ad networks like Coinzilla, Bitmedia, and A-Ads train models quarterly if at all. Most deploy without rigorous A/B testing. Most lack automatic rollback. HypeLab's infrastructure is what production-grade ML looks like, built on the same principles that power prediction systems at Google Ads and Meta.

Ready to run campaigns on a platform that continuously improves its prediction models? Create your HypeLab account and launch your first Web3 ad campaign in minutes. For publishers looking to monetize with high-quality, brand-safe ads, learn how HypeLab's ML-powered ad serving maximizes your revenue.

Frequently Asked Questions

How often does HypeLab retrain its prediction models?

HypeLab trains new prediction models every two weeks. This cadence balances freshness (crypto user behavior changes quickly) with stability (too-frequent retraining adds risk). Not every new model beats the champion, which is why rigorous testing gates exist before promotion.

What happens if a challenger model underperforms?

If the challenger model underperforms during A/B testing, HypeLab's system automatically rolls back to the champion model. At 30% traffic, if the challenger shows statistically significant degradation, traffic immediately shifts back to the champion. Slack alerts notify the ML team, but no manual intervention is required to protect production.

How many candidate models does each training run produce?

Each training run produces 50 candidate models with different configurations. HypeLab evaluates each on held-out validation data, selecting the one with the best combination of CTR prediction accuracy and calibration. This winner then faces additional testing against recent production data before entering A/B testing against the current champion.


Contact our sales team.

Got questions or ready to get started? Our sales team is here to help. Whether you want to learn more about our Web3 ad network, explore partnership opportunities, or see how HypeLab can support your goals, just reach out - we'd love to chat.