Change Significance Tracker

Analyze which AI model rank changes are statistically significant vs. random noise. Uses z-score analysis with 95% confidence intervals across 0 models with sparkline history.

Models Analyzed

Significant Changes

Noise (Not Significant)

Both Timeframes

Statistically Significant Changes

0 models with z-scores exceeding the 95% confidence threshold (|z| > 1.96), sorted by significance strength.

No significant changes

All model score changes fall within normal statistical variance.

Multi-Timeframe Analysis

Cross-referencing daily (24h) and weekly (7d) rank changes to identify the strongest signals. Models significant on both timeframes represent the most reliable trend shifts.

Significant on Both Timeframes (strongest signals)

No models currently significant on both daily and weekly timeframes.

Daily Only (may be noise)

No models with daily-only significance.

Weekly Only (building trend)

No models with weekly-only significance.

Noise Floor Analysis

Coefficient of variation (CV%) measures score volatility relative to mean. High-CV models require larger changes to be statistically significant; low-CV models are the most predictable.

Noisiest Models (highest CV%, widest significance thresholds)

Model	Provider	Score	CV%	Std Dev	Sig. Threshold

Most Consistent Models (lowest CV%, tightest significance thresholds)

Model	Provider	Score	CV%	Std Dev	Sig. Threshold

How to Read This Page

Understanding the statistical methodology behind our significance analysis helps you distinguish real performance shifts from random fluctuations.

Statistical Significance

We use z-scores with a 95% confidence threshold (|z| > 1.96). A z-score measures how many standard deviations a model's current score is from its historical baseline. Only changes exceeding 1.96 standard deviations are flagged as statistically significant.
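The z-score test described above can be sketched in a few lines. This is a minimal illustration, not the tracker's actual code: the function names `z_score` and `is_significant` are hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption, since the page does not say which is used.

```python
import math
from statistics import mean, stdev

Z_THRESHOLD = 1.96  # 95% threshold stated on this page


def z_score(history, current):
    """Number of standard deviations `current` lies from the mean of `history`.

    `history` needs at least two points; statistics.stdev raises otherwise.
    """
    baseline = mean(history)
    spread = stdev(history)  # sample standard deviation (assumption)
    if spread == 0:
        # A perfectly flat history: any deviation at all is infinitely unusual.
        if current == baseline:
            return 0.0
        return math.copysign(math.inf, current - baseline)
    return (current - baseline) / spread


def is_significant(history, current, threshold=Z_THRESHOLD):
    """True when |z| exceeds the threshold (strictly greater, as on this page)."""
    return abs(z_score(history, current)) > threshold


# Illustrative scores only, not real tracker data.
history = [70.0, 71.0, 69.0, 70.0, 70.0, 71.0, 69.0]
print(is_significant(history, 72.0), is_significant(history, 71.0))
```

With this made-up history, a score of 72 sits about 2.45 standard deviations above the mean and is flagged, while 71 sits about 1.22 above and is treated as noise.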

Baseline Score

The baseline is computed as the arithmetic mean of each model's 14-day sparkline data. This rolling average smooths out daily fluctuations and provides a stable reference point for detecting meaningful deviations.
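As a rough sketch of the baseline calculation, the snippet below averages the most recent 14 points of a score series. The name `baseline` and the choice to simply take the last 14 entries of a longer list are assumptions for illustration; the page only states that the mean of the 14-day sparkline is used.

```python
from statistics import fmean

WINDOW_DAYS = 14  # sparkline length stated on this page


def baseline(sparkline):
    """Arithmetic mean of the most recent WINDOW_DAYS scores."""
    window = sparkline[-WINDOW_DAYS:]
    if not window:
        raise ValueError("sparkline must contain at least one score")
    return fmean(window)


# Illustrative data: 20 daily scores, of which only the last 14 are averaged.
scores = [float(day) for day in range(1, 21)]
print(baseline(scores))
```

Because the window moves forward each day, a single outlier shifts the baseline by at most one fourteenth of its deviation, which is what makes the reference point stable.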

Confidence Intervals

Each model's 95% confidence interval is calculated as baseline ± 1.96 × standard deviation. Scores falling outside this range indicate a statistically meaningful change. The "Confidence" column shows the ± threshold value.
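The interval formula above translates directly into code. This is a hedged sketch: `confidence_band` is a hypothetical name, and the sample standard deviation is assumed because the page does not specify the estimator.

```python
from statistics import fmean, stdev

Z_95 = 1.96  # multiplier stated on this page


def confidence_band(sparkline, z=Z_95):
    """Return (lower, upper, margin) for baseline +/- z * standard deviation."""
    base = fmean(sparkline)
    margin = z * stdev(sparkline)  # sample standard deviation (assumption)
    return base - margin, base + margin, margin


def outside_band(sparkline, current, z=Z_95):
    """True when `current` falls outside the band around the baseline."""
    lower, upper, _ = confidence_band(sparkline, z)
    return current < lower or current > upper


# Illustrative scores only, not real tracker data.
history = [70.0, 71.0, 69.0, 70.0, 70.0, 71.0, 69.0]
lower, upper, margin = confidence_band(history)
print(round(lower, 2), round(upper, 2), round(margin, 2))
```

The third return value is the ± threshold: the distance a score must move from the baseline before it leaves the band.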

Multi-Timeframe Analysis

Daily (24h) and weekly (7d) rank changes are analyzed separately. Daily significance requires a rank shift of more than 3 positions; weekly requires more than 5. Models significant on both timeframes represent the strongest, most reliable signals.
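The bucketing rule above can be sketched as a small classifier. The function name `classify` and the bucket labels are hypothetical, and treating rank changes by absolute value (so that rises and falls count equally) is an assumption; the thresholds themselves come from this page.

```python
DAILY_MIN_SHIFT = 3   # daily change must exceed this many positions
WEEKLY_MIN_SHIFT = 5  # weekly change must exceed this many positions


def classify(daily_rank_change, weekly_rank_change):
    """Bucket a model by which timeframes show a significant rank shift."""
    daily = abs(daily_rank_change) > DAILY_MIN_SHIFT
    weekly = abs(weekly_rank_change) > WEEKLY_MIN_SHIFT
    if daily and weekly:
        return "both"         # strongest signals
    if daily:
        return "daily_only"   # may be noise
    if weekly:
        return "weekly_only"  # building trend
    return "noise"


# Illustrative rank changes only, not real tracker data.
for change in [(4, 6), (-4, 2), (3, 6), (3, 5)]:
    print(change, classify(*change))
```

Note that both comparisons are strict: a daily shift of exactly 3 positions or a weekly shift of exactly 5 does not qualify.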

Noise vs. Signal

The coefficient of variation (CV%) measures relative volatility. High-CV models have naturally noisy scores and require larger absolute changes to be significant. Low-CV models are more predictable, so even small deviations may represent real shifts.
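To make the link between CV% and the significance threshold concrete, here is a minimal sketch comparing a steady and a volatile score history with the same mean. The names `cv_percent` and `sig_threshold` are hypothetical, the sample standard deviation is an assumption, and the threshold is taken to be 1.96 × standard deviation as in the confidence interval description.

```python
from statistics import fmean, stdev


def cv_percent(scores):
    """Coefficient of variation: standard deviation as a percentage of the mean."""
    avg = fmean(scores)
    if avg == 0:
        raise ValueError("CV is undefined when the mean score is zero")
    return stdev(scores) / avg * 100.0  # sample standard deviation (assumption)


def sig_threshold(scores, z=1.96):
    """Absolute score change needed to exceed the 95% threshold."""
    return z * stdev(scores)


# Illustrative histories only; both average 70.
steady = [70.0, 71.0, 69.0, 70.0, 70.0, 71.0, 69.0]
volatile = [60.0, 80.0, 60.0, 80.0]

for name, scores in [("steady", steady), ("volatile", volatile)]:
    print(name, round(cv_percent(scores), 2), round(sig_threshold(scores), 2))
```

Both histories have the same mean, but the volatile one needs a far larger absolute move before a change counts as significant.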

All Trackers

Coding, image, and video model trackers

Degradation Tracker

Detect when AI models may be getting worse

Stability Tracker

Track model ranking stability and consistency

