Full transparency into how AI Models Map ranks 60+ AI models. Every score and weight is documented here so you can understand exactly what our rankings measure.
Every model on AI Models Map receives a composite score from 0 to 100. This score is not a single benchmark result — it is a weighted blend of multiple signal categories that together capture a model’s overall quality, popularity, and reliability.
We combine quantitative benchmark data with real-world adoption signals and qualitative expert assessments. Each signal is normalized to a 0-100 scale before weighting, so no single source with a different numeric range can dominate the final score.
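To make the normalization step concrete, here is a minimal sketch in Python. The signal names, bounds, and min-max rescaling shown here are illustrative assumptions, not our exact calibration values.

```python
def normalize(value, lo, hi):
    """Rescale a raw signal onto 0-100, clamping out-of-range values."""
    if hi == lo:
        return 0.0
    return max(0.0, min(100.0, (value - lo) / (hi - lo) * 100.0))

# Hypothetical raw signals with very different native ranges.
raw_signals = {
    "benchmark_avg": (78.4, 0, 100),        # already roughly on a 0-100 scale
    "github_stars": (24_000, 0, 100_000),   # adoption signal with a much larger range
    "median_latency_ms": (420, 2_000, 50),  # lower is better, so the bounds are inverted
}

normalized = {name: normalize(v, lo, hi) for name, (v, lo, hi) in raw_signals.items()}
print(normalized)  # every value now lives on the same 0-100 scale before weighting
```

Because each signal is rescaled before it is weighted, a source reported on a 0-100,000 scale carries no more influence than one reported on 0-100.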
The weights below reflect our default category profile. Weights are tuned per category (coding, image generation, video generation) to emphasize the signals most relevant to each domain. You can also build your own custom ranking by adjusting weights in the Explorer.
Five categories of signals feed into the composite score. Here is how the default weighting breaks down:
Benchmarks: standardized evaluation scores from established benchmark suites. These are the most objective, reproducible signals available.
Community sentiment: what developers and users are saying about each model. We aggregate sentiment from multiple forums to reduce noise.
Adoption: real-world usage metrics that indicate how widely a model is actually being used in production and development.
Expert review: qualitative assessments from AI researchers, engineers, and domain experts who evaluate models hands-on.
Performance and reliability: operational metrics that matter for production use, namely how fast the model is and whether you can count on it being available.
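As a rough illustration of how these five categories combine into one number, here is a sketch of a weighted blend. The weights and category scores below are placeholders for illustration only, not our actual default profile, and the renormalization of missing categories is an assumption about gap handling rather than documented behavior.

```python
# Placeholder category weights -- illustrative only, not the real default profile.
DEFAULT_WEIGHTS = {
    "benchmarks": 0.35,
    "community_sentiment": 0.15,
    "adoption": 0.20,
    "expert_review": 0.20,
    "performance_reliability": 0.10,
}

def composite_score(category_scores: dict[str, float],
                    weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted blend of normalized (0-100) category scores.

    Categories with no data are skipped and the remaining weights are
    renormalized (an assumption, not documented behavior).
    """
    present = {k: w for k, w in weights.items() if k in category_scores}
    total = sum(present.values())
    if total == 0:
        return 0.0
    return sum(category_scores[k] * w for k, w in present.items()) / total

# A model with only three categories of data still gets a 0-100 composite.
print(composite_score({"benchmarks": 82, "adoption": 64, "expert_review": 75}))
```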
Not all models have equal data coverage. We assign a confidence rating to every composite score so you know how reliable it is.
High confidence: all primary signals are available, the model has been live for 30+ days, and it has 3+ benchmark scores.
Medium confidence: most signals are available but there are some gaps; the model may be new (7-30 days) or missing 1-2 benchmark results.
Low confidence: limited data; the model launched within the last 7 days, has few benchmarks available, or has limited signal coverage.
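The tiers above amount to a simple decision rule. Here is a hedged sketch of that logic; the field names and the exact thresholds for "few benchmarks" and "limited coverage" are assumptions we made for illustration.

```python
def confidence_rating(days_live: int, benchmark_count: int,
                      signal_coverage: float) -> str:
    """Map data coverage to a confidence tier, following the criteria above.

    `signal_coverage` is an assumed fraction (0-1) of primary signals present;
    the 0.5 and benchmark_count thresholds are illustrative, not documented.
    """
    if signal_coverage >= 1.0 and days_live >= 30 and benchmark_count >= 3:
        return "high"
    if days_live < 7 or benchmark_count == 0 or signal_coverage < 0.5:
        return "low"
    return "medium"

print(confidence_rating(days_live=12, benchmark_count=2, signal_coverage=0.8))  # medium
```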
Fast-moving signals like API latency, uptime, and trending data are polled and re-scored every hour. Composite scores can shift multiple times per day.
Repository stats, model downloads, developer activity, and social sentiment are refreshed in a nightly batch job at 00:00 UTC.
Academic citations, expert reviews, and benchmark results from new papers are incorporated weekly. Weight calibration is reviewed by the AI Council every Monday.
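The three refresh cadences could be summarized in a scheduling table like the sketch below. The signal groupings mirror the description above; the table itself and its key names are hypothetical implementation details.

```python
# Hypothetical cadence table mirroring the refresh schedule described above.
REFRESH_SCHEDULE = {
    "hourly": ["api_latency", "uptime", "trending"],
    "daily_00_00_utc": ["repo_stats", "model_downloads",
                        "developer_activity", "social_sentiment"],
    "weekly_monday": ["citations", "expert_reviews",
                      "new_benchmarks", "weight_calibration_review"],
}
```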
Our methodology is versioned. Every time we adjust signal weights, add a new signal, or change the normalization logic, we increment the version number, document the change in the Changelog, and explain the rationale.
The current scoring formula is v2.1. Historical scores are not retroactively recomputed — the methodology version is recorded alongside every snapshot so you can always see which formula produced a given ranking.
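Because historical scores are never recomputed, each snapshot carries the methodology version that produced it. A minimal sketch of such a record, with field names chosen purely for illustration:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ScoreSnapshot:
    """Immutable score snapshot; the scoring-formula version travels with it."""
    model_id: str
    composite_score: float
    confidence: str            # "high" | "medium" | "low"
    methodology_version: str   # e.g. "v2.1" at the time the snapshot was taken
    captured_at: datetime

snap = ScoreSnapshot(
    model_id="example-model",
    composite_score=75.3,
    confidence="medium",
    methodology_version="v2.1",
    captured_at=datetime(2025, 1, 6, 0, 0, tzinfo=timezone.utc),
)
```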
We welcome feedback. If you believe a signal is over- or under-weighted, or if you have suggestions for improving our methodology, reach out via the AI Council transparency page.
Now that you understand how scores work, see them in action on the leaderboard, or build your own custom ranking.