Full transparency into how AI Models Map ranks 60+ AI models. Every score and weight is documented here so you can understand exactly what our rankings measure.
Every model on AI Models Map receives a composite score from 0 to 100. This score is not a single benchmark result — it is a weighted blend of multiple signal categories that together capture a model’s overall quality, popularity, and reliability.
We combine quantitative benchmark data with real-world adoption signals and qualitative expert assessments. Each signal is normalized to a 0-100 scale before weighting, so no single source with a different numeric range can dominate the final score.
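To make the normalization step concrete, here is a minimal sketch in Python. The signal names, bounds, and min-max rescaling shown here are illustrative assumptions, not our exact calibration values.

```python
def normalize(value, lo, hi):
    """Rescale a raw signal onto 0-100, clamping out-of-range values."""
    if hi == lo:
        return 0.0
    return max(0.0, min(100.0, (value - lo) / (hi - lo) * 100.0))

# Hypothetical raw signals with very different native ranges.
raw_signals = {
    "benchmark_avg": (78.4, 0, 100),        # already roughly on a 0-100 scale
    "github_stars": (24_000, 0, 100_000),   # adoption signal with a much larger range
    "median_latency_ms": (420, 2_000, 50),  # lower is better, so the bounds are inverted
}

normalized = {name: normalize(v, lo, hi) for name, (v, lo, hi) in raw_signals.items()}
print(normalized)  # every value now lives on the same 0-100 scale before weighting
```

Because each signal is rescaled before it is weighted, a source reported on a 0-100,000 scale carries no more influence than one reported on 0-100.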
The weights below reflect our default category profile. Weights are tuned per category (coding, image generation, video generation) to emphasize the signals most relevant to each domain. You can also build your own custom ranking by adjusting weights in the Explorer.
Five categories of signals feed into the composite score. Here is how the default weighting breaks down:
Benchmarks: standardized evaluation scores from established benchmark suites. These are the most objective, reproducible signals available.
Community sentiment: what developers and users are saying about each model. We aggregate sentiment from multiple forums to reduce noise.
Adoption: real-world usage metrics that indicate how widely a model is actually being used in production and development.
Expert review: qualitative assessments from AI researchers, engineers, and domain experts who evaluate models hands-on.
Performance and reliability: operational metrics that matter for production use, namely how fast the model is and whether you can count on it being available.
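As a rough illustration of how these five categories combine into one number, here is a sketch of a weighted blend. The weights and category scores below are placeholders for illustration only, not our actual default profile, and the renormalization of missing categories is an assumption about gap handling rather than documented behavior.

```python
# Placeholder category weights -- illustrative only, not the real default profile.
DEFAULT_WEIGHTS = {
    "benchmarks": 0.35,
    "community_sentiment": 0.15,
    "adoption": 0.20,
    "expert_review": 0.20,
    "performance_reliability": 0.10,
}

def composite_score(category_scores: dict[str, float],
                    weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted blend of normalized (0-100) category scores.

    Categories with no data are skipped and the remaining weights are
    renormalized (an assumption, not documented behavior).
    """
    present = {k: w for k, w in weights.items() if k in category_scores}
    total = sum(present.values())
    if total == 0:
        return 0.0
    return sum(category_scores[k] * w for k, w in present.items()) / total

# A model with only three categories of data still gets a 0-100 composite.
print(composite_score({"benchmarks": 82, "adoption": 64, "expert_review": 75}))
```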
Not all models have equal data coverage. We assign a confidence rating to every composite score so you know how reliable it is.
High confidence: all primary signals are available, the model has been live for 30+ days, and it has 3+ benchmark scores.
Medium confidence: most signals are available but there are some gaps; the model may be new (7-30 days) or missing 1-2 benchmark results.
Low confidence: limited data; the model launched within the last 7 days, has few benchmarks available, or has limited signal coverage.
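The tiers above amount to a simple decision rule. Here is a hedged sketch of that logic; the field names and the exact thresholds for "few benchmarks" and "limited coverage" are assumptions we made for illustration.

```python
def confidence_rating(days_live: int, benchmark_count: int,
                      signal_coverage: float) -> str:
    """Map data coverage to a confidence tier, following the criteria above.

    `signal_coverage` is an assumed fraction (0-1) of primary signals present;
    the 0.5 and benchmark_count thresholds are illustrative, not documented.
    """
    if signal_coverage >= 1.0 and days_live >= 30 and benchmark_count >= 3:
        return "high"
    if days_live < 7 or benchmark_count == 0 or signal_coverage < 0.5:
        return "low"
    return "medium"

print(confidence_rating(days_live=12, benchmark_count=2, signal_coverage=0.8))  # medium
```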
Fast-moving signals like API latency, uptime, and trending data are polled and re-scored every hour. Composite scores can shift multiple times per day.
Repository stats, model downloads, developer activity, and social sentiment are refreshed in a nightly batch job at 00:00 UTC.
Academic citations, expert reviews, and benchmark results from new papers are incorporated weekly. Weight calibration is reviewed by the AI Council every Monday.
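The three refresh cadences could be summarized in a scheduling table like the sketch below. The signal groupings mirror the description above; the table itself and its key names are hypothetical implementation details.

```python
# Hypothetical cadence table mirroring the refresh schedule described above.
REFRESH_SCHEDULE = {
    "hourly": ["api_latency", "uptime", "trending"],
    "daily_00_00_utc": ["repo_stats", "model_downloads",
                        "developer_activity", "social_sentiment"],
    "weekly_monday": ["citations", "expert_reviews",
                      "new_benchmarks", "weight_calibration_review"],
}
```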
Our methodology is versioned. Every time we adjust signal weights, add a new signal, or change the normalization logic, we increment the version number, document the change in the Changelog, and explain the rationale.
The current scoring formula is v2.1. Historical scores are not retroactively recomputed — the methodology version is recorded alongside every snapshot so you can always see which formula produced a given ranking.
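Because historical scores are never recomputed, each snapshot carries the methodology version that produced it. A minimal sketch of such a record, with field names chosen purely for illustration:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ScoreSnapshot:
    """Immutable score snapshot; the scoring-formula version travels with it."""
    model_id: str
    composite_score: float
    confidence: str            # "high" | "medium" | "low"
    methodology_version: str   # e.g. "v2.1" at the time the snapshot was taken
    captured_at: datetime

snap = ScoreSnapshot(
    model_id="example-model",
    composite_score=75.3,
    confidence="medium",
    methodology_version="v2.1",
    captured_at=datetime(2025, 1, 6, 0, 0, tzinfo=timezone.utc),
)
```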
We welcome feedback. If you believe a signal is over- or under-weighted, or if you have suggestions for improving our methodology, reach out via the AI Council transparency page.
Now that you understand how scores work, see them in action on the leaderboard, or build your own custom ranking.