293 models ranked for scientific research. Scores include bonuses for reasoning (complex analysis), large context windows (full papers), vision (diagrams), web search (literature lookup), and JSON mode (structured data).
| # | Model | Provider | Score |
|---|---|---|---|
| 1 | GPT-5.4 Pro | OpenAI | 91 |
| 2 | GPT-5.2 Pro | OpenAI | 90 |
| 3 | GPT-5 Pro | OpenAI | 90 |
| 4 | o3 Pro | OpenAI | 82 |
| 5 | Claude Opus 4.1 | Anthropic | 81 |
| 6 | o1-pro | OpenAI | 77 |
| 7 | o3 Deep Research | OpenAI | 74 |
| 8 | Claude Opus 4 | Anthropic | 76 |
| 9 | Claude Opus 4.6 | Anthropic | 71 |
| 10 | Claude Opus 4.5 | Anthropic | 70 |
| 11 | GPT-5.4 | OpenAI | 70 |
| 12 | Claude Sonnet 4.5 | Anthropic | 69 |
| 13 | Qwen3 VL 30B A3B Thinking | Alibaba | 69 |
| 14 | Qwen3 VL 235B A22B Thinking | Alibaba | 69 |
| 15 | GPT-5.2 | OpenAI | 68 |
| 16 | Claude Sonnet 4.6 | Anthropic | 68 |
| 17 | GPT-5.1 | OpenAI | 67 |
| 18 | GPT-5.3-Codex | OpenAI | 67 |
| 19 | GPT-5.2-Codex | OpenAI | 67 |
| 20 | GPT-5 | OpenAI | 67 |
| 21 | o4 Mini Deep Research | OpenAI | 66 |
| 22 | GPT-5.1-Codex-Max | OpenAI | 66 |
| 23 | Gemini 3.1 Pro Preview Custom Tools | Google | 68 |
| 24 | Gemini 3.1 Pro Preview | Google | 68 |
| 25 | Gemini 3 Pro Preview | Google | 68 |
| 26 | GPT-5 Mini | OpenAI | 65 |
| 27 | GPT-5 Nano | OpenAI | 64 |
| 28 | Grok 4.1 Fast | xAI | 64 |
| 29 | Grok 4 Fast | xAI | 64 |
| 30 | Claude Haiku 4.5 | Anthropic | 63 |
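The bonus-based scoring described at the top of this page can be sketched roughly as follows. The page names the bonus categories but not their weights, so the base score, the point values, and the `Model` type here are all hypothetical placeholders chosen for illustration; they are not the formula behind the scores in the table.

```python
from dataclasses import dataclass

# Hypothetical values: the page lists the bonus categories
# but does not publish the actual weights or base score.
BASE_SCORE = 50
BONUSES = {
    "reasoning": 15,      # complex analysis
    "large_context": 10,  # full papers
    "vision": 10,         # diagrams
    "web_search": 10,     # literature lookup
    "json_mode": 5,       # structured data
}


@dataclass(frozen=True)
class Model:
    """Illustrative record: a model name plus the capabilities it supports."""
    name: str
    capabilities: frozenset


def research_score(model, base=BASE_SCORE, bonuses=BONUSES):
    """Add one bonus per supported capability to the base score, capped at 100."""
    total = base + sum(
        points for capability, points in bonuses.items()
        if capability in model.capabilities
    )
    return min(total, 100)


def rank(models):
    """Return models ordered from highest to lowest score."""
    return sorted(models, key=research_score, reverse=True)


# Usage with made-up models (not entries from the table).
candidates = [
    Model("model-a", frozenset({"json_mode"})),
    Model("model-b", frozenset({"reasoning", "vision"})),
]
for position, model in enumerate(rank(candidates), start=1):
    print(position, model.name, research_score(model))
```

An additive scheme like this keeps each capability's contribution easy to inspect, though a real ranking would presumably also fold in benchmark or quality signals that this sketch leaves out.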
Large-context models (128K+ tokens) can process entire research papers in a single prompt. Combined with reasoning, they extract key findings, identify methodology gaps, and synthesize across multiple sources.
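As a rough illustration of what fitting a paper into a 128K context involves, the sketch below estimates a token count from text length. The four-characters-per-token ratio is a common rule of thumb for English prose rather than an exact figure, and the window size, output reserve, and function names are assumptions made for this example; a real check would use the target model's own tokenizer.

```python
CONTEXT_WINDOW_TOKENS = 128_000  # the "128K" threshold, taken as a round number
CHARS_PER_TOKEN = 4              # assumed heuristic; varies by tokenizer and language


def estimate_tokens(text):
    """Approximate the token count as ceil(len(text) / CHARS_PER_TOKEN)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text, window=CONTEXT_WINDOW_TOKENS, reserve_for_output=4_000):
    """Check whether the text, plus room reserved for the reply, fits the window."""
    return estimate_tokens(text) + reserve_for_output <= window


# Usage: a stand-in string of 400,000 characters, about 100,000 estimated tokens.
paper_text = "x" * 400_000
print(estimate_tokens(paper_text), fits_in_context(paper_text))
```

Reserving part of the window for the model's reply matters because the prompt and the output generally share the same context budget.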
Vision models analyze charts, plots, and experimental images. Reasoning models work through complex statistical analyses, helping researchers validate findings and spot patterns.
Reasoning models help design experiments, identify confounding variables, and suggest controls. Web search keeps research informed by the latest published methods and protocols.
Large-output models draft sections of scientific papers with proper structure. Models can also review drafts for logical consistency, suggest improvements, and check claims against current literature.