🤖 Choosing the Right AI Model

Characteristics of the LLM Provider Landscape

thegraphcourses.org • Generative AI for Work & Research

Last updated: May 28, 2025

Why Model Selection Matters

Different AI tasks require different considerations:

📊 Bulk Operations (e.g. data cleaning): Price per use matters a lot

💬 Chatbots: Fast response time is important

🔍 Code and Data Analysis: Accuracy is non-negotiable

📑 Document Understanding: Ability to process long files is essential

This guide shows you how to choose the best model for your specific use case.

Key Factors to Consider

🏆 Performance

Check public leaderboards to shortlist candidates:

LMArena Leaderboard

Artificial Analysis

Best practice: Run your own evaluations with your specific use cases.
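A minimal sketch of such an evaluation follows. It assumes each candidate model is wrapped as a Python function that takes a prompt and returns text. The test cases and the two "models" are hypothetical stand-ins for real API calls, and exact matching is only one possible scoring rule.

```python
from typing import Callable, Dict, List, Tuple

# A test case pairs a prompt with the answer you expect for it.
TestCase = Tuple[str, str]
Model = Callable[[str], str]  # any function that takes a prompt and returns the model's reply


def normalize(text: str) -> str:
    """Ignore case and surrounding whitespace so formatting noise isn't scored as an error."""
    return text.strip().lower()


def accuracy(model: Model, cases: List[TestCase]) -> float:
    """Fraction of cases where the model's reply exactly matches the expected answer."""
    hits = sum(normalize(model(prompt)) == normalize(expected) for prompt, expected in cases)
    return hits / len(cases)


def compare(models: Dict[str, Model], cases: List[TestCase]) -> Dict[str, float]:
    """Score every candidate on the same test set."""
    return {name: accuracy(model, cases) for name, model in models.items()}


# Hypothetical stand-ins for real API calls, used only to show the harness running.
CAPITALS = {"France": "Paris", "Kenya": "Nairobi", "Peru": "Lima"}


def careful_model(prompt: str) -> str:
    return CAPITALS[prompt.removeprefix("Capital of ")]


def sloppy_model(prompt: str) -> str:
    return "Paris"


cases = [(f"Capital of {country}", city) for country, city in CAPITALS.items()]
print(compare({"careful": careful_model, "sloppy": sloppy_model}, cases))
# → {'careful': 1.0, 'sloppy': 0.3333333333333333}
```

In practice you would replace the stubs with calls to each shortlisted model and use a few dozen cases drawn from your real workload.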

💰 Price

Pricing typically listed per million tokens

Token = chunk of text (see platform.openai.com/tokenizer)

100 tokens ≈ 75 words

Input and output tokens priced separately

Strategy: Test cheap models first before moving to premium options
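To see why cheap models are worth testing first, here is a rough cost estimate for a bulk job, using the 100 tokens ≈ 75 words rule of thumb. The job size is hypothetical, and the prices ($0.40/$1.60 and $3/$15 per million input/output tokens) are GPT-4.1 mini and Claude 4 Sonnet list prices at the time of writing; check current pricing before relying on them.

```python
def words_to_tokens(words: int) -> int:
    """Rule of thumb for English text: 100 tokens ≈ 75 words."""
    return round(words * 100 / 75)


def job_cost(calls: int, in_tokens: int, out_tokens: int,
             in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of `calls` requests; prices are $ per million tokens, billed separately."""
    total_in = calls * in_tokens
    total_out = calls * out_tokens
    return (total_in * in_price_per_m + total_out * out_price_per_m) / 1_000_000


# Hypothetical bulk job: 50,000 rows, ~300-word prompt and ~75-word answer per row.
in_tok, out_tok = words_to_tokens(300), words_to_tokens(75)  # 400 and 100 tokens
cheap = job_cost(50_000, in_tok, out_tok, 0.40, 1.60)      # e.g. GPT-4.1 mini list prices
premium = job_cost(50_000, in_tok, out_tok, 3.00, 15.00)   # e.g. Claude 4 Sonnet list prices
print(f"cheap: ${cheap:.2f}, premium: ${premium:.2f}")
# → cheap: $16.00, premium: $135.00
```

The gap grows linearly with the number of calls, which is why price per use dominates for bulk operations.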

⚡️ Speed

Cheap, smaller models are usually faster but less intelligent

For chatbots & interactive apps: Consider fast models for better UX

For async processes: Speed is less critical, focus on quality
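For chat interfaces that stream their replies, the time until the first chunk appears often shapes perceived speed more than the total generation time. The sketch measures both. `fake_stream` is a hypothetical stand-in; in practice `start_request` would wrap your provider SDK's streaming call.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_response(start_request: Callable[[], Iterable[str]]) -> Tuple[float, float, str]:
    """Return (time to first chunk, total time, full text) for a streamed reply, in seconds.

    `start_request` should send the request and return an iterable of text chunks,
    so the clock starts before the request goes out.
    """
    start = time.perf_counter()
    first_chunk_at = None
    chunks = []
    for chunk in start_request():
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        chunks.append(chunk)
    total = time.perf_counter() - start
    if first_chunk_at is None:  # empty reply: count the whole wait as first-chunk latency
        first_chunk_at = total
    return first_chunk_at, total, "".join(chunks)


def fake_stream(text: str, delay_s: float = 0.05) -> Iterator[str]:
    """Hypothetical stand-in for a streaming API: yields one word every `delay_s` seconds."""
    for word in text.split():
        time.sleep(delay_s)
        yield word + " "


ttft, total, reply = time_response(lambda: fake_stream("Hello, how can I help today?"))
print(f"first chunk after {ttft:.2f}s, full reply after {total:.2f}s")
```

Running the same measurement several times per model and comparing medians gives a fairer picture than a single call, since API latency varies.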

📝 Context Limit

Modern models have huge context limits (1M tokens ≈ the entire Bible), but fitting a document in the context window does not guarantee good responses.
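A quick way to check whether a document fits at all is to estimate its token count from its word count. This sketch uses the 100 tokens ≈ 75 words heuristic; the 8,000-token reserve for the reply is an assumption you should adjust. Passing the check is necessary, not sufficient, for a good answer.

```python
def estimate_tokens(word_count: int) -> int:
    """Rough English-text estimate using the rule of thumb 100 tokens ≈ 75 words."""
    return round(word_count * 100 / 75)


def fits_in_context(word_count: int, context_limit: int, reserved_for_output: int = 8_000) -> bool:
    """True if the document plus room for the reply fits in the model's context window.

    `reserved_for_output` is an assumed budget for the answer; adjust it to your task.
    """
    return estimate_tokens(word_count) + reserved_for_output <= context_limit


# A 600,000-word document is roughly 800,000 tokens.
limits = {"1M-token model": 1_000_000, "200K-token model": 200_000}
for name, limit in limits.items():
    print(name, fits_in_context(600_000, limit))
# → 1M-token model True
# → 200K-token model False
```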

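Applications can use retrieval-augmented generation (RAG), which sends the model only the passages most relevant to each query, to work with documents larger than the context limit.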
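The core RAG idea can be sketched in a few lines: split the document into chunks, pick the chunks most relevant to the question, and send only those. This is a simplified illustration, not a production design: it scores relevance by word overlap, whereas most RAG systems use embedding similarity, and the chunk size and prompt wording are arbitrary choices.

```python
import re
from typing import List, Set


def chunk_words(text: str, chunk_size: int = 200) -> List[str]:
    """Split a long document into consecutive chunks of about `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def terms(text: str) -> Set[str]:
    """Lowercased alphanumeric words, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: List[str], k: int = 3) -> List[str]:
    """Return the k chunks sharing the most words with the query.

    Word overlap is a simplified stand-in for the embedding similarity most RAG systems use.
    """
    query_terms = terms(query)
    return sorted(chunks, key=lambda c: len(query_terms & terms(c)), reverse=True)[:k]


def build_prompt(query: str, document: str, k: int = 3, chunk_size: int = 200) -> str:
    """Send only the retrieved chunks to the model instead of the whole document."""
    context = "\n---\n".join(retrieve(query, chunk_words(document, chunk_size), k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


doc = ("The cafeteria opens at 8am. " * 100
       + "Invoices must be submitted within 30 days. "
       + "Parking is free on weekends. " * 100)
print(build_prompt("When must invoices be submitted?", doc, k=1, chunk_size=20))
```

The resulting prompt contains only the 20-word chunk about invoices, so the model never sees the other 1,000 or so words.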

🔓 Open vs Closed

Open-weight models:

• Often cheaper to run

• Can deploy on your infrastructure

• More control and customization

Closed models: Typically more powerful but pricier

Model Comparison Chart

Provider   Model                       MMLU   Price*  Context  Weights  Tier
OpenAI     GPT-4.1 mini                0.781  $0.70   1M       Closed   ⚡️ FAST, CHEAP
OpenAI     o4-mini (high)              0.832  $1.93   0.2M     Closed   👨🏻‍🏫 GENIUS
Anthropic  Claude 3.5 Haiku            0.634  $1.60   0.2M     Closed   ⚡️ FAST, CHEAP
Anthropic  Claude 4 Sonnet             0.837  $6.00   0.2M     Closed   👨🏻‍🏫 GENIUS
Google     Gemini 2.5 Flash            0.783  $0.26   1M       Closed   ⚡️ FAST, CHEAP
Google     Gemini 2.5 Pro              0.858  $3.44   1M       Closed   👨🏻‍🏫 GENIUS
Alibaba    Qwen3 30B A3B               0.71   $0.35   0.13M    Open     ⚡️ FAST, CHEAP
Alibaba    Qwen3 235B A22B (thinking)  0.828  $2.63   0.13M    Open     👨🏻‍🏫 GENIUS

MMLU: benchmark performance score (higher is better).
* Blended price in $ per million tokens, assuming 3 input tokens for every output token.
Context: context limit in tokens (M = million).
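The chart's single price figure is a weighted average of separate input and output prices at a 3:1 ratio. This short sketch reproduces that arithmetic; the list prices used ($3/$15 for Claude 4 Sonnet, $0.40/$1.60 for GPT-4.1 mini, per million input/output tokens) are those at the time of writing, and the function name is illustrative.

```python
def blended_price(in_price_per_m: float, out_price_per_m: float, in_per_out: float = 3.0) -> float:
    """Weighted-average $/M tokens, assuming `in_per_out` input tokens for every output token."""
    return (in_per_out * in_price_per_m + out_price_per_m) / (in_per_out + 1)


print(blended_price(3.00, 15.00))                   # Claude 4 Sonnet → 6.0
print(round(blended_price(0.40, 1.60), 2))          # GPT-4.1 mini → 0.7
print(blended_price(3.00, 15.00, in_per_out=1.0))   # same model, output-heavy workload → 9.0
```

If your workload generates more output relative to input (e.g. long drafts from short prompts), the effective price rises, so recompute the blend with your own ratio.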

Some Rules of Thumb

🎯 Start with Major Providers

Begin with models in the main dropdowns of public-facing UIs from major providers (OpenAI, Anthropic, Google).

Why: With 400+ models on platforms like OpenRouter, starting with the flagship models helps narrow your choices.

🤝 Stick with Familiar Ecosystems

Different providers' offerings are quite similar these days, so consider picking the company whose apps you already use.

Note: If you use Microsoft Copilot, you are most likely already using OpenAI's models under the hood.

🚀 Google: Best Performance & Value

At the time of writing, Google has some of the best-scoring models on benchmarks:

Gemini 2.5 Pro: Top performance

Gemini 2.5 Flash: One of the cheapest good models

Great starting point if you have no other preferences.

💡 Our Favorite: Claude 4 Sonnet

While not always top on benchmarks, it excels at:

• Programming and design tasks

• Smart responses without being too chatty

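• Often lower real-world costs than Gemini 2.5 Pro, despite a higher per-token price, because its answers use fewer tokens

Note: Gemini 2.5 Pro is a thinking model, which generates reasoning tokens before its answer and can be verbose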
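Per-task cost depends on how many tokens a model actually produces, not only on its per-token price. This sketch uses hypothetical token counts for the same prompt and list prices at the time of writing ($3/$15 for Claude 4 Sonnet, $1.25/$10 for Gemini 2.5 Pro, per million input/output tokens); your real counts will differ by task.

```python
def task_cost(in_tokens: int, out_tokens: int, in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of one request; prices are $ per million tokens."""
    return (in_tokens * in_price_per_m + out_tokens * out_price_per_m) / 1_000_000


# Hypothetical token counts for the same 2,000-token prompt.
# Output for the thinking model includes its reasoning tokens.
sonnet = task_cost(2_000, 500, 3.00, 15.00)
gemini_pro = task_cost(2_000, 3_000, 1.25, 10.00)
print(f"Claude 4 Sonnet: ${sonnet:.4f}")      # → Claude 4 Sonnet: $0.0135
print(f"Gemini 2.5 Pro:  ${gemini_pro:.4f}")  # → Gemini 2.5 Pro:  $0.0325
```

Under these assumed counts, the model with the cheaper per-token price costs more than twice as much per task. Logging actual token usage from your own runs is the reliable way to compare.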

⚠️ API vs Chat Interface

Important distinction: These recommendations are for API usage.

For chat interfaces, we actually pay for ChatGPT because it offers:

• Superior user interface

• Deep research capabilities

• Image generation & code execution

• Easy-to-use voice mode