AI Model Selection Guide - The Graph Courses

Why Model Selection Matters

Different AI tasks require different considerations:

📊 Bulk Operations (e.g. data cleaning): Price per use matters a lot

💬 Chatbots: Fast response time is important

🔍 Code and Data Analysis: Accuracy is non-negotiable

📑 Document Understanding: Ability to process long files is essential

This guide teaches you to evaluate the best models for your specific use case.

Key Factors to Consider

🏆 Performance

Check public leaderboards to shortlist candidates:

• LMArena Leaderboard

• Artificial Analysis

Best practice: Run your own evaluations with your specific use cases.
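One way to run such an evaluation is a small harness that sends the same test cases to every candidate and scores the answers. The sketch below is only an illustration, not a full evaluation framework: `evaluate`, the test cases, and the two stub "models" are hypothetical stand-ins, and in practice each callable would wrap a real API call.

```python
from typing import Callable

# Hypothetical test cases: (prompt, expected answer) pairs drawn from your own use case.
TEST_CASES = [
    ("Standardize the country name: 'U.S.A.'", "United States"),
    ("Standardize the country name: 'Deutschland'", "Germany"),
    ("Standardize the country name: 'UK'", "United Kingdom"),
]

def evaluate(model: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases where the model's answer matches exactly."""
    correct = sum(
        model(prompt).strip().lower() == expected.lower()
        for prompt, expected in cases
    )
    return correct / len(cases)

# Stub "models" so the sketch runs offline; replace them with real API calls.
def stub_cheap_model(prompt: str) -> str:
    answers = {"'U.S.A.'": "United States", "'UK'": "United Kingdom"}
    return next((a for key, a in answers.items() if key in prompt), "unknown")

def stub_premium_model(prompt: str) -> str:
    answers = {
        "'U.S.A.'": "United States",
        "'Deutschland'": "Germany",
        "'UK'": "United Kingdom",
    }
    return next((a for key, a in answers.items() if key in prompt), "unknown")

scores = {
    "cheap": evaluate(stub_cheap_model, TEST_CASES),
    "premium": evaluate(stub_premium_model, TEST_CASES),
}
```

Exact-match scoring suits tasks with one right answer, such as data cleaning; open-ended tasks would need a different scoring rule.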

💰 Price

Pricing typically listed per million tokens

Token = chunk of text (see platform.openai.com/tokenizer)

100 tokens ≈ 75 words

Input and output tokens are priced separately

Strategy: Test cheap models first before moving to premium options
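The pricing arithmetic above can be sketched as a small cost estimator. The function names are hypothetical, and the prices are illustrative placeholders rather than quotes for any particular model. The 3:1 blended price follows the convention used in the comparison chart's price row.

```python
def words_to_tokens(words: int) -> int:
    """Rough estimate using the 100 tokens ≈ 75 words rule of thumb."""
    return round(words * 100 / 75)

def job_cost(input_tokens: int, output_tokens: int,
             input_price: float, output_price: float) -> float:
    """Total cost in dollars; prices are in $ per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

def blended_price(input_price: float, output_price: float) -> float:
    """Single $/M figure assuming 3 input tokens for every 1 output token."""
    return (3 * input_price + output_price) / 4

# Illustrative bulk job: 10,000 records, about 150 words in and 30 words out each.
records = 10_000
input_tokens = records * words_to_tokens(150)   # 200 tokens per record
output_tokens = records * words_to_tokens(30)   # 40 tokens per record

# Placeholder prices ($/M tokens), assumed for illustration only.
cost = job_cost(input_tokens, output_tokens, input_price=0.40, output_price=1.60)
blended = blended_price(0.40, 1.60)
```

With these placeholder numbers the whole job costs about $1.44, which is why trying a cheap model first is a low-risk way to start.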

⚡ Speed

Cheaper, smaller models are usually faster but less capable

For chatbots & interactive apps: Consider fast models for better UX

For async processes: Speed is less critical, focus on quality
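Speed is easy to measure directly by timing a few calls. The sketch below assumes a hypothetical `stub_model` in place of a real API call; the `median_latency` helper is likewise an illustrative name, not part of any library.

```python
import statistics
import time
from typing import Callable

def median_latency(model: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    """Median wall-clock seconds per call over several runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        model(prompt)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# Stub standing in for a real API call; the sleep simulates network and generation time.
def stub_model(prompt: str) -> str:
    time.sleep(0.01)
    return "ok"

latency = median_latency(stub_model, "Hello")
```

The median is used rather than the mean so that one slow outlier does not dominate the result.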

📝 Context Limit

Modern models have huge context limits (1M tokens is roughly the entire Bible), but fitting a document into context does not guarantee good responses.

Applications can use retrieval-augmented generation (RAG) to work around context limits by sending the model only the passages most relevant to the question.
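The idea behind RAG can be sketched in a few lines: split the long document into chunks, pick the chunks most relevant to the question, and send only those to the model. This is a toy version under stated assumptions: it ranks chunks by simple word overlap, whereas real systems typically use embedding similarity, and all function names here are hypothetical.

```python
import re

def chunk_words(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Rank chunks by how many distinct words they share with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:top_k]

# Tiny stand-in for a long document; each sentence is six words long.
document = (
    "Malaria is transmitted by Anopheles mosquitoes. "
    "Cholera spreads through contaminated drinking water. "
    "Measles is prevented by two doses."
)
question = "How does cholera spread?"

chunks = chunk_words(document, chunk_size=6)
context = retrieve(question, chunks)

# Only the retrieved chunk goes into the prompt, not the whole document.
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
```

Because only the top-ranked chunk is sent, the prompt stays small no matter how long the source document grows.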

🔓 Open vs Closed

Open-weight models:

• Often cheaper to run

• Can deploy on your infrastructure

• More control and customization

Closed models: Typically more powerful but pricier

Model Comparison Chart

Provider	OpenAI		Anthropic Claude		Google Gemini		Alibaba Qwen
Tier	⚡️ FAST, CHEAP	👨🏻‍🏫 GENIUS	⚡️ FAST, CHEAP	👨🏻‍🏫 GENIUS	⚡️ FAST, CHEAP	👨🏻‍🏫 GENIUS	⚡️ FAST, CHEAP	👨🏻‍🏫 GENIUS
Model Name	gpt 4.1 mini	o4 mini high	3.5 Haiku	4 Sonnet	2.5 Flash	2.5 Pro	Qwen3 30B A3B	Qwen3 235B A22B thinking
Performance (MMLU)	0.781	0.832	0.634	0.837	0.783	0.858	0.71	0.828
Blended Price ($/M tokens, 3:1 input:output)	$0.7	$1.93	$1.6	$6	$0.26	$3.44	$0.35	$2.63
Context Limit (M tokens)	1	0.2	0.2	0.2	1	1	0.13	0.13
Open / Closed Weights	Closed	Closed	Closed	Closed	Closed	Closed	Open	Open
Links	Analysis OpenRouter	Analysis OpenRouter	Analysis OpenRouter	Analysis OpenRouter	Analysis OpenRouter	Analysis OpenRouter	Analysis OpenRouter	Analysis OpenRouter
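The chart can also be turned into a small lookup for shortlisting candidates against your own constraints. The values below are copied from the chart above; the `shortlist` helper and the thresholds passed to it are hypothetical, chosen only to illustrate the filtering.

```python
# Rows copied from the comparison chart:
# (name, MMLU score, price in $/M tokens, context limit in M tokens, open weights)
MODELS = [
    ("gpt 4.1 mini", 0.781, 0.70, 1.0, False),
    ("o4 mini high", 0.832, 1.93, 0.2, False),
    ("Claude 3.5 Haiku", 0.634, 1.60, 0.2, False),
    ("Claude 4 Sonnet", 0.837, 6.00, 0.2, False),
    ("Gemini 2.5 Flash", 0.783, 0.26, 1.0, False),
    ("Gemini 2.5 Pro", 0.858, 3.44, 1.0, False),
    ("Qwen3 30B A3B", 0.71, 0.35, 0.13, True),
    ("Qwen3 235B A22B thinking", 0.828, 2.63, 0.13, True),
]

def shortlist(max_price: float, min_context: float = 0.0,
              open_only: bool = False) -> list[str]:
    """Return names of models meeting the constraints, highest MMLU first."""
    matches = [
        m for m in MODELS
        if m[2] <= max_price and m[3] >= min_context and (m[4] or not open_only)
    ]
    return [m[0] for m in sorted(matches, key=lambda m: m[1], reverse=True)]

# Example: cheap models that can still take very long documents.
cheap_long_context = shortlist(max_price=1.0, min_context=1.0)

# Example: open-weight models only, for self-hosting.
open_models = shortlist(max_price=5.0, open_only=True)
```

A shortlist like this is only a starting point; benchmark scores and list prices change often, so the final choice should rest on your own evaluations.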

Some Rules of Thumb

🎯 Start with Major Providers

Begin with models in the main dropdowns of public-facing UIs from major providers (OpenAI, Anthropic, Google).

Why: With 400+ models on platforms like OpenRouter, starting with the flagship models helps narrow your choices.

🤝 Stick with Familiar Ecosystems

Different providers' offerings are quite similar these days, so consider picking the company whose apps you already use.

Note: If you use Microsoft Copilot, it is likely running OpenAI's models under the hood.

🚀 Google: Best Performance & Value

As of writing, Google has some of the best-performing models on public benchmarks:

• Gemini 2.5 Pro: Top performance

• Gemini 2.5 Flash: One of the cheapest good models

Great starting point if you have no other preferences.

💡 Our Favorite: Claude 4 Sonnet

While not always top on benchmarks, it excels at:

• Programming and design tasks

• Smart responses without being too chatty

• Often lower actual usage costs than Gemini 2.5 Pro, because its responses are more concise

Note: Gemini 2.5 Pro is a thinking model, which can be verbose and therefore use more output tokens per response

⚠️ API vs Chat Interface

Important distinction: These recommendations are for API usage.

For chat interfaces, we actually pay for ChatGPT because it offers:

• Superior user interface

• Deep research capabilities

• Image generation & code execution

• Easy-to-use voice mode

🤖 Choosing the Right AI Model

Characteristics of the LLM Provider Landscape