What is the cheapest LLM for RAG (Retrieval Augmented Generation)?

The cheapest LLM suitable for RAG (Retrieval Augmented Generation) is Japanese Stable Diffusion XL via Fireworks AI at $0.0001 per million tokens (blended rate). There are 200 models in this category.

Should I use an open-source LLM for RAG (Retrieval Augmented Generation)?

There are 18 open-weight models available for RAG (Retrieval Augmented Generation). Open-weight LLMs like Llama and Mistral can be self-hosted for lower per-token costs at scale, but require infrastructure investment. For lower volumes, API-based providers often offer better value.
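The trade-off between self-hosting and paying per token can be framed as a break-even volume. The following is a minimal sketch under stated assumptions: the function names and every number in the usage example are hypothetical, and self-hosting is modeled as a fixed monthly infrastructure cost plus an optional marginal per-token cost, which simplifies real deployments considerably.

```python
from typing import Optional


def monthly_api_cost(tokens_per_month: float, price_per_million: float) -> float:
    """Cost of serving a month's tokens through a pay-per-token API."""
    return tokens_per_month / 1_000_000 * price_per_million


def monthly_self_hosted_cost(
    tokens_per_month: float,
    fixed_monthly_cost: float,
    marginal_price_per_million: float = 0.0,
) -> float:
    """Fixed infrastructure cost plus a (possibly zero) marginal per-token cost."""
    return fixed_monthly_cost + tokens_per_month / 1_000_000 * marginal_price_per_million


def break_even_tokens(
    api_price_per_million: float,
    fixed_monthly_cost: float,
    marginal_price_per_million: float = 0.0,
) -> Optional[float]:
    """Monthly token volume above which self-hosting becomes cheaper.

    Returns None when the API is never more expensive per token,
    in which case self-hosting cannot pay off under this model.
    """
    saving_per_million = api_price_per_million - marginal_price_per_million
    if saving_per_million <= 0:
        return None
    return fixed_monthly_cost / saving_per_million * 1_000_000


# Hypothetical figures: $0.25 per million via API, $500/month of hardware,
# $0.05 per million marginal cost (power, ops) when self-hosting.
volume = break_even_tokens(0.25, 500.0, 0.05)
print(f"Break-even at roughly {volume:,.0f} tokens per month")
```

Below the break-even volume the API is cheaper; above it, the fixed cost is amortized and self-hosting wins, which matches the guidance above that API providers tend to offer better value at lower volumes.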

10 Best LLMs for RAG in 2026

The 10 best LLMs for RAG pipelines in 2026: chat models ranked by context window, grounding accuracy, and price for Retrieval Augmented Generation across 23+ providers.

Models compared: 200 | Cheapest: $0.0001 per million tokens | Top ELO: 1140

What is the best LLM for RAG (Retrieval Augmented Generation)?

The best LLM for RAG (Retrieval Augmented Generation) by value is Qwen3 Embedding 0 6B Batch via Deepinfra, at a blended rate of $0.0037 per million tokens ($0.0050 input, free output). With an Arena ELO of 1140, it offers the best balance of quality and cost among the 200 models compared across all providers.
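The page quotes a single "blended" per-million price alongside separate input and output prices, but does not state the formula. The sketch below assumes a 3:1 input-to-output token weighting; that ratio and the function name are assumptions, not something the page confirms. Under that weighting, the $0.0050 input and free output listed in the table for Qwen3 Embedding 0 6B Batch blend to about $0.00375, in line with the $0.0037 quoted above.

```python
def blended_price(
    input_price: float,
    output_price: float,
    input_weight: float = 3.0,   # assumed share of input tokens
    output_weight: float = 1.0,  # assumed share of output tokens
) -> float:
    """Weighted average of input and output prices, in $ per million tokens."""
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_price * input_weight + output_price * output_weight) / total_weight


# Table values for Qwen3 Embedding 0 6B Batch: $0.0050 input, free output.
print(f"${blended_price(0.0050, 0.0):.5f} per million tokens")
```

RAG workloads are usually input-heavy because each query carries retrieved passages, so a weighting that favors input tokens is a reasonable default; adjust the weights to match your own traffic.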

What Makes a Good LLM for RAG?

Large context window for retrieved passages

Faithful grounding in provided context

Low hallucination rate

Cost-effective for high query volumes
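The first criterion, a context window large enough for retrieved passages, can be made concrete with a small packing routine. This is a minimal sketch, assuming passages arrive with retrieval scores; `estimate_tokens` is a crude whitespace-based stand-in for a real tokenizer, and all names here are hypothetical.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word.

    A real tokenizer will give different counts; this is only a placeholder.
    """
    return len(text.split())


def pack_passages(
    passages: List[Tuple[float, str]],
    context_window: int,
    reserved_tokens: int,
) -> Tuple[List[str], int]:
    """Greedily keep the highest-scoring passages that fit the budget.

    passages: (retrieval_score, text) pairs.
    reserved_tokens: room held back for the prompt, question, and answer.
    Returns the selected passage texts and the tokens they use.
    """
    budget = context_window - reserved_tokens
    selected: List[str] = []
    used = 0
    for _score, text in sorted(passages, key=lambda p: p[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # too large for the remaining budget; try smaller passages
        selected.append(text)
        used += cost
    return selected, used


docs = [(0.9, "a b c"), (0.8, "d e f g h"), (0.5, "i j")]
chosen, tokens_used = pack_passages(docs, context_window=10, reserved_tokens=4)
print(chosen, tokens_used)
```

A larger context window raises the budget and lets more passages through, but every packed token is billed as input, which is where the cost criterion comes in.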

RAG Models — Ranked by Value

200 models

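The # column gives each model's value rank: models with higher Arena ELO and lower price rank first, and models without ELO scores are ranked by price, cheapest first. Rows are listed by release date, newest first. Input and output prices are in USD per million tokens.

#	Model	Provider	Input	Output	Quality	Released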
177	Allenai Olmocr 2 7B 1025	Deepinfra	$0.090	$0.190	—	Oct 25
200	Zai Org GLM 4.7 Flash	Deepinfra	$0.060	$0.400	—	Sep 1
53	GPT 5 Nano	Azure OpenAI	$0.050	$0.400	Strong	Aug 7
43	Qwen: Qwen3 Coder 30B A3B Instruct 30B (Open)	Deepinfra	$0.070	$0.260	Good	Jul 14
97	Voxtral Mini 3B 2507	AWS Bedrock	$0.040	$0.040	—	Jul 1
89	Paddlepaddle Paddleocr VL	Novita AI	$0.020	$0.020	—	Jun 30
180	Baidu: ERNIE 4.5 21B A3B 21B	Novita AI	$0.070	$0.280	—	Jun 30
181	Baidu: ERNIE 4.5 21B A3B Thinking 21B	Novita AI	$0.070	$0.280	—	Jun 30
117	Baichuan Baichuan M2 32B	Novita AI	$0.070	$0.070	—	Jun 1
157	Minimax M1 80K	Fireworks AI	$0.100	$0.100	—	Jun 1
21	DeepSeek R1 0528 Qwen3 8B	Novita AI	$0.060	$0.090	Frontier	May 28
108	OpenAI: gpt-oss-20b 20B	Deepinfra	$0.030	$0.140	—	May 1
122	OpenAI: gpt-oss-120b 120B	Deepinfra	$0.039	$0.190	—	May 1
132	OpenAI: gpt-oss-120b (exacto) 120B	OpenRouter	$0.047	$0.228	—	May 1
171	OpenAI: gpt-oss-safeguard-20b 20B	AWS Bedrock	$0.070	$0.200	—	May 1
194	GPT Oss 20B 1 0	AWS Bedrock	$0.070	$0.300	—	May 1
1	Qwen3 Embedding 0 6B Batch	Deepinfra	$0.0050	Free	Good	Apr 28
3	Qwen3 Reranker 0 6B	Deepinfra	$0.010	Free	Good	Apr 28
4	Qwen3 Embedding 4B Batch	Deepinfra	$0.010	Free	Good	Apr 28
5	Qwen3 Embedding 8B	Deepinfra	$0.010	Free	Good	Apr 28
6	Qwen3 Embedding 0 6B	Deepinfra	$0.010	Free	Good	Apr 28
7	Qwen3 Embedding 4B	Deepinfra	$0.020	Free	Good	Apr 28
8	Qwen3 Reranker 4B	Deepinfra	$0.025	Free	Good	Apr 28
14	Qwen3 4B Fp8	Novita AI	$0.030	$0.030	Good	Apr 28
15	Qwen3 Embedding 8B Batch	Deepinfra	$0.040	Free	Good	Apr 28
17	Qwen3 Reranker 8B	Deepinfra	$0.050	Free	Good	Apr 28
26	Qwen3 8B Fp8	Novita AI	$0.035	$0.138	Good	Apr 28
29	Qwen3 235B A22b Instruct 2507	Deepinfra	$0.071	$0.100	Good	Apr 28
34	Qwen: Qwen3 235B A22B Instruct 2507 235B (Open)	OpenRouter	$0.085	$0.120	Good	Apr 28
36	Qwen3 1.7b Fp8 Draft	Fireworks AI	$0.100	$0.100	Good	Apr 28
37	Qwen3 1.7b	Fireworks AI	$0.100	$0.100	Good	Apr 28
38	Qwen3 1.7b Fp8 Draft 40960	Fireworks AI	$0.100	$0.100	Good	Apr 28
39	Qwen3 1.7b Fp8 Draft 131072	Fireworks AI	$0.100	$0.100	Good	Apr 28
40	Qwen3 0.6b	Fireworks AI	$0.100	$0.100	Good	Apr 28
52	Qwen: Qwen3 14B 14B (Open)	OpenRouter	$0.072	$0.288	Good	Apr 28
54	Qwen: Qwen3 30B A3B 30B (Open)	Deepinfra	$0.080	$0.280	Good	Apr 28
55	Qwen: Qwen3 32B 32B (Open)	Deepinfra	$0.080	$0.280	Good	Apr 28
130	qwen-turbo	Alibaba Cloud	$0.050	$0.200	—	Apr 28
44	Llama 4 Scout 17B 16e Instruct	Deepinfra	$0.080	$0.300	Frontier	Apr 5
144	Cogito V1 Preview Llama 3B	Fireworks AI	$0.100	$0.100	—	Apr 1
172	AllenAI: Olmo 2 32B Instruct 32B	OpenRouter	$0.060	$0.240	—	Mar 25
91	Google: Gemma 3n 4B 4B	Together AI	$0.020	$0.040	—	Mar 12
113	Phi 4 Multimodal Instruct	Deepinfra	$0.050	$0.100	—	Feb 26
196	Phi-4-mini-instruct	Azure OpenAI	$0.075	$0.300	—	Feb 26
199	Phi-4-mini-reasoning	Azure OpenAI	$0.080	$0.320	—	Feb 26
49	Gemini 2.0 Flash Lite Preview 02 05	Google	$0.075	$0.300	Strong	Feb 5
50	Gemini 2.0 Flash Lite 001	Google	$0.075	$0.300	Strong	Feb 5
51	Gemini 2.0 Flash Lite	Google	$0.075	$0.300	Strong	Feb 5
123	Command R7b 12 2024 7B	OpenRouter	$0.045	$0.180	—	Feb 3
31	DeepSeek R1 Distill Qwen 1.5b	Fireworks AI	$0.100	$0.100	Frontier	Jan 20
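The value ranking used for the # column is described only in words, so its exact formula is unknown. One way to sketch it, as an assumption rather than the site's actual method: score rated models by ELO points per dollar, list them best first, then append unrated models ordered by price. The `Model` class, the price floor, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_PRICE = 0.0001  # assumed floor so free models do not divide by zero


@dataclass
class Model:
    name: str
    price: float               # $ per million tokens
    elo: Optional[int] = None  # None when the model has no Arena ELO


def value_score(model: Model) -> float:
    """ELO points per dollar; only meaningful for rated models."""
    assert model.elo is not None
    return model.elo / max(model.price, MIN_PRICE)


def rank_by_value(models: List[Model]) -> List[Model]:
    """Rated models first (best value first), then unrated models by price."""
    rated = [m for m in models if m.elo is not None]
    unrated = [m for m in models if m.elo is None]
    rated.sort(key=value_score, reverse=True)
    unrated.sort(key=lambda m: m.price)
    return rated + unrated


# Hypothetical entries, not rows from the table.
sample = [
    Model("model-a", 0.10, 1100),
    Model("model-b", 0.05, 1100),
    Model("model-c", 0.01),
    Model("model-d", 0.20, 1140),
    Model("model-e", 0.005),
]
for rank, m in enumerate(rank_by_value(sample), start=1):
    print(rank, m.name)
```

A ratio score like this rewards cheap models heavily, which is consistent with low-priced embedding and reranker models holding the top ranks in the table, but other scoring rules could produce the same pattern.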

Frequently Asked Questions

The #1 LLM for RAG (Retrieval Augmented Generation) in 2026 is Qwen3 Embedding 0 6B Batch via Deepinfra, at a blended rate of $0.0037 per million tokens. It has an Arena ELO of 1140, placing it among the highest-rated models. This top-10 ranking weighs both quality (Arena ELO) and cost to find the best value across 23+ providers.