# Supported Models
Cortensor runs a curated catalog of open-weight LLMs across two primary engines:

- **Llamafile** – quantized GGUF models (CPU-friendly, can also use GPU where available)
- **Ollama** – full GPU models (for higher-end / dedicated GPU miners)
Internally, each model is packaged as a Docker image identified as:

`cts-llm`, `cts-llm-1`, `cts-llm-2`, … `cts-llm-40`

These `cts-llm-*` IDs are Docker image tags, not the raw model names. Each image wraps one runtime engine (llamafile or Ollama) and one concrete model identifier (e.g. `deepseek-r1:8b-gpu`).

❗ This catalog will grow over time. New images will be added with higher IDs (`cts-llm-41`, `cts-llm-42`, …), so treat the tables below as versioned snapshots rather than a permanent, fixed list.
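The tag convention is regular: index 0 maps to the bare `cts-llm` tag, and every other index `n` maps to `cts-llm-n`. A minimal sketch of that convention in Python, assuming only the naming pattern above (the helper itself is illustrative, not part of any Cortensor SDK):

```python
def image_tag(index: int) -> str:
    """Map a Cortensor model index (0-40) to its cts-llm-* Docker image tag.

    Index 0 is the bare `cts-llm` image; every other index appends its
    number, e.g. 14 -> "cts-llm-14". Illustrative helper only.
    """
    if not 0 <= index <= 40:
        raise ValueError(f"unknown model index: {index}")
    return "cts-llm" if index == 0 else f"cts-llm-{index}"
```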
## Quantized Models (Llamafile)

- **Engine:** llamafile
- **Indexes:** 0–11
- **Hardware:** CPU-first, can use GPU where available
- **Purpose:** Maximum coverage across heterogeneous nodes (CPUs, smaller GPUs, edge machines) using 4–8 bit quantization (a rough sizing sketch follows the next list).
These images are ideal for:

- Nodes without a dedicated GPU
- Mixed-resource environments (small VPS, home PCs, edge hardware)
- High fan-out / broad geographic coverage
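Why quantization widens coverage comes down to arithmetic: a GGUF model's weight footprint is roughly parameter count × bits per weight / 8. The numbers below are a generic rule of thumb, not Cortensor-published requirements, and they ignore KV-cache and runtime overhead:

```python
def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough GGUF weight footprint in GB: params * bits / 8 bytes.

    Ignores KV cache, context buffers, and quantization block overhead,
    so real memory usage runs somewhat higher. Rule of thumb only.
    """
    return params_billion * bits_per_weight / 8

# An 8B model at Q4 needs very roughly 4 GB for weights alone,
# which is why it can run on CPU-only nodes and small VPSes:
print(approx_weight_gb(8, 4))   # ~4.0
print(approx_weight_gb(32, 4))  # ~16.0 (Qwen QwQ 32B Q4)
```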
### Model Families Covered

The current llamafile set includes quantized versions of:

**Vision / Multimodal**

- `llava-v1.5-7b-q4` (used as `default-gpu` on many nodes)
**Reasoning & General LLMs**

- DeepSeek R1 Distill Llama 8B (Q4)
- Meta Llama 3.1 8B Instruct (Q4)
- Mistral 7B Instruct v0.3 (Q4)
- Granite 3.2 8B Instruct (Q4)
- Qwen2.5 7B Instruct 1M (Q4)
**High-Capacity & Coder Variants**

- Qwen QwQ 32B (Q4)
- Qwen3 4B (Q8)
- Gemma 3 4B / 12B IT (Q6 / Q4)
- DeepSeek R1 Distill Qwen 14B (Q4)
- Qwen2.5 Coder 14B Instruct (Q4)
These quantized models prioritize:

- Lower memory footprint
- Broad participation from smaller nodes
- “Good enough” quality for many general-purpose workloads and background jobs
## Full GPU Models (Ollama)

- **Engine:** ollama
- **Indexes:** 12–40
- **Hardware:** GPU nodes (NVIDIA strongly preferred)
- **Purpose:** High-quality, low-latency inference for more demanding workloads and higher-SLA routes.
These are full GPU models (not llamafile-style Q4/Q6 GGUFs). They are meant for:

- Dedicated or strong GPU miners
- Latency-sensitive user requests
- High-quality routing tiers
### Model Families Covered

The current Ollama set spans several major model families:

- **Gemma 3** – 270M, 4B, 12B, 27B
- **DeepSeek R1** – 8B, 14B, 70B
- **GPT-OSS** – 20B, 120B
- **Granite** – Granite 4 3B
- **Ministral / Mistral** – Ministral 3 8B / 14B, Mistral 7B
- **Qwen 3 & Qwen 3 VL** – Qwen3 4B / 8B, Qwen3 VL 4B / 8B
- **Phi** – Phi-3 3.8B / 14B, Phi-4 14B
- **Llama 3.x** – Llama 3.2 1B / 3B, Llama 3.1 8B / 70B
- **Others** – Dolphin 3 8B, TinyLLaMA 1.1B, Falcon 3 3B / 7B / 10B
These are best suited to:

- Nodes with enough VRAM to serve these models efficiently (see the fit check after this list)
- High-SLA task classes
- Scenarios where quality/latency > footprint
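A quick way to reason about "enough VRAM" is to size against the weights. The sketch below uses FP16 (about 2 bytes per parameter) as a conservative upper bound; the actual footprint depends on how each image packages its model (precision, context length, batching), so treat this as a rule of thumb rather than a Cortensor requirement:

```python
def fits_in_vram(params_billion: float, vram_gb: float,
                 bytes_per_param: float = 2.0, headroom: float = 0.8) -> bool:
    """Conservative fit check: FP16-sized weights (~2 bytes/param)
    against usable VRAM, reserving ~20% for KV cache and runtime.

    Rule-of-thumb sizing only; real requirements vary per image.
    """
    weights_gb = params_billion * bytes_per_param
    return weights_gb <= vram_gb * headroom

# A 24 GB card comfortably serves Gemma 3 4B, but DeepSeek R1 70B
# needs a far larger (or multi-) GPU setup:
print(fits_in_vram(4, 24))   # True  (~8 GB of weights)
print(fits_in_vram(70, 24))  # False (~140 GB of weights)
```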
## How the Mapping Works

At the node / Docker level, you mostly deal with:

- `cts-llm-*` → Docker image ID
- Each image contains:
  - The engine (`llamafile` or `ollama`)
  - The model identifier (e.g. `deepseek-r1:8b-gpu`, `Qwen2.5-7B-Instruct-1M-Q4_K_M`)

At the network / session level, Cortensor can:

- Route tasks by model index (0–40)
- Or use more descriptive labels via config/SDKs where needed

As we add more models, we’ll extend the table with new indices and images; this section stays as the canonical mapping. A small sketch of the index → image → model lookup follows below.
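In code, that mapping is naturally a lookup table keyed by model index. The entries below are copied from the appendix tables; the `ModelEntry` structure and `resolve` function are illustrative sketches, not a published Cortensor SDK API:

```python
from typing import NamedTuple

class ModelEntry(NamedTuple):
    image: str   # cts-llm-* Docker image tag
    engine: str  # "llamafile" or "ollama"
    model: str   # model identifier inside the image

# A few entries from the appendix; the full catalog spans indices 0-40.
CATALOG: dict[int, ModelEntry] = {
    1:  ModelEntry("cts-llm-1",  "llamafile", "DeepSeek-R1-Distill-Llama-8B-Q4_K_M"),
    6:  ModelEntry("cts-llm-6",  "llamafile", "Qwen2.5-7B-Instruct-1M-Q4_K_M"),
    14: ModelEntry("cts-llm-14", "ollama",    "deepseek-r1:8b-gpu"),
    22: ModelEntry("cts-llm-22", "ollama",    "gemma3:27b-gpu"),
}

def resolve(index: int) -> ModelEntry:
    """Resolve a network-level model index to its image/engine/model triple."""
    try:
        return CATALOG[index]
    except KeyError:
        raise ValueError(f"no model registered at index {index}") from None

print(resolve(14))
# ModelEntry(image='cts-llm-14', engine='ollama', model='deepseek-r1:8b-gpu')
```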
## Appendix: Full Model Mapping

Below are the current model mappings used by Cortensor.

🧩 Columns:

- **Index** – numeric model index used in configs / selection
- **Docker Image ID** – the `cts-llm-*` tag
- **Engine** – `llamafile` or `ollama`
- **Model Identifier** – actual model/runtime name inside the image
- **Model Family / Size** – human-readable description
- **Quantized?** – whether it’s a quantized GGUF (for llamafile) or full GPU model
### Quantized Models – Llamafile (0–11)

| Index | Docker Image ID | Engine | Model Identifier | Model Family / Size | Quantized? |
| --- | --- | --- | --- | --- | --- |
| 0 | `cts-llm` | llamafile | `default` / `default-gpu` (llamafile) | LLaVA v1.5 7B Q4 | Yes – 4-bit |
| 1 | `cts-llm-1` | llamafile | `DeepSeek-R1-Distill-Llama-8B-Q4_K_M` | DeepSeek R1 Distill Llama 8B | Yes – 4-bit |
| 2 | `cts-llm-2` | llamafile | `Meta-Llama-3.1-8B-Instruct.Q4_K_M` | Llama 3.1 8B Instruct | Yes – 4-bit |
| 3 | `cts-llm-3` | llamafile | `Qwen_QwQ-32B-Q4_K_M` | Qwen QwQ 32B | Yes – 4-bit |
| 4 | `cts-llm-4` | llamafile | `Mistral-7B-Instruct-v0.3.Q4_0` | Mistral 7B Instruct v0.3 | Yes – 4-bit |
| 5 | `cts-llm-5` | llamafile | `granite-3.2-8b-instruct-Q4_K_M` | Granite 3.2 8B Instruct | Yes – 4-bit |
| 6 | `cts-llm-6` | llamafile | `Qwen2.5-7B-Instruct-1M-Q4_K_M` | Qwen2.5 7B Instruct 1M | Yes – 4-bit |
| 7 | `cts-llm-7` | llamafile | `google_gemma-3-4b-it-Q6_K` | Gemma 3 4B IT | Yes – 6-bit |
| 8 | `cts-llm-8` | llamafile | `Qwen_Qwen3-4B-Q8_0` | Qwen3 4B | Yes – 8-bit |
| 9 | `cts-llm-9` | llamafile | `google_gemma-3-12b-it-Q4_K_M` | Gemma 3 12B IT | Yes – 4-bit |
| 10 | `cts-llm-10` | llamafile | `DeepSeek-R1-Distill-Qwen-14B-Q4_K_M` | DeepSeek R1 Distill Qwen 14B | Yes – 4-bit |
| 11 | `cts-llm-11` | llamafile | `Qwen2.5-Coder-14B-Instruct-Q4_K_M` | Qwen2.5 Coder 14B Instruct | Yes – 4-bit |
### Full GPU Models – Ollama (12–40)

| Index | Docker Image ID | Engine | Model Identifier | Model Family / Size | Quantized? |
| --- | --- | --- | --- | --- | --- |
| 12 | `cts-llm-12` | ollama | `gemma3:270m-gpu` | Gemma 3 270M | No – full GPU |
| 13 | `cts-llm-13` | ollama | `gpt-oss:20b-gpu` | GPT-OSS 20B | No – full GPU |
| 14 | `cts-llm-14` | ollama | `deepseek-r1:8b-gpu` | DeepSeek R1 8B | No – full GPU |
| 15 | `cts-llm-15` | ollama | `granite4:3b-gpu` | Granite 4 3B | No – full GPU |
| 16 | `cts-llm-16` | ollama | `gpt-oss:120b-gpu` | GPT-OSS 120B | No – full GPU |
| 17 | `cts-llm-17` | ollama | `deepseek-r1:70b-gpu` | DeepSeek R1 70B | No – full GPU |
| 18 | `cts-llm-18` | ollama | `gemma3:4b-gpu` | Gemma 3 4B | No – full GPU |
| 19 | `cts-llm-19` | ollama | `gemma3:12b-gpu` | Gemma 3 12B | No – full GPU |
| 20 | `cts-llm-20` | ollama | `ministral-3:8b-gpu` | Ministral 3 8B | No – full GPU |
| 21 | `cts-llm-21` | ollama | `deepseek-r1:14b-gpu` | DeepSeek R1 14B | No – full GPU |
| 22 | `cts-llm-22` | ollama | `gemma3:27b-gpu` | Gemma 3 27B | No – full GPU |
| 23 | `cts-llm-23` | ollama | `qwen3-vl:4b-gpu` | Qwen3 VL 4B | No – full GPU (vision-capable) |
| 24 | `cts-llm-24` | ollama | `qwen3-vl:8b-gpu` | Qwen3 VL 8B | No – full GPU (vision-capable) |
| 25 | `cts-llm-25` | ollama | `ministral-3:14b-gpu` | Ministral 3 14B | No – full GPU |
| 26 | `cts-llm-26` | ollama | `qwen3:4b-gpu` | Qwen3 4B | No – full GPU |
| 27 | `cts-llm-27` | ollama | `qwen3:8b-gpu` | Qwen3 8B | No – full GPU |
| 28 | `cts-llm-28` | ollama | `mistral:7b-gpu` | Mistral 7B | No – full GPU |
| 29 | `cts-llm-29` | ollama | `phi3:3.8b-gpu` | Phi-3 3.8B | No – full GPU |
| 30 | `cts-llm-30` | ollama | `phi3:14b-gpu` | Phi-3 14B | No – full GPU |
| 31 | `cts-llm-31` | ollama | `llama3:1b-gpu` | Llama 3.2 1B | No – full GPU |
| 32 | `cts-llm-32` | ollama | `llama3:3b-gpu` | Llama 3.2 3B | No – full GPU |
| 33 | `cts-llm-33` | ollama | `llama3:8b-gpu` | Llama 3.1 8B | No – full GPU |
| 34 | `cts-llm-34` | ollama | `llama3:70b-gpu` | Llama 3.1 70B | No – full GPU |
| 35 | `cts-llm-35` | ollama | `phi4:14b-gpu` | Phi-4 14B | No – full GPU |
| 36 | `cts-llm-36` | ollama | `dolphin3:8b-gpu` | Dolphin 3 8B | No – full GPU |
| 37 | `cts-llm-37` | ollama | `tinyllama1.1b-gpu` | TinyLLaMA 1.1B | No – full GPU |
| 38 | `cts-llm-38` | ollama | `falcon3:3b-gpu` | Falcon 3 3B | No – full GPU |
| 39 | `cts-llm-39` | ollama | `falcon3:7b-gpu` | Falcon 3 7B | No – full GPU |
| 40 | `cts-llm-40` | ollama | `falcon3:10b-gpu` | Falcon 3 10B | No – full GPU |