Supported Models

Cortensor runs a curated catalog of open-weight LLMs across two primary engines:

Llamafile – quantized GGUF models (CPU-friendly, can also use GPU where available)
Ollama – full GPU models (for higher-end / dedicated GPU miners)

Internally, each model is packaged as a Docker image identified as:

cts-llm, cts-llm-1, cts-llm-2, … cts-llm-40

These cts-llm-* IDs are Docker image tags, not the raw model names. Each image wraps one runtime engine (llamafile or Ollama) and one concrete model identifier (e.g. deepseek-r1:8b-gpu).

❗ This catalog will grow over time. New images will be added with higher IDs (cts-llm-41, cts-llm-42, …), so treat the tables below as versioned snapshots rather than a permanent, fixed list.
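The ID sequence above suggests a simple rule for moving between a numeric model index and its Docker image tag. The sketch below assumes that index 0 maps to the bare tag cts-llm and index N maps to cts-llm-N. The helper names image_tag and model_index are hypothetical and not part of any Cortensor SDK. No upper bound is enforced, since new images are expected to arrive with higher IDs.

```python
import re

# Assumption: index 0 is the bare tag "cts-llm"; index N >= 1 is "cts-llm-N".
_TAG_PATTERN = re.compile(r"^cts-llm(?:-([1-9]\d*))?$")


def image_tag(index: int) -> str:
    """Return the Docker image tag for a model index (hypothetical helper)."""
    if index < 0:
        raise ValueError(f"model index must be non-negative, got {index}")
    return "cts-llm" if index == 0 else f"cts-llm-{index}"


def model_index(tag: str) -> int:
    """Return the model index encoded in a cts-llm-* tag (hypothetical helper)."""
    match = _TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a cts-llm image tag: {tag!r}")
    # The bare tag has no numeric suffix and stands for index 0.
    return int(match.group(1)) if match.group(1) else 0
```

For example, image_tag(14) yields cts-llm-14, and model_index("cts-llm") yields 0.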

Quantized Models (Llamafile)

Engine: llamafile
Indexes: 0–11
Hardware: CPU-first, can use GPU where available
Purpose: Maximum coverage across heterogeneous nodes (CPUs, smaller GPUs, edge machines) using 4–8 bit quantization.

These images are ideal for:

Nodes without a dedicated GPU
Mixed-resource environments (small VPS, home PCs, edge hardware)
High fan-out / broad geographic coverage

Model Families Covered

The current llamafile set includes quantized versions of:

Vision / Multimodal
- llava-v1.5-7b-q4 (used as default-gpu on many nodes)
Reasoning & General LLMs
- DeepSeek R1 Distill Llama 8B (Q4)
- Meta Llama 3.1 8B Instruct (Q4)
- Mistral 7B Instruct v0.3 (Q4)
- Granite 3.2 8B Instruct (Q4)
- Qwen2.5 7B Instruct 1M (Q4)
High-Capacity & Coder Variants
- Qwen QwQ 32B (Q4)
- Qwen3 4B (Q8)
- Gemma 3 4B / 12B IT (Q6 / Q4)
- DeepSeek R1 Distill Qwen 14B (Q4)
- Qwen2.5 Coder 14B Instruct (Q4)

These quantized models prioritize:

Lower memory footprint
Broad participation from smaller nodes
“Good enough” quality for many general-purpose workloads and background jobs

Full GPU Models (Ollama)

Engine: ollama
Indexes: 12–40
Hardware: GPU nodes (NVIDIA strongly preferred)
Purpose: High-quality, low-latency inference for more demanding workloads and higher-SLA routes.

These are full GPU models (not llamafile-style Q4/Q6/Q8 GGUFs). They are meant for:

Dedicated or strong GPU miners
Latency-sensitive user requests
High-quality routing tiers

Model Families Covered

The current Ollama set spans several major model families:

Gemma 3
- 270M, 4B, 12B, 27B
DeepSeek R1
- 8B, 14B, 70B
GPT-OSS
- 20B, 120B
Granite
- Granite 4 3B
Ministral / Mistral
- Ministral 3 8B / 14B
- Mistral 7B
Qwen 3 & Qwen 3 VL
- Qwen3 4B / 8B
- Qwen3 VL 4B / 8B
Phi
- Phi-3 3.8B / 14B
- Phi-4 14B
Llama 3.x
- Llama 3.2 1B / 3B
- Llama 3.1 8B / 70B
Others
- Dolphin 3 8B
- TinyLLaMA 1.1B
- Falcon 3 3B / 7B / 10B

These are best suited to:

Nodes with enough VRAM to serve these models efficiently
High-SLA task classes
Scenarios where quality and latency matter more than memory footprint
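The split between the two engines can be expressed as a pair of index ranges. The sketch below takes the ranges 0–11 and 12–40 from this snapshot of the catalog. The function names and the node-selection policy in candidate_indexes are assumptions made for illustration, not documented Cortensor behavior.

```python
from typing import List

# Index ranges from this snapshot; later images with higher IDs would
# require these bounds to be updated.
LLAMAFILE_INDEXES = range(0, 12)   # 0-11
OLLAMA_INDEXES = range(12, 41)     # 12-40


def engine_for_index(index: int) -> str:
    """Return the runtime engine that serves a given model index."""
    if index in LLAMAFILE_INDEXES:
        return "llamafile"
    if index in OLLAMA_INDEXES:
        return "ollama"
    raise KeyError(f"model index {index} is not in this catalog snapshot")


def candidate_indexes(has_gpu: bool) -> List[int]:
    """Indexes a node might serve (hypothetical policy).

    Assumption: CPU-only nodes stay on the llamafile range, while GPU nodes
    may serve both ranges, since llamafile can also use a GPU where available.
    """
    indexes = list(LLAMAFILE_INDEXES)
    if has_gpu:
        indexes += list(OLLAMA_INDEXES)
    return indexes
```

A real node would also need to check VRAM against model size before accepting an Ollama index; that check is left out because this page does not list memory requirements.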

How the Mapping Works

At the node / Docker level, you mostly deal with:

cts-llm-* → Docker image ID
Each image contains:
- The engine (llamafile or ollama)
- The model identifier (e.g. deepseek-r1:8b-gpu, Qwen2.5-7B-Instruct-1M-Q4_K_M)

At the network / session level, Cortensor can:

Route tasks by model index (0–40)
Or use more descriptive labels via config/SDKs where needed

As we add more models, we’ll extend the table with new indices and images; this section stays as the canonical mapping.
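To illustrate the two routing styles, the sketch below resolves either a numeric index or a descriptive label to the same catalog entry. It contains only a few rows copied from the mapping in this document. Using the lower-cased model identifier as the label is an assumption, and ModelEntry and resolve are hypothetical names rather than part of a Cortensor SDK.

```python
from typing import NamedTuple, Union


class ModelEntry(NamedTuple):
    index: int
    image: str
    engine: str
    identifier: str


# A few rows copied from the mapping in this document; not the full catalog.
CATALOG = [
    ModelEntry(1, "cts-llm-1", "llamafile", "DeepSeek-R1-Distill-Llama-8B-Q4_K_M"),
    ModelEntry(6, "cts-llm-6", "llamafile", "Qwen2.5-7B-Instruct-1M-Q4_K_M"),
    ModelEntry(14, "cts-llm-14", "ollama", "deepseek-r1:8b-gpu"),
    ModelEntry(22, "cts-llm-22", "ollama", "gemma3:27b-gpu"),
]

BY_INDEX = {entry.index: entry for entry in CATALOG}
# Assumption: the descriptive label is the model identifier, case-insensitive.
BY_LABEL = {entry.identifier.lower(): entry for entry in CATALOG}


def resolve(selector: Union[int, str]) -> ModelEntry:
    """Look up a catalog entry by model index or by descriptive label."""
    if isinstance(selector, int):
        return BY_INDEX[selector]
    return BY_LABEL[selector.lower()]
```

Both resolve(14) and resolve("deepseek-r1:8b-gpu") return the entry whose image is cts-llm-14. Keeping a single list as the source of truth means a new image only needs one added row.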

Appendix: Full Model Mapping

Below are the current model mappings used by Cortensor.

🧩 Columns
Index – numeric model index used in configs / selection
Docker Image ID – the cts-llm-* tag
Engine – llamafile or ollama
Model Identifier – actual model/runtime name inside the image
Model Family / Size – human-readable description
Quantized? – whether it’s a quantized GGUF (for llamafile) or full GPU model

Quantized Models – Llamafile (`0–11`)

| Index | Docker Image ID | Engine | Model Identifier | Model Family / Size | Quantized? |
| --- | --- | --- | --- | --- | --- |
| 0 | cts-llm | llamafile | default / default-gpu (llamafile) | LLaVA v1.5 7B Q4 | Yes – 4-bit |
| 1 | cts-llm-1 | llamafile | DeepSeek-R1-Distill-Llama-8B-Q4_K_M | DeepSeek R1 Distill Llama 8B | Yes – 4-bit |
| 2 | cts-llm-2 | llamafile | Meta-Llama-3.1-8B-Instruct.Q4_K_M | Llama 3.1 8B Instruct | Yes – 4-bit |
| 3 | cts-llm-3 | llamafile | Qwen_QwQ-32B-Q4_K_M | Qwen QwQ 32B | Yes – 4-bit |
| 4 | cts-llm-4 | llamafile | Mistral-7B-Instruct-v0.3.Q4_0 | Mistral 7B Instruct v0.3 | Yes – 4-bit |
| 5 | cts-llm-5 | llamafile | granite-3.2-8b-instruct-Q4_K_M | Granite 3.2 8B Instruct | Yes – 4-bit |
| 6 | cts-llm-6 | llamafile | Qwen2.5-7B-Instruct-1M-Q4_K_M | Qwen2.5 7B Instruct 1M | Yes – 4-bit |
| 7 | cts-llm-7 | llamafile | google_gemma-3-4b-it-Q6_K | Gemma 3 4B IT | Yes – 6-bit |
| 8 | cts-llm-8 | llamafile | Qwen_Qwen3-4B-Q8_0 | Qwen3 4B | Yes – 8-bit |
| 9 | cts-llm-9 | llamafile | google_gemma-3-12b-it-Q4_K_M | Gemma 3 12B IT | Yes – 4-bit |
| 10 | cts-llm-10 | llamafile | DeepSeek-R1-Distill-Qwen-14B-Q4_K_M | DeepSeek R1 Distill Qwen 14B | Yes – 4-bit |
| 11 | cts-llm-11 | llamafile | Qwen2.5-Coder-14B-Instruct-Q4_K_M | Qwen2.5 Coder 14B Instruct | Yes – 4-bit |
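The llamafile model identifiers above end in a GGUF-style quantization suffix (Q4_K_M, Q4_0, Q6_K, Q8_0), so the bit width in the Quantized? column can be read from the name. The sketch below assumes the suffix always follows a hyphen or a dot, as it does in this snapshot. quant_bits is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches a trailing quantization suffix such as -Q4_K_M, .Q4_0, -Q6_K, -Q8_0.
_QUANT_SUFFIX = re.compile(r"[-.]Q(\d+)_[A-Z0-9_]+$")


def quant_bits(identifier: str) -> Optional[int]:
    """Return the quantization bit width encoded in an identifier, if any."""
    match = _QUANT_SUFFIX.search(identifier)
    return int(match.group(1)) if match else None
```

Identifiers without such a suffix, for example default, return None rather than a guessed bit width.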

Full GPU Models – Ollama (`12–40`)

| Index | Docker Image ID | Engine | Model Identifier | Model Family / Size | Quantized? |
| --- | --- | --- | --- | --- | --- |
| 12 | cts-llm-12 | ollama | gemma3:270m-gpu | Gemma 3 270M | No – full GPU |
| 13 | cts-llm-13 | ollama | gpt-oss:20b-gpu | GPT-OSS 20B | No – full GPU |
| 14 | cts-llm-14 | ollama | deepseek-r1:8b-gpu | DeepSeek R1 8B | No – full GPU |
| 15 | cts-llm-15 | ollama | granite4:3b-gpu | Granite 4 3B | No – full GPU |
| 16 | cts-llm-16 | ollama | gpt-oss:120b-gpu | GPT-OSS 120B | No – full GPU |
| 17 | cts-llm-17 | ollama | deepseek-r1:70b-gpu | DeepSeek R1 70B | No – full GPU |
| 18 | cts-llm-18 | ollama | gemma3:4b-gpu | Gemma 3 4B | No – full GPU |
| 19 | cts-llm-19 | ollama | gemma3:12b-gpu | Gemma 3 12B | No – full GPU |
| 20 | cts-llm-20 | ollama | ministral-3:8b-gpu | Ministral 3 8B | No – full GPU |
| 21 | cts-llm-21 | ollama | deepseek-r1:14b-gpu | DeepSeek R1 14B | No – full GPU |
| 22 | cts-llm-22 | ollama | gemma3:27b-gpu | Gemma 3 27B | No – full GPU |
| 23 | cts-llm-23 | ollama | qwen3-vl:4b-gpu | Qwen3 VL 4B | No – full GPU (vision-capable) |
| 24 | cts-llm-24 | ollama | qwen3-vl:8b-gpu | Qwen3 VL 8B | No – full GPU (vision-capable) |
| 25 | cts-llm-25 | ollama | ministral-3:14b-gpu | Ministral 3 14B | No – full GPU |
| 26 | cts-llm-26 | ollama | qwen3:4b-gpu | Qwen3 4B | No – full GPU |
| 27 | cts-llm-27 | ollama | qwen3:8b-gpu | Qwen3 8B | No – full GPU |
| 28 | cts-llm-28 | ollama | mistral:7b-gpu | Mistral 7B | No – full GPU |
| 29 | cts-llm-29 | ollama | phi3:3.8b-gpu | Phi-3 3.8B | No – full GPU |
| 30 | cts-llm-30 | ollama | phi3:14b-gpu | Phi-3 14B | No – full GPU |
| 31 | cts-llm-31 | ollama | llama3:1b-gpu | Llama 3.2 1B | No – full GPU |
| 32 | cts-llm-32 | ollama | llama3:3b-gpu | Llama 3.2 3B | No – full GPU |
| 33 | cts-llm-33 | ollama | llama3:8b-gpu | Llama 3.1 8B | No – full GPU |
| 34 | cts-llm-34 | ollama | llama3:70b-gpu | Llama 3.1 70B | No – full GPU |
| 35 | cts-llm-35 | ollama | phi4:14b-gpu | Phi-4 14B | No – full GPU |
| 36 | cts-llm-36 | ollama | dolphin3:8b-gpu | Dolphin 3 8B | No – full GPU |
| 37 | cts-llm-37 | ollama | tinyllama1.1b-gpu | TinyLLaMA 1.1B | No – full GPU |
| 38 | cts-llm-38 | ollama | falcon3:3b-gpu | Falcon 3 3B | No – full GPU |
| 39 | cts-llm-39 | ollama | falcon3:7b-gpu | Falcon 3 7B | No – full GPU |
| 40 | cts-llm-40 | ollama | falcon3:10b-gpu | Falcon 3 10B | No – full GPU |
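Most Ollama model identifiers above follow a family:size-gpu pattern, which makes them easy to split into parts. The sketch below assumes that the -gpu suffix is a catalog-specific marker and that the colon separates family from size. parse_ollama_identifier is a hypothetical helper. Identifiers without a colon, such as tinyllama1.1b-gpu, are returned with no size instead of being guessed at.

```python
from typing import NamedTuple, Optional


class OllamaModel(NamedTuple):
    family: str
    size: Optional[str]
    gpu: bool


def parse_ollama_identifier(identifier: str) -> OllamaModel:
    """Split an identifier such as 'deepseek-r1:8b-gpu' into its parts."""
    suffix = "-gpu"
    gpu = identifier.endswith(suffix)
    base = identifier[: -len(suffix)] if gpu else identifier
    # partition() leaves sep empty when there is no colon in the name.
    family, sep, size = base.partition(":")
    return OllamaModel(family, size if sep else None, gpu)
```

For example, parse_ollama_identifier("qwen3-vl:4b-gpu") gives the family qwen3-vl, the size 4b, and gpu set to True.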


