Supported Models

Cortensor runs a curated catalog of open-weight LLMs across two primary engines:

  • Llamafile – quantized GGUF models (CPU-friendly, can also use GPU where available)

  • Ollama – full GPU models (for higher-end / dedicated GPU miners)

Internally, each model is packaged as a Docker image identified by a tag of the form:

  • cts-llm, cts-llm-1, cts-llm-2, … cts-llm-40

These cts-llm-* IDs are Docker image tags, not the raw model names. Each image wraps one runtime engine (llamafile or Ollama) and one concrete model identifier (e.g. deepseek-r1:8b-gpu).

❗ This catalog will grow over time. New images will be added with higher IDs (cts-llm-41, cts-llm-42, …), so treat the tables below as versioned snapshots rather than a permanent, fixed list.
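For scripts and tooling that work with these tags, the ID-to-index relationship is mechanical: cts-llm maps to index 0 and cts-llm-N maps to index N (see the appendix tables). A minimal sketch, assuming nothing beyond that pattern; the helper name is ours and not part of any official Cortensor tooling:

```python
import re

def image_to_index(image: str) -> int:
    """Map a cts-llm-* Docker image tag to its numeric model index.

    cts-llm -> 0, cts-llm-7 -> 7, cts-llm-40 -> 40.
    Illustrative helper only, not part of any official Cortensor tooling.
    """
    m = re.fullmatch(r"cts-llm(?:-(\d+))?", image)
    if m is None:
        raise ValueError(f"not a cts-llm image tag: {image!r}")
    return int(m.group(1) or 0)
```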


Quantized Models (Llamafile)

Engine: llamafile
Indexes: 0–11
Hardware: CPU-first; can use GPU where available
Purpose: Maximum coverage across heterogeneous nodes (CPUs, smaller GPUs, edge machines) using 4–8-bit quantization.

These images are ideal for:

  • Nodes without a dedicated GPU

  • Mixed-resource environments (small VPS, home PCs, edge hardware)

  • High fan-out / broad geographic coverage

Model Families Covered

The current llamafile set includes quantized versions of:

  • Vision / Multimodal

    • llava-v1.5-7b-q4 (used as default-gpu on many nodes)

  • Reasoning & General LLMs

    • DeepSeek R1 Distill Llama 8B (Q4)

    • Meta Llama 3.1 8B Instruct (Q4)

    • Mistral 7B Instruct v0.3 (Q4)

    • Granite 3.2 8B Instruct (Q4)

    • Qwen2.5 7B Instruct 1M (Q4)

  • High-Capacity & Coder Variants

    • Qwen QwQ 32B (Q4)

    • Qwen3 4B (Q8)

    • Gemma 3 4B / 12B IT (Q6 / Q4)

    • DeepSeek R1 Distill Qwen 14B (Q4)

    • Qwen2.5 Coder 14B Instruct (Q4)

These quantized models prioritize:

  • Lower memory footprint

  • Broad participation from smaller nodes

  • “Good enough” quality for many general-purpose workloads and background jobs


Full GPU Models (Ollama)

Engine: ollama
Indexes: 12–40
Hardware: GPU nodes (NVIDIA strongly preferred)
Purpose: High-quality, low-latency inference for more demanding workloads and higher-SLA routes.

These are full GPU models (not llamafile-style Q4/Q6 GGUFs). They are meant for:

  • Dedicated or strong GPU miners

  • Latency-sensitive user requests

  • High-quality routing tiers

Model Families Covered

The current Ollama set spans several major model families:

  • Gemma 3

    • 270M, 4B, 12B, 27B

  • DeepSeek R1

    • 8B, 14B, 70B

  • GPT-OSS

    • 20B, 120B

  • Granite

    • Granite 4 3B

  • Ministral / Mistral

    • Ministral 3 8B / 14B

    • Mistral 7B

  • Qwen 3 & Qwen 3 VL

    • Qwen3 4B / 8B

    • Qwen3 VL 4B / 8B

  • Phi

    • Phi-3 3.8B / 14B

    • Phi-4 14B

  • Llama 3.x

    • Llama 3.2 1B / 3B

    • Llama 3.1 8B / 70B

  • Others

    • Dolphin 3 8B

    • TinyLlama 1.1B

    • Falcon 3 3B / 7B / 10B

These are best suited to:

  • Nodes with enough VRAM to serve these models efficiently

  • High-SLA task classes

  • Scenarios where quality and latency matter more than memory footprint


How the Mapping Works

At the node / Docker level, you mostly deal with:

  • The cts-llm-* Docker image ID

  • Each image contains:

    • The engine (llamafile or ollama)

    • The model identifier (e.g. deepseek-r1:8b-gpu, Qwen2.5-7B-Instruct-1M-Q4_K_M)

At the network / session level, Cortensor can:

  • Route tasks by model index (0–40)

  • Or use more descriptive labels via config/SDKs where needed

As we add more models, we’ll extend the tables below with new indices and images; this section remains the canonical mapping.
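As a concrete sketch of that mapping, the snippet below mirrors a few rows from the appendix tables as a plain Python lookup. The CATALOG structure and resolve function are illustrative only, not an official Cortensor SDK API:

```python
# (docker_image, engine, model_identifier) for a handful of indices,
# copied from the appendix tables below. Illustrative only.
CATALOG: dict[int, tuple[str, str, str]] = {
    0:  ("cts-llm",    "llamafile", "default"),
    6:  ("cts-llm-6",  "llamafile", "Qwen2.5-7B-Instruct-1M-Q4_K_M"),
    14: ("cts-llm-14", "ollama",    "deepseek-r1:8b-gpu"),
    40: ("cts-llm-40", "ollama",    "falcon3:10b-gpu"),
}

def resolve(index: int) -> tuple[str, str, str]:
    """Return (docker_image, engine, model_identifier) for a model index."""
    try:
        return CATALOG[index]
    except KeyError:
        raise KeyError(f"unknown model index {index}; the catalog currently spans 0-40") from None
```

For example, resolve(14) returns ("cts-llm-14", "ollama", "deepseek-r1:8b-gpu").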


Appendix: Full Model Mapping

Below are the current model mappings used by Cortensor.

🧩 Columns

  • Index – numeric model index used in configs / selection

  • Docker Image ID – the cts-llm-* tag

  • Engine – llamafile or ollama

  • Model Identifier – actual model/runtime name inside the image

  • Model Family / Size – human-readable description

  • Quantized? – whether it’s a quantized GGUF (for llamafile) or full GPU model
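If you consume these tables programmatically, one convenient typed view of a row looks like the sketch below; the field names are ours, chosen to mirror the columns above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelEntry:
    index: int        # numeric model index used in configs / selection
    image: str        # the cts-llm-* Docker image tag
    engine: str       # "llamafile" or "ollama"
    model_id: str     # model identifier inside the image
    family: str       # human-readable family / size description
    quantized: bool   # True for llamafile GGUF builds, False for full GPU models
```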


Quantized Models – Llamafile (0–11)

| Index | Docker Image ID | Engine | Model Identifier | Model Family / Size | Quantized? |
|---|---|---|---|---|---|
| 0 | cts-llm | llamafile | default / default-gpu (llamafile) | LLaVA v1.5 7B Q4 | Yes – 4-bit |
| 1 | cts-llm-1 | llamafile | DeepSeek-R1-Distill-Llama-8B-Q4_K_M | DeepSeek R1 Distill Llama 8B | Yes – 4-bit |
| 2 | cts-llm-2 | llamafile | Meta-Llama-3.1-8B-Instruct.Q4_K_M | Llama 3.1 8B Instruct | Yes – 4-bit |
| 3 | cts-llm-3 | llamafile | Qwen_QwQ-32B-Q4_K_M | Qwen QwQ 32B | Yes – 4-bit |
| 4 | cts-llm-4 | llamafile | Mistral-7B-Instruct-v0.3.Q4_0 | Mistral 7B Instruct v0.3 | Yes – 4-bit |
| 5 | cts-llm-5 | llamafile | granite-3.2-8b-instruct-Q4_K_M | Granite 3.2 8B Instruct | Yes – 4-bit |
| 6 | cts-llm-6 | llamafile | Qwen2.5-7B-Instruct-1M-Q4_K_M | Qwen2.5 7B Instruct 1M | Yes – 4-bit |
| 7 | cts-llm-7 | llamafile | google_gemma-3-4b-it-Q6_K | Gemma 3 4B IT | Yes – 6-bit |
| 8 | cts-llm-8 | llamafile | Qwen_Qwen3-4B-Q8_0 | Qwen3 4B | Yes – 8-bit |
| 9 | cts-llm-9 | llamafile | google_gemma-3-12b-it-Q4_K_M | Gemma 3 12B IT | Yes – 4-bit |
| 10 | cts-llm-10 | llamafile | DeepSeek-R1-Distill-Qwen-14B-Q4_K_M | DeepSeek R1 Distill Qwen 14B | Yes – 4-bit |
| 11 | cts-llm-11 | llamafile | Qwen2.5-Coder-14B-Instruct-Q4_K_M | Qwen2.5 Coder 14B Instruct | Yes – 4-bit |


Full GPU Models – Ollama (12–40)

| Index | Docker Image ID | Engine | Model Identifier | Model Family / Size | Quantized? |
|---|---|---|---|---|---|
| 12 | cts-llm-12 | ollama | gemma3:270m-gpu | Gemma 3 270M | No – full GPU |
| 13 | cts-llm-13 | ollama | gpt-oss:20b-gpu | GPT-OSS 20B | No – full GPU |
| 14 | cts-llm-14 | ollama | deepseek-r1:8b-gpu | DeepSeek R1 8B | No – full GPU |
| 15 | cts-llm-15 | ollama | granite4:3b-gpu | Granite 4 3B | No – full GPU |
| 16 | cts-llm-16 | ollama | gpt-oss:120b-gpu | GPT-OSS 120B | No – full GPU |
| 17 | cts-llm-17 | ollama | deepseek-r1:70b-gpu | DeepSeek R1 70B | No – full GPU |
| 18 | cts-llm-18 | ollama | gemma3:4b-gpu | Gemma 3 4B | No – full GPU |
| 19 | cts-llm-19 | ollama | gemma3:12b-gpu | Gemma 3 12B | No – full GPU |
| 20 | cts-llm-20 | ollama | ministral-3:8b-gpu | Ministral 3 8B | No – full GPU |
| 21 | cts-llm-21 | ollama | deepseek-r1:14b-gpu | DeepSeek R1 14B | No – full GPU |
| 22 | cts-llm-22 | ollama | gemma3:27b-gpu | Gemma 3 27B | No – full GPU |
| 23 | cts-llm-23 | ollama | qwen3-vl:4b-gpu | Qwen3 VL 4B | No – full GPU (vision-capable) |
| 24 | cts-llm-24 | ollama | qwen3-vl:8b-gpu | Qwen3 VL 8B | No – full GPU (vision-capable) |
| 25 | cts-llm-25 | ollama | ministral-3:14b-gpu | Ministral 3 14B | No – full GPU |
| 26 | cts-llm-26 | ollama | qwen3:4b-gpu | Qwen3 4B | No – full GPU |
| 27 | cts-llm-27 | ollama | qwen3:8b-gpu | Qwen3 8B | No – full GPU |
| 28 | cts-llm-28 | ollama | mistral:7b-gpu | Mistral 7B | No – full GPU |
| 29 | cts-llm-29 | ollama | phi3:3.8b-gpu | Phi-3 3.8B | No – full GPU |
| 30 | cts-llm-30 | ollama | phi3:14b-gpu | Phi-3 14B | No – full GPU |
| 31 | cts-llm-31 | ollama | llama3:1b-gpu | Llama 3.2 1B | No – full GPU |
| 32 | cts-llm-32 | ollama | llama3:3b-gpu | Llama 3.2 3B | No – full GPU |
| 33 | cts-llm-33 | ollama | llama3:8b-gpu | Llama 3.1 8B | No – full GPU |
| 34 | cts-llm-34 | ollama | llama3:70b-gpu | Llama 3.1 70B | No – full GPU |
| 35 | cts-llm-35 | ollama | phi4:14b-gpu | Phi-4 14B | No – full GPU |
| 36 | cts-llm-36 | ollama | dolphin3:8b-gpu | Dolphin 3 8B | No – full GPU |
| 37 | cts-llm-37 | ollama | tinyllama1.1b-gpu | TinyLlama 1.1B | No – full GPU |
| 38 | cts-llm-38 | ollama | falcon3:3b-gpu | Falcon 3 3B | No – full GPU |
| 39 | cts-llm-39 | ollama | falcon3:7b-gpu | Falcon 3 7B | No – full GPU |
| 40 | cts-llm-40 | ollama | falcon3:10b-gpu | Falcon 3 10B | No – full GPU |
