# Supported Models

Cortensor runs a curated catalog of **open-weight LLMs** across two primary engines:

* **Llamafile** – quantized GGUF models (CPU-friendly, can also use GPU where available)
* **Ollama** – full GPU models (for higher-end / dedicated GPU miners)

Internally, each model is packaged as a **Docker image** identified as:

* `cts-llm`, `cts-llm-1`, `cts-llm-2`, … `cts-llm-40`

These **`cts-llm-*` IDs are Docker image tags**, not the raw model names.\
Each image wraps one runtime engine (**llamafile** or **Ollama**) and one concrete model identifier (e.g. `deepseek-r1:8b-gpu`).

> ❗ This catalog will grow over time. New images will be added with higher IDs (`cts-llm-41`, `cts-llm-42`, …), so treat the tables below as **versioned snapshots** rather than a permanent, fixed list.
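
The naming scheme above can be sketched as a pair of small helpers. This is an illustrative sketch, not part of any Cortensor SDK: the function names are hypothetical, and the only assumption taken from this page is that index 0 uses the bare tag `cts-llm` while every other index `N` uses `cts-llm-N`.

```python
import re

# Assumption (from the tables on this page): index 0 is the bare tag
# "cts-llm"; every other index N is "cts-llm-N".
_TAG_PATTERN = re.compile(r"^cts-llm(?:-([1-9]\d*))?$")


def image_tag_for_index(index: int) -> str:
    """Return the Docker image ID for a model index (hypothetical helper)."""
    if index < 0:
        raise ValueError(f"model index must be non-negative, got {index}")
    return "cts-llm" if index == 0 else f"cts-llm-{index}"


def index_for_image_tag(tag: str) -> int:
    """Inverse of image_tag_for_index; raises ValueError for other tags."""
    match = _TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a cts-llm image ID: {tag!r}")
    # The numeric suffix is absent only for the bare "cts-llm" tag.
    return int(match.group(1)) if match.group(1) else 0
```

No upper bound is enforced, since new images are expected to be added with higher IDs.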

***

### Quantized Models (Llamafile)

**Engine:** `llamafile`\
**Indexes:** `0–11`\
**Hardware:** CPU-first, can use GPU where available\
**Purpose:** Maximum coverage across heterogeneous nodes (CPUs, smaller GPUs, edge machines) using **4–8 bit quantization**.

These images are ideal for:

* Nodes without a dedicated GPU
* Mixed-resource environments (small VPS, home PCs, edge hardware)
* High fan-out / broad geographic coverage

#### Model Families Covered

The current llamafile set includes quantized versions of:

* **Vision / Multimodal**
  * `llava-v1.5-7b-q4` (used as `default-gpu` on many nodes)
* **Reasoning & General LLMs**
  * **DeepSeek R1 Distill Llama 8B** (Q4)
  * **Meta Llama 3.1 8B Instruct** (Q4)
  * **Mistral 7B Instruct v0.3** (Q4)
  * **Granite 3.2 8B Instruct** (Q4)
  * **Qwen2.5 7B Instruct 1M** (Q4)
* **High-Capacity & Coder Variants**
  * **Qwen QwQ 32B** (Q4)
  * **Qwen3 4B** (Q8)
  * **Gemma 3 4B / 12B IT** (Q6 / Q4)
  * **DeepSeek R1 Distill Qwen 14B** (Q4)
  * **Qwen2.5 Coder 14B Instruct** (Q4)

These quantized models prioritize:

* Lower memory footprint
* Broad participation from smaller nodes
* “Good enough” quality for many general-purpose workloads and background jobs
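
To make the memory-footprint point concrete, the weights of a model occupy roughly `parameters × bits per weight / 8` bytes. The sketch below applies that arithmetic; the helper name is hypothetical, and the estimate is deliberately rough: it covers weights only (no KV cache or runtime overhead), and GGUF formats such as `Q4_K_M` use somewhat more than their nominal bit width on average.

```python
def approx_weight_gib(params_billions: float, bits_per_weight: float) -> float:
    """Rough size of the model weights alone, in GiB (hypothetical helper)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Nominal bit widths only; real GGUF files are somewhat larger.
for label, params_b, bits in [
    ("8B at 16-bit", 8, 16),
    ("8B at 4-bit", 8, 4),
    ("32B at 4-bit", 32, 4),
]:
    print(f"{label}: ~{approx_weight_gib(params_b, bits):.1f} GiB")
```

Under these assumptions, an 8B model drops from about 14.9 GiB at 16 bits to about 3.7 GiB at 4 bits, which is what lets smaller nodes participate.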

***

### Full GPU Models (Ollama)

**Engine:** `ollama`\
**Indexes:** `12–40`\
**Hardware:** GPU nodes (NVIDIA strongly preferred)\
**Purpose:** High-quality, low-latency inference for more demanding workloads and higher-SLA routes.

These are **full GPU models** (not llamafile-style Q4/Q6/Q8 GGUFs). They are meant for:

* Dedicated or strong GPU miners
* Latency-sensitive user requests
* High-quality routing tiers

#### Model Families Covered

The current Ollama set spans several major model families:

* **Gemma 3**
  * 270M, 4B, 12B, 27B
* **DeepSeek R1**
  * 8B, 14B, 70B
* **GPT-OSS**
  * 20B, 120B
* **Granite**
  * Granite 4 3B
* **Ministral / Mistral**
  * Ministral 3 8B / 14B
  * Mistral 7B
* **Qwen 3 & Qwen 3 VL**
  * Qwen3 4B / 8B
  * Qwen3 VL 4B / 8B
* **Phi**
  * Phi-3 3.8B / 14B
  * Phi-4 14B
* **Llama 3.x**
  * Llama 3.2 1B / 3B
  * Llama 3.1 8B / 70B
* **Others**
  * Dolphin 3 8B
  * TinyLLaMA 1.1B
  * Falcon 3 3B / 7B / 10B

These are best suited to:

* Nodes with enough VRAM to serve these models efficiently
* High-SLA task classes
* Scenarios where output quality and latency matter more than memory footprint

***

### How the Mapping Works

At the **node / Docker level**, you mostly deal with:

* `cts-llm-*` → **Docker image ID**
* Each image contains:
  * The **engine** (`llamafile` or `ollama`)
  * The **model identifier** (e.g. `deepseek-r1:8b-gpu`, `Qwen2.5-7B-Instruct-1M-Q4_K_M`)

At the **network / session level**, Cortensor can:

* Route tasks by **model index** (0–40)
* Or use more descriptive labels via config/SDKs where needed

As more models are added, the tables in the appendix will be extended with new indexes and images; the appendix remains the canonical mapping.
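
A minimal sketch of index-based resolution follows. The `ModelEntry` type and both functions are hypothetical and do not reflect Cortensor's actual routing code; the rows are copied from the mapping tables on this page, and the index ranges (0 to 11 for llamafile, 12 to 40 for Ollama) describe only the current snapshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    index: int
    image_id: str
    engine: str
    model_identifier: str


# A few rows copied from the mapping tables; a real catalog would hold all of them.
CATALOG = {
    entry.index: entry
    for entry in (
        ModelEntry(1, "cts-llm-1", "llamafile", "DeepSeek-R1-Distill-Llama-8B-Q4_K_M"),
        ModelEntry(6, "cts-llm-6", "llamafile", "Qwen2.5-7B-Instruct-1M-Q4_K_M"),
        ModelEntry(14, "cts-llm-14", "ollama", "deepseek-r1:8b-gpu"),
        ModelEntry(40, "cts-llm-40", "ollama", "falcon3:10b-gpu"),
    )
}


def engine_for_index(index: int) -> str:
    """Engine implied by the index ranges of the current snapshot."""
    if 0 <= index <= 11:
        return "llamafile"
    if 12 <= index <= 40:
        return "ollama"
    raise KeyError(f"index {index} is outside this snapshot of the catalog")


def resolve(index: int) -> ModelEntry:
    """Look up a model by index and check it against the expected engine."""
    entry = CATALOG[index]
    if entry.engine != engine_for_index(index):
        raise ValueError(f"engine mismatch for index {index}")
    return entry
```

For example, `resolve(14)` returns the entry whose `image_id` is `cts-llm-14` and whose `model_identifier` is `deepseek-r1:8b-gpu`.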

***

### Appendix: Full Model Mapping

Below are the **current** model mappings used by Cortensor.

> 🧩 **Columns**
>
> * **Index** – numeric model index used in configs / selection
> * **Docker Image ID** – the `cts-llm-*` tag
> * **Engine** – `llamafile` or `ollama`
> * **Model Identifier** – actual model/runtime name inside the image
> * **Model Family / Size** – human-readable description
> * **Quantized?** – whether it’s a quantized GGUF (for llamafile) or full GPU model

***

#### Quantized Models – Llamafile (`0–11`)

| Index | Docker Image ID | Engine    | Model Identifier                                                       | Model Family / Size          | Quantized?  |
| ----: | --------------- | --------- | ---------------------------------------------------------------------- | ---------------------------- | ----------- |
|     0 | `cts-llm`       | llamafile | `default`, `default-gpu` (llamafile)                                   | LLaVA v1.5 7B Q4             | Yes – 4-bit |
|     1 | `cts-llm-1`     | llamafile | `DeepSeek-R1-Distill-Llama-8B-Q4_K_M`                                  | DeepSeek R1 Distill Llama 8B | Yes – 4-bit |
|     2 | `cts-llm-2`     | llamafile | `Meta-Llama-3.1-8B-Instruct.Q4_K_M`                                    | Llama 3.1 8B Instruct        | Yes – 4-bit |
|     3 | `cts-llm-3`     | llamafile | `Qwen_QwQ-32B-Q4_K_M`                                                  | Qwen QwQ 32B                 | Yes – 4-bit |
|     4 | `cts-llm-4`     | llamafile | `Mistral-7B-Instruct-v0.3.Q4_0`                                        | Mistral 7B Instruct v0.3     | Yes – 4-bit |
|     5 | `cts-llm-5`     | llamafile | `granite-3.2-8b-instruct-Q4_K_M`                                       | Granite 3.2 8B Instruct      | Yes – 4-bit |
|     6 | `cts-llm-6`     | llamafile | `Qwen2.5-7B-Instruct-1M-Q4_K_M`                                        | Qwen2.5 7B Instruct 1M       | Yes – 4-bit |
|     7 | `cts-llm-7`     | llamafile | `google_gemma-3-4b-it-Q6_K`                                            | Gemma 3 4B IT                | Yes – 6-bit |
|     8 | `cts-llm-8`     | llamafile | `Qwen_Qwen3-4B-Q8_0`                                                   | Qwen3 4B                     | Yes – 8-bit |
|     9 | `cts-llm-9`     | llamafile | `google_gemma-3-12b-it-Q4_K_M`                                         | Gemma 3 12B IT               | Yes – 4-bit |
|    10 | `cts-llm-10`    | llamafile | `DeepSeek-R1-Distill-Qwen-14B-Q4_K_M`                                  | DeepSeek R1 Distill Qwen 14B | Yes – 4-bit |
|    11 | `cts-llm-11`    | llamafile | `Qwen2.5-Coder-14B-Instruct-Q4_K_M`                                    | Qwen2.5 Coder 14B Instruct   | Yes – 4-bit |
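
The bit width in the last column can be read off the GGUF-style suffix of each model identifier (`Q4_K_M`, `Q6_K`, `Q8_0`, and so on). The following sketch shows one way to extract it; the helper is hypothetical and assumes the suffix is always preceded by `-` or `.`, as it is in the rows above.

```python
import re
from typing import Optional

# Matches the quantization marker in identifiers such as
# "...-Q4_K_M", "....Q4_0", "...-Q6_K" or "...-Q8_0".
_QUANT_PATTERN = re.compile(r"[-.]Q(\d+)_")


def quant_bits(model_identifier: str) -> Optional[int]:
    """Return the nominal bit width, or None if no marker is present."""
    match = _QUANT_PATTERN.search(model_identifier)
    return int(match.group(1)) if match else None
```

Identifiers without a marker, such as `default`, yield `None`, so the bit width for index 0 has to come from the table itself.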

***

#### Full GPU Models – Ollama (`12–40`)

| Index | Docker Image ID | Engine | Model Identifier      | Model Family / Size | Quantized?                     |
| ----: | --------------- | ------ | --------------------- | ------------------- | ------------------------------ |
|    12 | `cts-llm-12`    | ollama | `gemma3:270m-gpu`     | Gemma 3 270M        | No – full GPU                  |
|    13 | `cts-llm-13`    | ollama | `gpt-oss:20b-gpu`     | GPT-OSS 20B         | No – full GPU                  |
|    14 | `cts-llm-14`    | ollama | `deepseek-r1:8b-gpu`  | DeepSeek R1 8B      | No – full GPU                  |
|    15 | `cts-llm-15`    | ollama | `granite4:3b-gpu`     | Granite 4 3B        | No – full GPU                  |
|    16 | `cts-llm-16`    | ollama | `gpt-oss:120b-gpu`    | GPT-OSS 120B        | No – full GPU                  |
|    17 | `cts-llm-17`    | ollama | `deepseek-r1:70b-gpu` | DeepSeek R1 70B     | No – full GPU                  |
|    18 | `cts-llm-18`    | ollama | `gemma3:4b-gpu`       | Gemma 3 4B          | No – full GPU                  |
|    19 | `cts-llm-19`    | ollama | `gemma3:12b-gpu`      | Gemma 3 12B         | No – full GPU                  |
|    20 | `cts-llm-20`    | ollama | `ministral-3:8b-gpu`  | Ministral 3 8B      | No – full GPU                  |
|    21 | `cts-llm-21`    | ollama | `deepseek-r1:14b-gpu` | DeepSeek R1 14B     | No – full GPU                  |
|    22 | `cts-llm-22`    | ollama | `gemma3:27b-gpu`      | Gemma 3 27B         | No – full GPU                  |
|    23 | `cts-llm-23`    | ollama | `qwen3-vl:4b-gpu`     | Qwen3 VL 4B         | No – full GPU (vision-capable) |
|    24 | `cts-llm-24`    | ollama | `qwen3-vl:8b-gpu`     | Qwen3 VL 8B         | No – full GPU (vision-capable) |
|    25 | `cts-llm-25`    | ollama | `ministral-3:14b-gpu` | Ministral 3 14B     | No – full GPU                  |
|    26 | `cts-llm-26`    | ollama | `qwen3:4b-gpu`        | Qwen3 4B            | No – full GPU                  |
|    27 | `cts-llm-27`    | ollama | `qwen3:8b-gpu`        | Qwen3 8B            | No – full GPU                  |
|    28 | `cts-llm-28`    | ollama | `mistral:7b-gpu`      | Mistral 7B          | No – full GPU                  |
|    29 | `cts-llm-29`    | ollama | `phi3:3.8b-gpu`       | Phi-3 3.8B          | No – full GPU                  |
|    30 | `cts-llm-30`    | ollama | `phi3:14b-gpu`        | Phi-3 14B           | No – full GPU                  |
|    31 | `cts-llm-31`    | ollama | `llama3:1b-gpu`       | Llama 3.2 1B        | No – full GPU                  |
|    32 | `cts-llm-32`    | ollama | `llama3:3b-gpu`       | Llama 3.2 3B        | No – full GPU                  |
|    33 | `cts-llm-33`    | ollama | `llama3:8b-gpu`       | Llama 3.1 8B        | No – full GPU                  |
|    34 | `cts-llm-34`    | ollama | `llama3:70b-gpu`      | Llama 3.1 70B       | No – full GPU                  |
|    35 | `cts-llm-35`    | ollama | `phi4:14b-gpu`        | Phi-4 14B           | No – full GPU                  |
|    36 | `cts-llm-36`    | ollama | `dolphin3:8b-gpu`     | Dolphin 3 8B        | No – full GPU                  |
|    37 | `cts-llm-37`    | ollama | `tinyllama:1.1b-gpu`  | TinyLLaMA 1.1B      | No – full GPU                  |
|    38 | `cts-llm-38`    | ollama | `falcon3:3b-gpu`      | Falcon 3 3B         | No – full GPU                  |
|    39 | `cts-llm-39`    | ollama | `falcon3:7b-gpu`      | Falcon 3 7B         | No – full GPU                  |
|    40 | `cts-llm-40`    | ollama | `falcon3:10b-gpu`     | Falcon 3 10B        | No – full GPU                  |
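
Most identifiers in this table follow a `family:size-gpu` shape. The sketch below splits such an identifier into its parts; the parser and the `OllamaModel` type are hypothetical, and the pattern is inferred from the rows above rather than taken from a published specification. The `-gpu` suffix is treated as part of Cortensor's identifier.

```python
import re
from typing import NamedTuple, Optional


class OllamaModel(NamedTuple):
    family: str
    size: str


# Inferred shape: "<family>:<size>-gpu", e.g. "deepseek-r1:8b-gpu".
_OLLAMA_PATTERN = re.compile(
    r"^(?P<family>[a-z0-9.-]+):(?P<size>[0-9.]+[mb])-gpu$"
)


def parse_ollama_identifier(identifier: str) -> Optional[OllamaModel]:
    """Split an identifier into family and size, or return None."""
    match = _OLLAMA_PATTERN.match(identifier)
    if match is None:
        return None
    return OllamaModel(match.group("family"), match.group("size"))
```

Returning `None` instead of raising lets a caller flag identifiers that deviate from the expected shape when validating a catalog snapshot.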
