Cortensor Portal V1 – Model Offering, Router Fleet & Capacity Strategy

Status: Draft

Scope: Initial hosted model catalog, dedicated-first capacity plan, and first router-fleet layout for Portal V1


1. Overview

Portal V1 needs a backend inference shape that is:

  • predictable

  • supportable

  • easy to reason about operationally

  • simple enough to launch without overbuilding

The current direction is to start with:

  • a small hosted model catalog

  • dedicated-backed capacity first

  • router groups organized by model

  • reverse-proxy / Nginx model groups as the first fleet layout

This gives Portal a stable backend product surface while keeping implementation and operations manageable.


2. Model Offering Strategy

Portal V1 is expected to start with a narrow catalog rather than expose every supported model in the Cortensor ecosystem.

Likely initial model families

Current likely families for V1:

  • OSS 20B

  • OSS 120B

  • Gemma 4

  • Qwen family

These are intended as curated hosted offerings, not a full mirror of every possible backend-supported model.

Why start narrow

A smaller catalog helps with:

  • operational predictability

  • easier debugging

  • clearer support expectations

  • simpler quota/billing design

  • easier product messaging

  • lower risk in the first hosted release

Portal V1 should feel deliberate, not overloaded.


3. Product Model Aliases vs Backend Topology

Users should interact with:

  • simple product-facing model names

They should not need to know:

  • session IDs

  • router layout

  • which pool serves which backend session

  • which node group is dedicated vs ephemeral

That means the Portal needs a layer that maps:

  • product model alias

  • to backend router group / model pool

Conceptually:

  • user requests oss-20b

  • Portal resolves that to the oss-20b router group

  • a healthy router in that group serves the request

  • that router uses its configured backend session

This allows backend topology to evolve without breaking the Portal-facing model API.
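The alias-to-group resolution above can be sketched as follows. This is a minimal illustration, not the Portal implementation: the `Router` type, the `ROUTER_GROUPS` table, the router names, and the session IDs are all hypothetical, and picking the first healthy router stands in for whatever balancing the real fleet uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    name: str          # hypothetical router identifier
    session_id: int    # hypothetical backend session this router is configured with
    healthy: bool = True


# Hypothetical alias table: product-facing model alias -> router group.
ROUTER_GROUPS: dict[str, list[Router]] = {
    "oss-20b": [
        Router("router-a", session_id=101, healthy=False),
        Router("router-b", session_id=102),
        Router("router-c", session_id=103),
    ],
}


class UnknownModelError(KeyError):
    """The alias is not part of the hosted catalog."""


class NoHealthyRouterError(RuntimeError):
    """The alias is known, but no router in its group is healthy."""


def resolve(alias: str) -> Router:
    """Map a product model alias to a healthy router in its group."""
    group = ROUTER_GROUPS.get(alias)
    if group is None:
        raise UnknownModelError(alias)
    for router in group:
        if router.healthy:
            return router  # the caller never sees session IDs or router layout
    raise NoHealthyRouterError(alias)
```

Because callers only pass the alias, routers can be added to or removed from a group, or moved to different sessions, without changing the Portal-facing model names.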


4. Capacity Strategy – Dedicated First

The current recommendation for Portal V1 is:

Start with dedicated-backed inference first

Why dedicated-backed capacity is safer for V1

Dedicated-backed capacity is better for a first hosted product because it is:

  • more predictable

  • easier to support

  • easier to reason about operationally

  • easier to analyze when things go wrong

  • easier to align with product expectations around stability and latency

A hosted inference product needs to behave in a reasonably stable way. Dedicated-backed capacity is the better starting point for that.


5. Ephemeral Capacity Positioning

Ephemeral capacity is still important to Cortensor, but Portal V1 is not expected to rely on it as the core serving path.

Why ephemeral is not the first foundation

Ephemeral capacity introduces more variability around:

  • node assignment

  • latency

  • model availability

  • performance predictability

  • debugging and supportability

Those tradeoffs may be worth accepting later, but they add too much complexity for the first hosted product version.

Likely role of ephemeral later

Ephemeral capacity is more likely to appear in later phases as:

  • a hybrid burst layer

  • a cheaper / more flexible product tier

  • a more dynamic marketplace-like backend capacity source

So the rough path is:

  • V1: dedicated-backed

  • Later: selective hybrid / burst / dynamic capacity
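As a rough illustration of that path, the sketch below shows one way a later hybrid burst layer could sit behind a flag that is off in V1. It is purely hypothetical: the function name, the slot-counting model, and the `"reject"` outcome are assumptions made for the example, not decided behavior.

```python
def choose_capacity(in_flight: int, dedicated_slots: int, burst_enabled: bool = False) -> str:
    """Pick a capacity source for one new request (hypothetical policy).

    in_flight:       requests currently being served by dedicated capacity
    dedicated_slots: concurrent requests the dedicated pool can take
    burst_enabled:   False models V1 (dedicated only); True models a later hybrid phase
    """
    if in_flight < dedicated_slots:
        return "dedicated"   # dedicated-backed capacity is always tried first
    if burst_enabled:
        return "ephemeral"   # later phases: overflow spills to ephemeral capacity
    return "reject"          # V1: no burst layer, so overflow is refused


print(choose_capacity(3, 4))                      # room left on dedicated
print(choose_capacity(4, 4))                      # V1 behavior when full
print(choose_capacity(4, 4, burst_enabled=True))  # later hybrid behavior when full
```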


6. Router Fleet Design – Current Thinking

The current thinking on the first fleet shape is that each model group has:

  • multiple router nodes per model

  • each router node configured with one session

  • that session tied to one model

  • many same-model routers together form one model group

This is intentionally simple and operationally clear.

Example conceptual shape

For oss-20b:

  • Router A → session for OSS 20B

  • Router B → session for OSS 20B

  • Router C → session for OSS 20B

Those routers together form the OSS 20B group.

Repeat the same pattern for:

  • OSS 120B

  • Gemma 4

  • Qwen family
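The fleet shape above can be expressed as a small data model. The following is a sketch under assumptions: `RouterNode`, `build_model_groups`, the router names, and the session IDs are hypothetical, and the two checks simply encode the "one session per router" and "one model per session" rules from this section.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterNode:
    name: str        # hypothetical router name
    session_id: int  # the single session this router is configured with
    model: str       # the single model that session is tied to


def build_model_groups(routers: list[RouterNode]) -> dict[str, list[str]]:
    """Group router names by model, enforcing the one-session / one-model rules."""
    groups: dict[str, list[str]] = defaultdict(list)
    session_model: dict[int, str] = {}
    seen: set[str] = set()
    for r in routers:
        if r.name in seen:
            # A router listed twice would mean it carries more than one session.
            raise ValueError(f"router {r.name} is configured more than once")
        seen.add(r.name)
        known = session_model.setdefault(r.session_id, r.model)
        if known != r.model:
            raise ValueError(f"session {r.session_id} is tied to both {known} and {r.model}")
        groups[r.model].append(r.name)
    return dict(groups)


fleet = [
    RouterNode("router-a", 101, "oss-20b"),
    RouterNode("router-b", 102, "oss-20b"),
    RouterNode("router-c", 103, "oss-20b"),
    RouterNode("router-d", 201, "oss-120b"),
]
print(build_model_groups(fleet))
```

Growing a group is then a matter of appending another `RouterNode` for the same model; no other group is affected.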


7. V1 Fleet Layout Recommendation

For the first Portal version, the recommended infra path is:

  • group routers by model

  • put reverse proxy / Nginx in front of each model group

  • keep the initial fleet rollout small and simple

This means V1 should favor:

  • simpler reverse-proxy model groups

  • smaller router fleets

  • easy-to-reason-about routing behavior

  • fewer moving parts
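To make the per-model reverse-proxy layout concrete, the sketch below renders one Nginx `upstream` block plus a `server` block for a model group. The backend addresses, port numbers, and the `_routers` naming scheme are assumptions for illustration only; `upstream`, `server`, `listen`, `location`, and `proxy_pass` are standard Nginx directives, and Nginx balances an upstream round-robin by default.

```python
def render_model_group(alias: str, backends: list[str], listen_port: int) -> str:
    """Render a minimal Nginx config fragment for one model group.

    alias:       product model alias, e.g. "oss-20b"
    backends:    router addresses as "host:port" (hypothetical values)
    listen_port: port the reverse proxy exposes for this group (hypothetical)
    """
    if not backends:
        raise ValueError("a model group needs at least one router backend")
    # Hypothetical naming choice: oss-20b -> oss_20b_routers
    upstream = alias.replace("-", "_").replace(".", "_") + "_routers"
    servers = "\n".join(f"    server {b};" for b in backends)
    return (
        f"upstream {upstream} {{\n"
        f"{servers}\n"
        f"}}\n"
        f"\n"
        f"server {{\n"
        f"    listen {listen_port};\n"
        f"    location / {{\n"
        f"        proxy_pass http://{upstream};\n"
        f"    }}\n"
        f"}}\n"
    )


print(render_model_group("oss-20b", ["10.0.0.11:5010", "10.0.0.12:5010"], 8080))
```

Adding a router to a group changes only that group's `upstream` block, which keeps each model group independently expandable.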

Why this is the right first step

This approach is attractive because it is:

  • easy to understand

  • easy to support operationally

  • easy to expand one group at a time

  • easier to debug than a more abstract orchestration layer

  • sufficient for proving the hosted product shape


8. Longer-Term Fleet Evolution

The current thinking is that router fleets may later evolve toward a more Kubernetes-style deployment and replacement model, but not as a first step.

That later evolution could bring:

  • cleaner scaling

  • automated replacement

  • better rollout controls

  • more dynamic fleet balancing

  • more standardized infra operations

But that should be treated as a later maturity milestone, not a V1 requirement.

Portal V1 should validate the product and its backend demand before introducing more complex fleet orchestration.


9. Why This Scoping Is Happening Now

Current work in this area is mostly about:

  • previewing real Portal surfaces early

  • identifying blockers early

  • reducing unknowns before execution

  • shaping rough delivery expectations

The purpose is not to overdesign indefinitely. The purpose is to make the later implementation phase smoother by narrowing the unknowns now.

This follows the same approach used in earlier Cortensor work:

  • surface the product shape early

  • identify likely blockers

  • refine implementation direction before committing build effort


10. Current Direction Summary

Portal V1 backend strategy is currently shaping up as:

  • a hosted inference product

  • small curated model catalog

  • OSS 20B / OSS 120B / Gemma 4 / Qwen as likely first families

  • dedicated-backed capacity first

  • multiple router nodes per model group

  • one model-focused session per router

  • reverse-proxy / Nginx model groups first

  • more advanced fleet orchestration later if justified

In one sentence:

Portal V1 should start with a narrow hosted model catalog backed by dedicated-first router groups organized per model family, using a simple reverse-proxy fleet shape and leaving more dynamic scaling and orchestration for later versions.
