Cortensor Portal V1 – Model Offering, Router Fleet & Capacity Strategy
Status: Draft
Scope: Initial hosted model catalog, dedicated-first capacity plan, and first router-fleet layout for Portal V1
1. Overview
Portal V1 needs a backend inference shape that is:
predictable
supportable
easy to reason about operationally
simple enough to launch without overbuilding
The current direction is to start with:
a small hosted model catalog
dedicated-backed capacity first
router groups organized by model
reverse-proxy / Nginx model groups as the first fleet layout
This gives Portal a stable backend product surface while keeping implementation and operations manageable.
2. Model Offering Strategy
Portal V1 is expected to start narrow rather than expose every supported model in the Cortensor ecosystem.
Likely initial model families
Current likely families for V1:
OSS 20B
OSS 120B
Gemma 4
Qwen family
These are intended as curated hosted offerings, not a full mirror of every possible backend-supported model.
Why start narrow
A smaller catalog helps with:
operational predictability
easier debugging
clearer support expectations
simpler quota/billing design
easier product messaging
lower risk in the first hosted release
Portal V1 should feel deliberate, not overloaded.
3. Product Model Aliases vs Backend Topology
Users should interact with:
simple product-facing model names
They should not need to know:
session IDs
router layout
which pool serves which backend session
which node group is dedicated vs ephemeral
That means the Portal needs a layer that maps a product model alias to a backend router group / model pool.
Conceptually:
user requests oss-20b
Portal resolves that to the oss-20b router group
a healthy router in that group serves the request
that router uses its configured backend session
This allows backend topology to evolve without breaking the Portal-facing model API.
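The resolution flow above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Portal's actual implementation: the Router type, the ROUTER_GROUPS table, the router names, the session IDs, and the random selection policy are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Router:
    name: str            # hypothetical router identifier
    session_id: int      # hypothetical backend session configured on this router
    healthy: bool = True


# Hypothetical mapping: product-facing alias -> routers in that model group.
ROUTER_GROUPS = {
    "oss-20b": [
        Router("router-a", session_id=101),
        Router("router-b", session_id=102, healthy=False),
        Router("router-c", session_id=103),
    ],
}


def resolve(alias: str, rng: Optional[random.Random] = None) -> Router:
    """Pick a healthy router from the group behind a product alias."""
    group = ROUTER_GROUPS.get(alias)
    if group is None:
        raise KeyError(f"unknown model alias: {alias}")
    healthy = [router for router in group if router.healthy]
    if not healthy:
        raise RuntimeError(f"no healthy router for alias: {alias}")
    # Random choice is an assumed policy; any balancing rule could go here.
    return (rng or random).choice(healthy)
```

The caller only ever passes the alias. Routers can be added to or removed from a group, or their sessions changed, without the caller noticing.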
4. Capacity Strategy – Dedicated First
The current recommendation for Portal V1 is:
Start with dedicated-backed inference.
Why dedicated-backed capacity is safer for V1
Dedicated-backed capacity is better for a first hosted product because it is:
more predictable
easier to support
easier to reason about operationally
easier to analyze when things go wrong
easier to align with product expectations around stability and latency
A hosted inference product needs to behave in a reasonably stable way. Dedicated-backed capacity is the better starting point for that.
5. Ephemeral Capacity Positioning
Ephemeral capacity is still important to Cortensor, but Portal V1 is not expected to rely on it as the core serving path.
Why ephemeral is not the first foundation
Ephemeral capacity introduces more variability around:
node assignment
latency
model availability
performance predictability
debugging and supportability
Those tradeoffs may be worth making later, but they add too much complexity for the first hosted product version.
Likely role of ephemeral later
Ephemeral capacity is more likely to appear in later phases as:
a hybrid burst layer
a cheaper / more flexible product tier
a more dynamic marketplace-like backend capacity source
So the rough path is:
V1: dedicated-backed
Later: selective hybrid / burst / dynamic capacity
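One way to picture that path is a single selection function whose burst option stays switched off in V1. This is a hedged sketch only: the CapacitySource enum, the choose_capacity function, the in-flight counter, and the limit are hypothetical names and mechanics, not part of any existing Cortensor interface.

```python
from enum import Enum


class CapacitySource(Enum):
    DEDICATED = "dedicated"
    EPHEMERAL = "ephemeral"


def choose_capacity(in_flight: int, dedicated_limit: int,
                    burst_enabled: bool) -> CapacitySource:
    """Prefer dedicated capacity; spill to ephemeral only if bursting is on."""
    if in_flight < dedicated_limit:
        return CapacitySource.DEDICATED
    if burst_enabled:
        # Assumed later-phase behavior: overflow goes to a burst layer.
        return CapacitySource.EPHEMERAL
    # Assumed V1 behavior: no burst layer, so the request is refused.
    raise RuntimeError("dedicated capacity exhausted and burst is disabled")
```

Under these assumptions, V1 would always call choose_capacity with burst_enabled=False, and a later hybrid phase would enable the flag without changing callers.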
6. Router Fleet Design – Current Thinking
Recent thinking on the first fleet shape is:
For each model group:
multiple router nodes per model
each router node configured with one session
that session tied to one model
many same-model routers together form one model group
This is intentionally simple and operationally clear.
Example conceptual shape
For oss-20b:
Router A → session for OSS 20B
Router B → session for OSS 20B
Router C → session for OSS 20B
Those routers together form the OSS 20B group.
Repeat the same pattern for:
OSS 120B
Gemma 4
Qwen family
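The group shape described in this section can be modeled with three small types. This is a minimal sketch under stated assumptions: the class names, the build_group helper, the router names, and the session ID numbering are illustrative and not taken from any existing Cortensor code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Session:
    session_id: int   # hypothetical identifier
    model: str        # the single model this session is tied to


@dataclass(frozen=True)
class RouterNode:
    name: str
    session: Session  # exactly one session per router node


@dataclass
class ModelGroup:
    model: str
    routers: list = field(default_factory=list)

    def add(self, router: RouterNode) -> None:
        # Reject routers whose session serves a different model.
        if router.session.model != self.model:
            raise ValueError(
                f"{router.name} serves {router.session.model}, not {self.model}"
            )
        self.routers.append(router)


def build_group(model: str, names: list, first_session_id: int) -> ModelGroup:
    """Create one router per name, each with its own same-model session."""
    group = ModelGroup(model)
    for offset, name in enumerate(names):
        group.add(RouterNode(name, Session(first_session_id + offset, model)))
    return group


oss_20b = build_group("oss-20b", ["router-a", "router-b", "router-c"], 201)
```

Repeating build_group for each remaining model gives one independent group per model, which is what lets each group be sized and debugged separately.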
7. V1 Fleet Layout Recommendation
For the first Portal version, the recommended infra path is:
group routers by model
put reverse proxy / Nginx in front of each model group
keep the first fleet rollout small and simple
This means V1 should favor:
simpler reverse-proxy model groups
smaller router fleets
easy-to-reason-about routing behavior
fewer moving parts
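To make the reverse-proxy model group concrete, the sketch below renders a plain Nginx upstream and server block for one group. It is an illustration under assumptions: the render_model_group function, the upstream naming scheme, the backend addresses, and the listen port are hypothetical. The generated directives (upstream, server, listen, location, proxy_pass) are standard Nginx, belong inside the http context, and fall back to Nginx's default round-robin balancing.

```python
def render_model_group(alias: str, backends: list, listen_port: int) -> str:
    """Render an Nginx upstream + server block for one model group."""
    # Assumed naming scheme: "oss-20b" -> "oss_20b_routers".
    upstream = alias.replace("-", "_") + "_routers"
    servers = "\n".join(f"    server {addr};" for addr in backends)
    return (
        f"upstream {upstream} {{\n"
        f"{servers}\n"
        f"}}\n"
        f"\n"
        f"server {{\n"
        f"    listen {listen_port};\n"
        f"    location / {{\n"
        f"        proxy_pass http://{upstream};\n"
        f"    }}\n"
        f"}}\n"
    )


if __name__ == "__main__":
    # Hypothetical router addresses for the oss-20b group.
    print(render_model_group("oss-20b", ["10.0.0.11:8080", "10.0.0.12:8080"], 9020))
```

Expanding a group then means adding one more address to the list and reloading Nginx, which fits the goal of growing one group at a time.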
Why this is the right first step
This approach is attractive because it is:
easy to understand
easy to support operationally
easy to expand one group at a time
easier to debug than a more abstract orchestration layer
sufficient for proving the hosted product shape
8. Longer-Term Fleet Evolution
The current thinking is that router fleets may evolve later into a more Kubernetes-style deployment or replacement model, but not for the first step.
That later evolution could bring:
cleaner scaling
automated replacement
better rollout controls
more dynamic fleet balancing
more standardized infra operations
But that should be treated as a later maturity milestone, not a V1 requirement.
Portal V1 should prove product and backend demand before introducing more complex fleet orchestration.
9. Why This Scoping Is Happening Now
Current work in this area is mostly about:
previewing real Portal surfaces early
identifying blockers early
reducing unknowns before execution
shaping rough delivery expectations
The purpose is not to overdesign indefinitely. The purpose is to make the later implementation phase smoother by narrowing the unknowns now.
This follows the same approach used in earlier Cortensor work:
surface the product shape early
identify likely blockers
refine implementation direction before committing build effort
10. Current Direction Summary
Portal V1 backend strategy is currently shaping up as:
a hosted inference product
small curated model catalog
OSS 20B / OSS 120B / Gemma 4 / Qwen as likely first families
dedicated-backed capacity first
multiple router nodes per model group
one model-focused session per router
reverse-proxy / Nginx model groups first
more advanced fleet orchestration later if justified
In one sentence:
Portal V1 should start with a narrow hosted model catalog backed by dedicated-first router groups organized per model family, using a simple reverse-proxy fleet shape first and leaving more dynamic scaling and orchestration for later versions.