Cortensor Portal
Product & Architecture Specification with Milestones (Draft)
Status: Draft specification and roadmap reference
Scope: Hosted Cortensor Portal product built on top of managed router fleets
Purpose: Define the intended product shape, system architecture, and phased rollout path for Cortensor Portal
1. Overview
Cortensor Portal is a proposed hosted API product built on top of managed Cortensor router infrastructure.
Instead of requiring users to operate their own router nodes, sessions, and supporting infrastructure, the portal provides a managed control plane and API gateway where developers can:
sign up and authenticate
create and revoke API keys
call a stable hosted inference API
view usage, limits, and model access
consume Cortensor as a product rather than raw infrastructure
The intended V1 mental model is:
An OpenAI-style hosted inference API backed by managed Cortensor router pools
This framing keeps the developer experience simple while preserving the flexibility of Cortensor’s underlying network.
2. Product Goals
The Portal should provide a stable hosted surface for inference while hiding most raw Cortensor topology and operational complexity.
2.1 Primary Goals
Expose a simple hosted API for inference.
Allow users to interact with Cortensor without running their own router.
Centralize:
API key management
entitlement checks
rate limits
quota logic
usage and billing state
Keep router pools and backend session layouts replaceable without breaking user integrations.
2.2 Non-Goals for Early Versions
The first versions should not try to expose the full complexity of the network.
Avoid making early Portal versions:
a thin dashboard directly attached to one raw router
a power-user infra console for raw session orchestration
a marketplace-like capacity routing product on day one
a “support every model and every backend mode” surface
Early phases should prioritize hosted inference simplicity over exposing all Cortensor-native controls.
3. Core Design Principles
3.1 Separate Product Logic from Router Logic
The Portal should not be just a frontend attached directly to one router node.
Instead, the system should be logically separated into three layers:
Portal / Product Layer
Control / Gateway Layer
Router Fleet / Inference Layer
This separation matters because:
business logic should not live inside router nodes
customer API keys should not directly hit raw routers
model routing and router pools should be replaceable
billing, rate limiting, policy enforcement, and observability should stay centralized
3.2 Stable Product Surface, Replaceable Backend
The Portal API should remain stable even if:
router pools change
session IDs change
models are re-mapped internally
capacity is rebalanced or drained
Users should integrate with Portal-facing model aliases and API semantics, not router internals.
3.3 Centralized Enforcement, Decentralized Execution
The Portal should centralize:
authentication
authorization
API key handling
rate limiting
quota checks
billing usage capture
product policies
The Cortensor router fleet remains the backend for:
inference execution
routing into the network
session-level orchestration
future validation / delegation / advanced agent flows
4. High-Level Architecture
The recommended high-level flow is:
User → Portal API / Gateway → Portal control logic → router pool → Cortensor router backend
Avoid exposing this shape as the primary product surface:
User → raw router node
4.1 Logical Layers
Portal / Product Layer
User-facing application layer for:
auth and account management
API key CRUD
usage dashboard
model catalog / documentation
quotas / plans / billing visibility
playgrounds and future product UX
Control / Gateway Layer
Hot-path request layer responsible for:
validating Portal API keys
checking model entitlement
applying rate limits
applying quota checks
selecting backend router pool
request normalization
usage event emission
stable response shaping
Router Fleet / Inference Layer
Managed Cortensor backend consisting of:
multiple router nodes
model-specific router pools
dedicated-backed session groups underneath
load balancing / health / draining logic
future burst / fallback / hybrid capacity logic
5. System Components
5.1 Portal Frontend
The frontend web application should provide:
sign-in / auth
org/account views
API key creation and revocation
usage and limits dashboard
model catalog
docs / examples / playground
future billing / credits management
5.2 Portal Backend
Portal app backend responsibilities:
user/org session handling
API key metadata CRUD
dashboard APIs
usage summaries
billing/account state
admin tooling
coordination with gateway and durable storage
5.3 Inference Gateway
The gateway is the critical hot-path service. Responsibilities:
receive customer inference requests
validate Portal API keys
look up entitlement / plan access
enforce rate limits
enforce quotas
map Portal model alias → router pool
forward request to router fleet
normalize / wrap responses
emit usage and metering events
stamp request IDs and trace metadata
5.4 Router Fleet Manager
Whether implemented as a separate service or as internal control logic, the system needs a router-fleet management function that understands:
router health
router pool health
session-to-model mapping
live vs draining pools
degraded pools
failover behavior
future autoscaling / replacement logic
5.5 Durable Product Data Store
Durable storage is needed for:
identity
orgs
API key metadata
plan state
usage ledgers
request logs
router pool metadata
model access policies
Supabase is a strong candidate here.
5.6 Hot-Path Runtime State Layer
A separate fast state layer should be used for:
rate limiting
burst smoothing
short-TTL key policy cache
short-window request counters
fast deny/allow checks
Examples:
Redis
Upstash
managed equivalent fast key-value system
6. Product Positioning
6.1 Initial Positioning
For the first Portal milestone, the product should be framed as a:
Hosted inference product
rather than:
Managed gateway into the Cortensor network
Reasoning:
easier to explain
easier to support
easier to price
easier to keep consistent while backend evolves
reduces exposure of raw router/session/network concepts too early
6.2 Longer-Term Positioning Evolution
As the Portal matures, it can gradually expose more Cortensor-native concepts where useful, especially for:
advanced routing control
performance / trust tiers
agent-native workflows
future validation / delegation APIs
hybrid dedicated + ephemeral offerings
The roadmap should therefore evolve from:
V1: hosted inference first
to:
later versions: a managed gateway into Cortensor network capabilities, where product maturity supports it
7. Suggested Model Offering Strategy
7.1 V1 Model Catalog
V1 should start with a narrow and curated model catalog.
Initial candidates:
OSS 20B
OSS 120B
Gemma 4 family
newer Qwen family
7.2 Why Start Narrow
Reasons to start with a smaller set:
easier operational predictability
easier debugging
simpler support burden
cleaner pricing/quota design
clearer product messaging
lower risk than exposing every model supported by raw Cortensor infra
7.3 Model Pool Abstraction
Internally, model families should map to router pools.
Example conceptual pools:
oss-20b pool
oss-120b pool
gemma4 pool
qwen pool
Then the gateway resolves a Portal-facing model alias to a backend pool.
Example:
customer requests gemma-4-26b
gateway maps it to the gemma4 router pool
load balancer / fleet manager selects a healthy router in that pool
router uses dedicated-backed sessions underneath
This keeps internal topology hidden from customers.
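The alias-to-pool resolution described above could be sketched roughly as follows. This is a minimal illustration, not a defined Portal interface: the alias table, the Router and RouterPool types, the pool state names, and select_router are all hypothetical names introduced for the example, and random selection stands in for whatever load-balancing policy the fleet manager eventually uses.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Router:
    router_id: str
    healthy: bool = True


@dataclass
class RouterPool:
    name: str
    state: str = "live"  # assumed states: "live", "draining", "degraded"
    routers: list = field(default_factory=list)


# Hypothetical Portal-facing aliases mapped to internal pool names.
MODEL_ALIAS_TO_POOL = {
    "gemma-4-26b": "gemma4",
    "oss-20b": "oss-20b",
}


class UnknownModelError(Exception):
    """The requested alias is not in the Portal catalog."""


class NoCapacityError(Exception):
    """The pool exists but has no live, healthy router right now."""


def select_router(alias, pools, rng=random):
    """Resolve a Portal model alias to one healthy router in its pool."""
    pool_name = MODEL_ALIAS_TO_POOL.get(alias)
    if pool_name is None:
        raise UnknownModelError(alias)
    pool = pools.get(pool_name)
    if pool is None or pool.state != "live":
        raise NoCapacityError(pool_name)
    candidates = [r for r in pool.routers if r.healthy]
    if not candidates:
        raise NoCapacityError(pool_name)
    # Random choice is a placeholder for a real balancing policy.
    return rng.choice(candidates)
```

Because customers only ever see the alias, the pool name, router IDs, and pool state can change without any change to the customer-facing request.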
7.4 Later Model Expansion
Later milestones can expand model coverage in stages:
more open-source families
tiered performance variants
possible premium / trust-optimized models
selective hybrid or burst-backed offerings
Model expansion should be tied to:
operational maturity
quota/billing clarity
supportability
8. Capacity Strategy – Dedicated vs Ephemeral
8.1 V1 Recommendation: Dedicated-Backed First
For the first hosted version, the recommendation is:
Use dedicated-backed router pools for the primary product path.
Reasons:
more predictable latency
more consistent output behavior
easier quota/billing semantics
easier failure analysis
better supportability
easier to set customer expectations
8.2 Capacity Roadmap
The Portal capacity strategy should evolve in phases:
V1: dedicated-backed hosted inference
V2: hybrid bursting for selected lower-cost or flexible tiers
V3: more dynamic marketplace-like capacity strategy
8.3 Tradeoff Summary
Dedicated nodes
cleaner
more predictable
more supportable
better for early commercial productization
Ephemeral nodes
more flexible
potentially more cost-efficient
better fit for dynamic capacity markets
but introduces more variability and operational complexity
9. Supabase in the Architecture
Supabase is a reasonable choice for the Portal-side source of truth.
9.1 Good Uses for Supabase
Supabase fits well for:
user auth
organizations
org membership
API key metadata
model access policies
router pool metadata
usage events
request logs
billing ledger
subscriptions / credits / plan state
internal/admin metadata
9.2 Likely Early Tables / Entities
users
organizations
organization_members
api_keys
model_access_policies
router_pools
router_pool_models
usage_events
request_logs
billing_ledger
plan_subscriptions
9.3 Important Caveat
Supabase should be the durable source of truth, but not the only hot-path enforcement layer.
This distinction matters because raw DB checks on every request do not scale well operationally.
10. Hot Path Enforcement Model
10.1 Why Not Raw DB Reads Per Request
Naively checking the DB on every inference request for:
key validity
plan access
model entitlement
quota exhaustion
rate limit counters
may work for a very small MVP, but becomes painful quickly.
Problems:
higher latency
DB contention on bursty traffic
fragile rate-limit counters
difficult horizontal scaling
awkward burst handling
slower request-time policy decisions
10.2 Recommended Split
Use the DB for:
key metadata
org/plan state
source-of-truth quotas
durable usage ledger
revocation state
durable billing state
Use a runtime fast layer for:
rate limiting
burst control
short-window counters
hot key policy cache
fast request allow/deny decisions
This is the core Portal hot-path principle.
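One way to picture the split is a short-TTL cache that sits in front of the durable store, so most requests never touch the DB. The sketch below is an in-process illustration only: KeyPolicyCache, the load_from_db callable, and the 30-second TTL are assumptions made for the example, and a real deployment would more likely hold this state in Redis, Upstash, or an equivalent shared layer.

```python
import time


class KeyPolicyCache:
    """Short-TTL cache of per-key policy, backed by a durable lookup."""

    def __init__(self, load_from_db, ttl_seconds=30.0, clock=time.monotonic):
        self._load = load_from_db      # hypothetical durable-store lookup
        self._ttl = ttl_seconds        # assumed TTL; tune per product needs
        self._clock = clock
        self._entries = {}             # key_hash -> (expires_at, policy)

    def get(self, key_hash):
        now = self._clock()
        entry = self._entries.get(key_hash)
        if entry is not None and entry[0] > now:
            return entry[1]            # fresh hit: no DB read
        policy = self._load(key_hash)  # miss or expired: read source of truth
        # Unknown keys (policy is None) are cached too, which also
        # shields the DB from repeated lookups of invalid keys.
        self._entries[key_hash] = (now + self._ttl, policy)
        return policy

    def invalidate(self, key_hash):
        """Drop an entry early, for example right after a key is revoked."""
        self._entries.pop(key_hash, None)
```

The TTL bounds how long a revoked key or changed plan can remain usable from cache, so explicit invalidation on revocation is worth wiring in even for an MVP.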
11. Rate Limit, Quota, and Billing
These should be treated as separate product primitives and rolled out with increasing sophistication over time.
11.1 Rate Limit
Purpose:
anti-abuse
load smoothing
burst control
Examples:
requests per second
requests per minute
concurrent request caps
Operationally:
enforced in the gateway hot path
backed by fast runtime state
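As an illustration of hot-path rate limiting, a token bucket is one common technique that covers both a steady rate and a bounded burst. The spec does not prescribe an algorithm, so this is an assumed choice, and the TokenBucket class below keeps state in process memory purely for clarity; in the gateway the same counters would live in the fast runtime state layer.

```python
import time


class TokenBucket:
    """Token bucket: refills at a steady rate, allows bursts up to capacity."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = float(rate_per_sec)   # tokens added per second
        self.capacity = float(burst)      # maximum tokens held at once
        self.tokens = float(burst)        # start full so a new key can burst
        self.clock = clock
        self.updated_at = clock()

    def allow(self, cost=1.0):
        """Return True and spend tokens if the request fits, else False."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.updated_at = now
        # Refill for the time elapsed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A gateway would typically keep one bucket per API key (and possibly per model), and map a False result to a rate-limit error response.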
11.2 Quota
Purpose:
enforce plan consumption limits
implement prepaid / monthly limits
Examples:
requests per month
tokens per month
daily image generations
monthly credit caps
Operationally:
durable source of truth in DB/ledger
hot-path checks can rely on cached snapshots
should not depend on naive synchronous DB increments for every request at scale
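A cached quota snapshot might look like the sketch below. It assumes the gateway holds a periodically refreshed copy of durable usage and tracks locally reserved units in between; QuotaSnapshot and its field names are hypothetical, and the refresh step assumes that previously reserved usage has already been flushed to the ledger by the time the durable figure is re-read.

```python
from dataclasses import dataclass


@dataclass
class QuotaSnapshot:
    limit: int                # plan allowance from the durable store
    used_at_snapshot: int     # durable usage when the snapshot was taken
    pending: int = 0          # units reserved locally since then

    def remaining(self):
        return self.limit - self.used_at_snapshot - self.pending

    def try_reserve(self, units):
        """Reserve units against the cached view without a DB write."""
        if units > self.remaining():
            return False
        self.pending += units
        return True

    def refresh(self, durable_used):
        """Re-sync from the ledger; assumes pending usage was flushed first."""
        self.used_at_snapshot = durable_used
        self.pending = 0
```

With several gateway instances each holding its own snapshot, a key can overshoot its quota slightly between refreshes; whether that is acceptable is a product decision tied to the quota questions later in this document.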
11.3 Billing Ledger
Purpose:
durable auditable source for accounting and product logic
Examples:
request count
token count
model-specific usage
premium-tier surcharges
latency / SLA dimensions
Operationally:
durable store
potentially written asynchronously from gateway usage events
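Asynchronous ledger writes could work roughly as follows: the gateway enqueues a usage event per request, and a separate worker folds events into ledger totals. This is a sketch under stated assumptions. UsageEvent, new_ledger, and drain_into_ledger are hypothetical names, an in-memory queue and dictionary stand in for a real event queue and the durable billing_ledger table, and deduplication by request ID is an added assumption to make redelivery safe.

```python
import queue
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    request_id: str
    key_id: str
    model: str
    tokens: int


def new_ledger():
    """Per-(key, model) totals; a stand-in for a durable ledger table."""
    return defaultdict(lambda: {"requests": 0, "tokens": 0})


def drain_into_ledger(events, ledger, seen_request_ids):
    """Fold all queued events into the ledger; return how many were applied."""
    applied = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return applied
        if event.request_id in seen_request_ids:
            continue  # duplicate delivery: never bill the same request twice
        seen_request_ids.add(event.request_id)
        row = ledger[(event.key_id, event.model)]
        row["requests"] += 1
        row["tokens"] += event.tokens
        applied += 1
```

Keeping the request ID on every event is what makes the ledger auditable and lets later reconciliation jobs match gateway records against router records.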
11.4 Billing Milestone Guidance
Suggested rollout:
V1: prepaid credits or monthly quota
V2: refined quota tiers / model-specific pricing
V3+: richer billing dimensions if product traction justifies it
12. API Key Architecture
The Portal should expose Portal-native API keys, not raw router-native credentials.
12.1 Recommended Key Design
only store hashed keys
keep short prefixes for display / search
support:
revocation
expiration
scopes
environment segmentation
Suggested key families:
pk_live_xxx
pk_test_xxx
Possible scopes:
inference only
usage read-only
model-specific access
environment-specific access
This keeps the customer-facing key format independent of router credentials and lets the gateway add enforcement rules later without changing how customers use their keys.
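The hashed-key design could be sketched with the Python standard library as shown below. The record layout, the 12-character display prefix, and the function names are assumptions for illustration. Plain SHA-256 is used on the assumption that keys are long random secrets; it would not be appropriate for low-entropy values such as passwords.

```python
import hashlib
import hmac
import secrets


def generate_api_key(environment="live"):
    """Create a key; return (plaintext, record). Only the record is stored."""
    secret = secrets.token_urlsafe(32)
    plaintext = f"pk_{environment}_{secret}"
    record = {
        "prefix": plaintext[:12],  # short prefix for display / search
        "key_hash": hashlib.sha256(plaintext.encode()).hexdigest(),
        "environment": environment,
        "revoked": False,
    }
    return plaintext, record


def verify_api_key(presented, record):
    """Check a presented key against a stored record."""
    if record["revoked"]:
        return False
    digest = hashlib.sha256(presented.encode()).hexdigest()
    # Constant-time comparison avoids leaking hash bytes through timing.
    return hmac.compare_digest(digest, record["key_hash"])
```

Under this design the plaintext key would be shown to the user once at creation time and never stored, so a leaked database would not expose usable keys.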
12.2 Portal Key Flow
customer uses Portal API key
gateway verifies key and status
gateway resolves metadata / entitlement
gateway never passes customer key directly to raw router nodes
13. Logical Service Split
Even if V1 ships as one deployable codebase, the design should be organized around three logical services.
13.1 Portal App Backend
Responsibilities:
auth
API key CRUD
dashboard
org/account management
admin tooling
usage views
13.2 Inference Gateway
Responsibilities:
receive inference calls
authenticate keys
apply rate limits / quota checks
map model alias to router pool
normalize responses
emit request / usage events
13.3 Router Fleet Manager
Responsibilities:
router health awareness
router pool state
draining / maintenance / failover
future autoscaling
router replacement
pool scoring and selection metadata
This logical split keeps product logic separate from infrastructure logic even if early implementation is consolidated.
14. Gateway Strategy Roadmap
The architecture should allow several gateway implementations, but the spec should define a milestone path rather than present them as permanent parallel choices.
14.1 Milestone 1 – Fast Product Launch Path
A practical early milestone can use:
Supabase
lightweight Portal backend
managed API key / rate-limit support such as Unkey or similar
managed router pools underneath
Purpose:
accelerate first external Portal MVP
minimize time to first commercial product surface
14.2 Milestone 2 – Portal-Controlled Gateway Layer
As Portal becomes more central, the recommended architecture shifts toward:
Supabase for durable product/account state
custom lightweight gateway
Redis / Upstash / equivalent for runtime hot-path enforcement
managed router pools underneath
Purpose:
preserve custom control over:
model routing
quota semantics
usage and billing logic
response normalization
reduce dependence on third-party gateway product semantics
14.3 Milestone 3 – More Advanced Gateway / Edge / Enterprise Paths
Later versions may add or evaluate:
edge gateway deployments (e.g. Cloudflare-style)
managed gateway products where useful
enterprise-style gateways if needed
more advanced policy / analytics / audit controls
These should be treated as later roadmap expansions, not early blockers.
15. Recommended Architecture Milestones
15.1 V1 – Hosted Inference Portal
Target shape:
Supabase for auth and durable product DB
lightweight gateway layer
fast runtime state for rate limits
managed router pools
dedicated-backed sessions per model family
narrow model catalog
simple prepaid credits or quota
Primary objective:
launch a usable hosted inference product with strong operational clarity
15.2 V2 – Hybrid Productization
Target additions:
hybrid dedicated + selective burst capacity
richer quota and billing semantics
stronger router fleet scoring and traffic shifting
improved admin and operational tooling
broader but still curated model catalog
Primary objective:
improve efficiency and product flexibility without breaking the Portal-facing API
15.3 V3 – More Dynamic Capacity & Network Exposure
Target additions:
more dynamic capacity strategies
deeper Cortensor-native routing visibility for advanced users
possible exposure of trust/performance tiers
more agent-native or programmable product surfaces
Primary objective:
evolve from hosted inference product into a more powerful managed gateway into the Cortensor network where appropriate
16. Request Flow (Recommended V1 Baseline)
Suggested request flow:
client sends request with Portal API key
gateway validates key and key status
gateway checks cached entitlement / key policy
gateway enforces rate limit via fast runtime state
gateway checks quota / plan allowance
gateway maps requested model alias to router pool
gateway forwards request to chosen router group
router serves inference via managed dedicated session(s)
gateway emits usage / metering event
durable usage ledger and request logs are updated
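The steps above can be sketched as a single gateway handler with each dependency injected. Everything here is illustrative: GatewayDeps, handle_request, the policy dictionary shape, the error names, and the HTTP-style status codes are assumptions made for the example rather than a defined Portal API.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GatewayDeps:
    lookup_policy: Callable[[str], Optional[dict]]  # key validation + cached policy
    allow_rate: Callable[[str], bool]               # rate limit via fast state
    reserve_quota: Callable[[str], bool]            # quota / plan allowance
    resolve_pool: Callable[[str], Optional[str]]    # model alias -> router pool
    forward: Callable[[str, dict], dict]            # call into the router fleet
    emit_usage: Callable[[dict], None]              # usage / metering event


def _error(status, request_id, code):
    return {"status": status, "request_id": request_id, "error": code}


def handle_request(api_key, model, payload, deps):
    request_id = str(uuid.uuid4())  # stamped on every response for tracing
    policy = deps.lookup_policy(api_key)
    if policy is None or policy.get("revoked"):
        return _error(401, request_id, "invalid_api_key")
    if model not in policy["allowed_models"]:
        return _error(403, request_id, "model_not_entitled")
    if not deps.allow_rate(policy["key_id"]):
        return _error(429, request_id, "rate_limited")
    if not deps.reserve_quota(policy["key_id"]):
        return _error(429, request_id, "quota_exceeded")
    pool = deps.resolve_pool(model)
    if pool is None:
        return _error(503, request_id, "no_capacity")
    output = deps.forward(pool, payload)
    # The customer key is never passed to the router; only pool and payload are.
    deps.emit_usage({"request_id": request_id, "key_id": policy["key_id"], "model": model})
    return {"status": 200, "request_id": request_id, "output": output}
```

Cheap checks run before expensive ones, so a revoked key or exhausted quota is rejected without touching the router fleet.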
Optional later refinements:
asynchronous billing ledger aggregation
event queue between gateway and usage processor
reconciliation jobs comparing gateway records and router completion records
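A reconciliation job of the kind mentioned above might, in its simplest form, compare the two record sets by request ID. The sketch assumes both sides can be reduced to a mapping from request ID to token count; the reconcile function and its report keys are hypothetical.

```python
def reconcile(gateway_records, router_records):
    """Compare {request_id: tokens} maps from the gateway and the routers."""
    gateway_ids = set(gateway_records)
    router_ids = set(router_records)
    shared = gateway_ids & router_ids
    return {
        # Metered by the gateway, but no router completion was recorded.
        "gateway_only": sorted(gateway_ids - router_ids),
        # Completed by a router, but never metered by the gateway.
        "router_only": sorted(router_ids - gateway_ids),
        # Present on both sides with different token counts.
        "token_mismatch": sorted(
            rid for rid in shared if gateway_records[rid] != router_records[rid]
        ),
    }
```

How each category is resolved (refund, re-bill, or flag for review) is a policy choice this sketch leaves open.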
17. Suggested Portal MVP Scope
17.1 User-Facing V1 Features
sign-in / auth
org/account identity
API key create / revoke
simple docs / API reference
minimal usage dashboard
a small set of launch models
basic quota / error messaging
17.2 Backend V1 Features
API key verification
revocation support
per-key rate limiting
simple quota checks
model allowlist / entitlement
request logging with request IDs
router pool selection
basic admin visibility
17.3 V1 Billing Primitive
Keep billing simple initially:
prepaid credits, or
monthly quota / subscription
Avoid launching with too many billing dimensions too early unless there is a strong product reason.
18. Operational Requirements
Even a narrow V1 should include basic operability from the beginning.
18.1 Strongly Recommended Early Capabilities
request ID tracing
per-key usage metering
per-model latency dashboards
per-model error dashboards
per-router-pool health dashboards
admin ability to disable a model quickly
admin ability to drain / pause a router pool quickly
These are necessary for real product operation, not just a technical demo.
19. Risks to Watch
19.1 Product / Architecture Risks
mixing business logic into router nodes
exposing raw session/router topology too early
supporting too many models too early
making direct DB reads part of every hot-path decision
underbuilding observability
unclear quota / billing semantics
19.2 Infrastructure Risks
weak router pool health awareness
poor failover behavior
no clear separation between customer API and backend router fleet
difficulty reconciling usage / completion state if metering is underdesigned
19.3 Product Experience Risks
too much Cortensor-native vocabulary in user-facing APIs/docs
customer confusion about models / tiers / limits
internal backend behavior leaking into the product surface
20. Roadmap Guidance
20.1 Near-Term Product Direction
The immediate roadmap should favor:
hosted inference first
dedicated-backed router pools
narrow model set
centralized Portal control plane
simple product semantics
strong observability from day one
20.2 Medium-Term Direction
Once the core hosted inference product is stable, the roadmap should add:
stronger custom gateway control
hybrid capacity tiers
richer metering / quota / pricing logic
more operational automation around router fleets
20.3 Long-Term Direction
As the Portal matures, it can evolve toward:
more dynamic capacity strategies
deeper Cortensor-native features for advanced users
more sophisticated product tiers and policy knobs
broader agent-ready and network-aware surfaces
The roadmap should therefore preserve architectural control and replaceability from the beginning, even if early milestones remain intentionally simple.
21. Open Questions for Next Iteration
21.1 Product Questions
Should V1 be framed entirely as hosted inference, or should some Cortensor-native concepts remain visible?
What should the first pricing primitive be?
prepaid
subscription quota
hybrid
Should Portal expose latency/performance tiers in V1?
21.2 Gateway Questions
How quickly should Portal move from MVP gateway choices into a more custom gateway layer?
Should per-key quota be:
request-based
token-based
both
How much response normalization should happen in the Portal vs pass-through from router responses?
21.3 Fleet Questions
How should router pool health be tracked and scored?
How should traffic shift during degradation or maintenance?
When should ephemeral capacity be introduced, and for which product tiers?
21.4 Data / Metering Questions
What should be stored synchronously vs asynchronously?
How should usage reconciliation be handled if gateway success and router completion differ?
How should model access policies be versioned when offerings change?
22. Working Summary
Cortensor Portal should be built as a managed hosted API product in front of router fleets, not as a thin dashboard directly attached to a router node.
A strong V1 architecture is:
Supabase for auth and durable product state
a dedicated gateway layer for customer inference traffic
a fast runtime state layer for rate limiting and short-window policy checks
managed router pools behind load balancers
dedicated-backed session groups for a small model catalog
Later milestones can expand into hybrid capacity, richer pricing/quotas, and more network-native controls.
This keeps the Portal API stable while allowing the underlying Cortensor router/session topology to evolve over time.