Cortensor Portal V1 Detailed Spec 04 - Free Plan / Rate Limits / Gateway
Status
Draft
Purpose
This document defines the request-time enforcement layer for Cortensor Portal V1.
It covers:
free-plan behavior
rate-limit model
quota checks
gateway responsibilities
request flow in front of managed router pools
This spec focuses on the Portal gateway and hot-path enforcement behavior, not the full billing system or the full backend implementation.
1. Summary
Portal V1 should use a simple hosted gateway layer in front of managed router pools.
That gateway should handle:
API key verification
revocation awareness
rate limiting
account or org quota checks
model entitlement checks
router-pool resolution
request forwarding
usage event emission
The V1 free-plan and rate-limit model should borrow the sliding-window mental model already used elsewhere in Cortensor.
In short:
The gateway is the product-facing enforcement layer between customer API requests and managed router pools.
2. Goals
Portal V1 gateway and free-plan design should aim to:
keep request-time enforcement fast
keep allowance logic understandable
avoid hot-path dependence on many raw DB reads
give users clear and stable limits
make V1 enforcement simple enough to launch confidently
preserve room for richer quota and billing semantics later
3. Non-Goals
The following are intentionally out of scope or low-priority for V1:
heavy enterprise gateway platform design
deep dynamic policy engine
very complex multi-dimensional pricing logic
exposing advanced user-controlled routing behavior
building a highly customizable per-request billing layer from day one
V1 should prioritize clarity and reliability over flexibility.
4. Recommended V1 Free-Plan Shape
V1 should assume a simple free-plan allowance model such as:
request or task cap
short reset window
weekly allowance
The exact numeric thresholds can be finalized later.
The important design principle is that the model should be:
simple enough to explain
simple enough to enforce
easy to inspect and debug
compatible with future upgrades to paid plans
Recommended V1 Free-Plan Characteristics
The first release should emphasize:
one clear allowance model
visible reset timing
visible remaining headroom
stable error behavior when allowance is exhausted
V1 should avoid:
multiple overlapping free tiers
too many different quota dimensions
confusing “hidden” soft limits
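As a rough illustration of one allowance model with visible reset timing and remaining headroom, the sketch below assumes a request-based weekly allowance with a fixed window. The class name `Allowance`, its fields, and the cap used in the usage note are hypothetical; the spec leaves both the numeric thresholds and the allowance unit open.

```python
from dataclasses import dataclass

WEEK_SECONDS = 7 * 24 * 3600


@dataclass
class Allowance:
    """Hypothetical weekly request allowance for one account."""
    cap: int              # assumed cap; real thresholds are not finalized
    window_start: float   # epoch seconds when the current week began
    used: int = 0

    def _roll(self, now: float) -> None:
        # Start a fresh window once the current one has fully elapsed.
        if now >= self.window_start + WEEK_SECONDS:
            elapsed = int((now - self.window_start) // WEEK_SECONDS)
            self.window_start += elapsed * WEEK_SECONDS
            self.used = 0

    def try_consume(self, now: float, cost: int = 1) -> bool:
        """Consume allowance if available; return False when exhausted."""
        self._roll(now)
        if self.used + cost > self.cap:
            return False
        self.used += cost
        return True

    def status(self, now: float) -> dict:
        """Expose the two values a user needs: headroom and reset time."""
        self._roll(now)
        return {
            "remaining": self.cap - self.used,
            "resets_at": self.window_start + WEEK_SECONDS,
        }
```

A caller might create `Allowance(cap=1000, window_start=week_start)` per account and surface `status()` in the usage page, so that the exhausted case always comes with a concrete reset time.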
5. Rate-Limit Model
The current best V1 assumption is:
sliding-window logic
per-key rate limit
account-level or org-level quota overlay
This means Portal should distinguish between:
rate limits: short-window anti-abuse / load-shaping controls
quota / allowance: broader free-plan or plan-based consumption allowance
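The per-key sliding-window idea can be sketched as a timestamp log per key. This is a minimal in-memory, single-process illustration, not the mechanism Portal or Unkey actually uses; the class name `SlidingWindowLimiter` and its parameters are hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Per-key sliding-window log limiter (in-memory, single process)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        # key id -> timestamps of accepted requests, oldest first
        self._hits = defaultdict(deque)

    def allow(self, key_id: str, now: float) -> bool:
        hits = self._hits[key_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False   # rejected requests are not recorded
        hits.append(now)
        return True
```

Because the window slides with each request instead of resetting at fixed boundaries, a key cannot double its burst by straddling a reset. The account-level quota stays a separate check layered on top of this one.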
Gateway Should Check
Before forwarding a request to a router pool, the gateway should check:
key validity
key status
rate-limit allowance
plan allowance
model entitlement
These checks should happen in a consistent order so the request path is predictable and debuggable.
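One way to keep the order fixed is to hold the checks in a single ordered list and return the first failure. The sketch below assumes the facts for each check have already been gathered into a context object; `RequestContext`, the reason strings, and `first_denial` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical per-request facts the gateway has already gathered."""
    key_known: bool
    key_active: bool
    within_rate_limit: bool
    has_allowance: bool
    model_allowed: bool


# Each entry pairs a denial reason with a predicate that must hold.
# The list order is the enforcement order.
CHECKS: list[tuple[str, Callable[[RequestContext], bool]]] = [
    ("invalid_key", lambda c: c.key_known),
    ("revoked_key", lambda c: c.key_active),
    ("rate_limit_reached", lambda c: c.within_rate_limit),
    ("quota_exhausted", lambda c: c.has_allowance),
    ("model_not_allowed", lambda c: c.model_allowed),
]


def first_denial(ctx: RequestContext) -> Optional[str]:
    """Run checks in a fixed order and return the first failure, if any."""
    for reason, passes in CHECKS:
        if not passes(ctx):
            return reason
    return None
```

With a single ordered list, a request that fails several checks always reports the same reason, which keeps error behavior stable and logs easy to compare.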
6. Gateway Responsibilities
The V1 gateway should own the request-time responsibilities that make the hosted API behave like a real product surface.
Core responsibilities
verify Portal API key
reject revoked or invalid keys
enforce per-key rate limit
check free-plan or paid-plan allowance
check model entitlement
map product model alias to router pool
generate request ID
forward request to healthy backend target
normalize response shape if needed
emit usage event
Important design rule
The gateway should be the place where:
user-facing request policy is enforced
backend router topology is hidden
request-time product semantics remain stable
The gateway should not be a thin pass-through to routers.
7. Recommended Request Flow
The intended V1 request flow is:
client sends request with Portal API key
gateway verifies the key
gateway checks key status
gateway enforces rate limit
gateway checks account or org quota
gateway checks model entitlement
gateway resolves model alias to router pool
gateway forwards request to a healthy target in the selected router pool
backend inference executes
usage event is emitted
durable usage and billing state is updated
In simplified form:
Client → Portal gateway → key / rate-limit / quota / entitlement checks → router pool → usage writeback
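The flow above can be sketched as one handler that takes its collaborators as plain callables. All names here (`handle_request`, `UsageEvent`, the callable parameters, the error strings) are hypothetical. The sketch also assumes that denied requests emit no usage event and that failed backend calls do; the spec does not settle either point.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UsageEvent:
    request_id: str
    key_id: str
    model: str
    ok: bool


def handle_request(
    key_id: str,
    model: str,
    payload: dict,
    check: Callable[[str, str], Optional[str]],   # denial reason or None
    resolve_pool: Callable[[str], str],           # model alias -> pool id
    forward: Callable[[str, dict], dict],         # (pool id, payload) -> response
    emit: Callable[[UsageEvent], None],           # hand-off to usage pipeline
) -> dict:
    request_id = str(uuid.uuid4())
    denial = check(key_id, model)
    if denial is not None:
        # Denied requests never reach a router pool.
        return {"request_id": request_id, "error": denial}
    pool_id = resolve_pool(model)
    try:
        result = forward(pool_id, payload)
        ok = True
    except Exception:
        # Hide backend details from the caller; only the outcome is kept.
        result, ok = None, False
    emit(UsageEvent(request_id, key_id, model, ok))
    if not ok:
        return {"request_id": request_id, "error": "backend_error"}
    return {"request_id": request_id, "result": result}
```

In a real deployment `emit` would likely enqueue the event for asynchronous writeback so that the durable usage and billing update stays off the hot path.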
8. Relationship to Unkey
Unkey is a strong V1 fit for:
key verification
revocation support
rate limiting
fast request-path enforcement support
Portal backend must still own:
plan logic
quota semantics
model entitlement
router pool resolution
durable usage state
request and response normalization
Practical split
Unkey should support:
request-time key validity
revoked vs active status
rate-limit enforcement
basic key-scoped request controls
Portal backend / gateway should still own:
whether the account still has usage allowance
whether the account is allowed to call a given model
where the request should be routed
what usage events should be written durably
Unkey is a supporting enforcement layer, not the full Portal control plane.
9. Fast State vs Durable State
Portal V1 should distinguish between fast state and durable state.
9.1 Fast State
Used for:
hot-path verification
rate-limit counters
short-window checks
cached allow/deny information
Typical examples:
per-key request counters in the current window
temporary burst-control state
hot cache of key status or entitlement summary
9.2 Durable State
Used for:
account source of truth
usage ledger
billing ledger
subscriptions
policy metadata
router-pool metadata
request logs
This separation keeps the request path practical:
fast state supports low-latency enforcement
durable state supports correctness, audits, and later billing reconciliation
9.3 Design Principle
The gateway should not rely on multiple raw durable DB reads on every request if that can be avoided.
Instead:
durable state remains the source of truth
fast state supports request-time enforcement and smoothing
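A small read-through cache illustrates the split: durable state is consulted only on a miss or after expiry, and fast state answers the rest. This is a sketch under assumptions; `TTLCache`, the loader callable, and the TTL value are hypothetical, and a production gateway would more likely use a shared store than process memory.

```python
from typing import Callable, Generic, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Tiny read-through cache: fast state in front of a durable lookup."""

    def __init__(self, load: Callable[[str], V], ttl_seconds: float) -> None:
        self._load = load          # durable read, e.g. a DB query (hypothetical)
        self._ttl = ttl_seconds
        self._entries: dict = {}   # key -> (expires_at, value)

    def get(self, key: str, now: float) -> V:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # served from fast state
        value = self._load(key)    # fall back to the source of truth
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call on revocation or plan change so stale allows do not linger.
        self._entries.pop(key, None)
```

The TTL bounds how long a stale entitlement or key status can survive, while `invalidate` covers cases such as revocation where waiting for expiry is not acceptable.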
10. Error Semantics
The gateway should produce stable, product-oriented API errors.
Important gateway outcomes include:
invalid key
revoked key
rate limit reached
quota exhausted
model not allowed
no healthy backend target
backend timeout or inference failure
User-facing response principle
The Portal API should present errors in a stable, understandable product format rather than leaking low-level router details unnecessarily.
That means:
clear category
clear reason
obvious next step where possible
Examples of product-facing error categories
authentication error
authorization / plan error
rate-limit error
quota error
routing / availability error
backend execution error
The exact payload shape can be defined later, but the semantics should remain productized.
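As one possible shape, the gateway outcomes listed earlier can be mapped to the product-facing categories through a single table. Everything in this sketch is an assumption made for illustration: the outcome keys, the category strings, the pairing of outcomes to categories, the HTTP status codes, and the payload fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortalError:
    category: str      # stable product-facing category
    reason: str        # short human-readable explanation
    next_step: str     # what the caller can do about it
    http_status: int   # assumed; the spec does not fix status codes


# Gateway outcome -> product-facing error. The pairings are assumptions.
ERRORS = {
    "invalid_key": PortalError(
        "authentication_error", "The API key is not valid.",
        "Check the key value or create a new key.", 401),
    "revoked_key": PortalError(
        "authentication_error", "The API key has been revoked.",
        "Create a new key in the Portal.", 401),
    "rate_limit_reached": PortalError(
        "rate_limit_error", "Too many requests in the current window.",
        "Retry after a short delay.", 429),
    "quota_exhausted": PortalError(
        "quota_error", "The plan allowance is used up.",
        "Wait for the reset or upgrade the plan.", 429),
    "model_not_allowed": PortalError(
        "authorization_error", "The plan does not include this model.",
        "Choose an allowed model or upgrade the plan.", 403),
    "no_healthy_backend": PortalError(
        "availability_error", "No healthy backend target is available.",
        "Retry later.", 503),
    "backend_failure": PortalError(
        "backend_execution_error", "The backend timed out or failed.",
        "Retry the request.", 502),
}


def to_payload(outcome: str, request_id: str) -> dict:
    """Render an outcome as a stable payload with no router details."""
    err = ERRORS[outcome]
    return {
        "status": err.http_status,
        "error": {
            "category": err.category,
            "reason": err.reason,
            "next_step": err.next_step,
            "request_id": request_id,
        },
    }
```

Keeping the mapping in one table means the category, reason, and next step for a given outcome cannot drift between code paths.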
11. Router-Pool Resolution
The gateway should accept product-facing model names such as:
gpt-oss-20b
gpt-oss-120b
gemma-4-26b
qwen-...
It should translate those names into:
router pool identifier
target backend selection
The customer should not need to know:
internal session IDs
router hostnames
pool topology
dedicated session layout
Resolution behavior
At minimum, model resolution should:
validate that the requested model is supported
validate that the caller’s plan allows that model
map model alias to a router pool
select a healthy target inside that pool
forward request
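The resolution steps can be sketched as a single function over two lookup tables. The pool IDs, target names, and the names `Pool`, `ResolutionError`, and `resolve` are hypothetical, and picking the first healthy target stands in for whatever selection policy the gateway ends up using.

```python
from dataclasses import dataclass, field


class ResolutionError(Exception):
    """Raised with a product-facing reason when resolution fails."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


@dataclass
class Pool:
    pool_id: str
    targets: dict = field(default_factory=dict)   # target name -> healthy?


def resolve(model: str, plan_models: set, alias_to_pool: dict, pools: dict) -> tuple:
    """Return (pool id, target name) for a product-facing model alias."""
    if model not in alias_to_pool:
        raise ResolutionError("model_not_supported")
    if model not in plan_models:
        raise ResolutionError("model_not_allowed")
    pool = pools[alias_to_pool[model]]
    healthy = [name for name, ok in pool.targets.items() if ok]
    if not healthy:
        raise ResolutionError("no_healthy_backend_target")
    # Simplest possible policy: first healthy target in insertion order.
    return pool.pool_id, healthy[0]
```

The caller only ever supplies the alias; pool IDs and target names stay inside the gateway, which is what keeps router topology hidden from customers.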
12. V1 Simplicity Recommendations
Keep V1 gateway behavior narrow:
one stable request path
simple quota checks
simple plan rules
dedicated-backed pools only by default
Avoid over-customizing the gateway until real traffic and product patterns justify it.
Good V1 principles
few decision branches
low surprise in enforcement behavior
easy-to-trace request flow
clear separation between key validation, rate limiting, quota allowance, and routing
Avoid in V1
too many product tiers
overly dynamic routing semantics
per-model custom billing behavior everywhere
deeply nested policy trees
13. Open Questions
Questions still worth resolving in later design passes:
should the main allowance unit be request-based, task-based, token-based, or hybrid?
how much rate-limit visibility should be exposed in the Portal UI?
what should be enforced in Unkey vs custom backend logic?
should different model families have different allowance weights in V1?
should free-plan limits be per-user, per-org, per-key, or some combination?
how much request logging should happen synchronously vs asynchronously?
These should be answered with the V1 launch simplicity principle in mind.
14. Relationship to Other Specs
This spec connects directly to:
03-api-key-management.md
05-portal-backend-control-plane.md
06-data-model-and-durable-state.md
07-router-pools-and-model-product-layer.md
08-usage-metering-and-billing.md
It also supports the UI/UX spec because the gateway’s decisions ultimately determine what users see in:
usage pages
quota warnings
API error messaging
key behavior
15. Working Summary
Portal V1 should use a simple hosted gateway layer in front of managed router pools.
That gateway should:
verify keys
enforce rate limits
check plan/quota/model access
route requests to backend pools
emit usage events
hide raw router topology from customers
The V1 free-plan and rate-limit system should remain intentionally simple and should likely reuse the sliding-window allowance model already used elsewhere in Cortensor.
In one sentence:
The Portal V1 gateway is the fast request-time enforcement layer that turns Portal-issued keys and simple allowance rules into a stable hosted API experience in front of managed router pools.