Cortensor Portal V1 Detailed Spec 04 - Free Plan / Rate Limits / Gateway

Status

Draft

Purpose

This document defines the request-time enforcement layer for Cortensor Portal V1.

It covers:

  • free-plan behavior

  • rate-limit model

  • quota checks

  • gateway responsibilities

  • request flow in front of managed router pools

This spec is focused on the Portal gateway and hot-path enforcement behavior, not the full billing system or full backend implementation.


1. Summary

Portal V1 should use a simple hosted gateway layer in front of managed router pools.

That gateway should handle:

  • API key verification

  • revocation awareness

  • rate limiting

  • account or org quota checks

  • model entitlement checks

  • router-pool resolution

  • request forwarding

  • usage event emission

The V1 free-plan and rate-limit model should reuse the sliding-window mental model already used elsewhere in Cortensor.

In short:

The gateway is the product-facing enforcement layer between customer API requests and managed router pools.


2. Goals

The Portal V1 gateway and free-plan design should aim to:

  • keep request-time enforcement fast

  • keep allowance logic understandable

  • avoid hot-path dependence on many raw DB reads

  • give users clear and stable limits

  • make V1 enforcement simple enough to launch confidently

  • preserve room for richer quota and billing semantics later


3. Non-Goals

The following are intentionally out of scope or low-priority for V1:

  • heavy enterprise gateway platform design

  • deep dynamic policy engine

  • very complex multi-dimensional pricing logic

  • exposing advanced user-controlled routing behavior

  • building a highly customizable per-request billing layer from day one

V1 should prioritize clarity and reliability over flexibility.


4. Free-Plan Model

V1 should assume a simple free-plan allowance model such as:

  • request or task cap

  • short reset window

  • weekly allowance

The exact numeric thresholds can be finalized later.

The important design principle is that the model should be:

  • simple enough to explain

  • simple enough to enforce

  • easy to inspect and debug

  • compatible with future upgrades to paid plans

The first release should emphasize:

  • one clear allowance model

  • visible reset timing

  • visible remaining headroom

  • stable error behavior when allowance is exhausted

V1 should avoid:

  • multiple overlapping free tiers

  • too many different quota dimensions

  • confusing “hidden” soft limits
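
The allowance principles above (one model, visible reset timing, visible remaining headroom, stable behavior on exhaustion) can be illustrated with a minimal sketch. It is not the Portal implementation: the class name FreePlanAllowance, its fields, and the numeric values are hypothetical placeholders, and it deliberately uses a fixed window rather than a sliding one so that the reset time is a single value that is easy to display.

```python
from dataclasses import dataclass

WEEK_SECONDS = 7 * 24 * 60 * 60  # assumed weekly window; thresholds are not final


@dataclass
class FreePlanAllowance:
    """Fixed-window allowance: `cap` requests per `window_seconds` (hypothetical names)."""
    cap: int
    window_seconds: int = WEEK_SECONDS
    window_start: float = 0.0
    used: int = 0

    def _roll(self, now: float) -> None:
        # Start a fresh window once the current one has fully elapsed.
        if now >= self.window_start + self.window_seconds:
            elapsed = int((now - self.window_start) // self.window_seconds)
            self.window_start += elapsed * self.window_seconds
            self.used = 0

    def remaining(self, now: float) -> int:
        """Visible headroom left in the current window."""
        self._roll(now)
        return max(self.cap - self.used, 0)

    def resets_at(self, now: float) -> float:
        """Timestamp at which the allowance resets."""
        self._roll(now)
        return self.window_start + self.window_seconds

    def try_consume(self, now: float) -> bool:
        """Consume one unit if headroom remains; otherwise deny without side effects."""
        if self.remaining(now) == 0:
            return False
        self.used += 1
        return True
```

Because remaining() and resets_at() read the same state that try_consume() enforces, the values shown to a user cannot drift from what the gateway actually applies.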


5. Rate-Limit Model

The current best V1 assumption is:

  • sliding-window logic

  • per-key rate limit

  • account-level or org-level quota overlay

This means Portal should distinguish between:

  • rate limits

    • short-window anti-abuse / load-shaping controls

  • quota / allowance

    • broader free-plan or plan-based consumption allowance
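
As a rough illustration of the short-window, per-key side of this split, the sketch below implements a sliding-window log limiter. The class name SlidingWindowLimiter and its parameters are assumptions made for the example; a production gateway would more likely delegate this to a managed service or a shared store than keep timestamps in process memory.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key in any span of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key_id -> timestamps of accepted requests

    def allow(self, key_id: str, now: float) -> bool:
        hits = self._hits[key_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # denied requests are not recorded
        hits.append(now)
        return True
```

The quota overlay is a separate, longer-lived counter at account or org level; keeping the two apart lets a burst be rejected without touching plan allowance.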

Gateway checks

Before forwarding a request to a router pool, the gateway should check:

  • key validity

  • key status

  • rate-limit allowance

  • plan allowance

  • model entitlement

These checks should happen in a consistent order so the request path is predictable and debuggable.
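
One way to keep the order consistent is to express the checks as an ordered table and return the first failure. The following is a sketch under assumptions: RequestContext, the boolean fields, and the reason strings are hypothetical, and in a real gateway each flag would be computed lazily rather than precomputed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestContext:
    """Results of the individual checks (hypothetical shape)."""
    key_valid: bool
    key_active: bool
    within_rate_limit: bool
    within_plan_allowance: bool
    model_entitled: bool


# (denial reason, predicate) pairs in enforcement order; the first failing check wins.
CHECKS = [
    ("invalid_key", lambda c: c.key_valid),
    ("revoked_key", lambda c: c.key_active),
    ("rate_limited", lambda c: c.within_rate_limit),
    ("quota_exhausted", lambda c: c.within_plan_allowance),
    ("model_not_allowed", lambda c: c.model_entitled),
]


def first_denial(ctx: RequestContext) -> Optional[str]:
    """Return the first denial reason in order, or None if the request may proceed."""
    for reason, passes in CHECKS:
        if not passes(ctx):
            return reason
    return None
```

With a fixed order, a request that fails several checks always reports the same reason, which makes logs and support cases easier to interpret.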


6. Gateway Responsibilities

The V1 gateway should own the request-time responsibilities that make the hosted API behave like a real product surface.

Core responsibilities

  • verify Portal API key

  • reject revoked or invalid keys

  • enforce per-key rate limit

  • check free-plan or paid-plan allowance

  • check model entitlement

  • map product model alias to router pool

  • generate request ID

  • forward request to healthy backend target

  • normalize response shape if needed

  • emit usage event

Important design rule

The gateway should be the place where:

  • user-facing request policy is enforced

  • backend router topology is hidden

  • request-time product semantics remain stable

The gateway should not be a thin pass-through to routers.


7. Request Flow

The intended V1 request flow is:

  1. client sends request with Portal API key

  2. gateway verifies the key

  3. gateway checks key status

  4. gateway enforces rate limit

  5. gateway checks account or org quota

  6. gateway checks model entitlement

  7. gateway resolves model alias to router pool

  8. gateway forwards request to selected router group

  9. backend inference executes

  10. usage event is emitted

  11. durable usage and billing state is updated

In simplified form:

Client → Portal gateway → key / rate-limit / quota / entitlement checks → router pool → usage writeback
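
The last two steps (usage event emission, then durable writeback) can be decoupled so that the response does not wait on the ledger. The sketch below assumes an in-memory buffer and a Counter standing in for durable storage; UsageEvent, UsagePipeline, and the field names are hypothetical, and whether writeback is synchronous or asynchronous remains an open question in this spec.

```python
import uuid
from collections import Counter, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    request_id: str
    account_id: str
    model: str
    units: int


class UsagePipeline:
    def __init__(self) -> None:
        self.pending = deque()   # fast buffer filled on the request path
        self.ledger = Counter()  # stand-in for durable per-account usage
        self._applied = set()    # request IDs already written

    def emit(self, account_id: str, model: str, units: int = 1) -> UsageEvent:
        """Called by the gateway after the backend responds."""
        event = UsageEvent(str(uuid.uuid4()), account_id, model, units)
        self.pending.append(event)
        return event

    def flush(self) -> int:
        """Apply buffered events to the ledger; duplicates by request ID are skipped."""
        applied = 0
        while self.pending:
            event = self.pending.popleft()
            if event.request_id in self._applied:
                continue
            self._applied.add(event.request_id)
            self.ledger[event.account_id] += event.units
            applied += 1
        return applied
```

Deduplicating on the request ID means a retried or replayed event does not double-count usage.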


8. Relationship to Unkey

Unkey is a strong V1 fit for:

  • key verification

  • revocation support

  • rate limiting

  • fast request-path enforcement support

Portal backend must still own:

  • plan logic

  • quota semantics

  • model entitlement

  • router pool resolution

  • durable usage state

  • request and response normalization

Practical split

Unkey should support:

  • request-time key validity

  • revoked vs active status

  • rate-limit enforcement

  • basic key-scoped request controls

Portal backend / gateway should still own:

  • whether the account still has usage allowance

  • whether the account is allowed to call a given model

  • where the request should be routed

  • what usage events should be written durably

Unkey is a supporting enforcement layer, not the full Portal control plane.


9. Fast State vs Durable State

Portal V1 should distinguish between fast state and durable state.

9.1 Fast State

Used for:

  • hot-path verification

  • rate-limit counters

  • short-window checks

  • cached allow/deny information

Typical examples:

  • per-key request counters in the current window

  • temporary burst-control state

  • hot cache of key status or entitlement summary

9.2 Durable State

Used for:

  • account source of truth

  • usage ledger

  • billing ledger

  • subscriptions

  • policy metadata

  • router-pool metadata

  • request logs

This separation keeps the request path practical:

  • fast state supports low-latency enforcement

  • durable state supports correctness, audits, and later billing reconciliation

9.3 Design Principle

The gateway should not rely on multiple raw durable DB reads on every request if that can be avoided.

Instead:

  • durable state remains the source of truth

  • fast state supports request-time enforcement and smoothing
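
A small read-through cache shows the idea: the durable store is consulted only on a miss or after expiry, and the hot path otherwise reads fast state. TTLCache, its loader callback, and the TTL value are assumptions for illustration, not part of the Portal design.

```python
from typing import Callable, Dict, Tuple


class TTLCache:
    """Serve cached values while fresh; fall back to the durable loader otherwise."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def get(self, key_id: str, now: float) -> str:
        entry = self._entries.get(key_id)
        if entry is not None and now < entry[0]:
            return entry[1]
        value = self._loader(key_id)  # durable read only on miss or expiry
        self._entries[key_id] = (now + self._ttl, value)
        return value
```

The trade-off is staleness: a change in durable state, such as a revocation, can take up to one TTL to reach the hot path unless the entry is invalidated explicitly.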


10. Error Semantics

The gateway should produce stable, product-oriented API errors.

Important gateway outcomes include:

  • invalid key

  • revoked key

  • rate limit reached

  • quota exhausted

  • model not allowed

  • no healthy backend target

  • backend timeout or inference failure

User-facing response principle

The Portal API should present errors in a stable, understandable product format rather than leaking low-level router details unnecessarily.

That means:

  • clear category

  • clear reason

  • obvious next step where possible

Examples of product-facing error categories

  • authentication error

  • authorization / plan error

  • rate-limit error

  • quota error

  • routing / availability error

  • backend execution error

The exact payload shape can be defined later, but the semantics should remain productized.
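
Purely as an illustration of "category, reason, next step", the sketch below maps each gateway outcome to a product-facing category. Everything here is an assumption: the outcome names, the payload fields, the next-step wording, and the HTTP status codes are placeholders, not a defined Portal API.

```python
from enum import Enum


class ErrorCategory(str, Enum):
    AUTHENTICATION = "authentication_error"
    AUTHORIZATION = "authorization_error"
    RATE_LIMIT = "rate_limit_error"
    QUOTA = "quota_error"
    AVAILABILITY = "availability_error"
    BACKEND = "backend_error"


# outcome -> (category, assumed HTTP status, suggested next step)
OUTCOMES = {
    "invalid_key": (ErrorCategory.AUTHENTICATION, 401, "Check that the API key is correct."),
    "revoked_key": (ErrorCategory.AUTHENTICATION, 401, "Create a new key in the Portal."),
    "rate_limited": (ErrorCategory.RATE_LIMIT, 429, "Retry after the rate-limit window resets."),
    "quota_exhausted": (ErrorCategory.QUOTA, 429, "Wait for the allowance reset or upgrade the plan."),
    "model_not_allowed": (ErrorCategory.AUTHORIZATION, 403, "Choose a model included in the plan."),
    "no_healthy_backend": (ErrorCategory.AVAILABILITY, 503, "Retry shortly."),
    "backend_failure": (ErrorCategory.BACKEND, 502, "Retry; contact support if it persists."),
}


def error_payload(outcome: str, request_id: str) -> dict:
    """Build a stable error body without exposing router hostnames or pool details."""
    category, status, next_step = OUTCOMES[outcome]
    return {
        "status": status,
        "error": {
            "category": category.value,
            "reason": outcome,
            "next_step": next_step,
            "request_id": request_id,
        },
    }
```

Keeping the mapping in one table makes it straightforward to review that every gateway outcome has a category and that no backend detail leaks into a message.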


11. Router-Pool Resolution

The gateway should accept product-facing model names such as:

  • gpt-oss-20b

  • gpt-oss-120b

  • gemma-4-26b

  • qwen-...

It should translate those names into:

  • router pool identifier

  • target backend selection

The customer should not need to know:

  • internal session IDs

  • router hostnames

  • pool topology

  • dedicated session layout

Resolution behavior

At minimum, model resolution should:

  1. validate that the requested model is supported

  2. validate that the caller’s plan allows that model

  3. map model alias to a router pool

  4. select a healthy target inside that pool

  5. forward request
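
The resolution steps above can be sketched as a single function that stops at the first failing step. The lookup tables, pool names, target names, and plan names are hypothetical stand-ins for metadata that would come from the Portal backend, and target selection here simply takes the first healthy entry.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    healthy: bool


class ResolutionError(Exception):
    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


# Hypothetical metadata; real values would live in durable state.
MODEL_TO_POOL = {"gpt-oss-20b": "pool-a", "gpt-oss-120b": "pool-b"}
PLAN_MODELS = {"free": {"gpt-oss-20b"}, "paid": {"gpt-oss-20b", "gpt-oss-120b"}}
POOLS = {
    "pool-a": [Target("t1", False), Target("t2", True)],
    "pool-b": [Target("t3", False)],
}


def resolve(model: str, plan: str) -> Target:
    """Map a product-facing model name to a healthy backend target."""
    if model not in MODEL_TO_POOL:                    # 1. model is supported
        raise ResolutionError("unsupported_model")
    if model not in PLAN_MODELS.get(plan, set()):     # 2. plan allows the model
        raise ResolutionError("model_not_allowed")
    pool = MODEL_TO_POOL[model]                       # 3. alias -> router pool
    for target in POOLS[pool]:                        # 4. pick a healthy target
        if target.healthy:
            return target
    raise ResolutionError("no_healthy_backend")
```

The caller only ever sees the model name and a product-level reason on failure; pool and target identifiers stay inside the gateway.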


12. V1 Simplicity Recommendations

Keep V1 gateway behavior narrow:

  • one stable request path

  • simple quota checks

  • simple plan rules

  • dedicated-backed pools only by default

Avoid over-customizing the gateway until real traffic and product patterns justify it.

Good V1 principles

  • few decision branches

  • low surprise in enforcement behavior

  • easy-to-trace request flow

  • clear separation between:

    • key validation

    • rate limiting

    • quota allowance

    • routing

Avoid in V1

  • too many product tiers

  • overly dynamic routing semantics

  • per-model custom billing behavior everywhere

  • deeply nested policy trees


13. Open Questions

Questions still worth resolving in later design passes:

  • should the main allowance unit be request-based, task-based, token-based, or hybrid?

  • how much rate-limit visibility should be exposed in the Portal UI?

  • what should be enforced in Unkey vs custom backend logic?

  • should different model families have different allowance weights in V1?

  • should free-plan limits be per-user, per-org, per-key, or some combination?

  • how much request logging should happen synchronously vs asynchronously?

These should be answered with the V1 launch simplicity principle in mind.


14. Relationship to Other Specs

This spec connects directly to:

  • 03-api-key-management.md

  • 05-portal-backend-control-plane.md

  • 06-data-model-and-durable-state.md

  • 07-router-pools-and-model-product-layer.md

  • 08-usage-metering-and-billing.md

It also supports the UI/UX spec because the gateway’s decisions ultimately determine what users see in:

  • usage pages

  • quota warnings

  • API error messaging

  • key behavior


15. Working Summary

Portal V1 should use a simple hosted gateway layer in front of managed router pools.

That gateway should:

  • verify keys

  • enforce rate limits

  • check plan/quota/model access

  • route requests to backend pools

  • emit usage events

  • hide raw router topology from customers

The V1 free-plan and rate-limit system should remain intentionally simple and will likely reuse the sliding-window allowance model already used elsewhere in Cortensor.

In one sentence:

The Portal V1 gateway is the fast request-time enforcement layer that turns Portal-issued keys and simple allowance rules into a stable hosted API experience in front of managed router pools.

Last updated