Cortensor Portal V1 Detailed Spec 11 - Portal Gateway / Runtime API / Ops Spec

Status

Draft

Purpose

This document consolidates the Portal V1 runtime gateway requirements.

It tracks the pieces that sit between customer API requests and managed Cortensor router pools:

  • Portal gateway / business API

  • API key verification and rate limiting

  • quota and entitlement checks

  • router-pool selection

  • usage metering ledger

  • observability

  • admin / ops controls

  • security policy

Payment is intentionally deferred for the initial free-plan implementation, but this spec keeps the usage ledger shaped so billing can be added later without redesigning the request path.


1. Summary

The Portal Gateway is the product runtime layer for Cortensor Portal.

It should be a small service owned by the Portal product layer. It receives customer inference requests, verifies access, selects the correct router pool, forwards the request, records usage, and returns a stable product-facing response.

The gateway should not be replaced by:

  • raw Nginx routing

  • raw Supabase reads

  • direct customer access to router nodes

Nginx can proxy and load balance. Supabase can store durable state. Unkey can manage API keys and rate limits. The gateway owns the Cortensor-specific business logic that ties those systems together.


2. Goals

The Portal Gateway should aim to:

  • provide one stable product API entry point for Portal users

  • keep raw router nodes and session IDs hidden from normal customers

  • verify API keys before any router work is performed

  • enforce per-key and per-org rate limits / quotas

  • map public model aliases to managed router pools

  • emit durable usage events for billing, audits, support, credits, and disputes

  • expose enough observability to debug customer and router issues

  • support operational controls such as disabling keys and draining router pools

  • keep payment optional for the first free-plan launch


3. Non-Goals

The following are explicitly out of scope for the first Portal Gateway release:

  • a full enterprise API gateway platform

  • a complex paid-billing implementation (the first version is free-plan only)

  • user-controlled router/session topology

  • exposing raw router internals in normal API responses

  • perfect global load balancing across every router process

  • replacing Unkey, Clerk, Supabase, or Nginx with full custom equivalents in V1

The gateway should be small, clear, and product-driven, not an overbuilt infrastructure platform.


Recommended runtime path:

| Component | Owns |
| --- | --- |
| Clerk | Portal user auth, sessions, org membership, user/org identity |
| Supabase | durable product state, usage ledger, billing ledger, router-pool metadata |
| Unkey | API key issuance, verification, revocation, rate limits, key analytics |
| Portal Gateway | request authorization, entitlement, quota, model routing, metering, response normalization |
| Nginx / reverse proxy | TLS, upstream groups, router-node load balancing |
| Router nodes | Cortensor inference execution and session-level routing |

Key Architecture Principle

The customer should only know:

  • a stable Portal API

  • a stable model alias

  • a stable key format

  • stable product-facing errors

They should not need to know:

  • which router served the request

  • which session executed it

  • whether a pool is draining

  • backend topology or fleet composition


5. Gateway Responsibilities

The gateway should own the following V1 runtime responsibilities:

| Responsibility | Detail |
| --- | --- |
| Request identity | Generate or propagate request_id for every request |
| API key verification | Verify customer key through Unkey |
| Key status | Reject revoked, paused, expired, or disabled keys |
| Rate limiting | Enforce per-key RPM and any model-specific limits |
| Quota | Check org/account allowance before forwarding |
| Entitlement | Check whether org/key can use requested model |
| Model routing | Resolve public model alias to router pool |
| Router target selection | Pick healthy backend target or pool endpoint |
| Forwarding | Proxy request to router while preserving needed headers/timeouts |
| Response normalization | Return stable customer-facing response shape |
| Usage metering | Record durable usage event after request outcome is known |
| Support visibility | Preserve enough data for debugging without leaking secrets |

Additional Responsibilities Implied by Product Ownership

The gateway should also own:

  • request validation

  • customer-safe error formatting

  • request tracing correlation

  • product-level feature-flag interpretation

  • future free-plan and paid-plan branching logic


6. Public API Surface

Recommended narrow V1 public API:

If scope needs to stay tighter, V1 can start with:

and expand later.

Public Request Shape Example

Or for chat-style requests:

The customer should not need to know:

  • router hostname

  • session ID

  • pool internals

  • dedicated vs ephemeral backend details
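
As an illustration of keeping backend details out of the public request, the following is a minimal sketch of request validation. The field names (`model`, `prompt`, `max_tokens`) and the rejected fields (`router_host`, `session_id`, `pool_id`) are assumptions made for this example, not the Portal's confirmed request schema.

```python
from dataclasses import dataclass

# Hypothetical names: the spec only fixes that the model alias is public
# and that router/session/pool details are not part of the request.
FORBIDDEN_FIELDS = {"router_host", "session_id", "pool_id"}


@dataclass(frozen=True)
class InferenceRequest:
    model: str            # public model alias
    prompt: str
    max_tokens: int = 256


def parse_request(body: dict) -> InferenceRequest:
    """Validate a decoded JSON body and reject backend-topology fields."""
    leaked = FORBIDDEN_FIELDS.intersection(body)
    if leaked:
        raise ValueError(f"unsupported fields: {sorted(leaked)}")
    model = body.get("model")
    prompt = body.get("prompt")
    if not isinstance(model, str) or not model:
        raise ValueError("model is required")
    if not isinstance(prompt, str) or not prompt:
        raise ValueError("prompt is required")
    max_tokens = body.get("max_tokens", 256)
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    return InferenceRequest(model=model, prompt=prompt, max_tokens=max_tokens)
```

Rejecting topology fields outright, rather than silently ignoring them, is one possible design choice; it makes accidental coupling to backend internals visible to the caller early.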


7. Request Lifecycle

Required V1 request lifecycle:

  1. Receive request.

  2. Assign request_id.

  3. Parse API key.

  4. Verify key with Unkey.

  5. Resolve org/account/key metadata.

  6. Check key status.

  7. Check rate limit.

  8. Check model entitlement.

  9. Check free-plan quota / credits.

  10. Resolve model alias to router pool.

  11. Select router target.

  12. Forward request to router.

  13. Capture response status, latency, and usage.

  14. Write durable usage event.

  15. Return normalized response.

Important Behavior Rules

  • Invalid or blocked requests should not reach router pools.

  • Usage should be recorded for:

    • accepted requests

    • failed router attempts

    • completed requests

  • A successful inference response should not be turned into a failure only because an asynchronous usage write hit a transient problem.

    • The failed write must be retried and surfaced to ops.
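
The lifecycle and behavior rules above can be sketched as an ordered chain of checks followed by forwarding and metering. This is a simplified illustration, not the gateway's actual implementation: `Context`, `handle`, the check callables, and the in-memory `retry_queue` are all hypothetical, and a real gateway would use a durable retry mechanism.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Context:
    api_key: Optional[str]
    model_alias: str
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def handle(ctx, checks, forward, write_usage, retry_queue):
    """Run pre-router checks in order, then forward and meter the request.

    Each check returns None to continue, or (http_status, error_code) to stop.
    """
    for check in checks:
        failure = check(ctx)
        if failure:
            status, code = failure
            # Blocked requests return here and never reach a router pool.
            return {"request_id": ctx.request_id, "status": status, "error": code}

    try:
        result = forward(ctx)
        status, event_status = 200, "completed"
    except Exception:
        # Failed router attempts are still metered.
        result, status, event_status = None, 502, "failed"

    event = {"request_id": ctx.request_id, "status": event_status}
    try:
        write_usage(event)
    except Exception:
        # A transient metering failure must not fail the response;
        # the event is kept for retry and should be surfaced to ops.
        retry_queue.append(event)

    return {"request_id": ctx.request_id, "status": status, "result": result}
```

Usage, with a single key-presence check standing in for the full verification chain:

```python
def require_key(ctx):
    return None if ctx.api_key else (401, "missing_api_key")

queue = []
response = handle(Context(api_key="k", model_alias="m"),
                  [require_key], lambda ctx: "ok", lambda e: None, queue)
```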


8. Core Flows

8.1 Success Flow

8.2 Blocked Key Flow

8.3 Rate-Limited Flow

8.4 Quota-Exhausted Flow

8.5 Router Failure Flow

8.6 Durable Write Failure Flow


9. Rate Limit and Quota

Rate limiting is required for V1, even with a free plan.

| Control | Owner | Purpose |
| --- | --- | --- |
| Per-key RPM | Unkey | prevent request bursts |
| Per-key daily cap | Unkey or gateway | stop single key abuse |
| Per-org free allowance | Gateway + Supabase | product quota |
| Per-model access | Gateway + Supabase | entitlement |
| Per-IP fallback limit | Gateway or edge layer | abuse protection when key behavior is suspicious |

Response Semantics

Rate-limit response:

Quota exhausted response:

For a free-plan MVP, 402 (Payment Required) can mean that the free allowance is exhausted, even before paid billing is enabled.

If that feels too payment-oriented during beta, 403 is acceptable temporarily, but the long-term product shape likely wants 402.
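
To make the distinction concrete, here is a minimal sketch of how the two responses could be built. The body layout and the code strings `rate_limit_exceeded` and `quota_exhausted` are assumptions for illustration; `Retry-After` is the standard HTTP header for signalling when a client may retry.

```python
def rate_limited(request_id: str, retry_after_seconds: int) -> tuple:
    """Short-window protection: 429 plus a hint for when to retry."""
    headers = {"Retry-After": str(retry_after_seconds)}
    body = {"error": {"code": "rate_limit_exceeded", "request_id": request_id}}
    return 429, headers, body


def quota_exhausted(request_id: str, beta_mode: bool = False) -> tuple:
    """Allowance exhausted: 402 long term, 403 as a temporary beta fallback."""
    status = 403 if beta_mode else 402
    body = {"error": {"code": "quota_exhausted", "request_id": request_id}}
    return status, {}, body
```

Keeping the error code identical across the 402 and 403 variants would let clients key on the code rather than the status if the status changes after beta.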


10. Usage Metering Ledger

Unkey can provide usage visibility, but Portal still needs its own durable usage ledger in Supabase.

Why Portal Needs Its Own Ledger

Reasons include:

  • billing reconciliation

  • credit burn-down

  • support investigations

  • abuse analysis

  • customer usage export

  • disputes and refunds later

  • router pool quality analysis

  • internal cost accounting

| Field | Purpose |
| --- | --- |
| id | durable event ID |
| request_id | correlate gateway/router/logs |
| org_id | account owner |
| user_id | optional dashboard actor |
| api_key_id | key attribution |
| api_key_prefix | safe display/debug value |
| model_alias | customer-requested model |
| router_pool_id | selected pool |
| router_node_id | selected router target, if known |
| router_session_id | optional internal field |
| status | accepted, rejected, completed, failed, timeout |
| http_status | final HTTP status |
| prompt_tokens | request tokens, if available |
| completion_tokens | response tokens, if available |
| total_tokens | total tokens, if available |
| usage_units | normalized product usage unit |
| estimated_cost_units | internal cost estimate |
| latency_ms | gateway total latency |
| router_latency_ms | backend latency, if measured |
| error_code | stable error code |
| created_at | event time |

V1 can start with partial token data if router responses do not always include perfect usage values. The schema should still reserve these fields so the gateway and billing layer do not require redesign later.
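
One way to reserve these fields while tolerating partial data is a record type in which everything the router may not report is optional. The sketch below mirrors the field list above; the Python types, the defaults, and the rule that `total_tokens` is derived when both token counts are present are assumptions, not part of the spec.

```python
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

STATUSES = {"accepted", "rejected", "completed", "failed", "timeout"}


@dataclass
class UsageEvent:
    request_id: str
    org_id: str
    api_key_id: str
    model_alias: str
    status: str
    http_status: int
    latency_ms: int
    user_id: Optional[str] = None
    api_key_prefix: Optional[str] = None
    router_pool_id: Optional[str] = None
    router_node_id: Optional[str] = None
    router_session_id: Optional[str] = None
    prompt_tokens: Optional[int] = None       # may be missing in V1
    completion_tokens: Optional[int] = None   # may be missing in V1
    total_tokens: Optional[int] = None
    usage_units: Optional[float] = None       # unit definition is still open
    estimated_cost_units: Optional[float] = None
    router_latency_ms: Optional[int] = None
    error_code: Optional[str] = None
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")
        # Assumption: derive the total only when both parts are known.
        if (self.total_tokens is None and self.prompt_tokens is not None
                and self.completion_tokens is not None):
            self.total_tokens = self.prompt_tokens + self.completion_tokens


event = UsageEvent(request_id="req_1", org_id="org_1", api_key_id="key_1",
                   model_alias="example-model", status="completed",
                   http_status=200, latency_ms=120)
row = asdict(event)   # plain dict, ready to insert into a usage table
```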


11. Payment and Billing Scope

Initial Portal V1 can launch as a free-plan product.

Payment is deferred from the first implementation, but the gateway should still produce usage events as if billing will exist later.

  • no card required at first

  • free quota enforced by gateway

  • durable usage ledger written from day one

  • plans table supports:

    • free

    • internal

    • partner

    • future paid plans

  • billing ledger can exist with zero-dollar / free events

Deferred Paid-Billing Integrations

These can come later:

  • Stripe cards / subscriptions

  • Stripe stablecoin payments

  • Coinbase Commerce / Business

  • prepaid credits

  • monthly quotas with overage

Design Rule

Do not wait for payment implementation to build metering. Metering is needed immediately.


12. Router Pool Selection

The gateway should route by model alias, not by customer-provided session ID.

Example:

| Field | Purpose |
| --- | --- |
| pool_id | stable internal ID |
| model_aliases | customer-facing names served by pool |
| router_targets | URLs or upstream names |
| status | active, draining, paused, disabled |
| weight | future weighted routing |
| health_state | healthy, degraded, unhealthy |
| timeout_seconds | pool-specific timeout |
| supports_streaming | capability flag |

There are two routing layers:

  1. Gateway / Nginx balances across router nodes.

  2. Router nodes may internally balance across sessions using env-configured session pools.

The customer should never need to know those internals.
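
A minimal sketch of alias-based selection, using the pool fields listed above. The selection policy is an assumption for illustration: only `active` pools receive new requests (so `draining`, `paused`, and `disabled` pools are skipped), `healthy` pools are preferred over `degraded` ones, and a target is picked at random within the chosen pool. `RouterPool`, `NoHealthyTarget`, and `select_target` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class RouterPool:
    pool_id: str
    model_aliases: list
    router_targets: list
    status: str = "active"          # active, draining, paused, disabled
    health_state: str = "healthy"   # healthy, degraded, unhealthy
    timeout_seconds: int = 60


class NoHealthyTarget(Exception):
    """No usable pool for the alias; maps naturally to a 503 response."""


def select_target(model_alias, pools, rng=random):
    """Resolve a public model alias to a (pool, router target) pair."""
    candidates = [
        p for p in pools
        if model_alias in p.model_aliases
        and p.status == "active"
        and p.health_state != "unhealthy"
        and p.router_targets
    ]
    if not candidates:
        raise NoHealthyTarget(model_alias)
    # Prefer fully healthy pools; fall back to degraded ones.
    healthy = [p for p in candidates if p.health_state == "healthy"]
    pool = (healthy or candidates)[0]
    return pool, rng.choice(pool.router_targets)
```

Because the customer supplies only the alias, pools can be drained or swapped by changing `status` without any change to the public API.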


13. Failure Semantics

Stable error behavior should be defined early.

| Condition | Recommended Status | Notes |
| --- | --- | --- |
| Missing API key | 401 | no key provided |
| Invalid API key | 401 | key not verified |
| Revoked/disabled key | 403 | known key but blocked |
| Model not allowed | 403 | entitlement failure |
| Rate limit exceeded | 429 | short-window protection |
| Quota exhausted | 402 or 403 | prefer 402 long term |
| No healthy router target | 503 | pool unavailable |
| Router timeout | 504 | backend exceeded timeout |
| Router failed | 502 | upstream error |
| Usage write failed | follows the inference result (e.g. 200 or 502) | enqueue a retry and surface the metering failure to ops; do not let it change the customer response |

Stable Error Response Shape

Suggested productized shape:

Suggested codes:
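
As one possible illustration, the sketch below pairs a stable error envelope with a code-to-status lookup. The status values follow the failure table in this section; the code strings and the envelope fields (`code`, `message`, `request_id`) are hypothetical placeholders, not a confirmed Portal contract.

```python
# Hypothetical error codes mapped to the statuses recommended in this section.
ERROR_STATUS = {
    "missing_api_key": 401,
    "invalid_api_key": 401,
    "key_disabled": 403,
    "model_not_allowed": 403,
    "rate_limit_exceeded": 429,
    "quota_exhausted": 402,      # 403 is the temporary beta alternative
    "no_healthy_router": 503,
    "router_timeout": 504,
    "router_failed": 502,
}


def error_response(code: str, message: str, request_id: str) -> tuple:
    """Return (http_status, body) for a customer-facing error."""
    status = ERROR_STATUS[code]   # unknown codes fail loudly with KeyError
    body = {"error": {"code": code, "message": message, "request_id": request_id}}
    return status, body


status, body = error_response("router_timeout", "The model took too long.", "req_1")
```

Including `request_id` in every error body gives support a direct handle for the request lookups described under admin and ops controls.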


14. Observability

Observability is required in V1 because the gateway becomes the customer support boundary.

Minimum Request Log Fields

  • request_id

  • org_id

  • api_key_id

  • model_alias

  • router_pool_id

  • router_node_id

  • status

  • http_status

  • latency_ms

  • router_latency_ms

  • usage_units

  • error_code

Minimum Metrics

| Metric | Purpose |
| --- | --- |
| requests per key/org/model | usage visibility |
| latency per model/pool | performance tracking |
| error rate per router pool | backend health |
| timeout rate | pool/runtime tuning |
| failed task rate | router quality |
| pool saturation | capacity planning |
| cost/usage units per request | future billing |

Request IDs should be propagated to router calls when possible.
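
A small sketch of request-ID propagation and structured logging with the minimum fields listed above. The `X-Request-Id` header name is a common convention assumed here, not something the spec fixes, and the helper names are hypothetical.

```python
import json
import uuid

LOG_FIELDS = (
    "request_id", "org_id", "api_key_id", "model_alias", "router_pool_id",
    "router_node_id", "status", "http_status", "latency_ms",
    "router_latency_ms", "usage_units", "error_code",
)


def request_id_from(headers: dict) -> str:
    """Reuse an inbound request ID if present, otherwise mint a new one."""
    for name, value in headers.items():
        if name.lower() == "x-request-id" and value:
            return value
    return uuid.uuid4().hex


def router_headers(request_id: str) -> dict:
    """Headers added to the upstream router call for log correlation."""
    return {"X-Request-Id": request_id}


def log_line(**fields) -> str:
    """Emit one JSON log line containing exactly the minimum fields."""
    record = {name: fields.get(name) for name in LOG_FIELDS}
    return json.dumps(record, sort_keys=True)
```

Emitting every field on every line, with `null` for unknown values, keeps log queries uniform across rejected, failed, and completed requests.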


15. Admin / Ops Controls

V1 does not need a polished internal admin console, but the backend must support operational controls.

Required Controls

| Control | V1 Form |
| --- | --- |
| Disable API key | backend endpoint or admin script |
| Disable org/account | backend endpoint or admin script |
| Inspect usage | query/API endpoint |
| Inspect request by ID | query/API endpoint |
| Disable model access | config/table update |
| Pause router pool | config/table update |
| Drain router pool | config/table update |
| View pool health | internal endpoint/dashboard |

Useful Admin Views Later

  • keys by org

  • usage by org/key/model

  • recent failed requests

  • router pool health

  • quota state

  • metering retry queue

Ops Principle

Backend capability first, polished admin UI later.


16. Security Policy

Security must be explicit because the gateway handles customer keys, billing-adjacent state, and router access.

Required V1 Policies

| Area | Policy |
| --- | --- |
| API keys | never store raw keys in Supabase; use Unkey IDs/prefixes/metadata |
| Secrets | keep auth-provider, Unkey, Supabase service role, and payment-provider secrets server-side only |
| Service role | only gateway/backend uses Supabase service role |
| Org access | every dashboard API checks authenticated user/org membership |
| Router access | routers are not directly customer-facing where possible |
| Webhooks | verify signatures for external auth / key / billing systems |
| Audit logs | record admin actions and key lifecycle changes |
| Logs | redact auth headers, raw API keys, and sensitive payloads when required |
| RLS | use RLS for browser-readable tables; keep sensitive writes backend-owned |

Avoid split-brain auth where multiple systems act as the authoritative human identity layer.
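
The log-redaction policy can be illustrated with a short sketch. The header list and the key pattern (a short alphabetic prefix, an underscore, then at least 16 alphanumeric characters) are assumptions for this example; the real pattern depends on the key format actually issued.

```python
import re

SENSITIVE_HEADERS = {"authorization", "x-api-key", "cookie"}

# Hypothetical key format: prefix + "_" + secret part. Only the prefix is kept.
KEY_PATTERN = re.compile(r"\b([A-Za-z]{2,8}_)[A-Za-z0-9]{16,}\b")


def redact_headers(headers: dict) -> dict:
    """Copy headers, masking values that can carry credentials."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def redact_text(text: str) -> str:
    """Mask anything that looks like a raw API key in free-form log text."""
    return KEY_PATTERN.sub(r"\1[REDACTED]", text)
```

Keeping the prefix while masking the secret part matches the policy of storing and displaying only key IDs and prefixes, so redacted logs remain useful for attribution.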


17. Data Dependencies

The gateway depends on these durable concepts:

  • organizations

  • Portal users

  • organization memberships

  • API key metadata / shadow records

  • model catalog

  • model entitlement rules

  • router pools

  • router targets

  • plan / quota state

  • usage events

  • request logs

  • audit logs

Unkey can own key verification, but Supabase should still store the Portal-facing key metadata needed for dashboard display and internal reporting.


18. Prototype Checklist

The first useful prototype should prove the complete request path.

Required Prototype

  • create an API key

  • verify the key at gateway

  • enforce a simple rate limit

  • accept one public model alias

  • resolve model alias to one router pool

  • forward to a local/test router endpoint

  • return normalized response

  • write one usage event

  • show usage row in a simple query/dashboard

Stretch Prototype

  • simulate quota exhaustion

  • revoke key and confirm access fails

  • mark pool disabled and confirm model returns unavailable

  • retry one alternate router target after failure

  • log request ID through gateway and router


19. Open Questions

  • Should quota exhaustion return 402 from the beginning, or use 403 during free beta?

  • Should V1 usage units be request-based, task-based, token-based, or weighted by model?

  • Should the gateway synchronously write usage_events, or write to a queue first?

  • What is the minimum router health signal needed for pool selection?

  • Should customer responses include router/request debug fields under a debug flag?

  • How much Unkey usage analytics should be mirrored into Supabase?

  • Do we need per-project keys in addition to per-organization keys for V1?


20. External References

Useful external references for later implementation comparison:

  • Supabase Auth

  • Supabase Row Level Security

  • Clerk docs

  • Clerk Organizations

  • Unkey docs

  • Unkey rate limiting

  • NGINX HTTP load balancing

  • NGINX reverse proxy

  • Stripe stablecoin payments

  • Coinbase Commerce docs

(These can be expanded into actual linked references in the documentation site if desired.)


21. Relationship To Other Specs

This spec consolidates and operationalizes:

  • 03-api-key-management.md

  • 04-free-plan-rate-limits-and-gateway.md

  • 05-portal-backend-control-plane.md

  • 06-data-model-and-durable-state.md

  • 07-router-pools-and-model-product-layer.md

  • 08-usage-metering-and-billing.md

  • 09-admin-and-ops.md

It should be treated as the main runtime tracking document for the Portal Gateway.


22. Working Summary

Portal V1 Gateway should be treated as the runtime product boundary for the hosted Portal API.

It exists to:

  • authenticate and authorize requests

  • apply rate limits and quota semantics

  • resolve product model aliases to router pools

  • route requests into managed Cortensor infrastructure

  • emit durable usage and request data

  • present stable product-facing responses and errors

In one sentence:

The Portal Gateway is the runtime layer that turns Portal-issued credentials, product-level models, and simple allowance rules into a stable hosted Cortensor API in front of managed router pools.
