Cortensor Portal V1 Detailed Spec 09 - Admin / Ops

Status

Draft

Purpose

This document defines the internal operational needs around Portal V1.

Portal V1 does not need a large internal platform at launch, but it does need a small set of critical admin and ops capabilities so the hosted product can be supported safely.

This spec focuses on:

  • what internal actions must exist in V1

  • what can exist as backend-only controls first

  • what operational visibility is required to support a hosted API product

  • how admin/ops concerns relate to Portal backend, durable state, and router pools


1. Summary

Portal V1 should support a minimal but useful set of internal admin and operations actions.

At a minimum, the system should let operators:

  • disable or revoke a key

  • disable or pause an account

  • disable model access

  • inspect usage and request state

  • inspect key and plan state

  • drain or pause a backend pool (likely later, if needed)

Some of these capabilities may exist first as:

  • backend/admin-only endpoints

  • internal scripts

  • database-backed support workflows

rather than polished internal UI.

The priority for V1 is operational safety, not building a large internal admin suite.


2. Goals

Portal V1 admin/ops design should aim to:

  • give the team enough operational control to support the hosted API

  • make abuse response possible

  • make support and debugging practical

  • keep product-layer control centralized

  • avoid needing direct router-level intervention for normal support tasks

  • support a clear path to richer internal admin tooling later


3. Design Principles

3.1 Minimal but real operational control

Portal V1 does not need a heavy internal platform, but it does need enough control so the team can safely operate a hosted API product.

That means:

  • the team can disable unsafe access quickly

  • the team can inspect why a user or key is failing

  • the team can reason about model/pool availability

  • the team can respond to abuse or misconfiguration without manual infrastructure surgery every time

3.2 Product-layer ownership

Admin and ops capabilities should live at the Portal product/control-plane layer, not only in router-native tooling.

That means Portal should own:

  • key state

  • account state

  • entitlement state

  • usage state

  • pool/product availability state

This keeps support and ops aligned with the actual customer-facing product.

3.3 Backend-first is acceptable in V1

Not every admin action needs a polished internal UI immediately.

It is acceptable for V1 if some controls first exist as:

  • backend APIs

  • internal scripts

  • support-only dashboard queries

  • controlled database-backed operational commands

What matters is that the capability exists and is safe to use.


4. Core Admin / Ops Capabilities

4.1 Key Actions

Portal V1 should support the following key-related admin actions:

  • view key metadata

  • revoke key

  • inspect key status

Useful metadata to inspect:

  • key ID

  • key prefix

  • owner

  • status

  • created time

  • revoked time

  • last-used time (later, if available)

Primary use cases:

  • compromised key

  • abusive key

  • support request from user

  • debugging why a key is not working
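One possible shape for the key record and the two core key actions (revoke, inspect) is sketched below. This is a minimal sketch under stated assumptions. The class and function names (ApiKeyRecord, revoke_key, inspect_key) and the example prefix format are hypothetical and are not defined by this spec. The sketch makes revoke idempotent, so a repeated support action cannot overwrite the original revocation time. The inspection view also deliberately omits any secret material.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class KeyStatus(Enum):
    ACTIVE = "active"
    REVOKED = "revoked"


@dataclass
class ApiKeyRecord:
    # Hypothetical fields mirroring the metadata list above.
    key_id: str
    key_prefix: str            # short display prefix only, never the full secret
    owner_id: str
    status: KeyStatus
    created_at: datetime
    revoked_at: Optional[datetime] = None
    last_used_at: Optional[datetime] = None  # may stay None until usage tracking exists


def revoke_key(record: ApiKeyRecord, now: Optional[datetime] = None) -> ApiKeyRecord:
    """Mark a key revoked. Idempotent: a second revoke keeps the first timestamp."""
    if record.status is KeyStatus.REVOKED:
        return record
    record.status = KeyStatus.REVOKED
    record.revoked_at = now or datetime.now(timezone.utc)
    return record


def inspect_key(record: ApiKeyRecord) -> dict:
    """Support-facing summary of a key; excludes any secret material."""
    return {
        "key_id": record.key_id,
        "prefix": record.key_prefix,
        "owner": record.owner_id,
        "status": record.status.value,
        "created_at": record.created_at.isoformat(),
        "revoked_at": record.revoked_at.isoformat() if record.revoked_at else None,
        "last_used_at": record.last_used_at.isoformat() if record.last_used_at else None,
    }
```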


4.2 Account Actions

Portal V1 should support basic account-level operational controls:

  • pause account

  • inspect plan state

  • inspect quota / usage state

Primary use cases:

  • abuse mitigation

  • suspicious behavior

  • support workflows

  • confirming whether a user is blocked due to:

    • auth issue

    • revoked key

    • exhausted allowance

    • plan restriction

If orgs are enabled, this model may later extend to:

  • organization pause

  • org-level usage inspection

  • org-level entitlement checks
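Support often needs to answer the question "why is this user blocked?" with a single reason. The sketch below shows one way to derive that reason from key and account state. It assumes a fixed check order (auth, then revoked key, then account pause, then allowance, then plan). The spec does not define an order, so this sequence is an assumption. All names here, including AccountSnapshot and explain_block, and the unit-based allowance model, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class BlockReason(Enum):
    AUTH_ISSUE = "auth_issue"                  # key unknown, malformed, or no owning account
    REVOKED_KEY = "revoked_key"
    ACCOUNT_PAUSED = "account_paused"
    EXHAUSTED_ALLOWANCE = "exhausted_allowance"
    PLAN_RESTRICTION = "plan_restriction"


@dataclass
class AccountSnapshot:
    # Hypothetical view of account state as read from durable storage.
    paused: bool
    plan_models: FrozenSet[str]   # models the current plan entitles
    used_units: int
    allowance_units: int


def explain_block(
    key_found: bool,
    key_revoked: bool,
    account: Optional[AccountSnapshot],
    model: str,
) -> Optional[BlockReason]:
    """Return the first reason a request would be blocked, or None if allowed.

    The check order is an assumption; it reports the most fundamental problem first.
    """
    if not key_found or account is None:
        return BlockReason.AUTH_ISSUE
    if key_revoked:
        return BlockReason.REVOKED_KEY
    if account.paused:
        return BlockReason.ACCOUNT_PAUSED
    if account.used_units >= account.allowance_units:
        return BlockReason.EXHAUSTED_ALLOWANCE
    if model not in account.plan_models:
        return BlockReason.PLAN_RESTRICTION
    return None
```

Returning one ordered reason keeps support answers unambiguous. A richer variant could return every failing check.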


4.3 Model / Routing Actions

Portal V1 should support at least a minimal way to influence model availability:

  • disable model access

  • pause or drain a backend pool when needed

These actions may be implemented first as:

  • backend config changes

  • admin-only backend APIs

  • durable state flags interpreted by the Portal backend

Primary use cases:

  • model instability

  • backend degradation

  • temporary maintenance

  • controlled rollout/rollback of model availability

When a model family must be paused, the Portal product surface should be able to respond quickly, without requiring changes to the public API surface.
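One option listed above is a durable state flag interpreted by the Portal backend. The sketch below illustrates that option with an in-memory dictionary standing in for a durable table. ModelAvailability and ModelFlag are hypothetical names. The sketch assumes that models with no flag row default to enabled. A fail-closed default (unknown models disabled) is an equally valid choice, depending on how new models are rolled out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ModelFlag:
    enabled: bool
    reason: str = ""   # internal note for support, e.g. "pool degraded"


class ModelAvailability:
    """In-memory stand-in for a durable table of per-model flags (hypothetical)."""

    def __init__(self) -> None:
        self._flags: Dict[str, ModelFlag] = {}

    def disable(self, model: str, reason: str) -> None:
        self._flags[model] = ModelFlag(enabled=False, reason=reason)

    def enable(self, model: str) -> None:
        self._flags[model] = ModelFlag(enabled=True)

    def check(self, model: str) -> Optional[str]:
        """Return None if the model may be served, else an internal reason string.

        Unflagged models default to enabled (an assumption), so disabling a
        model only adds a row and never touches the public API surface.
        """
        flag = self._flags.get(model)
        if flag is None or flag.enabled:
            return None
        return flag.reason or "disabled"
```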


4.4 Request Inspection

Portal V1 should support at least lightweight request inspection for support and debugging.

This can include:

  • recent request history

  • recent usage events

  • request status summary

  • request IDs linked to:

    • account

    • key

    • model

    • pool (internal only, if needed)

Primary use cases:

  • support debugging

  • quota disputes

  • backend failure analysis

  • confirming whether a request was:

    • accepted

    • blocked

    • forwarded

    • completed

    • recorded in durable state

V1 does not need a powerful observability console, but some operational visibility is required.
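The request stages listed above map naturally onto a per-request trace that support can query by key. The sketch below shows this as a minimal illustration. RequestTrace, Stage, and recent_for_key are hypothetical names. The sketch assumes traces are stored in chronological order, so "recent" simply means the tail of the list. The pool field is kept internal, consistent with the note above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Stage(Enum):
    ACCEPTED = "accepted"
    BLOCKED = "blocked"
    FORWARDED = "forwarded"
    COMPLETED = "completed"
    RECORDED = "recorded"   # usage written to durable state


@dataclass
class RequestTrace:
    request_id: str
    account_id: str
    key_id: str
    model: str
    pool_id: Optional[str] = None          # internal only; not shown to customers
    stages: List[Stage] = field(default_factory=list)

    def mark(self, stage: Stage) -> None:
        self.stages.append(stage)

    def summary(self) -> Dict[str, bool]:
        """Answer the support question 'did this request reach stage X?'."""
        return {s.value: s in self.stages for s in Stage}


def recent_for_key(traces: List[RequestTrace], key_id: str, limit: int = 20) -> List[RequestTrace]:
    """Most recent traces for a key; assumes `traces` is in chronological order."""
    matches = [t for t in traces if t.key_id == key_id]
    return matches[-limit:]
```

A request that shows "completed" but not "recorded" is the kind of gap that matters in quota disputes.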


5. V1 Delivery Guidance

Not every admin action needs a polished UI immediately.

For V1, a practical delivery approach is:

  1. backend support first

  2. lightweight internal visibility second

  3. richer internal admin UI later

This keeps the team focused on shipping the customer-facing Portal product while still preserving operational safety.

Practical interpretation

For V1, some controls can exist as:

  • internal backend endpoints

  • SQL-backed internal views

  • admin scripts

  • support runbooks

before they become a full internal Portal admin console.

That is acceptable as long as:

  • the actions are safe

  • the actions are auditable

  • the team can use them reliably during support/ops incidents
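Admin scripts can satisfy the "safe" and "auditable" conditions above through a few conventions: dry run by default, an explicit flag to apply, and a mandatory operator and reason. The sketch below shows these conventions for a hypothetical revoke script built on Python's standard argparse module. The flag names (--reason, --operator, --apply) are illustrative assumptions, and the actual backend call is left as a comment.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Revoke a Portal API key (hypothetical admin script).")
    p.add_argument("key_id")
    p.add_argument("--reason", required=True, help="why the key is being revoked; kept for audit")
    p.add_argument("--operator", required=True, help="who is running the action")
    p.add_argument("--apply", action="store_true", help="perform the change; default is a dry run")
    return p


def main(argv: Optional[List[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    plan = f"revoke {args.key_id} (operator={args.operator}, reason={args.reason!r})"
    if not args.apply:
        # Dry run: show exactly what would change, touch nothing.
        return "DRY RUN: would " + plan
    # A real script would call an admin-only backend endpoint or run a
    # controlled database update here, then write an audit record.
    return "APPLIED: " + plan
```

Usage: main(["key_123", "--reason", "compromised", "--operator", "alice"]) only describes the change. Adding "--apply" performs it.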


6. Operational Priorities

The most important early admin/ops needs are likely:

  • revoke a problematic key

  • confirm why a user is blocked

  • inspect quota state

  • inspect whether a model or pool is currently available

These are the “must-have” operational capabilities because they directly affect:

  • support quality

  • abuse response

  • system stability

  • user trust

If scope is tight, these should be prioritized above richer internal reporting or admin polish.


7. Internal Surface Expectations

7.1 What should likely exist in V1

At minimum, the team should have a way to:

  • look up a user

  • look up a key

  • see plan/quota status

  • revoke a key

  • disable account access

  • inspect whether a model/pool is enabled or degraded

7.2 What can wait until later

These can likely be deferred or kept backend-only in early V1:

  • polished internal admin dashboard

  • complex analytics exploration

  • advanced search/filter tooling

  • large-scale incident management surfaces

  • full enterprise-style operational control plane

Portal V1 should aim for operational sufficiency, not internal platform completeness.


8. Relationship to Router Pools

Admin / Ops should be able to reason about model and pool state at the Portal layer, even if the actual pool mechanics live below it, at the router layer.

That means the Portal control plane should eventually know whether a pool is:

  • live

  • paused

  • draining

  • degraded

This does not mean the customer-facing UI needs to expose this. It means internal/admin workflows should be able to understand and act on it.

Examples:

  • disable a model because its pool is unhealthy

  • drain a pool before maintenance

  • pause traffic to a backend group

  • explain to support why a model is temporarily unavailable
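The four pool states can be treated as a small state machine that the Portal control plane tracks. The sketch below makes this concrete. The spec lists the states but not the allowed transitions, so the ALLOWED table is an assumption. The sketch also assumes that a degraded pool still accepts new traffic, while a draining pool finishes in-flight work without taking anything new.

```python
from enum import Enum
from typing import Dict, FrozenSet


class PoolState(Enum):
    LIVE = "live"
    PAUSED = "paused"
    DRAINING = "draining"
    DEGRADED = "degraded"


# Assumed transition table; not defined by the spec.
ALLOWED: Dict[PoolState, FrozenSet[PoolState]] = {
    PoolState.LIVE: frozenset({PoolState.DEGRADED, PoolState.DRAINING, PoolState.PAUSED}),
    PoolState.DEGRADED: frozenset({PoolState.LIVE, PoolState.DRAINING, PoolState.PAUSED}),
    PoolState.DRAINING: frozenset({PoolState.PAUSED, PoolState.LIVE}),
    PoolState.PAUSED: frozenset({PoolState.LIVE}),
}


def transition(current: PoolState, target: PoolState) -> PoolState:
    """Validate an admin-requested state change before persisting it."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move pool from {current.value} to {target.value}")
    return target


def accepts_new_requests(state: PoolState) -> bool:
    # Draining lets in-flight work finish but takes nothing new; paused takes nothing.
    return state in (PoolState.LIVE, PoolState.DEGRADED)
```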


9. Auditing & Safety

Because admin actions can directly affect customer access, they should be auditable.

Important actions to audit:

  • key revoked

  • account paused

  • model disabled

  • pool drained or paused

  • admin inspection actions (later, if needed)

V1 does not need a highly advanced audit subsystem, but it should be possible to answer:

  • who changed what

  • when they changed it

  • what customer/product state was affected

This becomes more important as Portal usage grows.
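A minimal audit trail only needs to capture who acted, when, on what target, and why. The sketch below records exactly those fields in an append-only log and can export them as JSON Lines. AuditEvent, AuditLog, and the dotted action names (such as "key.revoked") are hypothetical. A real V1 would write these records to durable state rather than keep them in memory.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class AuditEvent:
    actor: str          # who changed it (operator ID)
    action: str         # e.g. "key.revoked", "account.paused" (illustrative names)
    target_type: str    # "key", "account", "model", or "pool"
    target_id: str      # which customer/product state was affected
    at: str             # when, as ISO-8601 UTC
    reason: str = ""


class AuditLog:
    """Append-only in-memory log; events are never edited or removed."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, actor: str, action: str, target_type: str, target_id: str,
               reason: str = "", now: Optional[datetime] = None) -> AuditEvent:
        ts = (now or datetime.now(timezone.utc)).isoformat()
        event = AuditEvent(actor, action, target_type, target_id, ts, reason)
        self._events.append(event)
        return event

    def history(self, target_type: str, target_id: str) -> List[AuditEvent]:
        """All recorded changes to one target, oldest first."""
        return [e for e in self._events
                if e.target_type == target_type and e.target_id == target_id]

    def to_jsonl(self) -> str:
        return "\n".join(json.dumps(asdict(e)) for e in self._events)
```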


10. Operational Modes in V1

A practical way to think about V1 admin/ops is as three modes:

10.1 Support Mode

Used for:

  • user issues

  • key problems

  • quota confusion

  • plan confusion

Needs:

  • key lookup

  • account lookup

  • usage / plan visibility

  • simple revoke / disable actions

10.2 Safety / Abuse Mode

Used for:

  • suspicious activity

  • runaway usage

  • malicious automation

  • compromised key or account behavior

Needs:

  • immediate key revoke

  • account pause

  • clear state propagation into gateway behavior
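"Clear state propagation" matters because gateways often cache key status. The sketch below shows the trade-off with a small TTL cache. Without explicit invalidation, a revoke takes effect within at most one TTL (eventually consistent). Invalidating after a revoke makes the block take effect immediately on that gateway instance. The cache design, the 30-second default, and the injectable clock are all assumptions for illustration. A deployment with several gateway instances would also need to fan invalidation out to every instance.

```python
import time
from typing import Callable, Dict, Tuple


class KeyStatusCache:
    """Gateway-side TTL cache of key status (hypothetical design)."""

    def __init__(self, load_status: Callable[[str], str], ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_status          # reads authoritative status from durable state
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}   # key_id -> (status, loaded_at)

    def status(self, key_id: str) -> str:
        now = self._clock()
        cached = self._entries.get(key_id)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]              # may be stale by up to one TTL
        value = self._load(key_id)
        self._entries[key_id] = (value, now)
        return value

    def invalidate(self, key_id: str) -> None:
        """Drop the cached entry so the next request re-reads durable state."""
        self._entries.pop(key_id, None)
```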

10.3 Infra Availability Mode

Used for:

  • backend pool issues

  • degraded model serving

  • temporary maintenance

Needs:

  • model disable

  • pool pause / drain

  • visibility into current availability state

These modes do not need separate tools in V1, but they are useful mental models for deciding what capabilities must exist.


11. Open Questions

Open questions for later refinement:

  • how much internal UI is needed at launch vs backend-only controls?

  • should pool drain and health visibility appear in the same admin surface?

  • how much request-log search is necessary for support?

  • how much admin audit logging is required in V1?

  • which actions must be immediate vs eventually consistent?

These should be answered with the V1 philosophy in mind:

  • enough to operate safely

  • not so much that the team overbuilds before launch


12. Relationship to Other Specs

This spec connects directly to:

  • 05-portal-backend-control-plane.md

  • 06-data-model-and-durable-state.md

  • 07-router-pools-and-model-product-layer.md

  • 08-usage-metering-and-billing.md

It also depends indirectly on:

  • API key lifecycle rules

  • gateway enforcement semantics

  • durable request and usage state

  • model-to-pool mapping


13. Working Summary

Portal V1 should include a minimal but operationally meaningful admin/ops layer.

At minimum, the team needs the ability to:

  • revoke keys

  • pause or inspect accounts

  • inspect usage / request state

  • disable model access

  • reason about backend availability

Not all of this needs a polished internal UI in the first release. Backend capabilities and lightweight internal visibility are acceptable in V1 as long as they are safe and reliable.

In one sentence:

Portal V1 admin/ops should give the team just enough control to safely run and support a hosted API product, while keeping the heavier internal platform work for later.
