Cortensor Portal V1 Detailed Spec 09 - Admin / Ops
Status
Draft
Purpose
This document defines the internal operational needs around Portal V1.
Portal V1 does not need a large internal platform at launch, but it does need a small set of critical admin and ops capabilities so the hosted product can be supported safely.
This spec focuses on:
what internal actions must exist in V1
what can exist as backend-only controls first
what operational visibility is required to support a hosted API product
how admin/ops concerns relate to Portal backend, durable state, and router pools
1. Summary
Portal V1 should support a minimal but useful set of internal admin and operations actions.
At a minimum, the system should let operators:
disable or revoke a key
disable or pause an account
disable model access
inspect usage and request state
inspect key and plan state
drain or pause a pool (later, if needed)
Some of these capabilities may exist first as:
backend/admin-only endpoints
internal scripts
database-backed support workflows
rather than polished internal UI.
The priority for V1 is operational safety, not building a large internal admin suite.
2. Goals
Portal V1 admin/ops design should aim to:
give the team enough operational control to support the hosted API
make abuse response possible
make support and debugging practical
keep product-layer control centralized
avoid needing direct router-level intervention for normal support tasks
support a clear path to richer internal admin tooling later
3. Design Principles
3.1 Minimal but real operational control
Portal V1 does not need a heavy internal platform, but it does need enough control so the team can safely operate a hosted API product.
That means:
the team can disable unsafe access quickly
the team can inspect why a user or key is failing
the team can reason about model/pool availability
the team can respond to abuse or misconfiguration without manual infrastructure surgery every time
3.2 Product-layer ownership
Admin and ops capabilities should live at the Portal product/control-plane layer, not only in router-native tooling.
That means Portal should own:
key state
account state
entitlement state
usage state
pool/product availability state
This keeps support and ops aligned with the actual customer-facing product.
3.3 Backend-first is acceptable in V1
Not every admin action needs a polished internal UI immediately.
It is acceptable for V1 if some controls first exist as:
backend APIs
internal scripts
support-only dashboard queries
controlled database-backed operational commands
What matters is that the capability exists and is safe to use.
4. Recommended V1 Admin Capabilities
4.1 Key Actions
Portal V1 should support the following key-related admin actions:
view key metadata
revoke key
inspect key status
Useful metadata to inspect:
key ID
key prefix
owner
status
created time
revoked time
last-used time (later, if available)
Primary use cases:
compromised key
abusive key
support request from user
debugging why a key is not working
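As an illustration only, the key metadata and the revoke action above could be sketched as follows. The names ApiKeyRecord, KeyStatus, and revoke_key are hypothetical and not defined by this spec, and the choice to make revocation idempotent is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class KeyStatus(Enum):
    ACTIVE = "active"
    REVOKED = "revoked"


@dataclass
class ApiKeyRecord:
    # Hypothetical field names that mirror the metadata list above.
    key_id: str
    key_prefix: str
    owner: str
    status: KeyStatus
    created_at: datetime
    revoked_at: Optional[datetime] = None
    last_used_at: Optional[datetime] = None  # may only be tracked later


def revoke_key(record: ApiKeyRecord, now: Optional[datetime] = None) -> ApiKeyRecord:
    """Mark a key as revoked; revoking twice keeps the first timestamp (assumed)."""
    if record.status is KeyStatus.REVOKED:
        return record
    record.status = KeyStatus.REVOKED
    record.revoked_at = now or datetime.now(timezone.utc)
    return record
```

Keeping the original revoked time on repeated calls makes the action safe to retry during an incident.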
4.2 Account Actions
Portal V1 should support basic account-level operational controls:
pause account
inspect plan state
inspect quota / usage state
Primary use cases:
abuse mitigation
suspicious behavior
support workflows
confirming whether a user is blocked due to:
auth issue
revoked key
exhausted allowance
plan restriction
If orgs are enabled, this model may later extend to:
organization pause
org-level usage inspection
org-level entitlement checks
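The "why is this user blocked" check above could be sketched as a single function over a snapshot of account and key state. This is a minimal sketch under stated assumptions: AccessSnapshot, BlockReason, and explain_block are hypothetical names, and the order in which the checks run is an assumption rather than a rule from this spec.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BlockReason(Enum):
    AUTH_ISSUE = "auth_issue"
    REVOKED_KEY = "revoked_key"
    ACCOUNT_PAUSED = "account_paused"
    EXHAUSTED_ALLOWANCE = "exhausted_allowance"
    PLAN_RESTRICTION = "plan_restriction"


@dataclass
class AccessSnapshot:
    # Hypothetical stand-ins for durable Portal state.
    key_found: bool
    key_revoked: bool
    account_paused: bool
    used_units: int
    allowed_units: int
    model: str
    plan_models: frozenset


def explain_block(s: AccessSnapshot) -> Optional[BlockReason]:
    """Return the first reason a request would be blocked, or None if allowed."""
    if not s.key_found:
        return BlockReason.AUTH_ISSUE
    if s.key_revoked:
        return BlockReason.REVOKED_KEY
    if s.account_paused:
        return BlockReason.ACCOUNT_PAUSED
    if s.used_units >= s.allowed_units:
        return BlockReason.EXHAUSTED_ALLOWANCE
    if s.model not in s.plan_models:
        return BlockReason.PLAN_RESTRICTION
    return None
```

Returning a single named reason gives support a direct answer without reading raw tables.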
4.3 Model / Routing Actions
Portal V1 should support at least a minimal way to influence model availability:
disable model access
pause or drain a backend pool when needed
These actions may be implemented first as:
backend config changes
admin-only backend APIs
durable state flags interpreted by the Portal backend
Primary use cases:
model instability
backend degradation
temporary maintenance
controlled rollout/rollback of model availability
The Portal product surface should be able to respond quickly when a model family must be paused without requiring a redesign of the public API surface.
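One way to picture "durable state flags interpreted by the Portal backend" is a small availability check. The sketch below uses an in-memory object in place of durable state; AvailabilityFlags and its fields are hypothetical, and treating an unmapped model as unavailable is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AvailabilityFlags:
    # Hypothetical in-memory stand-in for flags kept in durable state.
    disabled_models: set = field(default_factory=set)
    model_to_pool: dict = field(default_factory=dict)
    paused_pools: set = field(default_factory=set)

    def is_model_available(self, model: str) -> bool:
        """A model is served only if it is enabled and its pool is not paused."""
        if model in self.disabled_models:
            return False
        pool = self.model_to_pool.get(model)
        # Assumption: a model with no mapped pool is treated as unavailable.
        return pool is not None and pool not in self.paused_pools
```

Because the public API only consults the flags, pausing a model family is a state change and not an API change.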
4.4 Request Inspection
Portal V1 should support at least lightweight request inspection for support and debugging.
This can include:
recent request history
recent usage events
request status summary
request IDs linked to:
account
key
model
pool (if needed internally)
Primary use cases:
support debugging
quota disputes
backend failure analysis
confirming whether a request was:
accepted
blocked
forwarded
completed
recorded in durable state
V1 does not need a powerful observability console, but some operational visibility is required.
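A lightweight request status summary could be derived from recorded events, as in the sketch below. RequestEvent, STAGES, and summarize are hypothetical names, and the event shape is an assumption based on the links listed above (account, key, model).

```python
from dataclasses import dataclass
from typing import Iterable

# Stage names follow the list above; "recorded" means recorded in durable state.
STAGES = ("accepted", "blocked", "forwarded", "completed", "recorded")


@dataclass(frozen=True)
class RequestEvent:
    # Hypothetical event shape linking a request to account, key, and model.
    request_id: str
    account_id: str
    key_id: str
    model: str
    stage: str


def summarize(events: Iterable[RequestEvent], request_id: str) -> dict:
    """Report which lifecycle stages were observed for one request."""
    seen = {e.stage for e in events if e.request_id == request_id}
    return {stage: stage in seen for stage in STAGES}
```

A summary like this is enough to answer a quota dispute ("was it completed and recorded?") without a full observability console.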
5. V1 Delivery Guidance
As noted in 3.3, not every admin action needs a polished UI immediately.
For V1, a practical delivery approach is:
backend support first
lightweight internal visibility second
richer internal admin UI later
This keeps the team focused on shipping the customer-facing Portal product while still preserving operational safety.
Practical interpretation
For V1, some controls can exist as:
internal backend endpoints
SQL-backed internal views
admin scripts
support runbooks
before they become a full internal Portal admin console.
That is acceptable as long as:
the actions are safe
the actions are auditable
the team can use them reliably during support/ops incidents
6. Operational Priorities
The most important early admin/ops needs are likely:
revoke a problematic key
confirm why a user is blocked
inspect quota state
inspect whether a model or pool is currently available
These are the “must-have” operational capabilities because they directly affect:
support quality
abuse response
system stability
user trust
If scope is tight, these should be prioritized above richer internal reporting or admin polish.
7. Internal Surface Expectations
7.1 What should likely exist in V1
At minimum, the team should have a way to:
look up a user
look up a key
see plan/quota status
revoke a key
disable account access
inspect whether a model/pool is enabled or degraded
7.2 What can wait until later
These can likely be deferred or kept backend-only in early V1:
polished internal admin dashboard
complex analytics exploration
advanced search/filter tooling
large-scale incident management surfaces
full enterprise-style operational control plane
Portal V1 should aim for operational sufficiency, not internal platform completeness.
8. Relationship to Router Pools
Admin / Ops should be able to reason about model and pool state at the Portal layer, even if actual pool mechanics live below.
That means the Portal control plane should eventually know whether a pool is:
live
paused
draining
degraded
This does not mean the customer-facing UI needs to expose this. It means internal/admin workflows should be able to understand and act on it.
Examples:
disable a model because its pool is unhealthy
drain a pool before maintenance
pause traffic to a backend group
explain to support why a model is temporarily unavailable
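The four pool states above could be modeled as a simple enumeration at the Portal layer. In this sketch the routing semantics are assumptions and not decisions made by this spec: live and degraded pools are assumed to accept new requests, while paused and draining pools are assumed not to. The names PoolState, accepts_new_requests, and support_message are hypothetical.

```python
from enum import Enum


class PoolState(Enum):
    LIVE = "live"
    PAUSED = "paused"
    DRAINING = "draining"
    DEGRADED = "degraded"


def accepts_new_requests(state: PoolState) -> bool:
    # Assumed semantics: draining pools finish in-flight work but take no new traffic.
    return state in (PoolState.LIVE, PoolState.DEGRADED)


def support_message(model: str, state: PoolState) -> str:
    """Internal-only explanation of model availability for support staff."""
    if accepts_new_requests(state):
        return f"{model} is available (pool state: {state.value})"
    return f"{model} is temporarily unavailable (pool state: {state.value})"
```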
9. Auditing & Safety
Because admin actions can directly affect customer access, they should be auditable.
Important actions to audit:
key revoked
account paused
model disabled
pool drained or paused
admin inspection actions (later, if needed)
V1 does not need a highly advanced audit subsystem, but it should be possible to answer:
who changed what
when they changed it
what customer/product state was affected
This becomes more important as Portal usage grows.
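A minimal audit record that answers "who changed what, and when" could look like the sketch below. AuditEntry and record_audit are hypothetical names, and the append-only list stands in for whatever durable store is actually used.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    actor: str        # who made the change
    action: str       # e.g. "key_revoked", "account_paused"
    target_type: str  # e.g. "key", "account", "model", "pool"
    target_id: str    # which customer/product state was affected
    at: str           # when, as an ISO 8601 UTC timestamp


def record_audit(log: list, actor: str, action: str, target_type: str,
                 target_id: str, now: Optional[datetime] = None) -> AuditEntry:
    """Append one serialized audit entry to an append-only log (a list here)."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    entry = AuditEntry(actor, action, target_type, target_id, ts)
    log.append(json.dumps(asdict(entry), sort_keys=True))
    return entry
```

Writing the entry in the same code path as the admin action keeps the log complete even when the action is triggered from a script.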
10. Operational Modes in V1
A practical way to think about V1 admin/ops is as three operational modes:
10.1 Support Mode
Used for:
user issues
key problems
quota confusion
plan confusion
Needs:
key lookup
account lookup
usage / plan visibility
simple revoke / disable actions
10.2 Safety / Abuse Mode
Used for:
suspicious activity
runaway usage
malicious automation
compromised key or account behavior
Needs:
immediate key revoke
account pause
clear state propagation into gateway behavior
10.3 Infra Availability Mode
Used for:
backend pool issues
degraded model serving
temporary maintenance
Needs:
model disable
pool pause / drain
visibility into current availability state
These modes do not need separate tools in V1, but they are useful mental models for deciding what capabilities must exist.
11. Open Questions
Open questions for later refinement:
how much internal UI is needed at launch vs backend-only controls?
should pool drain and health visibility appear in the same admin surface?
how much request-log search is necessary for support?
how much admin audit logging is required in V1?
which actions must be immediate vs eventually consistent?
These should be answered with the V1 philosophy in mind:
enough to operate safely
not so much that the team overbuilds before launch
12. Relationship to Other Specs
This spec connects directly to:
05-portal-backend-control-plane.md
06-data-model-and-durable-state.md
07-router-pools-and-model-product-layer.md
08-usage-metering-and-billing.md
It also depends indirectly on:
API key lifecycle rules
gateway enforcement semantics
durable request and usage state
model-to-pool mapping
13. Working Summary
Portal V1 should include a minimal but operationally meaningful admin/ops layer.
At minimum, the team needs the ability to:
revoke keys
pause or inspect accounts
inspect usage / request state
disable model access
reason about backend availability
Not all of this needs a polished internal UI in the first release. Backend capabilities and lightweight internal visibility are acceptable in V1 as long as they are safe and reliable.
In one sentence:
Portal V1 admin/ops should give the team just enough control to safely run and support a hosted API product, while keeping the heavier internal platform work for later.
Last updated