fix(opencode-go): cap mimo-v2.5-pro max_tokens at 131072

The opencode-go relay defaults max_tokens to 262144 when the request sends none, but Xiaomi mimo-v2.5-pro supports at most 131072 completion tokens. Every request therefore fails with a 400 ("max_tokens is too large: 262144") before the agent can do anything.

- Add a get_max_tokens(model) hook on ProviderProfile. The default implementation returns default_max_tokens; profiles fronting multiple upstreams can override it to vary the cap per model.
- Wire the chat_completions transport through the hook.
- Override the hook on OpenCodeGoProfile with mimo-v2.5-pro=131072.

Only mimo-v2.5-pro is capped; other opencode-go models (kimi, glm, qwen, minimax, other mimo variants) are unchanged.
@@ -129,6 +129,20 @@ class ProviderProfile:
        """
        return {}, {}

    def get_max_tokens(self, model: str | None) -> int | None:
        """Return the default max_tokens cap for *model*.

        Overrideable hook for providers that need per-model output caps —
        e.g. a relay that fronts several upstream backends, each with a
        different completion-token limit.  The transport calls this when
        the user hasn't set an explicit max_tokens.

        Default: return self.default_max_tokens (the static profile field),
        ignoring the model name.  Override in a subclass to vary the cap
        per-model.
        """
        return self.default_max_tokens

    def fetch_models(
        self,
        *,
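
The hunk above shows only the base hook; the OpenCodeGoProfile override and the transport wiring described in the commit message are not visible here. The following is a minimal sketch of how those two pieces could fit together, not the code from this commit. The `_MODEL_MAX_TOKENS` table, the `resolve_max_tokens` helper, the exact-match lookup on the model name, and the `None` profile default are all assumptions made for illustration; `ProviderProfile` is reduced to a stand-in with only the field the hook reads.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProviderProfile:
    """Stand-in for the real class: only the field the hook reads."""

    default_max_tokens: int | None = None

    def get_max_tokens(self, model: str | None) -> int | None:
        # Base behaviour from the diff: ignore the model name.
        return self.default_max_tokens


# Hypothetical lookup table. The commit message states only the
# mimo-v2.5-pro value; exact-match keys are an assumption.
_MODEL_MAX_TOKENS = {"mimo-v2.5-pro": 131072}


@dataclass
class OpenCodeGoProfile(ProviderProfile):
    def get_max_tokens(self, model: str | None) -> int | None:
        # Capped models get their own limit; everything else falls
        # back to the profile-wide default.
        if model in _MODEL_MAX_TOKENS:
            return _MODEL_MAX_TOKENS[model]
        return super().get_max_tokens(model)


def resolve_max_tokens(
    profile: ProviderProfile,
    model: str | None,
    requested: int | None = None,
) -> int | None:
    """Hypothetical transport helper: an explicit value wins,
    otherwise the profile's per-model hook decides."""
    if requested is not None:
        return requested
    return profile.get_max_tokens(model)
```

Under these assumptions, `resolve_max_tokens(OpenCodeGoProfile(), "mimo-v2.5-pro")` yields the 131072 cap, an uncapped model falls through to the profile default, and a caller-supplied `max_tokens` is passed along untouched.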