Inspired by Claude Code: /compress here [N] — boundary-aware 'summarize up to here' (#35048)

Adds a user-chosen compression boundary to the existing /compress command.
/compress here [N] summarizes everything except the most recent N exchanges
(default 2), which are preserved verbatim — letting the user pick the
compression boundary instead of relying on the automatic token-budget heuristic.

Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139,
Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20

- hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation
  guard (shared by CLI and gateway).
- cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression;
  compress only the head, re-append the verbatim tail through the seam guard.
- Preserves message-flow role alternation (seam guard merges any illegal
  user->user / assistant->assistant adjacency).
- Reuses the existing _compress_context session-rotation/lock machinery — no
  changes to the compression core.
- Bare /compress (full) and /compress <focus> behavior unchanged.

Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved
tool-call transcript, degenerate/multimodal seams, real handler path).
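
The flow described in the commit message can be sketched in a few lines. This is a minimal illustration, not the committed code: `split_at_user_boundary`, `partial_compress`, and `fake_summarize` are hypothetical names, the summarizer is a stub, and the seam guard is left out for brevity.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


def split_at_user_boundary(
    history: List[Message], keep_last: int
) -> Tuple[List[Message], List[Message]]:
    """Return (head, tail); tail starts at the keep_last-th most recent user turn."""
    user_idx = [i for i, m in enumerate(history) if m.get("role") == "user"]
    if not user_idx:
        return list(history), []
    boundary = user_idx[-max(1, keep_last):][0]
    if boundary == 0:
        # Nothing would be left to summarize: signal "no partial split".
        return list(history), []
    return history[:boundary], history[boundary:]


def partial_compress(
    history: List[Message],
    keep_last: int,
    summarize: Callable[[List[Message]], List[Message]],
) -> List[Message]:
    """Summarize the head only, then re-append the tail unchanged."""
    head, tail = split_at_user_boundary(history, keep_last)
    if not tail:
        return summarize(history)  # fall back to full compression
    return summarize(head) + tail


def fake_summarize(msgs: List[Message]) -> List[Message]:
    # Hypothetical stand-in for the real summarizer.
    return [{"role": "assistant", "content": f"[summary of {len(msgs)} messages]"}]


history: List[Message] = []
for i in range(4):
    history.append({"role": "user", "content": f"u{i}"})
    history.append({"role": "assistant", "content": f"a{i}"})

result = partial_compress(history, 2, fake_summarize)
print([m["content"] for m in result])
# → ['[summary of 4 messages]', 'u2', 'a2', 'u3', 'a3']
```

Counting exchanges by `user` messages is what makes the tail start on a user turn, so the summary is normally followed by a legal role.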
This commit is contained in:
Teknium
2026-05-29 17:49:15 -07:00
committed by GitHub
parent 54aa4db1de
commit bcc8301000
7 changed files with 650 additions and 19 deletions

cli.py

@@ -9947,10 +9947,20 @@ class HermesCLI:
     def _manual_compress(self, cmd_original: str = ""):
         """Manually trigger context compression on the current conversation.

-        Accepts an optional focus topic: ``/compress <focus>`` guides the
-        summariser to preserve information related to *focus* while being
-        more aggressive about discarding everything else. Inspired by
-        Claude Code's ``/compact <focus>`` feature.
+        Two modes:
+
+        * ``/compress [<focus>]`` — compress the *whole* history. An
+          optional focus topic guides the summariser to preserve
+          information related to *focus* while being more aggressive
+          about discarding everything else. Inspired by Claude Code's
+          ``/compact <focus>`` feature.
+        * ``/compress here [N]`` — boundary-aware compression. Summarize
+          everything *except* the most recent ``N`` exchanges (default
+          2), which are preserved verbatim. Inspired by Claude Code's
+          Rewind "Summarize up to here" action (v2.1.139, May 2026,
+          https://code.claude.com/docs/en/whats-new/2026-w20). Lets the
+          user pick the compression boundary instead of leaving it to
+          the automatic token-budget heuristic.
         """
         if not self.conversation_history or len(self.conversation_history) < 4:
             print("(._.) Not enough conversation to compress (need at least 4 messages).")
@@ -9964,12 +9974,21 @@ class HermesCLI:
             print("(._.) Compression is disabled in config.")
             return

-        # Extract optional focus topic from the command (e.g. "/compress database schema")
-        focus_topic = ""
+        from hermes_cli.partial_compress import (
+            parse_partial_compress_args,
+            rejoin_compressed_head_and_tail,
+            split_history_for_partial_compress,
+        )
+
+        # Args after the command word (e.g. "/compress here 3" -> "here 3").
+        raw_args = ""
         if cmd_original:
-            parts = cmd_original.strip().split(None, 1)
-            if len(parts) > 1:
-                focus_topic = parts[1].strip()
+            _parts = cmd_original.strip().split(None, 1)
+            if len(_parts) > 1:
+                raw_args = _parts[1].strip()
+
+        partial, keep_last, focus_topic = parse_partial_compress_args(raw_args)
+        focus_topic = focus_topic or ""

         original_count = len(self.conversation_history)
         with self._busy_command("Compressing context..."):
@@ -9977,6 +9996,22 @@ class HermesCLI:
                 from agent.model_metadata import estimate_request_tokens_rough
                 from agent.manual_compression_feedback import summarize_manual_compression
                 original_history = list(self.conversation_history)
+
+                # Boundary-aware split: only the head is summarized; the
+                # most recent `keep_last` exchanges ride along verbatim.
+                tail: list = []
+                head = original_history
+                if partial:
+                    head, tail = split_history_for_partial_compress(
+                        original_history, keep_last
+                    )
+                    if not tail:
+                        # Split degenerated (everything would be kept, or
+                        # no head left to compress). Fall back to full
+                        # compression so the user still gets an action.
+                        partial = False
+                        head = original_history
+
                 # Include system prompt + tool schemas in the estimate —
                 # a transcript-only number understates real request pressure
                 # and can even appear to grow after compression because a
@@ -9988,7 +10023,11 @@ class HermesCLI:
                     system_prompt=_sys_prompt,
                     tools=_tools,
                 )
-                if focus_topic:
+                if partial:
+                    print(f"🗜️ Summarizing up to here: compressing {len(head)} of "
+                          f"{original_count} messages (~{approx_tokens:,} tokens), "
+                          f"keeping last {keep_last} exchange(s) verbatim...")
+                elif focus_topic:
                     print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens), "
                           f"focus: \"{focus_topic}\"...")
                 else:
@@ -10001,12 +10040,21 @@ class HermesCLI:
                 # which already contain the agent identity — resulting in the
                 # identity block appearing twice (issue #15281).
                 compressed, _ = self.agent._compress_context(
-                    original_history,
+                    head,
                     None,
                     approx_tokens=approx_tokens,
                     focus_topic=focus_topic or None,
                     force=True,
                 )
+                # Re-append the verbatim tail after the compressed head.
+                # The split guarantees `tail` begins on a user turn, so the
+                # compressed-head -> tail boundary is normally valid
+                # (the head's compressed output ends on assistant/tool).
+                # rejoin_compressed_head_and_tail() additionally guards the
+                # seam against any illegal user->user / assistant->assistant
+                # adjacency, defending provider role-alternation rules.
+                if partial and tail:
+                    compressed = rejoin_compressed_head_and_tail(compressed, tail)
                 self.conversation_history = compressed
                 # _compress_context ends the old session and creates a new child
                 # session on the agent (run_agent.py::_compress_context). Sync the

gateway/run.py

@@ -12449,6 +12449,12 @@ class GatewayRunner:
         Accepts an optional focus topic: ``/compress <focus>`` guides the
         summariser to preserve information related to *focus* while being
         more aggressive about discarding everything else.
+
+        Also accepts the boundary-aware form ``/compress here [N]``:
+        summarize everything except the most recent ``N`` exchanges
+        (default 2), kept verbatim. Inspired by Claude Code's Rewind
+        "Summarize up to here" action (v2.1.139, May 2026,
+        https://code.claude.com/docs/en/whats-new/2026-w20).
         """
         source = event.source
         session_entry = self.session_store.get_or_create_session(source)
@@ -12457,8 +12463,15 @@ class GatewayRunner:
         if not history or len(history) < 4:
             return t("gateway.compress.not_enough")

-        # Extract optional focus topic from command args
-        focus_topic = (event.get_command_args() or "").strip() or None
+        # Parse args: either a focus topic (full compress) or the
+        # boundary-aware "here [N]" form (partial compress).
+        from hermes_cli.partial_compress import (
+            parse_partial_compress_args,
+            rejoin_compressed_head_and_tail,
+            split_history_for_partial_compress,
+        )
+        _raw_args = (event.get_command_args() or "").strip()
+        partial, keep_last, focus_topic = parse_partial_compress_args(_raw_args)

         try:
             from run_agent import AIAgent
@@ -12479,6 +12492,19 @@ class GatewayRunner:
                 if m.get("role") in {"user", "assistant"} and m.get("content")
             ]

+            # Boundary-aware split: only the head is summarized; the most
+            # recent `keep_last` exchanges are preserved verbatim. The
+            # split snaps the tail to a user-turn start so the rejoined
+            # transcript keeps role alternation valid.
+            tail: list = []
+            head = msgs
+            if partial:
+                head, tail = split_history_for_partial_compress(msgs, keep_last)
+                if not tail:
+                    # Degenerate split — fall back to full compression.
+                    partial = False
+                    head = msgs
+
             tmp_agent = AIAgent(
                 **runtime_kwargs,
                 model=model,
@@ -12502,15 +12528,20 @@ class GatewayRunner:
             )

             compressor = tmp_agent.context_compressor
-            if not compressor.has_content_to_compress(msgs):
+            if not compressor.has_content_to_compress(head):
                 return t("gateway.compress.nothing_to_do")

             loop = asyncio.get_running_loop()
             compressed, _ = await loop.run_in_executor(
                 None,
-                lambda: tmp_agent._compress_context(msgs, "", approx_tokens=approx_tokens, focus_topic=focus_topic, force=True)
+                lambda: tmp_agent._compress_context(head, "", approx_tokens=approx_tokens, focus_topic=focus_topic, force=True)
             )

+            # Re-append the verbatim tail after the compressed head,
+            # guarding the seam against illegal role adjacency.
+            if partial and tail:
+                compressed = rejoin_compressed_head_and_tail(compressed, tail)
+
             # _compress_context already calls end_session() on the old session
             # (preserving its full transcript in SQLite) and creates a new
             # session_id for the continuation. Write the compressed messages


@@ -85,8 +85,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
                args_hint="<platform>", cli_only=True),
     CommandDef("branch", "Branch the current session (explore a different path)", "Session",
                aliases=("fork",), args_hint="[name]"),
-    CommandDef("compress", "Manually compress conversation context", "Session",
-               args_hint="[focus topic]"),
+    CommandDef("compress", "Compress conversation context (add 'here [N]' to keep recent N turns)", "Session",
+               args_hint="[here [N] | focus topic]"),
     CommandDef("rollback", "List or restore filesystem checkpoints", "Session",
                args_hint="[number]"),
     CommandDef("snapshot", "Create or restore state snapshots of Hermes config/state", "Session",

hermes_cli/partial_compress.py

@@ -0,0 +1,235 @@
"""Boundary-aware partial compression — "summarize up to here".

Inspired by Claude Code's Rewind menu "Summarize up to here" action
(v2.1.139–v2.1.142, Week 20, May 2026):
https://code.claude.com/docs/en/whats-new/2026-w20

Hermes already has ``/compress`` (full-history compaction) and an
automatic token-budget tail-protection heuristic inside
``ContextCompressor``. What was missing is *user-chosen* boundary
control: "fold everything before this point into a summary, but keep
my most recent N exchanges exactly as they are." That is the value of
the Claude Code feature — the user decides the compression boundary
instead of leaving it to the token-budget heuristic.

This module owns the pure, side-effect-free split logic so both the
CLI (``cli.py::_manual_compress``) and the gateway
(``gateway/run.py::_handle_compress_command``) share one
implementation. The slash-command surfaces handle compression of the
*head* via the existing ``_compress_context`` pipeline (preserving all
the session-rotation / lock / memory-notify machinery) and then
re-append the verbatim *tail* returned here.

Design notes / invariants honored:

* **Role alternation.** The compressed head ends with summary/handoff
  content (assistant- or user-role, possibly a trailing todo snapshot).
  The verbatim tail must begin with a ``user`` message so the rejoined
  history keeps the user↔assistant alternation that providers validate.
  :func:`split_history_for_partial_compress` snaps the tail boundary
  backwards to the nearest ``user`` turn, and
  :func:`rejoin_compressed_head_and_tail` repairs the seam when the
  compressed head itself ends on a ``user`` message.
* **No silent context mutation.** This is a manual, user-invoked
  action. It rotates the session exactly like ``/compress`` does (via
  the caller), so the prompt-cache reset is explicit and expected, not
  silent.
* **Conservative defaults.** ``keep_last`` counts *exchanges* (a user
  turn plus its following assistant/tool turns), defaulting to 2. When
  the split would leave nothing in the head, it returns an empty tail
  so the caller can fall back to full compression.
"""

from __future__ import annotations

from typing import Any, Dict, List, Optional, Tuple

#: Default number of recent exchanges to preserve verbatim when the user
#: runs ``/compress here`` without an explicit count.
DEFAULT_KEEP_LAST = 2

#: Hard ceiling so a fat-fingered ``/compress here 9999`` doesn't turn
#: into a no-op surprise — clamp instead.
MAX_KEEP_LAST = 100


def parse_partial_compress_args(
    raw_args: str,
) -> Tuple[bool, int, Optional[str]]:
    """Parse the argument string after ``/compress``.

    Recognizes the boundary-aware forms:

    * ``here``        → partial compress, keep ``DEFAULT_KEEP_LAST``
    * ``here 4``      → partial compress, keep 4 exchanges
    * ``--keep 4``    → partial compress, keep 4 exchanges (also
                        ``-k 4`` and ``--keep=4``)
    * ``up to here``  → alias for ``here`` (matches Claude Code's
                        menu label "Summarize up to here")

    Anything else is treated as a focus topic for the existing full
    ``/compress <focus>`` behavior.

    Returns ``(partial, keep_last, focus_topic)``:

    * ``partial`` — True when a boundary-aware form was requested.
    * ``keep_last`` — exchanges to preserve verbatim (only meaningful
      when ``partial`` is True).
    * ``focus_topic`` — focus string for full compression, or None.
      Always None when ``partial`` is True (the two modes are exclusive;
      a focused partial compress is not a documented Claude Code
      behavior and would muddy the UX).
    """
    text = (raw_args or "").strip()
    if not text:
        return False, DEFAULT_KEEP_LAST, None

    tokens = text.lower().split()

    # Normalize the "up to here" alias to "here". Matching whole tokens
    # (rather than a string prefix) leaves a focus topic such as
    # "up to hereford" untouched instead of truncating it.
    if tokens[:3] == ["up", "to", "here"]:
        tokens = tokens[2:]

    # Form: here [N]
    if tokens and tokens[0] == "here":
        keep = DEFAULT_KEEP_LAST
        if len(tokens) >= 2:
            keep = _coerce_keep(tokens[1])
        return True, keep, None

    # Form: --keep N (or --keep=N)
    if tokens and tokens[0] in ("--keep", "-k") and len(tokens) >= 2:
        return True, _coerce_keep(tokens[1]), None
    if tokens and tokens[0].startswith("--keep="):
        return True, _coerce_keep(tokens[0].split("=", 1)[1]), None

    # Otherwise: full compression with this as the focus topic.
    return False, DEFAULT_KEEP_LAST, text or None


def _coerce_keep(value: str) -> int:
    """Parse a keep-count token, clamping to [1, MAX_KEEP_LAST]."""
    try:
        n = int(value)
    except (TypeError, ValueError):
        return DEFAULT_KEEP_LAST
    if n < 1:
        return 1
    if n > MAX_KEEP_LAST:
        return MAX_KEEP_LAST
    return n


def split_history_for_partial_compress(
    history: List[Dict[str, Any]],
    keep_last: int,
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split ``history`` into ``(head, tail)`` for partial compression.

    ``head`` is the earlier portion that will be summarized; ``tail`` is
    the most recent ``keep_last`` exchanges, preserved verbatim.

    An *exchange* is counted by ``user``-role messages: keeping N
    exchanges means keeping everything from the Nth-most-recent ``user``
    message onward. This guarantees the tail starts on a ``user`` turn,
    so when the caller rejoins ``compressed_head + tail`` the
    user↔assistant alternation stays valid (the compressed head's
    trailing content is followed by a fresh user turn).

    Returns ``(head, tail)``. If the split would leave the head empty
    (not enough history to compress meaningfully), returns
    ``(history, [])`` — signaling the caller to fall back to full
    compression or report "nothing to do".
    """
    if keep_last < 1:
        keep_last = 1

    n = len(history)
    if n == 0:
        return [], []

    # Walk backwards collecting the indices of the most recent `keep_last`
    # user-message starts. The tail begins at the earliest such index.
    user_starts: List[int] = []
    for idx in range(n - 1, -1, -1):
        if history[idx].get("role") == "user":
            user_starts.append(idx)
            if len(user_starts) >= keep_last:
                break

    if not user_starts:
        # No user turns at all (degenerate) — nothing sensible to keep
        # as a "recent exchange"; treat as full compression.
        return list(history), []

    boundary = user_starts[-1]  # earliest of the kept user starts
    head = history[:boundary]
    tail = history[boundary:]

    # If everything is in the tail (nothing left to compress), signal the
    # caller to fall back to full compression rather than producing a
    # no-op that rotates the session for no benefit.
    if not head:
        return list(history), []

    return head, tail


def rejoin_compressed_head_and_tail(
    compressed_head: List[Dict[str, Any]],
    tail: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Concatenate a compressed head with the verbatim tail, defending
    the seam against an illegal user→user / assistant→assistant adjacency.

    In normal operation the compressed head ends with the head's own
    protected verbatim tail (the ``ContextCompressor`` always preserves a
    recent window), which terminates on an ``assistant``/``tool`` turn —
    so ``assistant → user`` at the seam is already valid. But the head
    compressor's exact output shape is not contractually guaranteed (a
    plugin context engine could return something that ends on a ``user``
    turn, or a degenerate single-summary message). Rather than trust the
    seam, this helper inspects the boundary and, if the last head message
    and the first tail message share a ``user``/``assistant`` role, folds
    the tail's first message content onto the head's last message so the
    rejoined list never violates provider role-alternation rules. When
    either content is not a plain string, an empty bridging turn of the
    opposite role is inserted instead, so no content is dropped.

    ``tool`` messages are left alone — consecutive ``tool`` entries are
    the one legal repetition (parallel tool results).
    """
    if not tail:
        return list(compressed_head)
    if not compressed_head:
        return list(tail)

    head = list(compressed_head)
    rest = list(tail)
    last = head[-1]
    first = rest[0]
    last_role = last.get("role")
    first_role = first.get("role")

    if last_role == first_role and last_role in ("user", "assistant"):
        # Illegal adjacency. When both contents are plain strings, merge
        # the tail's first message into the head's last message
        # (concatenation keeps both texts) and drop the now-redundant
        # standalone tail message.
        last_content = last.get("content")
        first_content = first.get("content")
        if isinstance(last_content, str) and isinstance(first_content, str):
            merged = dict(last)
            merged["content"] = f"{last_content}\n\n{first_content}"
            head[-1] = merged
            rest = rest[1:]
        else:
            # Can't safely string-merge structured/multimodal content.
            # Insert a minimal bridging turn of the opposite role so the
            # seam alternates and both messages survive unchanged.
            bridge_role = "assistant" if first_role == "user" else "user"
            head.append({"role": bridge_role, "content": ""})

    return head + rest
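
The three seam outcomes described in the docstring (already valid, string merge, bridging turn) can be seen side by side in a condensed sketch. `join_with_seam_guard` is a hypothetical stand-in written for this illustration, and the list-of-parts content shape is only an assumed example of non-string content.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def join_with_seam_guard(head: List[Message], tail: List[Message]) -> List[Message]:
    """Join head + tail, repairing a same-role user/assistant seam."""
    if not head or not tail:
        return list(head) + list(tail)
    last, first = head[-1], tail[0]
    role = last.get("role")
    if role != first.get("role") or role not in ("user", "assistant"):
        return list(head) + list(tail)  # seam already legal (or tool -> tool)
    a, b = last.get("content"), first.get("content")
    if isinstance(a, str) and isinstance(b, str):
        merged = {**last, "content": f"{a}\n\n{b}"}
        return head[:-1] + [merged] + tail[1:]
    # Non-string content: keep both messages, separated by an empty turn.
    bridge = "assistant" if role == "user" else "user"
    return list(head) + [{"role": bridge, "content": ""}] + list(tail)


tail = [{"role": "user", "content": "latest question"},
        {"role": "assistant", "content": "answer"}]
# Assumed shape for structured content; real providers differ.
parts_tail = [{"role": "user", "content": [{"type": "text", "text": "look at this"}]}]

valid = join_with_seam_guard([{"role": "assistant", "content": "[summary]"}], tail)
merged = join_with_seam_guard([{"role": "user", "content": "[summary]"}], tail)
bridged = join_with_seam_guard([{"role": "user", "content": "[summary]"}], parts_tail)

print([m["role"] for m in valid])    # → ['assistant', 'user', 'assistant']
print([m["role"] for m in merged])   # → ['user', 'assistant']
print([m["role"] for m in bridged])  # → ['user', 'assistant', 'user']
```

Merging keeps the message count down when both sides are text, while the bridge trades an extra empty turn for leaving structured content untouched.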


@@ -0,0 +1,119 @@
"""Tests for /compress here [N] — boundary-aware partial compression.

Verifies the CLI handler (_manual_compress) splits the history, compresses
only the head, and re-appends the verbatim tail. Inspired by Claude Code's
Rewind "Summarize up to here" action (v2.1.139, May 2026).
"""

from unittest.mock import MagicMock, patch

from tests.cli.test_cli_init import _make_cli


def _make_history() -> list[dict[str, str]]:
    # 8 messages = 4 exchanges.
    h: list[dict[str, str]] = []
    for i in range(4):
        h.append({"role": "user", "content": f"u{i}"})
        h.append({"role": "assistant", "content": f"a{i}"})
    return h


def _wire_agent(shell, compressed_head):
    shell.agent = MagicMock()
    shell.agent.compression_enabled = True
    shell.agent._cached_system_prompt = ""
    shell.agent.session_id = None
    shell.agent.tools = None
    shell.agent._compress_context.return_value = (compressed_head, "")


def test_compress_here_compresses_head_only(capsys):
    """/compress here 2 passes only the head to _compress_context."""
    shell = _make_cli()
    history = _make_history()
    shell.conversation_history = history
    # Pretend compression collapses the head into a single summary message.
    summary = [{"role": "user", "content": "[summary of earlier turns]"}]
    _wire_agent(shell, summary)

    with patch("agent.model_metadata.estimate_request_tokens_rough", return_value=100):
        shell._manual_compress("/compress here 2")

    # _compress_context should have been called with the HEAD only
    # (everything before the last 2 user-starts = first 4 messages).
    shell.agent._compress_context.assert_called_once()
    call = shell.agent._compress_context.call_args
    passed_head = call.args[0]
    assert passed_head == history[:4]
    # focus_topic must be None in partial mode (modes are exclusive).
    assert call.kwargs.get("focus_topic") is None


def test_compress_here_reappends_verbatim_tail(capsys):
    """The most recent exchanges are preserved verbatim after the summary."""
    shell = _make_cli()
    history = _make_history()
    shell.conversation_history = history
    # Head compresses to an assistant-role summary so the seam
    # (assistant -> user tail) is already valid — tail rides along whole.
    summary = [{"role": "assistant", "content": "[summary]"}]
    _wire_agent(shell, summary)

    with patch("agent.model_metadata.estimate_request_tokens_rough", return_value=100):
        shell._manual_compress("/compress here 2")

    # Result = compressed head + verbatim tail (last 2 exchanges).
    assert shell.conversation_history == summary + history[4:]
    # Tail boundary keeps role alternation valid (tail starts on user).
    assert history[4]["role"] == "user"
    # No consecutive same-role user/assistant messages anywhere.
    roles = [m["role"] for m in shell.conversation_history
             if m["role"] in ("user", "assistant")]
    assert all(roles[i] != roles[i + 1] for i in range(len(roles) - 1))


def test_compress_here_banner_mentions_summarizing_up_to_here(capsys):
    shell = _make_cli()
    history = _make_history()
    shell.conversation_history = history
    _wire_agent(shell, [{"role": "user", "content": "[summary]"}])

    with patch("agent.model_metadata.estimate_request_tokens_rough", return_value=100):
        shell._manual_compress("/compress here")

    out = capsys.readouterr().out
    assert "Summarizing up to here" in out
    assert "verbatim" in out


def test_bare_compress_still_full(capsys):
    """/compress with no args compresses the whole history (full mode)."""
    shell = _make_cli()
    history = _make_history()
    shell.conversation_history = history
    _wire_agent(shell, list(history))

    with patch("agent.model_metadata.estimate_request_tokens_rough", return_value=100):
        shell._manual_compress("/compress")

    call = shell.agent._compress_context.call_args
    # Full mode passes the entire history as the head.
    assert call.args[0] == history
    out = capsys.readouterr().out
    assert "Summarizing up to here" not in out


def test_focus_still_works(capsys):
    """/compress <focus> keeps the existing focus behavior."""
    shell = _make_cli()
    history = _make_history()
    shell.conversation_history = history
    _wire_agent(shell, list(history))

    with patch("agent.model_metadata.estimate_request_tokens_rough", return_value=100):
        shell._manual_compress("/compress database schema")

    call = shell.agent._compress_context.call_args
    assert call.args[0] == history
    assert call.kwargs.get("focus_topic") == "database schema"


@@ -0,0 +1,198 @@
"""Tests for hermes_cli.partial_compress — the pure split/parse helpers
behind ``/compress here [N]`` (boundary-aware "summarize up to here").

Inspired by Claude Code's Rewind "Summarize up to here" action.
"""

from hermes_cli.partial_compress import (
    DEFAULT_KEEP_LAST,
    MAX_KEEP_LAST,
    parse_partial_compress_args,
    rejoin_compressed_head_and_tail,
    split_history_for_partial_compress,
)


def _history(n_pairs: int) -> list[dict[str, str]]:
    """Build n_pairs of (user, assistant) exchanges."""
    h: list[dict[str, str]] = []
    for i in range(n_pairs):
        h.append({"role": "user", "content": f"u{i}"})
        h.append({"role": "assistant", "content": f"a{i}"})
    return h


# ── parse_partial_compress_args ──────────────────────────────────────


def test_empty_args_is_full_compress():
    partial, keep, focus = parse_partial_compress_args("")
    assert partial is False
    assert keep == DEFAULT_KEEP_LAST
    assert focus is None


def test_here_defaults_keep_last():
    partial, keep, focus = parse_partial_compress_args("here")
    assert partial is True
    assert keep == DEFAULT_KEEP_LAST
    assert focus is None


def test_here_with_count():
    partial, keep, focus = parse_partial_compress_args("here 4")
    assert partial is True
    assert keep == 4
    assert focus is None


def test_up_to_here_alias():
    partial, keep, focus = parse_partial_compress_args("up to here 3")
    assert partial is True
    assert keep == 3
    assert focus is None


def test_keep_flag_forms():
    for arg in ("--keep 5", "-k 5", "--keep=5"):
        partial, keep, focus = parse_partial_compress_args(arg)
        assert partial is True, arg
        assert keep == 5, arg
        assert focus is None, arg


def test_focus_topic_when_not_boundary_form():
    partial, keep, focus = parse_partial_compress_args("database schema")
    assert partial is False
    assert focus == "database schema"


def test_here_count_clamped_low_and_high():
    _, keep_low, _ = parse_partial_compress_args("here 0")
    assert keep_low == 1
    _, keep_high, _ = parse_partial_compress_args(f"here {MAX_KEEP_LAST + 50}")
    assert keep_high == MAX_KEEP_LAST


def test_here_garbage_count_falls_back_to_default():
    partial, keep, focus = parse_partial_compress_args("here lots")
    assert partial is True
    assert keep == DEFAULT_KEEP_LAST


# ── split_history_for_partial_compress ───────────────────────────────


def test_split_keeps_last_n_exchanges():
    h = _history(5)  # 10 messages: u0 a0 u1 a1 u2 a2 u3 a3 u4 a4
    head, tail = split_history_for_partial_compress(h, keep_last=2)
    # Keep last 2 user-starts → tail begins at u3 (index 6).
    assert tail == h[6:]
    assert head == h[:6]
    # Tail must begin on a user turn (role-alternation safety).
    assert tail[0]["role"] == "user"


def test_split_default_keep():
    h = _history(4)  # 8 messages
    head, tail = split_history_for_partial_compress(h, keep_last=DEFAULT_KEEP_LAST)
    assert tail[0]["role"] == "user"
    assert head + tail == h
    assert len(head) > 0


def test_split_tail_always_starts_on_user():
    # Tool messages interleaved — tail must still snap to a user turn.
    h = [
        {"role": "user", "content": "u0"},
        {"role": "assistant", "content": "a0"},
        {"role": "user", "content": "u1"},
        {"role": "assistant", "content": "a1"},
        {"role": "tool", "content": "t1"},
        {"role": "assistant", "content": "a1b"},
        {"role": "user", "content": "u2"},
        {"role": "assistant", "content": "a2"},
    ]
    head, tail = split_history_for_partial_compress(h, keep_last=1)
    assert tail[0]["role"] == "user"
    assert tail[0]["content"] == "u2"
    assert head + tail == h


def test_split_degenerate_returns_no_tail():
    # keep_last larger than the number of exchanges → nothing to compress.
    h = _history(2)  # 4 messages, 2 user turns
    head, tail = split_history_for_partial_compress(h, keep_last=5)
    # Boundary lands at the first user turn → head empty → signal full.
    assert tail == []
    assert head == h


def test_split_empty_history():
    head, tail = split_history_for_partial_compress([], keep_last=2)
    assert head == []
    assert tail == []


def test_split_rejoin_preserves_all_messages():
    h = _history(6)
    head, tail = split_history_for_partial_compress(h, keep_last=3)
    assert head + tail == h


# ── rejoin_compressed_head_and_tail (seam-alternation guard) ─────────


def _roles(msgs):
    return [m["role"] for m in msgs if m["role"] in ("user", "assistant")]


def _no_consecutive_dupes(msgs):
    r = _roles(msgs)
    return all(r[i] != r[i + 1] for i in range(len(r) - 1))


def test_rejoin_valid_seam_assistant_then_user():
    # Normal case: head ends on assistant, tail starts on user → valid.
    head = [{"role": "user", "content": "[summary]"},
            {"role": "assistant", "content": "ack"}]
    tail = [{"role": "user", "content": "next"},
            {"role": "assistant", "content": "reply"}]
    out = rejoin_compressed_head_and_tail(head, tail)
    assert out == head + tail
    assert _no_consecutive_dupes(out)


def test_rejoin_user_user_seam_merges():
    # Degenerate head ending on a user summary; tail starts on user.
    head = [{"role": "user", "content": "[summary of head]"}]
    tail = [{"role": "user", "content": "latest question"},
            {"role": "assistant", "content": "answer"}]
    out = rejoin_compressed_head_and_tail(head, tail)
    assert _no_consecutive_dupes(out), out
    # The two user messages were merged into one.
    assert out[0]["content"] == "[summary of head]\n\nlatest question"
    assert out[1] == {"role": "assistant", "content": "answer"}


def test_rejoin_assistant_assistant_seam_merges():
    head = [{"role": "user", "content": "q"},
            {"role": "assistant", "content": "head end"}]
    tail = [{"role": "assistant", "content": "tail start"},
            {"role": "user", "content": "u"}]
    out = rejoin_compressed_head_and_tail(head, tail)
    assert _no_consecutive_dupes(out), out
    assert out[-2]["content"] == "head end\n\ntail start"


def test_rejoin_empty_tail_returns_head():
    head = [{"role": "user", "content": "x"}]
    assert rejoin_compressed_head_and_tail(head, []) == head


def test_rejoin_tool_seam_left_alone():
    # tool->tool is the one legal repetition; don't merge.
    head = [{"role": "user", "content": "q"}, {"role": "tool", "content": "t1"}]
    tail = [{"role": "user", "content": "u"}]
    out = rejoin_compressed_head_and_tail(head, tail)
    assert out == head + tail


@@ -43,7 +43,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
 | `/retry` | Retry the last message (resend to agent) |
 | `/undo` | Remove the last user/assistant exchange |
 | `/title` | Set a title for the current session (usage: /title My Session Name) |
-| `/compress [focus topic]` | Manually compress conversation context (flush memories + summarize). Optional focus topic narrows what the summary preserves. |
+| `/compress [here [N] \| focus topic]` | Manually compress conversation context (flush memories + summarize). `/compress here [N]` summarizes everything except the most recent N exchanges (default 2), kept verbatim — pick your own compression boundary. A focus topic narrows what a full summary preserves. |
 | `/rollback` | List or restore filesystem checkpoints (usage: /rollback [number]) |
 | `/snapshot [create\|restore <id>\|prune]` (alias: `/snap`) | Create or restore state snapshots of Hermes config/state. `create [label]` saves a snapshot, `restore <id>` reverts to it, `prune [N]` removes old snapshots, or list all with no args. |
 | `/stop` | Kill all running background processes |
@@ -206,7 +206,7 @@ The messaging gateway supports the following built-in commands inside Telegram,
 | `/retry` | Retry the last message. |
 | `/undo` | Remove the last exchange. |
 | `/sethome` (alias: `/set-home`) | Mark the current chat as the platform home channel for deliveries. |
-| `/compress [focus topic]` | Manually compress conversation context. Optional focus topic narrows what the summary preserves. |
+| `/compress [here [N] \| focus topic]` | Manually compress conversation context. `/compress here [N]` keeps the most recent N exchanges (default 2) verbatim and summarizes the rest. A focus topic narrows what a full summary preserves. |
 | `/topic [off\|help\|session-id]` | **Telegram DM only.** Manage user-managed multi-session topic mode. `/topic` enables it or shows status; `/topic off` disables it and clears bindings; `/topic help` shows usage; `/topic <session-id>` inside a topic restores a previous session. See [Multi-session DM mode](/user-guide/messaging/telegram#multi-session-dm-mode-topic). |
 | `/title [name]` | Set or show the session title. |
 | `/resume [name]` | Resume a previously named session. |
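
How the documented `/compress` argument forms map to behavior can be illustrated with a small sketch. `describe` is a hypothetical helper written for this example, not part of Hermes; it mirrors the default of 2 and the clamp to 100 from this commit's parser and omits the `--keep` spellings for brevity.

```python
DEFAULT_KEEP = 2   # documented default for /compress here
MAX_KEEP = 100     # clamp ceiling used by the parser in this commit


def describe(args: str) -> str:
    """Describe what `/compress <args>` would do (illustrative only)."""
    tokens = args.lower().split()
    if tokens[:3] == ["up", "to", "here"]:
        tokens = tokens[2:]  # "up to here" is an alias for "here"
    if tokens[:1] == ["here"]:
        keep = DEFAULT_KEEP
        if len(tokens) > 1:
            try:
                keep = min(max(int(tokens[1]), 1), MAX_KEEP)
            except ValueError:
                keep = DEFAULT_KEEP  # non-numeric count falls back to the default
        return f"summarize all but the last {keep} exchange(s)"
    if not tokens:
        return "summarize the whole history"
    return f"summarize the whole history, focusing on {args.strip()!r}"


for example in ("", "here", "here 5", "here 0", "database schema"):
    print(f"/compress {example}".strip(), "->", describe(example))
```

The boundary-aware form and the focus topic are mutually exclusive in this commit, which is why a `here` argument never produces a focus string.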