Files
hermes-agent/gateway/platforms
Teknium 781604ce4c fix(gateway): unify MEDIA: extraction extension set + close the unknown-ext black hole (#34517) (#34844)
MEDIA:<path> tags for .md/.json/.yaml/.xml/.html and other document
extensions were silently dropped. extract_media() carried a narrow
extension allowlist that omitted them, while extract_local_files()
had a broad one. The dispatch sites then ran an unconditional
re.sub(r'MEDIA:\\s*\\S+', '') that stripped the tag from the body even
when extract_media had not matched it — so extract_local_files (broad
list) ran on text where the path was already gone, and the file was
delivered by neither path.

- Add MEDIA_DELIVERY_EXTS in gateway/platforms/base.py as the single
  source of truth; extract_media and extract_local_files both derive
  their extension set from it (no more drift).
- Replace the loose MEDIA cleanup at the non-streaming dispatch site
  (base.py) and the streaming consumer (stream_consumer.py) with the
  shared, extension-anchored MEDIA_TAG_CLEANUP_RE. A MEDIA: tag with an
  unknown extension is left in the body so the bare-path detector can
  still pick it up instead of being black-holed.
- Chain cleaned text through extract_media -> extract_images ->
  extract_local_files in run.py's post-stream media delivery (it was
  dropping the cleaned text and rescanning raw text with MEDIA: tags).
- Regression tests covering both halves: previously-dropped extensions
  now extract, and unknown-ext paths survive the cleanup.

Consolidates the MEDIA extension-allowlist PR cluster.

Co-authored-by: Bartok9 <259807879+Bartok9@users.noreply.github.com>
Co-authored-by: banditburai <123342691+banditburai@users.noreply.github.com>
Co-authored-by: Kyzcreig <9063726+Kyzcreig@users.noreply.github.com>
2026-05-29 13:24:01 -07:00
..
2026-04-26 18:50:49 -07:00