MEDIA:<path> tags for .md/.json/.yaml/.xml/.html and other document
extensions were silently dropped. extract_media() carried a narrow
extension allowlist that omitted them, while extract_local_files()
had a broad one. The dispatch sites then ran an unconditional
re.sub(r'MEDIA:\s*\S+', '') that stripped the tag from the body even
when extract_media had not matched it, so extract_local_files (broad
list) ran on text where the path was already gone, and the file was
delivered by neither extractor.

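A minimal reconstruction of the failure, assuming simplified stand-ins: the extension sets, regexes and signatures below are hypothetical, not the real ones from gateway/platforms/base.py. The broad-list detector would have found the .md path on the raw text, but the unconditional cleanup removes the tag before it gets the chance.

```python
import re

# Hypothetical reconstruction of the old behaviour; the real allowlist
# contents and function signatures are assumptions.
NARROW_EXTS = {"png", "jpg", "mp3", "mp4"}                               # extract_media (old)
BROAD_EXTS = NARROW_EXTS | {"pdf", "md", "json", "yaml", "xml", "html"}  # extract_local_files


def _alt(exts):
    # Build a regex alternation such as "jpg|mp3|mp4|png".
    return "|".join(sorted(map(re.escape, exts)))


def extract_media(text):
    """Return paths from MEDIA: tags whose extension is in the narrow list."""
    pattern = r"MEDIA:\s*(\S+\.(?:" + _alt(NARROW_EXTS) + r"))(?!\S)"
    return re.findall(pattern, text, re.IGNORECASE)


def extract_local_files(text):
    """Return bare absolute paths whose extension is in the broad list."""
    pattern = r"(?<![\w/])(/[\w./-]+\.(?:" + _alt(BROAD_EXTS) + r"))(?!\S)"
    return re.findall(pattern, text, re.IGNORECASE)


def dispatch_old(text):
    media = extract_media(text)
    # Unconditional cleanup: removes every MEDIA: tag, matched or not.
    text = re.sub(r"MEDIA:\s*\S+", "", text)
    return media + extract_local_files(text)


print(dispatch_old("Notes: MEDIA:/tmp/notes.md"))   # []
print(dispatch_old("Chart: MEDIA:/tmp/chart.png"))  # ['/tmp/chart.png']
```
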
- Add MEDIA_DELIVERY_EXTS in gateway/platforms/base.py as the single
source of truth; extract_media and extract_local_files both derive
their extension set from it (no more drift).
- Replace the loose MEDIA cleanup at the non-streaming dispatch site
  (base.py) and the streaming consumer (stream_consumer.py) with the
  shared, extension-anchored MEDIA_TAG_CLEANUP_RE. A MEDIA: tag with an
  unknown extension is now left in the body, where the bare-path
  detector can still pick it up, instead of being black-holed.
- Chain the cleaned text through extract_media -> extract_images ->
  extract_local_files in run.py's post-stream media delivery (it
  previously discarded the cleaned text and rescanned the raw text,
  MEDIA: tags included).
- Regression tests covering both halves: previously-dropped extensions
now extract, and unknown-ext paths survive the cleanup.
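The fix can be sketched the same way. This is a hypothetical simplification: the contents of MEDIA_DELIVERY_EXTS, the exact shape of MEDIA_TAG_CLEANUP_RE, the (paths, text) return convention and the deliver() helper are assumptions, and extract_images is left out. Both extractors build their patterns from one shared set, the cleanup strips only tags whose extension is known, and each extractor receives the text the previous one cleaned.

```python
import re

# Hypothetical sketch: the real contents of MEDIA_DELIVERY_EXTS and the real
# regex/function shapes in gateway/platforms/base.py are not shown here.
MEDIA_DELIVERY_EXTS = frozenset(
    {"png", "jpg", "mp3", "mp4", "pdf", "md", "json", "yaml", "xml", "html"}
)

# Both extractors derive their patterns from the one shared set.
_EXT_ALT = "|".join(sorted(map(re.escape, MEDIA_DELIVERY_EXTS)))

# Extension-anchored: matches (and therefore strips) only MEDIA: tags whose
# path ends in a known extension; the group captures the path itself.
MEDIA_TAG_CLEANUP_RE = re.compile(
    r"MEDIA:\s*(\S+\.(?:" + _EXT_ALT + r"))(?!\S)", re.IGNORECASE
)
_BARE_PATH_RE = re.compile(
    r"(?<![\w/])(/[\w./-]+\.(?:" + _EXT_ALT + r"))(?!\S)", re.IGNORECASE
)


def extract_media(text):
    """Return (paths, cleaned_text) for MEDIA: tags with a known extension."""
    return MEDIA_TAG_CLEANUP_RE.findall(text), MEDIA_TAG_CLEANUP_RE.sub("", text)


def extract_local_files(text):
    """Return (paths, text) for bare absolute paths with a known extension."""
    return _BARE_PATH_RE.findall(text), text


def deliver(text, extractors=(extract_media, extract_local_files)):
    """Thread the cleaned text from each extractor into the next one."""
    found = []
    for extract in extractors:
        paths, text = extract(text)
        found.extend(paths)
    return found, text


print(deliver("Report MEDIA:/tmp/notes.md"))   # (['/tmp/notes.md'], 'Report ')
print(deliver("See MEDIA:/tmp/data.parquet"))  # ([], 'See MEDIA:/tmp/data.parquet')
```

With these definitions a .md tag is extracted once and stripped, while a tag such as MEDIA:/tmp/data.parquet passes through the cleanup untouched and stays available to later detection.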

Consolidates the MEDIA extension-allowlist PR cluster.

Co-authored-by: Bartok9 <259807879+Bartok9@users.noreply.github.com>
Co-authored-by: banditburai <123342691+banditburai@users.noreply.github.com>
Co-authored-by: Kyzcreig <9063726+Kyzcreig@users.noreply.github.com>