Commit Graph

17 Commits (8abce8a84dc93ddd474fb7067bcb13a8794805ea)

Author SHA1 Message Date
Ji f49297e2c1
fix: skip audio files from text extraction to prevent binary processing (#7475)
* fix: skip audio files from text extraction early

Audio files should not be processed through extractFileBlocks for text
extraction - they are handled by the dedicated audio transcription
capability (STT).

Previously, audio files were only skipped if they didn't "look like text"
(looksLikeUtf8Text check). This caused issues where some audio binary
data (e.g., long Telegram voice messages) could accidentally pass the
heuristic check and get processed as text content.

This fix:
1. Adds audio to the early skip alongside image/video (more efficient)
2. Removes the redundant secondary check that had the flawed condition

Fixes audio binary being incorrectly processed as text in Telegram and
other platforms.

* Media: skip binary media in file extraction (#7475) (thanks @AlexZhangji)

---------

Co-authored-by: Shakker <shakkerdroid@gmail.com>
2026-02-02 22:20:04 +00:00
VACInc b796f6ec01
Security: harden web tools and file parsing (#4058)
* feat: web content security wrapping + gkeep/simple-backup skills

* fix: harden web fetch + media text detection (#4058) (thanks @VACInc)

---------

Co-authored-by: VAC <vac@vacs-mac-mini.localdomain>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-02-01 15:23:25 -08:00
cpojer f06dd8df06
chore: Enable "experimentalSortImports" in Oxfmt and reformat all imorts. 2026-02-01 10:03:47 +09:00
cpojer 5ceff756e1
chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
Peter Steinberger 9a7160786a refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
Conroy Whitney c20035094d
fix: use & instead of <> in XML escaping test for Windows NTFS compatibility (#3750)
NTFS does not allow < or > in filenames, causing the XML filename
escaping test to fail on Windows CI with ENOENT.

Replace file<test>.txt with file&test.txt — & is valid on all platforms
and still requires XML escaping (&amp;), preserving the test's intent.

Fixes #3748
2026-01-29 05:46:50 +00:00
Shakker b717724275
fix: add security hardening for media text attachments (#3700)
* fix: Prevent XML attribute injection by escaping special characters in file name and MIME type attributes.

* fix: text attachment MIME misclassification with security hardening (#3628)

- Fix CSV/TSV inference from content heuristics
- Add UTF-16 detection and BOM handling
- Add XML attribute escaping for file output (security)
- Add MIME override logging for auditability
- Add comprehensive test coverage for edge cases

Thanks @frankekn
2026-01-29 02:39:01 +00:00
Frank Yang cb18ce7a85
Fix text attachment MIME misclassification (#3628)
* Fix text file attachment detection

* Add file attachment extraction tests
2026-01-29 02:33:03 +00:00
Peter Steinberger 6d16a658e5 refactor: rename clawdbot to moltbot with legacy compat 2026-01-27 12:21:02 +00:00
Peter Steinberger a5adedea91 refactor: add aws-sdk auth mode and tighten provider auth 2026-01-20 08:28:40 +00:00
Peter Steinberger 7cebe7a506 style: run oxfmt 2026-01-17 08:00:05 +00:00
Peter Steinberger 6d969fe58e refactor: normalize media attachment selection 2026-01-17 07:38:11 +00:00
Peter Steinberger 5a1ff5b9e7 refactor: tune media understanding 2026-01-17 06:44:19 +00:00
Peter Steinberger e59d8c5436 style: oxfmt format 2026-01-17 05:48:56 +00:00
Peter Steinberger bc49c20434 fix: finalize inbound contexts 2026-01-17 05:06:39 +00:00
Peter Steinberger fcb7c9ff65 refactor: unify media understanding pipeline 2026-01-17 04:39:00 +00:00
Peter Steinberger 1b973f7506 feat: add inbound media understanding
Co-authored-by: Tristan Manchester <tmanchester96@gmail.com>
2026-01-17 03:54:46 +00:00