fix: prevent panics on multi-byte UTF-8 string slicing #57

Closed
wunitb wants to merge 2 commits into RightNow-AI:main from wunitb:fix/utf8-string-slicing-panic

wunitb commented Feb 27, 2026

Summary

  • Replace 12 instances of direct byte-offset string slicing (&s[..N]) with slicing at s.floor_char_boundary(N), across the 10 files listed below
  • Prevents panics when the byte offset falls inside a multi-byte character (Thai, Chinese, Japanese, emoji, etc.)
  • floor_char_boundary() is stable since Rust 1.91; it returns the closest valid char boundary ≤ the given byte index

Problem

Slicing a &str at an arbitrary byte offset (e.g. &s[..500]) panics with "byte index N is not a char boundary" when the offset lands inside a multi-byte UTF-8 character. This affects any text that contains non-ASCII characters:

  • Thai: 3 bytes per character
  • Chinese/Japanese: typically 3 bytes per character
  • Emoji: typically 4 bytes per code point
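
To make the failure mode concrete, here is a minimal sketch of boundary-safe truncation. The helper names `floor_boundary` and `truncate_utf8` are hypothetical and are not taken from the PR. The sketch walks back with `str::is_char_boundary` so that it also builds on older toolchains; on a toolchain where `str::floor_char_boundary` is available, that method should give the same index.

```rust
/// Largest index <= `max` that lies on a char boundary of `s`.
/// Hypothetical helper; intended to mirror `str::floor_char_boundary`.
fn floor_boundary(s: &str, max: usize) -> usize {
    if max >= s.len() {
        return s.len();
    }
    let mut i = max;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Truncate `s` to at most `max` bytes without splitting a character.
fn truncate_utf8(s: &str, max: usize) -> &str {
    &s[..floor_boundary(s, max)]
}

fn main() {
    let thai = "สวัสดี"; // 6 code points, 3 bytes each
    assert_eq!(thai.len(), 18);

    // `&thai[..4]` would panic: byte 4 is inside the second character.
    assert!(!thai.is_char_boundary(4));

    // The safe version steps back to byte 3 and keeps one whole character.
    assert_eq!(truncate_utf8(thai, 4), "ส");

    // A limit beyond the end of the string returns the string unchanged.
    assert_eq!(truncate_utf8("abc", 10), "abc");
}
```

The result can be shorter than the requested limit by up to three bytes, which is usually acceptable for log previews and prompt truncation.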

Files Changed

| Crate | File | Instances |
|---|---|---|
| openfang-kernel | kernel.rs | 2 |
| openfang-runtime | prompt_builder.rs | 1 |
| openfang-runtime | compactor.rs | 2 |
| openfang-runtime | loop_guard.rs | 1 |
| openfang-runtime | llm_errors.rs | 1 |
| openfang-runtime | workspace_context.rs | 1 |
| openfang-runtime | tts.rs | 2 |
| openfang-runtime | image_gen.rs | 1 |
| openfang-memory | session.rs | 1 |
| openfang-api | channel_bridge.rs | 1 |

Test plan

  • cargo build --workspace --lib compiles successfully
  • Verified with Thai language input via Telegram bot — no panics
  • cargo test --workspace all tests pass

🤖 Generated with Claude Code

UNITB MACHINE and others added 2 commits February 28, 2026 03:56
…strings

Direct byte-offset slicing like `&s[..500]` panics when the offset
falls inside a multi-byte character (e.g. Thai, Chinese, emoji, each
3-4 bytes). Replace all 12 instances across the 10 files listed below
with `floor_char_boundary()` (stable since Rust 1.91), which finds the
nearest valid char boundary <= the given byte index.

Affected crates:
- openfang-kernel (kernel.rs)
- openfang-runtime (prompt_builder, compactor, loop_guard, llm_errors,
  workspace_context, tts, image_gen)
- openfang-memory (session.rs)
- openfang-api (channel_bridge.rs)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1. split_message() in types.rs panics when max_len (4096 for Telegram)
   falls inside a multi-byte character. Use floor_char_boundary() to
   find a safe split point before searching for newlines.

2. /agent command in bridge.rs only takes args[0], so "/agent Data Analyst"
   becomes "/agent Data" — join all args to support names with spaces.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
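
The split_message() fix described in item 1 can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code in types.rs: the signature, the return type, and the newline preference are guesses, and the boundary search uses `str::is_char_boundary` in place of `floor_char_boundary()` so the sketch builds on older toolchains.

```rust
/// Split `text` into chunks of at most `max_len` bytes, never cutting
/// inside a UTF-8 character. Hypothetical sketch; the real signature
/// in types.rs may differ.
fn split_message(text: &str, max_len: usize) -> Vec<&str> {
    let mut chunks = Vec::new();
    let mut rest = text;

    while rest.len() > max_len {
        // Step back from max_len to the nearest char boundary.
        let mut safe = max_len;
        while !rest.is_char_boundary(safe) {
            safe -= 1;
        }

        // Prefer to break just after the last newline in the safe window.
        let mut cut = match rest[..safe].rfind('\n') {
            Some(pos) if pos > 0 => pos + 1,
            _ => safe,
        };

        // If max_len is smaller than the first character, emit that
        // character whole so the loop always makes progress.
        if cut == 0 {
            cut = rest.chars().next().map_or(rest.len(), char::len_utf8);
        }

        let (head, tail) = rest.split_at(cut);
        chunks.push(head);
        rest = tail;
    }

    if !rest.is_empty() {
        chunks.push(rest);
    }
    chunks
}

fn main() {
    // Five Thai characters (15 bytes) with a 4-byte limit: each chunk
    // holds exactly one 3-byte character instead of panicking.
    let thai = "ก".repeat(5);
    let parts = split_message(&thai, 4);
    assert_eq!(parts.len(), 5);

    // ASCII text breaks after newlines when one is available.
    assert_eq!(split_message("ab\ncd\nef", 4), vec!["ab\n", "cd\n", "ef"]);
}
```

Concatenating the returned chunks reproduces the input, so no characters are lost at the split points.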
@wunitb wunitb closed this Mar 1, 2026