feat(summarize): condense spoken output to a single sentence #2
Conversation
⚠️ Caution: Review failed. The pull request is closed.

📝 Walkthrough

Require spoken summaries to be exactly one short sentence; add sentence-splitting, markdown/list-marker stripping, whitespace-collapsing, and sentence-clamping helpers; update the prompt, README, and tests to reflect the single-sentence constraint.
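As a rough sketch of the clamping behavior the walkthrough describes (the helper name and exact patterns here are illustrative, not the PR's actual API):

```python
import re

def clamp_to_one_sentence(text: str) -> str:
    """Collapse whitespace, drop markdown/list markers, and keep only the first sentence."""
    # Strip markdown emphasis characters (illustrative subset of the PR's stripping).
    text = re.sub(r"[*_`]", "", text)
    # Drop leading bullet or numbered-list markers on each line.
    text = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s+", "", text, flags=re.MULTILINE)
    # Collapse runs of whitespace (including newlines) into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Clamp to the first sentence-ending punctuation mark.
    match = re.match(r".*?[.!?](?=\s|$)", text)
    return match.group(0) if match else text
```

A markdown bullet like `- **First** point. Second point.` would come out as `First point.`, which is the single-sentence shape the prompt change enforces.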
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 43a15c1618
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/koko_cli/summarization.py`:
- Around line 12-43: Ruff flags literal en-dash characters in the regex and string literals. Update the patterns and any affected strings (notably SUMMARY_META_PREFIX_PATTERNS and SUMMARY_META_SENTENCE_EXACT) to replace the literal en-dash (–) with an explicit escape such as `\u2013` (e.g. change `[:\-–]` to `[:\-\u2013]`), or append `# noqa: RUF001` to the specific lines, so that no literal en-dash remains in SUMMARY_META_PREFIX_PATTERNS or the matching literals around them.
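The suggested fix can be sketched as follows; the pattern and helper below are hypothetical reconstructions of the flagged code, not the PR's exact source:

```python
import re

# Before (flagged by Ruff): the character class contained a literal en-dash,
#   re.compile(r"^(summary|tl;dr)\s*[:\-–]\s*", re.IGNORECASE)
# After: the explicit \u2013 escape matches the same character, since Python's
# `re` module interprets \uXXXX escapes even inside raw strings.
SUMMARY_META_PREFIX = re.compile(r"^(summary|tl;dr)\s*[:\-\u2013]\s*", re.IGNORECASE)

def strip_meta_prefix(text: str) -> str:
    """Drop a leading 'Summary:'-style marker, whichever dash or colon it uses."""
    return SUMMARY_META_PREFIX.sub("", text)
```

The behavior is unchanged: `"Summary – the cat sat."` still loses its prefix; only the source text satisfies the lint rule.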
- Around line 163-174: strip_summary_meta_sentences currently returns the original text when every sentence is detected as meta, which lets meta-only outputs pass through. After computing sentences = split_sentences(text), return an empty string when there are no sentences or when filtered_sentences (computed via is_summary_meta_sentence) is empty, instead of falling back to the original text or sentences (or raise a clear exception if preferred).
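A minimal sketch of the fixed fallback, with simplified stand-ins for split_sentences and is_summary_meta_sentence (the real helpers in summarization.py will differ):

```python
import re

def split_sentences(text: str) -> list[str]:
    # Simplified stand-in: split on sentence-ending punctuation followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def is_summary_meta_sentence(sentence: str) -> bool:
    # Hypothetical predicate: treat "Here is a summary."-style sentences as meta.
    return bool(re.match(r"^(here is|this is) (a |the )?summary\b", sentence, re.IGNORECASE))

def strip_summary_meta_sentences(text: str) -> str:
    sentences = split_sentences(text)
    filtered_sentences = [s for s in sentences if not is_summary_meta_sentence(s)]
    # Key change: a meta-only input now yields "" instead of falling back to the
    # original text, so it cannot leak through to spoken output.
    if not sentences or not filtered_sentences:
        return ""
    return " ".join(filtered_sentences)
```

With this change, `"Here is a summary."` yields `""` rather than echoing itself back.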
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/koko_cli/summarization.py`:
- Around line 15-27: The regex literals contain a literal Unicode en-dash (the "–" inside character classes such as `[:\-–]`), which Ruff flags. Replace each literal en-dash with the explicit escape `\u2013`; Python's `re` module interprets `\uXXXX` escapes even inside raw strings, so `r"[:\-\u2013]"` is equivalent. Update every occurrence in summarization.py (including those around lines ~50-57 and ~197) so no literal en-dash remains.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/koko_cli/summarization.py`:
- Around line 186-214: trim_leading_meta_words only returns a trimmed result when saw_meta_core and remove_until >= 2, which skips cases where a single meta-core word (e.g. "Summary") is directly followed by content; that text then gets treated as meta-only by strip_summary_meta_sentences. Change the condition to saw_meta_core and remove_until >= 1 and remove_until < len(tokens) so a single leading meta-core token is stripped when content follows, keeping the existing token splitting, the normalization via is_meta_or_filler_word and is_meta_core_word, and the lstrip behavior that produces the trimmed text if non-empty.
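The condition change can be illustrated with a self-contained sketch; the word lists and predicates below are hypothetical stand-ins for the module's real helpers:

```python
META_FILLER = {"okay", "so", "well", "summary", "brief"}
META_CORE = {"summary", "brief"}

def is_meta_or_filler_word(token: str) -> bool:
    return token.lower().strip(":,.-") in META_FILLER

def is_meta_core_word(token: str) -> bool:
    return token.lower().strip(":,.-") in META_CORE

def trim_leading_meta_words(text: str) -> str:
    tokens = text.split()
    remove_until = 0
    saw_meta_core = False
    for tok in tokens:
        if not is_meta_or_filler_word(tok):
            break
        saw_meta_core = saw_meta_core or is_meta_core_word(tok)
        remove_until += 1
    # Key change: >= 1 (previously >= 2), plus remove_until < len(tokens),
    # so a lone leading "Summary:" is stripped when real content follows,
    # while a meta-only input is left for strip_summary_meta_sentences.
    if saw_meta_core and remove_until >= 1 and remove_until < len(tokens):
        trimmed = " ".join(tokens[remove_until:]).lstrip()
        if trimmed:
            return trimmed
    return text
```

Under the old `>= 2` condition, `"Summary: the cat sat."` would pass through untouched; with `>= 1` it becomes `"the cat sat."`.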
Summary
Test Plan
Summary by CodeRabbit
- New Features
- Bug Fixes
- Documentation
- Tests