fix(riva): rebuild segment assembly for long dictation by rbright · Pull Request #4 · rbright/sotto

rbright · 2026-02-23T20:01:04Z

Summary

restore robust interim segment assembly to avoid dropped early speech in longer dictation sessions
add injectable seams for pipeline transcriber dependencies to enable deterministic lifecycle tests
expand internal module unit coverage (app/audio/config/doctor/pipeline/riva)

Testing

go test ./apps/sotto/internal/... -cover
just ci-check
nix build 'path:.#sotto'

Summary by CodeRabbit

Tests
- Expanded test coverage across audio capture, config parsing, diagnostics, streaming, and transcription to exercise error paths and edge cases.
Bug Fixes
- Improved interim transcript handling, segment merging, and boundary decisions for more accurate recognition.
- Better recovery and clearer reporting when audio services or sockets are unavailable.
Refactor
- Introduced internal abstractions to simplify testing and improve reliability.

Remove interim divergence heuristics and keep session behavior strict: - commit only final ASR segments during an active toggle session - keep a single latest interim snapshot only as end-of-session fallback - no interim pre-commit or boundary inference across updates This aligns runtime behavior with toggle boundaries and avoids stale-prefix commit logic. Tests: - go test ./apps/sotto/internal/riva - just ci-check - nix build 'path:.#sotto'

Restore segment-building behavior inside a single toggle session by sealing mature interim chains on divergence while still treating rewrite/correction updates as continuation. This preserves earlier speech when Riva drops or rewrites interim hypotheses during long utterances and avoids stale-prefix prepends. Tests: - go test ./apps/sotto/internal/riva - just ci-check - nix build 'path:.#sotto'

Introduce capture/stream interfaces and dependency hooks in pipeline.Transcriber so lifecycle logic can be tested without live Pulse/Riva runtime dependencies. Also adds broad transcriber unit coverage for start/stop/cancel/send-loop success and error paths. Tests: - go test ./apps/sotto/internal/pipeline - just ci-check - nix build 'path:.#sotto'

Add focused unit tests across remaining lower-coverage internal modules: - app: status/socket helpers, toggle owner startup failure cleanup, session log fields - audio: pulse-unavailable selection/listing paths and capture close/device wrappers - config: JSONC normalization/decoder/field-validation edge cases - doctor: report + runtime check routing for paste override vs hyprctl defaults Tests: - go test ./apps/sotto/internal/... -cover - just ci-check - nix build 'path:.#sotto'

coderabbitai · 2026-02-23T20:01:29Z

📝 Walkthrough

Walkthrough

Adds numerous tests across app, audio, config, doctor, and riva packages; refactors Transcriber for dependency injection; and changes Riva interim handling from stability-based commits to an age/continuation and boundary-based approach.

Changes

Cohort / File(s)	Summary
App & Pulse tests `apps/sotto/internal/app/app_test.go`, `apps/sotto/internal/audio/pulse_test.go`	Adds runner tests for startup/capture errors, socket helpers, structured logging checks, and Pulse-unavailable tests including Capture channel closure regression.
Config JSONC tests & parser tweak `apps/sotto/internal/config/parser_jsonc_test.go`, `apps/sotto/internal/config/parser_jsonc.go`	Adds comprehensive JSONC parsing and validation tests; ensureSingleJSONValue now decodes into json.RawMessage to detect additional top-level values.
Doctor diagnostics tests `apps/sotto/internal/doctor/doctor_test.go`	Adds tests asserting paste command override behavior and hyprctl fallback selection in Run/Report flows.
Transcriber DI & tests `apps/sotto/internal/pipeline/transcriber.go`, `apps/sotto/internal/pipeline/transcriber_test.go`	Introduces captureClient and streamClient interfaces and function-hook fields (selectDevice, startCapture, dialStream) for dependency injection; extensive tests with fakeCapture/fakeStream covering start/stop, send, error paths, and latency recording.
Riva stream & segments refactor `apps/sotto/internal/riva/stream_receive.go`, `apps/sotto/internal/riva/transcript_segments.go`, `apps/sotto/internal/riva/client.go`, `apps/sotto/internal/riva/client_test.go`	Reworks interim handling to use isInterimContinuation and shouldCommitInterimBoundary with lastInterimAge tracking; updates helpers, constants, and tests to reflect transcript-continuation and chain update thresholds.

Sequence Diagram(s)

sequenceDiagram
  participant T as Transcriber
  participant A as Audio (select/start)
  participant C as Capture
  participant R as Riva (dial/stream)

  T->>A: selectDevice(ctx, selector)
  A-->>T: Device
  T->>R: dialStream(ctx, config)
  R-->>T: StreamClient
  T->>A: startCapture(ctx, device)
  A-->>T: Capture
  loop capture loop
    C-->>T: audio chunk
    T->>R: SendAudio(chunk)
    R-->>T: interim/final transcripts
  end
  T->>R: CloseAndCollect()
  R-->>T: results / latency

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

fix(riva): avoid stale interim carryover in segment assembly #3: Modifies the same Riva interim-handling logic and related tests (stream_receive.go, transcript_segments.go, client.go); likely overlaps in design and tests.
feat(asr): launch local-first cli runtime with tests and docs #1: Adds app tests and IPC/socket/log helpers touching the same app test areas and related helpers.

Poem

🐇 I hopped through tests with ears held high,
I wired hooks so fakes can try,
Age now guards the interim sky,
Chunks and transcripts dancing by—
A tiny hop, a joyful sigh.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 8.93% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(riva): rebuild segment assembly for long dictation' directly and clearly describes the main change—improving segment assembly in the riva module for extended dictation sessions, which aligns with the substantial modifications to interim handling, segment accumulation, and test coverage across the riva package.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/riva-segment-builder-and-tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6b51096ef

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

apps/sotto/internal/riva/transcript_segments.go

coderabbitai

🧹 Nitpick comments (1)

apps/sotto/internal/config/parser_jsonc_test.go (1)

114-123: Tighten the expected error to avoid masking regressions.
Accepting “unknown field” here could let multi-value detection regress without failing the test. Consider asserting the specific “multiple JSON values” error once behavior is stable.

💡 Suggested tweak

-	require.True(
-		t,
-		strings.Contains(err.Error(), "multiple JSON values") || strings.Contains(err.Error(), "unknown field"),
-		"unexpected error: %v",
-		err,
-	)
+	require.Contains(t, err.Error(), "multiple JSON values")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@apps/sotto/internal/config/parser_jsonc_test.go` around lines 114 - 123, The
test TestParseJSONCRejectsMultipleTopLevelValues currently allows either
"multiple JSON values" or "unknown field", which can mask regressions; update
the assertion to require the specific "multiple JSON values" error from
parseJSONC (referencing parseJSONC and
TestParseJSONCRejectsMultipleTopLevelValues) by replacing the dual-condition
check with a single strict check (e.g., use require.Contains or
require.ErrorContains(err, "multiple JSON values") or assert
strings.Contains(err.Error(), "multiple JSON values")) so the test only passes
when the parser reports the multi-value error.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@apps/sotto/internal/config/parser_jsonc_test.go`:
- Around line 114-123: The test TestParseJSONCRejectsMultipleTopLevelValues
currently allows either "multiple JSON values" or "unknown field", which can
mask regressions; update the assertion to require the specific "multiple JSON
values" error from parseJSONC (referencing parseJSONC and
TestParseJSONCRejectsMultipleTopLevelValues) by replacing the dual-condition
check with a single strict check (e.g., use require.Contains or
require.ErrorContains(err, "multiple JSON values") or assert
strings.Contains(err.Error(), "multiple JSON values")) so the test only passes
when the parser reports the multi-value error.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c7f91e and d6b5109.

📒 Files selected for processing (10)

apps/sotto/internal/app/app_test.go
apps/sotto/internal/audio/pulse_test.go
apps/sotto/internal/config/parser_jsonc_test.go
apps/sotto/internal/doctor/doctor_test.go
apps/sotto/internal/pipeline/transcriber.go
apps/sotto/internal/pipeline/transcriber_test.go
apps/sotto/internal/riva/client.go
apps/sotto/internal/riva/client_test.go
apps/sotto/internal/riva/stream_receive.go
apps/sotto/internal/riva/transcript_segments.go

Make multi-value JSON payloads fail with a stable "multiple JSON values" error and tighten parser regression coverage accordingly. Tests: - go test ./apps/sotto/internal/config - just ci-check - nix build 'path:.#sotto'

Address PR feedback on divergence handling by preserving single-update interim segments when they appear reliable (high stability or sentence punctuation), while retaining age-based chain commits and continuation matching. Tests: - go test ./apps/sotto/internal/riva - just ci-check - nix build 'path:.#sotto'

coderabbitai

🧹 Nitpick comments (2)

apps/sotto/internal/config/parser_jsonc_test.go (2)

120-127: Optional: add a syntax error test to complete wrapJSONDecodeError coverage.

TestParseJSONCTypeErrorIncludesLocation covers the *json.UnmarshalTypeError branch. The *json.SyntaxError branch is untested.

🧪 Suggested additional test

+func TestParseJSONCTypeErrorIncludesLocationForSyntaxError(t *testing.T) {
+	_, _, err := parseJSONC("{\n  \"riva\": {\"grpc\": \"val\"\n}", Default())
+	// Intentionally malformed — missing closing brace to trigger SyntaxError.
+	_, _, err = parseJSONC("{\n  \"riva\": {\n", Default())
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "line")
+	require.Contains(t, err.Error(), "column")
+}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@apps/sotto/internal/config/parser_jsonc_test.go` around lines 120 - 127, Add
a new unit test that triggers and asserts the *json.SyntaxError branch in
wrapJSONDecodeError: call parseJSONC with malformed JSONC input (e.g. a
truncated/invalid token) so parsing returns an error, then require.Error and
check err.Error() contains "line" and "column" (similar to
TestParseJSONCTypeErrorIncludesLocation). Reference parseJSONC and
wrapJSONDecodeError in the test name/description (e.g.,
TestParseJSONCSyntaxErrorIncludesLocation) so the SyntaxError path is exercised
and covered.

84-118: Consider adding a non-object second value case to TestParseJSONCRejectsMultipleTopLevelValues.

The existing test uses {...}{...} (object + object). The core motivation for switching to json.RawMessage is that a concrete struct target returns *json.UnmarshalTypeError when the extra value is a non-object type (number, string, array), silently bypassing the "multiple JSON values" sentinel. A companion case like {"paste":{"enable":false}} 42 would directly document and protect that behavior.

🧪 Suggested additional sub-case

 func TestParseJSONCRejectsMultipleTopLevelValues(t *testing.T) {
 	_, _, err := parseJSONC(`{"paste":{"enable":false}}{"paste":{"enable":true}}`, Default())
 	require.Error(t, err)
 	require.Contains(t, err.Error(), "multiple JSON values")
+
+	// Non-object second value: previously returned UnmarshalTypeError with a struct target.
+	_, _, err = parseJSONC(`{"paste":{"enable":false}} 42`, Default())
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "multiple JSON values")
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@apps/sotto/internal/config/parser_jsonc_test.go` around lines 84 - 118, Add a
sub-case to TestParseJSONCRejectsMultipleTopLevelValues that feeds parseJSONC a
JSON string where the second top-level value is a non-object (e.g.,
`{"paste":{"enable":false}} 42`), then require an error and require that the
error message contains "multiple JSON values"; locate the test function
TestParseJSONCRejectsMultipleTopLevelValues and the parseJSONC call to add this
assertion alongside the existing object+object case.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@apps/sotto/internal/config/parser_jsonc_test.go`:
- Around line 120-127: Add a new unit test that triggers and asserts the
*json.SyntaxError branch in wrapJSONDecodeError: call parseJSONC with malformed
JSONC input (e.g. a truncated/invalid token) so parsing returns an error, then
require.Error and check err.Error() contains "line" and "column" (similar to
TestParseJSONCTypeErrorIncludesLocation). Reference parseJSONC and
wrapJSONDecodeError in the test name/description (e.g.,
TestParseJSONCSyntaxErrorIncludesLocation) so the SyntaxError path is exercised
and covered.
- Around line 84-118: Add a sub-case to
TestParseJSONCRejectsMultipleTopLevelValues that feeds parseJSONC a JSON string
where the second top-level value is a non-object (e.g.,
`{"paste":{"enable":false}} 42`), then require an error and require that the
error message contains "multiple JSON values"; locate the test function
TestParseJSONCRejectsMultipleTopLevelValues and the parseJSONC call to add this
assertion alongside the existing object+object case.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d6b5109 and 120312e.

📒 Files selected for processing (2)

apps/sotto/internal/config/parser_jsonc.go
apps/sotto/internal/config/parser_jsonc_test.go

coderabbitai

🧹 Nitpick comments (2)

apps/sotto/internal/riva/transcript_segments.go (1)
67-69: shorter == 0 guard is unreachable dead code.

At this point both previous and current have already passed the non-empty check on lines 48–49, and cleanSegment guarantees that a non-empty result produces at least one strings.Fields token, so shorter is always ≥ 1 when execution reaches line 67.
🧹 Proposed cleanup
-	if shorter == 0 {
-		return true
-	}
-
 	if commonPrefixWords(prevWords, currWords)*2 >= shorter {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/sotto/internal/riva/transcript_segments.go` around lines 67 - 69, The
if-block checking "shorter == 0" is unreachable because both previous and
current are already non-empty (lines that call cleanSegment ensure
strings.Fields produces at least one token), so remove the unreachable guard;
delete the "if shorter == 0 { return true }" check referencing the shorter
variable and rely on the existing comparison logic between previous and current
(and functions/variables cleanSegment, previous, current, shorter) to determine
equality.
apps/sotto/internal/riva/client_test.go (1)
140-156: Consider adding intermediate-state assertions to the long-stream test.

Unlike TestRecordResponseCommitsInterimChainOnDivergence (which asserts s.segments, s.lastInterim, and s.lastInterimAge individually), this test only asserts the final collectSegments output. A mid-stream s.segments assertion (e.g., after "beta" triggers the first boundary commit) would pin down exactly where each segment is sealed and make regressions easier to diagnose.
♻️ Suggested intermediate assertion
 	for _, resp := range responses {
 		s.recordResponse(resp)
 	}

+	// After "alpha one" and "beta two" chains are sealed by "gamma"
+	require.Equal(t, []string{"alpha one", "beta two"}, s.segments)
+	require.Equal(t, "gamma", s.lastInterim)
+	require.Equal(t, 1, s.lastInterimAge)
+
 	segments := collectSegments(s.segments, s.lastInterim)
 	require.Equal(t, []string{"alpha one", "beta two", "gamma"}, segments)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/sotto/internal/riva/client_test.go` around lines 140 - 156, Add an
intermediate assertion in
TestRecordResponseBuildsMultipleSegmentsAcrossLongInterimStream to verify stream
state after the "beta" interim triggers a boundary: after calling
s.recordResponse for the response with Transcript "beta", assert s.segments
contains the expected sealed segment(s) (e.g., ["alpha one"]), check
s.lastInterim equals the current interim ("beta") and assert s.lastInterimAge
has been updated appropriately; use the existing helpers/variables
(TestRecordResponseBuildsMultipleSegmentsAcrossLongInterimStream,
s.recordResponse, s.segments, s.lastInterim, s.lastInterimAge, collectSegments)
so the test pins the mid-stream commit point before continuing to the remaining
responses.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@apps/sotto/internal/riva/client_test.go`:
- Around line 140-156: Add an intermediate assertion in
TestRecordResponseBuildsMultipleSegmentsAcrossLongInterimStream to verify stream
state after the "beta" interim triggers a boundary: after calling
s.recordResponse for the response with Transcript "beta", assert s.segments
contains the expected sealed segment(s) (e.g., ["alpha one"]), check
s.lastInterim equals the current interim ("beta") and assert s.lastInterimAge
has been updated appropriately; use the existing helpers/variables
(TestRecordResponseBuildsMultipleSegmentsAcrossLongInterimStream,
s.recordResponse, s.segments, s.lastInterim, s.lastInterimAge, collectSegments)
so the test pins the mid-stream commit point before continuing to the remaining
responses.

In `@apps/sotto/internal/riva/transcript_segments.go`:
- Around line 67-69: The if-block checking "shorter == 0" is unreachable because
both previous and current are already non-empty (lines that call cleanSegment
ensure strings.Fields produces at least one token), so remove the unreachable
guard; delete the "if shorter == 0 { return true }" check referencing the
shorter variable and rely on the existing comparison logic between previous and
current (and functions/variables cleanSegment, previous, current, shorter) to
determine equality.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 120312e and 63ca29e.

📒 Files selected for processing (4)

apps/sotto/internal/riva/client.go
apps/sotto/internal/riva/client_test.go
apps/sotto/internal/riva/stream_receive.go
apps/sotto/internal/riva/transcript_segments.go

rbright added 4 commits February 22, 2026 20:49

chatgpt-codex-connector bot reviewed Feb 23, 2026

View reviewed changes

apps/sotto/internal/riva/transcript_segments.go Outdated Show resolved Hide resolved

coderabbitai bot reviewed Feb 23, 2026

View reviewed changes

rbright added 2 commits February 23, 2026 13:44

fix(config): enforce single top-level JSONC value

120312e

Make multi-value JSON payloads fail with a stable "multiple JSON values" error and tighten parser regression coverage accordingly. Tests: - go test ./apps/sotto/internal/config - just ci-check - nix build 'path:.#sotto'

coderabbitai bot reviewed Feb 23, 2026

View reviewed changes

rbright merged commit 8307e65 into main Feb 23, 2026
2 checks passed

rbright deleted the fix/riva-segment-builder-and-tests branch February 23, 2026 20:59

coderabbitai bot mentioned this pull request Feb 27, 2026

fix(riva): preserve long-dictation leading segments on interim divergence #5

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(riva): rebuild segment assembly for long dictation#4

fix(riva): rebuild segment assembly for long dictation#4
rbright merged 6 commits intomainfrom
fix/riva-segment-builder-and-tests

rbright commented Feb 23, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 23, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rbright commented Feb 23, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rbright commented Feb 23, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 23, 2026 •

edited

Loading