Skip to content

fix(riva): rebuild segment assembly for long dictation#4

Merged
rbright merged 6 commits intomainfrom
fix/riva-segment-builder-and-tests
Feb 23, 2026
Merged

fix(riva): rebuild segment assembly for long dictation#4
rbright merged 6 commits intomainfrom
fix/riva-segment-builder-and-tests

Conversation

@rbright
Copy link
Owner

@rbright rbright commented Feb 23, 2026

Summary

  • restore robust interim segment assembly to avoid dropped early speech in longer dictation sessions
  • add injectable seams for pipeline transcriber dependencies to enable deterministic lifecycle tests
  • expand internal module unit coverage (app/audio/config/doctor/pipeline/riva)

Testing

  • go test ./apps/sotto/internal/... -cover
  • just ci-check
  • nix build 'path:.#sotto'

Summary by CodeRabbit

  • Tests

    • Expanded test coverage across audio capture, config parsing, diagnostics, streaming, and transcription to exercise error paths and edge cases.
  • Bug Fixes

    • Improved interim transcript handling, segment merging, and boundary decisions for more accurate recognition.
    • Better recovery and clearer reporting when audio services or sockets are unavailable.
  • Refactor

    • Introduced internal abstractions to simplify testing and improve reliability.

Remove interim divergence heuristics and keep session behavior strict:

- commit only final ASR segments during an active toggle session
- keep a single latest interim snapshot only as end-of-session fallback
- no interim pre-commit or boundary inference across updates

This aligns runtime behavior with toggle boundaries and avoids stale-prefix commit logic.

Tests:
- go test ./apps/sotto/internal/riva
- just ci-check
- nix build 'path:.#sotto'
Restore segment-building behavior inside a single toggle session by sealing mature interim chains on divergence while still treating rewrite/correction updates as continuation.

This preserves earlier speech when Riva drops or rewrites interim hypotheses during long utterances and avoids stale-prefix prepends.

Tests:
- go test ./apps/sotto/internal/riva
- just ci-check
- nix build 'path:.#sotto'
Introduce capture/stream interfaces and dependency hooks in pipeline.Transcriber so lifecycle logic can be tested without live Pulse/Riva runtime dependencies.

Also adds broad transcriber unit coverage for start/stop/cancel/send-loop success and error paths.

Tests:
- go test ./apps/sotto/internal/pipeline
- just ci-check
- nix build 'path:.#sotto'
Add focused unit tests across remaining lower-coverage internal modules:
- app: status/socket helpers, toggle owner startup failure cleanup, session log fields
- audio: pulse-unavailable selection/listing paths and capture close/device wrappers
- config: JSONC normalization/decoder/field-validation edge cases
- doctor: report + runtime check routing for paste override vs hyprctl defaults

Tests:
- go test ./apps/sotto/internal/... -cover
- just ci-check
- nix build 'path:.#sotto'
@coderabbitai
Copy link

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

Walkthrough

Adds numerous tests across app, audio, config, doctor, and riva packages; refactors Transcriber for dependency injection; and changes Riva interim handling from stability-based commits to an age/continuation and boundary-based approach.

Changes

Cohort / File(s) Summary
App & Pulse tests
apps/sotto/internal/app/app_test.go, apps/sotto/internal/audio/pulse_test.go
Adds runner tests for startup/capture errors, socket helpers, structured logging checks, and Pulse-unavailable tests including Capture channel closure regression.
Config JSONC tests & parser tweak
apps/sotto/internal/config/parser_jsonc_test.go, apps/sotto/internal/config/parser_jsonc.go
Adds comprehensive JSONC parsing and validation tests; ensureSingleJSONValue now decodes into json.RawMessage to detect additional top-level values.
Doctor diagnostics tests
apps/sotto/internal/doctor/doctor_test.go
Adds tests asserting paste command override behavior and hyprctl fallback selection in Run/Report flows.
Transcriber DI & tests
apps/sotto/internal/pipeline/transcriber.go, apps/sotto/internal/pipeline/transcriber_test.go
Introduces captureClient and streamClient interfaces and function-hook fields (selectDevice, startCapture, dialStream) for dependency injection; extensive tests with fakeCapture/fakeStream covering start/stop, send, error paths, and latency recording.
Riva stream & segments refactor
apps/sotto/internal/riva/stream_receive.go, apps/sotto/internal/riva/transcript_segments.go, apps/sotto/internal/riva/client.go, apps/sotto/internal/riva/client_test.go
Reworks interim handling to use isInterimContinuation and shouldCommitInterimBoundary with lastInterimAge tracking; updates helpers, constants, and tests to reflect transcript-continuation and chain update thresholds.

Sequence Diagram(s)

sequenceDiagram
  participant T as Transcriber
  participant A as Audio (select/start)
  participant C as Capture
  participant R as Riva (dial/stream)

  T->>A: selectDevice(ctx, selector)
  A-->>T: Device
  T->>R: dialStream(ctx, config)
  R-->>T: StreamClient
  T->>A: startCapture(ctx, device)
  A-->>T: Capture
  loop capture loop
    C-->>T: audio chunk
    T->>R: SendAudio(chunk)
    R-->>T: interim/final transcripts
  end
  T->>R: CloseAndCollect()
  R-->>T: results / latency
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇 I hopped through tests with ears held high,
I wired hooks so fakes can try,
Age now guards the interim sky,
Chunks and transcripts dancing by—
A tiny hop, a joyful sigh.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.93% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(riva): rebuild segment assembly for long dictation' directly and clearly describes the main change—improving segment assembly in the riva module for extended dictation sessions, which aligns with the substantial modifications to interim handling, segment accumulation, and test coverage across the riva package.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/riva-segment-builder-and-tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6b51096ef

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
apps/sotto/internal/config/parser_jsonc_test.go (1)

114-123: Tighten the expected error to avoid masking regressions.
Accepting “unknown field” here could let multi-value detection regress without failing the test. Consider asserting the specific “multiple JSON values” error once behavior is stable.

💡 Suggested tweak
-	require.True(
-		t,
-		strings.Contains(err.Error(), "multiple JSON values") || strings.Contains(err.Error(), "unknown field"),
-		"unexpected error: %v",
-		err,
-	)
+	require.Contains(t, err.Error(), "multiple JSON values")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/sotto/internal/config/parser_jsonc_test.go` around lines 114 - 123, The
test TestParseJSONCRejectsMultipleTopLevelValues currently allows either
"multiple JSON values" or "unknown field", which can mask regressions; update
the assertion to require the specific "multiple JSON values" error from
parseJSONC (referencing parseJSONC and
TestParseJSONCRejectsMultipleTopLevelValues) by replacing the dual-condition
check with a single strict check (e.g., use require.Contains or
require.ErrorContains(err, "multiple JSON values") or assert
strings.Contains(err.Error(), "multiple JSON values")) so the test only passes
when the parser reports the multi-value error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@apps/sotto/internal/config/parser_jsonc_test.go`:
- Around line 114-123: The test TestParseJSONCRejectsMultipleTopLevelValues
currently allows either "multiple JSON values" or "unknown field", which can
mask regressions; update the assertion to require the specific "multiple JSON
values" error from parseJSONC (referencing parseJSONC and
TestParseJSONCRejectsMultipleTopLevelValues) by replacing the dual-condition
check with a single strict check (e.g., use require.Contains or
require.ErrorContains(err, "multiple JSON values") or assert
strings.Contains(err.Error(), "multiple JSON values")) so the test only passes
when the parser reports the multi-value error.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c7f91e and d6b5109.

📒 Files selected for processing (10)
  • apps/sotto/internal/app/app_test.go
  • apps/sotto/internal/audio/pulse_test.go
  • apps/sotto/internal/config/parser_jsonc_test.go
  • apps/sotto/internal/doctor/doctor_test.go
  • apps/sotto/internal/pipeline/transcriber.go
  • apps/sotto/internal/pipeline/transcriber_test.go
  • apps/sotto/internal/riva/client.go
  • apps/sotto/internal/riva/client_test.go
  • apps/sotto/internal/riva/stream_receive.go
  • apps/sotto/internal/riva/transcript_segments.go

Make multi-value JSON payloads fail with a stable "multiple JSON values" error and tighten parser regression coverage accordingly.

Tests:
- go test ./apps/sotto/internal/config
- just ci-check
- nix build 'path:.#sotto'
Address PR feedback on divergence handling by preserving single-update interim segments when they appear reliable (high stability or sentence punctuation), while retaining age-based chain commits and continuation matching.

Tests:
- go test ./apps/sotto/internal/riva
- just ci-check
- nix build 'path:.#sotto'
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
apps/sotto/internal/config/parser_jsonc_test.go (2)

120-127: Optional: add a syntax error test to complete wrapJSONDecodeError coverage.

TestParseJSONCTypeErrorIncludesLocation covers the *json.UnmarshalTypeError branch. The *json.SyntaxError branch is untested.

🧪 Suggested additional test
+func TestParseJSONCTypeErrorIncludesLocationForSyntaxError(t *testing.T) {
+	_, _, err := parseJSONC("{\n  \"riva\": {\"grpc\": \"val\"\n}", Default())
+	// Intentionally malformed — missing closing brace to trigger SyntaxError.
+	_, _, err = parseJSONC("{\n  \"riva\": {\n", Default())
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "line")
+	require.Contains(t, err.Error(), "column")
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/sotto/internal/config/parser_jsonc_test.go` around lines 120 - 127, Add
a new unit test that triggers and asserts the *json.SyntaxError branch in
wrapJSONDecodeError: call parseJSONC with malformed JSONC input (e.g. a
truncated/invalid token) so parsing returns an error, then require.Error and
check err.Error() contains "line" and "column" (similar to
TestParseJSONCTypeErrorIncludesLocation). Reference parseJSONC and
wrapJSONDecodeError in the test name/description (e.g.,
TestParseJSONCSyntaxErrorIncludesLocation) so the SyntaxError path is exercised
and covered.

84-118: Consider adding a non-object second value case to TestParseJSONCRejectsMultipleTopLevelValues.

The existing test uses {...}{...} (object + object). The core motivation for switching to json.RawMessage is that a concrete struct target returns *json.UnmarshalTypeError when the extra value is a non-object type (number, string, array), silently bypassing the "multiple JSON values" sentinel. A companion case like {"paste":{"enable":false}} 42 would directly document and protect that behavior.

🧪 Suggested additional sub-case
 func TestParseJSONCRejectsMultipleTopLevelValues(t *testing.T) {
 	_, _, err := parseJSONC(`{"paste":{"enable":false}}{"paste":{"enable":true}}`, Default())
 	require.Error(t, err)
 	require.Contains(t, err.Error(), "multiple JSON values")
+
+	// Non-object second value: previously returned UnmarshalTypeError with a struct target.
+	_, _, err = parseJSONC(`{"paste":{"enable":false}} 42`, Default())
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "multiple JSON values")
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/sotto/internal/config/parser_jsonc_test.go` around lines 84 - 118, Add a
sub-case to TestParseJSONCRejectsMultipleTopLevelValues that feeds parseJSONC a
JSON string where the second top-level value is a non-object (e.g.,
`{"paste":{"enable":false}} 42`), then require an error and require that the
error message contains "multiple JSON values"; locate the test function
TestParseJSONCRejectsMultipleTopLevelValues and the parseJSONC call to add this
assertion alongside the existing object+object case.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@apps/sotto/internal/config/parser_jsonc_test.go`:
- Around line 120-127: Add a new unit test that triggers and asserts the
*json.SyntaxError branch in wrapJSONDecodeError: call parseJSONC with malformed
JSONC input (e.g. a truncated/invalid token) so parsing returns an error, then
require.Error and check err.Error() contains "line" and "column" (similar to
TestParseJSONCTypeErrorIncludesLocation). Reference parseJSONC and
wrapJSONDecodeError in the test name/description (e.g.,
TestParseJSONCSyntaxErrorIncludesLocation) so the SyntaxError path is exercised
and covered.
- Around line 84-118: Add a sub-case to
TestParseJSONCRejectsMultipleTopLevelValues that feeds parseJSONC a JSON string
where the second top-level value is a non-object (e.g.,
`{"paste":{"enable":false}} 42`), then require an error and require that the
error message contains "multiple JSON values"; locate the test function
TestParseJSONCRejectsMultipleTopLevelValues and the parseJSONC call to add this
assertion alongside the existing object+object case.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d6b5109 and 120312e.

📒 Files selected for processing (2)
  • apps/sotto/internal/config/parser_jsonc.go
  • apps/sotto/internal/config/parser_jsonc_test.go

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
apps/sotto/internal/riva/transcript_segments.go (1)

67-69: shorter == 0 guard is unreachable dead code.

At this point both previous and current have already passed the non-empty check on lines 48–49, and cleanSegment guarantees that a non-empty result produces at least one strings.Fields token, so shorter is always ≥ 1 when execution reaches line 67.

🧹 Proposed cleanup
-	if shorter == 0 {
-		return true
-	}
-
 	if commonPrefixWords(prevWords, currWords)*2 >= shorter {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/sotto/internal/riva/transcript_segments.go` around lines 67 - 69, The
if-block checking "shorter == 0" is unreachable because both previous and
current are already non-empty (lines that call cleanSegment ensure
strings.Fields produces at least one token), so remove the unreachable guard;
delete the "if shorter == 0 { return true }" check referencing the shorter
variable and rely on the existing comparison logic between previous and current
(and functions/variables cleanSegment, previous, current, shorter) to determine
equality.
apps/sotto/internal/riva/client_test.go (1)

140-156: Consider adding intermediate-state assertions to the long-stream test.

Unlike TestRecordResponseCommitsInterimChainOnDivergence (which asserts s.segments, s.lastInterim, and s.lastInterimAge individually), this test only asserts the final collectSegments output. A mid-stream s.segments assertion (e.g., after "beta" triggers the first boundary commit) would pin down exactly where each segment is sealed and make regressions easier to diagnose.

♻️ Suggested intermediate assertion
 	for _, resp := range responses {
 		s.recordResponse(resp)
 	}

+	// After "alpha one" and "beta two" chains are sealed by "gamma"
+	require.Equal(t, []string{"alpha one", "beta two"}, s.segments)
+	require.Equal(t, "gamma", s.lastInterim)
+	require.Equal(t, 1, s.lastInterimAge)
+
 	segments := collectSegments(s.segments, s.lastInterim)
 	require.Equal(t, []string{"alpha one", "beta two", "gamma"}, segments)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/sotto/internal/riva/client_test.go` around lines 140 - 156, Add an
intermediate assertion in
TestRecordResponseBuildsMultipleSegmentsAcrossLongInterimStream to verify stream
state after the "beta" interim triggers a boundary: after calling
s.recordResponse for the response with Transcript "beta", assert s.segments
contains the expected sealed segment(s) (e.g., ["alpha one"]), check
s.lastInterim equals the current interim ("beta") and assert s.lastInterimAge
has been updated appropriately; use the existing helpers/variables
(TestRecordResponseBuildsMultipleSegmentsAcrossLongInterimStream,
s.recordResponse, s.segments, s.lastInterim, s.lastInterimAge, collectSegments)
so the test pins the mid-stream commit point before continuing to the remaining
responses.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@apps/sotto/internal/riva/client_test.go`:
- Around line 140-156: Add an intermediate assertion in
TestRecordResponseBuildsMultipleSegmentsAcrossLongInterimStream to verify stream
state after the "beta" interim triggers a boundary: after calling
s.recordResponse for the response with Transcript "beta", assert s.segments
contains the expected sealed segment(s) (e.g., ["alpha one"]), check
s.lastInterim equals the current interim ("beta") and assert s.lastInterimAge
has been updated appropriately; use the existing helpers/variables
(TestRecordResponseBuildsMultipleSegmentsAcrossLongInterimStream,
s.recordResponse, s.segments, s.lastInterim, s.lastInterimAge, collectSegments)
so the test pins the mid-stream commit point before continuing to the remaining
responses.

In `@apps/sotto/internal/riva/transcript_segments.go`:
- Around line 67-69: The if-block checking "shorter == 0" is unreachable because
both previous and current are already non-empty (lines that call cleanSegment
ensure strings.Fields produces at least one token), so remove the unreachable
guard; delete the "if shorter == 0 { return true }" check referencing the
shorter variable and rely on the existing comparison logic between previous and
current (and functions/variables cleanSegment, previous, current, shorter) to
determine equality.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 120312e and 63ca29e.

📒 Files selected for processing (4)
  • apps/sotto/internal/riva/client.go
  • apps/sotto/internal/riva/client_test.go
  • apps/sotto/internal/riva/stream_receive.go
  • apps/sotto/internal/riva/transcript_segments.go

@rbright rbright merged commit 8307e65 into main Feb 23, 2026
2 checks passed
@rbright rbright deleted the fix/riva-segment-builder-and-tests branch February 23, 2026 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant