Conversation

@HavenDV
Contributor

@HavenDV HavenDV commented Sep 11, 2025

Summary by CodeRabbit

  • New Features

    • Added Semantic VAD turn detection with selectable eagerness levels.
    • Unified turn detection into a reusable option selectable by type.
  • Changes

    • Streamlined server-side VAD configuration and applied idle timeout limits (5–30s).
    • Renamed the input_audio_buffer timeout event to a stable name; clients using the old event or inline configs may need updates.
  • Documentation

    • Expanded turn detection guidance and clarified timeout behavior.
    • Updated examples to match the new event shape and schema.

@coderabbitai

coderabbitai bot commented Sep 11, 2025

Walkthrough

Defines a reusable RealtimeTurnDetection union schema (server_vad and semantic_vad) with a discriminator, updates all turn_detection references to use it, adds constraints and descriptions, renames a timeout event schema, and revises related metadata, descriptions, and examples in src/libs/tryAGI.OpenAI/openapi.yaml.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Turn detection schema refactor (src/libs/tryAGI.OpenAI/openapi.yaml) | Added components/schemas/RealtimeTurnDetection as a discriminated union with variants: Server VAD and Semantic VAD; set union-level nullable: true; replaced inline turn_detection objects with $ref to the new schema. |
| Server VAD constraints (src/libs/tryAGI.OpenAI/openapi.yaml) | Tightened idle_timeout_ms (min 5000, max 30000) and expanded descriptions; retained related fields under Server VAD variant. |
| Semantic VAD introduction (src/libs/tryAGI.OpenAI/openapi.yaml) | Added Semantic VAD variant with type: semantic_vad, eagerness enum, and fields like create_response and interrupt_response. |
| Event renaming and metadata (src/libs/tryAGI.OpenAI/openapi.yaml) | Renamed RealtimeBetaServerEventInputAudioBufferTimeoutTriggered to RealtimeServerEventInputAudioBufferTimeoutTriggered; updated x-oaiMeta keys and references. |
| Docs and examples updates (src/libs/tryAGI.OpenAI/openapi.yaml) | Revised descriptions to align with union semantics; updated input_audio_buffer.timeout_triggered example (field order and details). |
| General description edits (src/libs/tryAGI.OpenAI/openapi.yaml) | Trimmed and clarified long-form descriptions and truncation guidance to match new public schema structure. |
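
To make the two variants concrete, here is a minimal sketch of what each `turn_detection` value might look like when a client configures a session. The field names come from the change summary and the response example in this PR. The `"auto"` eagerness value, the `session.update` envelope, and the helper name `session_update` are assumptions for illustration, not something this PR confirms.

```python
import json

# Server VAD variant. Field names follow the response example in this PR;
# the numeric values are illustrative only.
server_vad = {
    "type": "server_vad",
    "threshold": 0.5,
    "prefix_padding_ms": 300,
    "silence_duration_ms": 500,
    "idle_timeout_ms": 5000,  # must stay within 5000-30000 per the new limits
    "create_response": True,
    "interrupt_response": True,
}

# Semantic VAD variant. "auto" is an assumed eagerness value; check the
# enum in openapi.yaml for the values the schema actually allows.
semantic_vad = {
    "type": "semantic_vad",
    "eagerness": "auto",
    "create_response": True,
    "interrupt_response": True,
}


def session_update(turn_detection):
    """Wrap a turn_detection value (or None to disable it) in a
    hypothetical session.update event."""
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "audio": {"input": {"turn_detection": turn_detection}},
        },
    }


print(json.dumps(session_update(semantic_vad), indent=2))
```

Passing `None` serializes to `null`, which the schema description says turns detection off so the client must trigger responses manually.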

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant R as Realtime API
  participant TD as Turn Detection (discriminated)

  Note over C,R: Session setup with reusable turn_detection schema
  C->>R: Create/Configure session { turn_detection: { type, ... } }
  R->>TD: Initialize turn detection via discriminator on type
  TD-->>R: Active variant (server_vad or semantic_vad)

  Note over C,R: Audio streaming and detection
  C->>R: Stream input_audio_buffer
  R->>TD: Analyze audio and timings
  alt Silence/Idle reached
    TD-->>R: Timeout condition met
    R-->>C: input_audio_buffer.timeout_triggered (renamed schema)
  else Speech continues
    TD-->>R: No timeout
    R-->>C: Continue streaming/responses as configured
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks (2 passed, 1 inconclusive)

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title Check | ❓ Inconclusive | The current title "feat:@coderabbitai" is vague and does not describe the primary changes in this PR (the new RealtimeTurnDetection union, refactor of turn_detection references, and the timeout event rename), so it does not provide a concise summary a reviewer can scan to understand the main change. It therefore fails the guideline requiring a short, specific single-sentence title. | Please replace the title with a short, specific sentence that highlights the main change (for example: "feat(realtime): add RealtimeTurnDetection union and rename timeout event" or "refactor(openapi): consolidate turn_detection into RealtimeTurnDetection and rename timeout schema"); avoid mentions/emojis and keep it focused on the primary API change so the check can be re-evaluated. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

Poem

I twitch my ears at schemas new,
Two paths to turns—semantic, VAD—woo-hoo! 🥕
A timeout renamed, neat and tidy,
Discriminator guides me spry and spritely.
I thump approval, hop along—
Realtime’s rhythm, my review song.


@HavenDV HavenDV merged commit 89c6575 into main Sep 11, 2025
2 of 4 checks passed
@HavenDV HavenDV deleted the bot/update-openapi_202509111819 branch September 11, 2025 18:20
@coderabbitai coderabbitai bot changed the title from "feat:@coderabbitai" to "feat:OpenAPI: add RealtimeTurnDetection union; update refs, rename timeout event" Sep 11, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (4)
src/libs/tryAGI.OpenAI/openapi.yaml (4)

23050-23050: Tighten wording for offsets (clarity).

Minor phrasing tweaks improve precision and reduce ambiguity.

-          description: Millisecond offset of audio written to the input audio buffer at the time the timeout was triggered.
+          description: Millisecond offset, from the start of the input audio buffer, of the last audio written when the timeout was triggered.
-          description: Millisecond offset of audio written to the input audio buffer that was after the playback time of the last model response.
+          description: Millisecond offset, from the start of the input audio buffer, of the first audio written after playback of the last model response.

Also applies to: 23053-23053


23065-23065: Polish description; small grammar/flow fixes.

Optional edits for readability; semantics unchanged.

-There may be speech
-that didn't trigger VAD but is still detected by the model, so the model may respond with
-something relevant to the conversation or a prompt to continue speaking.
+There may be speech that didn't trigger VAD but is still detected by the model, so the model may respond with something relevant to the conversation or a prompt to continue speaking.

24911-24919: Add defaults to match docs.

Docs mention defaults for prefix_padding_ms (300ms) and silence_duration_ms (500ms); encode them as defaults for stronger contracts.

             prefix_padding_ms:
               type: integer
               description: "Used only for `server_vad` mode. Amount of audio to include before the VAD detected speech (in
 milliseconds). Defaults to 300ms.\n"
+              default: 300
             silence_duration_ms:
               type: integer
               description: "Used only for `server_vad` mode. Duration of silence to detect speech stop (in milliseconds). Defaults
 to 500ms. With shorter values the model will respond more quickly,
 but may jump in on short pauses from the user.\n"
+              default: 500
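
Until the defaults are encoded in the schema, a client could apply them itself. This is a minimal sketch assuming the documented values (300 ms and 500 ms) are the effective defaults; `with_server_vad_defaults` is a hypothetical helper, not part of any SDK.

```python
# Defaults as stated in the field descriptions above.
SERVER_VAD_DEFAULTS = {
    "prefix_padding_ms": 300,
    "silence_duration_ms": 500,
}


def with_server_vad_defaults(config):
    """Return a copy of config with the documented defaults filled in.

    Keys already present in config win over the defaults.
    """
    return {**SERVER_VAD_DEFAULTS, **config}


print(with_server_vad_defaults({"type": "server_vad", "silence_duration_ms": 200}))
```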

24957-24958: Consider discriminator mappings for robust codegen.

Inline anyOf variants + discriminator work, but some generators (incl. Stainless in strict modes) prefer explicit mappings to component refs. Optional: extract variants into named component schemas and add discriminator.mapping.

Example structure:

components:
  schemas:
    RealtimeTurnDetectionServerVAD: { ... }
    RealtimeTurnDetectionSemanticVAD: { ... }
    RealtimeTurnDetection:
      anyOf:
        - $ref: '#/components/schemas/RealtimeTurnDetectionServerVAD'
        - $ref: '#/components/schemas/RealtimeTurnDetectionSemanticVAD'
      discriminator:
        propertyName: type
        mapping:
          server_vad: '#/components/schemas/RealtimeTurnDetectionServerVAD'
          semantic_vad: '#/components/schemas/RealtimeTurnDetectionSemanticVAD'
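
For readers less familiar with discriminators, the mapping above amounts to a lookup on the `type` property. The following is a rough sketch of that dispatch, not the generated C# code: the `ServerVad` and `SemanticVad` classes and `parse_turn_detection` are hypothetical names, and only a few fields are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerVad:
    threshold: Optional[float] = None
    prefix_padding_ms: Optional[int] = None
    silence_duration_ms: Optional[int] = None
    idle_timeout_ms: Optional[int] = None


@dataclass
class SemanticVad:
    eagerness: Optional[str] = None


# Plays the role of discriminator.mapping: type value -> variant class.
VARIANTS = {"server_vad": ServerVad, "semantic_vad": SemanticVad}


def parse_turn_detection(raw):
    """Pick the variant from raw["type"]; None means detection is off."""
    if raw is None:
        return None
    cls = VARIANTS.get(raw.get("type"))
    if cls is None:
        raise ValueError(f"unknown turn_detection type: {raw.get('type')!r}")
    # Keep only the fields this sketch models and ignore the rest.
    known = {k: v for k, v in raw.items() if k in cls.__dataclass_fields__}
    return cls(**known)


print(parse_turn_detection({"type": "semantic_vad", "eagerness": "low"}))
```

An explicit mapping keeps the variant names stable, whereas inline variants tend to produce positional names such as the `RealtimeTurnDetectionVariant1` and `RealtimeTurnDetectionVariant2` files listed below.
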
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 73d0cdd and 4abce5a.

⛔ Files ignored due to path filters (43)
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI..JsonSerializerContext.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeSessionCreateRequestGAAudioInputTurnDetectionEagerness.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeSessionCreateRequestGAAudioInputTurnDetectionEagernessNullable.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeSessionCreateResponseGAAudioInputTurnDetectionEagerness.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeSessionCreateResponseGAAudioInputTurnDetectionEagernessNullable.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeTranscriptionSessionCreateRequestGAAudioInputTurnDetectionEagerness.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeTranscriptionSessionCreateRequestGAAudioInputTurnDetectionEagernessNullable.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeTranscriptionSessionCreateRequestGAAudioInputTurnDetectionType.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeTranscriptionSessionCreateRequestGAAudioInputTurnDetectionTypeNullable.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeTurnDetection.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeTurnDetectionVariant1Type.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeTurnDetectionVariant1TypeNullable.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeTurnDetectionVariant2Eagerness.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeTurnDetectionVariant2EagernessNullable.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeTurnDetectionVariant2Type.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonConverters.RealtimeTurnDetectionVariant2TypeNullable.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.JsonSerializerContextTypes.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeBetaServerEventInputAudioBufferTimeoutTriggered.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeBetaServerEventInputAudioBufferTimeoutTriggeredType.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeServerEventInputAudioBufferTimeoutTriggered.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeSessionCreateRequestGAAudioInput.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeSessionCreateRequestGAAudioInputTurnDetection.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeSessionCreateRequestGAAudioInputTurnDetectionType.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeSessionCreateResponseGAAudioInput.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeSessionCreateResponseGAAudioInputTurnDetection.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeSessionCreateResponseGAAudioInputTurnDetectionEagerness.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeSessionCreateResponseGAAudioInputTurnDetectionType.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeTranscriptionSessionCreateRequestGAAudioInput.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeTranscriptionSessionCreateRequestGAAudioInputTurnDetectionEagerness.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeTranscriptionSessionCreateRequestGAAudioInputTurnDetectionType.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeTurnDetection.Json.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeTurnDetection.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeTurnDetectionDiscriminator.Json.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeTurnDetectionDiscriminator.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeTurnDetectionVariant1.Json.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeTurnDetectionVariant1.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeTurnDetectionVariant1Type.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeTurnDetectionVariant2.Json.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeTurnDetectionVariant2.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeTurnDetectionVariant2Eagerness.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.RealtimeTurnDetectionVariant2Type.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.ResponseProperties.g.cs is excluded by !**/generated/**
  • src/libs/tryAGI.OpenAI/Generated/tryAGI.OpenAI.Models.ResponsePropertiesTruncation.g.cs is excluded by !**/generated/**
📒 Files selected for processing (1)
  • src/libs/tryAGI.OpenAI/openapi.yaml (9 hunks)
🔇 Additional comments (6)
src/libs/tryAGI.OpenAI/openapi.yaml (6)

24535-24535: Consistent reuse of RealtimeTurnDetection.

Good alignment across schemas.


24726-24726: Consistent reuse (session schema).

LGTM.


24901-24906: Confirm idle_timeout_ms limits align with server validation.

Min/max (5000–30000 ms) look reasonable; please confirm backend enforces the same to avoid SDK/server drift.
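
If the backend does enforce the same bounds, a client-side guard could mirror them. A minimal sketch follows, assuming `null` remains a valid value meaning no idle timeout; `check_idle_timeout_ms` is a hypothetical helper.

```python
IDLE_TIMEOUT_MIN_MS = 5000
IDLE_TIMEOUT_MAX_MS = 30000


def check_idle_timeout_ms(value):
    """Return value unchanged if it is None or an int within the schema bounds."""
    if value is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("idle_timeout_ms must be an integer or None")
    if not IDLE_TIMEOUT_MIN_MS <= value <= IDLE_TIMEOUT_MAX_MS:
        raise ValueError(
            f"idle_timeout_ms must be between {IDLE_TIMEOUT_MIN_MS} "
            f"and {IDLE_TIMEOUT_MAX_MS}, got {value}"
        )
    return value


print(check_idle_timeout_ms(5000))
```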


24956-24958: nullable in OAS 3.1 vs 3.0 — verify spec version and adjust if needed.

If this spec is OAS 3.1, prefer JSON Schema nullability (type: [object, "null"] / oneOf) over nullable: true. If OAS 3.0, current usage is fine.

Proposed 3.1-style tweak:

-      nullable: true
+      type: [object, "null"]
-              nullable: true
+              oneOf:
+                - type: integer
+                - type: 'null'

Also applies to: 24906-24906
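
As an illustration of the difference, the rewrite for simple typed schemas can be sketched as a small transformation. This is an assumption-laden sketch that only handles schemas with a `type` keyword; a union such as RealtimeTurnDetection would instead need an extra `{type: 'null'}` branch. `to_oas31_nullable` is a hypothetical helper.

```python
def to_oas31_nullable(schema):
    """Convert an OAS 3.0 'nullable: true' schema dict to the 3.1 type-array form."""
    out = {k: v for k, v in schema.items() if k != "nullable"}
    if schema.get("nullable") is True and "type" in out:
        current = out["type"]
        types = list(current) if isinstance(current, list) else [current]
        if "null" not in types:
            types.append("null")
        out["type"] = types
    return out


print(to_oas31_nullable({"type": "integer", "minimum": 5000, "nullable": True}))
```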


26578-26578: LGTM on truncation description.

Clearer behavior explanation without changing semantics.


33957-33959: Rename propagation check — no remaining Beta references.
Search for RealtimeBetaServerEventInputAudioBufferTimeoutTriggered returned no matches; only input_audio_buffer.timeout_triggered appears in openapi.yaml and generated code.

python: "from openai import OpenAI\n\nclient = OpenAI(\n api_key=\"My API Key\",\n)\nclient_secret = client.realtime.client_secrets.create()\nprint(client_secret.expires_at)"
ruby: "require \"openai\"\n\nopenai = OpenAI::Client.new(api_key: \"My API Key\")\n\nclient_secret = openai.realtime.client_secrets.create\n\nputs(client_secret)"
response: "{\n \"value\": \"ek_68af296e8e408191a1120ab6383263c2\",\n \"expires_at\": 1756310470,\n \"session\": {\n \"type\": \"realtime\",\n \"object\": \"realtime.session\",\n \"id\": \"sess_C9CiUVUzUzYIssh3ELY1d\",\n \"model\": \"gpt-realtime\",\n \"output_modalities\": [\n \"audio\"\n ],\n \"instructions\": \"You are a friendly assistant.\",\n \"tools\": [],\n \"tool_choice\": \"auto\",\n \"max_output_tokens\": \"inf\",\n \"tracing\": null,\n \"truncation\": \"auto\",\n \"prompt\": null,\n \"expires_at\": 0,\n \"audio\": {\n \"input\": {\n \"format\": {\n \"type\": \"audio/pcm\",\n \"rate\": 24000\n },\n \"transcription\": null,\n \"noise_reduction\": null,\n \"turn_detection\": {\n \"type\": \"server_vad\",\n \"threshold\": 0.5,\n \"prefix_padding_ms\": 300,\n \"silence_duration_ms\": 200,\n \"idle_timeout_ms\": null,\n \"create_response\": true,\n \"interrupt_response\": true\n }\n },\n \"output\": {\n \"format\": {\n \"type\": \"audio/pcm\",\n \"rate\": 24000\n },\n \"voice\": \"alloy\",\n \"speed\": 1.0\n }\n },\n \"include\": null\n }\n}\n"
response: "{\n \"value\": \"ek_68af296e8e408191a1120ab6383263c2\",\n \"expires_at\": 1756310470,\n \"session\": {\n \"type\": \"realtime\",\n \"object\": \"realtime.session\",\n \"id\": \"sess_C9CiUVUzUzYIssh3ELY1d\",\n \"model\": \"gpt-realtime\",\n \"output_modalities\": [\n \"audio\"\n ],\n \"instructions\": \"You are a friendly assistant.\",\n \"tools\": [],\n \"tool_choice\": \"auto\",\n \"max_output_tokens\": \"inf\",\n \"tracing\": null,\n \"truncation\": \"auto\",\n \"prompt\": null,\n \"expires_at\": 0,\n \"audio\": {\n \"input\": {\n \"format\": {\n \"type\": \"audio/pcm\",\n \"rate\": 24000\n },\n \"transcription\": null,\n \"noise_reduction\": null,\n \"turn_detection\": {\n \"type\": \"server_vad\",\n }\n },\n \"output\": {\n \"format\": {\n \"type\": \"audio/pcm\",\n \"rate\": 24000\n },\n \"voice\": \"alloy\",\n \"speed\": 1.0\n }\n },\n \"include\": null\n }\n}\n"

⚠️ Potential issue

Fix invalid JSON in example: trailing comma in turn_detection block.

The response example has a trailing comma after "server_vad", making it invalid JSON.

Apply this diff inside the stringified JSON:

-        "turn_detection": {
-          "type": "server_vad",
-        }
+        "turn_detection": {
+          "type": "server_vad"
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
response: "{\n \"value\": \"ek_68af296e8e408191a1120ab6383263c2\",\n \"expires_at\": 1756310470,\n \"session\": {\n \"type\": \"realtime\",\n \"object\": \"realtime.session\",\n \"id\": \"sess_C9CiUVUzUzYIssh3ELY1d\",\n \"model\": \"gpt-realtime\",\n \"output_modalities\": [\n \"audio\"\n ],\n \"instructions\": \"You are a friendly assistant.\",\n \"tools\": [],\n \"tool_choice\": \"auto\",\n \"max_output_tokens\": \"inf\",\n \"tracing\": null,\n \"truncation\": \"auto\",\n \"prompt\": null,\n \"expires_at\": 0,\n \"audio\": {\n \"input\": {\n \"format\": {\n \"type\": \"audio/pcm\",\n \"rate\": 24000\n },\n \"transcription\": null,\n \"noise_reduction\": null,\n \"turn_detection\": {\n \"type\": \"server_vad\",\n }\n },\n \"output\": {\n \"format\": {\n \"type\": \"audio/pcm\",\n \"rate\": 24000\n },\n \"voice\": \"alloy\",\n \"speed\": 1.0\n }\n },\n \"include\": null\n }\n}\n"
"turn_detection": {
"type": "server_vad"
}
🤖 Prompt for AI Agents
In src/libs/tryAGI.OpenAI/openapi.yaml around line 5468, the example JSON
response inside the string contains a trailing comma after the "server_vad"
field in the turn_detection object, making it invalid JSON; remove that trailing
comma (ensure items inside turn_detection are comma-separated only when followed
by another key), validate the entire stringified JSON with a JSON
linter/validator, and update the YAML example so the JSON parses correctly.
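
The validation step can be as small as a round trip through a strict JSON parser. A minimal sketch using Python's standard `json` module, which rejects trailing commas; the two strings are trimmed-down stand-ins for the full example, and `is_valid_json` is a hypothetical helper.

```python
import json


def is_valid_json(text):
    """Return True if text parses as strict JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Trimmed versions of the example before and after the fix.
broken = '{"turn_detection": {"type": "server_vad",}}'
fixed = '{"turn_detection": {"type": "server_vad"}}'

print(is_valid_json(broken), is_valid_json(fixed))
```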

description: "Type of turn detection.\n"
default: server_vad
description: "Configuration for turn detection, ether Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger model response.\nServer VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.\nSemantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with \"uhhm\", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.\n"
$ref: '#/components/schemas/RealtimeTurnDetection'

💡 Verification agent

🧩 Analysis chain

Refactor to shared schema looks good.

turn_detection now references the reusable RealtimeTurnDetection. Nice consolidation.

Run to ensure no stray inline turn_detection definitions remain:


🏁 Script executed:

#!/bin/bash
rg -nP -C2 'turn_detection:\s*(\n\s+(?!\$ref).*){1,10}' src/libs/tryAGI.OpenAI/openapi.yaml

Length of output: 83


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="src/libs/tryAGI.OpenAI/openapi.yaml"

if [ ! -f "$file" ]; then
  echo "FILE_NOT_FOUND:$file"
  exit 0
fi

echo "1) grep for 'turn_detection:' with context:"
rg -n -C3 '^\s*turn_detection\s*:' "$file" || true

echo
echo "2) For each occurrence show up to 12 following lines and whether \$ref appears in that block:"
awk '
/turn_detection:/{
  start=NR
  print "MATCH_AT_LINE:" start
  block=$0 "\n"
  for(i=1;i<=12;i++){
    if(getline){
      block = block $0 "\n"
    } else break
  }
  print block
  if(block ~ /\$ref/){
    print "HAS_REF: true"
  } else {
    print "HAS_REF: false"
  }
  print "----"
}
' "$file" || true

echo
echo "3) occurrences of 'RealtimeTurnDetection' (should exist if ref points to it):"
rg -n "RealtimeTurnDetection" "$file" || true

echo
echo "4) scan repo for any other 'turn_detection:' occurrences (all files):"
rg -n --hidden -S '^\s*turn_detection\s*:' || true

Length of output: 15140


Replace remaining inline turn_detection definitions with $ref to RealtimeTurnDetection

Mixed usage found — convert the inline schema definitions to reference '#/components/schemas/RealtimeTurnDetection' or document why they must differ.

  • src/libs/tryAGI.OpenAI/openapi.yaml — inline schemas at lines: 24051, 24225, 24400, 24491, 24684, 24772, 24817.
  • src/libs/tryAGI.OpenAI/openapi.yaml — refs present at lines: 24270, 24535, 24726; component RealtimeTurnDetection at 24889.
  • src/libs/tryAGI.OpenAI/openapi.yaml — inline example/instance at 31288 (example can remain if intended).
  • src/libs/tryAGI.OpenAI/openapi.realtime.yaml — occurrences at lines: 138, 433.

type: string
description: "Type of turn detection, `semantic_vad` to turn on Semantic VAD.\n"
description: Server-side semantic turn detection which uses a model to determine when the user has finished speaking.
description: "Configuration for turn detection, ether Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger model response.\n\nServer VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.\n\nSemantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with \"uhhm\", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.\n"

⚠️ Potential issue

Fix typo: “ether” → “either”.

-      description: "Configuration for turn detection, ether Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger model response.
+      description: "Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger model response.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
description: "Configuration for turn detection, ether Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger model response.\n\nServer VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.\n\nSemantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with \"uhhm\", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.\n"
description: "Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to `null` to turn off, in which case the client must manually trigger model response."
🤖 Prompt for AI Agents
In src/libs/tryAGI.OpenAI/openapi.yaml around line 24955, there's a typo using
"ether" where the correct word is "either"; update the text to replace "ether"
with "either" to fix the spelling error while preserving surrounding punctuation
and formatting.
