
more openAI compatibility, tools support etc. #12

Closed
hyorman wants to merge 5 commits into lutzleonhardt:master from hyorman:some-improvements

Conversation


@hyorman hyorman commented Jan 28, 2026

PR Type

Enhancement


Description

  • Implements complete OpenAI Assistants API with async generator-based run execution engine supporting streaming and tool calling

  • Adds comprehensive Express routes for CRUD operations on assistants, threads, messages, and runs with SSE streaming support

  • Extends OpenAI API compatibility with /v1/models, /v1/responses, and enhanced /v1/chat/completions endpoints featuring tool calling

  • Implements prompt-based tool calling utilities with XML marker parsing and ToolCallBuffer for streaming support (VS Code LM API lacks native function calling)

  • Adds in-memory state management with debounced persistence to VS Code globalState for assistants, threads, messages, runs, and run steps

  • Integrates state persistence and model discovery in extension with auto-start server and model listing command

  • Creates complete web UI with Flask proxy server, Python streaming client, and interactive chat application with persistent message storage

  • Defines comprehensive TypeScript types for Assistants API, tool calling, and new response formats

  • Updates package configuration to require VS Code ^1.95.0 and refactors command IDs to camelCase


Diagram Walkthrough

flowchart LR
  VSCode["VS Code Extension"]
  Server["Express Server"]
  AssistantsAPI["Assistants API<br/>Routes & Runner"]
  ToolUtils["Tool Calling<br/>Utilities"]
  State["State<br/>Management"]
  WebUI["Web UI<br/>Flask + Chat App"]
  
  VSCode -->|"Persistence & Models"| Server
  Server -->|"Mount Routes"| AssistantsAPI
  AssistantsAPI -->|"Use Tools"| ToolUtils
  AssistantsAPI -->|"Manage State"| State
  WebUI -->|"Proxy Requests"| Server
  State -->|"Debounced Save"| VSCode

File Walkthrough

Relevant files
Enhancement
14 files
runner.ts
Run execution engine with streaming and tool support         

src/assistants/runner.ts

  • Implements async generator-based run execution engine with streaming
    and non-streaming modes
  • Supports tool calling with prompt-based parsing and validation against
    available tools
  • Handles run cancellation, step tracking, and state management
    throughout execution
  • Implements continuation logic for runs after tool outputs are
    submitted
+933/-0 
routes.ts
OpenAI Assistants API Express routes                                         

src/assistants/routes.ts

  • Implements complete Express routes for OpenAI Assistants API endpoints
  • Supports CRUD operations for assistants, threads, messages, and runs
  • Handles streaming and non-streaming run execution with SSE
  • Implements tool output submission and run cancellation endpoints
+762/-0 
server.ts
Extended OpenAI API compatibility with tools and responses

src/server.ts

  • Adds /v1/models endpoints for listing and retrieving available models
  • Implements /v1/embeddings stub endpoint returning 501 Not Implemented
  • Adds legacy /v1/completions endpoint wrapping to chat completions
  • Implements new /v1/responses API with tool calling support and
    streaming
  • Enhances /v1/chat/completions with tool calling and function
    definitions
  • Mounts assistants router and adds health check and 404 handler
+631/-6 
state.ts
In-memory state management with persistence                           

src/assistants/state.ts

  • Implements in-memory state management for assistants, threads,
    messages, runs, and run steps
  • Provides debounced persistence callback for VS Code globalState
    integration
  • Supports serialization/deserialization for state restoration
  • Manages pending tool contexts for runs awaiting tool outputs
+434/-0 
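The debounced-persistence pattern described above can be sketched as follows. This is an illustrative sketch only; `makeDebouncedSave` and `SaveFn` are hypothetical names, not the actual exports of src/assistants/state.ts, and the real code wires the callback to VS Code globalState.

```typescript
// Hypothetical sketch of debounced persistence: rapid state mutations
// collapse into a single save once writes go quiet for `delayMs`.
type SaveFn = () => void;

function makeDebouncedSave(save: SaveFn, delayMs: number): SaveFn {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) {
      clearTimeout(timer); // restart the window on every new mutation
    }
    // Only the last call within the delay window triggers a save,
    // so a burst of CRUD operations produces one write.
    timer = setTimeout(save, delayMs);
  };
}
```

In the extension, the real callback would serialize the in-memory maps and hand them to `context.globalState.update(...)`.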
types.ts
OpenAI Assistants API TypeScript type definitions               

src/assistants/types.ts

  • Defines comprehensive TypeScript types for OpenAI Assistants API
  • Includes types for assistants, threads, messages, runs, and run steps
  • Defines tool calling types and streaming event types
  • Provides types for future extensibility (code_interpreter,
    file_search)
+379/-0 
extension.ts
State persistence and model discovery integration               

src/extension.ts

  • Adds state persistence using VS Code globalState with debounced saves
  • Implements getAvailableModels() function to query VS Code Language
    Model API
  • Auto-starts server on extension activation
  • Adds command to list available LLM models via VS Code picker
  • Improves system message handling by prepending to first user message
+120/-13
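The system-message handling mentioned in the last bullet can be sketched as below, assuming the OpenAI chat message shape; `foldSystemMessage` is a hypothetical helper name, not the actual function in src/extension.ts.

```typescript
// Sketch: the VS Code LM API has no dedicated system role, so the system
// message is folded into the first user message before sending.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

function foldSystemMessage(messages: ChatMessage[]): ChatMessage[] {
  const system = messages.find(m => m.role === 'system');
  if (!system) {
    return messages;
  }
  const rest = messages.filter(m => m.role !== 'system');
  const firstUser = rest.findIndex(m => m.role === 'user');
  if (firstUser === -1) {
    return rest; // no user message to attach the instructions to
  }
  return rest.map((m, i) =>
    i === firstUser ? { ...m, content: `${system.content}\n\n${m.content}` } : m
  );
}
```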
types.ts
Extended type definitions for tools and new APIs                 

src/types.ts

  • Adds tool/function calling types (FunctionTool, ToolCall)
  • Implements legacy completions API types for backward compatibility
  • Adds embeddings API stub types
  • Implements new Responses API types with tool calling support
  • Extends ChatCompletionRequest with tool parameters
+238/-3 
index.ts
Assistants API module exports                                                       

src/assistants/index.ts

  • Creates module barrel export for assistants API functionality
  • Exports types, state management, run execution, tool utilities, and
    routes
+31/-0   
tools.ts
Tool calling utilities for VS Code LM API                               

src/assistants/tools.ts

  • Implements prompt-based tool calling utilities for VS Code LM API
    which lacks native function calling support
  • Provides tool definition formatting for system prompt injection with
    detailed parameter documentation
  • Includes tool call parsing from model output using XML markers and
    JSON extraction
  • Implements ToolCallBuffer class for streaming support to detect
    complete tool calls before parsing
+240/-0 
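The XML-marker parsing described above can be sketched roughly as follows. The `<tool_call>` marker format matches what the PR discussion shows, but the types and function name here are illustrative, not the actual API of src/assistants/tools.ts.

```typescript
// Sketch: extract prompt-based tool calls emitted by the model as
// <tool_call>{"name": ..., "arguments": {...}}</tool_call> blocks.
interface ParsedToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

function parseToolCallMarkers(output: string): ParsedToolCall[] {
  const calls: ParsedToolCall[] = [];
  const regex = /<tool_call>\s*([\s\S]*?)\s*<\/tool_call>/g;
  let match: RegExpExecArray | null;
  while ((match = regex.exec(output)) !== null) {
    try {
      const parsed = JSON.parse(match[1]);
      if (typeof parsed.name === 'string') {
        calls.push({ name: parsed.name, arguments: parsed.arguments ?? {} });
      }
    } catch {
      // Malformed JSON between the markers is skipped, not thrown.
    }
  }
  return calls;
}
```

The streaming `ToolCallBuffer` wraps the same idea: it counts opening versus closing markers across chunks and only hands text to this parser once the markers balance.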
copilot_proxy.py
Python streaming client for chat completions                         

client/copilot_proxy.py

  • Creates a Python client that streams chat completions from the local
    API server
  • Implements SSE (Server-Sent Events) parsing with proper handling of
    data: prefixed lines
  • Handles JSON decoding errors gracefully and outputs streamed content
    fragments in real-time
+60/-0   
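The `data:`-prefixed line handling described above can be sketched as follows. The actual client is Python (client/copilot_proxy.py); this TypeScript sketch just illustrates the SSE field parsing, with `[DONE]` being the OpenAI-style stream terminator.

```typescript
// Sketch: pull the JSON payloads out of a run of SSE lines, skipping
// comment lines and stopping at the [DONE] terminator.
function extractSsePayloads(lines: string[]): string[] {
  const payloads: string[] = [];
  for (const line of lines) {
    if (!line.startsWith('data:')) {
      continue; // comments (": ...") and other SSE fields are ignored
    }
    const data = line.slice('data:'.length).trim();
    if (data === '[DONE]') {
      break; // end-of-stream sentinel used by OpenAI-compatible servers
    }
    payloads.push(data);
  }
  return payloads;
}
```

Each returned string is then fed to a JSON parser; decode failures on partial fragments are the bug the review's SSE suggestion below the fold addresses.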
server.py
Flask proxy server for API requests                                           

client/server.py

  • Implements Flask web server that proxies requests to the underlying
    API at http://localhost:3000/v1
  • Provides /api/chat endpoint to forward chat completion requests with
    proper error handling
  • Provides /api/models endpoint to fetch and return available models
    from upstream API
  • Serves static web files from the web directory
+41/-0   
app.js
Interactive chat application frontend                                       

client/web/app.js

  • Implements interactive chat UI with persistent message storage using
    localStorage
  • Dynamically loads available models from /api/models endpoint with
    fallback options
  • Handles form submission with Enter key support (Shift+Enter for
    newlines)
  • Parses various API response formats and displays assistant responses
    with error handling
+150/-0 
styles.css
Dark theme styling for chat interface                                       

client/web/styles.css

  • Defines dark theme CSS variables for background, panel, text, and
    accent colors
  • Implements responsive chat UI layout with flexbox for header, messages
    container, and form
  • Styles message bubbles with different backgrounds for user vs AI
    messages
  • Applies gradient backgrounds and subtle borders for modern dark theme
    appearance
+22/-0   
index.html
HTML structure for chat application                                           

client/web/index.html

  • Creates HTML structure for chat application with header containing
    title, model selector, and new chat button
  • Implements messages container section for displaying conversation
    history
  • Includes chat form with textarea for user input and send button
  • Links external CSS and JavaScript files for styling and functionality
+30/-0   
Documentation
1 file
README_WEB.md
Web UI setup and usage documentation                                         

client/README_WEB.md

  • Adds documentation for minimal Flask web UI server
  • Provides setup instructions for Python virtual environment and
    dependencies
  • Explains how to configure API endpoint and access the chat interface
+22/-0   
Configuration changes
2 files
package.json
Package configuration and metadata updates                             

package.json

  • Updates version from 1.0.2 to 1.0.4 and improves package metadata with
    displayName and readme fields
  • Changes description to reflect OpenAI compatibility focus
  • Updates VS Code engine requirement from ^1.70.0 to ^1.95.0
  • Refactors command IDs to use camelCase (copilotProxy.startServer,
    copilotProxy.stopServer, etc.) and adds new listModels command
  • Simplifies activationEvents to use wildcard * and removes specific
    command activation events
  • Updates build script to use @vscode/vsce package instead of standalone
    vsce dependency
+22/-19 
.vscodeignore
VS Code extension packaging exclusions                                     

.vscodeignore

  • Defines files and directories to exclude from VS Code extension
    package
  • Excludes Python cache, source files, tests, and configuration files
  • Preserves node_modules directory in the packaged extension
+22/-0   
Dependencies
1 file
requirements-web.txt
Python dependencies for web server                                             

client/requirements-web.txt

  • Specifies Python dependencies for the web server: Flask>=2.0 and
    requests>=2.25
+2/-0     
Additional files
7 files
plan-rest-api.md +0/-47   
plan-ui-settings.md +0/-183 
plan-use-vscode-llm.md +0/-81   
plan-vscode-extension.md +0/-130 
client.py +0/-34   
requirements.txt +0/-1     
run_client.sh +0/-5     

Halil Yasin Orman and others added 2 commits January 27, 2026 20:00
Copilot AI review requested due to automatic review settings January 28, 2026 21:54

qodo-code-review bot commented Jan 28, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Missing authentication

Description: The new OpenAI-compatible HTTP API endpoints (e.g., /v1/models, /v1/responses,
/v1/chat/completions, and mounted assistants routes) are exposed without any
authentication/authorization checks, enabling any network-reachable client to
create/read/update/delete assistants/threads/messages/runs and trigger model executions,
which is a realistic security risk if the server is not strictly bound to localhost or is
reachable via port-forwarding.
server.ts [54-681]

Referred Code
app.get('/v1/models', async (req: Request, res: Response) => {
  try {
    const models = await getAvailableModels();
    const response: ModelsListResponse = {
      object: 'list',
      data: models.map(m => ({
        id: m.family,
        object: 'model' as const,
        created: Math.floor(Date.now() / 1000),
        owned_by: m.vendor
      }))
    };
    res.json(response);
  } catch (error) {
    console.error('Error listing models:', error);
    res.status(500).json(errorResponse('Failed to list models', 'server_error'));
  }
});

// GET /v1/models/:model - Get specific model
app.get('/v1/models/:model', async (req: Request, res: Response) => {


 ... (clipped 607 lines)
DoS via SSE

Description: The newly added Assistants API CRUD and run execution endpoints (including SSE streaming
at /v1/threads/:thread_id/runs and
/v1/threads/:thread_id/runs/:run_id/submit_tool_outputs) accept untrusted client input and
can be invoked repeatedly without throttling/limits, creating a realistic
denial-of-service vector via many concurrent long-lived SSE connections and repeated run
creation/execution.
routes.ts [87-760]

Referred Code
router.post('/v1/assistants', (req: Request, res: Response) => {
  const body = req.body as CreateAssistantRequest;

  const validationError = validateRequired(body, ['model']);
  if (validationError) {
    return res.status(400).json(errorResponse(validationError, 'invalid_request_error', 'model'));
  }

  const assistant: Assistant = {
    id: state.generateAssistantId(),
    object: 'assistant',
    created_at: Math.floor(Date.now() / 1000),
    name: body.name ?? null,
    description: body.description ?? null,
    model: body.model,
    instructions: body.instructions ?? null,
    tools: body.tools ?? [],
    metadata: body.metadata ?? {}
  };

  state.createAssistant(assistant);


 ... (clipped 653 lines)
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🔴
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logs: Critical CRUD actions over assistants/threads/messages/runs are performed without any
audit logging that includes actor identity, timestamped action context, and outcome.

Referred Code
// Create assistant
router.post('/v1/assistants', (req: Request, res: Response) => {
  const body = req.body as CreateAssistantRequest;

  const validationError = validateRequired(body, ['model']);
  if (validationError) {
    return res.status(400).json(errorResponse(validationError, 'invalid_request_error', 'model'));
  }

  const assistant: Assistant = {
    id: state.generateAssistantId(),
    object: 'assistant',
    created_at: Math.floor(Date.now() / 1000),
    name: body.name ?? null,
    description: body.description ?? null,
    model: body.model,
    instructions: body.instructions ?? null,
    tools: body.tools ?? [],
    metadata: body.metadata ?? {}
  };



 ... (clipped 614 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Missing input validation: External inputs are used without robust validation (e.g., prompt normalization allows
undefined and tool-call JSON parsing swallows parse errors), which can cause undefined
behavior and opaque failures.

Referred Code
app.post<{}, {}, CompletionRequest>('/v1/completions', async (req: Request, res: Response) => {
  const { model, prompt, stream, ...rest } = req.body;

  // Normalize prompt to string
  const promptText = Array.isArray(prompt) ? prompt.join('\n') : prompt;

  // Remove vendor prefixes
  const cleanModel = model.split('/').pop()!;

  // Convert to chat completion request
  const chatRequest: ChatCompletionRequest = {
    model: cleanModel,
    messages: [{ role: 'user', content: promptText }],
    stream: stream ?? false
  };

  if (stream) {
    res.setHeader('Content-Type', 'text/event-stream');
    res.setHeader('Cache-Control', 'no-cache');
    res.setHeader('Connection', 'keep-alive');



 ... (clipped 176 lines)


Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Leaks internal errors: Streaming error events and run last_error propagate error.message to clients, potentially
exposing internal implementation details instead of a generic user-facing message.

Referred Code
} catch (error) {
  console.error('Run execution error:', error);
  state.updateRun(threadId, runId, {
    status: 'failed',
    failed_at: Math.floor(Date.now() / 1000),
    last_error: {
      code: 'server_error',
      message: error instanceof Error ? error.message : 'Unknown error'
    }
  });

  if (streaming) {
    yield createEvent('error', {
      error: {
        message: error instanceof Error ? error.message : 'Unknown error',
        code: 'server_error'
      }
    });
    yield createEvent('thread.run.failed', state.getRun(threadId, runId));
    yield createEvent('done', '[DONE]');
  }


Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
No authz checks: The newly added public CRUD endpoints (including destructive operations and run
execution/cancellation) implement no authentication/authorization or caller identity
checks, allowing any caller to read/write/delete state.

Referred Code
// Create assistant
router.post('/v1/assistants', (req: Request, res: Response) => {
  const body = req.body as CreateAssistantRequest;

  const validationError = validateRequired(body, ['model']);
  if (validationError) {
    return res.status(400).json(errorResponse(validationError, 'invalid_request_error', 'model'));
  }

  const assistant: Assistant = {
    id: state.generateAssistantId(),
    object: 'assistant',
    created_at: Math.floor(Date.now() / 1000),
    name: body.name ?? null,
    description: body.description ?? null,
    model: body.model,
    instructions: body.instructions ?? null,
    tools: body.tools ?? [],
    metadata: body.metadata ?? {}
  };



 ... (clipped 407 lines)


Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Generic identifiers: Several generic names (e.g., body, rest, parsed, calls, match) reduce clarity in complex
parsing/translation logic and may warrant refactoring for readability.

Referred Code
// POST /v1/completions - Wrap as chat completion
app.post<{}, {}, CompletionRequest>('/v1/completions', async (req: Request, res: Response) => {
  const { model, prompt, stream, ...rest } = req.body;

  // Normalize prompt to string
  const promptText = Array.isArray(prompt) ? prompt.join('\n') : prompt;

  // Remove vendor prefixes
  const cleanModel = model.split('/').pop()!;

  // Convert to chat completion request
  const chatRequest: ChatCompletionRequest = {
    model: cleanModel,
    messages: [{ role: 'user', content: promptText }],
    stream: stream ?? false
  };

  if (stream) {
    res.setHeader('Content-Type', 'text/event-stream');
    res.setHeader('Cache-Control', 'no-cache');
    res.setHeader('Connection', 'keep-alive');


 ... (clipped 425 lines)


Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Unstructured error logs: The new code logs raw error objects via console.error(...)/console.warn(...) without
structured logging or redaction, which may leak sensitive request or model data depending
on upstream error contents.

Referred Code
  } catch (error) {
    console.error('Error listing models:', error);
    res.status(500).json(errorResponse('Failed to list models', 'server_error'));
  }
});

// GET /v1/models/:model - Get specific model
app.get('/v1/models/:model', async (req: Request, res: Response) => {
  try {
    const models = await getAvailableModels();
    const model = models.find(m => m.family === req.params.model);

    if (!model) {
      return res.status(404).json(
        errorResponse(`Model '${req.params.model}' not found`, 'invalid_request_error', 'model', 'model_not_found')
      );
    }

    const response: ModelObject = {
      id: model.family,
      object: 'model',


 ... (clipped 427 lines)


Compliance status legend:
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label


qodo-code-review bot commented Jan 28, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue
Implement streaming logic for tool continuations

Implement the missing streaming logic in the continueRunWithToolOutputs function
to correctly handle async iterators from processChatRequest and prevent a
runtime error.

src/assistants/runner.ts [742-760]

-// Non-streaming continuation (for now - streaming follows same pattern as executeRun)
-const response = await processChatRequest(request) as any;
+let fullContent = '';
+let promptTokens = 0;
+let completionTokens = 0;
 
-const responseContent = response.choices[0]?.message?.content;
-if (!responseContent) {
-  state.updateRun(threadId, runId, {
-    status: 'failed',
-    failed_at: Math.floor(Date.now() / 1000),
-    last_error: { code: 'server_error', message: 'Empty response from model' }
-  });
-  return;
+if (streaming) {
+  // Create message in progress
+  const assistantMessage: Message = {
+    id: messageId,
+    object: 'thread.message',
+    created_at: Math.floor(Date.now() / 1000),
+    thread_id: threadId,
+    status: 'in_progress',
+    role: 'assistant',
+    content: [{ type: 'text', text: { value: '', annotations: [] } }],
+    assistant_id: assistant.id,
+    run_id: runId,
+    // ... other fields
+  };
+  state.addMessage(threadId, assistantMessage);
+  yield createEvent('thread.message.created', assistantMessage);
+  yield createEvent('thread.message.in_progress', assistantMessage);
+
+  const streamIterator = await processChatRequest(request) as AsyncIterable<ChatCompletionChunk>;
+  let deltaIndex = 0;
+  for await (const chunk of streamIterator) {
+    // (Add cancellation check here)
+    const content = chunk.choices[0]?.delta?.content ?? '';
+    if (content) {
+      fullContent += content;
+      const delta: MessageDelta = {
+        id: messageId,
+        object: 'thread.message.delta',
+        delta: { content: [{ index: deltaIndex++, type: 'text', text: { value: content } }] }
+      };
+      yield createEvent('thread.message.delta', delta);
+    }
+  }
+  // Rough token estimation for streaming
+  completionTokens = fullContent.length;
+  promptTokens = chatMessages.reduce((sum, m) => sum + (typeof m.content === 'string' ? m.content.length : 0), 0);
+
+} else {
+  // Non-streaming continuation
+  const response = await processChatRequest(request) as any;
+
+  const responseContent = response.choices[0]?.message?.content;
+  if (!responseContent) {
+    state.updateRun(threadId, runId, {
+      status: 'failed',
+      failed_at: Math.floor(Date.now() / 1000),
+      last_error: { code: 'server_error', message: 'Empty response from model' }
+    });
+    return;
+  }
+
+  fullContent = typeof responseContent === 'string' 
+    ? responseContent 
+    : JSON.stringify(responseContent);
+
+  promptTokens = response.usage?.prompt_tokens ?? 0;
+  completionTokens = response.usage?.completion_tokens ?? fullContent.length;
 }
 
-fullContent = typeof responseContent === 'string' 
-  ? responseContent 
-  : JSON.stringify(responseContent);
-
-promptTokens = response.usage?.prompt_tokens ?? 0;
-completionTokens = response.usage?.completion_tokens ?? fullContent.length;
-
Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a significant bug where the streaming path in continueRunWithToolOutputs is unimplemented, which would cause a runtime error, and it provides a correct implementation pattern.

High
Flush buffer on complete

Modify the ToolCallBuffer.append method to return the buffered content as
safeContent when tool calls are complete, preventing data loss and ensuring text
is correctly flushed.

src/assistants/tools.ts [200-216]

 append(chunk: string): { safeContent: string; complete: boolean } {
   this.content += chunk;
   const openCount = (this.content.match(/<tool_call>/gi) || []).length;
   const closeCount = (this.content.match(/<\/tool_call>/gi) || []).length;
   this.inToolCall = openCount > closeCount;
   if (!this.inToolCall && openCount === closeCount) {
-    return { safeContent: '', complete: true };
+    const safe = this.content;
+    this.reset();
+    return { safeContent: safe, complete: true };
   }
   return { safeContent: '', complete: false };
 }
Suggestion importance[1-10]: 9


Why: This suggestion fixes a critical bug in the ToolCallBuffer where it would drop all non-tool-call content, leading to empty responses. The fix ensures all content is correctly processed.

High
Add null check to prevent crash

Add a null/undefined check at the beginning of extractMessageContent to prevent
a potential TypeError when processing messages that have no content.

src/extension.ts [168-175]

-function extractMessageContent(content: string | StructuredMessageContent[]): string {
+function extractMessageContent(content: string | StructuredMessageContent[] | null | undefined): string {
+  if (content === null || content === undefined) {
+    return '';
+  }
   if (typeof content === 'string') {
     return content;
   }
   return content.map(item => item.text).join('\n');
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies that the extractMessageContent function will crash if passed null or undefined content, which is a valid scenario, and provides a simple fix to prevent the error.

Medium
Fix multi-line SSE data parsing

Correct the Server-Sent Events (SSE) parsing logic to properly handle fragmented
messages by concatenating only the content of data: lines, preventing JSON
decoding errors.

client/copilot_proxy.py [53-55]

 elif line.startswith("data:"):
-    # append JSON after "data:"
-    buffer += (line + "\n")
+    # append JSON part after "data:"
+    buffer += line[len("data:"):].strip()
Suggestion importance[1-10]: 8

__

Why: This suggestion fixes a significant bug in the SSE parsing logic that would cause JSON decoding to fail for fragmented messages, making the client more robust.

Medium
Remove racy run cancellation timeout

Remove the setTimeout block from the run cancellation route handler to eliminate
a race condition and rely on the executeRun function for the final state
transition to cancelled.

src/assistants/routes.ts [504-512]

-// After a short delay, mark as cancelled
-setTimeout(() => {
-  const currentRun = state.getRun(thread_id, run_id);
-  if (currentRun?.status === 'cancelling') {
-    state.updateRun(thread_id, run_id, { status: 'cancelled' });
-  }
-}, 100);
-
 res.json(updated);
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies a race condition in the run cancellation logic and proposes removing the unreliable setTimeout to rely on the robust state handling within the run executor.

Medium
Align tool choice with spec

Align the tool_choice object shape in CreateResponseRequest with the OpenAI
specification by nesting the function name within a function object, matching
the ChatCompletionRequest interface.

src/types.ts [204-218]

 export interface CreateResponseRequest {
   model: string;
   input: string | ResponseInputItem[];
   instructions?: string;
   stream?: boolean;
   temperature?: number;
   max_output_tokens?: number;
   top_p?: number;
   store?: boolean;
   metadata?: Record<string, string>;
   tools?: FunctionTool[];
-  tool_choice?: 'none' | 'auto' | 'required' | { type: 'function'; name: string };
+  tool_choice?: 'none' | 'auto' | 'required' | { type: 'function'; function: { name: string } };
   previous_response_id?: string;
   parallel_tool_calls?: boolean;
 }
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies an inconsistency between the CreateResponseRequest and ChatCompletionRequest interfaces, which would likely cause validation errors and bugs.

Medium
General
Use object for tool arguments

Update the ToolCall interface to use Record<string, unknown> for the arguments
field instead of string to improve type safety and avoid double JSON
stringification.

src/types.ts [43-49]

 export interface ToolCall {
   id: string;
   type: 'function';
   function: {
     name: string;
-    arguments: string;
+    arguments: Record<string, unknown>;
   };
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6


Why: The suggestion improves type safety and code clarity by changing arguments from a string to an object, which avoids unnecessary JSON stringification and aligns the type with its actual usage.

Low
Enforce case-sensitive tool call parsing

Remove the case-insensitive (i) flag from the tool call parsing regular
expression to enforce the strict, case-sensitive format specified in the system
prompt.

src/assistants/tools.ts [113]

-const regex = /<tool_call>\s*([\s\S]*?)\s*<\/tool_call>/gi;
+const regex = /<tool_call>\s*([\s\S]*?)\s*<\/tool_call>/g;
Suggestion importance[1-10]: 5


Why: The suggestion correctly points out that the regex should be case-sensitive to match the strict format instructed in the prompt, improving parsing accuracy.

Low


Copilot AI left a comment


Pull request overview

This PR significantly expands the copilot-proxy extension to provide broader OpenAI API compatibility. It adds support for tools/function calling, implements the Assistants API, and includes legacy completions and models endpoints. The extension now auto-starts the server and persists assistant state.

Changes:

  • Added comprehensive OpenAI-compatible type definitions including tools, embeddings, models, responses, and assistants APIs
  • Implemented full Assistants API with threads, messages, runs, and tool calling via prompt engineering
  • Added legacy completions endpoint, models listing, and embeddings stub
  • Introduced state persistence for assistants with debounced auto-save
  • Changed extension behavior to auto-start server on activation
  • Added web-based chat client with Flask backend

Reviewed changes

Copilot reviewed 25 out of 27 changed files in this pull request and generated 21 comments.

Show a summary per file
File Description
src/types.ts Added extensive type definitions for OpenAI API compatibility including tools, assistants, embeddings, and responses
src/server.ts Expanded with models endpoints, completions API, responses API, tool parsing, and assistants router integration
src/extension.ts Added state persistence, auto-start behavior, model listing command, and improved system message handling
src/assistants/*.ts New comprehensive assistants module with types, state management, tool utilities, run execution engine, and routes
package.json Updated version, commands, engine requirements, removed vsce dependency, changed activation events
client/* New web client with HTML/CSS/JS frontend and Flask proxy backend, plus Python streaming client
package-lock.json Deleted (breaks reproducible builds)
docs/specs/archive/* Removed planning documents
Comments suppressed due to low confidence (1)

src/extension.ts:180

  • The extractMessageContent function now handles null and undefined values, but there's a potential issue: when content is an array of StructuredMessageContent, the function returns the concatenated text from items. However, in the types, StructuredMessageContent has a type property, but the function assumes it has a text property. This could lead to undefined being returned for valid structured content that doesn't match the expected shape.
function extractMessageContent(content: string | StructuredMessageContent[] | null | undefined): string {
  if (content === null || content === undefined) {
    return '';
  }
  if (typeof content === 'string') {
    return content;
  }
  if (Array.isArray(content)) {
    return content.map(item => item.text).join('\n');
  }
  return String(content);
}
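A defensive variant along the lines this comment suggests would only concatenate items that actually carry a string text field. The sketch below is illustrative, not the PR's code; the StructuredMessageContent shape and the function name are assumptions:

```typescript
// Hypothetical defensive rewrite: items without a string `text` field are
// skipped instead of producing the literal string "undefined" in the output.
interface StructuredMessageContent {
  type: string;
  text?: string;
}

function extractMessageContentSafe(
  content: string | StructuredMessageContent[] | null | undefined
): string {
  if (content === null || content === undefined) {
    return '';
  }
  if (typeof content === 'string') {
    return content;
  }
  if (Array.isArray(content)) {
    const parts: string[] = [];
    for (const item of content) {
      // Only keep items whose text is actually a string.
      if (typeof item.text === 'string') {
        parts.push(item.text);
      }
    }
    return parts.join('\n');
  }
  return String(content);
}
```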



src/extension.ts Outdated
Comment on lines 88 to 93
// Auto-start the server on extension activation
if (!serverInstance) {
  const configPort = vscode.workspace.getConfiguration("copilotProxy").get("port", 3000);
  serverInstance = startServer(configPort);
  outputChannel.appendLine(`Express server auto-started on port ${configPort}.`);
}
Copilot AI Jan 28, 2026

The server now auto-starts on extension activation (lines 89-93), which is a significant behavior change. This could cause issues if the configured port is already in use, potentially blocking the extension from loading. Consider adding error handling and user notification if the server fails to start, or making auto-start optional via configuration.
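One way to surface the failure mode this comment describes is a small helper that classifies the listen error before notifying the user. This is a sketch, not the PR's code: `describeServerError` and the `ErrnoLike` shape are hypothetical, and the wiring comment assumes `startServer` returns a Node `http.Server`.

```typescript
// Minimal error shape, mirroring Node's ErrnoException without requiring @types/node.
interface ErrnoLike {
  code?: string;
  message: string;
}

// Hypothetical helper: map common bind failures to user-facing messages.
function describeServerError(err: ErrnoLike, port: number): string {
  if (err.code === 'EADDRINUSE') {
    return `Port ${port} is already in use; the proxy server was not started.`;
  }
  if (err.code === 'EACCES') {
    return `Insufficient permissions to bind port ${port}.`;
  }
  return `Proxy server failed to start on port ${port}: ${err.message}`;
}

// Wiring sketch inside activate() (assumption: startServer returns http.Server):
//   serverInstance = startServer(configPort);
//   serverInstance.on('error', err =>
//     vscode.window.showErrorMessage(describeServerError(err, configPort)));
```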

src/server.ts Outdated
Comment on lines 199 to 238
function parseToolCalls(content: string, tools: FunctionTool[]): { text: string; toolCalls: ResponseFunctionCallItem[] } {
  const toolCalls: ResponseFunctionCallItem[] = [];
  let remainingText = content;

  // Look for JSON-formatted tool calls in the response
  // Common patterns: <tool_call>, ```json, or direct JSON objects
  const toolCallPatterns = [
    /<tool_call>([\s\S]*?)<\/tool_call>/g,
    /```(?:json)?\s*\n?({[\s\S]*?"name"[\s\S]*?"arguments"[\s\S]*?})\s*\n?```/g,
    /\{\s*"tool_calls?"\s*:\s*\[([\s\S]*?)\]\s*\}/g
  ];

  for (const pattern of toolCallPatterns) {
    let match;
    while ((match = pattern.exec(content)) !== null) {
      try {
        let parsed = JSON.parse(match[1] || match[0]);

        // Handle both single tool call and array of tool calls
        const calls = Array.isArray(parsed) ? parsed : (parsed.tool_calls || [parsed]);

        for (const call of calls) {
          if (call.name && tools.some(t => t.function.name === call.name)) {
            const toolCall: ResponseFunctionCallItem = {
              type: 'function_call',
              id: generateId('fc'),
              call_id: generateId('call'),
              name: call.name,
              arguments: typeof call.arguments === 'string' ? call.arguments : JSON.stringify(call.arguments || {}),
              status: 'completed'
            };
            toolCalls.push(toolCall);
            remainingText = remainingText.replace(match[0], '').trim();
          }
        }
      } catch (e) {
        // Not valid JSON, continue
      }
    }
  }
Copilot AI Jan 28, 2026

The parseToolCalls function uses regular expressions with global flags but doesn't reset lastIndex between iterations or patterns. When using .exec() with global regex, the lastIndex property persists across calls, which can cause the regex to miss matches or behave unexpectedly. Consider using String.prototype.matchAll() or resetting the regex between uses.
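The pitfall described here can be reproduced in isolation. A small sketch (the pattern and inputs are illustrative, not taken from the PR):

```typescript
// A /g regex carries its scan position in lastIndex across exec() calls,
// so reusing one regex object against a new string can silently skip matches.
const reused = /<tool_call>/g;
const first = reused.exec('xxxx<tool_call>'); // matches at index 4, lastIndex advances to 15
const second = reused.exec('<tool_call>');    // scanning starts at stale lastIndex 15 -> null

// matchAll() avoids the shared state: each call returns a fresh iterator
// and never mutates the regex between uses.
function extractCalls(input: string): string[] {
  return [...input.matchAll(/<tool_call>([\s\S]*?)<\/tool_call>/g)].map(m => m[1]);
}
```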

src/server.ts Outdated
              status: 'completed'
            };
            toolCalls.push(toolCall);
            remainingText = remainingText.replace(match[0], '').trim();
Copilot AI Jan 28, 2026

The tool parsing logic removes matched tool calls from the content by calling replace(match[0], '') on line 231. However, remainingText is defined once at the start (line 201) and never reassigned when tool calls are found. This means the text content isn't actually being stripped of tool call markers as intended - the replace operation is performed on a variable that's immediately reassigned.

src/server.ts Outdated
            }
          };
          toolCalls.push(toolCall);
          remainingText = remainingText.replace(match[0], '').trim();
Copilot AI Jan 28, 2026

The parseChatToolCalls function has the same issue as parseToolCalls - the remainingText variable at line 523 is initialized but line 550 performs a replace operation that doesn't get assigned back to remainingText. This means tool call markers won't be removed from the returned text content.

Comment on lines 191 to 240
export class ToolCallBuffer {
  private content: string = '';
  private inToolCall: boolean = false;
  private toolCallDepth: number = 0;

  /**
   * Add content to buffer
   * Returns content that can be safely emitted (not part of a tool call)
   */
  append(chunk: string): { safeContent: string; complete: boolean } {
    this.content += chunk;

    // Check for tool call markers
    const openCount = (this.content.match(/<tool_call>/gi) || []).length;
    const closeCount = (this.content.match(/<\/tool_call>/gi) || []).length;

    this.inToolCall = openCount > closeCount;

    if (!this.inToolCall && openCount === closeCount) {
      // All tool calls are complete (or there are none)
      return { safeContent: '', complete: true };
    }

    // We're in the middle of a tool call, don't emit anything yet
    return { safeContent: '', complete: false };
  }

  /**
   * Get the full accumulated content
   */
  getContent(): string {
    return this.content;
  }

  /**
   * Check if we're currently inside a tool call block
   */
  isInToolCall(): boolean {
    return this.inToolCall;
  }

  /**
   * Reset the buffer
   */
  reset(): void {
    this.content = '';
    this.inToolCall = false;
    this.toolCallDepth = 0;
  }
}
Copilot AI Jan 28, 2026

The ToolCallBuffer class tracks tool call depth but never uses the toolCallDepth property (line 194). It's incremented and reset but not used in any logic. Either implement proper depth tracking for nested tool calls or remove the unused property to avoid confusion.
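One way to make depth tracking meaningful is to maintain the nesting level incrementally and emit depth-0 text as it streams, holding back a short tail in case a marker is split across chunks. The sketch below is illustrative only; the class name and API are hypothetical and not the PR's implementation:

```typescript
// Illustrative sketch: incremental depth tracking that actually gates
// streaming output on <tool_call> nesting.
class DepthToolCallBuffer {
  private depth = 0;    // current <tool_call> nesting level
  private buffer = '';  // text not yet classified as safe or tool-call

  /** Feed a chunk; returns text that is provably outside any tool call. */
  append(chunk: string): string {
    this.buffer += chunk;
    let safe = '';
    let cursor = 0;
    const marker = /<\/?tool_call>/g;
    let m: RegExpExecArray | null;
    while ((m = marker.exec(this.buffer)) !== null) {
      if (this.depth === 0) {
        safe += this.buffer.slice(cursor, m.index);
      }
      this.depth += m[0] === '<tool_call>' ? 1 : -1;
      cursor = m.index + m[0].length;
    }
    const tail = this.buffer.slice(cursor);
    // Hold back enough characters to cover a partially streamed marker.
    const holdback = '</tool_call>'.length - 1;
    if (this.depth === 0 && tail.length > holdback) {
      safe += tail.slice(0, tail.length - holdback);
      this.buffer = tail.slice(tail.length - holdback);
    } else {
      this.buffer = tail;
    }
    return safe;
  }

  /** Flush trailing text once the stream ends. */
  flush(): string {
    const rest = this.depth === 0 ? this.buffer : '';
    this.buffer = '';
    return rest;
  }
}
```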


// POST /v1/responses - Create a model response (new OpenAI API)
app.post<{}, {}, CreateResponseRequest>('/v1/responses', async (req, res) => {
  const { model, input, instructions, stream, temperature, max_output_tokens, metadata, tools, tool_choice } = req.body;
Copilot AI Jan 28, 2026

Unused variable tool_choice.

Suggested change
const { model, input, instructions, stream, temperature, max_output_tokens, metadata, tools, tool_choice } = req.body;
const { model, input, instructions, stream, temperature, max_output_tokens, metadata, tools } = req.body;


app.post<{}, {}, ChatCompletionRequest>('/v1/chat/completions', async (req, res) => {
const { model, stream } = req.body;
const { model, stream, tools, tool_choice } = req.body;
Copilot AI Jan 28, 2026

Unused variable tool_choice.

Suggested change
const { model, stream, tools, tool_choice } = req.body;
const { model, stream, tools } = req.body;

else if(choices.length && choices[0].delta) content = choices.map(c=>c.delta?.content||'').join('');
else if(data.text) content = data.text;
else content = JSON.stringify(data);
}catch(e){ content = JSON.stringify(data) }
Copilot AI Jan 28, 2026

Avoid automated semicolon insertion (96% of all statements in the enclosing function have an explicit semicolon).

Suggested change
}catch(e){ content = JSON.stringify(data) }
}catch(e){ content = JSON.stringify(data); }

    return send_from_directory('web', 'index.html')

@app.route('/api/chat', methods=['POST'])
def api_chat():
Copilot AI Jan 28, 2026

api_chat returns a tuple of size 2 on one code path and a tuple of size 3 on another. Flask interprets these shapes differently ((body, status) vs. (body, status, headers)), so the return values should use a consistent shape.



@app.route('/api/models', methods=['GET'])
def api_models():
Copilot AI Jan 28, 2026

api_models returns a tuple of size 2 on one code path and a tuple of size 3 on another. Flask interprets these shapes differently ((body, status) vs. (body, status, headers)), so the return values should use a consistent shape.

@hyorman hyorman closed this Feb 5, 2026
@hyorman hyorman deleted the some-improvements branch February 5, 2026 20:51
@hyorman hyorman restored the some-improvements branch February 5, 2026 20:51