feat(converters): add base action converter with OAGI implementation #11
Merged
gwynbleidd003 merged 11 commits into agiopen-org:main on Feb 3, 2026
Add action converters for Claude, Qwen3, and Gemini models to enable remote execution of VLM actions via pyautogui command strings.

Key changes:
- Add oagi.converters module with BaseActionConverter base class
- Add ClaudeActionConverter (XGA 1024x768 coordinate space)
- Add Qwen3ActionConverter (0-999 coordinate space)
- Add GeminiActionConverter (0-1000 coordinate space)
- Add OagiActionConverter (0-1000 coordinate space)
- Extract shared utilities to oagi.handler.utils:
  - CoordinateScaler class for coordinate transformation
  - KEY_MAP and PYAUTOGUI_VALID_KEYS constants
  - normalize_key(), parse_hotkey(), validate_keys() functions
- Refactor PyautoguiActionHandler to use shared utilities

The converters generate pyautogui command strings that can be:
1. Executed locally via PyautoguiActionHandler
2. Sent to remote sandbox via runtime API (action_string_to_step)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
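To illustrate the idea of mapping a model's coordinate space onto a screen and emitting a pyautogui command string, here is a minimal sketch. The helpers `scale_point` and `click_command` are hypothetical names made up for this example; they are not the `CoordinateScaler` API from this PR, and the 1920x1080 target screen is an assumption.

```python
def scale_point(x, y, source_size, target_size):
    """Map (x, y) from the model's coordinate space onto the target screen."""
    src_w, src_h = source_size
    dst_w, dst_h = target_size
    return round(x * dst_w / src_w), round(y * dst_h / src_h)


def click_command(x, y):
    """Build a pyautogui command string; nothing is executed here."""
    return f"pyautogui.click(x={x}, y={y})"


# A model emitting 0-1000 coordinates, targeting an assumed 1920x1080 screen.
px, py = scale_point(500, 500, source_size=(1000, 1000), target_size=(1920, 1080))
command = click_command(px, py)
print(command)  # pyautogui.click(x=960, y=540)

# The same screen point expressed in an XGA (1024x768) source space.
xga_point = scale_point(512, 384, source_size=(1024, 768), target_size=(1920, 1080))
print(xga_point)  # (960, 540)
```

Because the result is a plain string, it can be run locally or shipped to a remote executor without the sender needing pyautogui installed.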
- Remove model-specific converters (claude, gemini, qwen3, models)
- Keep BaseActionConverter for third-party inheritance
- Keep OagiActionConverter as reference implementation
- Add comprehensive test suite for OagiActionConverter
- Update exports in __init__.py files

Third parties can now create custom converters by inheriting from BaseActionConverter and implementing the required abstract methods.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
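The inheritance pattern described above might look roughly like the sketch below. It does not import the real `BaseActionConverter`, whose abstract methods are not listed in this conversation; `SketchBaseConverter`, `convert_one`, and `DictActionConverter` are hypothetical stand-ins, and the dict-shaped action format is an assumption. The `list[str]` return type follows what this PR eventually settles on.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

T = TypeVar("T")


class SketchBaseConverter(ABC, Generic[T]):
    """Hypothetical stand-in for BaseActionConverter; not the real class."""

    def __call__(self, actions: list[T]) -> list[str]:
        # Flatten the per-action command lists into one list of strings.
        commands: list[str] = []
        for action in actions:
            commands.extend(self.convert_one(action))
        return commands

    @abstractmethod
    def convert_one(self, action: T) -> list[str]:
        """Turn one model-specific action into pyautogui command strings."""


class DictActionConverter(SketchBaseConverter[dict]):
    """Example subclass for actions shaped like {"type": ..., ...}."""

    def convert_one(self, action: dict) -> list[str]:
        if action["type"] == "click":
            return [f"pyautogui.click(x={action['x']}, y={action['y']})"]
        if action["type"] == "type":
            return [f"pyautogui.write({action['text']!r})"]
        raise ValueError(f"unsupported action type: {action['type']}")


converter = DictActionConverter()
result = converter([
    {"type": "click", "x": 960, "y": 540},
    {"type": "type", "text": "hi"},
])
print(result)
```

Keeping the model-specific parsing in one overridable method is the design choice this sketch is meant to show; the real base class may split responsibilities differently.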
Add optional strict_coordinate_validation config option (default: False).

When enabled, coordinates outside valid range [0, source_width/height] will raise ValueError instead of being clamped. This helps surface model output issues during training/debugging.

Default behavior (clamp) remains unchanged for backwards compatibility.

Usage:
    config = ConverterConfig(strict_coordinate_validation=True)
    converter = OagiActionConverter(config=config)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
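The difference between the two modes can be shown with a small self-contained sketch. `check_coordinate` is a hypothetical helper written for this example, not the converter's actual validation code; only the clamp-versus-raise behavior is taken from the commit message.

```python
def check_coordinate(value, upper, strict=False):
    """Return value if it lies in [0, upper]; otherwise clamp or raise."""
    if 0 <= value <= upper:
        return value
    if strict:
        # Strict mode surfaces bad model output instead of hiding it.
        raise ValueError(f"coordinate {value} outside [0, {upper}]")
    # Default mode: clamp into the valid range.
    return max(0, min(value, upper))


print(check_coordinate(1040, 1000))  # 1000 (clamped)
print(check_coordinate(-5, 1000))    # 0 (clamped)

try:
    check_coordinate(1040, 1000, strict=True)
except ValueError as exc:
    print(exc)  # coordinate 1040 outside [0, 1000]
```

Clamping keeps a run going when a model slightly overshoots, while strict mode is better suited to debugging, where silently corrected coordinates would mask the problem.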
Reduce KEY_MAP to minimal mappings matching original PyautoguiActionHandler.hotkey_variations_mapping:
- caps_lock, caps -> capslock
- page_up -> pageup
- page_down -> pagedown

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update KEY_MAP to normalize page keys to short forms (pgup/pgdn) matching original PyautoguiActionHandler.hotkey_variations_mapping:
- page_up, pageup -> pgup
- page_down, pagedown -> pgdn

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
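A minimal sketch of how such a mapping could be applied follows. The table combines the caps and page-key mappings listed in the two commits above; `sketch_normalize_key` and `sketch_parse_hotkey` are hypothetical stand-ins for the PR's `normalize_key()` and `parse_hotkey()`, and the lowercasing and `+` separator are assumptions about their behavior.

```python
# Mappings as listed in the commit messages; everything else passes through.
KEY_MAP = {
    "caps_lock": "capslock",
    "caps": "capslock",
    "page_up": "pgup",
    "pageup": "pgup",
    "page_down": "pgdn",
    "pagedown": "pgdn",
}


def sketch_normalize_key(key):
    """Lowercase a key name and map known variants to one spelling."""
    key = key.strip().lower()
    return KEY_MAP.get(key, key)


def sketch_parse_hotkey(combo):
    """Split a 'ctrl+page_up' style string into normalized key names."""
    return [sketch_normalize_key(part) for part in combo.split("+")]


print(sketch_normalize_key("Page_Up"))         # pgup
print(sketch_parse_hotkey("ctrl+page_down"))   # ['ctrl', 'pgdn']
print(sketch_parse_hotkey("Caps"))             # ['capslock']
```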
Match the original PyautoguiActionHandler behavior exactly. The original used int() (truncation) while the new CoordinateScaler used round() (rounding to nearest). This could cause 1-pixel differences in some edge cases.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Using round() instead of int() provides more accurate coordinate transformation by rounding to the nearest pixel rather than truncating. This is a minor improvement over the original PyautoguiActionHandler behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
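The 1-pixel difference between the two approaches is easy to reproduce. The sketch below uses two hypothetical helpers and an assumed 0-1000 source space mapped onto a 1920-pixel-wide screen; it is not the `CoordinateScaler` code itself.

```python
def scale_truncate(value, source, target):
    """Scale and drop the fractional part, as int() does."""
    return int(value * target / source)


def scale_round(value, source, target):
    """Scale and round to the nearest pixel."""
    return round(value * target / source)


# 667 * 1920 / 1000 = 1280.64, so the two strategies disagree by one pixel.
print(scale_truncate(667, 1000, 1920))  # 1280
print(scale_round(667, 1000, 1920))     # 1281

# When the scaled value is already whole, both agree.
print(scale_truncate(500, 1000, 1920), scale_round(500, 1000, 1920))  # 960 960
```

Note that Python's built-in `round()` rounds exact halves to the nearest even integer (`round(2.5)` is `2`), so "round to nearest" differs slightly from the half-up rule some readers may expect.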
Remove redundant is_last tracking from converter return type. Analysis showed sandbox-platform ignores this value and recalculates based on index position.

Changes:
- BaseActionConverter.__call__() now returns list[str] instead of list[tuple[str, bool]]
- OagiActionConverter._convert_action() simplified to just repeat commands without is_last tracking
- Updated all tests to match new return type
- Updated docstrings and examples

This simplifies the API while maintaining full compatibility with existing consumers that already ignored the is_last value.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
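The simplification can be pictured with the sketch below: a repeated action becomes a flat list of command strings, and any consumer that still wants an is-last flag can derive it from the index. `expand_action` is a hypothetical helper, not `_convert_action()` itself, and the `count` parameter is an assumption about how repeats are expressed.

```python
def expand_action(command: str, count: int = 1) -> list[str]:
    """Repeat one command string count times, with no is_last bookkeeping."""
    return [command] * count


commands = expand_action("pyautogui.click(x=960, y=540)", count=2)
print(commands)

# A consumer can recompute the flag from position, which is why carrying
# it in the return value was redundant.
steps = [(cmd, index == len(commands) - 1) for index, cmd in enumerate(commands)]
print(steps)
```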
Summary
Add action converter framework with base class for third-party extension and OAGI reference implementation.
Key Changes
- BaseActionConverter[T] - Abstract base class with shared functionality:
  - scale_coordinate()
  - normalize_key(), parse_hotkey(), validate_keys()
  - action_string_to_step() for runtime API format conversion
- OagiActionConverter - Reference implementation for OAGI actions (0-1000 coordinate space)
- ConverterConfig - Unified configuration dataclass
- Shared utilities in oagi.handler.utils - Reusable by both handler and converters

Usage

Creating Custom Converters
Third parties can create custom converters by inheriting from BaseActionConverter:

Test plan
from oagi.converters import OagiActionConverter, BaseActionConverter, ConverterConfig

🤖 Generated with Claude Code