
feat(converters): add base action converter with OAGI implementation#11

Merged
gwynbleidd003 merged 11 commits into agiopen-org:main from aoshen524:feat/action-converters on Feb 3, 2026

aoshen524 (Contributor) commented Jan 30, 2026

Summary

Add action converter framework with base class for third-party extension and OAGI reference implementation.

Key Changes

  • BaseActionConverter[T] - Abstract base class with shared functionality:

    • Coordinate scaling via scale_coordinate()
    • Key normalization via normalize_key(), parse_hotkey(), validate_keys()
    • action_string_to_step() for runtime API format conversion
  • OagiActionConverter - Reference implementation for OAGI actions (0-1000 coordinate space)

  • ConverterConfig - Unified configuration dataclass

  • Shared utilities in oagi.handler.utils - reused by both PyautoguiActionHandler and the converters
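
As a rough illustration of what coordinate scaling involves, the standalone sketch below maps a point from a model's coordinate space (0-1000 for OAGI) to sandbox pixels. The function name and signature are assumptions for illustration, not the actual scale_coordinate() API; it rounds to the nearest pixel and clamps to the screen edge, mirroring the default clamping behavior described later in this PR.

```python
def scale_coordinate(value: float, coord_size: int, sandbox_size: int) -> int:
    """Map a coordinate from model space [0, coord_size] to a pixel in [0, sandbox_size - 1].

    Hypothetical helper for illustration; not the oagi API.
    """
    pixel = round(value * sandbox_size / coord_size)
    # Clamp so the result is always a valid on-screen pixel index.
    return max(0, min(pixel, sandbox_size - 1))


# OAGI coordinates use a 0-1000 space; the sandbox here is 1920x1080.
x = scale_coordinate(500, 1000, 1920)
y = scale_coordinate(500, 1000, 1080)
print(x, y)  # 960 540
```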

Usage

from oagi.converters import OagiActionConverter, ConverterConfig

config = ConverterConfig(sandbox_width=1920, sandbox_height=1080)
converter = OagiActionConverter(config=config)

# Convert OAGI actions (e.g. parsed from a model response) to pyautogui strings
result = converter(actions)  # list[str]

# Convert each command to a runtime API step
for cmd in result:
    step = converter.action_string_to_step(cmd)

Creating Custom Converters

Third parties can create custom converters by inheriting from BaseActionConverter and implementing its abstract members:

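from oagi.converters import BaseActionConverter, ConverterConfig


class MyAction:
    """Placeholder for your model's action type (define its fields here)."""


class MyModelConverter(BaseActionConverter[MyAction]):
    @property
    def coord_width(self) -> int:
        return 1024  # Your model's coordinate width

    @property
    def coord_height(self) -> int:
        return 768  # Your model's coordinate height

    def _convert_single_action(self, action: MyAction) -> list[str]:
        # Convert one action to pyautogui command strings,
        # e.g. a click becomes a "pyautogui.click(...)" string
        ...

    def serialize_actions(self, actions: list[MyAction]) -> list[dict]:
        # Serialize actions for trajectory logging
        ...


# Assumes the base class accepts config= the same way OagiActionConverter does
converter = MyModelConverter(
    config=ConverterConfig(sandbox_width=1920, sandbox_height=1080)
)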
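
The PR does not show BaseActionConverter's internals, so the following self-contained sketch only illustrates the template-method pattern the section describes: the base class owns shared logic (scaling, and flattening per-action results in __call__), while a subclass declares its coordinate space and per-action conversion. SketchBaseConverter, Click, XgaConverter, scale(), and the constructor arguments are all hypothetical; the real class takes a ConverterConfig and exposes more helpers.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


class SketchBaseConverter(ABC, Generic[T]):
    """Hypothetical, simplified stand-in for BaseActionConverter."""

    def __init__(self, sandbox_width: int, sandbox_height: int) -> None:
        self.sandbox_width = sandbox_width
        self.sandbox_height = sandbox_height

    @property
    @abstractmethod
    def coord_width(self) -> int: ...

    @property
    @abstractmethod
    def coord_height(self) -> int: ...

    @abstractmethod
    def _convert_single_action(self, action: T) -> list[str]: ...

    def scale(self, x: float, y: float) -> tuple[int, int]:
        # Shared logic lives in the base class; subclasses only declare their space.
        px = round(x * self.sandbox_width / self.coord_width)
        py = round(y * self.sandbox_height / self.coord_height)
        return (
            max(0, min(px, self.sandbox_width - 1)),
            max(0, min(py, self.sandbox_height - 1)),
        )

    def __call__(self, actions: list[T]) -> list[str]:
        # Flatten per-action command lists into one list of pyautogui strings.
        commands: list[str] = []
        for action in actions:
            commands.extend(self._convert_single_action(action))
        return commands


@dataclass
class Click:  # hypothetical action type
    x: int
    y: int


class XgaConverter(SketchBaseConverter[Click]):
    @property
    def coord_width(self) -> int:
        return 1024

    @property
    def coord_height(self) -> int:
        return 768

    def _convert_single_action(self, action: Click) -> list[str]:
        x, y = self.scale(action.x, action.y)
        return [f"pyautogui.click(x={x}, y={y})"]


converter = XgaConverter(sandbox_width=1920, sandbox_height=1080)
print(converter([Click(512, 384)]))  # ['pyautogui.click(x=960, y=540)']
```

Keeping scaling in the base class means every subclass gets identical rounding and clamping behavior, and only the two coordinate-space properties differ between models.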

Test plan

  • Imports work: from oagi.converters import OagiActionConverter, BaseActionConverter, ConverterConfig
  • 18 unit tests for OagiActionConverter (coordinate actions, drag, hotkey, type, scroll, wait, finish, action_string_to_step, multiple actions)
  • Base class properly exported for inheritance

🤖 Generated with Claude Code

Add action converters for Claude, Qwen3, and Gemini models to enable
remote execution of VLM actions via pyautogui command strings.

Key changes:
- Add oagi.converters module with BaseActionConverter base class
- Add ClaudeActionConverter (XGA 1024x768 coordinate space)
- Add Qwen3ActionConverter (0-999 coordinate space)
- Add GeminiActionConverter (0-1000 coordinate space)
- Add OagiActionConverter (0-1000 coordinate space)
- Extract shared utilities to oagi.handler.utils:
  - CoordinateScaler class for coordinate transformation
  - KEY_MAP and PYAUTOGUI_VALID_KEYS constants
  - normalize_key(), parse_hotkey(), validate_keys() functions
- Refactor PyautoguiActionHandler to use shared utilities

The converters generate pyautogui command strings that can be:
1. Executed locally via PyautoguiActionHandler
2. Sent to a remote sandbox via the runtime API (action_string_to_step)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
aoshen524 marked this pull request as draft January 30, 2026 09:15
- Remove model-specific converters (claude, gemini, qwen3, models)
- Keep BaseActionConverter for third-party inheritance
- Keep OagiActionConverter as reference implementation
- Add comprehensive test suite for OagiActionConverter
- Update exports in __init__.py files

Third parties can now create custom converters by inheriting from
BaseActionConverter and implementing the required abstract methods.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
aoshen524 changed the title from "feat(converters): add multi-model action converters" to "feat(converters): add base action converter with OAGI implementation" Feb 2, 2026
aoshen524 marked this pull request as ready for review February 3, 2026 01:46
aoshen524 and others added 9 commits February 3, 2026 02:42
Add optional strict_coordinate_validation config option (default: False).

When enabled, coordinates outside the valid range ([0, source_width] for x,
[0, source_height] for y) raise ValueError instead of being clamped. This
helps surface model output issues during training and debugging.

Default behavior (clamp) remains unchanged for backwards compatibility.

Usage:
  config = ConverterConfig(strict_coordinate_validation=True)
  converter = OagiActionConverter(config=config)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
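
The clamp-versus-raise choice controlled by strict_coordinate_validation can be sketched as a small standalone function. validate_or_clamp is a hypothetical name; the real check lives inside the converter and may differ, for example in how it reports which axis is out of range.

```python
def validate_or_clamp(value: float, upper: float, strict: bool) -> float:
    """Check a model coordinate against [0, upper].

    Hypothetical helper mirroring the strict_coordinate_validation option.
    """
    if 0 <= value <= upper:
        return value
    if strict:
        raise ValueError(f"coordinate {value} outside valid range [0, {upper}]")
    # Default: clamp silently, preserving the backwards-compatible behavior.
    return max(0.0, min(value, upper))


print(validate_or_clamp(1200, 1000, strict=False))  # 1000
try:
    validate_or_clamp(1200, 1000, strict=True)
except ValueError as exc:
    print(exc)  # coordinate 1200 outside valid range [0, 1000]
```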
Reduce KEY_MAP to minimal mappings matching the original
PyautoguiActionHandler.hotkey_variations_mapping:
- caps_lock, caps -> capslock
- page_up -> pageup
- page_down -> pagedown

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update KEY_MAP to normalize page keys to short forms (pgup/pgdn),
matching the original PyautoguiActionHandler.hotkey_variations_mapping:
- page_up, pageup -> pgup
- page_down, pagedown -> pgdn

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
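
A minimal sketch of key normalization using the mappings from the two KEY_MAP commits above. The dictionary is reconstructed from those commit messages, and the "+" separator in parse_hotkey is an assumption; the real helpers in oagi.handler.utils may accept other formats and also check keys against PYAUTOGUI_VALID_KEYS via validate_keys().

```python
# Hypothetical reconstruction of the final mapping described in the commits.
KEY_MAP = {
    "caps_lock": "capslock",
    "caps": "capslock",
    "page_up": "pgup",
    "pageup": "pgup",
    "page_down": "pgdn",
    "pagedown": "pgdn",
}


def normalize_key(key: str) -> str:
    # Lowercase, trim, then map known aliases; unknown keys pass through unchanged.
    key = key.strip().lower()
    return KEY_MAP.get(key, key)


def parse_hotkey(combo: str) -> list[str]:
    # Assumes "+"-separated combos, e.g. "ctrl+Page_Up" -> ["ctrl", "pgup"]
    return [normalize_key(part) for part in combo.split("+") if part.strip()]


print(parse_hotkey("ctrl+Page_Up"))  # ['ctrl', 'pgup']
```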
Match the original PyautoguiActionHandler behavior exactly.
The original used int() (truncation) while the new CoordinateScaler
used round() (rounding to nearest). This could cause 1-pixel
differences in some edge cases.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Using round() instead of int() provides more accurate coordinate
transformation by rounding to the nearest pixel rather than truncating.
This is a minor improvement over the original PyautoguiActionHandler
behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
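
A worked example of the 1-pixel difference the two commits above refer to: mapping an OAGI y coordinate of 999 (0-1000 space) onto a 1080-pixel-high sandbox.

```python
height = 1080
y_model = 999  # OAGI coordinate in the 0-1000 space

exact = y_model * height / 1000  # 1078.92
print(int(exact))    # 1078: truncation, as in the original handler
print(round(exact))  # 1079: rounding to nearest, the behavior kept in this PR
```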
Remove redundant is_last tracking from converter return type.
Analysis showed sandbox-platform ignores this value and recalculates
based on index position.

Changes:
- BaseActionConverter.__call__() now returns list[str] instead of
  list[tuple[str, bool]]
- OagiActionConverter._convert_action() simplified to just repeat
  commands without is_last tracking
- Updated all tests to match new return type
- Updated docstrings and examples

This simplifies the API while maintaining full compatibility with
existing consumers that already ignored the is_last value.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
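
Because the converter now returns a plain list[str], a consumer that still needs a last-step flag can derive it from the index, which is what the commit says sandbox-platform already does. A minimal sketch; with_is_last is a hypothetical helper, not part of oagi or sandbox-platform.

```python
def with_is_last(commands: list[str]) -> list[tuple[str, bool]]:
    """Derive is_last from each command's position in the list."""
    return [(cmd, i == len(commands) - 1) for i, cmd in enumerate(commands)]


cmds = ["pyautogui.click(x=960, y=540)", "pyautogui.press('enter')"]
for cmd, is_last in with_is_last(cmds):
    print(is_last, cmd)
```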
gwynbleidd003 merged commit 98ccbab into agiopen-org:main Feb 3, 2026
3 checks passed