feat(tracking): Implement SimpleElementTracker with optimal matching#27
Draft
feat(tracking): Implement SimpleElementTracker with optimal matching#27
Conversation
Replaces the placeholder _match_elements method in SimpleElementTracker with an implementation based on scipy.optimize.linear_sum_assignment. - Matches elements based on type and squared Euclidean distance between centers. - Applies configured distance threshold and type constraints to the cost matrix. - Basic unit tests in test_tracking.py pass with this implementation. Completes the core logic for SimpleElementTracker as specified in Issue #8.
Enhances the `_match_elements` method in `SimpleElementTracker` to improve matching robustness. - Incorporates a check for relative width and height differences between elements, using a configurable `size_rel_threshold`. This adds size similarity as a constraint alongside type and proximity. - Cost matrix for `linear_sum_assignment` now penalizes matches where elements differ significantly in size. - Removes defensive try/except blocks for `scipy`/`types` imports, assuming dependencies are met. Basic unit tests in `test_tracking.py` continue to pass with this refined matching logic.
There was a problem hiding this comment.
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (2)
omnimcp/tracking.py:122
- The current structure first continues when the cost exceeds the threshold and then applies the distance constraint again, which can bypass the subsequent size constraint. Consider consolidating these checks so that the size constraint is consistently applied before deciding to mark a match as invalid.
elif cost_matrix[i, j] > self.match_threshold_sq:
omnimcp/tracking.py:114
- [nitpick] Consider using float('inf') instead of the magic number 1e8 for infinity_cost to improve clarity and reduce potential edge case issues.
infinity_cost = 1e8 # Use a large number for invalid assignments
- Updates mocks in test_core.py to return the correct LLMAnalysisAndDecision structure expected by the refactored plan_action_for_ui. - Corrects assertion strings checking for prompt headings to exactly match the PROMPT_TEMPLATE content. Resolves test failures introduced during planner refactoring. All tests now pass.
There was a problem hiding this comment.
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
omnimcp/tracking.py:118
- The condition 'elif cost_matrix[i, j] > self.match_threshold_sq:' is redundant because a similar check is performed immediately after, causing the size constraint branch to be unreachable for those indices. Consider removing this duplicate check to ensure that the size constraint is evaluated correctly for all valid assignments.
elif cost_matrix[i, j] > self.match_threshold_sq:
Implements the SimpleElementTracker (Issue #8) and integrates it throughout the perception-planning-execution loop to provide temporal context. Refactors the planner (core.py) to utilize tracking context and output a structured ActionDecision (aligning with Issue #26), improving maintainability by using PydanticPrompt for schema generation instead of hardcoding in the prompt. Key Changes: Core Functionality: - Add SimpleElementTracker (tracking.py) with scipy-based matching (type, proximity, size). - Integrate tracker into VisualState.update (visual_state.py). - Refactor plan_action_for_ui (core.py) to accept tracking_info, update prompt template to use tracking and generated schemas, call LLM expecting LLMAnalysisAndDecision, and return ActionDecision directly. - Refactor AgentExecutor (agent_executor.py) to pass tracking_info to planner, handle ActionDecision return type, add wait/finish action handlers, and update existing handlers. Data Structures & Schemas (types.py): - Add ElementTrack, ScreenAnalysis, ActionDecision, LLMAnalysisAndDecision, LoggedStep Pydantic models. - Add PydanticPrompt decorator and docstrings to generate LLM schema documentation. Observability (agent_executor.py, utils.py, __init__.py): - Implement structured metrics collection saved to run_metrics.json. - Implement structured step logging (LoggedStep format) saved to run_log.jsonl. - Refactor setup_run_logging function into utils.py. Testing (tests/): - Add unit tests for SimpleElementTracker (test_tracking.py). - Update tests for core, agent_executor, visual_state to reflect refactoring and fix issues. All tests pass. Environment & Dependencies: - Update Python requirement to >=3.11 in pyproject.toml. - Add pydantic-prompt, scipy, numpy dependencies. - Update uv.lock. - Update CI workflow for Python 3.11/3.12.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Integrates the
SimpleElementTrackerto provide temporal context to the planning LLM, aiming to improve state awareness and reliability, as outlined in Issue #8. This PR includes the tracker implementation, integration into the perception and planning pipeline, associated data models, updated tests, and metrics/logging infrastructure.Resolves: #8 (Implements core tracking, integration, and planner update)
Relates to: #26 (Defines
ActionDecisionstructure used internally by the planner)Key Changes:
omnimcp/types.py):ElementTrack,ScreenAnalysis,ActionDecision,LLMAnalysisAndDecision,LoggedStep.omnimcp/tracking.py):SimpleElementTrackerclass usingscipy.optimize.linear_sum_assignmentfor optimal matching based on elementtype, center proximity, and relative size similarity.updatemethod for track lifecycle management.omnimcp/visual_state.py):SimpleElementTrackerwithinVisualState.VisualState.updateto call the tracker and expose thetracked_elements_view: List[ElementTrack].omnimcp/core.py):plan_action_for_uito accepttracking_info: List[ElementTrack].PROMPT_TEMPLATEto include tracked element context and request JSON output withScreenAnalysisandActionDecision.call_llm_apito expect theLLMAnalysisAndDecisionmodel.ActionDecisionback toLLMActionPlanfor compatibility with currentAgentExecutorhandlers.omnimcp/agent_executor.py):PerceptionInterfaceandPlannerCallabletype hints.tracked_elements_viewfrom perception and passes it to the planner.run_metrics.json).LoggedStep(run_log.jsonl).tests/):SimpleElementTracker(test_tracking.py).core,visual_state, andagent_executorto reflect changes and ensure they pass.numpyandscipy.Current Status:
VisualState.core.py) updated to use tracking data & new LLM I/O (with temp conversion).AgentExecutorupdated for data flow & logging.Next Steps:
mainbranch (or a dedicated testing branch) usingcli.pywith real tasks (e.g., calculator).run.log,run_log.jsonl) and metrics (run_metrics.json) from testing.AgentExecutorhandlers to directly useActionDecision(removing the temporary conversion incore.py).Testing Done:
tracking,core,visual_state, andagent_executorpass.