feat: redesign rai_perception API with tiered structure and improve 3D gripping point detection#750

Merged
Juliaj merged 42 commits into main from jj/feat/3dpipe_and_usability
Feb 3, 2026

Conversation

@Juliaj
Collaborator

@Juliaj Juliaj commented Jan 8, 2026

Purpose

Proposed Changes

  • Code refactor based on Rethinking Usability and API Usability Design Considerations.

  • This PR introduces breaking changes; see the section below for details. To reduce the impact on applications that use the old service names, a legacy service names flag (enable_legacy_service_names) has been introduced. It defaults to true, so the default behavior is backward compatible. New applications that use only the new service names (/detection, /segmentation) can disable the legacy names by setting the flag to false:

    Via launch file argument:

    ros2 launch examples/manipulation-demo.launch.py game_launcher:=... enable_legacy_service_names:=false

    Via environment variable:

    ENABLE_LEGACY_SERVICE_NAMES=false ros2 run rai_perception run_perception_services

    Via ROS2 parameter (in code):

    connector.node.declare_parameter("enable_legacy_service_names", False)
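
To illustrate how such a flag could be resolved from the environment, here is a minimal sketch. It assumes that enabling the flag advertises the legacy names alongside the new ones; the legacy names used here (`/legacy_detection`, `/legacy_segmentation`) and both helper functions are hypothetical placeholders, not the names or code used in rai_perception.

```python
import os
from typing import List, Mapping, Optional

NEW_NAMES = {"detection": "/detection", "segmentation": "/segmentation"}
# Hypothetical placeholders: the actual legacy names are not listed in this description.
LEGACY_NAMES = {"detection": "/legacy_detection", "segmentation": "/legacy_segmentation"}

_FALSE_STRINGS = {"0", "false", "no", "off"}


def legacy_names_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read ENABLE_LEGACY_SERVICE_NAMES; an unset variable keeps the default (True)."""
    env = os.environ if env is None else env
    raw = env.get("ENABLE_LEGACY_SERVICE_NAMES")
    if raw is None:
        return True
    return raw.strip().lower() not in _FALSE_STRINGS


def advertised_names(kind: str, legacy_enabled: bool) -> List[str]:
    """Return the service names to expose for one service kind (assumed behavior)."""
    names = [NEW_NAMES[kind]]
    if legacy_enabled:
        names.append(LEGACY_NAMES[kind])
    return names
```

With the variable unset, both names are returned for each kind; with `ENABLE_LEGACY_SERVICE_NAMES=false`, only the new name remains.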

Migration Guide

Issues

  • Links to relevant issues

Testing

  • Unit tests added and ran.
  • manipulation demo, v1, v2. Prompt: swap any two cubes
  • manipulation-streamlit with both versions. Prompt: "Place each apple on top of a cube", "Build a tower from cubes" and "Arrange objects in a line".
  • ROSBot - XL demo. Prompt: drive to kitchen
  • rai_semap
  • rai_bench. Test: manipulation_o3de.py ran without any failure.

To start manipulation-streamlit with both versions

# v2
AGENT_VERSION=v2 streamlit run examples/manipulation-demo-streamlit.py

# v1
AGENT_VERSION=v1 streamlit run examples/manipulation-demo-streamlit.py

Summary by CodeRabbit

  • New Features

    • Added service-based perception architecture for object detection and segmentation.
    • Introduced gripping point estimation tool with configurable filtering and estimation strategies (centroid, top-plane, biggest-plane).
    • Added debug visualization and intermediate pipeline stage publishing for perception workflows.
    • New perception presets for rapid configuration setup.
  • Documentation

    • Added comprehensive API design considerations guide for framework extensions.
    • Added usability redesign documentation for perception module.
  • Bug Fixes & Improvements

    • Enhanced ROS2 error handling and parameter utilities.
    • Improved test exclusion filtering to skip manual tests.
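
As a rough illustration of the gripping point strategies named in the summary above, the sketch below computes a centroid and a simplified "top-plane" point from a list of 3D points. It is an assumption-laden toy: the helper names and the `tolerance` parameter are hypothetical, and the top-plane step is approximated by a height threshold rather than an actual plane fit, so it should not be read as the pipeline's implementation.

```python
from statistics import fmean
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Mean of all points along each axis."""
    xs, ys, zs = zip(*points)
    return (fmean(xs), fmean(ys), fmean(zs))


def top_plane_point(points: Sequence[Point], tolerance: float = 0.01) -> Point:
    """Centroid of the points lying within `tolerance` of the highest z value.

    Simplification: a height threshold stands in for a real plane fit.
    """
    z_max = max(p[2] for p in points)
    top: List[Point] = [p for p in points if z_max - p[2] <= tolerance]
    return centroid(top)


# Eight corners of a 10 cm cube resting on the z = 0 plane.
cube = [(x, y, z) for x in (0.0, 0.1) for y in (0.0, 0.1) for z in (0.0, 0.1)]
print("centroid:", centroid(cube))
print("top-plane point:", top_plane_point(cube))
```

For an object grasped from above, the top-plane point sits on the upper face, while the centroid sits inside the object, which is one reason to offer more than one strategy.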

@codecov

codecov bot commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 98.37398% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.23%. Comparing base (eacde1d) to head (f6f9e6a).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/rai_core/rai/tools/ros2/manipulation/custom.py | 92.30% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #750      +/-   ##
==========================================
+ Coverage   66.31%   70.23%   +3.91%     
==========================================
  Files          78       82       +4     
  Lines        3423     3534     +111     
==========================================
+ Hits         2270     2482     +212     
+ Misses       1153     1052     -101     

@Juliaj Juliaj marked this pull request as draft January 8, 2026 23:51
@Juliaj Juliaj changed the title feat: redesign API with tiered structure and improve 3D gripping point detection feat: redesign rai_perception API with tiered structure and improve 3D gripping point detection Jan 9, 2026
@Juliaj Juliaj force-pushed the jj/feat/3dpipe_and_usability branch from 7aca530 to 09674aa Compare January 12, 2026 05:28
@Juliaj Juliaj marked this pull request as ready for review January 12, 2026 10:14
@Juliaj Juliaj force-pushed the jj/feat/3dpipe_and_usability branch from 32d8802 to 4af6d1b Compare January 13, 2026 03:53
@jmatejcz
Contributor

Regarding rai_sim and rai_bench, I don't think a lot of changes will be required. Mainly naming changes, switching to new tools or changing paths. For example in src/rai_bench/rai_bench/manipulation_o3de/predefined/configs/o3de_config.yaml

@maciejmajek
Member

I will start with coderabbit to speed things up.

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai bot commented Jan 21, 2026

@maciejmajek I'll conduct a full review of the PR now. This will cover all the changes including the API redesign, breaking changes, and the perception logic merger from rai_semap.

✅ Actions performed

Full review triggered.

@coderabbitai
Contributor

coderabbitai bot commented Jan 21, 2026

Walkthrough

Major architectural refactoring and expansion of the RAI perception module introducing new algorithms, services, and components alongside deprecation of legacy agents. Updates include new ROS2 communication utilities, timeout decorators, comprehensive gripping-point extraction pipeline, model registries, and extensive test coverage. Multiple packages reorganized with service-oriented architecture replacing direct agent instantiation.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| CI & Configuration | `.github/workflows/poetry-test.yml`, `pyproject.toml`, `src/rai_core/pyproject.toml` | Updated pytest markers to exclude both "billable" and "manual" tests; bumped rai_core version from 2.7.0 to 2.8.0; added manual marker to pytest configuration. |
| Documentation | `docs/API_documentation/connectors/ROS_2_Connectors.md`, `docs/api_design_considerations.md`, `docs/extensions/rethinking_usability.md`, `src/rai_extensions/rai_perception/README.md`, `src/rai_extensions/rai_perception/follow-ups.md` | Added ROS2 utilities documentation (ROS2ServiceError, ROS2ParameterError, get_param_value); introduced comprehensive API design guidelines and usability considerations; documented debug mode for gripping points tool; detailed post-refactor migration path for agents to services. |
| ROS2 Communication Utilities | `src/rai_core/rai/communication/ros2/exceptions.py`, `src/rai_core/rai/communication/ros2/parameters.py`, `src/rai_core/rai/communication/ros2/__init__.py` | New exception classes ROS2ServiceError and ROS2ParameterError with contextual error information; added get_param_value helper for safe parameter retrieval with type extraction; exposed new utilities in public API. |
| Timeout Utilities | `src/rai_core/rai/tools/timeout.py`, `src/rai_core/rai/tools/__init__.py` | Introduced RaiTimeoutError exception and timeout/timeout_method decorators using ThreadPoolExecutor for per-call timeouts; re-exported as public API; high-density logic with resource management. |
| Examples | `examples/manipulation-demo.py`, `examples/manipulation-demo-v1.py` | Created self-contained create_agent() function for manipulation demo with ROS2 initialization, tool configuration, and agent creation; added simpler interactive loop example; internal implementation replaces external dependency. |
| Perception Algorithms (New) | `src/rai_extensions/rai_perception/rai_perception/algorithms/__init__.py`, `rai_perception/algorithms/boxer.py`, `rai_perception/algorithms/segmenter.py`, `rai_perception/algorithms/point_cloud.py` | New detection (GDBoxer) and segmentation (GDSegmenter) algorithm implementations with device management and model initialization; depth-to-point-cloud conversion utility; organized as low-level algorithm package. |
| Perception Components (New) | `src/rai_extensions/rai_perception/rai_perception/components/*` | Comprehensive perception pipeline: PointCloudFromSegmentation (extraction & transformation), PointCloudFilter (configurable outlier removal), GrippingPointEstimator (multiple strategies), configuration classes; exception hierarchy (PerceptionError, PerceptionAlgorithmError, PerceptionValidationError); utilities (perception_utils, service_utils, topic_utils, visualization_utils); presets for grasp strategies. High-density logic (~1400 LOC across multiple files). |
| Perception Services (New) | `src/rai_extensions/rai_perception/rai_perception/services/*`, `scripts/run_perception_services.py` | BaseVisionService base class with model registry support; DetectionService and SegmentationService with dynamic model loading and ROS2 service integration; weight management utilities (download, load with corruption recovery); service orchestration script. |
| Perception Tools (Refactored) | `src/rai_extensions/rai_perception/rai_perception/tools/gdino_tools.py`, `gripping_points_tools.py`, `segmentation_tools.py`, `__init__.py` | GetDetectionTool and GetDistanceToObjectsTool now use dynamic service names and error handling (ROS2ServiceError); new GetObjectGrippingPointsTool with full pipeline orchestration, debug visualization, and service introspection (~700 LOC); expanded tool exports. |
| Perception Agents (Deprecated) | `src/rai_extensions/rai_perception/rai_perception/agents/*` | GroundingDinoAgent and GroundedSamAgent refactored to delegate to services (BaseAgent inheritance, deprecation warnings); BaseVisionAgent simplified with utility imports; removed in-process model loading; service wrapper helper added. |
| Perception Module Organization | `src/rai_extensions/rai_perception/rai_perception/__init__.py`, `vision_markup/__init__.py`, `models/__init__.py` | Consolidated public API exports for tools, components, algorithms; introduced model registries (detection, segmentation) for dynamic model selection; deprecated vision_markup module with delegation to algorithms; updated service name constants. |
| Configuration & Scripts | `src/rai_extensions/rai_perception/configs/*`, `examples/talker.py`, `src/rai_semap/ros2/config/detection_publisher.yaml` | New detection_publisher and perception_utils YAML configs; example service endpoints updated to use new /detection and /segmentation service names; launch script path updated to use rai_perception component. |
| Tests - New Suites | `tests/communication/ros2/test_exceptions.py`, `test_parameters.py`, `tests/rai_perception/algorithms/*`, `components/*`, `services/*`, `tools/*`, `vision_markup/*` | Comprehensive test coverage for ROS2 utilities, new algorithms, perception components, services, and deprecated wrappers; extensive use of mocks, fixtures, and parameterization; high test density. |
| Tests - Configuration & Helpers | `tests/conftest.py`, `tests/rai_perception/conftest.py`, `test_helpers.py`, `test_mocks.py` | Enhanced pytest configuration with strategy and grasp options; comprehensive ROS2 mocking infrastructure (parameter tracking, service clients); helper functions for weights, fixtures, and patching; mock implementations (MockGDBoxer, MockGDSegmenter, EmptyBoxer, EmptySegmenter). |
| Tests - Removed | `tests/rai_perception/test_grounded_sam.py`, `test_grounding_dino.py`, `test_run_perception_agents.py`, `tests/rai_semap/test_perception_utils.py` | Removed deprecated agent test files as agents are transitioned to service-based architecture; moved tests to agent-specific modules under agents/. |
| Version Bump | `src/rai_extensions/rai_perception/pyproject.toml` | Bumped rai_perception from 0.1.5 to 0.2.0; updated authors list. |
| Public API Exports | `src/rai_core/rai/__init__.py` | Exported timeout decorator as public API from tools module. |
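
The Timeout Utilities entry above describes per-call timeouts built on ThreadPoolExecutor. A minimal sketch of that general technique follows; it is not the code in `rai/tools/timeout.py`, and `SketchTimeoutError` is a hypothetical stand-in for the exception class the PR introduces.

```python
import functools
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError


class SketchTimeoutError(Exception):
    """Hypothetical stand-in for the timeout exception described in the PR."""


def timeout(seconds: float):
    """Decorator: run the call in a worker thread and give up after `seconds`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            executor = ThreadPoolExecutor(max_workers=1)
            future = executor.submit(func, *args, **kwargs)
            try:
                return future.result(timeout=seconds)
            except FutureTimeoutError:
                raise SketchTimeoutError(
                    f"{func.__name__} exceeded {seconds} s"
                ) from None
            finally:
                # Do not block on the worker; a timed-out call keeps running
                # in the background because threads cannot be killed.
                executor.shutdown(wait=False)

        return wrapper

    return decorator


@timeout(0.05)
def slow_call() -> int:
    time.sleep(0.3)
    return 1


@timeout(1.0)
def fast_call(x: int) -> int:
    return x + 1
```

One design consequence worth noting: the caller regains control after the deadline, but the underlying work is not cancelled, so this pattern suits calls that are safe to abandon.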

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • refactor: base api #518: Removed constructor arguments from BaseAgent class, directly enabling the refactored agent initialization pattern used in this PR's GroundingDinoAgent and GroundedSamAgent changes.
  • feat: various enhancements #508: Introduced ROS2 tool updates and manipulation demo examples that align with changes to examples/manipulation-demo.py and ROS2 integration patterns in this PR.
  • refactor: internal ros communication #335: Modified ROS2 communication patterns with improved service/client handling and future-based waiting that inform the service utilities and tool refactoring in this PR.

Suggested reviewers

  • jmatejcz

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 70.19% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The pull request title accurately describes the main changes: redesigning the rai_perception API with a tiered structure and improving 3D gripping point detection through refactoring and merging of perception logic. |
| Description check | ✅ Passed | The PR description includes all required sections (Purpose, Proposed Changes, Issues, Testing) and provides substantial detail about the redesign, migration guide, and testing performed. |

@Juliaj Juliaj requested a review from rachwalk January 22, 2026 01:48
@Juliaj
Collaborator Author

Juliaj commented Jan 22, 2026

> Regarding rai_sim and rai_bench, I don't think a lot of changes will be required. Mainly naming changes, switching to new tools or changing paths. For example in src/rai_bench/rai_bench/manipulation_o3de/predefined/configs/o3de_config.yaml

@jmatejcz thanks for helping look into this. The current code changes are backward compatible by default. Once this is merged, I will look into migrating existing code to the new service names.

Contributor

@jmatejcz jmatejcz left a comment

Looked into the code; just left some minor questions.

How can I test this new functionality as a whole? Is manipulation demo v2 enough? And what should I look for?

@Juliaj
Collaborator Author

Juliaj commented Jan 23, 2026

> How can I test this new functionality as a whole? Is manipulation demo v2 enough? And what should I look for?

@jmatejcz, thanks for looking into this. The testing I have done is listed in the PR description. A few additional things to consider:

  • Rerun the manipulation demo v2 and the rai_bench sanity check test to make sure I didn't miss anything there.
  • Some of the work in this PR targets usability, so you could evaluate the overall developer experience against a potential feature. For example, does the new structure make it easier to find the folder where a change belongs? I did some initial evaluation with this new use case.
  • If anyone has the bandwidth, a backward-compatibility check on another internal demo would be nice as well.

@jmatejcz
Contributor

@Juliaj I tested the V1 and V2 manipulation demos with the same prompts. Both work and the results are the same.
I also tested the ROSBot demo; it works as well.

Contributor

@jmatejcz jmatejcz left a comment

I did a deeper dive into the changes. Thank you for the design documents you wrote; they are a great source of knowledge and also give an idea of your approach to this refactor.

In my opinion the layered approach is a very good idea. The tools/ dir is very easy to use, as it is supposed to be.
The components/ are also well designed and self-explanatory. I like the idea of presets, especially with the comments on which one is best for what.

These changes will surely reduce friction for new users, and I love the abstraction level they introduce. I tried using some of this code myself and didn't even have to look at algorithms/ or the implementations of components to use the module.

Now I can see the issue that the rest of the rai modules are not organized this way, which might confuse users, but I guess we can refactor them in the future too ;p

        """Abstract method - must be implemented by subclasses."""
        raise NotImplementedError("Subclasses must implement _run method")

    def _get_detection_service_name(self) -> str:
Contributor

I started my dive into this refactor with the tools/ dir, as it is the highest-tier layer in this API.

If I understand correctly from the documents you wrote and the code, switching detection models is now possible by setting a ROS param.

In that case I would rename the base tool and its variables, as sticking with "GroundingDino..." names suggests to the user that this works only with the Grounding DINO model. I think renaming to DetectionBaseTool etc. would increase role expressiveness.

Contributor

The same applies to other places in the code where "dino..." naming is used, for example "dino"node"

Collaborator Author

@jmatejcz, thanks a bunch for the feedback. This is a good catch.

In addition to this module, a few other modules (components/gripping_points.py, tools/segmentation_tools.py) are also tightly coupled to GroundingDINO specifics, e.g. the hardcoded RAIGroundingDino service type and model-specific parameters (box_threshold, text_threshold). Overall, this refactoring needs careful evaluation.

Given the current PR scope, I've added a note about this in MIGRATION.md under "Generic Detection Tools Abstraction" so that we can address it in a follow-up PR after merge. I hope you are okay with deferring this.

Contributor

@jmatejcz jmatejcz Feb 1, 2026

Okay, so the main problem is that the Grounding DINO interface differs from those of other models?
And right now we can't actually use any model besides GD, but once rai interfaces are added for other models, this refactor provides the foundation for the next one?

Contributor

Because right now parts of the code enable model switching, but it does not actually work yet, yes?

Collaborator Author

@Juliaj Juliaj Feb 2, 2026

Correct. This PR lays the foundation for model switching. For example, the DetectionService has the infrastructure for it. In base_vision_service.py, the following method reads the parameter and initializes the model:

def _initialize_model_from_registry(
    self, get_model_func, default_model_name: str, model_type_name: str
):
    """Initialize model from registry based on ROS2 parameter.

    Args:
        get_model_func: Function that takes model name and returns (AlgorithmClass, config_path)
        default_model_name: Default model name if ROS2 parameter not set
        model_type_name: Type name for logging (e.g., "detection", "segmentation")

    Returns:
        Tuple of (model_instance, model_name)
    """
    model_name = self._get_param_value("model_name", default_model_name)

But the service interface is still GroundingDINO-specific, so it cannot actually switch to other models yet. In addition, tools remain tightly coupled to GroundingDINO's interface.
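
To make the registry idea concrete, here is a small self-contained sketch of the lookup pattern the docstring describes. Everything in it is an assumption for illustration: `DummyDetector`, `DETECTION_REGISTRY`, `get_detection_model`, and the plain `params` dict (standing in for ROS2 parameters) are hypothetical and do not mirror the actual rai_perception code.

```python
from typing import Callable, Dict, Mapping, Tuple


class DummyDetector:
    """Hypothetical algorithm class; a real one would load model weights."""

    def __init__(self, config_path: str) -> None:
        self.config_path = config_path


# Hypothetical registry: model name -> (AlgorithmClass, config_path).
DETECTION_REGISTRY: Dict[str, Tuple[type, str]] = {
    "dummy_detector": (DummyDetector, "configs/dummy_detector.yaml"),
}


def get_detection_model(name: str) -> Tuple[type, str]:
    """Look up a model by name and fail with the list of known names."""
    try:
        return DETECTION_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(DETECTION_REGISTRY))
        raise ValueError(f"Unknown detection model '{name}'. Known: {known}") from None


def initialize_model_from_registry(
    get_model_func: Callable[[str], Tuple[type, str]],
    params: Mapping[str, str],
    default_model_name: str,
    model_type_name: str,
):
    """Pick the model named in `params`, falling back to the default name."""
    model_name = params.get("model_name", default_model_name)
    algorithm_cls, config_path = get_model_func(model_name)
    print(f"Initializing {model_type_name} model '{model_name}'")
    return algorithm_cls(config_path), model_name
```

The lookup itself is model-agnostic; as noted above, what keeps switching from working end to end is the GroundingDINO-specific service interface around it.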

Contributor

Okay, thank you for the clarification. Do you think we should leave a note that only GD is currently supported? In the method above, for example.

Contributor

@Juliaj thank you for this PR and the clarification. I approve it; you could add the note or something and we are done ;p

Collaborator Author

Thanks @jmatejcz! I've updated module comments across relevant files to clarify that model switching is not yet fully implemented and to document the current coupling to GroundingDINO/Grounded SAM interfaces, along with future directions.

I evaluated the effort required for true model-agnostic refactoring (generic service interfaces). It's medium to high complexity due to parameter abstraction, backward compatibility, and refactoring across services, tools, and components.

As an alternative, we can use model-specific services and tools per model, which may be simpler and faster to implement but may result in some code duplication. I've updated MIGRATION.md with the trade-offs between these approaches.
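
The two options can be contrasted with a short sketch. The dataclasses below are hypothetical illustrations, not the RAI service definitions: one keeps the GroundingDINO-style thresholds as first-class fields, the other moves model-specific settings into a generic dictionary, which is roughly what "parameter abstraction" would involve.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class GroundingDinoStyleRequest:
    """Model-specific request: thresholds are explicit, typed fields."""

    classes: List[str]
    box_threshold: float
    text_threshold: float


@dataclass
class GenericDetectionRequest:
    """Model-agnostic request: model-specific settings travel in a dict."""

    classes: List[str]
    model_params: Dict[str, Any] = field(default_factory=dict)


def to_generic(request: GroundingDinoStyleRequest) -> GenericDetectionRequest:
    """Adapter of the kind a backward-compatible migration might need."""
    return GenericDetectionRequest(
        classes=list(request.classes),
        model_params={
            "box_threshold": request.box_threshold,
            "text_threshold": request.text_threshold,
        },
    )
```

The generic form accepts any model's settings but gives up field-level type checking, while the model-specific form stays explicit at the cost of one request type per model.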

)


def _publish_gripping_point_debug_data(
Contributor

It shouldn't be a "private" method, I believe.

Collaborator Author

Thanks for the suggestion.

I'm neutral on this. Currently, developers can access this via GetObjectGrippingPointsTool after setting debug=True in the tool's input, which publishes the debug data automatically.

Making it public would allow standalone reuse of the visualization logic, but since it's a debug-only utility with performance overhead, keeping it private helps avoid committing to a debug-only public API. What do you think?

Contributor

@jmatejcz jmatejcz Feb 1, 2026

If the intention is to use it only in GetObjectGrippingPointsTool, then it should be moved to get_gripping_points_tool.py. I think it breaks encapsulation, and it is a bit odd to have a private method in a file where it is not even used within the same module.

But it can be made "public" if the intention is for users to call it. What do you think?

Collaborator Author

Moved the method, commit.

@Juliaj Juliaj merged commit 257435e into main Feb 3, 2026
9 checks passed
@Juliaj Juliaj deleted the jj/feat/3dpipe_and_usability branch February 3, 2026 00:46