feat: RAI Semantic Map Memory (rai_semap)#727
Conversation
b79d56e to
4618982
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #727 +/- ##
=======================================
Coverage 65.32% 65.32%
=======================================
Files 78 78
Lines 3386 3386
=======================================
Hits 2212 2212
Misses 1174 1174 ☔ View full report in Codecov by Sentry. |
maciejmajek
left a comment
There was a problem hiding this comment.
After pyproject.toml changes I was able to run the semap feature. Amazing work!
The design.md mentions various algorithms for spatial/temporal deduplication. Before I reflect on that, I need to read some literature on this topic.
Since this PR is huge, I will continue the review tomorrow.
Co-authored-by: Maciej Majek <46171033+maciejmajek@users.noreply.github.com>
|
|
||
| @abstractmethod | ||
| def spatial_query( | ||
| self, center: Point, radius: float, filters: Optional[Dict[str, Any]] = None |
There was a problem hiding this comment.
Generally, I feel like interwining non ROS 2 code with ROS 2 types creates confusion and limits future integrations. Consider using src/rai_core/rai/types. Such separation validates the ros2 directory you've created. Let me know what you think
There was a problem hiding this comment.
Such separation validates the ros2 directory you've created.
Indeed, time to walk the talk.
There was a problem hiding this comment.
Nice adapter pattern: rai/types is an adapter between ROS 2 and the Python system.
There was a problem hiding this comment.
It turned out to be a bit more work than I anticipated! I ended up using both the Pose from rai.types and the ROS2 message type. See the struggle here
The issue is that ROS2 transform functions (tf2_geometry_msgs) require ROS2 message types, but we also need rai.types.Pose for the Pydantic models. Using PoseType = Any with arbitrary_types_allowed=True lets Pydantic accept both, which works but isn't ideal. If you have suggestions for a cleaner approach, I'm all ears!
| self.wait_tool = WaitForSecondsTool() | ||
| self._nav_action_ready = False | ||
|
|
||
| def wait_for_nav_action_server(self, timeout_sec: float = 60.0) -> bool: |
There was a problem hiding this comment.
from rai.communication.ros2 import wait_for_ros2_actions - I think we need to add support for timeout in ros2 waiters. The boilerplate below is what we want to eliminate with ros2connectors & their tooling
There was a problem hiding this comment.
But since there is no timeout in waiters, its ok as it is
src/rai_semap/rai_semap/ros2/node.py
Outdated
| def _initialize_tf(self): | ||
| """Initialize TF buffer and listener. | ||
|
|
||
| Note: ROS2Connector already creates _tf_buffer and _tf_listener internally. | ||
| We access the connector's TF buffer to avoid duplication. | ||
| """ | ||
| # Use the connector's internal TF buffer (already created in __init__) | ||
| self.tf_buffer = self._tf_buffer |
There was a problem hiding this comment.
That was left over code, removed.
src/rai_semap/rai_semap/ros2/node.py
Outdated
| map_frame_id = self._get_string_parameter("map_frame_id") | ||
| map_resolution = self._get_double_parameter("map_resolution") | ||
|
|
||
| backend = SQLiteBackend(database_path) |
There was a problem hiding this comment.
should most likely be configurable
There was a problem hiding this comment.
Consider adding a tool that can retrieve object types previously seen in a location.
Example:
- The model is asked to bring an object that can appear in different shapes such as a bottle, can, or cup.
- Finding the correct item might take three turns when using QuerySemanticMapTool, but only two turns if it can refer to a SeenObjects record.
At some point this may require context compression, but that can be handled by a future general module.
There was a problem hiding this comment.
Would this do ?
class GetSeenObjectsTool(BaseTool):
"""Tool for retrieving object types previously seen in a location."""
name: str = "get_seen_objects"
description: str = (
"Get a list of object types (e.g., 'bottle', 'cup', 'table') that have "
"been previously seen in a location. Useful for discovering what objects "
"exist before querying for specific instances."
)
args_schema: Type[GetSeenObjectsToolInput] = GetSeenObjectsToolInput
memory: SemanticMapMemory
def _run(self, location_id: Optional[str] = None) -> str:
"""Get list of distinct object classes seen in a location."""
pass
I've added skeleton code to support this ask and actual implementation will come in subsequent PR.
src/rai_semap/rai_semap/ros2/node.py
Outdated
| vision_detection_id=vision_detection_id, | ||
| metadata=metadata if metadata else existing_ann.metadata, | ||
| ) | ||
| self.memory.backend.update_annotation(updated) |
There was a problem hiding this comment.
From architectural (LoD) standpoint, I think that the node should not reach that far (memory.backend). Would it make sense to expose update annotation or similar in memory?
src/rai_semap/rai_semap/ros2/node.py
Outdated
|
|
||
| def _process_detection( | ||
| self, | ||
| detection, |
| # It performs object detection and 3D pose computation, which are general perception | ||
| # tasks not specific to semantic mapping. Consider moving to rai_perception when | ||
| # that package has ROS2 node infrastructure in place. | ||
| class DetectionPublisher(ROS2Connector): |
There was a problem hiding this comment.
We usually use ROS2Connector through dependency injection. I'm not opposed to inheritance, just would love to know your reasoning.
There was a problem hiding this comment.
Well, they're in the ros2 folder and conceptually are ROS2 connectors. Time to walk the talk of loose coupling. I went ahead and refactored it. With dependency injection, the call chain is a bit longer than direct calling.
|
@Juliaj I've finished the review. Huge feature! |
Huge thanks again for all the detailed review feedback! I've deferred a few items for follow-up—one is tracked as an issue, and the other involves moving code to rai_perception. I'll work on these in a subsequent PR to keep this one more focused. |

Purpose
Supports #225, Design SLAM RAI features
rai_semap, a semantic map memory system that stores object detections with their 3D locations in the SLAM map frame. The system enables spatial-semantic queries like "where did I see a cup?" and provides persistent memory for robot exploration and task planning.Proposed Changes
Deduplication: One of the main challenges is handling multiple detections of the same physical object. The system uses spatial merging with class-specific thresholds, point cloud-based matching when available, and confidence filtering to merge duplicate detections. However, distinguishing between a moved object and a new object instance remains a challenge that needs further refinement.
Object Detection Accuracy: The accuracy of object detection directly impacts the quality of the semantic map. The system includes confidence thresholds and bounding box size filtering, but detection accuracy depends on GroundingDINO.
Testing