The D-Cashier project is a smart, voice-controlled automated checkout system designed for retail environments such as convenience stores or unmanned kiosks. It integrates object detection, voice interface, and robotic manipulation to streamline the checkout process.
This system enables users to interact entirely through voice, while products are recognized and processed using a YOLOv11n-OBB-based vision module. Unrecognized items are automatically handled via a “Cancel Position,” and restricted goods are verified through OCR and face recognition.
👉 Click the thumbnail above to watch the demo video on YouTube!
🌀 Multi-frame Object Detection + Rotation Estimation
→ Implemented a custom post-processing algorithm for YOLOv11n-OBB
→ Achieved a ±3° yaw error margin

⏱ Voice Interface with Real-Time GUI + TTS Feedback
→ System response time maintained under 1 second

❌ “Cancel Position” Handling for Undetected Items
→ Reduced false detection issues by over 40%
① YOLOv11n-OBB Object Detection
Detects objects and estimates their 3D position and orientation `[x, y, z, yaw]` using oriented bounding boxes.
Polygon vertices are averaged across frames to improve yaw estimation.
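The averaging step above can be sketched as follows. This is a minimal illustration (the function name and corner convention are assumptions, not the repo's actual code), taking each frame's OBB as a `(4, 2)` corner array in a consistent order and deriving yaw from the longer edge of the averaged polygon:

```python
import numpy as np

def average_obb_yaw(polygons):
    """Estimate yaw from OBB corners accumulated over several frames.

    `polygons` is a list of (4, 2) arrays, one per frame, with corners in a
    consistent order. Corners are averaged element-wise across frames, then
    yaw is taken from the longer edge of the mean box.
    """
    mean_poly = np.mean(np.stack(polygons), axis=0)  # (4, 2)
    # Two adjacent edges of the averaged polygon
    e1 = mean_poly[1] - mean_poly[0]
    e2 = mean_poly[2] - mean_poly[1]
    # Use the longer side as the orientation axis
    edge = e1 if np.linalg.norm(e1) >= np.linalg.norm(e2) else e2
    yaw = np.degrees(np.arctan2(edge[1], edge[0]))
    # Normalize to [-90, 90) since a box is symmetric under 180° rotation
    if yaw >= 90:
        yaw -= 180
    elif yaw < -90:
        yaw += 180
    return mean_poly, yaw
```

Averaging the vertices (rather than the per-frame yaw angles) sidesteps angle-wraparound issues and suppresses per-frame corner jitter.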
② Background Subtraction + Cancel Position Handling
If YOLO fails to detect an object, the system compares the current frame with a pre-stored background image to locate unexpected items.
Detected unknown objects are moved to a Cancel Position to prevent false charges.
③ Adult Verification (19+ Restricted Items)
When a restricted item is detected (e.g., alcohol, cigarettes), the system:
- Uses OCR to extract the birth date from a captured ID card
- Matches the face on the ID with the user’s face in front of the camera
- Grants or denies approval based on age + match score
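The approval decision can be sketched as below. The function name and the 0.6 match threshold are illustrative assumptions; only the 19+ age rule comes from the project description:

```python
from datetime import date

def verify_adult(birth_date, match_score,
                 min_age=19, min_match=0.6, today=None):
    """Approve a restricted-item purchase only if the ID holder is of age
    AND the ID photo matches the live face.

    `birth_date` comes from OCR on the captured ID card; `match_score`
    (0.0-1.0) from the face-recognition comparison. The 0.6 threshold is
    a placeholder, not the project's tuned value.
    """
    today = today or date.today()
    # Subtract one year if this year's birthday hasn't happened yet
    age = today.year - birth_date.year - (
        (today.month, today.day) < (birth_date.month, birth_date.day))
    return age >= min_age and match_score >= min_match
```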
④ Voice-Controlled Interface with GUI Feedback
- Wake-up word detection: "Hello Rokey"
- Natural language input via OpenAI Whisper
- Intent parsing via LangChain + GPT-4o
- Real-time GUI update + TTS output using OpenAI voice
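A rough sketch of the front of this pipeline. The wake-word gate operates on the Whisper transcript; the keyword-based `parse_intent` here is a stand-in for the LangChain + GPT-4o parser, shown only to illustrate the kind of intent schema the GUI and robot nodes would consume (all names are hypothetical):

```python
import re

WAKE_WORD = "hello rokey"

def heard_wake_word(transcript: str) -> bool:
    """True if the wake-up word appears in a Whisper transcript."""
    return WAKE_WORD in transcript.lower()

def parse_intent(utterance: str) -> dict:
    """Keyword-based stand-in for the LangChain + GPT-4o intent parser.

    The real system sends the utterance to GPT-4o; this fallback only
    demonstrates the intent dictionary downstream nodes might receive.
    """
    text = utterance.lower()
    if "stop" in text:
        return {"intent": "stop"}
    m = re.search(r"remove (\w+)", text)
    if m:
        return {"intent": "remove_item", "item": m.group(1)}
    if "checkout" in text or "pay" in text:
        return {"intent": "checkout"}
    return {"intent": "unknown"}
```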
Pose Conversion
- Converts YOLO’s `[x, y, z, yaw]` to robot base coordinates `[x, y, z, rx, ry, rz]`
- Uses `T_gripper2camera.npy` and external calibration parameters for an accurate transform
- Adjusts gripper width based on object size (e.g., `min_side × 10 - 50`)
- Pick action is executed with Doosan’s `movel()` API
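The chain of transforms can be sketched with homogeneous matrices. This is an assumption-laden illustration: the function names are hypothetical, and the fixed `rx`/`ry` values stand in for a top-down grasp rather than the project's actual orientation handling; only the hand-eye matrix file and the `min_side × 10 - 50` heuristic come from the repo:

```python
import numpy as np

def yolo_to_base_pose(x, y, z, yaw_deg, T_base2gripper, T_gripper2camera):
    """Convert a camera-frame detection [x, y, z, yaw] into a robot-base
    pose [x, y, z, rx, ry, rz].

    T_base2gripper is the current gripper pose reported by the robot;
    T_gripper2camera is the hand-eye calibration loaded from
    T_gripper2camera.npy. Chaining them maps camera points to the base.
    """
    T_base2camera = T_base2gripper @ T_gripper2camera
    p_base = T_base2camera @ np.array([x, y, z, 1.0])
    # Placeholder top-down orientation: only rz tracks the object yaw
    return [p_base[0], p_base[1], p_base[2], 0.0, 180.0, yaw_deg]

def gripper_width(min_side):
    """Gripper opening from the OBB's shorter side (min_side × 10 − 50)."""
    return min_side * 10 - 50
```

The resulting pose would then be fed to `movel()` for the pick motion.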
Cancel Preemption (Stop & Retry)
- User can say "정지" ("stop") or "Remove [item]" → current goal is canceled
- Robot switches to the cancel pose using a custom CancelObject service
- Uses `MultiThreadedExecutor` to handle cancel requests concurrently with execution
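The preemption pattern can be illustrated without ROS 2. In the real node a `MultiThreadedExecutor` lets the `CancelObject` service callback run while a motion callback is mid-execution; the sketch below replaces the executor and service with plain threads and an event, purely to show the control flow (all names are hypothetical):

```python
import threading
import time

class PickTask:
    """Simplified stand-in for the ROS 2 node: a cancel flag set from one
    thread (the service callback) preempts the motion loop in another."""

    def __init__(self):
        self.cancel_event = threading.Event()
        self.result = None

    def handle_cancel(self):
        # In the real node this is the CancelObject service callback.
        self.cancel_event.set()

    def execute_pick(self, steps=100):
        for _ in range(steps):
            if self.cancel_event.is_set():
                self.result = "moved_to_cancel_pose"  # switch to cancel pose
                return
            time.sleep(0.01)                          # one motion increment
        self.result = "pick_done"
```

Without concurrent callback handling (e.g., a single-threaded executor), the cancel request would queue behind the running motion callback and only take effect after the pick finished.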
Force-Sensitive Grasping
- Grasp failure is detected when the `|Fz|` force remains unchanged after closure
- For fragile items (bottles, cans), compliance control is used:
  - Applies downward force (e.g., 15 N along the Z-axis)
  - Releases when force drops below a threshold (e.g., < 10 N)
- Logs all force values and errors for safety validation
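The force logic above can be sketched as follows. `read_fz` and `release` are hypothetical callbacks standing in for the robot's force-reading and gripper APIs; the 15 N / 10 N values mirror the examples quoted above:

```python
def detect_grasp_failure(fz_before, fz_after, delta=1.0):
    """Grasp failed if |Fz| barely changes after the gripper closes."""
    return abs(abs(fz_after) - abs(fz_before)) < delta

def compliant_place(read_fz, release, target_force=15.0,
                    release_below=10.0, max_steps=1000):
    """Press down until the target contact force is reached, then open the
    gripper once |Fz| drops back below the release threshold (i.e., the
    object's weight has transferred to the surface)."""
    contacted = False
    for _ in range(max_steps):
        fz = abs(read_fz())
        if fz >= target_force:
            contacted = True          # pressing with the target force
        elif contacted and fz < release_below:
            release()                 # load transferred; let go
            return True
    return False                      # never reached a stable place
```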
For a detailed explanation of this project, please refer to the following document:
👉 docs
Thanks to these wonderful people who have contributed to this project:
- weedmo
- jsbae-RL
- DONGHO1206