VLM Benchmarks on the Koch v1.1 Manipulator

Introduction

This repository aims to reproduce the results of recent publications that use vision-language models (VLMs) for robot manipulation tasks on low-cost DIY manipulators. The goal is to create a centralized hub for VLM-based manipulator projects, enabling rapid testing and benchmarking. I chose the Koch v1.1 manipulator to start because of its compatibility with lerobot.

Note: The Koch v1.1 has only 5 DoF, which may be limiting for more complex experiments. For future projects, I would recommend a low-cost 6-DoF robot (e.g., Simple Automation).

Koch v1.1 Manipulator

Please follow the build instructions in the original repository. Additionally, follow the lerobot example for running the code.

To simplify the forward and inverse kinematics, I impose a fixed constraint on one of the joint angles. This is good enough to achieve most pick-and-place tasks.

DH Table

[DH parameters for joints 1–5: columns are $a_i$ (link length), $\alpha_i$ (twist), $d_i$ (offset), $\theta_i$ (joint angle), and joint limits (rad); the values are rendered as equation images in the original README.]
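Under the classic DH convention, each table row defines the homogeneous transform from frame $i-1$ to frame $i$:

```math
A_i =
\begin{bmatrix}
\cos\theta_i & -\sin\theta_i\cos\alpha_i & \sin\theta_i\sin\alpha_i & a_i\cos\theta_i \\
\sin\theta_i & \cos\theta_i\cos\alpha_i & -\cos\theta_i\sin\alpha_i & a_i\sin\theta_i \\
0 & \sin\alpha_i & \cos\alpha_i & d_i \\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad
T^0_5 = A_1 A_2 A_3 A_4 A_5 .
```

As a minimal forward-kinematics sketch (not the repository's actual code), assuming the classic DH convention, with `dh_rows` filled in from the table above:

```python
import numpy as np

def dh_transform(a, alpha, d, theta):
    """Homogeneous transform for one row of a classic DH table."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([[ct, -st * ca,  st * sa, a * ct],
                     [st,  ct * ca, -ct * sa, a * st],
                     [0.0,      sa,       ca,      d],
                     [0.0,     0.0,      0.0,    1.0]])

def forward_kinematics(dh_rows):
    """Compose per-joint transforms into the base-to-gripper pose T^0_5."""
    T = np.eye(4)
    for a, alpha, d, theta in dh_rows:
        T = T @ dh_transform(a, alpha, d, theta)
    return T  # 4x4: rotation in [:3, :3], position in [:3, 3]
```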

Inverse Kinematics

inv_kin
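For intuition only (this is not the repository's solver): once the base rotation $\theta_1 = \operatorname{atan2}(y, x)$ reduces the target to the arm's plane, the remaining two links admit a closed-form solution. A minimal sketch, with hypothetical link lengths `L1` and `L2` standing in for the real DH values:

```python
import numpy as np

L1, L2 = 0.108, 0.100  # hypothetical link lengths (m); use the DH values in practice

def planar_ik(x, y):
    """Closed-form 2-link planar IK (elbow-down branch), angles in radians."""
    c2 = (x**2 + y**2 - L1**2 - L2**2) / (2 * L1 * L2)  # law of cosines
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target out of reach")
    theta2 = np.arccos(c2)
    theta1 = np.arctan2(y, x) - np.arctan2(L2 * np.sin(theta2),
                                           L1 + L2 * np.cos(theta2))
    return theta1, theta2
```

Taking the negative `arccos` branch instead gives the elbow-up solution; which branch is usable depends on the joint limits in the table above.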

Experiment Setup

For all experiments, a single ZED Mini stereo camera was positioned across from the Koch v1.1 manipulator, ensuring a clear view of the manipulator's workspace.

The Perspective-n-Point (PnP) pose computation (cv2.solvePnP) was used to estimate the rotation and translation between the camera frame and the robot/world frame. A blue object held by the robot's end-effector was tracked across the image to obtain pixel coordinates, and the corresponding world coordinates were derived using inverse kinematics. See the video below:

calibration.mp4
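A minimal sketch of this calibration, assuming an HSV threshold for the blue marker and a pinhole camera model; the intrinsics `K`, distortion `dist`, and the point pairs below are illustrative placeholders, not values from the repository:

```python
import cv2
import numpy as np

# Hypothetical intrinsics; in practice these come from the ZED calibration.
K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)  # assume negligible lens distortion

def track_blue(frame_bgr):
    """Pixel centroid of the blue marker, or None if it is not visible."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (100, 120, 60), (130, 255, 255))  # rough blue band
    m = cv2.moments(mask)
    if m["m00"] < 1e-3:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])

# Corresponding point pairs collected while moving the arm (illustrative values):
world_pts = np.array([[0.15, 0.00, 0.05], [0.15, 0.10, 0.05],
                      [0.20, 0.00, 0.10], [0.20, 0.10, 0.10],
                      [0.25, 0.05, 0.15], [0.10, -0.05, 0.08]])
image_pts = np.array([[640.0, 420.0], [560.0, 415.0], [655.0, 360.0],
                      [575.0, 352.0], [610.0, 300.0], [690.0, 380.0]])

ok, rvec, tvec = cv2.solvePnP(world_pts, image_pts, K, dist)
R, _ = cv2.Rodrigues(rvec)  # 3x3 rotation, world -> camera
# A world point X maps into the camera frame as R @ X + tvec.ravel().
```

With `R` and `tvec` known, pixels tracked at a known workspace height can be back-projected into the robot frame, which is what the downstream experiments need.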

Demonstrations

Due to limited computational resources, I did not implement collision checking.

Experiment 1: Eraser into Tape

demo_eraser.mp4

Experiment 2: Chess

demo_chess.1.mp4

Experiment 3: Block Stacking

demo_stack.mp4

TBD
