This repository aims to reproduce the results of recent publications that use vision-language models (VLMs) for robot manipulation tasks on low-cost DIY manipulators. The goal is to create a centralized hub for VLM-based manipulator projects, enabling rapid testing and benchmarking. I chose the Koch v1.1 manipulator to start, due to its compatibility with lerobot.
Note: The Koch v1.1 has only 5 DoF, which may be limiting for more complex experiments. For future projects, I would recommend a low-cost 6-DoF robot (e.g., Simple Automation).
Please follow the build instructions in the original repository. Additionally, follow the lerobot example for running the code.
To simplify the forward and inverse kinematics, I set . This is good enough to achieve most pick-and-place tasks.
| Joint | Joint Limits (rad) |
|---|---|
| 1 | |
| 2 | |
| 3 | |
| 4 | |
| 5 | |
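For reference, below is a minimal sketch of position-only inverse kinematics under one common simplification for 5-DoF arms (the gripper always pointing straight down), with the solution checked against the joint limits in the table above. The link lengths, joint conventions, and limit values are placeholder assumptions for illustration, not this repository's actual parameters.

```python
import numpy as np

# Placeholder link lengths in meters (NOT the Koch v1.1's actual dimensions).
L1, L2, L3 = 0.10, 0.10, 0.08  # shoulder->elbow, elbow->wrist, wrist->gripper tip

# Placeholder joint limits in radians; replace with the values from the table above.
JOINT_LIMITS = np.array([[-np.pi, np.pi]] * 5)

def ik_position_only(x, y, z):
    """Position-only IK assuming the gripper always points straight down.

    Joint 1 is the base yaw, joints 2-3 form a planar 2-link arm, joint 4 keeps
    the gripper vertical, and joint 5 (gripper roll) is left at zero.
    """
    theta1 = np.arctan2(y, x)          # base yaw toward the target
    r = np.hypot(x, y)                 # radial distance in the base plane
    # With the gripper pointing down, the wrist sits L3 directly above the target.
    wr, wz = r, z + L3
    d = np.hypot(wr, wz)
    if d > L1 + L2:
        raise ValueError("Target out of reach")
    # Standard 2-link planar IK (elbow-up solution).
    cos_elbow = (d**2 - L1**2 - L2**2) / (2 * L1 * L2)
    theta3 = -np.arccos(np.clip(cos_elbow, -1.0, 1.0))
    theta2 = np.arctan2(wz, wr) - np.arctan2(L2 * np.sin(theta3), L1 + L2 * np.cos(theta3))
    # Wrist pitch that keeps the gripper vertical given the shoulder/elbow angles.
    theta4 = -(theta2 + theta3) - np.pi / 2
    q = np.array([theta1, theta2, theta3, theta4, 0.0])
    if np.any(q < JOINT_LIMITS[:, 0]) or np.any(q > JOINT_LIMITS[:, 1]):
        raise ValueError("Solution violates joint limits")
    return q
```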
For all experiments, a single ZED Mini stereo camera was positioned across from the Koch v1.1 manipulator, ensuring a clear view of the manipulator's workspace.
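As a hedged sketch of how frames could be grabbed for the steps below, the snippet treats the ZED Mini as a standard UVC device that streams the left and right views side by side; the device index and resolution are assumptions that depend on your setup (the ZED SDK could be used instead).

```python
import cv2

# The ZED Mini enumerates as a UVC camera streaming both views side by side.
# Device index 0 and 2560x720 (HD720 per eye) are assumptions; adjust as needed.
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 2560)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

ok, frame = cap.read()
if not ok:
    raise RuntimeError("Could not read a frame from the ZED Mini")

# Split the side-by-side image; only the left view is used for calibration/tracking.
h, w, _ = frame.shape
left, right = frame[:, : w // 2], frame[:, w // 2 :]
cap.release()
```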
The Perspective-n-Point (PnP) pose computation (cv2.solvePnP) was used to estimate the rotation and translation between the camera frame and the robot/world frame. A blue object held by the robot's end-effector was tracked across the image to obtain pixel coordinates, and the corresponding world coordinates were derived from the robot's inverse kinematics. See the video below:
calibration.mp4
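A minimal sketch of this calibration step is given below. cv2.solvePnP and cv2.Rodrigues are the calls the paragraph above refers to; the HSV threshold used to track the blue object, the camera intrinsics, and the helper names are illustrative assumptions.

```python
import cv2
import numpy as np

def detect_blue_centroid(bgr_image):
    """Return the pixel centroid of the largest blue blob, or None if absent."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # Rough HSV range for blue; the exact thresholds are an assumption.
    mask = cv2.inRange(hsv, (100, 120, 70), (130, 255, 255))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    m = cv2.moments(max(contours, key=cv2.contourArea))
    return np.array([m["m10"] / m["m00"], m["m01"] / m["m00"]], dtype=np.float32)

def calibrate_camera_to_robot(world_points, pixel_points, camera_matrix, dist_coeffs):
    """Estimate the camera pose relative to the robot/world frame with PnP.

    world_points: (N, 3) end-effector positions in the robot frame (meters).
    pixel_points: (N, 2) matching pixel coordinates of the tracked blue object.
    """
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(world_points, dtype=np.float32),
        np.asarray(pixel_points, dtype=np.float32),
        camera_matrix,
        dist_coeffs,
    )
    if not ok:
        raise RuntimeError("solvePnP failed")
    R, _ = cv2.Rodrigues(rvec)  # rotation matrix mapping world frame -> camera frame
    return R, tvec
```

Once R and tvec are known, a point p_world in the robot frame maps into the camera frame as `R @ p_world + tvec`, and the inverse transform maps camera-frame points back into the robot frame.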
Due to limited computational resources, I did not implement collision checking.
demo_eraser.mp4
demo_chess.1.mp4
demo_stack.mp4
TBD
