Suhwan Choi*, Jaeyoon Jung*, Haebin Seong*, Minchan Kim, Minyeong Kim, Yongjun Cho, Yoonshik Kim, Yubeen Park, Youngjae Yu‡, Yunsung Lee‡
-
[2025/12/18] We release the FHD/QHD versions of the dataset on Hugging Face open-world-agents/D2E-Original for training world models and video generation models. We also fix issues in the 480p dataset open-world-agents/D2E-480p.
-
[2025/12/01] We release 480p version of dataset at huggingface. open-world-agents/D2E-480p: 267 hours of synchronized video, audio, and input events from 29 PC games across diverse genres (FPS, open-world, sandbox, and more), for training vision-action models and game agents.
-
[2025/10/21] We release part of our source codes. Code is comming soon!
ocapandowatoolkit is being open-sourced already, have a look at these first.- https://github.com/open-world-agents/ocap: ocap (Omnimodal CAPture) captures all essential desktop signals in synchronized format. Records screen video, audio, keyboard/mouse input, and window events.
- https://github.com/open-world-agents/open-world-agents: A versatile and efficient monorepo that embraces and grows multiple projects, containing all the essential building blocks for agent development.
- https://worv-ai.github.io/d2e/: D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI. Code will coming soon!
If you find this work useful, please cite our paper:
@article{choi2025d2e,
title={D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI},
author={Choi, Suhwan and Jung, Jaeyoon and Seong, Haebin and Kim, Minchan and Kim, Minyeong and Cho, Yongjun and Kim, Yoonshik and Park, Yubeen and Yu, Youngjae and Lee, Yunsung},
journal={arXiv preprint arXiv:2510.05684},
year={2025}
}