Would anyone care to explain the RELATIVE_OFFSETS in the inference test? #9

@alexander-cobot

The videos in the debug_images folder have metadata reporting 15 FPS. The paper says that for DROID, videos are sampled at 5 FPS and actions at 15 FPS, with an action horizon of 24 (equivalently, a video horizon of 8 frames), which covers 1.6 seconds.

But RELATIVE_OFFSETS = [-23, -16, -8, 0]. First, the differences are [7, 8, 8], which already seems odd. Second, where does the 8 come from? If the video is 15 FPS but I want to sample at 5 FPS, shouldn't the offsets be something more like [-23, -20, -17, -14, -11, -8, -5, -2]? That way we would be generating essentially [0, 3, 6, 9, 12, 15, 18, 21] for the video frames and [0, 1, 2, ..., 23] for the actions.
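To make the arithmetic concrete, here is a small Python sketch of what I mean. It is not taken from the repo: only RELATIVE_OFFSETS and the FPS/horizon numbers come from the test and the paper as quoted above, and every other name (VIDEO_FPS, TARGET_FPS, ACTION_HORIZON, consecutive_diffs, proposed_offsets) is a hypothetical one I made up for illustration.

```python
# Value as it appears in the inference test.
RELATIVE_OFFSETS = [-23, -16, -8, 0]

# Numbers quoted from the video metadata and the paper (names are hypothetical).
VIDEO_FPS = 15       # frame rate reported by the debug_images videos
TARGET_FPS = 5       # video sampling rate the paper gives for DROID
ACTION_HORIZON = 24  # actions at 15 FPS


def consecutive_diffs(offsets):
    """Return the gaps between neighbouring offsets."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


# Gaps in the offsets used by the test: uneven, and none equal to 3.
current_diffs = consecutive_diffs(RELATIVE_OFFSETS)

# What I would have expected: one frame every VIDEO_FPS / TARGET_FPS = 3 raw frames.
stride = VIDEO_FPS // TARGET_FPS
num_frames = ACTION_HORIZON // stride
start = -(ACTION_HORIZON - 1)
proposed_offsets = [start + i * stride for i in range(num_frames)]

# Shift so the window starts at 0, to compare against the action indices.
frame_indices = [offset - start for offset in proposed_offsets]
action_indices = list(range(ACTION_HORIZON))

# Window length in seconds covered by the action horizon.
duration_s = ACTION_HORIZON / VIDEO_FPS

print("current diffs:   ", current_diffs)
print("proposed offsets:", proposed_offsets)
print("frame indices:   ", frame_indices)
print("duration (s):    ", duration_s)
```

With a stride of 3 the proposed offsets give 8 evenly spaced frames over the same 24-step window, whereas the offsets in the test give only 4 frames with uneven spacing.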
