Skip to content

Conversation

@yonigozlan
Copy link
Member

@yonigozlan yonigozlan commented Dec 15, 2025

What does this PR do?

Fixes #42205

Also fixes processors handling videos crashing when return_metadata=True and return_tensors="pt" .

Cc @zucchini-nlp , for now I have hardcoded all the skip conversions. I think it's safer to do this for now instead of silently not converting any inputs to BatchFeature for which tensor conversion fails. The main issue is with arrays, where non ragged arrays would be converted to tensors, and ragged arrays silently wouldn't, which could cause some unexpected behavior.
Open to alternative solutions so happy to hear your thoughts!

Edit: Added auto-skip, except for ragged arrays

@github-actions
Copy link
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: glm46v, glm4v, llava_next_video, perception_lm, qwen2_5_omni, qwen2_5_vl, qwen2_vl, qwen3_omni_moe, qwen3_vl, video_llama_3

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines +242 to +254
# recursively handle lists and tuples
elif isinstance(v, (list, tuple)):
return type(v)(maybe_to(item) for item in v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in which cases we have lists of tensors? I assume we either convert to 'pt' or skip and leave it as list

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good example is in the issue I mentioned #42205

Basically for ragged lists of tensors

return BatchFeature(data={**text_inputs, **image_inputs, **videos_inputs}, tensor_type=return_tensors)
return BatchFeature(
data={**text_inputs, **image_inputs, **videos_inputs},
skip_tensor_conversion=["video_metadata"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can be added in as_tensor fn in batch features. We'll check if the input is array-like and do not convert if it's not. That also would generalize for similar outputs if any.

For ragged-arrays users will see and error because it's array-like structure and we don't auto-skip convertion. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good I'll see if we can do that in a robust way

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@yonigozlan yonigozlan force-pushed the improve-batch-feature-further branch from d9e2c83 to 686d9a3 Compare December 16, 2025 22:06
@yonigozlan yonigozlan changed the title Improve BatchFeature (.to() works on lists/nested lists of tensors) Improve BatchFeature (.to() works on lists/nested lists of tensors, auto-skip non array conversion) Dec 16, 2025
Copy link
Member

@zucchini-nlp zucchini-nlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, lgtm! The chat template tests seems to be failing and skipping mask conversion to tensor, can you check?

@yonigozlan
Copy link
Member Author

Thanks, lgtm! The chat template tests seems to be failing and skipping mask conversion to tensor, can you check?

Yep thanks for catching that! The issue was that I was skipping converting empty lists/nested lists when we shouldn't

@yonigozlan yonigozlan enabled auto-merge (squash) December 17, 2025 23:51
@yonigozlan yonigozlan merged commit 9f583b1 into huggingface:main Dec 17, 2025
25 checks passed
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
…uto-skip non array conversion) (huggingface#42884)

* improve .to() to work on lists or nested lists of tensors, automatically skip converting non array structure

* consider empty list or nested list as array-like
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support .to(device) or Device Aware Handling for Segmentation Labels in EOMTImageProcessor

3 participants