Reminder
Motivation
Yi-VL's low video memory (VRAM) footprint and fast inference leave room for much wider use. If the Yi-VL series of large multimodal models could be fine-tuned on users' own datasets, it would be a great leap forward for many projects!
Solution
No response
Alternatives
No response
Anything Else?
No response
Are you willing to submit a PR?