Description
Thanks for publishing this; it's a very interesting optimization path.
I spend some of my time on the attention.rs project, which is authored and maintained by @guoqingbao. Part of its purpose is to support domestically designed and produced kit in a manner that is API-compatible with global-market fare. It already underpins at least two inference libraries and has broader applications. If the DeepSeek team happens to have any resources available to implement this demo in a more structured setting, it could do a lot of good as an attention/KV/memory dependency in the Rust ML ecosystem. This is especially true for folks trying to run large MoEs like DeepSeek on limited or partially offloaded hardware. Your team makes some "parameter-heavy" models that require a ton of VRAM in conventional use, and that is not the sort of kit most ML practitioners or students have access to.