Description
The sensible read is that upstream changes have made large parts of Toolio obsolete. A few items of note:
- Structured outputs are coming to mlx-lm soon, including for batched generation; updating Toolio to support the latter would be a giant lift
A lot of this has only emerged in the past couple of months, and it looks like most people will actually use it by plugging in something like the outlines library. I find that too heavyweight an upstream dependency for my taste, so one role for Toolio going forward might be as a much lighter-weight alternative library and server (plus OpenAI-compatible client) for structured generation, including with batching, out of the box.
I want to give some candid thought to Toolio's role going forward. I already think we should remove all the tool-calling machinery, as the state of the art there has gone well beyond where it was when I started Toolio. I also think composable agent pipelines matter more than tool-calling in modern LLMOps. I do have a vague idea that we could re-orient Toolio into a very efficient, vLLM-like toolkit for MLX, but that idea also deserves honest pondering.