Description
The current stdlib implementation ignores the parts() semantics that were present in early span-based implementations of primordial Mellea. We are now re-introducing spans, which means we need to define parts().
Many of our stdlib component implementations eschew CBlocks and instead insist on strings at initialization time. In the past, CBlocks and strings were semantically equivalent; now they are not. So we need to go through those components and reimplement both their format_for_llm function and their internal representation. This will probably result in breaking changes to initializers as well.
This issue tracks all of the required work.
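To make the intended direction concrete, here is a minimal sketch of a component that keeps its CBlocks instead of flattening them to strings at initialization time. Everything in it is an assumption for illustration: the CBlock stand-in, the ExampleComponent class, and the signatures of parts() and format_for_llm() are hypothetical and do not reflect the actual Mellea API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CBlock:
    """Hypothetical stand-in for a Mellea CBlock (assumption, not the real class)."""
    value: str


class ExampleComponent:
    """Hypothetical component; not an actual stdlib class."""

    def __init__(self, description: CBlock, examples: list[CBlock]):
        # Keep the blocks themselves rather than converting to str at init time.
        self._description = description
        self._examples = list(examples)

    def parts(self) -> list[CBlock]:
        # Expose the constituent blocks so that spans can be tracked and reused.
        return [self._description, *self._examples]

    def format_for_llm(self) -> str:
        # Derive the rendering from parts() so the two views cannot drift apart.
        return "\n".join(block.value for block in self.parts())
```

Note that the initializer now takes CBlocks rather than strings, which is the kind of breaking change anticipated above.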
After we merge #249, we need to spend some time cleaning up the richdocument.py interface:
richdocument.py:
- DoclingDocuments are naturally chunked. We should reuse these chunks as CBlocks and incorporate those CBlocks into the parts() methods.
- We should choose a canonical representation for tables and include the table itself in parts().
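The two richdocument.py items above could look roughly like the sketch below. It is only an illustration under stated assumptions: CBlock, ExampleDocument, and table_to_markdown are hypothetical names, the chunks are passed in as plain strings rather than read from a real DoclingDocument, and Markdown is used as the table representation purely as a placeholder, since the canonical representation has not been chosen yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CBlock:
    """Hypothetical stand-in for a Mellea CBlock (assumption, not the real class)."""
    value: str


def table_to_markdown(header: list[str], rows: list[list[str]]) -> str:
    """Render a table as Markdown (placeholder for the canonical representation)."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


class ExampleDocument:
    """Hypothetical document wrapper; chunks and tables are supplied directly."""

    def __init__(
        self,
        chunks: list[str],
        tables: list[tuple[list[str], list[list[str]]]],
    ):
        # Build one CBlock per chunk and per table once, at construction time.
        self._chunk_blocks = [CBlock(text) for text in chunks]
        self._table_blocks = [
            CBlock(table_to_markdown(header, rows)) for header, rows in tables
        ]

    def parts(self) -> list[CBlock]:
        # Return the same block objects on every call so callers can reuse them.
        return [*self._chunk_blocks, *self._table_blocks]
```

Because the blocks are created once and returned by identity, repeated calls to parts() hand back the same objects rather than freshly built strings.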
mify.py:
- Allow re-use instead of passing back a huge string constructed at format time.