Cleanup stdlib Component implementations after merging conceptual spans. #264

@nrfulton

Description

The current stdlib implementation ignores the parts() semantics that were present in early span-based implementations of primordial Mellea. We are now reintroducing spans, which means we need to define parts().

Many of our stdlib component implementations eschew CBlocks and instead insist on strings at initialization time. In the past, CBlocks and strings were semantically equivalent. Now they are not. So we need to go through those components and reimplement both their format_for_llm function and their internal representation. This will probably result in breaking changes to initializers as well.
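
A rough sketch of the kind of change this implies, assuming a component accepts either a string or a CBlock at initialization, stores CBlocks internally, and derives format_for_llm from parts(). The `CBlock` dataclass, the `as_cblock` helper, and `QuestionComponent` are hypothetical stand-ins defined here for illustration; they are not Mellea's actual classes or signatures.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CBlock:
    """Hypothetical stand-in for a content block; not Mellea's actual class."""

    value: str


def as_cblock(x: str | CBlock) -> CBlock:
    """Wrap a bare string, but pass an existing CBlock through unchanged."""
    return x if isinstance(x, CBlock) else CBlock(x)


class QuestionComponent:
    """Illustrative component that keeps CBlocks instead of flattened strings."""

    def __init__(self, context: str | CBlock, question: str | CBlock):
        # Normalizing here preserves the identity of any CBlock the caller
        # passes in, so the same block can appear in several components.
        self._context = as_cblock(context)
        self._question = as_cblock(question)

    def parts(self) -> list[CBlock]:
        # Expose the constituent blocks rather than one opaque string.
        return [self._context, self._question]

    def format_for_llm(self) -> str:
        # The rendered form is derived from parts(), not stored separately.
        return "\n\n".join(part.value for part in self.parts())


shared = CBlock("Mellea is a library.")
component = QuestionComponent(shared, "What is Mellea?")
print(component.format_for_llm())
```

In this sketch a caller that passes a CBlock gets the same object back from parts(), which is the property a plain-string initializer cannot provide.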

This issue tracks all of the required work.

After we merge #249, we need to spend some time cleaning up the richdocument.py interface:

richdocument.py:

  • DoclingDocuments are naturally chunked. We should reuse these chunks as CBlocks and incorporate those CBlocks into the parts() methods.
  • We should choose a canonical representation for tables and include the table itself in parts().
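
One possible shape for the two richdocument.py items, as a sketch only. It assumes the document's chunks are already available as plain strings (no Docling API is called here) and picks a Markdown pipe table as the canonical table representation purely for illustration; the issue has not settled on one. `CBlock`, `TableSketch`, `RichDocumentSketch`, and `table_to_markdown` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CBlock:
    """Hypothetical stand-in for a content block; not Mellea's actual class."""

    value: str


def table_to_markdown(header: list[str], rows: list[list[str]]) -> str:
    """Render a table as a Markdown pipe table (an assumed canonical form)."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


class TableSketch:
    """Illustrative table component whose single part is the rendered table."""

    def __init__(self, header: list[str], rows: list[list[str]]):
        self._block = CBlock(table_to_markdown(header, rows))

    def parts(self) -> list[CBlock]:
        return [self._block]

    def format_for_llm(self) -> str:
        return self._block.value


class RichDocumentSketch:
    """Illustrative document that reuses its chunks as CBlocks."""

    def __init__(self, chunks: list[str], tables: list[TableSketch]):
        # One CBlock per pre-existing chunk, created once at construction.
        self._chunks = [CBlock(chunk) for chunk in chunks]
        self._tables = tables

    def parts(self) -> list[CBlock]:
        blocks = list(self._chunks)
        for table in self._tables:
            blocks.extend(table.parts())  # the table itself is included
        return blocks

    def format_for_llm(self) -> str:
        return "\n\n".join(block.value for block in self.parts())


table = TableSketch(["name", "kind"], [["parts", "method"]])
doc = RichDocumentSketch(["Intro chunk.", "Details chunk."], [table])
print(doc.format_for_llm())
```

Appending tables after the text chunks is a simplification; a real implementation would presumably keep each table at its position in the document.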

mify.py:

  • Allow reuse instead of returning one huge string constructed at format time.
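
A minimal sketch of what reuse could mean for a mified object, assuming the goal is to hand back stable per-field blocks from parts() rather than rebuilding a single large string on every format call. This does not use mify.py's actual interface; `CBlock` and `MifiedSketch` are hypothetical, and the per-field caching strategy is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class CBlock:
    """Hypothetical stand-in for a content block; not Mellea's actual class."""

    value: str


class MifiedSketch:
    """Illustrative wrapper exposing selected fields of an object as CBlocks."""

    def __init__(self, obj: object, fields: list[str]):
        self._obj = obj
        self._fields = fields
        self._cache: dict[str, CBlock] = {}

    def parts(self) -> list[CBlock]:
        blocks = []
        for name in self._fields:
            text = f"{name}: {getattr(self._obj, name)}"
            cached = self._cache.get(name)
            # Rebuild a block only when the underlying field has changed.
            if cached is None or cached.value != text:
                cached = CBlock(text)
                self._cache[name] = cached
            blocks.append(cached)
        return blocks

    def format_for_llm(self) -> str:
        # The string is assembled from the reusable blocks on demand.
        return "\n".join(block.value for block in self.parts())


ticket = SimpleNamespace(title="Cleanup stdlib", status="open")
wrapped = MifiedSketch(ticket, ["title", "status"])
print(wrapped.format_for_llm())
```

With this design, repeated calls to parts() return the same block objects until a field changes, so callers can reuse them.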
