Description
Summary
LangDiff provides streaming-aware types (ld.Object, ld.List, etc.) with event callbacks, plus change tracking that can instrument Pydantic models. It also supports deriving Pydantic models from LangDiff models via to_pydantic(). What’s missing is the inverse: a built-in way to adapt an existing Pydantic model into a streaming-capable LangDiff model without hand-writing a bespoke ld.Object subclass for each schema. Adding this would substantially reduce boilerplate and improve ergonomics for teams already standardized on Pydantic.
Problem
- Redundant class definitions: Today, if I already have a Pydantic schema, I must re-declare an equivalent ld.Object class to stream it. This duplicates fields, types, and nesting solely to access on_start/on_append/on_complete and Parser behavior.
- Scale friction: Large codebases with many Pydantic models (and frequent schema evolution) pay an ongoing tax keeping ld.Object mirrors in sync.
- Onboarding cost: The delta between “I have a Pydantic schema” and “I can stream it with LangDiff” is bigger than it needs to be, despite LangDiff’s stated aim of Pydantic interop.
Proposal
Introduce a generic adapter that converts a Pydantic BaseModel (including nested models and lists) into an equivalent streaming schema at runtime, exposing the same event system (e.g., on_append) that ld.Object users expect.
API sketch (illustrative):
- ld.Object.from_pydantic(ModelCls, *, field_overrides: dict[str, Any] | None = None) -> type[ld.Object]
  - Returns a dynamically generated subclass of ld.Object whose annotated fields mirror ModelCls.
  - Maps Pydantic field types to streaming types:
    - str → ld.String
    - list[T] → ld.List[U], where U is the streaming type mapped from T
    - Model → generated ld.Object (recursively)
    - Numbers/bools → ld.Atom
  - Honors optionality, aliases, and defaults where feasible.
- (Optional convenience) ld.from_pydantic(ModelCls) alias.
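The type mapping above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not LangDiff's implementation: StreamString, StreamAtom, StreamList, StreamObject, map_type, and from_model are hypothetical names standing in for ld.String, ld.Atom, ld.List, ld.Object, and the proposed from_pydantic(), and they only record the mapped shape instead of generating real ld.Object subclasses. Field types are read from Pydantic v2's model_fields when present and otherwise from typing.get_type_hints(), so the example runs on plain annotated classes.

```python
import types
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union, get_args, get_origin, get_type_hints


# Hypothetical stand-ins for ld.String / ld.Atom / ld.List / ld.Object.
# They only describe the mapped shape; nothing here streams.
@dataclass
class StreamString:
    pass


@dataclass
class StreamAtom:
    py_type: Any


@dataclass
class StreamList:
    item: Any


@dataclass
class StreamObject:
    name: str
    fields: Dict[str, Any]


class UnsupportedStreamingSchema(TypeError):
    """Error name taken from the proposal; subclassing TypeError is an assumption."""


# typing.Union covers Optional[T]; types.UnionType covers `T | None` on Python 3.10+.
_UNION_TYPES = (Union, getattr(types, "UnionType", Union))


def _field_types(cls: type) -> Dict[str, Any]:
    # Pydantic v2 models expose `model_fields` (name -> FieldInfo with `.annotation`);
    # plain annotated classes fall back to typing.get_type_hints().
    model_fields = getattr(cls, "model_fields", None)
    if isinstance(model_fields, dict):
        return {name: info.annotation for name, info in model_fields.items()}
    return get_type_hints(cls)


def map_type(tp: Any, _stack: tuple = ()) -> Any:
    origin = get_origin(tp)
    if origin in _UNION_TYPES:
        args = [a for a in get_args(tp) if a is not type(None)]
        if len(args) == 1:  # Optional[T] streams like T
            return map_type(args[0], _stack)
        raise UnsupportedStreamingSchema(f"{tp!r}: non-optional unions are not mapped here")
    if tp is str:
        return StreamString()
    if tp in (int, float, bool):
        return StreamAtom(tp)
    if origin is list and len(get_args(tp)) == 1:
        return StreamList(map_type(get_args(tp)[0], _stack))
    if origin is None and isinstance(tp, type) and _field_types(tp):
        return from_model(tp, _stack=_stack)  # nested model -> nested object
    raise UnsupportedStreamingSchema(f"no streaming mapping for {tp!r}")


def from_model(model_cls: type, field_overrides: Optional[Dict[str, Any]] = None,
               _stack: tuple = ()) -> StreamObject:
    """Hypothetical analogue of the proposed from_pydantic(): mirror each field as a streaming type."""
    if model_cls in _stack:
        raise UnsupportedStreamingSchema(f"recursive model {model_cls.__name__} is not mapped here")
    overrides = field_overrides or {}
    stack = _stack + (model_cls,)
    fields = {
        name: overrides[name] if name in overrides else map_type(tp, stack)
        for name, tp in _field_types(model_cls).items()
    }
    return StreamObject(model_cls.__name__, fields)


# Plain annotated classes standing in for Pydantic models.
class Section:
    title: str
    paragraphs: List[str]


class Article:
    title: str
    word_count: int
    draft: Optional[bool]
    sections: List[Section]


# Force `title` to a non-streamed atom, as the proposed field_overrides would allow.
schema = from_model(Article, field_overrides={"title": StreamAtom(str)})
```

Mapping Optional[T] to the mapping of T is one reading of "honors optionality"; non-optional unions and recursive models raise UnsupportedStreamingSchema in this sketch, whereas a full adapter might support discriminated unions and recursion, both of which appear in this proposal's edge cases.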
Event Mapping
- For str fields: fire on_start when the first chunk arrives, on_append for each chunk, and on_complete when the field closes.
- For list fields: fire on_append(item, index) as new elements appear; nested string items stream with their own events.
- For nested objects: propagate completion when the last known key completes (consistent with current object semantics).
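To make the intended callback order concrete, here is a toy, pure-Python model of the events above. DemoString and DemoList are hypothetical classes, not LangDiff's ld.String or ld.List; the sketch assumes that on_start fires just before the first chunk is delivered and that a list element closes when the next element starts or when the list itself closes.

```python
from typing import Callable, List


class DemoString:
    """Hypothetical stand-in for a streamed str field; not LangDiff's ld.String."""

    def __init__(self) -> None:
        self.value = ""
        self._started = False
        self._start_cbs: List[Callable[[], None]] = []
        self._append_cbs: List[Callable[[str], None]] = []
        self._complete_cbs: List[Callable[[str], None]] = []

    # Decorator-style registration, mirroring the @field.on_append usage pattern.
    def on_start(self, fn):
        self._start_cbs.append(fn)
        return fn

    def on_append(self, fn):
        self._append_cbs.append(fn)
        return fn

    def on_complete(self, fn):
        self._complete_cbs.append(fn)
        return fn

    def push(self, chunk: str) -> None:
        if not self._started:  # on_start fires once, before the first chunk is delivered
            self._started = True
            for fn in self._start_cbs:
                fn()
        self.value += chunk
        for fn in self._append_cbs:
            fn(chunk)

    def complete(self) -> None:
        for fn in self._complete_cbs:
            fn(self.value)


class DemoList:
    """Hypothetical stand-in for a streamed list[str] field."""

    def __init__(self) -> None:
        self.items: List[DemoString] = []
        self._append_cbs: List[Callable[[DemoString, int], None]] = []

    def on_append(self, fn):
        self._append_cbs.append(fn)
        return fn

    def start_item(self) -> DemoString:
        if self.items:  # a new element means the previous one has closed
            self.items[-1].complete()
        item = DemoString()
        self.items.append(item)
        for fn in self._append_cbs:
            fn(item, len(self.items) - 1)  # on_append(item, index)
        return item

    def complete(self) -> None:
        if self.items:
            self.items[-1].complete()


log = []
titles = DemoList()


@titles.on_append
def on_title(item: DemoString, index: int) -> None:
    log.append(("item", index))

    @item.on_start
    def on_begin() -> None:
        log.append(("start", index))

    @item.on_append
    def on_chunk(chunk: str) -> None:
        log.append(("chunk", index, chunk))

    @item.on_complete
    def on_done(value: str) -> None:
        log.append(("done", index, value))


first = titles.start_item()
first.push("Intro")
first.push("duction")
second = titles.start_item()
second.push("Usage")
titles.complete()
```

Registering the nested string callbacks inside the list's on_append handler is what guarantees they are attached before any chunk of that element arrives; that is the same determinism concern this proposal raises for optional fields that start streaming late.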
Validation & Errors
- Validation should remain Pydantic-accurate at “field completed” boundaries. Early streaming callbacks are best-effort; final validation runs on field/object completion.
- If a Pydantic feature is not supported for streaming (e.g., complex validators that require the full instance), raise a clear UnsupportedStreamingSchema with guidance.
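A minimal sketch of this validation split, assuming each field buffers its raw text and runs its validator only when the field completes. DeferredField, check_streamable, and the "field"/"model" labels are hypothetical; in an actual adapter the validators would come from the Pydantic model, and the unsupported-feature check would presumably inspect the model's own validators rather than a hand-written dict.

```python
from typing import Any, Callable, Dict


class UnsupportedStreamingSchema(TypeError):
    """Error name taken from the proposal; subclassing TypeError is an assumption."""


class DeferredField:
    """Hypothetical field buffer: partial values are best-effort, validation runs at completion."""

    def __init__(self, name: str, validate: Callable[[str], Any]) -> None:
        self.name = name
        self.validate = validate
        self.partial = ""

    def push(self, chunk: str) -> str:
        # No validation here: "1" may still become "120", and "-" may still become "-5".
        self.partial += chunk
        return self.partial

    def complete(self) -> Any:
        # Errors surface only at the "field completed" boundary.
        return self.validate(self.partial)


def check_streamable(validator_modes: Dict[str, str]) -> None:
    """Reject schemas whose validators need the whole instance (hypothetical 'model' mode)."""
    for name, mode in validator_modes.items():
        if mode == "model":
            raise UnsupportedStreamingSchema(
                f"validator {name!r} needs the full instance; run it after the object "
                "completes or keep this model out of streaming"
            )


def positive_int(text: str) -> int:
    value = int(text)  # raises ValueError for text like "-" or "abc"
    if value <= 0:
        raise ValueError(f"expected a positive integer, got {value}")
    return value


count = DeferredField("word_count", positive_int)
partials = [count.push(chunk) for chunk in ["1", "2", "0"]]
final = count.complete()
```

A UI callback can safely display "1" and then "12" while the field streams; only the completed value "120" is validated, which matches the best-effort-then-validate rule above.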
Configuration Hooks
- field_overrides: allow callers to force a field to ld.Atom (non-streamed) or ld.String (streamed) to tune performance and UX.
- Global policy knob (e.g., strict_key_order=True / False) to align with LangDiff’s assumption about key order for objects.
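The effect of the proposed strict_key_order knob can be sketched as a small sequencer over (key, chunk) pairs for a single object. completion_events is a hypothetical helper, and the strict behavior shown (a new key completes the previous one, and a key reappearing afterwards is an error) is an assumption about what "aligning with LangDiff's key-order assumption" would mean; with strict_key_order=False, fields complete only when the object closes.

```python
from typing import Iterable, List, Set, Tuple


def completion_events(events: Iterable[Tuple[str, str]], strict_key_order: bool = True) -> List[str]:
    """Hypothetical: turn (key, chunk) pairs for one object into append/complete markers."""
    out: List[str] = []
    open_keys: List[str] = []  # keys that have started but not yet completed
    done: Set[str] = set()     # keys already completed early (strict mode only)
    for key, chunk in events:
        if key in done:
            raise ValueError(f"key {key!r} reappeared after completion under strict_key_order")
        if key not in open_keys:
            if strict_key_order and open_keys:
                previous = open_keys.pop()  # a new key closes the previous one
                done.add(previous)
                out.append(f"complete:{previous}")
            open_keys.append(key)
        out.append(f"append:{key}:{chunk}")
    # The object has closed: every field still open completes now.
    out.extend(f"complete:{key}" for key in open_keys)
    return out


stream = [("title", "Hel"), ("title", "lo"), ("body", "Hi")]
strict = completion_events(stream, strict_key_order=True)
relaxed = completion_events(stream, strict_key_order=False)
```

Strict mode delivers on_complete for title as soon as body begins, which lets a UI finalize fields early; relaxed mode tolerates interleaved keys but defers every completion to the end of the object.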
Benefits
- Zero-duplication: No more parallel ld.Object subclasses for every Pydantic model.
- Faster iteration: Schema changes in Pydantic are immediately reflected in streaming behavior.
- Smoother adoption: Teams can keep their existing Pydantic modeling practices while gaining LangDiff’s streaming UX.
- Consistency: Keeps to_pydantic() (already documented) as the “other direction,” yielding symmetrical interop.
Edge Cases to Consider
- Deeply nested/recursive models and discriminated unions.
- Optional fields that start streaming late (ensure callbacks still attach deterministically).
- Field aliases / renames and JSON key ordering requirements.
- Combining with track_change() (e.g., adapting Pydantic for streaming and wrapping UI state for JSON Patch diffs).