Propose: Pydantic-to-Streaming Adapter #5

@rajephon

Summary

LangDiff provides streaming-aware types (ld.Object, ld.List, etc.) with event callbacks and change tracking that can instrument Pydantic models. It also supports deriving Pydantic models from LangDiff models via to_pydantic(). What’s missing is the inverse: a built-in way to adapt an existing Pydantic model into a streaming-capable LangDiff model without authoring a bespoke ld.Object subclass per schema. Adding this would materially reduce boilerplate and improve developer ergonomics for teams already standardized on Pydantic.

Problem

  • Redundant class definitions: Today, if I already have a Pydantic schema, I must re-declare an equivalent ld.Object class to stream it. This duplicates fields, types, and nesting solely to access on_start/on_append/on_complete and Parser behavior.
  • Scale friction: Large codebases with many Pydantic models (and frequent schema evolution) pay an ongoing tax keeping ld.Object mirrors in sync.
  • Onboarding cost: The delta between “I have a Pydantic schema” and “I can stream it with LangDiff” is bigger than it needs to be, despite LangDiff’s stated aim of Pydantic interop.

Proposal

Introduce a generic adapter that converts a Pydantic BaseModel (including nested models and lists) into an equivalent streaming schema at runtime, exposing the same event system (e.g., on_append) that ld.Object users expect.

API sketch (illustrative):

  • ld.Object.from_pydantic(ModelCls, *, field_overrides: dict[str, Any] | None = None) -> type[ld.Object]
      • Returns a dynamically generated subclass of ld.Object whose annotated fields mirror ModelCls.
      • Maps Pydantic field types to streaming types:
          • str → ld.String
          • list[T] → ld.List[...], with the item type T mapped recursively
          • Model → generated ld.Object (recursively)
          • Numbers/bools → ld.Atom
      • Honors optionality/aliases/defaults where feasible.
  • (Optional convenience) ld.from_pydantic(ModelCls) alias.
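
To make the type mapping above concrete, here is a minimal sketch of the recursive walk. It runs on the standard library only: plain annotated classes stand in for Pydantic models, and StringSpec, AtomSpec, ListSpec, ObjectSpec and to_streaming_spec are hypothetical names standing in for ld.String, ld.Atom, ld.List and a generated ld.Object. None of this is existing LangDiff API.

```python
from dataclasses import dataclass
from typing import Any, get_args, get_origin, get_type_hints


# Hypothetical descriptors standing in for ld.String, ld.Atom, ld.List
# and a generated ld.Object. They are placeholders, not LangDiff classes.
@dataclass(frozen=True)
class StringSpec:
    pass


@dataclass(frozen=True)
class AtomSpec:
    py_type: type


@dataclass(frozen=True)
class ListSpec:
    item: Any


@dataclass(frozen=True)
class ObjectSpec:
    name: str
    fields: tuple  # (field_name, spec) pairs in declaration order


def to_streaming_spec(annotation: Any) -> Any:
    """Map one type annotation to a streaming descriptor, recursively."""
    if annotation is str:
        return StringSpec()
    if annotation in (int, float, bool):
        return AtomSpec(annotation)
    if get_origin(annotation) is list:
        (item_type,) = get_args(annotation)
        return ListSpec(to_streaming_spec(item_type))
    if isinstance(annotation, type):
        # Any other annotated class is treated as a nested model.
        hints = get_type_hints(annotation)
        fields = tuple((name, to_streaming_spec(t)) for name, t in hints.items())
        return ObjectSpec(annotation.__name__, fields)
    raise TypeError(f"no streaming mapping for {annotation!r}")


# Plain annotated classes stand in for Pydantic models here.
class Author:
    name: str
    age: int


class Article:
    title: str
    tags: list[str]
    authors: list[Author]


spec = to_streaming_spec(Article)
```

A real adapter would read the field definitions from the Pydantic model rather than raw annotations, and would emit actual ld.Object subclasses instead of descriptors.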

Event Mapping

  • For str fields: fire on_start/on_append as chunks arrive; on_complete when the field closes.
  • For list fields: fire on_append(item, index) as new elements appear; nested string items stream with their own events.
  • For nested objects: propagate completion when the last known key completes (consistent with current object semantics).
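
The event order for string and list fields could look like the sketch below. StreamingString and StreamingList are hypothetical stand-ins written for this issue, not ld.String or ld.List, and feed, close and open_item are assumed helper names that a parser would call internally as chunks arrive.

```python
from typing import Callable, List


class StreamingString:
    """Hypothetical stand-in for a streamed str field."""

    def __init__(self) -> None:
        self.value = ""
        self._started = False
        self._on_start: List[Callable[[], None]] = []
        self._on_append: List[Callable[[str], None]] = []
        self._on_complete: List[Callable[[str], None]] = []

    def on_start(self, fn):
        self._on_start.append(fn)
        return fn

    def on_append(self, fn):
        self._on_append.append(fn)
        return fn

    def on_complete(self, fn):
        self._on_complete.append(fn)
        return fn

    def feed(self, chunk: str) -> None:
        # The first chunk fires on_start once, then every chunk fires on_append.
        if not self._started:
            self._started = True
            for fn in self._on_start:
                fn()
        self.value += chunk
        for fn in self._on_append:
            fn(chunk)

    def close(self) -> None:
        # Called when the field closes; callbacks receive the full value.
        for fn in self._on_complete:
            fn(self.value)


class StreamingList:
    """Hypothetical stand-in for a streamed list of strings."""

    def __init__(self) -> None:
        self.items: List[StreamingString] = []
        self._on_append: List[Callable[[StreamingString, int], None]] = []

    def on_append(self, fn):
        self._on_append.append(fn)
        return fn

    def open_item(self) -> StreamingString:
        # A new element appears: fire on_append(item, index) before its
        # content streams, so callers can attach per-item callbacks.
        item = StreamingString()
        self.items.append(item)
        index = len(self.items) - 1
        for fn in self._on_append:
            fn(item, index)
        return item


events: List[str] = []

title = StreamingString()
title.on_start(lambda: events.append("title:start"))
title.on_append(lambda chunk: events.append(f"title:append:{chunk}"))
title.on_complete(lambda value: events.append(f"title:complete:{value}"))
title.feed("Hel")
title.feed("lo")
title.close()

tags = StreamingList()


def on_tag(item: StreamingString, index: int) -> None:
    events.append(f"tags:append:{index}")
    item.on_append(lambda chunk: events.append(f"tags[{index}]:append:{chunk}"))


tags.on_append(on_tag)
first = tags.open_item()
first.feed("py")
first.close()
```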

Validation & Errors

  • Validation should remain Pydantic-accurate at “field completed” boundaries. Early streaming callbacks are best-effort; final validation runs on field/object completion.
  • If a Pydantic feature is not supported for streaming (e.g., complex validators that require the full instance), raise a clear UnsupportedStreamingSchema with guidance.
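
A rough sketch of both points, under stated assumptions: UnsupportedStreamingSchema is the exception name proposed above, while require_streamable, DeferredField and the needs_full_instance flag are hypothetical helpers. A plain callable (int here) stands in for Pydantic validation so the example runs on the standard library alone.

```python
from typing import Any, Callable, List


class UnsupportedStreamingSchema(TypeError):
    """Raised when a schema feature cannot be adapted for streaming."""

    def __init__(self, model_name: str, field: str, reason: str) -> None:
        self.model_name = model_name
        self.field = field
        self.reason = reason
        super().__init__(
            f"{model_name}.{field} cannot be streamed: {reason}. "
            "Consider forcing the field to a non-streamed type or "
            "declaring the ld.Object by hand."
        )


def require_streamable(model_name: str, field: str, needs_full_instance: bool) -> None:
    # needs_full_instance is a hypothetical flag the adapter would compute
    # while inspecting the model's validators.
    if needs_full_instance:
        raise UnsupportedStreamingSchema(
            model_name, field, "its validator needs the complete model instance"
        )


class DeferredField:
    """Buffers chunks and validates only at the field-completed boundary."""

    def __init__(self, validate: Callable[[str], Any]) -> None:
        self._validate = validate
        self._chunks: List[str] = []

    def feed(self, chunk: str) -> None:
        # Early chunks are best-effort: nothing is validated yet.
        self._chunks.append(chunk)

    def complete(self) -> Any:
        # Final validation runs once the whole raw value is available.
        return self._validate("".join(self._chunks))


views = DeferredField(int)
views.feed("4")
views.feed("2")
result = views.complete()
```

With this shape, a malformed value surfaces as a validation error at complete(), never in the middle of a stream.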

Configuration Hooks

  • field_overrides: allow callers to force a field to ld.Atom (non-streamed) or ld.String (streamed) to tune performance and UX.
  • Global policy knob (e.g., strict_key_order=True / False) to align with LangDiff’s assumption about key order for objects.
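
One possible way field_overrides could be resolved against the default mapping is sketched below. resolve_field_kinds is a hypothetical helper, and the strings "String" and "Atom" are labels standing in for ld.String and ld.Atom. The strict_key_order knob is left out because its semantics are still open.

```python
from typing import Dict, Optional

# Labels standing in for ld.String and ld.Atom (not the LangDiff classes).
ALLOWED_KINDS = {"String", "Atom"}


def resolve_field_kinds(
    annotations: Dict[str, type],
    field_overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Pick a streaming kind per field, letting callers override the default."""
    overrides = field_overrides or {}

    # Fail fast on typos instead of silently ignoring an override.
    unknown = set(overrides) - set(annotations)
    if unknown:
        raise KeyError(f"overrides for unknown fields: {sorted(unknown)}")

    kinds: Dict[str, str] = {}
    for name, py_type in annotations.items():
        default = "String" if py_type is str else "Atom"
        kind = overrides.get(name, default)
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported kind {kind!r} for field {name!r}")
        kinds[name] = kind
    return kinds


# Force "body" to be delivered whole instead of chunk by chunk.
kinds = resolve_field_kinds(
    {"title": str, "body": str, "views": int},
    field_overrides={"body": "Atom"},
)
```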

Benefits

  • Zero-duplication: No more parallel ld.Object subclasses for every Pydantic model.
  • Faster iteration: Schema changes in Pydantic are immediately reflected in streaming behavior.
  • Smoother adoption: Teams can keep their existing Pydantic modeling practices while gaining LangDiff’s streaming UX.
  • Consistency: Keeps to_pydantic() (already documented) as the “other direction,” yielding symmetrical interop.

Edge Cases to Consider

  • Deeply nested/recursive models and discriminated unions.
  • Optional fields that start streaming late (ensure callbacks still attach deterministically).
  • Field aliases / renames and JSON key ordering requirements.
  • Combining with track_change() (e.g., adapting Pydantic for streaming and wrapping UI state for JSON Patch diffs).

Labels: enhancement (New feature or request)
