Open
Labels
feature request: New feature or request
Description
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
Currently preventing usage
Please provide a clear description of problem this feature solves
Summary
Enhance the embedding pipeline stage to support embedding arbitrary custom content fields specified in the embedding job.
When a custom_content_field is provided, the embedding stage will extract the text from that field, generate embeddings, and store the results in a specified result_target_field on the same metadata object.
Describe the feature, and optionally a solution or implementation and any alternatives
Requirements
1. EmbedTask schema updates
- Add two optional fields to the EmbedTask object:
  - custom_content_field: Optional[str], the name of the field on the content metadata that contains the text to embed.
  - result_target_field: Optional[str], the name of the field on the content metadata where the resulting embedding vector should be stored.
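A minimal sketch of the two new fields, to make the proposal concrete. It uses a plain dataclass as a stand-in; the real EmbedTask schema, its existing fields, and its validation framework are not shown here, and the field values in the usage lines (udf_summary, udf_summary_embedding) are made-up examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmbedTask:
    """Hypothetical stand-in for the EmbedTask schema; existing fields are omitted."""

    # Name of the metadata field that holds the text to embed.
    custom_content_field: Optional[str] = None
    # Name of the metadata field that receives the embedding vector.
    result_target_field: Optional[str] = None


# Both fields default to None, so existing jobs keep their current behavior.
default_task = EmbedTask()

# A job that embeds a custom field and stores the vector under a new key.
custom_task = EmbedTask(
    custom_content_field="udf_summary",
    result_target_field="udf_summary_embedding",
)
```

Keeping both fields optional with a None default is what preserves backwards compatibility for jobs that never set them.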
2. Embedding stage logic
When processing an embedding job:
- Check if custom_content_field is provided.
  - If not provided → existing behavior (embed the default content field).
  - If provided:
    - Attempt to locate the field on the content metadata.
    - Validate that its value is text (a string or convertible to a string).
    - Run embedding on that text.
    - Store the resulting embedding vector in the field specified by result_target_field.
    - If the target field does not exist, create it on the metadata object.
- Even when the custom field is provided, we also want to construct default embeddings for primitive data types.
- The goal is to support embedding generation for custom UDF fields while maintaining backwards compatibility.
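The flow above could look roughly like the following sketch. It is not the actual stage implementation: the function embed_record, the default keys "content" and "embedding", and the toy embedder are all assumptions made for illustration, and metadata is modeled as a plain dict. Warnings for missing or non-text fields (requirement 3) are left out to keep the happy path readable, and the sketch assumes the custom step is skipped when result_target_field is not set.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical embedder signature: text in, vector out.
Embedder = Callable[[str], List[float]]


def embed_record(
    metadata: Dict[str, Any],
    embed_fn: Embedder,
    custom_content_field: Optional[str] = None,
    result_target_field: Optional[str] = None,
) -> Dict[str, Any]:
    """Embed the default content and, if requested, one custom field."""
    # Existing behavior: always embed the default content field.
    # "content" and "embedding" are assumed key names for this sketch.
    content = metadata.get("content")
    if isinstance(content, str):
        metadata["embedding"] = embed_fn(content)

    # No custom field requested (or no target given): nothing more to do.
    if custom_content_field is None or result_target_field is None:
        return metadata

    # Custom field requested: embed it and store the vector on the same
    # metadata object, creating the target key if it does not exist yet.
    value = metadata.get(custom_content_field)
    if isinstance(value, str):
        metadata[result_target_field] = embed_fn(value)
    return metadata


def toy_embedder(text: str) -> List[float]:
    """Stand-in embedder that returns the text length as a 1-d vector."""
    return [float(len(text))]


record = {"content": "hello", "udf_summary": "hi"}
result = embed_record(record, toy_embedder, "udf_summary", "udf_summary_embedding")
```

The default embedding is computed before the custom branch so that adding a custom field never removes anything an existing consumer relies on.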
3. Error handling
- If custom_content_field is specified but not found → log warning, skip embedding for that record.
- If the field exists but is non-textual → log warning, skip embedding for that record.
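The two skip conditions could be handled by a small helper along these lines. This is a sketch under assumptions: extract_custom_text is a hypothetical name, metadata is modeled as a dict, and treating int and float values as "convertible to string" is one possible reading of the requirement rather than something the request specifies.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


def extract_custom_text(metadata: Dict[str, Any], custom_content_field: str) -> Optional[str]:
    """Return the text to embed, or None if the record should be skipped."""
    # Case 1: the requested field is not present on the metadata.
    if custom_content_field not in metadata:
        logger.warning(
            "Field %r not found; skipping embedding for this record",
            custom_content_field,
        )
        return None

    value = metadata[custom_content_field]
    if isinstance(value, str):
        return value

    # Assumption: plain numbers count as convertible to string; bools do not.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)

    # Case 2: the field exists but does not hold text.
    logger.warning(
        "Field %r is non-textual (%s); skipping embedding for this record",
        custom_content_field,
        type(value).__name__,
    )
    return None
```

Returning None instead of raising lets the stage skip a single record and continue with the rest of the batch.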
Additional context
No response