Conversation

Contributor

@Qard Qard commented Jan 14, 2026

Introduces a new `models` parameter to `init()` that allows configuring default models for different evaluation types:

```typescript
init({
  models: {
    completion: 'claude-3-5-sonnet-20241022',
    embedding: 'text-embedding-3-large',
  }
})
```

Changes:

  • Added `models` parameter to `init()` in both JS and Python
  • Models object supports:
    • `completion`: Default model for LLM-as-a-judge evaluations
    • `embedding`: Default model for embedding-based evaluations
  • `models.completion` takes precedence over deprecated `defaultModel`
  • All embedding scorers now use the configured default embedding model
  • Added `getDefaultEmbeddingModel()` function
  • Maintains backward compatibility with existing `defaultModel` parameter
  • Added comprehensive tests for both languages

Default values:

  • Completion: "gpt-4o" (unchanged)
  • Embedding: "text-embedding-ada-002"
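
The precedence between `models.completion` and the deprecated `defaultModel` can be sketched as follows. This is a minimal illustration of the rules described above, not the library's actual internals; the module-level variable and function names here are hypothetical:

```python
# Hypothetical sketch of the completion-model precedence:
# models.completion > deprecated default_model > library default ("gpt-4o").
_default_completion_model = None

def init(default_model=None, models=None):
    """Configure the default completion model (illustrative stand-in)."""
    global _default_completion_model
    # models["completion"] takes precedence over the deprecated default_model
    completion = (models or {}).get("completion")
    _default_completion_model = completion or default_model

def get_default_completion_model():
    # Fall back to the library-wide default when nothing is configured
    return _default_completion_model or "gpt-4o"

init(default_model="gpt-4o-mini",
     models={"completion": "claude-3-5-sonnet-20241022"})
print(get_default_completion_model())  # claude-3-5-sonnet-20241022
```

Callers that only pass the deprecated `defaultModel` keep their current behavior, and callers that pass neither still get `"gpt-4o"`.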

@Qard Qard requested a review from ibolmo January 14, 2026 01:26
@Qard Qard self-assigned this Jan 14, 2026
@Qard Qard added the enhancement New feature or request label Jan 14, 2026

github-actions bot commented Jan 14, 2026

Braintrust eval report

Autoevals (models-config-1768445476)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 72.5% (+0pp) | 4 🟢 | 3 🔴 |
| Time_to_first_token | 1.35tok (-0.02tok) | 69 🟢 | 49 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 279.25tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 19.3tok (+0tok) | - | - |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 298.54tok (+0tok) | - | - |
| Estimated_cost | 0$ (+0$) | - | - |
| Duration | 2.89s (+0.15s) | 103 🟢 | 116 🔴 |
| Llm_duration | 2.66s (-0.16s) | 82 🟢 | 37 🔴 |

@Qard Qard force-pushed the models-config branch 2 times, most recently from 569d23a to c3d81a0 Compare January 15, 2026 00:23
Comment on lines 161 to 178
def _get_ragas_embedding_model(user_model):
"""Get embedding model with RAGAS-specific default fallback.
Priority:
1. Explicitly provided user_model parameter
2. User-configured global embedding default (via init())
3. RAGAS-specific default (text-embedding-3-small)
"""
if user_model is not None:
return user_model

# Check if user has explicitly configured a global embedding default
configured_default = _default_embedding_model_var.get(None)
if configured_default is not None:
return configured_default

# Fall back to RAGAS-specific default
return DEFAULT_RAGAS_EMBEDDING_MODEL
Contributor Author

This exists because (for some reason) Python and TypeScript are inconsistent about the embedding model to use here. Python has its own fallback to text-embedding-3-small while TypeScript delegates to the EmbeddingSimilarity default which will use text-embedding-ada-002. Should we just be switching everywhere to text-embedding-3-small though?

@Qard Qard force-pushed the models-config branch 2 times, most recently from b064549 to 2a3d1c5 Compare January 15, 2026 02:48