Documentation page:
https://docs.judgmentlabs.ai/documentation/evaluation/scorers/introduction
Context:
The documentation provides a conceptual overview of each scorer and what it evaluates (e.g., execution order, tool usage, faithfulness). However, it does not explain how to configure these scorers or tune them for specific applications. Developers are left guessing:
- What parameters can be adjusted?
- What are the default thresholds for cosine similarity, ranking accuracy, etc.?
- What score range is considered "good" or "acceptable"?
- How do you interpret or act on a 0.6 versus 0.9 answer correctness score?
Suggestion:
Add a "Configuration & Interpretation" section to each scorer page with:
- Tunable Parameters
  - E.g., similarity_threshold, tool_penalty_weight, faithfulness_sensitivity
  - Show how to initialize a scorer with custom parameters (see the sketch after this list)
- Defaults & Ranges
  - List the default values used internally
  - Provide typical value ranges for real-world datasets
- Interpreting Scores
  - Explain what score bands mean, e.g.: "A tool-order score of 1.0 means tools were called in the ideal order, while <0.5 may signal planning drift or tool misuse."
- Best Practices
  - For example: "Start with similarity_threshold = 0.75 for answer correctness, then calibrate based on the false positive rate."
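To make the request concrete, here is a minimal sketch of what such an initialization example could look like in the docs. The class name (AnswerCorrectnessScorer) and parameters (similarity_threshold, strict_mode) are hypothetical placeholders taken from the bullets above, not the actual judgeval API; the class is defined locally so the snippet is self-contained.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a configurable scorer; real judgeval class
# and parameter names may differ.
@dataclass
class AnswerCorrectnessScorer:
    similarity_threshold: float = 0.75  # assumed default, per the suggestion above
    strict_mode: bool = False

    def is_successful(self, score: float) -> bool:
        # A score at or above the threshold counts as a pass.
        return score >= self.similarity_threshold

# Default configuration
default_scorer = AnswerCorrectnessScorer()

# Custom configuration tuned for a stricter, domain-specific agent
strict_scorer = AnswerCorrectnessScorer(similarity_threshold=0.9, strict_mode=True)

print(default_scorer.is_successful(0.8))  # True: 0.8 >= 0.75
print(strict_scorer.is_successful(0.8))   # False: 0.8 < 0.9
```

Even a short block like this on each scorer page would show which knobs exist, what their defaults are, and how a pass/fail decision follows from the threshold.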
Why It Matters:
This improvement would help users:
- Tailor scorers to domain-specific agents
- Make informed decisions based on scores rather than treating them as black boxes
- Increase transparency and trust in automated evaluation pipelines