Evals

Creating an eval

Evals are created in a short wizard:

Target: what to run on. Choose the level: trace (score the whole run) or span (score individual steps). Optionally filter by agent and trace name and, for span-level evals, by span type and model.

Check: what to verify. Pick a preset (below).

Score: how to score. For LLM judges, pick a judge model; for parameterized checks, set the parameter (a substring, pattern, or max length). Set a sample rate (1%–100%) to control what share of matching traffic is scored.
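To make the sample rate concrete, here is a minimal sketch of how a percentage could gate which matching traces get scored. It is an illustration only: the function name should_score and the hash-based bucketing are assumptions, and the product's actual sampling logic is not described on this page.

```python
import hashlib


def should_score(trace_id: str, sample_rate_percent: int) -> bool:
    """Hypothetical sampler: select roughly sample_rate_percent% of traces.

    Hashing the trace ID makes the decision deterministic, so the same
    trace always gets the same answer for a given rate.
    """
    if not 1 <= sample_rate_percent <= 100:
        raise ValueError("sample rate must be between 1 and 100")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # value in 0..99
    return bucket < sample_rate_percent
```

At 100% every matching trace is scored; at 10% roughly one in ten is. For LLM judges, a lower rate scores fewer traces and so makes fewer judge calls.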

Code checks

Deterministic, free, and run with no external calls:

Preset	Checks
No PII	Output is free of emails, phone numbers, SSNs, cards, IPs.
No secret leak	Output contains no API-key / token / private-key shapes.
Valid JSON	Output parses as JSON.
No refusal	Output isn’t a refusal.
Non-empty	Output isn’t empty.
Max length	Output is within a character budget.
Contains / Excludes text	Output does (or doesn’t) contain a substring.
Regex match	Output matches a pattern.
Tool args valid	(span-only) A tool call’s input is a valid JSON object.
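As a rough illustration of what "deterministic, no external calls" means, several of the presets above could be approximated with a few lines of standard-library Python. This is a sketch, not the product's implementation: the function names are hypothetical, and details such as whether whitespace-only output counts as empty, or whether a regex must match the whole output or any part of it, are assumptions.

```python
import json
import re


def is_non_empty(output: str) -> bool:
    # Assumption: whitespace-only output is treated as empty.
    return bool(output.strip())


def is_valid_json(output: str) -> bool:
    try:
        json.loads(output)
    except (TypeError, ValueError):  # JSONDecodeError subclasses ValueError
        return False
    return True


def within_max_length(output: str, max_chars: int) -> bool:
    return len(output) <= max_chars


def contains_text(output: str, substring: str) -> bool:
    return substring in output


def regex_match(output: str, pattern: str) -> bool:
    # Assumption: a match anywhere in the output counts.
    return re.search(pattern, output) is not None


def tool_args_valid(tool_input: str) -> bool:
    # Valid JSON is not enough here: the parsed value must be an object.
    try:
        parsed = json.loads(tool_input)
    except (TypeError, ValueError):
        return False
    return isinstance(parsed, dict)
```

Note the difference between the two JSON checks: a bare array such as [1, 2] parses as JSON but is not a JSON object, so it would pass the first check and fail the second.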

LLM judges

Judges send the input/output to a model that returns a normalized 0.00–1.00 score or a pass/fail verdict with a reason. Presets cover relevance, helpfulness, coherence, conciseness, instruction-following, completeness, toxicity/safety, tool selection, and RAG-oriented checks (faithfulness, context relevance, correctness vs. a reference).

LLM judges are bring-your-own-key. Add a provider key (below) before creating one. An eval with no usable key shows the status needs key and doesn’t score until a key is added. The set of available judge models is defined by the deployment.

Provider keys

The Provider Keys page stores the LLM-provider API keys your judges use, encrypted at rest and scoped per project. Keys are write-only: once saved, the value is never shown again, and the page only indicates which providers are configured. Saving a key is an upsert, so it adds the key or replaces the one already stored for that provider. Keys can also be deleted.

Provider-key encryption requires FOGLAMP_SECRETS_KEY (at least 32 characters) to be set on the server. Without it, the page shows “Encryption not configured” and judge evals can’t run.
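One way to produce a suitable value is Python's standard secrets module. This is a sketch under one assumption: that any sufficiently long random string is accepted. The only requirement stated here is the length, and the helper name generate_secrets_key is hypothetical.

```python
import secrets


def generate_secrets_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random string suitable for use as a secret.

    32 random bytes encode to 43 characters, which clears the
    32-character minimum.
    """
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print in KEY=value form, ready to paste into an env file.
    print(f"FOGLAMP_SECRETS_KEY={generate_secrets_key()}")
```

Store the generated value wherever the server reads its environment, and treat it like any other secret.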

Eval detail

Opening an eval shows its recent activity: scored count, average score, pass rate, and judge spend over the selected range, plus a table of recent scores (target, pass/fail or numeric score, the reason, and when). Each enabled eval also has a status (ok, needs key, or error) and an inline toggle.
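The summary figures can be read as simple aggregates over the scores in the selected range. The sketch below shows one plausible way to compute them. The Score record and its field names are hypothetical, and the exact definitions (for example, whether pass rate counts only scores that carry a pass/fail verdict) are assumptions rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Score:
    passed: Optional[bool]  # pass/fail verdict, if the check produces one
    value: Optional[float]  # numeric score in 0.00-1.00, if it produces one
    cost: float = 0.0       # judge spend for this score; 0 for code checks


def summarize(scores: list[Score]) -> dict:
    values = [s.value for s in scores if s.value is not None]
    verdicts = [s.passed for s in scores if s.passed is not None]
    return {
        "scored_count": len(scores),
        # Averages are None when there is nothing to average.
        "average_score": sum(values) / len(values) if values else None,
        "pass_rate": sum(verdicts) / len(verdicts) if verdicts else None,
        "judge_spend": sum(s.cost for s in scores),
    }
```

Under these definitions, code checks contribute to the scored count and pass rate but add nothing to judge spend, since they make no external calls.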
