Evaluations

Evaluations in Quotient track model performance through Run objects. Each Run combines a prompt, dataset, model, and metrics to produce quantitative results.

Runs & Results

A Run contains:

class Run:
  id: str                     # Unique identifier
  prompt: str                 # Reference to prompt ID
  dataset: str                # Reference to dataset ID
  model: str                  # Reference to model ID
  parameters: dict            # Model configuration
  metrics: List[str]          # Metrics to compute
  status: str                 # Current run status
  results: List[RunResult]    # Results per dataset row
  created_at: datetime        # Creation timestamp
  finished_at: datetime       # Completion timestamp

The status field can be not-started, running, completed, or failed.
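
For example, you might branch on the status before reading results (a minimal sketch; run refers to a Run object returned by the SDK):

if run.status == "completed":
    print(f"Run {run.id} finished with {len(run.results)} results")
elif run.status == "failed":
    print(f"Run {run.id} failed")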

Each result within a Run is represented by:

class RunResult:
  id: str           # Result identifier
  input: str        # Input text
  output: str       # Model output
  values: dict      # Metric scores
  context: str      # Optional context
  expected: str     # Optional expected output
  created_at: datetime
  created_by: str
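
Once a run has completed, you can loop over its results to inspect inputs, outputs, and per-metric scores (a minimal sketch based on the fields above):

for result in run.results:
    print(result.input)
    print(result.output)
    for metric, score in result.values.items():
        print(f"  {metric}: {score}")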

Creating a Run

To create a run, use the quotient.evaluate() method:

run = quotient.evaluate(
  prompt=prompt,      # Prompt object
  dataset=dataset,    # Dataset object
  model=model,        # Model object
  parameters={
      "temperature": 0.7,
      "max_tokens": 100
  },
  metrics=['bertscore', 'exactmatch']
)

Parameters vary by model. See Models for provider-specific options.

Retrieving Runs

Get a specific run:

run = quotient.runs.get(run_id)

List all runs:

runs = quotient.runs.list()
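
For example, you can combine list() with the status field to find runs that have not yet finished (a small sketch; list() is assumed to return Run objects as described above):

pending = [r for r in quotient.runs.list() if r.status in ("not-started", "running")]
for r in pending:
    print(r.id, r.status)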

Run Summary

Generate performance summaries using the summarize() method:

summary = run.summarize(
    best_n=3,    # Top performing examples
    worst_n=3    # Lowest performing examples
)

The summary includes:

  • Aggregate metrics (average, standard deviation)
  • Best / worst performing examples
  • Run metadata (model, parameters, timestamps)
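
A sketch of how you might read the summary; the section above does not spell out the returned structure, so the keys used below are illustrative assumptions:

# NOTE: key names ("metrics", "best", "worst") are assumptions, not a documented schema
print(summary["metrics"])           # aggregate scores, e.g. mean and standard deviation per metric
for example in summary["best"]:     # top performing examples
    print(example)
for example in summary["worst"]:    # lowest performing examples
    print(example)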

CLI Usage

You can run evaluations via the Quotient CLI, as long as your file contains the word evaluate:

quotient run simple_evaluate.py

Runs execute asynchronously and may take time for large datasets. Monitor progress with the CLI or SDK.
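
One way to monitor a run from the SDK is to poll its status until it reaches a terminal state (a minimal sketch; the 10-second interval is arbitrary):

import time

run = quotient.runs.get(run_id)
while run.status in ("not-started", "running"):
    time.sleep(10)                     # arbitrary polling interval
    run = quotient.runs.get(run_id)    # refresh run state

print(f"Run {run.id} ended with status {run.status}")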

See also: