Evaluators

class EvaluatorsClient(*, sdk_config: SDKConfiguration, generated_client: ApiClient)

Bases: object

Client for managing Arize evaluators and evaluator versions.

This class is primarily intended for internal use within the SDK. Users should access evaluator functionality through arize.ArizeClient rather than instantiating this class directly.

The evaluators client is a thin wrapper around the generated REST API client; it uses the shared generated API client instance owned by arize.config.SDKConfiguration.

Parameters:

sdk_config (SDKConfiguration) – Resolved SDK configuration.
generated_client (ApiClient) – Shared generated API client instance.

list(*, name: str | None = None, space: str | None = None, limit: int = 100, cursor: str | None = None) → EvaluatorsList200Response

List evaluators the user has access to.

Results are sorted by update date (most recent first). This endpoint supports cursor-based pagination. When space is provided, results are limited to that space; otherwise evaluators from all permitted spaces are returned.

Parameters:

name (str | None) – Optional case-insensitive substring filter on the evaluator name.
space (str | None) – Optional space filter. If the value is a base64-encoded resource ID, it is treated as a space ID; otherwise, it is used as a case-insensitive substring filter on the space name.
limit (int) – Maximum number of evaluators to return (1-100).
cursor (str | None) – Opaque pagination cursor from a previous response.

Returns:

A paginated evaluator list response from the Arize REST API.

Raises:

ApiException – If the API request fails.

Return type:

EvaluatorsList200Response
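Because list returns one page at a time, collecting every evaluator means following the cursor until no further page is indicated. The following is a minimal, client-agnostic sketch of that loop. It assumes the next-page cursor is exposed on the response's pagination object as next_cursor; that attribute name is a hypothetical placeholder (the fields of PaginationMetadata are not documented here), so the lookup is injectable. The same helper would apply to list_versions, whose items live in evaluator_versions.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterator, Optional


def _next_cursor(page: Any) -> Optional[str]:
    # Hypothetical: assumes PaginationMetadata stores the next cursor as
    # `next_cursor`. Replace with the real field name for your SDK version.
    return getattr(page.pagination, "next_cursor", None)


def iter_all(
    fetch_page: Callable[[Optional[str]], Any],
    items_attr: str,
    next_cursor: Callable[[Any], Optional[str]] = _next_cursor,
) -> Iterator[Any]:
    """Yield items from every page of a cursor-paginated listing.

    fetch_page(cursor) returns one page; the first call passes cursor=None.
    Iteration stops once a page carries no further cursor.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from getattr(page, items_attr)
        cursor = next_cursor(page)
        if not cursor:
            return


# Offline demo with stand-in pages: page one points to "p2", page two ends.
demo_pages = {
    None: SimpleNamespace(evaluators=["a", "b"], pagination=SimpleNamespace(next_cursor="p2")),
    "p2": SimpleNamespace(evaluators=["c"], pagination=SimpleNamespace(next_cursor=None)),
}
print(list(iter_all(lambda cursor: demo_pages[cursor], "evaluators")))  # ['a', 'b', 'c']
```

Against a real client this might look like iter_all(lambda c: evaluators_client.list(space="my-space", limit=100, cursor=c), "evaluators"), where evaluators_client and "my-space" are placeholders for your own client object and space.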

get(*, evaluator: str, space: str | None = None, version_id: str | None = None) → EvaluatorWithVersion

Get an evaluator by name or ID, with its resolved version.

By default, the latest version is returned. Pass version_id to resolve a specific version instead.

Parameters:

evaluator (str) – Evaluator name or identifier (base64) to retrieve.
space (str | None) – Optional space name or ID. Required when evaluator is a name rather than an ID.
version_id (str | None) – Optional version identifier (base64). If omitted, the latest version is returned.

Returns:

The evaluator with its resolved version.

Raises:

ApiException – If the API request fails (for example, evaluator not found).

Return type:

EvaluatorWithVersion

create_template_evaluator(*, name: str, space: str, commit_message: str, template_config: TemplateConfig, description: str | None = None) → EvaluatorWithVersion

Create a new template evaluator with an initial version.

The evaluator name must be unique within the given space.

Parameters:

name (str) – Evaluator name (must be unique within the space).
space (str) – Space name or ID to create the evaluator in.
commit_message (str) – Commit message for the initial version.
template_config (TemplateConfig) –
Template configuration for the evaluator. Build with arize.evaluators.types.TemplateConfig. Required fields:
- name: eval column name; must match ^[a-zA-Z0-9_\s\-&()]+$ (letters, digits, underscores, whitespace, hyphens, ampersands, and parentheses).
- template: prompt template string with {variable} placeholders referencing span/trace attributes.
- include_explanations: whether the LLM should include a reasoning explanation alongside the score.
- use_function_calling_if_available: whether to prefer structured function-call output over free-text parsing when the model supports it.
- llm_config: an arize.evaluators.types.EvaluatorLlmConfig specifying the AI integration to call (ai_integration_id), the model name, and the invocation and provider parameters.
Optional fields: classification_choices, direction, data_granularity.
description (str | None) – Optional human-readable description of the evaluator.

Returns:

The created evaluator with its initial version.

Raises:

ApiException – If the API request fails (for example, name conflict or invalid payload).

Return type:

EvaluatorWithVersion
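Two of the required TemplateConfig fields have rules that are easy to check locally before calling the API. The sketch below validates an eval column name against the name pattern (read with single backslashes, i.e. \s for whitespace and \- for a literal hyphen) and lists the {variable} placeholders a template references. The placeholder regex is an assumption (single-brace names with no nested braces); the service's actual template parsing may differ, for example in how it treats escaped braces.

```python
import re
from typing import List

# Pattern for TemplateConfig.name: letters, digits, underscores,
# whitespace, hyphens, ampersands, and parentheses.
EVAL_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_\s\-&()]+$")

# Assumption: a placeholder is "{name}" with no braces inside the name.
PLACEHOLDER_PATTERN = re.compile(r"\{([^{}]+)\}")


def is_valid_eval_name(name: str) -> bool:
    """Return True if `name` satisfies the eval column name pattern."""
    return EVAL_NAME_PATTERN.match(name) is not None


def template_variables(template: str) -> List[str]:
    """Return placeholder names in first-seen order, without duplicates."""
    return list(dict.fromkeys(m.strip() for m in PLACEHOLDER_PATTERN.findall(template)))


print(is_valid_eval_name("Correctness (v2) & tone"))  # True
print(is_valid_eval_name("correctness/v2"))  # False
print(template_variables("Question: {input}\nAnswer: {output}\nAgain: {input}"))
```

Checking the placeholders this way makes it easy to confirm that every variable the prompt needs is actually present on the spans or traces being evaluated.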

create_code_evaluator(*, name: str, space: str, commit_message: str, code_config: CodeConfig | CustomCodeConfig | ManagedCodeConfig | dict, description: str | None = None) → EvaluatorWithVersion

Create a new code evaluator with an initial version.

The evaluator name must be unique within the given space.

Parameters:

name (str) – Evaluator name (must be unique within the space).
space (str) – Space name or ID to create the evaluator in.
commit_message (str) – Commit message for the initial version.
code_config (CodeConfig | CustomCodeConfig | ManagedCodeConfig | dict) – Code configuration for the evaluator. Accepts an arize.evaluators.types.CodeConfig wrapper, an unwrapped arize.evaluators.types.ManagedCodeConfig or arize.evaluators.types.CustomCodeConfig, or a plain dict matching one of those schemas.
description (str | None) – Optional human-readable description of the evaluator.

Returns:

The created evaluator with its initial version.

Raises:

ApiException – If the API request fails (for example, name conflict or invalid payload).

Return type:

EvaluatorWithVersion

update(*, evaluator: str, space: str | None = None, name: str | None = None, description: str | None = None) → Evaluator

Update an evaluator’s metadata.

Parameters:

evaluator (str) – Evaluator name or identifier (base64) to update.
space (str | None) – Optional space name or ID. Required when evaluator is a name rather than an ID.
name (str | None) – New evaluator name (must be unique within its space).
description (str | None) – New description for the evaluator.

Returns:

The updated evaluator.

Raises:

ApiException – If the API request fails.

Return type:

Evaluator

delete(*, evaluator: str, space: str | None = None) → None

Delete an evaluator and all its versions.

This operation is irreversible.

Parameters:

evaluator (str) – Evaluator name or identifier (base64) to delete.
space (str | None) – Optional space name or ID. Required when evaluator is a name rather than an ID.

Returns:

None.

Raises:

ApiException – If the API request fails (for example, evaluator not found).

Return type:

None

list_versions(*, evaluator: str, space: str | None = None, limit: int = 100, cursor: str | None = None) → EvaluatorVersionsList200Response

List all versions of an evaluator.

Results are returned with cursor-based pagination.

Parameters:

evaluator (str) – Evaluator name or identifier (base64) to list versions for.
space (str | None) – Optional space name or ID. Required when evaluator is a name rather than an ID.
limit (int) – Maximum number of versions to return (1-100).
cursor (str | None) – Opaque pagination cursor from a previous response.

Returns:

A paginated evaluator version list response.

Raises:

ApiException – If the API request fails.

Return type:

EvaluatorVersionsList200Response

get_version(*, version_id: str) → EvaluatorVersionCode | EvaluatorVersionTemplate

Get a specific evaluator version by its global ID.

Parameters:

version_id (str) – Evaluator version identifier (base64).

Returns:

The evaluator version: an EvaluatorVersionCode for code evaluators (with code_config already unwrapped), or an EvaluatorVersionTemplate for template evaluators.

Raises:

ApiException – If the API request fails (for example, version not found).

Return type:

EvaluatorVersionCode | EvaluatorVersionTemplate

create_template_version(*, evaluator: str, space: str | None = None, commit_message: str, template_config: TemplateConfig) → EvaluatorVersionTemplate

Create a new template version of an existing evaluator.

The new version becomes the latest version immediately (versioning is append-only). Versions are immutable once created; to change the configuration, create a new version.

Parameters:

evaluator (str) – Evaluator name or identifier (base64) to add a version to.
space (str | None) – Optional space name or ID. Required when evaluator is a name rather than an ID.
commit_message (str) – Commit message describing the changes in this version.
template_config (TemplateConfig) – Updated template configuration for this version. Build with arize.evaluators.types.TemplateConfig.

Returns:

The newly created evaluator version.

Raises:

ApiException – If the API request fails.

Return type:

EvaluatorVersionTemplate
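The versioning rules for create_template_version and create_code_version (append-only, immutable once created, newest version becomes the latest immediately) can be pictured with a small in-memory model. This is a conceptual sketch only, not how the SDK or the Arize service stores versions; the Version and VersionLog classes and their fields are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import List


@dataclass(frozen=True)
class Version:
    """Hypothetical stand-in for an evaluator version; frozen = immutable."""
    number: int
    commit_message: str
    template: str  # stands in for the version's template or code configuration


@dataclass
class VersionLog:
    """Append-only history: new versions are added, existing ones never edited."""
    versions: List[Version] = field(default_factory=list)

    def append(self, commit_message: str, template: str) -> Version:
        version = Version(len(self.versions) + 1, commit_message, template)
        self.versions.append(version)
        return version

    @property
    def latest(self) -> Version:
        # The most recently appended version is the latest one immediately.
        return self.versions[-1]


log = VersionLog()
v1 = log.append("initial version", "Is {output} correct?")
v2 = log.append("tighten wording", "Is {output} a correct answer to {input}?")
print(log.latest.number)  # 2

try:
    v1.template = "edited in place"  # not allowed: change config via a new version
except FrozenInstanceError:
    print("versions are immutable")
```

The practical consequence is that "editing" an evaluator's configuration always means creating a new version with a commit message, while earlier versions stay retrievable by ID.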

create_code_version(*, evaluator: str, space: str | None = None, commit_message: str, code_config: CodeConfig | CustomCodeConfig | ManagedCodeConfig | dict) → EvaluatorVersionCode

Create a new code version of an existing evaluator.

The new version becomes the latest version immediately (versioning is append-only). Versions are immutable once created; to change the configuration, create a new version.

Parameters:

evaluator (str) – Evaluator name or identifier (base64) to add a version to.
space (str | None) – Optional space name or ID. Required when evaluator is a name rather than an ID.
commit_message (str) – Commit message describing the changes in this version.
code_config (CodeConfig | CustomCodeConfig | ManagedCodeConfig | dict) – Updated code configuration for this version. Accepts an arize.evaluators.types.CodeConfig wrapper, an unwrapped arize.evaluators.types.ManagedCodeConfig or arize.evaluators.types.CustomCodeConfig, or a plain dict matching one of those schemas.

Returns:

The newly created evaluator version.

Raises:

ApiException – If the API request fails.

Return type:

EvaluatorVersionCode

Response Types

class Evaluator(*, id: Annotated[str, Strict(strict=True)], name: Annotated[str, Strict(strict=True)], description: Annotated[str, Strict(strict=True)] | None = None, type: EvaluatorType, space_id: Annotated[str, Strict(strict=True)], created_at: datetime, updated_at: datetime, created_by_user_id: Annotated[str, Strict(strict=True)] | None)

Bases: BaseModel

An evaluator defines reusable evaluation logic that can be attached to evaluation tasks. The type field determines the kind of evaluation: template (LLM-based template evaluation) or code (custom code evaluation).

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

id (Annotated[str, Strict(strict=True)])
name (Annotated[str, Strict(strict=True)])
description (Annotated[str, Strict(strict=True)] | None)
type (EvaluatorType)
space_id (Annotated[str, Strict(strict=True)])
created_at (datetime)
updated_at (datetime)
created_by_user_id (Annotated[str, Strict(strict=True)] | None)

id: StrictStr

name: StrictStr

description: StrictStr | None

type: EvaluatorType

space_id: StrictStr

created_at: datetime

updated_at: datetime

created_by_user_id: StrictStr | None

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'protected_namespaces': (), 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

to_str() → str

Return the string representation of the model, using field aliases.

Return type: str

to_json() → str

Return the JSON representation of the model, using field aliases.

Return type: str

classmethod from_json(json_str: str) → Self | None

Create an instance of Evaluator from a JSON string.

Parameters: json_str (str)
Return type: Self | None

to_dict() → Dict[str, Any]

Return the dictionary representation of the model, using field aliases.

This differs from calling pydantic's self.model_dump(by_alias=True) in one respect:

None is only added to the output dict for nullable fields that were set at model initialization. Other fields with value None are omitted.

Return type: Dict[str, Any]
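The None-handling rule of to_dict is easiest to see on a concrete record. The following sketch reproduces only that rule on a plain dict; it is not the generated implementation and ignores aliases. In pydantic v2, the set of fields explicitly passed at initialization is available as model_fields_set; here it is passed in by hand. The helper name and the field values are made up for illustration.

```python
from typing import Any, Dict, Iterable


def drop_unset_nones(
    values: Dict[str, Any],
    nullable: Iterable[str],
    fields_set: Iterable[str],
) -> Dict[str, Any]:
    """Keep non-None values; keep None only for nullable fields set at init."""
    nullable_names, set_names = set(nullable), set(fields_set)
    return {
        name: value
        for name, value in values.items()
        if value is not None or (name in nullable_names and name in set_names)
    }


record = {"id": "ev-1", "name": "tone", "description": None, "created_by_user_id": None}
result = drop_unset_nones(
    record,
    nullable={"description", "created_by_user_id"},  # Evaluator's `| None` fields
    fields_set={"id", "name", "created_by_user_id"},  # description was never passed
)
print(result)  # {'id': 'ev-1', 'name': 'tone', 'created_by_user_id': None}
```

An explicitly passed None therefore survives serialization, while a None that merely came from a default does not.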

classmethod from_dict(obj: Dict[str, Any] | None) → Self | None

Create an instance of Evaluator from a dict.

Parameters: obj (Dict[str, Any] | None)
Return type: Self | None

class EvaluatorLlmConfig(*, ai_integration_id: Annotated[str, Strict(strict=True)], model_name: Annotated[str, Strict(strict=True)], invocation_parameters: InvocationParams, provider_parameters: ProviderParams)

Bases: BaseModel

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

ai_integration_id (Annotated[str, Strict(strict=True)])
model_name (Annotated[str, Strict(strict=True)])
invocation_parameters (InvocationParams)
provider_parameters (ProviderParams)

ai_integration_id: StrictStr

model_name: StrictStr

invocation_parameters: InvocationParams

provider_parameters: ProviderParams

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'protected_namespaces': (), 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

to_str() → str

Return the string representation of the model, using field aliases.

Return type: str

to_json() → str

Return the JSON representation of the model, using field aliases.

Return type: str

classmethod from_json(json_str: str) → Self | None

Create an instance of EvaluatorLlmConfig from a JSON string.

Parameters: json_str (str)
Return type: Self | None

to_dict() → Dict[str, Any]

Return the dictionary representation of the model, using field aliases.

This differs from calling pydantic's self.model_dump(by_alias=True) in one respect:

None is only added to the output dict for nullable fields that were set at model initialization. Other fields with value None are omitted.

Return type: Dict[str, Any]

classmethod from_dict(obj: Dict[str, Any] | None) → Self | None

Create an instance of EvaluatorLlmConfig from a dict.

Parameters: obj (Dict[str, Any] | None)
Return type: Self | None

class EvaluatorWithVersion(*, id: str, name: str, description: str | None = None, type: EvaluatorType, space_id: str, created_at: datetime, updated_at: datetime, created_by_user_id: str | None = None, version: EvaluatorVersionCode | EvaluatorVersionTemplate)

Bases: BaseModel

SDK view of the generated EvaluatorWithVersion with version unwrapped.

The version field holds the concrete inner type (EvaluatorVersionCode for code evaluators, or EvaluatorVersionTemplate for template evaluators) instead of the oneOf wrapper.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

id (str)
name (str)
description (str | None)
type (EvaluatorType)
space_id (str)
created_at (datetime)
updated_at (datetime)
created_by_user_id (str | None)
version (EvaluatorVersionCode | EvaluatorVersionTemplate)

id: str

name: str

description: str | None

type: EvaluatorType

space_id: str

created_at: datetime

updated_at: datetime

created_by_user_id: str | None

version: EvaluatorVersionCode | EvaluatorVersionTemplate

model_config: ClassVar[ConfigDict] = {'from_attributes': True}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

class EvaluatorsList200Response(*, evaluators: List[Evaluator], pagination: PaginationMetadata)

Bases: BaseModel

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

evaluators (List[Evaluator])
pagination (PaginationMetadata)

evaluators: List[Evaluator]

pagination: PaginationMetadata

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'protected_namespaces': (), 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

to_str() → str

Return the string representation of the model, using field aliases.

Return type: str

to_json() → str

Return the JSON representation of the model, using field aliases.

Return type: str

classmethod from_json(json_str: str) → Self | None

Create an instance of EvaluatorsList200Response from a JSON string.

Parameters: json_str (str)
Return type: Self | None

to_dict() → Dict[str, Any]

Return the dictionary representation of the model, using field aliases.

This differs from calling pydantic's self.model_dump(by_alias=True) in one respect:

None is only added to the output dict for nullable fields that were set at model initialization. Other fields with value None are omitted.

Return type: Dict[str, Any]

classmethod from_dict(obj: Dict[str, Any] | None) → Self | None

Create an instance of EvaluatorsList200Response from a dict.

Parameters: obj (Dict[str, Any] | None)
Return type: Self | None

to_df(by_alias: bool = False, exclude_none: str | bool = True, json_normalize: bool = False, convert_dtypes: bool = True, expand_field: str = 'additional_properties', expand_prefix: str = '') → pd.DataFrame

Convert a list of objects to a pandas.DataFrame.

Behavior:

If an item is a Pydantic v2 model, use .model_dump(by_alias=…).
If an item is a mapping (dict-like), use it as-is.
Otherwise, raise a ValueError (unsupported row type).

Parameters:

self (object) – The object instance containing the field to convert.
by_alias (bool) – Use field aliases when dumping Pydantic models.
exclude_none (str | bool) – Control dropping of None/NaN columns:
- False: keep None values as-is.
- "all": drop columns where all values are None/NaN.
- "any": drop columns where any value is None/NaN.
- True: alias for "all".
json_normalize (bool) – If True, flatten nested dicts via pandas.json_normalize.
convert_dtypes (bool) – If True, call DataFrame.convert_dtypes() at the end.
expand_field (str) – If set, look for this field in each row and expand its keys into top-level columns.
expand_prefix (str) – If set, prefix expanded column names with this string.

Returns:

The converted DataFrame.

Return type:

pandas.DataFrame
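The exclude_none options map onto a simple per-column rule. The sketch below computes which columns survive for each option on plain row dicts, treating a key missing from a row as None (pandas would fill it with NaN). It mirrors the documented behavior, roughly what pandas' DataFrame.dropna(axis=1, how=...) does, but it is not the SDK's implementation; the helper name and sample rows are illustrative.

```python
import math
from typing import Any, Dict, List, Union


def _is_missing(value: Any) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


def columns_kept(rows: List[Dict[str, Any]], exclude_none: Union[str, bool] = True) -> List[str]:
    """Return the columns that survive under the documented exclude_none rules."""
    columns = list(dict.fromkeys(key for row in rows for key in row))
    if exclude_none is False:
        return columns  # keep None values as-is
    mode = "all" if exclude_none is True else exclude_none
    if mode not in ("all", "any"):
        raise ValueError(f"unsupported exclude_none value: {exclude_none!r}")
    drop_if = all if mode == "all" else any
    return [col for col in columns if not drop_if(_is_missing(row.get(col)) for row in rows)]


rows = [
    {"name": "tone", "description": None, "score": 1.0},
    {"name": "safety", "description": None, "score": float("nan")},
]
print(columns_kept(rows, False))  # ['name', 'description', 'score']
print(columns_kept(rows, "all"))  # ['name', 'score']
print(columns_kept(rows, "any"))  # ['name']
```

With the default (True, i.e. "all"), optional fields that no evaluator in the page has set simply disappear from the DataFrame, while partially populated columns are kept.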

class EvaluatorVersionsList200Response(*, evaluator_versions: list[EvaluatorVersionCode | EvaluatorVersionTemplate], pagination: PaginationMetadata)

Bases: BaseModel

SDK view of the generated EvaluatorVersionsList200Response with each version unwrapped.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

evaluator_versions (list[EvaluatorVersionCode | EvaluatorVersionTemplate])
pagination (PaginationMetadata)

evaluator_versions: list[EvaluatorVersionCode | EvaluatorVersionTemplate]

pagination: PaginationMetadata

model_config: ClassVar[ConfigDict] = {'from_attributes': True}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

to_df(by_alias: bool = False, exclude_none: str | bool = True, json_normalize: bool = False, convert_dtypes: bool = True, expand_field: str = 'additional_properties', expand_prefix: str = '') → pd.DataFrame

Convert a list of objects to a pandas.DataFrame.

Behavior:

If an item is a Pydantic v2 model, use .model_dump(by_alias=…).
If an item is a mapping (dict-like), use it as-is.
Otherwise, raise a ValueError (unsupported row type).

Parameters:

self (object) – The object instance containing the field to convert.
by_alias (bool) – Use field aliases when dumping Pydantic models.
exclude_none (str | bool) – Control dropping of None/NaN columns:
- False: keep None values as-is.
- "all": drop columns where all values are None/NaN.
- "any": drop columns where any value is None/NaN.
- True: alias for "all".
json_normalize (bool) – If True, flatten nested dicts via pandas.json_normalize.
convert_dtypes (bool) – If True, call DataFrame.convert_dtypes() at the end.
expand_field (str) – If set, look for this field in each row and expand its keys into top-level columns.
expand_prefix (str) – If set, prefix expanded column names with this string.

Returns:

The converted DataFrame.

Return type:

pandas.DataFrame
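expand_field and json_normalize both reshape each row before the DataFrame is built. The sketch below shows the two transformations on plain dicts: lifting the keys of a nested field (by default additional_properties) into prefixed top-level columns, and flattening nested dicts into dotted column names, which matches pandas.json_normalize's default "." separator. Whether to_df drops the original field after expanding it, and how it resolves name collisions, are assumptions here; the helper names and sample row are illustrative.

```python
from typing import Any, Dict


def expand_row(
    row: Dict[str, Any],
    expand_field: str = "additional_properties",
    expand_prefix: str = "",
) -> Dict[str, Any]:
    """Move the keys of row[expand_field] into top-level columns.

    Assumption: the original field is dropped and expanded keys overwrite
    existing columns of the same name.
    """
    out = {key: value for key, value in row.items() if key != expand_field}
    for key, value in (row.get(expand_field) or {}).items():
        out[f"{expand_prefix}{key}"] = value
    return out


def flatten(obj: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into 'outer.inner' keys, like json_normalize's default."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


row = {"id": "v1", "additional_properties": {"team": "qa"}, "meta": {"model": {"name": "m"}}}
print(expand_row(row, expand_prefix="extra_"))
print(flatten(expand_row(row)))  # {'id': 'v1', 'meta.model.name': 'm', 'team': 'qa'}
```

A non-empty expand_prefix is a cheap way to keep expanded keys from colliding with the model's own field names.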

class TemplateConfig(*, name: Annotated[str, Strict(strict=True)], template: Annotated[str, Strict(strict=True)], include_explanations: Annotated[bool, Strict(strict=True)], use_function_calling_if_available: Annotated[bool, Strict(strict=True)], classification_choices: Dict[str, Annotated[float, Strict(strict=True)] | Annotated[int, Strict(strict=True)]] | None = None, direction: OptimizationDirection | None = OptimizationDirection.NONE, data_granularity: DataGranularity | None = None, llm_config: EvaluatorLlmConfig)

Bases: BaseModel

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

name (Annotated[str, Strict(strict=True)])
template (Annotated[str, Strict(strict=True)])
include_explanations (Annotated[bool, Strict(strict=True)])
use_function_calling_if_available (Annotated[bool, Strict(strict=True)])
classification_choices (Dict[str, Annotated[float, Strict(strict=True)] | Annotated[int, Strict(strict=True)]] | None)
direction (OptimizationDirection | None)
data_granularity (DataGranularity | None)
llm_config (EvaluatorLlmConfig)

name: StrictStr

template: StrictStr

include_explanations: StrictBool

use_function_calling_if_available: StrictBool

classification_choices: Dict[str, StrictFloat | StrictInt] | None

direction: OptimizationDirection | None

data_granularity: DataGranularity | None

llm_config: EvaluatorLlmConfig

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'protected_namespaces': (), 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.

to_str() → str

Return the string representation of the model, using field aliases.

Return type: str

to_json() → str

Return the JSON representation of the model, using field aliases.

Return type: str

classmethod from_json(json_str: str) → Self | None

Create an instance of TemplateConfig from a JSON string.

Parameters: json_str (str)
Return type: Self | None

to_dict() → Dict[str, Any]

Return the dictionary representation of the model, using field aliases.

This differs from calling pydantic's self.model_dump(by_alias=True) in one respect:

None is only added to the output dict for nullable fields that were set at model initialization. Other fields with value None are omitted.

Return type: Dict[str, Any]

classmethod from_dict(obj: Dict[str, Any] | None) → Self | None

Create an instance of TemplateConfig from a dict.

Parameters: obj (Dict[str, Any] | None)
Return type: Self | None
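TemplateConfig.classification_choices is typed Dict[str, StrictFloat | StrictInt] | None, so labels must be strings and scores must be real numbers. Pydantic's strict mode does not coerce, which means numeric strings such as "1" and booleans (a subclass of int in Python) are rejected. The sketch below performs an approximate pre-check in plain Python for early feedback; TemplateConfig's own validation remains authoritative, and the function name and labels are illustrative.

```python
from typing import Any


def is_valid_classification_choices(choices: Any) -> bool:
    """Approximate Dict[str, StrictFloat | StrictInt] | None without pydantic."""
    if choices is None:
        return True  # the field is optional
    if not isinstance(choices, dict):
        return False
    return all(
        isinstance(label, str)
        and isinstance(score, (int, float))
        and not isinstance(score, bool)  # bool is an int subclass; strict mode rejects it
        for label, score in choices.items()
    )


print(is_valid_classification_choices({"correct": 1, "incorrect": 0}))  # True
print(is_valid_classification_choices({"relevant": 0.5}))  # True
print(is_valid_classification_choices({"correct": "1"}))  # False
print(is_valid_classification_choices({"correct": True}))  # False
```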