Datasets#

class DatasetsClient(*, sdk_config: SDKConfiguration, generated_client: ApiClient)[source]#

Bases: object

Client for managing datasets including creation, retrieval, and example management.

This class is primarily intended for internal use within the SDK. Users are strongly encouraged to access resource-specific functionality via arize.ArizeClient.

The datasets client is a thin wrapper around the generated REST API client and reuses the shared generated API client owned by arize.config.SDKConfiguration.

Parameters:

sdk_config (SDKConfiguration) – Resolved SDK configuration.
generated_client (ApiClient) – Shared generated API client instance.

list(*, name: str | None = None, space: str | None = None, limit: int = 100, cursor: str | None = None) → DatasetListResponse[source]#

List datasets the user has access to.

Datasets are returned in descending creation order (most recently created first). Dataset versions are not included in this response; use get() to retrieve a dataset along with its versions.

Parameters:

name (str | None) – Optional case-insensitive substring filter on the dataset name.
space (str | None) – Optional space filter. If the value is a base64-encoded resource ID it is treated as a space ID; otherwise it is used as a case-insensitive substring filter on the space name.
limit (int) – Maximum number of datasets to return. The server enforces an upper bound.
cursor (str | None) – Opaque pagination cursor returned from a previous response.

Returns:

A response object with the datasets and pagination information.

Raises:

ApiException – If the REST API returns an error response (e.g. 401/403/429).

Return type:

DatasetListResponse
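
Example (a minimal sketch of following the pagination cursor): the response attribute names used here (datasets, pagination.next_cursor) are assumptions for illustration and are not confirmed field names of DatasetListResponse. FakeDatasetsClient is a hypothetical in-memory stand-in that mirrors the documented keyword arguments so the loop can run without network access.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Pagination:
    # Assumed shape: the real pagination object may use different names.
    next_cursor: Optional[str] = None


@dataclass
class ListPage:
    # Hypothetical stand-in for DatasetListResponse.
    datasets: list = field(default_factory=list)
    pagination: Pagination = field(default_factory=Pagination)


class FakeDatasetsClient:
    """In-memory stand-in exposing the documented list() keyword arguments."""

    def __init__(self, names):
        self._names = [str(n) for n in names]

    def list(self, *, name=None, space=None, limit=100, cursor=None):
        # Case-insensitive substring filter on the dataset name, as documented.
        matches = [n for n in self._names if name is None or name.lower() in n.lower()]
        start = int(cursor) if cursor else 0
        end = start + limit
        next_cursor = str(end) if end < len(matches) else None
        return ListPage(matches[start:end], Pagination(next_cursor))


def iter_datasets(client, *, name=None, space=None, limit=100) -> Iterator:
    """Yield every dataset by requesting pages until no cursor is returned."""
    cursor = None
    while True:
        page = client.list(name=name, space=space, limit=limit, cursor=cursor)
        yield from page.datasets
        cursor = page.pagination.next_cursor
        if not cursor:
            break


client = FakeDatasetsClient(["qa-set", "QA-regression", "summaries"])
all_names = [n for n in iter_datasets(client, limit=2)]
qa_names = [n for n in iter_datasets(client, name="qa", limit=1)]
```

The cursor is treated as opaque by iter_datasets; only the stand-in client interprets it.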

create(*, name: str, space: str, examples: builtins.list[dict[str, object]] | pd.DataFrame, force_http: bool = False) → Dataset[source]#

Create a dataset with JSON examples.

Empty datasets are not allowed.

Payload notes (server-enforced):

name must be unique within the given space.
Each example may contain arbitrary user-defined fields.
Do not include system-managed fields on create: id, created_at, updated_at (requests containing these fields will be rejected).
Each example must contain at least one property (i.e. {} is invalid).

Transport selection:

If the payload is below the configured REST payload threshold (or force_http=True), this method uploads via REST.
Otherwise, it attempts a more efficient upload path via gRPC + Flight.
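
The selection rule above can be sketched as follows. This is an illustration only: choose_transport is a hypothetical helper, the threshold value is made up (the real one comes from the SDK configuration), and measuring the payload as serialized JSON bytes is an assumption about how size is computed.

```python
import json

# Hypothetical threshold; the real value comes from the SDK configuration.
REST_PAYLOAD_THRESHOLD_BYTES = 1_000_000


def choose_transport(examples, *, force_http=False, threshold=REST_PAYLOAD_THRESHOLD_BYTES):
    """Return "rest" or "flight" following the documented selection rule."""
    # Assumption: payload size is approximated by the serialized JSON length.
    payload_size = len(json.dumps(examples).encode("utf-8"))
    if force_http or payload_size < threshold:
        return "rest"
    return "flight"


small = [{"question": "2+2?", "answer": "4"}]
large = [{"text": "x" * 100} for _ in range(50)]

small_choice = choose_transport(small, threshold=1_000)
large_choice = choose_transport(large, threshold=1_000)
forced_choice = choose_transport(large, force_http=True, threshold=1_000)
```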

Parameters:

name (str) – Dataset name (must be unique within the target space).
space (str) – Space ID or name to create the dataset in.
examples (builtins.list[dict[str, object]] | pd.DataFrame) – Dataset examples, given as either a list of JSON-like dicts or a pandas.DataFrame (converted to records for REST upload).
force_http (bool) – If True, force REST upload even if the payload exceeds the configured REST payload threshold.

Returns:

The created dataset object as returned by the API.

Raises:

TypeError – If examples is not a list of dicts or a pandas.DataFrame.
RuntimeError – If the Flight upload path is selected and the Flight request fails.
ApiException – If the REST API returns an error response (e.g. 400/401/403/409/429).

Return type:

Dataset
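
Example (a hedged sketch of client-side pre-validation): the payload notes for create() are enforced by the server, but checking them locally can surface mistakes before a request is sent. validate_examples and is_valid are hypothetical helpers, not part of the SDK. They only handle the list-of-dicts form, and the use of ValueError for rule violations is a choice made here.

```python
SYSTEM_MANAGED_FIELDS = {"id", "created_at", "updated_at"}


def validate_examples(examples):
    """Raise early for payloads the documented server rules would reject."""
    if not isinstance(examples, list) or not all(isinstance(e, dict) for e in examples):
        raise TypeError("examples must be a list of dicts")
    if not examples:
        raise ValueError("empty datasets are not allowed")
    for index, example in enumerate(examples):
        if not example:
            raise ValueError(f"example {index} has no properties")
        reserved = SYSTEM_MANAGED_FIELDS.intersection(example)
        if reserved:
            raise ValueError(
                f"example {index} contains system-managed fields: {sorted(reserved)}"
            )
    return examples


def is_valid(examples):
    """Convenience wrapper returning a boolean instead of raising."""
    try:
        validate_examples(examples)
    except (TypeError, ValueError):
        return False
    return True


ok = is_valid([{"question": "2+2?", "answer": "4"}])
empty_dataset = is_valid([])
empty_example = is_valid([{}])
reserved_field = is_valid([{"id": "abc", "question": "2+2?"}])
```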

get(*, dataset: str, space: str | None = None) → Dataset[source]#

Get a dataset by ID or name.

The returned dataset includes its dataset versions (sorted by creation time, most recent first). Dataset examples are not included; use list_examples() to retrieve examples.

Parameters:

dataset (str) – Dataset ID or name.
space (str | None) – Space ID or name. Required when dataset is a name.

Returns:

The dataset object.

Raises:

ApiException – If the REST API returns an error response (e.g. 401/403/404/429).

Return type:

Dataset
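
Example (a minimal sketch of picking a version to pin): because get() returns the dataset's versions, a caller can select one and pass its ID as dataset_version_id to list_examples() for reproducible reads. VersionStub is a hypothetical stand-in for DatasetVersion, and its field names (id, created_at) are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class VersionStub:
    # Hypothetical stand-in for DatasetVersion; field names are assumed.
    id: str
    created_at: datetime


def latest_version_id(versions: Optional[List[VersionStub]]) -> Optional[str]:
    """Return the newest version ID, or None when no versions are present."""
    if not versions:
        return None
    # get() documents most-recent-first ordering; max() avoids relying on it.
    return max(versions, key=lambda v: v.created_at).id


versions = [
    VersionStub("v2", datetime(2024, 2, 1, tzinfo=timezone.utc)),
    VersionStub("v1", datetime(2024, 1, 1, tzinfo=timezone.utc)),
]
pinned = latest_version_id(versions)
```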

delete(*, dataset: str, space: str | None = None) → None[source]#

Delete a dataset by ID or name.

This operation is irreversible.

Parameters:

dataset (str) – Dataset ID or name.
space (str | None) – Space ID or name. Required when dataset is a name.

Returns:

Returns None on success (the API typically responds with an empty 204).

Raises:

ApiException – If the REST API returns an error response (e.g. 401/403/404/429).

Return type:

None

update(*, dataset: str, space: str | None = None, name: str) → Dataset[source]#

Rename a dataset.

Parameters:

dataset (str) – Dataset ID or name.
space (str | None) – Space ID or name. Required when dataset is a name.
name (str) – New name for the dataset. Must be unique within the space.

Returns:

The updated dataset object.

Raises:

ApiException – If the REST API returns an error response (e.g. 400/401/403/404/409/429).

Return type:

Dataset

list_examples(*, dataset: str, space: str | None = None, dataset_version_id: str | None = None, limit: int = 100, all: bool = False) → DatasetExampleListResponse[source]#

List examples for a dataset (optionally for a specific version).

If dataset_version_id is not provided (None or an empty string), the server selects the latest dataset version.

Pagination notes:

The response includes pagination for forward compatibility.
Cursor pagination may not be fully implemented by the server yet.
If all=True, this method retrieves all examples via the Flight path, and returns them in a single response with has_more=False.

Parameters:

dataset (str) – Dataset ID or name.
space (str | None) – Space ID or name. Required when dataset is a name.
dataset_version_id (str | None) – Dataset version ID. If None or empty, the latest version is selected.
limit (int) – Maximum number of examples to return when all=False. The server enforces an upper bound.
all (bool) – If True, fetch all examples (ignores limit) via Flight and return a single response.

Returns:

A response object containing examples and pagination metadata.

Raises:

RuntimeError – If the Flight request fails or returns no response when all=True.
ApiException – If the REST API returns an error response when all=False (e.g. 401/403/404/429).

Return type:

DatasetExampleListResponse

append_examples(*, dataset: str, space: str | None = None, dataset_version_id: str = '', examples: builtins.list[dict[str, object]] | pd.DataFrame) → models.DatasetVersionWithExampleIds[source]#

Append new examples to an existing dataset.

This method adds examples to an existing dataset version. If dataset_version_id is not provided (empty string), the server appends the examples to the latest dataset version.

The inserted examples are assigned system-generated IDs by the server. The response includes those IDs in example_ids and the version they were written to in dataset_version_id.

Payload requirements (server-enforced):

Each example may contain arbitrary user-defined fields.
Do not include system-managed fields on input: id, created_at, updated_at (requests containing these fields will be rejected).
Each example must contain at least one property (i.e. empty examples are invalid).

Parameters:

dataset (str) – Dataset ID or name.
space (str | None) – Space ID or name. Required when dataset is a name.
dataset_version_id (str) – Optional dataset version ID to append examples to. If empty, the latest dataset version is selected.
examples (builtins.list[dict[str, object]] | pd.DataFrame) – Examples to append, given as either a list of JSON-like dicts or a pandas.DataFrame (converted to records before upload).

Returns:

A DatasetVersionWithExampleIds containing the dataset attributes, the version the examples were written to (dataset_version_id), and the IDs of the inserted examples (example_ids).

Raises:

AssertionError – If examples is not a list of dicts or a pandas.DataFrame.
ApiException – If the REST API returns an error response (e.g. 400/401/403/404/429).

Return type:

models.DatasetVersionWithExampleIds

annotate_examples(*, dataset: str, space: str | None = None, annotations: builtins.list[models.AnnotateRecordInput]) → None[source]#

Write human annotations to a batch of examples in a dataset.

Annotations are upserted by annotation config name for each example. Submitting the same annotation config name for the same example overwrites the previous value. Retrying on network failure will not create duplicates.

Up to 1000 examples may be annotated per request.

The write completes synchronously before the function returns (the API responds with HTTP 202 Accepted), but the annotations may take a short interval to become visible in read queries.

Parameters:

dataset (str) – Dataset ID or name.
space (str | None) – Space ID or name. Required when dataset is a name.
annotations (builtins.list[models.AnnotateRecordInput]) – A list of AnnotateRecordInput items. Each item must include a record_id (the dataset example ID) and values (a list of AnnotationInput items with name, and optionally score, label, or text).

Raises:

ApiException – If the REST API returns an error response (e.g. 400/401/403/404/429).

Return type:

None
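
Example (a hedged sketch of batching for the 1000-example limit): larger annotation sets can be split into chunks of at most 1000 records and sent one request per chunk. AnnotationValue and RecordAnnotation are hypothetical dataclass stand-ins for AnnotationInput and AnnotateRecordInput so that the snippet runs without the SDK. batched is a helper introduced here.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Sequence

MAX_RECORDS_PER_REQUEST = 1000  # Documented per-request limit.


@dataclass
class AnnotationValue:
    # Hypothetical stand-in for AnnotationInput.
    name: str
    score: Optional[float] = None
    label: Optional[str] = None
    text: Optional[str] = None


@dataclass
class RecordAnnotation:
    # Hypothetical stand-in for AnnotateRecordInput.
    record_id: str
    values: List[AnnotationValue]


def batched(items: Sequence, size: int = MAX_RECORDS_PER_REQUEST) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


annotations = [
    RecordAnnotation(f"example-{i}", [AnnotationValue(name="quality", score=1.0)])
    for i in range(2500)
]
batches = list(batched(annotations))
batch_sizes = [len(batch) for batch in batches]
# Each batch would then be passed as the `annotations` argument of one
# annotate_examples() call.
```

Because annotations are upserted by annotation config name, re-sending a batch after a network failure should not create duplicates.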

Response Types#

class Dataset(*, id: Annotated[str, Strict(strict=True)], name: Annotated[str, Strict(strict=True)], space_id: Annotated[str, Strict(strict=True)], created_at: datetime, updated_at: datetime, versions: List[DatasetVersion] | None = None)[source]#

Bases: BaseModel

A dataset is a structured collection of examples used to test and evaluate LLM applications. Datasets allow you to test models consistently across real-world scenarios and edge cases, quickly identify regressions, and track measurable improvements.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

id (Annotated[str, Strict(strict=True)])
name (Annotated[str, Strict(strict=True)])
space_id (Annotated[str, Strict(strict=True)])
created_at (datetime)
updated_at (datetime)
versions (List[DatasetVersion] | None)

id: StrictStr#

name: StrictStr#

space_id: StrictStr#

created_at: datetime#

updated_at: datetime#

versions: List[DatasetVersion] | None#

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'protected_namespaces': (), 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

to_str() → str[source]#

Returns the string representation of the model, using field aliases.

Return type: str

to_json() → str[source]#

Returns the JSON representation of the model, using field aliases.

Return type: str

classmethod from_json(json_str: str) → Self | None[source]#

Create an instance of Dataset from a JSON string.

Parameters: json_str (str)
Return type: Self | None

to_dict() → Dict[str, Any][source]#

Return the dictionary representation of the model, using field aliases.

This differs from calling pydantic’s self.model_dump(by_alias=True) in one respect: None is only added to the output dict for nullable fields that were set at model initialization. Other fields with value None are omitted.

Return type: Dict[str, Any]
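
Example (a minimal sketch of the None handling described for to_dict()): this is a plain-Python illustration of the rule, not the generated implementation. to_dict_sketch, the field names, and the nullable field "note" are all hypothetical.

```python
def to_dict_sketch(values, nullable_fields, fields_set):
    """Mimic the described None handling for a flat mapping of field values."""
    result = {}
    for name, value in values.items():
        if value is not None:
            result[name] = value
        elif name in nullable_fields and name in fields_set:
            # Nullable field that was explicitly set: keep the None.
            result[name] = None
    return result


values = {"id": "ds_1", "name": "qa-set", "note": None}

# "note" was left at its default, so it is dropped from the output.
unset = to_dict_sketch(values, nullable_fields={"note"}, fields_set={"id", "name"})

# "note" was passed explicitly as None, so the None is preserved.
explicit = to_dict_sketch(
    values, nullable_fields={"note"}, fields_set={"id", "name", "note"}
)
```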

classmethod from_dict(obj: Dict[str, Any] | None) → Self | None[source]#

Create an instance of Dataset from a dict.

Parameters: obj (Dict[str, Any] | None)
Return type: Self | None

class DatasetVersionWithExampleIds(*, id: Annotated[str, Strict(strict=True)], name: Annotated[str, Strict(strict=True)], space_id: Annotated[str, Strict(strict=True)], created_at: datetime, updated_at: datetime, dataset_version_id: Annotated[str, Strict(strict=True)], example_ids: List[Annotated[str, Strict(strict=True)]])[source]#

Bases: BaseModel

A dataset with the IDs of examples that were inserted or updated. Includes the version the examples were written to and the list of affected example IDs.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

id (Annotated[str, Strict(strict=True)])
name (Annotated[str, Strict(strict=True)])
space_id (Annotated[str, Strict(strict=True)])
created_at (datetime)
updated_at (datetime)
dataset_version_id (Annotated[str, Strict(strict=True)])
example_ids (List[Annotated[str, Strict(strict=True)]])

id: StrictStr#

name: StrictStr#

space_id: StrictStr#

created_at: datetime#

updated_at: datetime#

dataset_version_id: StrictStr#

example_ids: List[StrictStr]#

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'protected_namespaces': (), 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

to_str() → str[source]#

Returns the string representation of the model, using field aliases.

Return type: str

to_json() → str[source]#

Returns the JSON representation of the model, using field aliases.

Return type: str

classmethod from_json(json_str: str) → Self | None[source]#

Create an instance of DatasetVersionWithExampleIds from a JSON string.

Parameters: json_str (str)
Return type: Self | None

to_dict() → Dict[str, Any][source]#

Return the dictionary representation of the model, using field aliases.

This differs from calling pydantic’s self.model_dump(by_alias=True) in one respect: None is only added to the output dict for nullable fields that were set at model initialization. Other fields with value None are omitted.

Return type: Dict[str, Any]

classmethod from_dict(obj: Dict[str, Any] | None) → Self | None[source]#

Create an instance of DatasetVersionWithExampleIds from a dict.

Parameters: obj (Dict[str, Any] | None)
Return type: Self | None

class AnnotateRecordInput(*, record_id: Annotated[str, Strict(strict=True)], values: Annotated[List[AnnotationInput], MinLen(min_length=1)])[source]#

Bases: BaseModel

A single record to annotate in a batch, identified by its record ID.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

record_id (Annotated[str, Strict(strict=True)])
values (Annotated[List[AnnotationInput], MinLen(min_length=1)])

record_id: StrictStr#

values: Annotated[List[AnnotationInput], Field(min_length=1)]#

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'protected_namespaces': (), 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

to_str() → str[source]#

Returns the string representation of the model, using field aliases.

Return type: str

to_json() → str[source]#

Returns the JSON representation of the model, using field aliases.

Return type: str

classmethod from_json(json_str: str) → Self | None[source]#

Create an instance of AnnotateRecordInput from a JSON string.

Parameters: json_str (str)
Return type: Self | None

to_dict() → Dict[str, Any][source]#

Return the dictionary representation of the model, using field aliases.

This differs from calling pydantic’s self.model_dump(by_alias=True) in one respect: None is only added to the output dict for nullable fields that were set at model initialization. Other fields with value None are omitted.

Return type: Dict[str, Any]

classmethod from_dict(obj: Dict[str, Any] | None) → Self | None[source]#

Create an instance of AnnotateRecordInput from a dict.

Parameters: obj (Dict[str, Any] | None)
Return type: Self | None

class AnnotationInput(*, name: Annotated[str, Strict(strict=True)], score: Annotated[float, Strict(strict=True)] | Annotated[int, Strict(strict=True)] | None = None, label: Annotated[str, Strict(strict=True)] | None = None, text: Annotated[str, Strict(strict=True)] | None = None)[source]#

Bases: BaseModel

An annotation value to set on a record, identified by its annotation config name. Omitting a field leaves the existing value unchanged.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

name (Annotated[str, Strict(strict=True)])
score (Annotated[float, Strict(strict=True)] | Annotated[int, Strict(strict=True)] | None)
label (Annotated[str, Strict(strict=True)] | None)
text (Annotated[str, Strict(strict=True)] | None)

name: StrictStr#

score: StrictFloat | StrictInt | None#

label: StrictStr | None#

text: StrictStr | None#

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'protected_namespaces': (), 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

to_str() → str[source]#

Returns the string representation of the model, using field aliases.

Return type: str

to_json() → str[source]#

Returns the JSON representation of the model, using field aliases.

Return type: str

classmethod from_json(json_str: str) → Self | None[source]#

Create an instance of AnnotationInput from a JSON string.

Parameters: json_str (str)
Return type: Self | None

to_dict() → Dict[str, Any][source]#

Return the dictionary representation of the model, using field aliases.

This differs from calling pydantic’s self.model_dump(by_alias=True) in one respect: None is only added to the output dict for nullable fields that were set at model initialization. Other fields with value None are omitted.

Return type: Dict[str, Any]

classmethod from_dict(obj: Dict[str, Any] | None) → Self | None[source]#

Create an instance of AnnotationInput from a dict.

Parameters: obj (Dict[str, Any] | None)
Return type: Self | None