datasets#

Use datasets to curate spans from your LLM applications for testing. Read our quickstart guide.

To use in your code, import the following:

from arize.experimental.datasets import ArizeDatasetsClient

class ArizeDatasetsClient(developer_key, api_key, host='flight.arize.com', port=443, scheme='grpc+tls', otlp_endpoint='https://otlp.arize.com/v1')#

Bases: object

ArizeDatasetsClient is a client for interacting with the Arize Datasets API.

Parameters:
  • developer_key (str, required) – Arize-provided developer key associated with your user profile, located on the space settings page.

  • api_key (str, required) – Arize-provided API key associated with your user profile, located on the space settings page.

  • host (str, optional) – URI endpoint host to send your export request to Arize AI. Defaults to "flight.arize.com".

  • port (int, optional) – URI endpoint port to send your export request to Arize AI. Defaults to 443.

  • scheme (str, optional) – Transport scheme to use for the connection. Defaults to "grpc+tls".

  • otlp_endpoint (str, optional) – OTLP endpoint to send experiment traces to Arize. Defaults to "https://otlp.arize.com/v1".
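
A minimal instantiation sketch; the key values below are placeholders for the credentials found on your space settings page:

from arize.experimental.datasets import ArizeDatasetsClient

# Placeholder credentials -- copy the real values from the space settings page
client = ArizeDatasetsClient(
    developer_key="YOUR_DEVELOPER_KEY",
    api_key="YOUR_API_KEY",
)

The examples for the methods below reuse this client instance.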

create_dataset(space_id, dataset_name, dataset_type, data, convert_dict_to_json=True)#

Create a new dataset.

Parameters:
  • space_id (str) – The ID of the space where the dataset will be created.

  • dataset_name (str) – The name of the dataset.

  • dataset_type (DatasetType) – The type of the dataset.

  • data (pd.DataFrame) – The data to be included in the dataset.

  • convert_dict_to_json (bool, optional) – If True, convert dictionary columns to JSON strings for the default JSON string columns defined by OpenInference. Defaults to True.

Returns:

The ID of the created dataset, or None if the creation failed.

Return type:

str
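
A minimal sketch of creating a dataset from a DataFrame. It reuses the client instance from the constructor example; the space ID is a placeholder, and the GENERATIVE dataset-type constant is assumed to be importable from arize.experimental.datasets.utils.constants (adjust the import to your SDK version):

import pandas as pd

from arize.experimental.datasets.utils.constants import GENERATIVE  # assumed import path

# A toy dataset with one example
data = pd.DataFrame(
    {
        "input": ["What is Arize?"],
        "output": ["An AI observability and evaluation platform."],
    }
)
dataset_id = client.create_dataset(
    space_id="YOUR_SPACE_ID",  # placeholder
    dataset_name="my-first-dataset",
    dataset_type=GENERATIVE,
    data=data,
)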

delete_dataset(space_id, dataset_id='', dataset_name='')#

Delete a dataset.

Parameters:
  • space_id (str) – The ID of the space where the dataset is located.

  • dataset_id (str, optional) – The ID of the dataset to delete. Required if dataset_name is not provided.

  • dataset_name (str, optional) – The name of the dataset to delete. Required if dataset_id is not provided.

Returns:

True if the dataset was successfully deleted, False otherwise.

Return type:

bool

Raises:
  • ValueError – If neither dataset_id nor dataset_name is provided.

  • RuntimeError – If the request to delete the dataset fails.
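
A short usage sketch, reusing the client instance and the placeholder names from the examples above:

deleted = client.delete_dataset(
    space_id="YOUR_SPACE_ID",
    dataset_name="my-first-dataset",
)
print(deleted)  # True if the dataset was deleted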

get_dataset(space_id, dataset_id=None, dataset_name=None, dataset_version=None, convert_json_str_to_dict=True)#

Get the data of a dataset.

Parameters:
  • space_id (str) – The ID of the space where the dataset is located.

  • dataset_id (str, optional) – The ID of the dataset. Required if dataset_name is not provided.

  • dataset_name (str, optional) – The name of the dataset. Required if dataset_id is not provided.

  • dataset_version (str, optional) – The version name of the dataset. Defaults to "", which retrieves the latest version based on creation time.

  • convert_json_str_to_dict (bool, optional) – If True, convert JSON strings to Python dictionaries for the default JSON string columns defined by OpenInference. Defaults to True.

Returns:

The data of the dataset.

Return type:

pd.DataFrame
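
A short usage sketch, reusing the client instance and placeholder names from the examples above; omitting dataset_version fetches the latest version:

df = client.get_dataset(
    space_id="YOUR_SPACE_ID",
    dataset_name="my-first-dataset",
)
print(df.head())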

get_dataset_versions(space_id, dataset_id='', dataset_name='')#

Get versions information of a dataset.

Parameters:
  • space_id (str) – The ID of the space where the dataset is located.

  • dataset_id (str, optional) – The dataset ID to get versions info for. Required if dataset_name is not provided.

  • dataset_name (str, optional) – The name of the dataset to get versions info for. Required if dataset_id is not provided.

Returns:

A DataFrame containing version information for the dataset.

Return type:

pd.DataFrame

Raises:
  • ValueError – If neither dataset_id nor dataset_name is provided.

  • RuntimeError – If the request to get dataset versions fails.
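
A short usage sketch, reusing the client instance and placeholder names from the examples above:

versions_df = client.get_dataset_versions(
    space_id="YOUR_SPACE_ID",
    dataset_name="my-first-dataset",
)
print(versions_df)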

get_experiment(space_id, experiment_name=None, dataset_name=None, experiment_id=None)#

Retrieve experiment data from Arize.

Parameters:
  • space_id (str) – The ID of the space where the experiment is located.

  • experiment_name (Optional[str]) – The name of the experiment. Required if experiment_id is not provided.

  • dataset_name (Optional[str]) – The name of the dataset associated with the experiment. Required if experiment_id is not provided.

  • experiment_id (Optional[str]) – The ID of the experiment. Required if experiment_name and dataset_name are not provided.

Returns:

A pandas DataFrame containing the experiment data, or None if the retrieval fails.

Return type:

Optional[pd.DataFrame]

Raises:
  • ValueError – If neither experiment_id nor both experiment_name and dataset_name are provided.

  • RuntimeError – If the experiment retrieval fails.

Note

You must provide either the experiment_id or both the experiment_name and dataset_name.
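
A short usage sketch, reusing the client instance and placeholder names from the examples above and identifying the experiment by name plus dataset name:

experiment_df = client.get_experiment(
    space_id="YOUR_SPACE_ID",
    experiment_name="my_experiment",
    dataset_name="my-first-dataset",
)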

list_datasets(space_id)#

List all datasets in a space.

Parameters:

space_id (str) – The ID of the space to list datasets for.

Returns:

A table summary of the datasets in the space.

Return type:

pd.DataFrame
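
A short usage sketch, reusing the client instance and placeholder space ID from the examples above:

datasets_df = client.list_datasets(space_id="YOUR_SPACE_ID")
print(datasets_df)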

log_experiment(space_id, experiment_name, experiment_df, task_columns, evaluator_columns=None, dataset_id='', dataset_name='')#

Log an experiment to Arize.

Parameters:
  • space_id (str) – The ID of the space where the experiment will be logged.

  • experiment_name (str) – The name of the experiment.

  • experiment_df (pd.DataFrame) – The data to be logged.

  • task_columns (ExperimentTaskResultColumnNames) – The column names for task results.

  • evaluator_columns (Optional[Dict[str, EvaluationResultColumnNames]]) – A mapping from evaluator name to the column names for that evaluator's results. Defaults to None.

  • dataset_id (str, optional) – The ID of the dataset associated with the experiment. Required if dataset_name is not provided. Defaults to “”.

  • dataset_name (str, optional) – The name of the dataset associated with the experiment. Required if dataset_id is not provided. Defaults to “”.

Examples

>>> # Example DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame({
...     "example_id": ["1", "2"],
...     "result": ["success", "failure"],
...     "accuracy": [0.95, 0.85],
...     "ground_truth": ["A", "B"],
...     "explanation_text": ["Good match", "Poor match"],
...     "confidence": [0.9, 0.7],
...     "model_version": ["v1", "v2"],
...     "custom_metric": [0.8, 0.6],
... })
>>> # Column-mapping helpers (import path may vary by SDK version)
>>> from arize.experimental.datasets.experiments.types import (
...     ExperimentTaskResultColumnNames,
...     EvaluationResultColumnNames,
... )
>>> # Define column mappings for the task
>>> task_cols = ExperimentTaskResultColumnNames(
...     example_id="example_id", result="result"
... )
>>> # Define column mappings for the evaluator
>>> evaluator_cols = EvaluationResultColumnNames(
...     score="accuracy",
...     label="ground_truth",
...     explanation="explanation_text",
...     metadata={
...         "confidence": None,  # Will use the "confidence" column
...         "version": "model_version",  # Will use the "model_version" column
...         "custom_metric": None,  # Will use the "custom_metric" column
...     },
... )
>>> # Log the experiment through an ArizeDatasetsClient instance
>>> client = ArizeDatasetsClient(developer_key="...", api_key="...")
>>> client.log_experiment(
...     space_id="my_space_id",
...     experiment_name="my_experiment",
...     experiment_df=df,
...     task_columns=task_cols,
...     evaluator_columns={"my_evaluator": evaluator_cols},
...     dataset_name="my_dataset_name",
... )

Returns:

The ID of the logged experiment, or None if the logging failed.

Return type:

Optional[str]

run_experiment(space_id, experiment_name, task, dataset_df=None, dataset_id=None, dataset_name=None, evaluators=None, dry_run=False, concurrency=3, set_global_tracer_provider=False, exit_on_error=False)#

Run an experiment on a dataset and upload the results.

This function initializes an experiment, retrieves or uses a provided dataset, runs the experiment with specified tasks and evaluators, and uploads the results.

Parameters:
  • space_id (str) – The ID of the space where the experiment will be run.

  • experiment_name (str) – The name of the experiment.

  • task (ExperimentTask) – The task to be performed in the experiment.

  • dataset_df (Optional[pd.DataFrame], optional) – The dataset as a pandas DataFrame. If not provided, the dataset will be downloaded using dataset_id or dataset_name. Defaults to None.

  • dataset_id (Optional[str], optional) – The ID of the dataset to use. Required if dataset_df and dataset_name are not provided. Defaults to None.

  • dataset_name (Optional[str], optional) – The name of the dataset to use. Used if dataset_df and dataset_id are not provided. Defaults to None.

  • evaluators (Optional[Evaluators], optional) – The evaluators to use in the experiment. Defaults to None.

  • dry_run (bool) – If True, the experiment result will not be uploaded to Arize. Defaults to False.

  • concurrency (int) – The number of concurrent tasks to run. Defaults to 3.

  • set_global_tracer_provider (bool) – If True, sets the global tracer provider for the experiment. Defaults to False.

  • exit_on_error (bool) – If True, the experiment will stop running on the first occurrence of an error. Defaults to False.

Returns:

A tuple of experiment ID and experiment result DataFrame. If dry_run is True, the experiment ID will be an empty string.

Return type:

Tuple[str, pd.DataFrame]

Raises:
  • ValueError – If neither dataset_id nor dataset_name is provided, or if the dataset is empty.

  • RuntimeError – If experiment initialization, dataset download, or result upload fails.
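
A sketch of running an experiment end to end, reusing the client instance and placeholder names from the examples above. It assumes the task receives the dataset row as a dict-like object and that evaluators can be passed as a list of plain callables; the exact callable signatures accepted may vary by SDK version, so treat this as illustrative:

def answer_task(dataset_row) -> str:
    # Hypothetical task: call your LLM application for each dataset row
    return f"Answer to: {dataset_row['input']}"

def non_empty(output) -> bool:
    # Hypothetical evaluator: check that the task produced some output
    return bool(output)

experiment_id, results_df = client.run_experiment(
    space_id="YOUR_SPACE_ID",  # placeholder
    experiment_name="my_experiment",
    task=answer_task,
    dataset_name="my-first-dataset",
    evaluators=[non_empty],
    dry_run=True,  # keep results local while iterating
)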

update_dataset(space_id, data, dataset_id='', dataset_name='')#

Update an existing dataset by creating a new version.

Parameters:
  • space_id (str) – The ID of the space where the dataset is located.

  • data (pd.DataFrame) – The updated data to be included in the dataset.

  • dataset_id (str, optional) – The ID of the dataset to update. Required if dataset_name is not provided.

  • dataset_name (str, optional) – The name of the dataset to update. Required if dataset_id is not provided.

Returns:

The ID of the updated dataset.

Return type:

str

Raises:
  • ValueError – If neither dataset_id nor dataset_name is provided.

  • RuntimeError – If validation of the data fails or the update operation fails.
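
A short usage sketch, reusing the client instance and placeholder names from the examples above; the call creates a new version of the dataset and returns its ID:

import pandas as pd

updated_data = pd.DataFrame(
    {
        "input": ["What is Arize?"],
        "output": ["An AI observability and LLM evaluation platform."],
    }
)
updated_dataset_id = client.update_dataset(
    space_id="YOUR_SPACE_ID",
    data=updated_data,
    dataset_name="my-first-dataset",
)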