datasets & experiments#
Use datasets to curate spans from your LLM applications for testing. Run experiments to test different models, prompts, and parameters for your LLM apps. Read our quickstart guide for more information.
To use in your code, import the following:
from arize.experimental.datasets import ArizeDatasetsClient
- class ArizeDatasetsClient(developer_key, api_key, host='flight.arize.com', port=443, scheme='grpc+tls', otlp_endpoint='https://otlp.arize.com/v1')#
Bases:
object
ArizeDatasetsClient is a client for interacting with the Arize Datasets API.
- Parameters:
developer_key (str, required) – Arize-provided developer key associated with your user profile, located on the space settings page.
api_key (str, required) – Arize-provided API key associated with your user profile, located on the space settings page.
host (str, optional) – URI endpoint host for sending your export request to Arize AI. Defaults to “flight.arize.com”.
port (int, optional) – URI endpoint port for sending your export request to Arize AI. Defaults to 443.
scheme (str, optional) – Transport scheme to use for the connection. Defaults to “grpc+tls”.
otlp_endpoint (str, optional) – OTLP endpoint to send experiment traces to Arize. Defaults to “https://otlp.arize.com/v1”.
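For example, a client can be constructed with just the two keys, leaving the connection settings at their defaults (the key values below are placeholders):
from arize.experimental.datasets import ArizeDatasetsClient

# Placeholder credentials; copy the real values from your space settings page.
client = ArizeDatasetsClient(
    developer_key="YOUR_DEVELOPER_KEY",
    api_key="YOUR_API_KEY",
)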
- create_dataset(space_id, dataset_name, dataset_type, data, convert_dict_to_json=True)#
Create a new dataset.
- Parameters:
space_id (str) – The ID of the space where the dataset will be created.
dataset_name (str) – The name of the dataset.
dataset_type (DatasetType) – The type of the dataset.
data (pd.DataFrame) – The data to be included in the dataset.
convert_dict_to_json (bool, optional) – Convert dictionary columns to JSON strings for the default JSON string columns defined by the OpenInference conventions. Defaults to True.
- Returns:
The ID of the created dataset, or None if the creation failed.
- Return type:
str
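A minimal sketch of creating a dataset from a DataFrame, continuing with the client constructed above. The import path and the GENERATIVE constant for the dataset type are assumptions and may differ in your SDK version; the space ID is a placeholder:
import pandas as pd

# Assumed import path for the dataset type constant.
from arize.experimental.datasets.utils.constants import GENERATIVE

data = pd.DataFrame({
    "input": ["What is Arize?"],
    "output": ["Arize is an AI observability platform."],
})

dataset_id = client.create_dataset(
    space_id="YOUR_SPACE_ID",   # placeholder
    dataset_name="my-dataset",
    dataset_type=GENERATIVE,    # assumed DatasetType value
    data=data,
)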
- delete_dataset(space_id, dataset_id='', dataset_name='')#
Delete a dataset.
- Parameters:
space_id (str) – The ID of the space where the dataset is located.
dataset_id (str, optional) – The ID of the dataset to delete. Required if dataset_name is not provided.
dataset_name (str, optional) – The name of the dataset to delete. Required if dataset_id is not provided.
- Returns:
True if the dataset was successfully deleted, False otherwise.
- Return type:
bool
- Raises:
ValueError – If neither dataset_id nor dataset_name is provided.
RuntimeError – If the request to delete the dataset fails.
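For example, deleting the dataset by name (deleting by ID works the same way):
deleted = client.delete_dataset(
    space_id="YOUR_SPACE_ID",   # placeholder
    dataset_name="my-dataset",  # or pass dataset_id instead
)
print(deleted)  # True if the dataset was deleted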
- get_dataset(space_id, dataset_id=None, dataset_name=None, dataset_version=None, convert_json_str_to_dict=True)#
Get the data of a dataset.
- Parameters:
space_id (str) – The ID of the space where the dataset is located.
dataset_id (str, optional) – The ID of the dataset to retrieve. Required if dataset_name is not provided.
dataset_name (str, optional) – The name of the dataset to retrieve. Required if dataset_id is not provided.
dataset_version (str, optional) – The version name of the dataset. If not provided, the latest version (by creation time) is returned.
convert_json_str_to_dict (bool, optional) – Convert JSON strings to Python dictionaries for the default JSON string columns defined by the OpenInference conventions. Defaults to True.
- Returns:
The data of the dataset.
- Return type:
pd.DataFrame
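For example, fetching the latest version of a dataset as a DataFrame, using the client constructed above (the space ID is a placeholder):
df = client.get_dataset(
    space_id="YOUR_SPACE_ID",   # placeholder
    dataset_name="my-dataset",  # or pass dataset_id instead
)
print(df.head())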
- get_dataset_versions(space_id, dataset_id='', dataset_name='')#
Get versions information of a dataset.
- Parameters:
space_id (str) – The ID of the space where the dataset is located.
dataset_id (str, optional) – The dataset ID to get versions info for. Required if dataset_name is not provided.
dataset_name (str, optional) – The name of the dataset to get versions info for. Required if dataset_id is not provided.
- Returns:
A DataFrame containing version information for the dataset.
- Return type:
pd.DataFrame
- Raises:
ValueError – If neither dataset_id nor dataset_name is provided.
RuntimeError – If the request to get dataset versions fails.
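A short sketch of listing the versions of a dataset by name (the space ID is a placeholder):
versions_df = client.get_dataset_versions(
    space_id="YOUR_SPACE_ID",   # placeholder
    dataset_name="my-dataset",
)
print(versions_df)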
- get_experiment(space_id, experiment_name=None, dataset_name=None, experiment_id=None)#
Retrieve experiment data from Arize.
- Parameters:
space_id (str) – The ID of the space where the experiment is located.
experiment_name (Optional[str]) – The name of the experiment. Required if experiment_id is not provided.
dataset_name (Optional[str]) – The name of the dataset associated with the experiment. Required if experiment_id is not provided.
experiment_id (Optional[str]) – The ID of the experiment. Required if experiment_name and dataset_name are not provided.
- Returns:
A pandas DataFrame containing the experiment data, or None if the retrieval fails.
- Return type:
Optional[pd.DataFrame]
- Raises:
ValueError – If neither experiment_id nor both experiment_name and dataset_name are provided.
RuntimeError – If the experiment retrieval fails.
Note
You must provide either the experiment_id or both the experiment_name and dataset_name.
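Both lookup styles, sketched with placeholder identifiers:
# Look up by experiment name plus the dataset it ran against...
exp_df = client.get_experiment(
    space_id="YOUR_SPACE_ID",        # placeholder
    experiment_name="my-experiment",
    dataset_name="my-dataset",
)

# ...or by experiment ID alone.
exp_df = client.get_experiment(
    space_id="YOUR_SPACE_ID",
    experiment_id="EXPERIMENT_ID",   # placeholder
)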
- list_datasets(space_id)#
List all datasets in a space.
- Parameters:
space_id (str) – The ID of the space to list datasets for.
- Returns:
A table summary of the datasets in the space.
- Return type:
pd.DataFrame
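For example, printing a summary of every dataset in a space (the space ID is a placeholder):
datasets_df = client.list_datasets(space_id="YOUR_SPACE_ID")
print(datasets_df)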
- log_experiment(space_id, experiment_name, experiment_df, task_columns, evaluator_columns=None, dataset_id='', dataset_name='')#
Log an experiment to Arize.
- Parameters:
space_id (str) – The ID of the space where the experiment will be logged.
experiment_name (str) – The name of the experiment.
experiment_df (pd.DataFrame) – The data to be logged.
task_columns (ExperimentTaskResultColumnNames) – The column names for task results.
evaluator_columns (Optional[Dict[str, EvaluationResultColumnNames]]) – The column names for evaluator results.
dataset_id (str, optional) – The ID of the dataset associated with the experiment. Required if dataset_name is not provided. Defaults to “”.
dataset_name (str, optional) – The name of the dataset associated with the experiment. Required if dataset_id is not provided. Defaults to “”.
Examples
>>> # Example DataFrame:
>>> df = pd.DataFrame({
...     "example_id": ["1", "2"],
...     "result": ["success", "failure"],
...     "accuracy": [0.95, 0.85],
...     "ground_truth": ["A", "B"],
...     "explanation_text": ["Good match", "Poor match"],
...     "confidence": [0.9, 0.7],
...     "model_version": ["v1", "v2"],
...     "custom_metric": [0.8, 0.6],
... })
>>> # Define column mappings for task
>>> task_cols = ExperimentTaskResultColumnNames(
...     example_id="example_id", result="result"
... )
>>> # Define column mappings for evaluator
>>> evaluator_cols = EvaluationResultColumnNames(
...     score="accuracy",
...     label="ground_truth",
...     explanation="explanation_text",
...     metadata={
...         "confidence": None,  # Will use "confidence" column
...         "version": "model_version",  # Will use "model_version" column
...         "custom_metric": None,  # Will use "custom_metric" column
...     },
... )
>>> # Use with ArizeDatasetsClient.log_experiment()
>>> ArizeDatasetsClient.log_experiment(
...     space_id="my_space_id",
...     experiment_name="my_experiment",
...     experiment_df=df,
...     task_columns=task_cols,
...     evaluator_columns={"my_evaluator": evaluator_cols},
...     dataset_name="my_dataset_name",
... )
- Returns:
The ID of the logged experiment, or None if the logging failed.
- Return type:
Optional[str]
- run_experiment(space_id, experiment_name, task, dataset_df=None, dataset_id=None, dataset_name=None, evaluators=None, dry_run=False, concurrency=3, set_global_tracer_provider=False, exit_on_error=False)#
Run an experiment on a dataset and upload the results.
This function initializes an experiment, retrieves or uses a provided dataset, runs the experiment with specified tasks and evaluators, and uploads the results.
- Parameters:
space_id (str) – The ID of the space where the experiment will be run.
experiment_name (str) – The name of the experiment.
task (ExperimentTask) – The task to be performed in the experiment.
dataset_df (Optional[pd.DataFrame], optional) – The dataset as a pandas DataFrame. If not provided, the dataset will be downloaded using dataset_id or dataset_name. Defaults to None.
dataset_id (Optional[str], optional) – The ID of the dataset to use. Required if dataset_df and dataset_name are not provided. Defaults to None.
dataset_name (Optional[str], optional) – The name of the dataset to use. Used if dataset_df and dataset_id are not provided. Defaults to None.
evaluators (Optional[Evaluators], optional) – The evaluators to use in the experiment. Defaults to None.
dry_run (bool) – If True, the experiment result will not be uploaded to Arize. Defaults to False.
concurrency (int) – The number of concurrent tasks to run. Defaults to 3.
set_global_tracer_provider (bool) – If True, sets the global tracer provider for the experiment. Defaults to False.
exit_on_error (bool) – If True, the experiment will stop running on the first occurrence of an error. Defaults to False.
- Returns:
A tuple of experiment ID and experiment result DataFrame. If dry_run is True, the experiment ID will be an empty string.
- Return type:
Tuple[str, pd.DataFrame]
- Raises:
ValueError – If dataset_id and dataset_name are both not provided, or if the dataset is empty.
RuntimeError – If experiment initialization, dataset download, or result upload fails.
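A minimal sketch, assuming the task and evaluator may be plain functions that receive a dataset row and the task output respectively; the exact callable signatures accepted as ExperimentTask and Evaluators may differ in your SDK version, and the dataset is assumed to have an "input" column as in the create_dataset sketch above:
def summarize(dataset_row) -> str:
    # Assumed task signature: receives one dataset row as a dict-like object.
    return f"summary of: {dataset_row['input']}"

def contains_summary(output) -> bool:
    # Assumed evaluator signature: receives the task output.
    return "summary" in output

experiment_id, results_df = client.run_experiment(
    space_id="YOUR_SPACE_ID",        # placeholder
    experiment_name="my-experiment",
    task=summarize,
    dataset_name="my-dataset",
    evaluators=[contains_summary],   # assumed: a list of callables is accepted
    dry_run=True,                    # nothing is uploaded while iterating locally
)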
- update_dataset(space_id, data, dataset_id='', dataset_name='')#
Update an existing dataset by creating a new version.
- Parameters:
space_id (str) – The ID of the space where the dataset is located.
data (pd.DataFrame) – The updated data to be included in the dataset.
dataset_id (str, optional) – The ID of the dataset to update. Required if dataset_name is not provided.
dataset_name (str, optional) – The name of the dataset to update. Required if dataset_id is not provided.
- Returns:
The ID of the updated dataset.
- Return type:
str
- Raises:
ValueError – If neither dataset_id nor dataset_name is provided.
RuntimeError – If validation of the data fails or the update operation fails.
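For example, publishing a new version of the dataset with updated rows, continuing with the client constructed above (the space ID is a placeholder):
import pandas as pd

updated_data = pd.DataFrame({
    "input": ["What is Arize?", "What are datasets?"],
    "output": ["An AI observability platform.", "Curated collections of examples."],
})

dataset_id = client.update_dataset(
    space_id="YOUR_SPACE_ID",   # placeholder
    data=updated_data,
    dataset_name="my-dataset",  # or pass dataset_id instead
)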