Spans

class SpansClient(*, sdk_config: SDKConfiguration, generated_client: ApiClient)

Bases: object

Client for logging LLM tracing spans and evaluations to Arize.

This class is primarily intended for internal use within the SDK. Users are strongly encouraged to access resource-specific functionality via arize.ArizeClient instead.

Parameters:

sdk_config (SDKConfiguration) – Resolved SDK configuration.
generated_client (ApiClient) – Shared generated API client instance.

delete(*, project: str, span_ids: builtins.list[str], space: str | None = None) → SpanDeletePartialResponse | None

Permanently delete spans by their IDs.

This operation is irreversible. Only spans within the supported lookback window (2 years) are considered; older spans are not affected. If one or more span IDs are not found, they are silently ignored.

Parameters:

project (str) – Project name or identifier (base64) containing the spans. If the value is a name, space must also be provided.
span_ids (builtins.list[str]) – List of span IDs to delete.
space (str | None) – Optional space name or ID used to disambiguate the project lookup. Required when project is a name.

Returns:

None when all spans were deleted (HTTP 204). A response object with deleted_span_ids when the server reports a partial deletion (HTTP 200); in that case, retry the original request for a complete result.

Raises:

ValueError – If span_ids is empty.
ApiException – If the REST API returns an error response (e.g. 401/403/429).

Return type:

SpanDeletePartialResponse | None
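The partial-deletion contract described above can be handled with a small retry loop. The following is a minimal sketch rather than part of the SDK: delete_spans_with_retry and max_attempts are hypothetical names, and spans_client is assumed to be any object exposing the delete method documented here.

```python
def delete_spans_with_retry(spans_client, *, project, span_ids, space=None,
                            max_attempts=3):
    """Call delete() until the server reports a complete deletion.

    Returns True if a call returned None (all spans deleted, HTTP 204) and
    False if every attempt ended in a partial deletion (HTTP 200).
    """
    span_ids = list(span_ids)
    if not span_ids:
        # Mirror the documented ValueError instead of sending an empty request.
        raise ValueError("span_ids must not be empty")

    for _ in range(max_attempts):
        response = spans_client.delete(
            project=project, span_ids=span_ids, space=space
        )
        if response is None:
            return True
        # Partial deletion: the documented guidance is to retry the original
        # request. IDs that are already gone are silently ignored.
    return False
```

Resending the full list of IDs is safe under the documented behaviour, because span IDs that are not found are silently ignored.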

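list(*, project: str, space: str | None = None, start_time: datetime | None = None, end_time: datetime | None = None, filter: str | None = None, limit: int = 50, cursor: str | None = None) → SpanListResponse

List spans for a project within a time range.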

Spans are returned in descending start-time order (most recent first). If start_time and end_time are not provided, the query covers the last seven days relative to the time of the request.

Parameters:

project (str) – Project name or identifier (base64) to list spans for. If the value is a name, space must also be provided.
space (str | None) – Optional space name or ID used to disambiguate the project lookup. Required when project is a name.
start_time (datetime | None) – Inclusive lower bound of the time window. Defaults to seven days before the request time.
end_time (datetime | None) – Exclusive upper bound of the time window. Defaults to the request time.

filter (str | None) –

Optional filter expression to narrow results. Supports equality, comparison, and SQL-style AND/OR operators. Examples:

"status_code = 'ERROR'"
"eval.Custom_eval_correctness.label = 'correct'"
"annotation.Correctness.label = 'Correct'"
"latency_ms > 1789"
"status_code = 'ERROR' AND eval.Custom_eval_correctness.label = 'correct'"
"status_code = 'ERROR' OR eval.Custom_eval_correctness.label = 'correct'"

limit (int) – Maximum number of spans to return. The server enforces an upper bound. Defaults to 50.
cursor (str | None) – Opaque pagination cursor returned from a previous response.

Returns:

A response object with the spans and pagination information.

Raises:

ApiException – If the REST API returns an error response (e.g. 401/403/429).

Return type:

SpanListResponse
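Cursor-based pagination over the listing call can be sketched as follows. This is an illustration under stated assumptions, not SDK code: default_window and iter_spans are hypothetical helpers, spans_client is assumed to expose the list method documented here, and because this section does not name the fields of the response object, the caller supplies the get_spans and get_next_cursor accessors.

```python
from datetime import datetime, timedelta, timezone


def default_window(now=None, days=7):
    """Return (start_time, end_time) mirroring the documented defaults."""
    end_time = now or datetime.now(timezone.utc)
    return end_time - timedelta(days=days), end_time


def iter_spans(spans_client, *, get_spans, get_next_cursor, **query):
    """Yield spans page by page until the response carries no cursor.

    get_spans and get_next_cursor are caller-supplied accessors, since the
    field names on the response object are not specified in this section.
    """
    cursor = None
    while True:
        response = spans_client.list(cursor=cursor, **query)
        yield from get_spans(response)
        cursor = get_next_cursor(response)
        if not cursor:
            break
```

Passing an explicit start_time and end_time (for example from default_window) keeps the window fixed across pages; relying on the defaults would recompute it relative to each request.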

annotate(*, project: str, space: str | None = None, annotations: builtins.list[AnnotateRecordInput], start_time: datetime | None = None, end_time: datetime | None = None) → None

Write human annotations to a batch of spans in a project.

Annotations are upserted by annotation config name for each span. Submitting the same annotation config name for the same span overwrites the previous value. Retrying on network failure will not create duplicates.

Up to 1000 spans may be annotated per request. Spans are looked up within the specified time window (defaulting to the last 31 days). If any span ID in the batch is not found within the window, the entire request is rejected with a 404 error.

The write completes synchronously before the function returns (the server responds with HTTP 202 Accepted), but visibility in read queries may lag by a short interval.

Parameters:

project (str) – Project ID or name.
space (str | None) – Space ID or name. Required when project is a name.
annotations (builtins.list[AnnotateRecordInput]) – A list of AnnotateRecordInput items. Each item must include a record_id (the span ID) and values (a list of AnnotationInput items with name, and optionally score, label, or text).
start_time (datetime | None) – Start of the time window used to look up spans. Defaults to 31 days before the request time.
end_time (datetime | None) – End of the time window used to look up spans. Defaults to the request time.

Raises:

ApiException – If the REST API returns an error response (e.g. 400/401/403/404/429).

Return type:

None
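Because a single request accepts at most 1000 spans, larger annotation sets have to be split by the caller. A minimal sketch of such batching is shown below; chunked and annotate_in_batches are hypothetical helper names, spans_client is assumed to expose the annotate method documented here, and each list item is assumed to refer to one span.

```python
MAX_SPANS_PER_REQUEST = 1000  # per-request limit documented above


def chunked(items, size=MAX_SPANS_PER_REQUEST):
    """Split a sequence into consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def annotate_in_batches(spans_client, *, project, annotations, space=None,
                        start_time=None, end_time=None):
    """Send annotations in batches and return the number of requests made."""
    batches = chunked(annotations)
    for batch in batches:
        spans_client.annotate(
            project=project,
            space=space,
            annotations=batch,
            start_time=start_time,
            end_time=end_time,
        )
    return len(batches)
```

If one batch is rejected (for example with a 404 for an unknown span ID), earlier batches have already been written; since annotations are upserted, rerunning the whole set after fixing the input should not create duplicates.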

log(*, space_id: str, project_name: str, dataframe: pd.DataFrame, evals_dataframe: pd.DataFrame | None = None, datetime_format: str = DEFAULT_DATETIME_FMT, validate: bool = True, timeout: float | None = None, tmp_dir: str = '') → requests.Response

Logs a pandas dataframe containing LLM tracing data to Arize via a POST request.

Returns the Response object from the Requests HTTP library so that callers can confirm successful delivery of the records.

Parameters:

space_id (str) – The space ID where the project resides.
project_name (str) – A unique name to identify your project in the Arize platform.
dataframe (pandas.DataFrame) – The dataframe containing the LLM traces.
evals_dataframe (pandas.DataFrame | None) – A dataframe containing LLM evaluations data. The evaluations are joined to their corresponding spans via a left outer join, i.e., using only context.span_id from the spans dataframe. Defaults to None.
datetime_format (str) – Format of the timestamps captured in the LLM traces. Defaults to "%Y-%m-%dT%H:%M:%S.%f+00:00".
validate (bool) – When set to True, validation is run before sending data. Defaults to True.
timeout (float | None) – Number of seconds to wait for a response before giving up. Defaults to None.
tmp_dir (str) – Temporary directory/file to store the serialized data in binary before sending to Arize.

Returns:

Response object from the HTTP request (only returned on HTTP 2xx).

Raises:

MissingSpaceIDError – If space_id is not provided or empty.
MissingProjectNameError – If project_name is not provided or empty.
ValidationFailure – If validate=True and validation checks fail.
pa.ArrowInvalid – If the dataframe cannot be converted to Arrow format.
AuthenticationError – If the server returns HTTP 401 or 403 (invalid API key or space ID). Raised immediately to prevent further uploads with bad credentials.
APIError – If the server returns any other non-2xx response (e.g. 400, 422, 429, 5xx). Raised immediately to prevent further uploads when the server signals an error.

Return type:

requests.Response
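The default datetime_format ends in a literal "+00:00", so string timestamps written in that format are expected to be in UTC. The sketch below shows one way to produce such strings with the standard library; to_arize_timestamp and DEFAULT_FMT are hypothetical names (the value is copied from the documented default), and whether your trace timestamps are stored as strings at all is an assumption.

```python
from datetime import datetime, timezone

# Value of the documented default; the trailing "+00:00" is literal text.
DEFAULT_FMT = "%Y-%m-%dT%H:%M:%S.%f+00:00"


def to_arize_timestamp(moment):
    """Format a timezone-aware datetime as a UTC string in DEFAULT_FMT."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Convert to UTC first, because the format hard-codes a +00:00 offset.
    return moment.astimezone(timezone.utc).strftime(DEFAULT_FMT)
```

Converting to UTC before formatting avoids silently labelling a local time as +00:00.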

update_evaluations(*, space_id: str, project_name: str, dataframe: pd.DataFrame, validate: bool = True, force_http: bool = False, timeout: float | None = None, tmp_dir: str = '') → flight_pb2.WriteSpanEvaluationResponse

Logs a pandas dataframe containing LLM evaluations data to Arize via a Flight gRPC request.

The dataframe must contain a column context.span_id such that Arize can assign each evaluation to its respective span.

Parameters:

space_id (str) – The space ID where the project resides.
project_name (str) – A unique name to identify your project in the Arize platform.
dataframe (pandas.DataFrame) – A dataframe containing LLM evaluations data.
validate (bool) – When set to True, validation is run before sending data. Defaults to True.
force_http (bool) – Force the use of HTTP for data upload. Defaults to False.
timeout (float | None) – Number of seconds to wait for a response before giving up. Defaults to None.
tmp_dir (str) – Temporary directory/file to store the serialized data in binary before sending to Arize.

Raises:

MissingSpaceIDError – If space_id is not provided or empty.
MissingProjectNameError – If project_name is not provided or empty.
ValidationFailure – If validate=True and validation checks fail.
pa.ArrowInvalid – If the dataframe cannot be converted to Arrow format.
AuthenticationError – If the server returns HTTP 401 or 403. Raised immediately to prevent further uploads with bad credentials.
APIError – If the server returns any other non-2xx response. Raised immediately to prevent further uploads when the server signals an error.

Return type:

flight_pb2.WriteSpanEvaluationResponse

update_annotations(*, space_id: str, project_name: str, dataframe: pd.DataFrame, validate: bool = True) → flight_pb2.WriteSpanAnnotationResponse

Logs a pandas dataframe containing LLM span annotations to Arize via a Flight gRPC request.

The dataframe must contain a column context.span_id such that Arize can assign each annotation to its respective span. Annotation columns should follow the pattern annotation.<name>.<suffix> where suffix is label, score, or text. An optional annotation.notes column can be included for free-form text notes.

Parameters:

space_id (str) – The space ID where the project resides.
project_name (str) – A unique name to identify your project in the Arize platform.
dataframe (pandas.DataFrame) – A dataframe containing LLM annotation data.
validate (bool) – When set to True, validation is run before sending data. Defaults to True.

Return type:

flight_pb2.WriteSpanAnnotationResponse
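The column naming rules above can be checked locally before calling update_annotations. The following is a minimal sketch and not the SDK's own validation: check_annotation_columns is a hypothetical helper that accepts any iterable of column names (for example, a dataframe's columns).

```python
import re

SPAN_ID_COLUMN = "context.span_id"
NOTES_COLUMN = "annotation.notes"
# annotation.<name>.<suffix>, where suffix is label, score, or text.
ANNOTATION_COLUMN = re.compile(
    r"^annotation\.(?P<name>.+)\.(?P<suffix>label|score|text)$"
)


def check_annotation_columns(columns):
    """Return a list of problems; an empty list means the names look valid."""
    columns = list(columns)
    problems = []
    if SPAN_ID_COLUMN not in columns:
        problems.append(f"missing required column {SPAN_ID_COLUMN!r}")
    for column in columns:
        if column in (SPAN_ID_COLUMN, NOTES_COLUMN):
            continue
        # Only columns in the annotation namespace are checked here.
        if column.startswith("annotation.") and not ANNOTATION_COLUMN.match(column):
            problems.append(f"unexpected annotation column {column!r}")
    return problems
```

Columns outside the annotation. namespace are ignored by this check; how the SDK treats them is not covered in this section.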

update_metadata(*, space_id: str, project_name: str, dataframe: DataFrame, patch_document_column_name: str = 'patch_document', validate: bool = True) → dict[str, Any]

Log metadata updates using JSON Merge Patch format.

This method is only supported for LLM model types.

The dataframe must contain a column context.span_id to identify spans and either:

A column with JSON patch documents (specified by patch_document_column_name), or
One or more columns with prefix attributes.metadata. that will be automatically converted to a patch document (e.g., attributes.metadata.tag → {"tag": value}).

If both methods are used, the explicit patch document is applied after the individual field updates. The patches will be applied to the attributes.metadata field of each span.

Type Handling:

The client primarily supports string, integer, and float data types.
Boolean values are converted to string representations.
Nested JSON objects and arrays are serialized to JSON strings during transmission.
Setting a field to None or null will set the field to JSON null in the metadata. Note: This differs from standard JSON Merge Patch where null values remove fields.
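The combination order and the null handling described above can be illustrated in plain Python. This is a sketch of the documented semantics for flat, top-level fields only, not the client's implementation: build_patch, apply_patch, and apply_rfc7396_patch are hypothetical helpers, a row is modelled as a dict, and patch documents are assumed to be dicts as in the examples on this page.

```python
PREFIX = "attributes.metadata."


def build_patch(row, patch_column="patch_document"):
    """Combine field columns and an explicit patch document for one row."""
    patch = {
        key[len(PREFIX):]: value
        for key, value in row.items()
        if key.startswith(PREFIX)
    }
    # The explicit document is applied after the field columns, so it wins.
    patch.update(row.get(patch_column) or {})
    return patch


def apply_patch(metadata, patch):
    """Apply a flat patch with the null handling documented above."""
    merged = dict(metadata)
    merged.update(patch)  # None is stored as JSON null, not treated as removal
    return merged


def apply_rfc7396_patch(metadata, patch):
    """Standard JSON Merge Patch for flat documents, shown for contrast."""
    merged = dict(metadata)
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)  # standard behaviour: null removes the field
        else:
            merged[key] = value
    return merged
```

The contrast between the last two functions is the point: with the documented behaviour a None value leaves the key in place with a null value, whereas standard JSON Merge Patch would drop the key.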

Parameters:

space_id (str) – The space ID where the project resides.
project_name (str) – A unique name to identify your project in the Arize platform.
dataframe (pandas.DataFrame) – DataFrame with span_ids and either patch documents or metadata field columns.
patch_document_column_name (str) – Name of the column containing JSON patch documents. Defaults to "patch_document".
validate (bool) – When set to True, validation is run before sending data.

Returns:

Dictionary containing update results with the following keys:

spans_processed: Total number of spans in the input dataframe
spans_updated: Count of successfully updated span metadata records
spans_failed: Count of spans that failed to update
errors: List of dictionaries with 'span_id' and 'error_message' keys for each failed span

Error types from the server include:

parse_failure: Failed to parse JSON metadata
patch_failure: Failed to apply JSON patch
type_conflict: Type conflict in metadata
connection_failure: Connection issues
segment_not_found: No matching segment found
druid_rejection: Backend rejected the update

Return type:

dict[str, Any]

Raises:

AuthError – When API key or space ID is missing.
ValidationFailure – When validation of the dataframe or values fails.
ImportError – When required tracing dependencies are missing.
ArrowInvalid – When the dataframe cannot be converted to Arrow format.
RuntimeError – If the request fails or no response is received.

Examples

Method 1: Using a patch document

>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1", "span2"],
...         "patch_document": [
...             {"tag": "important"},
...             {"priority": "high"},
...         ],
...     }
... )

Method 2: Using direct field columns

>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1", "span2"],
...         "attributes.metadata.tag": ["important", "standard"],
...         "attributes.metadata.priority": ["high", "medium"],
...     }
... )

Method 3: Combining both approaches

>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1"],
...         "attributes.metadata.tag": ["important"],
...         "patch_document": [
...             {"priority": "high"}
...         ],  # Overrides conflicting fields
...     }
... )

Method 4: Setting fields to null

>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1"],
...         "attributes.metadata.old_field": [
...             None
...         ],  # Sets field to JSON null
...         "patch_document": [
...             {"other_field": None}
...         ],  # Also sets field to JSON null
...     }
... )
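The dictionary returned by update_metadata can be inspected to find out which spans failed and why. The sketch below relies only on the result keys documented for this method; summarize_update is a hypothetical helper, not part of the SDK.

```python
from collections import defaultdict


def summarize_update(result):
    """Return (all_ok, failures), where failures maps message -> span IDs."""
    failures = defaultdict(list)
    for error in result.get("errors", []):
        failures[error["error_message"]].append(error["span_id"])
    all_ok = result.get("spans_failed", 0) == 0
    return all_ok, dict(failures)
```

Grouping by error message makes it easier to separate spans worth retrying (for example after a connection problem) from those that need their patch corrected.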

export_to_df(*, space_id: str, project_name: str, start_time: datetime, end_time: datetime, where: str = '', columns: builtins.list | None = None, stream_chunk_size: int | None = None) → pd.DataFrame

Export span data from Arize to a pandas.DataFrame.

Retrieves trace/span data from the specified project within a time range and returns it as a pandas.DataFrame. Supports filtering with SQL-like WHERE clauses and similarity search for semantic retrieval.

Returns:

DataFrame containing the requested span data, with columns for span metadata, attributes, events, and any custom fields.

Return type:

pandas.DataFrame

Parameters:

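space_id (str) – The space ID where the project resides.
project_name (str) – The name of the project to export span data from.
start_time (datetime) – Start of the time range (inclusive) as a datetime object.
end_time (datetime) – End of the time range (inclusive) as a datetime object.
where (str) – Optional SQL-like WHERE clause to filter rows (e.g., "span.status_code = 'ERROR'").
columns (builtins.list | None) – Optional list of column names to include. If None, all columns are returned.
stream_chunk_size (int | None) – Optional chunk size for streaming large result sets.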
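A typical call combines an explicit time range with a WHERE clause. The following is a hedged sketch: export_recent_errors is a hypothetical wrapper, spans_client is assumed to expose the export_to_df method documented here, and the filter string reuses the span.status_code example given for export_to_parquet on the assumption that both methods accept the same syntax.

```python
from datetime import datetime, timedelta, timezone


def export_recent_errors(spans_client, *, space_id, project_name, days=1,
                         now=None):
    """Export error spans from the last `days` days as a DataFrame."""
    end_time = now or datetime.now(timezone.utc)
    return spans_client.export_to_df(
        space_id=space_id,
        project_name=project_name,
        start_time=end_time - timedelta(days=days),
        end_time=end_time,
        # Assumed filter syntax, taken from the export_to_parquet example.
        where="span.status_code = 'ERROR'",
    )
```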

export_to_parquet(*, path: str, space_id: str, project_name: str, start_time: datetime, end_time: datetime, where: str = '', columns: builtins.list | None = None, stream_chunk_size: int | None = None) → None

Export span data from Arize to a Parquet file.

Retrieves trace/span data from the specified project within a time range and writes it directly to a Parquet file at the specified path. Supports filtering with SQL-like WHERE clauses for efficient querying. Ideal for large datasets and long-term storage.

Parameters:

path (str) – The file path where the Parquet file will be written.
space_id (str) – The space ID where the project resides.
project_name (str) – The name of the project to export span data from.
start_time (datetime) – Start of the time range (inclusive) as a datetime object.
end_time (datetime) – End of the time range (inclusive) as a datetime object.
where (str) – Optional SQL-like WHERE clause to filter rows (e.g., "span.status_code = 'ERROR'").
columns (builtins.list | None) – Optional list of column names to include. If None, all columns are returned.
stream_chunk_size (int | None) – Optional chunk size for streaming large result sets.

Raises:

RuntimeError – If the Flight client request fails or returns no response.

Return type:

None

Notes

Uses Apache Arrow Flight for efficient data transfer
Data is written directly to the specified path as a Parquet file
Large exports may benefit from specifying stream_chunk_size
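When exporting to disk, it can help to derive the file name from the requested time range and to make sure the target directory exists first. The sketch below is illustrative only: export_window_to_parquet and its file-naming scheme are hypothetical, spans_client is assumed to expose the export_to_parquet method documented here, and whether the SDK creates missing parent directories itself is not stated in this section.

```python
from pathlib import Path


def export_window_to_parquet(spans_client, *, directory, space_id,
                             project_name, start_time, end_time,
                             where="", columns=None, stream_chunk_size=None):
    """Write one time window to <directory>/<project>_<start>_<end>.parquet."""
    stamp = "%Y%m%dT%H%M%S"
    name = f"{project_name}_{start_time.strftime(stamp)}_{end_time.strftime(stamp)}.parquet"
    target = Path(directory) / name
    # Create the directory up front; harmless if it already exists.
    target.parent.mkdir(parents=True, exist_ok=True)
    spans_client.export_to_parquet(
        path=str(target),
        space_id=space_id,
        project_name=project_name,
        start_time=start_time,
        end_time=end_time,
        where=where,
        columns=columns,
        stream_chunk_size=stream_chunk_size,
    )
    return target
```

The project name is used verbatim in the file name, so names containing path separators would need sanitizing first.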

Response Types