Spans#

class SpansClient(*, sdk_config: SDKConfiguration, generated_client: ApiClient)[source]#

Bases: object

Client for logging LLM tracing spans and evaluations to Arize.

This class is primarily intended for internal use within the SDK. Users are strongly encouraged to access resource-specific functionality via arize.ArizeClient.

Parameters:
  • sdk_config (SDKConfiguration) – Resolved SDK configuration.

  • generated_client (ApiClient) – Shared generated API client instance.

delete(*, project: str, span_ids: builtins.list[str], space: str | None = None) SpansDelete200Response | None[source]#

Permanently delete spans by their IDs.

This operation is irreversible. Only spans within the supported lookback window (2 years) are considered; older spans are not affected. If one or more span IDs are not found, they are silently ignored.

Parameters:
  • project (str) – Project name or global ID (base64) containing the spans. If the value is a name, space must also be provided.

  • span_ids (builtins.list[str]) – List of span IDs to delete.

  • space (str | None) – Optional space name or ID used to disambiguate the project lookup. Required when project is a name.

Returns:

None when all spans were deleted (HTTP 204). A response object with deleted_span_ids when the server reports a partial deletion (HTTP 200); in that case, retry the original request for a complete result.

Raises:
  • ValueError – If span_ids is empty.

  • ApiException – If the REST API returns an error response (e.g. 401/403/429).

Return type:

SpansDelete200Response | None
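The return contract above (None on full deletion, a response object on partial deletion) lends itself to a small retry loop. The following is a minimal sketch under stated assumptions: `delete_until_complete`, `max_attempts`, and the `delete_fn` callable are hypothetical names used for illustration and are not part of the SDK. In practice `delete_fn` would be a callable that behaves like the delete method documented here.

```python
from typing import Callable, Optional, Sequence


def delete_until_complete(
    delete_fn: Callable[..., Optional[object]],
    *,
    project: str,
    span_ids: Sequence[str],
    space: Optional[str] = None,
    max_attempts: int = 3,
) -> bool:
    """Call delete_fn until it returns None (full deletion) or attempts run out.

    delete_fn is assumed to follow the documented contract: it returns None
    when every span was deleted and a response object on partial deletion.
    Returns True if a call reported full deletion, False otherwise.
    """
    if not span_ids:
        # Mirrors the documented ValueError for an empty span_ids list.
        raise ValueError("span_ids must not be empty")
    for _ in range(max_attempts):
        response = delete_fn(project=project, span_ids=list(span_ids), space=space)
        if response is None:
            return True  # full deletion reported
        # Partial deletion: retry the original request unchanged. IDs that
        # are already gone are documented as being silently ignored.
    return False
```

Resending the full ID list, rather than computing the remainder from deleted_span_ids, follows the documented advice to retry the original request.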

list(*, project: str, space: str | None = None, start_time: datetime | None = None, end_time: datetime | None = None, filter: str | None = None, limit: int = 100, cursor: str | None = None) SpansList200Response[source]#

List spans for a project within a time range.

Spans are returned in descending start-time order (most recent first). If start_time and end_time are not provided, the query covers the last seven days relative to the time of the request.

Parameters:
  • project (str) – Project name or global ID (base64) to list spans for. If the value is a name, space must also be provided.

  • space (str | None) – Optional space name or ID used to disambiguate the project lookup. Required when project is a name.

  • start_time (datetime | None) – Inclusive lower bound of the time window. Defaults to seven days before the request time.

  • end_time (datetime | None) – Exclusive upper bound of the time window. Defaults to the request time.

  • filter (str | None) –

    Optional filter expression to narrow results. Supports equality, comparison, and SQL-style AND/OR operators. Examples:

    "status_code = 'ERROR'"
    "eval.Custom_eval_correctness.label = 'correct'"
    "annotation.Correctness.label = 'Correct'"
    "latency_ms > 1789"
    "status_code = 'ERROR' AND eval.Custom_eval_correctness.label = 'correct'"
    "status_code = 'ERROR' OR eval.Custom_eval_correctness.label = 'correct'"
    

  • limit (int) – Maximum number of spans to return. The server enforces an upper bound. Defaults to 100.

  • cursor (str | None) – Opaque pagination cursor returned from a previous response.

Returns:

A response object with the spans and pagination information.

Raises:

ApiException – If the REST API returns an error response (e.g. 401/403/429).

Return type:

SpansList200Response
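The cursor parameter and the time-window defaults described above can be sketched as follows. This is an illustration, not SDK code: `iter_spans`, `default_window`, and `in_window` are hypothetical helpers. Because the attribute that carries the next cursor on the pagination object is not documented in this section, the caller supplies it through the hypothetical `get_next_cursor` callable.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Iterator, Optional, Tuple


def default_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return the documented default window: the seven days before `now`."""
    return now - timedelta(days=7), now


def in_window(ts: datetime, start: datetime, end: datetime) -> bool:
    """start_time is an inclusive bound, end_time an exclusive one."""
    return start <= ts < end


def iter_spans(
    list_fn: Callable[..., Any],
    get_next_cursor: Callable[[Any], Optional[str]],
    **list_kwargs: Any,
) -> Iterator[Any]:
    """Yield spans page by page until no further cursor is reported.

    list_fn is assumed to accept a `cursor` keyword and to return an object
    with a `spans` attribute, like the response type documented here.
    """
    cursor: Optional[str] = None
    while True:
        page = list_fn(cursor=cursor, **list_kwargs)
        yield from page.spans
        cursor = get_next_cursor(page)
        if not cursor:
            break


# Example window relative to a fixed "request time".
start, end = default_window(datetime(2024, 1, 8, tzinfo=timezone.utc))
```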

log(*, space_id: str, project_name: str, dataframe: pd.DataFrame, evals_dataframe: pd.DataFrame | None = None, datetime_format: str = DEFAULT_DATETIME_FMT, validate: bool = True, timeout: float | None = None, tmp_dir: str = '') requests.Response[source]#

Logs a pandas dataframe containing LLM tracing data to Arize via a POST request.

Returns a Response object from the Requests HTTP library, which can be inspected to confirm that the records were delivered.

Parameters:
  • space_id (str) – The space ID where the project resides.

  • project_name (str) – A unique name to identify your project in the Arize platform.

  • dataframe (pandas.DataFrame) – The dataframe containing the LLM traces.

  • evals_dataframe (pandas.DataFrame | None) – A dataframe containing LLM evaluations data. The evaluations are joined to their corresponding spans via a left outer join keyed on context.span_id, so only span IDs present in the spans dataframe are used. Defaults to None.

  • datetime_format (str) – Format for the timestamp captured in the LLM traces. Defaults to "%Y-%m-%dT%H:%M:%S.%f+00:00".

  • validate (bool) – When set to True, validation is run before sending data. Defaults to True.

  • timeout (float | None) – Number of seconds to wait for a response before giving up. Defaults to None.

  • tmp_dir (str) – Temporary directory/file to store the serialized data in binary before sending to Arize.

Returns:

Response object from the HTTP request (only returned on HTTP 2xx).

Raises:
  • MissingSpaceIDError – If space_id is not provided or empty.

  • MissingProjectNameError – If project_name is not provided or empty.

  • ValidationFailure – If validate=True and validation checks fail.

  • pa.ArrowInvalid – If the dataframe cannot be converted to Arrow format.

  • AuthenticationError – If the server returns HTTP 401 or 403 (invalid API key or space ID). Raised immediately to prevent further uploads with bad credentials.

  • APIError – If the server returns any other non-2xx response (e.g. 400, 422, 429, 5xx). Raised immediately to prevent further uploads when the server signals an error.

Return type:

requests.Response
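To make the default datetime_format concrete, the sketch below renders and parses a timestamp with that format string using only the standard library. `DEFAULT_FMT`, `format_utc`, and `parse_utc` are hypothetical names for illustration. Whether a given timestamp column needs to be a string in this format depends on the SDK's validation, which is not described here.

```python
from datetime import datetime, timezone

# Default format string quoted in the datetime_format parameter description.
DEFAULT_FMT = "%Y-%m-%dT%H:%M:%S.%f+00:00"


def format_utc(ts: datetime, fmt: str = DEFAULT_FMT) -> str:
    """Render a timezone-aware datetime as a UTC string in the given format."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # The "+00:00" suffix in the format is literal text, so convert to UTC first.
    return ts.astimezone(timezone.utc).strftime(fmt)


def parse_utc(text: str, fmt: str = DEFAULT_FMT) -> datetime:
    """Parse a string in the given format and attach the UTC timezone."""
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


stamp = format_utc(datetime(2024, 1, 2, 3, 4, 5, 123456, tzinfo=timezone.utc))
# stamp == "2024-01-02T03:04:05.123456+00:00"
```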

update_evaluations(*, space_id: str, project_name: str, dataframe: pd.DataFrame, validate: bool = True, force_http: bool = False, timeout: float | None = None, tmp_dir: str = '') flight_pb2.WriteSpanEvaluationResponse[source]#

Logs a pandas dataframe containing LLM evaluations data to Arize via a Flight gRPC request.

The dataframe must contain a column context.span_id such that Arize can assign each evaluation to its respective span.

Parameters:
  • space_id (str) – The space ID where the project resides.

  • project_name (str) – A unique name to identify your project in the Arize platform.

  • dataframe (pandas.DataFrame) – A dataframe containing LLM evaluations data.

  • validate (bool) – When set to True, validation is run before sending data. Defaults to True.

  • force_http (bool) – Force the use of HTTP for data upload. Defaults to False.

  • timeout (float | None) – Number of seconds to wait for a response before giving up. Defaults to None.

  • tmp_dir (str) – Temporary directory/file to store the serialized data in binary before sending to Arize.

Raises:
  • MissingSpaceIDError – If space_id is not provided or empty.

  • MissingProjectNameError – If project_name is not provided or empty.

  • ValidationFailure – If validate=True and validation checks fail.

  • pa.ArrowInvalid – If the dataframe cannot be converted to Arrow format.

  • AuthenticationError – If the server returns HTTP 401 or 403. Raised immediately to prevent further uploads with bad credentials.

  • APIError – If the server returns any other non-2xx response. Raised immediately to prevent further uploads when the server signals an error.

Return type:

flight_pb2.WriteSpanEvaluationResponse

update_annotations(*, space_id: str, project_name: str, dataframe: pd.DataFrame, validate: bool = True) flight_pb2.WriteSpanAnnotationResponse[source]#

Logs a pandas dataframe containing LLM span annotations to Arize via a Flight gRPC request.

The dataframe must contain a column context.span_id such that Arize can assign each annotation to its respective span. Annotation columns should follow the pattern annotation.<name>.<suffix> where suffix is label, score, or text. An optional annotation.notes column can be included for free-form text notes.

Parameters:
  • space_id (str) – The space ID where the project resides.

  • project_name (str) – A unique name to identify your project in the Arize platform.

  • dataframe (pandas.DataFrame) – A dataframe containing LLM annotation data.

  • validate (bool) – When set to True, validation is run before sending data. Defaults to True.

Return type:

flight_pb2.WriteSpanAnnotationResponse
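The column naming rules above can be checked before calling the method. The following is a rough sketch, not the SDK's own validation: `check_annotation_columns` and the constants are hypothetical, and the decision to flag every other column as unexpected is an assumption made for illustration.

```python
import re
from typing import Iterable, List

# annotation.<name>.<suffix>, where suffix is label, score, or text.
_ANNOTATION_COLUMN = re.compile(
    r"^annotation\.(?P<name>.+)\.(?P<suffix>label|score|text)$"
)
SPAN_ID_COLUMN = "context.span_id"
NOTES_COLUMN = "annotation.notes"


def check_annotation_columns(columns: Iterable[str]) -> List[str]:
    """Return a list of problems found in the column names (empty if none)."""
    cols = list(columns)
    problems: List[str] = []
    if SPAN_ID_COLUMN not in cols:
        problems.append(f"missing required column {SPAN_ID_COLUMN!r}")
    for col in cols:
        if col in (SPAN_ID_COLUMN, NOTES_COLUMN):
            continue
        # Assumption: anything that is not an annotation column is flagged.
        if not _ANNOTATION_COLUMN.match(col):
            problems.append(f"unexpected column {col!r}")
    return problems
```

With a pandas dataframe, the helper could be called as check_annotation_columns(df.columns).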

update_metadata(*, space_id: str, project_name: str, dataframe: DataFrame, patch_document_column_name: str = 'patch_document', validate: bool = True) dict[str, Any][source]#

Log metadata updates using JSON Merge Patch format.

This method is only supported for LLM model types.

The dataframe must contain a column context.span_id to identify spans and either:

  1. A column with JSON patch documents (specified by patch_document_column_name), or

  2. One or more columns with prefix attributes.metadata. that will be automatically converted to a patch document (e.g., attributes.metadata.tag becomes {"tag": value}).

If both methods are used, the explicit patch document is applied after the individual field updates. The patches will be applied to the attributes.metadata field of each span.
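The ordering rule above (field columns first, explicit patch document second) can be pictured per row as follows. This is a sketch of the effective result only, using a plain dict as a stand-in for one dataframe row; `build_patch` and `METADATA_PREFIX` are hypothetical names, and where the combination actually happens (client or server) is not stated in this section.

```python
from typing import Any, Dict, Mapping

METADATA_PREFIX = "attributes.metadata."


def build_patch(
    row: Mapping[str, Any], patch_column: str = "patch_document"
) -> Dict[str, Any]:
    """Combine prefixed field columns and an explicit patch for one row."""
    patch: Dict[str, Any] = {}
    # Step 1: each attributes.metadata.<key> column contributes {<key>: value}.
    for column, value in row.items():
        if column.startswith(METADATA_PREFIX):
            patch[column[len(METADATA_PREFIX):]] = value
    # Step 2: the explicit patch document is applied afterwards, so it wins
    # whenever both sources set the same key.
    explicit = row.get(patch_column)
    if isinstance(explicit, Mapping):
        patch.update(explicit)
    return patch


row = {
    "context.span_id": "span1",
    "attributes.metadata.tag": "important",
    "attributes.metadata.priority": "low",
    "patch_document": {"priority": "high"},
}
combined = build_patch(row)  # {"tag": "important", "priority": "high"}
```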

Type Handling:

  • The client primarily supports string, integer, and float data types.

  • Boolean values are converted to string representations.

  • Nested JSON objects and arrays are serialized to JSON strings during transmission.

  • Setting a field to None or null will set the field to JSON null in the metadata. Note: This differs from standard JSON Merge Patch where null values remove fields.
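The note on null handling is easiest to see side by side with standard JSON Merge Patch (RFC 7396). The sketch below implements the standard merge with a switch for the null behavior described above. `merge_patch` and `null_removes` are hypothetical names, and the function only illustrates the semantics; it is not how the service applies patches.

```python
from typing import Any, Dict


def merge_patch(
    target: Dict[str, Any], patch: Dict[str, Any], *, null_removes: bool
) -> Dict[str, Any]:
    """Apply a merge patch to a copy of target.

    null_removes=True follows RFC 7396: a null value deletes the key.
    null_removes=False follows the behavior described above: the key is
    kept and its value is set to null.
    """
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            if null_removes:
                result.pop(key, None)
            else:
                result[key] = None
        elif isinstance(value, dict):
            # Objects merge recursively; a non-object target is replaced.
            base = result.get(key)
            base = base if isinstance(base, dict) else {}
            result[key] = merge_patch(base, value, null_removes=null_removes)
        else:
            result[key] = value
    return result


metadata = {"tag": "important", "old_field": "x"}
standard = merge_patch(metadata, {"old_field": None}, null_removes=True)
described = merge_patch(metadata, {"old_field": None}, null_removes=False)
```

Here standard no longer contains old_field, while described keeps the key with a None value.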

Parameters:
  • space_id (str) – The space ID where the project resides.

  • project_name (str) – A unique name to identify your project in the Arize platform.

  • dataframe (pandas.DataFrame) – DataFrame with span_ids and either patch documents or metadata field columns.

  • patch_document_column_name (str) – Name of the column containing JSON patch documents. Defaults to "patch_document".

  • validate (bool) – When set to True, validation is run before sending data.

Returns:

Dictionary containing update results with the following keys:

  • spans_processed: Total number of spans in the input dataframe

  • spans_updated: Count of successfully updated span metadata records

  • spans_failed: Count of spans that failed to update

  • errors: List of dictionaries with 'span_id' and 'error_message' keys for each failed span

Error types from the server include:

  • parse_failure: Failed to parse JSON metadata

  • patch_failure: Failed to apply JSON patch

  • type_conflict: Type conflict in metadata

  • connection_failure: Connection issues

  • segment_not_found: No matching segment found

  • druid_rejection: Backend rejected the update

Return type:

dict[str, Any]

Raises:
  • AuthError – When API key or space ID is missing.

  • ValidationFailure – When validation of the dataframe or values fails.

  • ImportError – When required tracing dependencies are missing.

  • ArrowInvalid – When the dataframe cannot be converted to Arrow format.

  • RuntimeError – If the request fails or no response is received.
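The result dictionary can be condensed into a short report for logs or retries. The helpers below are a sketch that relies only on the documented keys (spans_processed, spans_updated, spans_failed, errors). `summarize_update` and `failed_span_ids` are hypothetical names, and the sample dictionary is made-up illustrative data rather than real server output.

```python
from typing import Any, Dict, List


def failed_span_ids(result: Dict[str, Any]) -> List[str]:
    """Collect the span IDs listed under the 'errors' key."""
    return [err["span_id"] for err in result.get("errors", [])]


def summarize_update(result: Dict[str, Any]) -> str:
    """Build a short multi-line summary of an update result."""
    lines = [
        f"{result.get('spans_updated', 0)}/{result.get('spans_processed', 0)} "
        f"spans updated, {result.get('spans_failed', 0)} failed"
    ]
    for err in result.get("errors", []):
        lines.append(f"  {err['span_id']}: {err['error_message']}")
    return "\n".join(lines)


# Illustrative data shaped like the documented return value.
sample = {
    "spans_processed": 2,
    "spans_updated": 1,
    "spans_failed": 1,
    "errors": [{"span_id": "span2", "error_message": "type_conflict"}],
}
report = summarize_update(sample)
```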

Examples

Method 1: Using a patch document

>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1", "span2"],
...         "patch_document": [
...             {"tag": "important"},
...             {"priority": "high"},
...         ],
...     }
... )

Method 2: Using direct field columns

>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1", "span2"],
...         "attributes.metadata.tag": ["important", "standard"],
...         "attributes.metadata.priority": ["high", "medium"],
...     }
... )

Method 3: Combining both approaches

>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1"],
...         "attributes.metadata.tag": ["important"],
...         "patch_document": [
...             {"priority": "high"}
...         ],  # Overrides conflicting fields
...     }
... )

Method 4: Setting fields to null

>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1"],
...         "attributes.metadata.old_field": [
...             None
...         ],  # Sets field to JSON null
...         "patch_document": [
...             {"other_field": None}
...         ],  # Also sets field to JSON null
...     }
... )

export_to_df(*, space_id: str, project_name: str, start_time: datetime, end_time: datetime, where: str = '', columns: builtins.list | None = None, stream_chunk_size: int | None = None) pd.DataFrame[source]#

Export span data from Arize to a pandas.DataFrame.

Retrieves trace/span data from the specified project within a time range and returns it as a pandas.DataFrame. Supports filtering with SQL-like WHERE clauses.

Returns:

DataFrame containing the requested span data with columns for span metadata, attributes, events, and any custom fields.

Return type:

pandas.DataFrame

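Parameters:
  • space_id (str) – The space ID where the project resides.

  • project_name (str) – The name of the project to export span data from.

  • start_time (datetime) – Start of the time range as a datetime object.

  • end_time (datetime) – End of the time range as a datetime object.

  • where (str) – Optional SQL-like WHERE clause to filter rows.

  • columns (builtins.list | None) – Optional list of column names to include.

  • stream_chunk_size (int | None) – Optional chunk size for streaming large result sets.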

export_to_parquet(*, path: str, space_id: str, project_name: str, start_time: datetime, end_time: datetime, where: str = '', columns: builtins.list | None = None, stream_chunk_size: int | None = None) None[source]#

Export span data from Arize to a Parquet file.

Retrieves trace/span data from the specified project within a time range and writes it directly to a Parquet file at the specified path. Supports filtering with SQL-like WHERE clauses for efficient querying. Ideal for large datasets and long-term storage.

Parameters:
  • path (str) – The file path where the Parquet file will be written.

  • space_id (str) – The space ID where the project resides.

  • project_name (str) – The name of the project to export span data from.

  • start_time (datetime) – Start of the time range (inclusive) as a datetime object.

  • end_time (datetime) – End of the time range (inclusive) as a datetime object.

  • where (str) – Optional SQL-like WHERE clause to filter rows (e.g., "span.status_code = 'ERROR'").

  • columns (builtins.list | None) – Optional list of column names to include. If None, all columns are returned.

  • stream_chunk_size (int | None) – Optional chunk size for streaming large result sets.

Raises:

RuntimeError – If the Flight client request fails or returns no response.

Return type:

None

Notes

  • Uses Apache Arrow Flight for efficient data transfer

  • Data is written directly to the specified path as a Parquet file

  • Large exports may benefit from specifying stream_chunk_size

Response Types#

class SpansDelete200Response(*, deleted_span_ids: List[Annotated[str, Strict(strict=True)]])[source]#

Bases: BaseModel

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

deleted_span_ids (List[Annotated[str, Strict(strict=True)]])

deleted_span_ids: List[StrictStr]#
model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'protected_namespaces': (), 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True}#

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

to_str() str[source]#

Returns the string representation of the model using alias

Return type:

str

to_json() str[source]#

Returns the JSON representation of the model using alias

Return type:

str

classmethod from_json(json_str: str) Self | None[source]#

Create an instance of SpansDelete200Response from a JSON string

Parameters:

json_str (str)

Return type:

Self | None

to_dict() Dict[str, Any][source]#

Return the dictionary representation of the model using alias.

This has the following differences from calling pydantic’s self.model_dump(by_alias=True):

  • None is only added to the output dict for nullable fields that were set at model initialization. Other fields with value None are ignored.

Return type:

Dict[str, Any]

classmethod from_dict(obj: Dict[str, Any] | None) Self | None[source]#

Create an instance of SpansDelete200Response from a dict

Parameters:

obj (Dict[str, Any] | None)

Return type:

Self | None

class SpansList200Response(*, spans: List[Span], pagination: PaginationMetadata)[source]#

Bases: BaseModel

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:
  • spans (List[Span])

  • pagination (PaginationMetadata)

spans: List[Span]#
pagination: PaginationMetadata#
model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'protected_namespaces': (), 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True}#

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

to_str() str[source]#

Returns the string representation of the model using alias

Return type:

str

to_json() str[source]#

Returns the JSON representation of the model using alias

Return type:

str

classmethod from_json(json_str: str) Self | None[source]#

Create an instance of SpansList200Response from a JSON string

Parameters:

json_str (str)

Return type:

Self | None

to_dict() Dict[str, Any][source]#

Return the dictionary representation of the model using alias.

This has the following differences from calling pydantic’s self.model_dump(by_alias=True):

  • None is only added to the output dict for nullable fields that were set at model initialization. Other fields with value None are ignored.

Return type:

Dict[str, Any]

classmethod from_dict(obj: Dict[str, Any] | None) Self | None[source]#

Create an instance of SpansList200Response from a dict

Parameters:

obj (Dict[str, Any] | None)

Return type:

Self | None

to_df(by_alias: bool = False, exclude_none: str | bool = True, json_normalize: bool = False, convert_dtypes: bool = True, expand_field: str = 'additional_properties', expand_prefix: str = '') pd.DataFrame#

Convert a list of objects to a pandas.DataFrame.

Behavior:
  • If an item is a Pydantic v2 model, use .model_dump(by_alias=…).

  • If an item is a mapping (dict-like), use it as-is.

  • Otherwise, raise a ValueError (unsupported row type).

Parameters:
  • self (object) – The object instance containing the field to convert.

  • by_alias (bool) – Use field aliases when dumping Pydantic models.

  • exclude_none (str | bool) –

    Control None/NaN column dropping:

    False: keep Nones as-is
    "all": drop columns where all values are None/NaN
    "any": drop columns where any value is None/NaN
    True: alias for "all"

  • json_normalize (bool) – If True, flatten nested dicts via pandas.json_normalize.

  • convert_dtypes (bool) – If True, call DataFrame.convert_dtypes() at the end.

  • expand_field (str) – If set, look for this field in each row and expand its keys into top-level columns.

  • expand_prefix (str) – If set, prefix expanded column names with this string.

Returns:

The converted DataFrame.

Return type:

pandas.DataFrame
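The exclude_none modes and the expand_field behavior can be illustrated on plain dictionaries, without pandas. The sketch below mirrors the documented semantics but is not the method's implementation: `drop_missing_columns` and `expand_rows` are hypothetical helpers, and removing the expanded field from the output row is an assumption made for illustration.

```python
import math
from typing import Any, Dict, List, Union

Row = Dict[str, Any]


def _is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_columns(rows: List[Row], exclude_none: Union[str, bool] = True) -> List[Row]:
    """Drop columns according to the documented exclude_none modes."""
    if exclude_none is False:
        return [dict(row) for row in rows]  # keep Nones as-is
    mode = "all" if exclude_none is True else exclude_none  # True aliases "all"
    if mode not in ("all", "any"):
        raise ValueError(f"unsupported exclude_none value: {exclude_none!r}")
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    check = all if mode == "all" else any
    dropped = {c for c in columns if check(_is_missing(r.get(c)) for r in rows)}
    return [{k: v for k, v in r.items() if k not in dropped} for r in rows]


def expand_rows(
    rows: List[Row], expand_field: str = "additional_properties", expand_prefix: str = ""
) -> List[Row]:
    """Lift the keys of expand_field into top-level keys, with an optional prefix."""
    expanded: List[Row] = []
    for row in rows:
        # Assumption: the expanded field itself is removed from the row.
        flat = {k: v for k, v in row.items() if k != expand_field}
        for key, value in (row.get(expand_field) or {}).items():
            flat[f"{expand_prefix}{key}"] = value
        expanded.append(flat)
    return expanded


rows = [
    {"id": "a", "x": None, "y": 1, "z": None},
    {"id": "b", "x": None, "y": None, "z": 2},
]
```

With these rows, the "all" mode drops only x, while the "any" mode drops x, y, and z.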