Spans

class SpansClient(*, sdk_config: SDKConfiguration, generated_client: ApiClient)

Bases: object

Client for logging LLM tracing spans and evaluations to Arize.

This class is primarily intended for internal use within the SDK. Users are highly encouraged to access resource-specific functionality via arize.ArizeClient.

Parameters:

sdk_config (SDKConfiguration) – Resolved SDK configuration.
generated_client (ApiClient) – Shared generated API client instance.

List spans for a project within a time range.

Spans are returned in descending start-time order (most recent first). If start_time and end_time are not provided, the query covers the last seven days relative to the time of the request.

Parameters:

project (str) – Project name or global ID (base64) to list spans for. If the value is a name, space must also be provided.
space (str | None) – Optional space name or ID used to disambiguate the project lookup. Required when project is a name.
start_time (datetime | None) – Inclusive lower bound of the time window. Defaults to seven days before the request time.
end_time (datetime | None) – Exclusive upper bound of the time window. Defaults to the request time.

filter (str | None) –

Optional filter expression to narrow results. Supports equality, comparison, and SQL-style AND/OR operators. Examples:

"status_code = 'ERROR'"
"eval.Custom_eval_correctness.label = 'correct'"
"annotation.Correctness.label = 'Correct'"
"latency_ms > 1789"
"status_code = 'ERROR' AND eval.Custom_eval_correctness.label = 'correct'"
"status_code = 'ERROR' OR eval.Custom_eval_correctness.label = 'correct'"

limit (int) – Maximum number of spans to return. The server enforces an upper bound. Defaults to 100.
cursor (str | None) – Opaque pagination cursor returned from a previous response.

Returns:

A response object with the spans and pagination information.

Raises:

ApiException – If the REST API returns an error response (e.g. 401/403/429).

Return type:

SpansList200Response
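As an illustration of the default time window and the filter syntax described above, the following sketch computes a seven-day window and joins filter clauses with SQL-style operators. The helpers default_window and combine_filters are hypothetical and not part of the SDK; only the filter strings come from the examples above.

```python
from datetime import datetime, timedelta, timezone


def default_window(now: datetime) -> tuple[datetime, datetime]:
    """Return (start_time, end_time) for the documented default of seven days.

    start_time is the inclusive lower bound, end_time the exclusive upper bound.
    """
    return now - timedelta(days=7), now


def combine_filters(clauses: list[str], operator: str = "AND") -> str:
    """Join filter clauses with a SQL-style AND/OR operator (hypothetical helper)."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(clauses)


# A fixed "request time" keeps the example deterministic.
now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
start_time, end_time = default_window(now)
filter_expr = combine_filters(["status_code = 'ERROR'", "latency_ms > 1789"])

print(start_time.isoformat(), end_time.isoformat())
print(filter_expr)
```

The resulting start_time, end_time, and filter_expr values have the shapes expected by the start_time, end_time, and filter parameters listed above.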

log(*, space_id: str, project_name: str, dataframe: pd.DataFrame, evals_dataframe: pd.DataFrame | None = None, datetime_format: str = DEFAULT_DATETIME_FMT, validate: bool = True, timeout: float | None = None, tmp_dir: str = '') → requests.Response

Logs a pandas dataframe containing LLM tracing data to Arize via a POST request.

Returns a Response object from the Requests HTTP library, which can be used to confirm that the records were delivered.

Parameters:

space_id (str) – The space ID where the project resides.
project_name (str) – A unique name to identify your project in the Arize platform.
dataframe (pandas.DataFrame) – The dataframe containing the LLM traces.
evals_dataframe (pandas.DataFrame | None) – A dataframe containing LLM evaluations data. The evaluations are joined to their corresponding spans via a left outer join, i.e., using only context.span_id from the spans dataframe. Defaults to None.
datetime_format (str) – Format of the timestamps captured in the LLM traces. Defaults to "%Y-%m-%dT%H:%M:%S.%f+00:00".
validate (bool) – When set to True, validation is run before sending data. Defaults to True.
timeout (float | None) – Number of seconds to wait for a response before giving up. Defaults to None.
tmp_dir (str) – Temporary directory/file to store the serialized data in binary before sending to Arize.

Returns:

Response object from the HTTP request (only returned on HTTP 2xx).

Raises:

MissingSpaceIDError – If space_id is not provided or empty.
MissingProjectNameError – If project_name is not provided or empty.
ValidationFailure – If validate=True and validation checks fail.
pa.ArrowInvalid – If the dataframe cannot be converted to Arrow format.
AuthenticationError – If the server returns HTTP 401 or 403 (invalid API key or space ID). Raised immediately to prevent further uploads with bad credentials.
APIError – If the server returns any other non-2xx response (e.g. 400, 422, 429, 5xx). Raised immediately to prevent further uploads when the server signals an error.

Return type:

requests.Response
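To check that timestamps match the default datetime_format before logging, a minimal standard-library sketch follows. The constant is redefined locally with the documented default value rather than imported from the SDK, which is an assumption made to keep the example self-contained.

```python
from datetime import datetime, timezone

# Documented default for datetime_format, redefined locally for illustration.
DEFAULT_DATETIME_FMT = "%Y-%m-%dT%H:%M:%S.%f+00:00"

start = datetime(2024, 1, 2, 3, 4, 5, 678901, tzinfo=timezone.utc)

# Format a UTC timestamp the way the default format expects it.
formatted = start.strftime(DEFAULT_DATETIME_FMT)
print(formatted)

# Parsing with the same format recovers the same wall-clock time. The result is
# a naive datetime because "+00:00" is a literal in the format, not a %z directive.
parsed = datetime.strptime(formatted, DEFAULT_DATETIME_FMT)
print(parsed)
```

Because the offset is a literal, this format only describes UTC timestamps; timestamps in other zones would presumably need converting to UTC first or a different datetime_format.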

update_evaluations(*, space_id: str, project_name: str, dataframe: pd.DataFrame, validate: bool = True, force_http: bool = False, timeout: float | None = None, tmp_dir: str = '') → flight_pb2.WriteSpanEvaluationResponse

Logs a pandas dataframe containing LLM evaluations data to Arize via a Flight gRPC request.

The dataframe must contain a column context.span_id such that Arize can assign each evaluation to its respective span.

Parameters:

space_id (str) – The space ID where the project resides.
project_name (str) – A unique name to identify your project in the Arize platform.
dataframe (pandas.DataFrame) – A dataframe containing LLM evaluations data.
validate (bool) – When set to True, validation is run before sending data. Defaults to True.
force_http (bool) – Force the use of HTTP for data upload. Defaults to False.
timeout (float | None) – Number of seconds to wait for a response before giving up. Defaults to None.
tmp_dir (str) – Temporary directory/file to store the serialized data in binary before sending to Arize.

Raises:

MissingSpaceIDError – If space_id is not provided or empty.
MissingProjectNameError – If project_name is not provided or empty.
ValidationFailure – If validate=True and validation checks fail.
pa.ArrowInvalid – If the dataframe cannot be converted to Arrow format.
AuthenticationError – If the server returns HTTP 401 or 403. Raised immediately to prevent further uploads with bad credentials.
APIError – If the server returns any other non-2xx response. Raised immediately to prevent further uploads when the server signals an error.

Return type:

flight_pb2.WriteSpanEvaluationResponse

update_annotations(*, space_id: str, project_name: str, dataframe: pd.DataFrame, validate: bool = True) → flight_pb2.WriteSpanAnnotationResponse

Logs a pandas dataframe containing LLM span annotations to Arize via a Flight gRPC request.

The dataframe must contain a column context.span_id such that Arize can assign each annotation to its respective span. Annotation columns should follow the pattern annotation.<name>.<suffix> where suffix is label, score, or text. An optional annotation.notes column can be included for free-form text notes.

Parameters:

space_id (str) – The space ID where the project resides.
project_name (str) – A unique name to identify your project in the Arize platform.
dataframe (pandas.DataFrame) – A dataframe containing LLM annotation data.
validate (bool) – When set to True, validation is run before sending data. Defaults to True.

Return type:

flight_pb2.WriteSpanAnnotationResponse
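The column naming rule for annotations can be checked locally before calling update_annotations. The sketch below is a hypothetical pre-flight check, not the SDK's own validation, and its regular expression is only one reading of the annotation.<name>.<suffix> pattern described above.

```python
import re

# annotation.<name>.<suffix>, where suffix is label, score, or text.
ANNOTATION_COLUMN = re.compile(
    r"^annotation\.(?P<name>.+)\.(?P<suffix>label|score|text)$"
)

# Columns that are allowed without following the pattern.
ALWAYS_ALLOWED = {"context.span_id", "annotation.notes"}


def check_columns(columns: list[str]) -> list[str]:
    """Return the columns that do not fit the documented annotation layout."""
    return [
        col
        for col in columns
        if col not in ALWAYS_ALLOWED and not ANNOTATION_COLUMN.match(col)
    ]


columns = [
    "context.span_id",
    "annotation.Correctness.label",
    "annotation.Correctness.score",
    "annotation.notes",
    "annotation.Correctness.rating",  # unsupported suffix
]
unexpected = check_columns(columns)
print(unexpected)
```

With a pandas dataframe, the same check could be run on list(df.columns) before the upload.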

update_metadata(*, space_id: str, project_name: str, dataframe: DataFrame, patch_document_column_name: str = 'patch_document', validate: bool = True) → dict[str, Any]

Log metadata updates using JSON Merge Patch format.

This method is only supported for LLM model types.

The dataframe must contain a column context.span_id to identify spans and either:

A column with JSON patch documents (specified by patch_document_column_name), or
One or more columns with prefix attributes.metadata. that will be automatically converted to a patch document (e.g., attributes.metadata.tag → {"tag": value}).

If both methods are used, the explicit patch document is applied after the individual field updates. The patches will be applied to the attributes.metadata field of each span.

Type Handling:

The client primarily supports string, integer, and float data types.
Boolean values are converted to string representations.
Nested JSON objects and arrays are serialized to JSON strings during transmission.
Setting a field to None or null will set the field to JSON null in the metadata. Note: This differs from standard JSON Merge Patch where null values remove fields.
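The ordering and null-handling rules above can be simulated locally. This is a simplified, top-level-only sketch of the described behavior, not the client's or server's implementation; build_patch and apply_patch are hypothetical helpers.

```python
import json
from typing import Any

PREFIX = "attributes.metadata."


def build_patch(row: dict[str, Any], patch_column: str = "patch_document") -> dict[str, Any]:
    """Combine field columns and an explicit patch document for one row.

    Field columns are applied first; the explicit patch document is applied
    afterwards, so it wins on conflicting keys.
    """
    patch = {
        key[len(PREFIX):]: value
        for key, value in row.items()
        if key.startswith(PREFIX)
    }
    patch.update(row.get(patch_column) or {})
    return patch


def apply_patch(metadata: dict[str, Any], patch: dict[str, Any]) -> dict[str, Any]:
    """Apply the patch as described above: None is stored as JSON null.

    Standard JSON Merge Patch (RFC 7396) would remove the key instead.
    """
    return {**metadata, **patch}


row = {
    "context.span_id": "span1",
    "attributes.metadata.tag": "important",
    "attributes.metadata.priority": "low",
    "patch_document": {"priority": "high", "old_field": None},
}
patch = build_patch(row)
updated = apply_patch({"old_field": "x", "owner": "team-a"}, patch)
print(json.dumps(updated))
```

In the result, priority takes the value from the explicit patch document, and old_field remains present with a null value rather than being removed.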

Parameters:

space_id (str) – The space ID where the project resides.
project_name (str) – A unique name to identify your project in the Arize platform.
dataframe (pandas.DataFrame) – DataFrame with span_ids and either patch documents or metadata field columns.
patch_document_column_name (str) – Name of the column containing JSON patch documents. Defaults to "patch_document".
validate (bool) – When set to True, validation is run before sending data.

Returns:

Dictionary containing update results with the following keys:

spans_processed: Total number of spans in the input dataframe
spans_updated: Count of successfully updated span metadata records
spans_failed: Count of spans that failed to update
errors: List of dictionaries with 'span_id' and 'error_message' keys for each failed span

Error types from the server include:

parse_failure: Failed to parse JSON metadata
patch_failure: Failed to apply JSON patch
type_conflict: Type conflict in metadata
connection_failure: Connection issues
segment_not_found: No matching segment found
druid_rejection: Backend rejected the update

Return type:

dict[str, Any]
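A short sketch of how the returned dictionary might be inspected follows. The sample result is made up for illustration, including the error message text; only the key names come from the description above.

```python
from typing import Any


def failed_span_ids(result: dict[str, Any]) -> list[str]:
    """Collect the span IDs that failed, e.g. to retry them later."""
    return [error["span_id"] for error in result["errors"]]


def summary_line(result: dict[str, Any]) -> str:
    """Build a one-line summary from the documented counters."""
    return (
        f"{result['spans_updated']}/{result['spans_processed']} spans updated, "
        f"{result['spans_failed']} failed"
    )


# Hypothetical result; the error_message wording is invented for this example.
result = {
    "spans_processed": 2,
    "spans_updated": 1,
    "spans_failed": 1,
    "errors": [{"span_id": "span2", "error_message": "type conflict in metadata"}],
}

print(summary_line(result))
print(failed_span_ids(result))
```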

Raises:

AuthError – When API key or space ID is missing.
ValidationFailure – When validation of the dataframe or values fails.
ImportError – When required tracing dependencies are missing.
ArrowInvalid – When the dataframe cannot be converted to Arrow format.
RuntimeError – If the request fails or no response is received.

Examples

Method 1: Using a patch document

>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1", "span2"],
...         "patch_document": [
...             {"tag": "important"},
...             {"priority": "high"},
...         ],
...     }
... )

Method 2: Using direct field columns

>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1", "span2"],
...         "attributes.metadata.tag": ["important", "standard"],
...         "attributes.metadata.priority": ["high", "medium"],
...     }
... )

Method 3: Combining both approaches

>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1"],
...         "attributes.metadata.tag": ["important"],
...         "patch_document": [
...             {"priority": "high"}
...         ],  # Overrides conflicting fields
...     }
... )

Method 4: Setting fields to null

>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1"],
...         "attributes.metadata.old_field": [
...             None
...         ],  # Sets field to JSON null
...         "patch_document": [
...             {"other_field": None}
...         ],  # Also sets field to JSON null
...     }
... )

export_to_df(*, space_id: str, project_name: str, start_time: datetime, end_time: datetime, where: str = '', columns: builtins.list | None = None, stream_chunk_size: int | None = None) → pd.DataFrame

Export span data from Arize to a pandas.DataFrame.

Retrieves trace/span data from the specified project within a time range and returns it as a pandas.DataFrame. Supports filtering with SQL-like WHERE clauses and similarity search for semantic retrieval.

Returns:

DataFrame containing the requested span data, with columns for span metadata, attributes, events, and any custom fields.

Return type:

pandas.DataFrame

Parameters:

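space_id (str) – The space ID where the project resides.
project_name (str) – The name of the project to export span data from.
start_time (datetime) – Start of the time range (inclusive) as a datetime object.
end_time (datetime) – End of the time range (inclusive) as a datetime object.
where (str) – Optional SQL-like WHERE clause to filter rows (e.g., "span.status_code = 'ERROR'").
columns (builtins.list | None) – Optional list of column names to include. If None, all columns are returned.
stream_chunk_size (int | None) – Optional chunk size for streaming large result sets.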

export_to_parquet(*, path: str, space_id: str, project_name: str, start_time: datetime, end_time: datetime, where: str = '', columns: builtins.list | None = None, stream_chunk_size: int | None = None) → None

Export span data from Arize to a Parquet file.

Retrieves trace/span data from the specified project within a time range and writes it directly to a Parquet file at the specified path. Supports filtering with SQL-like WHERE clauses for efficient querying. Ideal for large datasets and long-term storage.

Parameters:

path (str) – The file path where the Parquet file will be written.
space_id (str) – The space ID where the project resides.
project_name (str) – The name of the project to export span data from.
start_time (datetime) – Start of the time range (inclusive) as a datetime object.
end_time (datetime) – End of the time range (inclusive) as a datetime object.
where (str) – Optional SQL-like WHERE clause to filter rows (e.g., "span.status_code = 'ERROR'").
columns (builtins.list | None) – Optional list of column names to include. If None, all columns are returned.
stream_chunk_size (int | None) – Optional chunk size for streaming large result sets.

Raises:

RuntimeError – If the Flight client request fails or returns no response.

Return type:

None

Notes

Uses Apache Arrow Flight for efficient data transfer
Data is written directly to the specified path as a Parquet file
Large exports may benefit from specifying stream_chunk_size
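The following sketch shows one way to call export_to_parquet with the keyword-only arguments from the signature above. The wrapper export_errors and the RecordingClient stand-in are hypothetical; the stand-in only records the call so the example runs without credentials or network access, and the space ID and project name are placeholders.

```python
from datetime import datetime, timedelta, timezone
from typing import Any


def export_errors(
    spans_client: Any,
    *,
    path: str,
    space_id: str,
    project_name: str,
    end_time: datetime,
    days: int = 1,
) -> None:
    """Export ERROR spans from the `days` before end_time to a Parquet file."""
    spans_client.export_to_parquet(
        path=path,
        space_id=space_id,
        project_name=project_name,
        start_time=end_time - timedelta(days=days),
        end_time=end_time,
        where="span.status_code = 'ERROR'",
        columns=None,  # None returns all columns
        stream_chunk_size=None,
    )


class RecordingClient:
    """Stand-in for a spans client; it records calls instead of exporting."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def export_to_parquet(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)


client = RecordingClient()
export_errors(
    client,
    path="errors.parquet",
    space_id="my-space-id",  # placeholder
    project_name="my-project",  # placeholder
    end_time=datetime(2024, 1, 8, tzinfo=timezone.utc),
)
print(client.calls[0]["start_time"], client.calls[0]["where"])
```

With a real client, the same wrapper would write the file at path; for large exports, a stream_chunk_size value could be passed through instead of None.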

Response Types

class SpansList200Response(*, spans: List[Span], pagination: PaginationMetadata)

Bases: BaseModel

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

spans (List[Span])
pagination (PaginationMetadata)

spans: List[Span]

pagination: PaginationMetadata

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'protected_namespaces': (), 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

to_str() → str

Returns the string representation of the model using alias.

Return type:

str

to_json() → str

Returns the JSON representation of the model using alias.

Return type:

str

classmethod from_json(json_str: str) → Self | None

Create an instance of SpansList200Response from a JSON string.

Parameters:

json_str (str)

Return type:

Self | None

to_dict() → Dict[str, Any]

Return the dictionary representation of the model using alias.

This has the following differences from calling pydantic’s self.model_dump(by_alias=True):

None is only added to the output dict for nullable fields that were set at model initialization. Other fields with value None are ignored.

Return type:

Dict[str, Any]

classmethod from_dict(obj: Dict[str, Any] | None) → Self | None

Create an instance of SpansList200Response from a dict.

Parameters:

obj (Dict[str, Any] | None)

Return type:

Self | None

to_df(by_alias: bool = False, exclude_none: str | bool = True, json_normalize: bool = False, convert_dtypes: bool = True, expand_field: str = 'additional_properties', expand_prefix: str = '') → pd.DataFrame

Convert a list of objects to a pandas.DataFrame.

Behavior:

If an item is a Pydantic v2 model, use .model_dump(by_alias=…).
If an item is a mapping (dict-like), use it as-is.
Otherwise, raise a ValueError (unsupported row type).

Parameters:

self (object) – The object instance containing the field to convert.
by_alias (bool) – Use field aliases when dumping Pydantic models.
exclude_none (str | bool) –

Controls dropping of None/NaN columns:

False: keep None values as-is.
"all": drop columns where all values are None/NaN.
"any": drop columns where any value is None/NaN.
True: alias for "all".

json_normalize (bool) – If True, flatten nested dicts via pandas.json_normalize.
convert_dtypes (bool) – If True, call DataFrame.convert_dtypes() at the end.
expand_field (str) – If set, look for this field in each row and expand its keys into top-level columns.
expand_prefix (str) – If set, prefix expanded column names with this string.

Returns:

The converted DataFrame.

Return type:

pandas.DataFrame
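The exclude_none modes and the expand_field behavior can be illustrated on plain dictionaries without pandas. This sketch is not the SDK's implementation. It assumes that a missing key counts as None and that the expanded field is removed from the row; both are assumptions for illustration, as are the helper names.

```python
from typing import Any

Row = dict[str, Any]


def expand_rows(rows: list[Row], expand_field: str, expand_prefix: str = "") -> list[Row]:
    """Lift the keys of a nested mapping field into top-level columns."""
    expanded = []
    for row in rows:
        row = dict(row)  # copy so the input is not mutated
        extra = row.pop(expand_field, None) or {}  # assumption: field is removed
        for key, value in extra.items():
            row[f"{expand_prefix}{key}"] = value
        expanded.append(row)
    return expanded


def drop_none_columns(rows: list[Row], exclude_none: Any = True) -> list[Row]:
    """Drop columns following the False / "all" / "any" / True modes."""
    if exclude_none is False:
        return [dict(row) for row in rows]
    mode = "all" if exclude_none is True else exclude_none
    if mode not in ("all", "any"):
        raise ValueError("exclude_none must be False, True, 'all', or 'any'")
    test = all if mode == "all" else any
    columns = {key for row in rows for key in row}
    dropped = {col for col in columns if test(row.get(col) is None for row in rows)}
    return [{k: v for k, v in row.items() if k not in dropped} for row in rows]


rows = [
    {"name": "a", "score": 1, "note": None, "additional_properties": {"env": "prod"}},
    {"name": "b", "score": None, "note": None, "additional_properties": {"env": "dev"}},
]
flat = expand_rows(rows, "additional_properties", expand_prefix="extra_")
kept_all = drop_none_columns(flat, True)   # drops only the all-None column "note"
kept_any = drop_none_columns(flat, "any")  # also drops "score"
print(kept_all)
print(kept_any)
```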