Spans
- class SpansClient(*, sdk_config: SDKConfiguration, generated_client: ApiClient)
Bases: object
Client for logging LLM tracing spans and evaluations to Arize.
This class is primarily intended for internal use within the SDK. Users are strongly encouraged to access resource-specific functionality via arize.ArizeClient.
- Parameters:
sdk_config (SDKConfiguration) – Resolved SDK configuration.
generated_client (ApiClient) – Shared generated API client instance.
- list(*, project: str, space: str | None = None, start_time: datetime | None = None, end_time: datetime | None = None, filter: str | None = None, limit: int = 100, cursor: str | None = None) -> SpansList200Response
List spans for a project within a time range.
Spans are returned in descending start-time order (most recent first). If start_time and end_time are not provided, the query covers the last seven days relative to the time of the request.
- Parameters:
project (str) – Project name or global ID (base64) to list spans for. If the value is a name, space must also be provided.
space (str | None) – Optional space name or ID used to disambiguate the project lookup. Required when project is a name.
start_time (datetime | None) – Inclusive lower bound of the time window. Defaults to seven days before the request time.
end_time (datetime | None) – Exclusive upper bound of the time window. Defaults to the request time.
filter (str | None) – Optional filter expression to narrow results. Supports equality, comparison, and SQL-style AND/OR operators. Examples:
"status_code = 'ERROR'"
"eval.Custom_eval_correctness.label = 'correct'"
"annotation.Correctness.label = 'Correct'"
"latency_ms > 1789"
"status_code = 'ERROR' AND eval.Custom_eval_correctness.label = 'correct'"
"status_code = 'ERROR' OR eval.Custom_eval_correctness.label = 'correct'"
limit (int) – Maximum number of spans to return. The server enforces an upper bound. Defaults to 100.
cursor (str | None) – Opaque pagination cursor returned from a previous response.
- Returns:
A response object with the spans and pagination information.
- Raises:
ApiException – If the REST API returns an error response (e.g. 401/403/429).
- Return type:
SpansList200Response
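The cursor parameter suggests a page-by-page loop. The following is a minimal sketch of such a loop, not the SDK's own helper. The field that carries the next cursor is not documented in this section, so the caller supplies a get_next_cursor function. The next_cursor attribute and the fake_list stand-in used below are hypothetical and exist only so the sketch runs offline.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterator, Optional


def iter_spans(
    list_page: Callable[..., Any],
    get_next_cursor: Callable[[Any], Optional[str]],
    **list_kwargs: Any,
) -> Iterator[Any]:
    """Yield spans from every page by following the pagination cursor."""
    cursor: Optional[str] = None
    while True:
        page = list_page(cursor=cursor, **list_kwargs)
        yield from page.spans
        cursor = get_next_cursor(page)
        if not cursor:
            break


# Hypothetical stand-in for SpansClient.list; the real method calls the API.
# The "next_cursor" attribute name is an assumption, not a documented field.
_FAKE_PAGES = {
    None: SimpleNamespace(
        spans=["span-a", "span-b"],
        pagination=SimpleNamespace(next_cursor="c1"),
    ),
    "c1": SimpleNamespace(
        spans=["span-c"],
        pagination=SimpleNamespace(next_cursor=None),
    ),
}


def fake_list(*, project: str, cursor: Optional[str] = None, limit: int = 100):
    return _FAKE_PAGES[cursor]


all_spans = list(
    iter_spans(
        fake_list,
        lambda page: page.pagination.next_cursor,
        project="my-project",
        limit=2,
    )
)
print(all_spans)
```

With a real client, list_page would be the client's list method and get_next_cursor would read whichever PaginationMetadata field holds the cursor for the next page.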
- log(*, space_id: str, project_name: str, dataframe: pd.DataFrame, evals_dataframe: pd.DataFrame | None = None, datetime_format: str = DEFAULT_DATETIME_FMT, validate: bool = True, timeout: float | None = None, tmp_dir: str = '') -> requests.Response
Logs a pandas dataframe containing LLM tracing data to Arize via a POST request.
Returns a Response object from the Requests HTTP library, which can be inspected to confirm successful delivery of records.
- Parameters:
space_id (str) – The space ID where the project resides.
project_name (str) – A unique name to identify your project in the Arize platform.
dataframe (pandas.DataFrame) – The dataframe containing the LLM traces.
evals_dataframe (pandas.DataFrame | None) – A dataframe containing LLM evaluations data. The evaluations are joined to their corresponding spans via a left outer join, i.e., using only context.span_id from the spans dataframe. Defaults to None.
datetime_format (str) – Format for the timestamp captured in the LLM traces. Defaults to "%Y-%m-%dT%H:%M:%S.%f+00:00".
validate (bool) – When set to True, validation is run before sending data. Defaults to True.
timeout (float | None) – Number of seconds to wait for a response before giving up. Defaults to None.
tmp_dir (str) – Temporary directory/file to store the serialized data in binary before sending to Arize.
- Returns:
Response object from the HTTP request (only returned on HTTP 2xx).
- Raises:
MissingSpaceIDError – If space_id is not provided or empty.
MissingProjectNameError – If project_name is not provided or empty.
ValidationFailure – If validate=True and validation checks fail.
pa.ArrowInvalid – If the dataframe cannot be converted to Arrow format.
AuthenticationError – If the server returns HTTP 401 or 403 (invalid API key or space ID). Raised immediately to prevent further uploads with bad credentials.
APIError – If the server returns any other non-2xx response (e.g. 400, 422, 429, 5xx). Raised immediately to prevent further uploads when the server signals an error.
- Return type:
requests.Response
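To make the default datetime_format concrete, here is a small sketch using only the standard library. It formats a UTC timestamp with the format string quoted in the parameter description and parses it back. Treating the timestamps as UTC is an assumption drawn from the literal "+00:00" suffix in that format.

```python
from datetime import datetime, timezone

# Default format quoted in the datetime_format parameter description.
DATETIME_FMT = "%Y-%m-%dT%H:%M:%S.%f+00:00"

start = datetime(2024, 1, 2, 3, 4, 5, 678900, tzinfo=timezone.utc)

# The offset is a literal part of the format, so the value should already be UTC.
formatted = start.strftime(DATETIME_FMT)
print(formatted)  # 2024-01-02T03:04:05.678900+00:00

# Parsing yields a naive datetime; reattach UTC explicitly.
parsed = datetime.strptime(formatted, DATETIME_FMT).replace(tzinfo=timezone.utc)
```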
- update_evaluations(*, space_id: str, project_name: str, dataframe: pd.DataFrame, validate: bool = True, force_http: bool = False, timeout: float | None = None, tmp_dir: str = '') -> flight_pb2.WriteSpanEvaluationResponse
Logs a pandas dataframe containing LLM evaluations data to Arize via a Flight gRPC request.
The dataframe must contain a context.span_id column so that Arize can assign each evaluation to its respective span.
- Parameters:
space_id (str) – The space ID where the project resides.
project_name (str) – A unique name to identify your project in the Arize platform.
dataframe (pandas.DataFrame) – A dataframe containing LLM evaluations data.
validate (bool) – When set to True, validation is run before sending data. Defaults to True.
force_http (bool) – Force the use of HTTP for data upload. Defaults to False.
timeout (float | None) – Number of seconds to wait for a response before giving up. Defaults to None.
tmp_dir (str) – Temporary directory/file to store the serialized data in binary before sending to Arize.
- Raises:
MissingSpaceIDError – If space_id is not provided or empty.
MissingProjectNameError – If project_name is not provided or empty.
ValidationFailure – If validate=True and validation checks fail.
pa.ArrowInvalid – If the dataframe cannot be converted to Arrow format.
AuthenticationError – If the server returns HTTP 401 or 403. Raised immediately to prevent further uploads with bad credentials.
APIError – If the server returns any other non-2xx response. Raised immediately to prevent further uploads when the server signals an error.
- Return type:
flight_pb2.WriteSpanEvaluationResponse
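Because every evaluation row needs a context.span_id, a quick local check before uploading can catch problems early. This is a sketch of such a check, not the SDK's validation. The helper name missing_span_ids is hypothetical, and the eval column name is borrowed from the filter examples for list rather than from a documented schema.

```python
import pandas as pd

REQUIRED_COLUMN = "context.span_id"


def missing_span_ids(evals_df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows that have no span id; raise if the column is absent."""
    if REQUIRED_COLUMN not in evals_df.columns:
        raise ValueError(f"dataframe must contain a {REQUIRED_COLUMN!r} column")
    return evals_df[evals_df[REQUIRED_COLUMN].isna()]


evals_df = pd.DataFrame(
    {
        "context.span_id": ["span1", None],
        # Column name assumed from the filter examples, not a confirmed schema.
        "eval.Custom_eval_correctness.label": ["correct", "incorrect"],
    }
)

bad_rows = missing_span_ids(evals_df)
print(len(bad_rows))  # number of rows that cannot be matched to a span
```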
- update_annotations(*, space_id: str, project_name: str, dataframe: pd.DataFrame, validate: bool = True) -> flight_pb2.WriteSpanAnnotationResponse
Logs a pandas dataframe containing LLM span annotations to Arize via a Flight gRPC request.
The dataframe must contain a context.span_id column so that Arize can assign each annotation to its respective span. Annotation columns should follow the pattern annotation.<name>.<suffix>, where suffix is label, score, or text. An optional annotation.notes column can be included for free-form text notes.
- Parameters:
space_id (str) – The space ID where the project resides.
project_name (str) – A unique name to identify your project in the Arize platform.
dataframe (pandas.DataFrame) – A dataframe containing LLM annotation data.
validate (bool) – When set to True, validation is run before sending data. Defaults to True.
- Return type:
flight_pb2.WriteSpanAnnotationResponse
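The column naming rule above can be checked locally with a regular expression. This sketch is illustrative and is not the SDK's validator. It assumes annotation names contain no dots, which the documentation does not state, and the helper name classify_columns is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# annotation.<name>.<suffix> with suffix in {label, score, text}.
# Assumption: <name> itself contains no dots.
ANNOTATION_COLUMN = re.compile(
    r"^annotation\.(?P<name>[^.]+)\.(?P<suffix>label|score|text)$"
)
ALWAYS_ALLOWED = ("context.span_id", "annotation.notes")


def classify_columns(columns: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split column names into those matching the pattern and those that do not."""
    valid, invalid = [], []
    for col in columns:
        if col in ALWAYS_ALLOWED or ANNOTATION_COLUMN.match(col):
            valid.append(col)
        else:
            invalid.append(col)
    return valid, invalid


valid, invalid = classify_columns(
    [
        "context.span_id",
        "annotation.Correctness.label",
        "annotation.Correctness.score",
        "annotation.notes",
        "annotation.Correctness.reason",  # "reason" is not a listed suffix
    ]
)
print(invalid)
```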
- update_metadata(*, space_id: str, project_name: str, dataframe: DataFrame, patch_document_column_name: str = 'patch_document', validate: bool = True) -> dict[str, Any]
Log metadata updates using JSON Merge Patch format.
This method is only supported for LLM model types.
The dataframe must contain a column context.span_id to identify spans and either:
A column with JSON patch documents (specified by patch_document_column_name), or
One or more columns with prefix attributes.metadata. that will be automatically converted to a patch document (e.g., attributes.metadata.tag → {"tag": value}).
If both methods are used, the explicit patch document is applied after the individual field updates. The patches will be applied to the attributes.metadata field of each span.
Type Handling:
The client primarily supports string, integer, and float data types.
Boolean values are converted to string representations.
Nested JSON objects and arrays are serialized to JSON strings during transmission.
Setting a field to None or null will set the field to JSON null in the metadata. Note: This differs from standard JSON Merge Patch where null values remove fields.
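The rules above can be modeled locally to see what a row is expected to produce. This is a sketch of the documented semantics, not the SDK or server implementation. The helper names build_patch and apply_patch are hypothetical, and the shallow merge is an assumption based on nested values being serialized to strings.

```python
import json
from typing import Any, Dict, Mapping

PREFIX = "attributes.metadata."


def build_patch(
    row: Mapping[str, Any], patch_column: str = "patch_document"
) -> Dict[str, Any]:
    """Combine prefixed field columns with the explicit patch document."""
    # Field columns first: attributes.metadata.tag -> {"tag": value}.
    patch = {
        key[len(PREFIX):]: value
        for key, value in row.items()
        if key.startswith(PREFIX)
    }
    # The explicit patch document is applied afterwards, so it wins conflicts.
    patch.update(row.get(patch_column) or {})
    return patch


def apply_patch(metadata: Mapping[str, Any], patch: Mapping[str, Any]) -> Dict[str, Any]:
    """Shallow merge where None is stored as JSON null instead of removing the key."""
    updated = dict(metadata)
    updated.update(patch)
    return updated


row = {
    "context.span_id": "span1",
    "attributes.metadata.tag": "important",
    "attributes.metadata.priority": "low",
    "patch_document": {"priority": "high", "old_field": None},
}

patch = build_patch(row)
result = apply_patch({"old_field": "x", "keep": 1}, patch)
print(json.dumps(result, sort_keys=True))
```

Note that old_field stays in the result with a null value, which is the documented difference from standard JSON Merge Patch.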
- Parameters:
space_id (str) – The space ID where the project resides.
project_name (str) – A unique name to identify your project in the Arize platform.
dataframe (pandas.DataFrame) – DataFrame with span_ids and either patch documents or metadata field columns.
patch_document_column_name (str) – Name of the column containing JSON patch documents. Defaults to "patch_document".
validate (bool) – When set to True, validation is run before sending data.
- Returns:
Dictionary containing update results with the following keys:
spans_processed: Total number of spans in the input dataframe
spans_updated: Count of successfully updated span metadata records
spans_failed: Count of spans that failed to update
errors: List of dictionaries with 'span_id' and 'error_message' keys for each failed span
Error types from the server include:
parse_failure: Failed to parse JSON metadata
patch_failure: Failed to apply JSON patch
type_conflict: Type conflict in metadata
connection_failure: Connection issues
segment_not_found: No matching segment found
druid_rejection: Backend rejected the update
- Return type:
dict[str, Any]
- Raises:
AuthError – When API key or space ID is missing.
ValidationFailure – When validation of the dataframe or values fails.
ImportError – When required tracing dependencies are missing.
ArrowInvalid – When the dataframe cannot be converted to Arrow format.
RuntimeError – If the request fails or no response is received.
Examples
Method 1: Using a patch document
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1", "span2"],
...         "patch_document": [
...             {"tag": "important"},
...             {"priority": "high"},
...         ],
...     }
... )
Method 2: Using direct field columns
>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1", "span2"],
...         "attributes.metadata.tag": ["important", "standard"],
...         "attributes.metadata.priority": ["high", "medium"],
...     }
... )
Method 3: Combining both approaches
>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1"],
...         "attributes.metadata.tag": ["important"],
...         "patch_document": [
...             {"priority": "high"}
...         ],  # Overrides conflicting fields
...     }
... )
Method 4: Setting fields to null
>>> df = pd.DataFrame(
...     {
...         "context.span_id": ["span1"],
...         "attributes.metadata.old_field": [
...             None
...         ],  # Sets field to JSON null
...         "patch_document": [
...             {"other_field": None}
...         ],  # Also sets field to JSON null
...     }
... )
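The result dictionary described under Returns can be turned into a short report. The sketch below relies only on the documented keys. The sample_result value is hand-written for illustration and is not real server output, and the wording of its error_message is made up.

```python
from typing import Any, Dict, List


def summarize_update(result: Dict[str, Any]) -> List[str]:
    """Build human-readable lines from an update_metadata result dictionary."""
    lines = [
        f"{result['spans_updated']}/{result['spans_processed']} spans updated, "
        f"{result['spans_failed']} failed"
    ]
    for err in result.get("errors", []):
        lines.append(f"{err['span_id']}: {err['error_message']}")
    return lines


# Hand-written example value; the error_message text is illustrative only.
sample_result = {
    "spans_processed": 2,
    "spans_updated": 1,
    "spans_failed": 1,
    "errors": [{"span_id": "span2", "error_message": "type_conflict"}],
}

report = summarize_update(sample_result)
for line in report:
    print(line)
```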
- export_to_df(*, space_id: str, project_name: str, start_time: datetime, end_time: datetime, where: str = '', columns: builtins.list | None = None, stream_chunk_size: int | None = None) -> pd.DataFrame
Export span data from Arize to a pandas.DataFrame.
Retrieves trace/span data from the specified project within a time range and returns it as a pandas.DataFrame. Supports filtering with SQL-like WHERE clauses and similarity search for semantic retrieval.
- Returns:
DataFrame containing the requested span data with columns for span metadata, attributes, events, and any custom fields.
- Return type:
pandas.DataFrame
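- Parameters:
space_id (str)
project_name (str)
start_time (datetime)
end_time (datetime)
where (str)
columns (builtins.list | None)
stream_chunk_size (int | None)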
- export_to_parquet(*, path: str, space_id: str, project_name: str, start_time: datetime, end_time: datetime, where: str = '', columns: builtins.list | None = None, stream_chunk_size: int | None = None) -> None
Export span data from Arize to a Parquet file.
Retrieves trace/span data from the specified project within a time range and writes it directly to a Parquet file at the specified path. Supports filtering with SQL-like WHERE clauses for efficient querying. Ideal for large datasets and long-term storage.
- Parameters:
path (str) – The file path where the Parquet file will be written.
space_id (str) – The space ID where the project resides.
project_name (str) – The name of the project to export span data from.
start_time (datetime) – Start of the time range (inclusive) as a datetime object.
end_time (datetime) – End of the time range (inclusive) as a datetime object.
where (str) – Optional SQL-like WHERE clause to filter rows (e.g., "span.status_code = 'ERROR'").
columns (builtins.list | None) – Optional list of column names to include. If None, all columns are returned.
stream_chunk_size (int | None) – Optional chunk size for streaming large result sets.
- Raises:
RuntimeError – If the Flight client request fails or returns no response.
- Return type:
None
Notes
Uses Apache Arrow Flight for efficient data transfer
Data is written directly to the specified path as a Parquet file
Large exports may benefit from specifying stream_chunk_size
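As an illustration of how the documented parameters fit together, the sketch below assembles keyword arguments for an export and checks the time range locally. The helper build_export_kwargs, the file name, and the placeholder IDs are hypothetical. Using UTC-aware datetimes is an assumption, since this section does not say how naive datetimes are interpreted.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def build_export_kwargs(
    *,
    path: str,
    space_id: str,
    project_name: str,
    start_time: datetime,
    end_time: datetime,
    where: str = "",
    columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Collect export_to_parquet keyword arguments after a basic range check."""
    if start_time > end_time:
        raise ValueError("start_time must not be after end_time")
    return {
        "path": path,
        "space_id": space_id,
        "project_name": project_name,
        "start_time": start_time,
        "end_time": end_time,
        "where": where,
        "columns": columns,
    }


export_kwargs = build_export_kwargs(
    path="spans_errors.parquet",  # placeholder output path
    space_id="YOUR_SPACE_ID",  # placeholder
    project_name="my-project",  # placeholder
    start_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
    end_time=datetime(2024, 1, 8, tzinfo=timezone.utc),
    where="span.status_code = 'ERROR'",
    columns=["context.span_id"],
)

# With a SpansClient instance (here assumed to be named spans_client):
# spans_client.export_to_parquet(**export_kwargs)
```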
Response Types
- class SpansList200Response(*, spans: List[Span], pagination: PaginationMetadata)
Bases: BaseModel
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
spans (List[Span])
pagination (PaginationMetadata)
- spans: List[Span]
- pagination: PaginationMetadata
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'protected_namespaces': (), 'validate_assignment': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- classmethod from_json(json_str: str) -> Self | None
Create an instance of SpansList200Response from a JSON string.
- to_dict() -> Dict[str, Any]
Return the dictionary representation of the model using alias.
This has the following differences from calling pydantic's self.model_dump(by_alias=True):
None is only added to the output dict for nullable fields that were set at model initialization. Other fields with value None are ignored.
- classmethod from_dict(obj: Dict[str, Any] | None) -> Self | None
Create an instance of SpansList200Response from a dict.
- to_df(by_alias: bool = False, exclude_none: str | bool = True, json_normalize: bool = False, convert_dtypes: bool = True, expand_field: str = 'additional_properties', expand_prefix: str = '') -> pd.DataFrame
Convert a list of objects to a pandas.DataFrame.
- Behavior:
If an item is a Pydantic v2 model, use .model_dump(by_alias=…).
If an item is a mapping (dict-like), use it as-is.
Otherwise, raise a ValueError (unsupported row type).
- Parameters:
self (object) – The object instance containing the field to convert.
by_alias (bool) – Use field aliases when dumping Pydantic models.
exclude_none (str | bool) – Control None/NaN column dropping:
False: keep Nones as-is
"all": drop columns where all values are None/NaN
"any": drop columns where any value is None/NaN
True: alias for "all"
json_normalize (bool) – If True, flatten nested dicts via pandas.json_normalize.
convert_dtypes (bool) – If True, call DataFrame.convert_dtypes() at the end.
expand_field (str) – If set, look for this field in each row and expand its keys into top-level columns.
expand_prefix (str) – If set, prefix expanded column names with this string.
- Returns:
The converted DataFrame.
- Return type:
pd.DataFrame
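The exclude_none options map naturally onto pandas' own DataFrame.dropna. The sketch below approximates the documented behavior for illustration and is not the SDK's implementation. The helper name drop_none_columns is hypothetical.

```python
import pandas as pd


def drop_none_columns(df: pd.DataFrame, exclude_none=True) -> pd.DataFrame:
    """Approximate the documented exclude_none modes with DataFrame.dropna."""
    if exclude_none is False:
        return df  # keep Nones as-is
    how = "all" if exclude_none is True else exclude_none  # True is an alias for "all"
    if how not in ("all", "any"):
        raise ValueError(f"unsupported exclude_none value: {exclude_none!r}")
    return df.dropna(axis=1, how=how)


rows = [
    {"name": "a", "score": 0.9, "note": None},
    {"name": "b", "score": None, "note": None},
]
df = pd.DataFrame(rows)

kept_false = list(drop_none_columns(df, False).columns)
kept_all = list(drop_none_columns(df, "all").columns)
kept_any = list(drop_none_columns(df, "any").columns)
print(kept_false, kept_all, kept_any)
```

Here "all" removes only the entirely empty note column, while "any" also removes score because one of its values is missing.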