pandera.api.xarray.container.DataArraySchema

class pandera.api.xarray.container.DataArraySchema(dtype=None, dims=None, ordered_dims=True, sizes=None, shape=None, coords=None, attrs=None, name=None, checks=None, parsers=None, coerce=False, nullable=False, chunked=None, array_type=None, strict_coords=False, strict_attrs=False, encoding=None, title=None, description=None, metadata=None)

A lightweight xarray DataArray validator.

Initialize a DataArraySchema.

Parameters:
  • dtype (UnionType[Any, None]) – datatype of the DataArray.

  • dims (Union[tuple[UnionType[str, None], ...], list[UnionType[str, None]], dict[str, str], None]) – dimension names. Can be a list of dimension names or a dict mapping dimension names to dimension types.

  • ordered_dims (bool) – if True (default), dims are validated positionally, so the order must match. If False, only the set of dim names is checked.

  • sizes (UnionType[dict[str, UnionType[int, None]], None]) – size requirements for dimensions.

  • shape (UnionType[tuple[UnionType[int, None], ...], None]) – shape requirements for the DataArray.

  • coords (UnionType[dict[str, Any], list[str], None]) – coordinate specifications.

  • attrs (UnionType[dict[str, Any], type[BaseModel], None]) – attribute specifications. Can be a dict[str, Any] where values are literal (equality), regex strings starting with ^ (pattern match), or callables (value) -> bool. Alternatively, pass a pydantic.BaseModel class to validate the full attrs dict against the model’s schema.

  • name (UnionType[str, None]) – name of the DataArray.

  • checks (Union[Check, list[Union[Check, Hypothesis]], None]) – checks applied to the whole DataArray (after structure).

  • parsers (Union[Parser, list[Parser], None]) – parsers applied to the whole DataArray before checks.

  • coerce (bool) – whether or not to coerce data.

  • nullable (bool) – whether the DataArray can contain null values.

  • chunked (UnionType[bool, None]) – if True, require a Dask-backed array; if False, require eager data; if None, do not check.

  • array_type (UnionType[Any, None]) – expected type of underlying array (e.g. numpy.ndarray).

  • strict_coords (Union[bool, Literal['filter']]) – whether to enforce strict coordinate validation.

  • strict_attrs (Union[bool, Literal['filter']]) – whether to enforce strict attribute validation.

  • encoding (UnionType[dict[str, Any], type[BaseModel], None]) – expected per-variable encoding key-value pairs. Validated against da.encoding, which is populated when reading from netCDF/Zarr (common keys: _FillValue, dtype, scale_factor, add_offset, zlib, complevel, units, calendar). Can be a dict[str, Any] where values are literal (equality), regex strings starting with ^, or callables (value) -> bool. Alternatively, pass a pydantic.BaseModel class to validate the full encoding dict against the model’s schema.

  • title (UnionType[str, None]) – A human-readable label for the schema.

  • description (UnionType[str, None]) – An arbitrary textual description of the schema.

  • metadata (UnionType[dict, None]) – optional key-value data.
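The attrs and encoding parameters above describe three kinds of expected value: a literal (compared by equality), a regex string starting with ^, and a callable (value) -> bool. The following is a minimal pure-Python sketch of that matching rule, not pandera's implementation. The helper names value_matches and check_attrs are hypothetical, and the use of re.match on string values only is an assumption.

```python
import re
from typing import Any


def value_matches(expected: Any, actual: Any) -> bool:
    """Hypothetical helper: apply the literal / regex / callable rule."""
    if callable(expected):
        # Callable spec: (value) -> bool
        return bool(expected(actual))
    if isinstance(expected, str) and expected.startswith("^"):
        # Regex spec: assumed to apply to string values via re.match
        return isinstance(actual, str) and re.match(expected, actual) is not None
    # Anything else is treated as a literal and compared by equality
    return expected == actual


def check_attrs(spec: dict[str, Any], attrs: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return one message per failing or missing key."""
    failures = []
    for key, expected in spec.items():
        if key not in attrs:
            failures.append(f"missing attribute {key!r}")
        elif not value_matches(expected, attrs[key]):
            failures.append(f"attribute {key!r} has unexpected value {attrs[key]!r}")
    return failures


spec = {"units": "K", "source": r"^ERA5", "version": lambda v: v >= 2}
good = {"units": "K", "source": "ERA5-Land", "version": 3}
bad = {"units": "degC", "source": "MERRA2", "version": 3}

print(check_attrs(spec, good))  # no failures
print(check_attrs(spec, bad))   # units and source fail, version passes
```

Passing a pydantic.BaseModel class instead of a dict delegates the whole comparison to the model, so this sketch only covers the dict form.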

Attributes

BACKEND_REGISTRY

properties

Get the properties of the schema for serialization purposes.

Methods

__init__(dtype=None, dims=None, ordered_dims=True, sizes=None, shape=None, coords=None, attrs=None, name=None, checks=None, parsers=None, coerce=False, nullable=False, chunked=None, array_type=None, strict_coords=False, strict_attrs=False, encoding=None, title=None, description=None, metadata=None)

Initialize a DataArraySchema.

classmethod from_json(source)

Load schema from JSON (see pandera.io.xarray_io).

Return type:

DataArraySchema

classmethod from_yaml(yaml_schema)

Load schema from YAML (see pandera.io.xarray_io).

Return type:

DataArraySchema

static register_default_backends(check_obj_cls)

Register default backends.

This method is invoked in the get_backend method so that the appropriate validation backend is loaded at validation time instead of schema-definition time.

This method needs to be implemented by the schema subclass.

to_json(target=None, *, minimal=True, **kwargs)

Write schema to JSON (see pandera.io.xarray_io).

Return type:

UnionType[str, None]

to_yaml(stream=None, *, minimal=True)

Write schema to YAML (see pandera.io.xarray_io).

Return type:

UnionType[str, None]

validate(check_obj, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)

Validate a DataArray based on the schema specification.

Parameters:
  • check_obj (DataArray) – the DataArray to be validated.

  • head (UnionType[int, None]) – validate the first n positions along the first dimension only (see backend subsampling).

  • tail (UnionType[int, None]) – validate the last n positions along the first dimension.

  • sample (UnionType[int, None]) – validate a random subset of n positions along the first dimension.

  • random_state (UnionType[int, None]) – random seed for the sample argument.

  • lazy (bool) – if True, evaluates the DataArray against all validation checks and then raises SchemaErrors. Otherwise, raises SchemaError as soon as a check fails.

  • inplace (bool) – if True, applies coercion to the object being validated; otherwise, works on a copy of the data.

Return type:

DataArray

Returns:

validated DataArray

Raises:

SchemaError – when the DataArray violates built-in or custom checks.
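The difference between lazy=False (stop at the first failure) and lazy=True (run every check, then report all failures together) can be illustrated with a small pure-Python sketch. This is not pandera's code: SketchSchemaError, SketchSchemaErrors, and run_checks are hypothetical stand-ins used only to show the control flow.

```python
from typing import Any, Callable


class SketchSchemaError(Exception):
    """Stand-in for a single-failure error (hypothetical name)."""


class SketchSchemaErrors(Exception):
    """Stand-in for an aggregated error (hypothetical name)."""

    def __init__(self, failures: list[str]):
        super().__init__(f"{len(failures)} check(s) failed: {failures}")
        self.failures = failures


def run_checks(
    value: Any,
    checks: dict[str, Callable[[Any], bool]],
    lazy: bool = False,
) -> Any:
    """Run named checks; fail fast unless lazy is True."""
    failures = []
    for name, check in checks.items():
        if check(value):
            continue
        if not lazy:
            # Eager mode: raise on the first failing check
            raise SketchSchemaError(name)
        # Lazy mode: remember the failure and keep going
        failures.append(name)
    if failures:
        raise SketchSchemaErrors(failures)
    return value


checks = {
    "non_negative": lambda xs: all(x >= 0 for x in xs),
    "at_most_ten": lambda xs: all(x <= 10 for x in xs),
}
data = [-1, 5, 20]

try:
    run_checks(data, checks, lazy=True)
except SketchSchemaErrors as exc:
    print(exc.failures)  # both failing check names are collected
```

Lazy mode is usually the more convenient choice when exploring a new dataset, since a single run reports every problem instead of one at a time.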

Chunked (Dask-backed) arrays default to SCHEMA_ONLY for data-level checks unless validation_depth is set (see pandera.api.xarray.utils.get_validation_depth()).
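The head, tail, and sample parameters of validate each select positions along the first dimension. A minimal sketch of how such a selection could be computed is shown below, using only the standard library. The function subsample_indices is hypothetical, and the choice to take the union of positions when several options are given is an assumption, not documented pandera behaviour.

```python
import random
from typing import Optional


def subsample_indices(
    length: int,
    head: Optional[int] = None,
    tail: Optional[int] = None,
    sample: Optional[int] = None,
    random_state: Optional[int] = None,
) -> Optional[list[int]]:
    """Return sorted positions along the first dimension to validate.

    Returns None when no subsampling option is set, meaning "validate all".
    """
    if head is None and tail is None and sample is None:
        return None
    chosen: set[int] = set()
    if head is not None:
        # First n positions, capped at the dimension length
        chosen.update(range(min(head, length)))
    if tail is not None:
        # Last n positions, capped at the dimension length
        chosen.update(range(max(length - tail, 0), length))
    if sample is not None:
        # Seeded random subset so results are reproducible
        rng = random.Random(random_state)
        chosen.update(rng.sample(range(length), min(sample, length)))
    return sorted(chosen)


print(subsample_indices(10, head=3))          # [0, 1, 2]
print(subsample_indices(10, tail=2))          # [8, 9]
print(subsample_indices(10, head=2, tail=2))  # [0, 1, 8, 9]
```

Passing the same random_state twice yields the same sampled positions, which is the purpose of the random_state argument.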