pandera.api.xarray.container.DatasetSchema

class pandera.api.xarray.container.DatasetSchema(data_vars=None, coords=None, dims=None, ordered_dims=True, sizes=None, attrs=None, checks=None, parsers=None, strict=False, strict_coords=False, strict_attrs=False, encoding=None, name=None, title=None, description=None, metadata=None)

A lightweight xarray Dataset validator.

Initialize a DatasetSchema.

Parameters:
  • data_vars (dict[str, Union[DataVar, DataArraySchema, None]] | None) – mapping of logical names to DataVar, DataArraySchema, or None (variable must exist, no value checks).

  • coords (dict[str, Any] | list[str] | None) – coordinate specifications.

  • dims (Union[tuple[str, ...], list[str], dict[str, str]] | None) – dimension names. Can be a list of dimension names or a dict mapping dimension names to dimension types.

  • sizes (dict[str, int | None] | None) – size requirements for dimensions.

  • attrs (dict[str, Any] | type[BaseModel] | None) – attribute specifications.

  • checks (CheckList | None) – checks applied to the whole Dataset (after per-variable validation).

  • parsers (ParserList | None) – parsers applied to the whole Dataset before checks.

  • strict (StrictType | str) – whether to enforce strict validation.

  • strict_coords (StrictType) – whether to enforce strict coordinate validation.

  • strict_attrs (StrictType) – whether to enforce strict attribute validation.

  • encoding (dict[str, Any] | type[BaseModel] | None) – expected dataset-level encoding key-value pairs. Validated against ds.encoding (common keys: unlimited_dims, source). For per-variable encoding (_FillValue, dtype, scale_factor, etc.) use DataVar encoding. Can be a dict[str, Any] where values are literals (compared by equality), regex strings starting with ^, or callables (value) -> bool. Alternatively, pass a pydantic.BaseModel class to validate the full encoding dict against the model's schema.

  • title (str | None) – A human-readable label for the schema.

  • description (str | None) – An arbitrary textual description of the schema.

  • metadata (dict | None) – Optional key-value data.
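
The three kinds of dict values described for the encoding parameter (literal, regex string starting with ^, callable) can be illustrated with a small plain-Python sketch. This is not pandera's implementation: value_matches and encoding_errors are hypothetical helpers, and the choice of re.search on the stringified value is an assumption about how a regex spec might be applied.

```python
import re
from typing import Any


def value_matches(expected: Any, actual: Any) -> bool:
    """Hypothetical helper: test one encoding value against one spec entry."""
    if callable(expected):
        # Callable spec: (value) -> bool
        return bool(expected(actual))
    if isinstance(expected, str) and expected.startswith("^"):
        # Regex spec. Assumption: the pattern is searched in str(actual).
        return re.search(expected, str(actual)) is not None
    # Literal spec: plain equality
    return expected == actual


def encoding_errors(spec: dict[str, Any], encoding: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list every key that is missing or does not match."""
    errors = []
    for key, expected in spec.items():
        if key not in encoding:
            errors.append(f"missing encoding key: {key}")
        elif not value_matches(expected, encoding[key]):
            errors.append(f"encoding[{key!r}] = {encoding[key]!r} does not match")
    return errors


# Illustrative spec using the two common keys named above.
spec = {
    "source": r"^/data/.*\.nc$",                     # regex string
    "unlimited_dims": lambda dims: "time" in dims,  # callable
}

ok = {"source": "/data/obs.nc", "unlimited_dims": {"time"}}
print(encoding_errors(spec, ok))  # []
```

A literal entry such as {"source": "/data/obs.nc"} would take the equality branch instead of the regex branch.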

Attributes

BACKEND_REGISTRY

properties

Get the properties of the schema for serialization purposes.

Methods

__init__(data_vars=None, coords=None, dims=None, ordered_dims=True, sizes=None, attrs=None, checks=None, parsers=None, strict=False, strict_coords=False, strict_attrs=False, encoding=None, name=None, title=None, description=None, metadata=None)

Initialize a DatasetSchema.

classmethod from_json(source)

Load schema from JSON (see pandera.io.xarray_io).

Return type:

DatasetSchema

classmethod from_yaml(yaml_schema)

Load schema from YAML (see pandera.io.xarray_io).

Return type:

DatasetSchema

static register_default_backends(check_obj_cls)

Register default backends.

This method is invoked in the get_backend method so that the appropriate validation backend is loaded at validation time instead of schema-definition time.

This method needs to be implemented by the schema subclass.

to_json(target=None, *, minimal=True, **kwargs)

Write schema to JSON (see pandera.io.xarray_io).

Return type:

str | None

to_yaml(stream=None, *, minimal=True)

Write schema to YAML (see pandera.io.xarray_io).

Return type:

str | None

validate(check_obj, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)

Validate a Dataset based on the schema specification.

Parameters:
  • check_obj (Dataset) – the Dataset to be validated.

  • head (int | None) – validate only the first n positions along the first dimension (see backend subsampling).

  • tail (int | None) – validate only the last n positions along the first dimension.

  • sample (int | None) – validate a random subset of size n along the first dimension.

  • random_state (int | None) – random seed for the sample argument.

  • lazy (bool) – if True, evaluates the Dataset against all validation checks and raises a single SchemaErrors exception that collects the failures. Otherwise, raises a SchemaError as soon as the first failure occurs.

  • inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.
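
As a rough illustration of head, tail, sample, and random_state, the sketch below computes which positions along the first dimension would be selected. It is plain Python, not the pandera backend: subsample_positions is a hypothetical helper, and taking the union of the three selections when more than one is given is an assumption rather than documented behavior.

```python
import random
from typing import Optional


def subsample_positions(
    n: int,
    head: Optional[int] = None,
    tail: Optional[int] = None,
    sample: Optional[int] = None,
    random_state: Optional[int] = None,
) -> list[int]:
    """Hypothetical helper: positions (0..n-1) along the first dimension to validate."""
    if head is None and tail is None and sample is None:
        # No subsampling requested: every position is validated.
        return list(range(n))

    selected: set[int] = set()
    if head is not None:
        selected.update(range(min(head, n)))           # first `head` positions
    if tail is not None:
        selected.update(range(max(n - tail, 0), n))    # last `tail` positions
    if sample is not None:
        rng = random.Random(random_state)              # seeded for reproducibility
        selected.update(rng.sample(range(n), min(sample, n)))
    # Assumption: multiple selections are combined as a sorted union.
    return sorted(selected)


print(subsample_positions(10, head=2, tail=2))  # [0, 1, 8, 9]
```

Passing the same random_state twice yields the same sampled positions, which is what makes a sampled validation run repeatable.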

Return type:

Dataset

Returns:

validated Dataset

Raises:

SchemaError – when the Dataset violates built-in or custom checks.

If any data variable is chunked (Dask-backed), data-level checks default to SCHEMA_ONLY unless validation_depth is configured; see pandera.api.xarray.utils.get_validation_depth().
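
The difference between lazy=False (stop at the first failure) and lazy=True (collect all failures, then raise once) can be sketched generically. The classes and function below are stand-ins written for this illustration, not pandera's SchemaError, SchemaErrors, or validation backend, and the checks operate on a plain dict rather than an xarray Dataset.

```python
from typing import Callable


class SketchSchemaError(Exception):
    """Stand-in for a single validation failure."""


class SketchSchemaErrors(Exception):
    """Stand-in for an aggregate of validation failures."""

    def __init__(self, failures: list[str]):
        super().__init__(f"{len(failures)} check(s) failed")
        self.failures = failures


def run_checks(obj, checks: dict[str, Callable[[object], bool]], lazy: bool = False):
    """Run named checks; fail fast when lazy is False, collect when lazy is True."""
    failures = []
    for name, check in checks.items():
        if check(obj):
            continue
        if not lazy:
            raise SketchSchemaError(name)   # eager: first failure aborts
        failures.append(name)               # lazy: remember and keep going
    if failures:
        raise SketchSchemaErrors(failures)
    return obj


checks = {
    "non_empty": lambda d: len(d) > 0,
    "has_time": lambda d: "time" in d,
    "has_lat": lambda d: "lat" in d,
}
data = {"lon": [0.0]}

try:
    run_checks(data, checks, lazy=True)
except SketchSchemaErrors as exc:
    print(exc.failures)  # ['has_time', 'has_lat']
```

With lazy=False the same call would raise SketchSchemaError for has_time and never evaluate has_lat.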