pandera.api.xarray.container.DatasetSchema
- class pandera.api.xarray.container.DatasetSchema(data_vars=None, coords=None, dims=None, ordered_dims=True, sizes=None, attrs=None, checks=None, parsers=None, strict=False, strict_coords=False, strict_attrs=False, encoding=None, name=None, title=None, description=None, metadata=None)
A lightweight xarray Dataset validator.
Initialize a DatasetSchema.
- Parameters:
data_vars (dict[str, Union[DataVar, DataArraySchema, None]] | None) – mapping of logical names to DataVar, DataArraySchema, or None (variable must exist, no value checks).
coords (dict[str, Any] | list[str] | None) – coordinate specifications.
dims (Union[tuple[str, ...], list[str], dict[str, str]] | None) – dimension names. Can be a list of dimension names or a dict mapping dimension names to dimension types.
sizes (dict[str, int | None] | None) – size requirements for dimensions.
attrs (dict[str, Any] | type[BaseModel] | None) – attribute specifications.
checks (CheckList | None) – checks applied to the whole Dataset (after per-variable validation).
parsers (ParserList | None) – parsers applied to the whole Dataset before checks.
strict (StrictType | str) – whether to enforce strict validation.
strict_coords (StrictType) – whether to enforce strict coordinate validation.
strict_attrs (StrictType) – whether to enforce strict attribute validation.
encoding (dict[str, Any] | type[BaseModel] | None) – expected dataset-level encoding key-value pairs. Validated against ds.encoding (common keys: unlimited_dims, source). For per-variable encoding (_FillValue, dtype, scale_factor, etc.) use DataVar encoding. Can be a dict[str, Any] where values are literal (equality), regex strings starting with ^, or callables (value) -> bool. Alternatively, pass a pydantic.BaseModel class to validate the full encoding dict against the model's schema.
title (str | None) – A human-readable label for the schema.
description (str | None) – An arbitrary textual description of the schema.
metadata (dict | None) – Optional key-value data.
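The three kinds of value accepted in an encoding dict (literal, regex string starting with ^, callable) can be illustrated with a small pure-Python sketch. This is not pandera's implementation: the helper names `encoding_value_matches` and `find_encoding_mismatches` are hypothetical, and the choices to treat a missing key as a mismatch and to compare regexes against `str(actual)` are assumptions made for illustration.

```python
import re
from typing import Any


def encoding_value_matches(expected: Any, actual: Any) -> bool:
    """Hypothetical helper: apply one expected-encoding rule to one value."""
    if callable(expected):
        # Callable rule: (value) -> bool
        return bool(expected(actual))
    if isinstance(expected, str) and expected.startswith("^"):
        # Regex rule (assumption: matched against the string form of the value)
        return re.search(expected, str(actual)) is not None
    # Literal rule: plain equality
    return expected == actual


def find_encoding_mismatches(expected: dict, actual: dict) -> list:
    """Hypothetical helper: return the keys whose rule is not satisfied.

    Assumption: a key that is absent from `actual` counts as a mismatch.
    """
    mismatches = []
    for key, rule in expected.items():
        if key not in actual or not encoding_value_matches(rule, actual[key]):
            mismatches.append(key)
    return mismatches


# Example rules using the common dataset-level keys named in the docs.
expected_encoding = {
    "source": r"^/data/.*\.nc$",                 # regex rule
    "unlimited_dims": lambda v: "time" in v,     # callable rule
}
actual_encoding = {"source": "/data/run1.nc", "unlimited_dims": {"time"}}
mismatches = find_encoding_mismatches(expected_encoding, actual_encoding)
```

With the values shown, both rules are satisfied, so `mismatches` is an empty list. A literal rule would simply be a value that neither is callable nor starts with ^.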
Attributes
BACKEND_REGISTRY
properties – Get the properties of the schema for serialization purposes.
Methods
- __init__(data_vars=None, coords=None, dims=None, ordered_dims=True, sizes=None, attrs=None, checks=None, parsers=None, strict=False, strict_coords=False, strict_attrs=False, encoding=None, name=None, title=None, description=None, metadata=None)
Initialize a DatasetSchema.
- classmethod from_json(source)
Load schema from JSON (see pandera.io.xarray_io).
- Return type:
- classmethod from_yaml(yaml_schema)
Load schema from YAML (see pandera.io.xarray_io).
- Return type:
- static register_default_backends(check_obj_cls)
Register default backends.
This method is invoked in the get_backend method so that the appropriate validation backend is loaded at validation time instead of schema-definition time.
This method needs to be implemented by the schema subclass.
- to_json(target=None, *, minimal=True, **kwargs)
Write schema to JSON (see pandera.io.xarray_io).
- validate(check_obj, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)
Validate a Dataset based on the schema specification.
- Parameters:
check_obj (Dataset) – the Dataset to be validated.
head (int | None) – validate the first n positions along the first dimension only (see backend subsampling).
tail (int | None) – validate the last n positions along the first dimension.
sample (int | None) – random subset of size n along the first dimension.
random_state (int | None) – random seed for the sample argument.
lazy (bool) – if True, evaluates the Dataset against all validation checks and raises a SchemaErrors exception. Otherwise, raises a SchemaError as soon as one occurs.
inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.
- Return type:
- Returns:
validated Dataset
- Raises:
SchemaError – when Dataset violates built-in or custom checks.
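The difference between eager and lazy validation can be sketched in plain Python. The exception classes and the `run_checks` function below are hypothetical stand-ins, not pandera's SchemaError, SchemaErrors, or backend code; they only illustrate the control flow the lazy flag describes, under the assumption that lazy mode gathers every failure before raising.

```python
class SketchSchemaError(Exception):
    """Hypothetical stand-in for a single validation failure."""


class SketchSchemaErrors(Exception):
    """Hypothetical stand-in for a collection of validation failures."""

    def __init__(self, failures):
        super().__init__(f"{len(failures)} check(s) failed: {failures}")
        self.failures = failures


def run_checks(obj, checks, lazy=False):
    """Run (name, predicate) pairs against obj.

    Eager mode raises on the first failing check; lazy mode runs every
    check and raises once with all failures collected.
    """
    failures = []
    for name, predicate in checks:
        if predicate(obj):
            continue
        if not lazy:
            raise SketchSchemaError(name)
        failures.append(name)
    if failures:
        raise SketchSchemaErrors(failures)
    return obj


# Two illustrative checks on a plain dict standing in for a Dataset.
checks = [
    ("has_time", lambda d: "time" in d),
    ("has_lat", lambda d: "lat" in d),
]
```

Calling `run_checks({"lon": 1}, checks)` stops at the first failing check, while `run_checks({"lon": 1}, checks, lazy=True)` reports both failures in a single exception, which is usually more convenient when fixing a dataset with several problems.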
If any data variable is chunked (Dask-backed), data-level checks default to SCHEMA_ONLY unless validation_depth is configured; see pandera.api.xarray.utils.get_validation_depth().
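As a rough illustration of the head, tail, sample, and random_state arguments, the sketch below computes which positions along the first dimension would be selected. The function `subsample_positions` is hypothetical, and combining the three selections as a union is an assumption; the actual backend subsampling may behave differently.

```python
import random
from typing import Optional


def subsample_positions(
    length: int,
    head: Optional[int] = None,
    tail: Optional[int] = None,
    sample: Optional[int] = None,
    random_state: Optional[int] = None,
) -> list:
    """Hypothetical helper: sorted positions along the first dimension.

    Assumption: when several options are given, their selections are
    combined as a union. With no options, every position is kept.
    """
    if head is None and tail is None and sample is None:
        return list(range(length))

    selected = set()
    if head is not None:
        # First `head` positions (clamped to the dimension length)
        selected.update(range(min(head, length)))
    if tail is not None:
        # Last `tail` positions (clamped to the dimension length)
        selected.update(range(max(length - tail, 0), length))
    if sample is not None:
        # Seeded random subset so repeated runs pick the same positions
        rng = random.Random(random_state)
        selected.update(rng.sample(range(length), min(sample, length)))
    return sorted(selected)
```

For a first dimension of length 10, `subsample_positions(10, head=2, tail=2)` selects positions 0, 1, 8, and 9. Passing the same random_state with sample yields the same subset on every call.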