Xarray Data Validation
xarray provides three main structures: labelled multi-dimensional arrays (DataArray), collections of aligned arrays (Dataset), and hierarchical collections of datasets (DataTree). Pandera validates all three with the same patterns it uses for its other dataframe backends: schema objects, optional Check instances, and global configuration.
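To give a rough idea of what validation means here, the properties a schema declares can also be checked by hand with plain xarray. The Quick start below declares a dtype and dimension names for each data variable. The sketch that follows is illustrative only. `check_vars` and its `expected` mapping are hypothetical names, not part of pandera's API, and the sketch ignores coordinates, custom Check instances, and lazy error collection, all of which a pandera schema can handle.

```python
import numpy as np
import xarray as xr


def check_vars(ds, expected):
    """Hypothetical helper (not pandera's API): compare each data variable in
    ``ds`` against an expected (dtype, dims) pair and collect problems."""
    problems = []
    for name, (dtype, dims) in expected.items():
        if name not in ds.data_vars:
            problems.append(f"{name}: missing data variable")
            continue
        var = ds[name]
        # Compare as numpy dtype objects so np.float64 matches dtype("float64").
        if var.dtype != np.dtype(dtype):
            problems.append(f"{name}: dtype {var.dtype} != {np.dtype(dtype)}")
        # xarray stores dimension names as an ordered tuple.
        if var.dims != tuple(dims):
            problems.append(f"{name}: dims {var.dims} != {tuple(dims)}")
    return problems


# "pressure" is deliberately int64 so that one check fails.
ds = xr.Dataset(
    {
        "temperature": (("x", "y"), np.zeros((3, 4))),
        "pressure": (("x", "y"), np.zeros((3, 4), dtype=np.int64)),
    },
    coords={"x": np.arange(3, dtype=np.float64)},
)
expected = {
    "temperature": (np.float64, ("x", "y")),
    "pressure": (np.float64, ("x", "y")),
}
print(check_vars(ds, expected))  # prints ['pressure: dtype int64 != float64']
```

The helper returns a list of problems instead of stopping at the first one. This is loosely similar to lazy validation, where all failures are reported together.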
Installation
pip install 'pandera[xarray]'
Quick start
import numpy as np
import xarray as xr
import pandera.xarray as pa

# Declare the expected structure: two float64 data variables over the
# dimensions ("x", "y") and a float64 coordinate "x".
schema = pa.DatasetSchema(
    data_vars={
        "temperature": pa.DataVar(dtype=np.float64, dims=("x", "y")),
        "pressure": pa.DataVar(dtype=np.float64, dims=("x", "y")),
    },
    coords={"x": pa.Coordinate(dtype=np.float64)},
)

# Build a Dataset that conforms to the schema.
ds = xr.Dataset(
    {
        "temperature": (("x", "y"), np.random.rand(3, 4)),
        "pressure": (("x", "y"), np.random.rand(3, 4)),
    },
    coords={"x": np.arange(3, dtype=np.float64)},
)

# validate() returns the validated Dataset.
schema.validate(ds)
<xarray.Dataset> Size: 216B
Dimensions: (x: 3, y: 4)
Coordinates:
* x (x) float64 24B 0.0 1.0 2.0
Dimensions without coordinates: y
Data variables:
temperature (x, y) float64 96B 0.477 0.4206 0.4485 ... 0.718 0.4396 0.5664
pressure (x, y) float64 96B 0.4769 0.4384 0.2959 ... 0.05203 0.6396 0.49
Dataset Model
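The same schema can also be declared as a class by subclassing DatasetModel:
from pandera.typing.xarray import Coordinate

# Annotated fields describe the data variables; Coordinate[...] marks "x"
# as a coordinate, matching the DatasetSchema in the Quick start.
class Surface(pa.DatasetModel):
    temperature: np.float64 = pa.Field(dims=("x", "y"))
    pressure: np.float64 = pa.Field(dims=("x", "y"))
    x: Coordinate[np.float64]

Surface.validate(ds)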
<xarray.Dataset> Size: 216B
Dimensions: (x: 3, y: 4)
Coordinates:
* x (x) float64 24B 0.0 1.0 2.0
Dimensions without coordinates: y
Data variables:
temperature (x, y) float64 96B 0.477 0.4206 0.4485 ... 0.718 0.4396 0.5664
pressure (x, y) float64 96B 0.4769 0.4384 0.2959 ... 0.05203 0.6396 0.49
Guide contents
DataArray Schemas: validating a single DataArray
Dataset Schemas: validating a Dataset with DataVar and Coordinate
DataTree Validation: validating a DataTree hierarchy
Data Models: class-based DataArrayModel, DatasetModel, and DataTreeModel
Checks and Parsers: checks, parsers, and lazy validation
Decorators: check_input, check_output, check_io, and check_types
Configuration: validation depth, Dask, and environment variables
Dask and Duck Arrays: chunked, array_type, validation depth, and lazy data checks
Encoding Validation: validate .encoding dicts on DataArrays, DataVars, and Datasets
Error reports and lazy validation: SchemaError/SchemaErrors, lazy validation, and failure cases
CF Convention Checks: CF standard name, units, and cf_xarray checks
Schema Inference: automatically infer schemas from data
IO Serialization: save and load schemas as YAML or JSON
Hypothesis Data Strategies: generate synthetic data with Hypothesis
See also
Supported DataFrame Libraries: other backends
Validating with Checks: general Check behaviour
Lazy Validation: lazy=True and SchemaErrors
Configuration: ValidationDepth and environment variables