Xarray Data Validation
New in 0.31.0
xarray provides labelled multi-dimensional arrays (DataArray), collections of aligned arrays (Dataset), and hierarchical collections of datasets (DataTree).
Pandera validates all three with the same patterns it uses for its other dataframe backends: schema objects, optional Check instances, and global configuration.
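For readers new to xarray, the sketch below builds one instance of each container type with plain xarray and NumPy, so it is clear what a schema is constraining (names, dims, dtypes, coordinates). The variable names are illustrative only. xr.DataTree is only present in recent xarray releases, so the sketch assumes that and guards on it.

```python
import numpy as np
import xarray as xr

# A DataArray: one labelled n-dimensional array with named dims and a coordinate.
temperature = xr.DataArray(
    np.zeros((3, 4)),
    dims=("x", "y"),
    coords={"x": np.arange(3, dtype=np.float64)},
    name="temperature",
)

# A Dataset: several arrays aligned on shared dimensions.
ds = xr.Dataset(
    {
        "temperature": temperature,
        "pressure": (("x", "y"), np.ones((3, 4))),
    }
)

# A DataTree: a hierarchy of Datasets (assumes a recent xarray that ships xr.DataTree).
tree = None
if hasattr(xr, "DataTree"):
    tree = xr.DataTree.from_dict({"/": ds, "/child": ds[["pressure"]]})

print(temperature.dims, temperature.dtype)  # ('x', 'y') float64
print(list(ds.data_vars))                   # ['temperature', 'pressure']
```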
Installation
pip install 'pandera[xarray]'
Quick start
import numpy as np
import xarray as xr
import pandera.xarray as pa
schema = pa.DatasetSchema(
    data_vars={
        "temperature": pa.DataVar(dtype=np.float64, dims=("x", "y")),
        "pressure": pa.DataVar(dtype=np.float64, dims=("x", "y")),
    },
    coords={"x": pa.Coordinate(dtype=np.float64)},
)

ds = xr.Dataset(
    {
        "temperature": (("x", "y"), np.random.rand(3, 4)),
        "pressure": (("x", "y"), np.random.rand(3, 4)),
    },
    coords={"x": np.arange(3, dtype=np.float64)},
)
schema.validate(ds)
<xarray.Dataset> Size: 216B
Dimensions: (x: 3, y: 4)
Coordinates:
* x (x) float64 24B 0.0 1.0 2.0
Dimensions without coordinates: y
Data variables:
temperature (x, y) float64 96B 0.4043 0.6347 0.6288 ... 0.9956 0.1815
pressure (x, y) float64 96B 0.1103 0.1267 0.7906 ... 0.8932 0.6363
Dataset Model
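Both the quick-start DatasetSchema and the Surface model below declare the same constraints: two float64 data variables on dims ("x", "y") and a float64 coordinate x. As a rough illustration of what those constraints mean in plain xarray, the hypothetical helper check_surface checks them by hand and, in the spirit of lazy validation, collects every failure instead of stopping at the first. It is a sketch, not how pandera implements validation.

```python
import numpy as np
import xarray as xr

# Hypothetical expected layout, mirroring the quick-start schema.
EXPECTED_VARS = {"temperature": ("x", "y"), "pressure": ("x", "y")}
EXPECTED_DTYPE = np.dtype("float64")


def check_surface(ds: xr.Dataset) -> list[str]:
    """Hypothetical helper: return every failure found (empty list if valid)."""
    failures = []
    for name, dims in EXPECTED_VARS.items():
        if name not in ds.data_vars:
            failures.append(f"missing data variable {name!r}")
            continue
        var = ds[name]
        if var.dims != dims:
            failures.append(f"{name}: dims {var.dims} != {dims}")
        if var.dtype != EXPECTED_DTYPE:
            failures.append(f"{name}: dtype {var.dtype} != {EXPECTED_DTYPE}")
    if "x" not in ds.coords:
        failures.append("missing coordinate 'x'")
    elif ds.coords["x"].dtype != EXPECTED_DTYPE:
        failures.append(f"x: dtype {ds.coords['x'].dtype} != {EXPECTED_DTYPE}")
    return failures


good = xr.Dataset(
    {
        "temperature": (("x", "y"), np.random.rand(3, 4)),
        "pressure": (("x", "y"), np.random.rand(3, 4)),
    },
    coords={"x": np.arange(3, dtype=np.float64)},
)
bad = xr.Dataset(
    {
        "temperature": (("x", "y"), np.random.rand(3, 4)),
        "pressure": (("y", "x"), np.random.rand(4, 3)),  # dims in the wrong order
    },
    coords={"x": np.arange(3)},  # integer coordinate instead of float64
)

print(check_surface(good))       # []
print(len(check_surface(bad)))   # 2
```

Reporting all failures at once is the design choice behind lazy validation: one run surfaces every problem instead of forcing a fix-and-rerun loop.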
from pandera.typing.xarray import Coordinate

class Surface(pa.DatasetModel):
    temperature: np.float64 = pa.Field(dims=("x", "y"))
    pressure: np.float64 = pa.Field(dims=("x", "y"))
    x: Coordinate[np.float64]

Surface.validate(ds)
<xarray.Dataset> Size: 216B
Dimensions: (x: 3, y: 4)
Coordinates:
* x (x) float64 24B 0.0 1.0 2.0
Dimensions without coordinates: y
Data variables:
temperature (x, y) float64 96B 0.4043 0.6347 0.6288 ... 0.9956 0.1815
pressure (x, y) float64 96B 0.1103 0.1267 0.7906 ... 0.8932 0.6363
Guide contents
DataArray Schemas: validating a single DataArray
Dataset Schemas: validating a Dataset with DataVar and Coordinate
DataTree Validation: validating a DataTree hierarchy
Data Models: class-based DataArrayModel, DatasetModel, and DataTreeModel
Checks and Parsers: checks, parsers, and lazy validation
Decorators: check_input, check_output, check_io, and check_types
Configuration: validation depth, Dask, and environment variables
Dask and Duck Arrays: chunked, array_type, validation depth, and lazy data checks
Encoding Validation: validate .encoding dicts on DataArrays, DataVars, and Datasets
Error reports and lazy validation: SchemaError/SchemaErrors, lazy validation, and failure cases
CF Convention Checks: CF standard name, units, and cf_xarray checks
Schema Inference: automatically infer schemas from data
IO Serialization: save and load schemas as YAML or JSON
Hypothesis Data Strategies: generate synthetic data with Hypothesis
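The Schema Inference and IO Serialization pages cover pandera's own tooling for this. As a library-independent illustration of the idea only, the hypothetical helper infer_spec reads the dtype and dims of every data variable and coordinate into a plain dict, which the standard json module can then round-trip. The dict layout is an assumption made for this sketch, not pandera's serialization format.

```python
import json

import numpy as np
import xarray as xr


def infer_spec(ds: xr.Dataset) -> dict:
    """Hypothetical: record dtype and dims for every data variable and coordinate."""

    def describe(var: xr.DataArray) -> dict:
        return {"dtype": str(var.dtype), "dims": list(var.dims)}

    return {
        "data_vars": {name: describe(ds[name]) for name in ds.data_vars},
        "coords": {name: describe(ds.coords[name]) for name in ds.coords},
    }


ds = xr.Dataset(
    {"temperature": (("x", "y"), np.random.rand(3, 4))},
    coords={"x": np.arange(3, dtype=np.float64)},
)
spec = infer_spec(ds)
text = json.dumps(spec, indent=2)  # serialize the inferred spec to JSON text
restored = json.loads(text)        # and read it back into a dict

print(restored["data_vars"]["temperature"])
# {'dtype': 'float64', 'dims': ['x', 'y']}
```

Storing dtypes as strings keeps the spec JSON-friendly; np.dtype(spec_entry["dtype"]) turns one back into a NumPy dtype when checking new data.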
See also
Supported DataFrame Libraries: other backends
Validating with Checks: general Check behaviour
Lazy Validation: lazy=True and SchemaErrors
Configuration: ValidationDepth and environment variables