Xarray Data Validation

xarray provides three containers: labelled multi-dimensional arrays (DataArray), collections of aligned arrays (Dataset), and collections of datasets (DataTree).

Pandera validates them with the same patterns as its dataframe backends: schema objects, optional Check instances, and global configuration.

Installation

pip install 'pandera[xarray]'

Quick start

import numpy as np
import xarray as xr
import pandera.xarray as pa

# Declare the expected data variables and coordinates, with dtypes and dims.
schema = pa.DatasetSchema(
    data_vars={
        "temperature": pa.DataVar(dtype=np.float64, dims=("x", "y")),
        "pressure": pa.DataVar(dtype=np.float64, dims=("x", "y")),
    },
    coords={"x": pa.Coordinate(dtype=np.float64)},
)

# Build a dataset that matches the schema (random values, so output varies).
ds = xr.Dataset(
    {
        "temperature": (("x", "y"), np.random.rand(3, 4)),
        "pressure": (("x", "y"), np.random.rand(3, 4)),
    },
    coords={"x": np.arange(3, dtype=np.float64)},
)
# On success, validate() returns the dataset.
schema.validate(ds)
<xarray.Dataset> Size: 216B
Dimensions:      (x: 3, y: 4)
Coordinates:
  * x            (x) float64 24B 0.0 1.0 2.0
Dimensions without coordinates: y
Data variables:
    temperature  (x, y) float64 96B 0.477 0.4206 0.4485 ... 0.718 0.4396 0.5664
    pressure     (x, y) float64 96B 0.4769 0.4384 0.2959 ... 0.05203 0.6396 0.49

Dataset Model

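# Reuses np, pa, and ds from the quick start above.
from pandera.typing.xarray import Coordinate

# Class-based schema: annotations give dtypes, pa.Field carries the dims.
class Surface(pa.DatasetModel):
    temperature: np.float64 = pa.Field(dims=("x", "y"))
    pressure: np.float64 = pa.Field(dims=("x", "y"))
    x: Coordinate[np.float64]

Surface.validate(ds)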
<xarray.Dataset> Size: 216B
Dimensions:      (x: 3, y: 4)
Coordinates:
  * x            (x) float64 24B 0.0 1.0 2.0
Dimensions without coordinates: y
Data variables:
    temperature  (x, y) float64 96B 0.477 0.4206 0.4485 ... 0.718 0.4396 0.5664
    pressure     (x, y) float64 96B 0.4769 0.4384 0.2959 ... 0.05203 0.6396 0.49
