Xarray Data Validation

New in 0.31.0

xarray provides labelled multi-dimensional arrays (DataArray), collections of aligned arrays (Dataset), and collections of datasets (DataTree).
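As a rough illustration of the three container types, the sketch below builds each one with plain xarray and involves no pandera. The variable names and the "/surface" node path are made up for this example, and the DataTree step is guarded because that class only ships in recent xarray releases.

```python
import numpy as np
import xarray as xr

# DataArray: a single labelled N-dimensional array.
temperature = xr.DataArray(
    np.zeros((3, 4)),
    dims=("x", "y"),
    coords={"x": np.arange(3, dtype=np.float64)},
    name="temperature",
)

# Dataset: several arrays aligned on shared dimensions.
ds = xr.Dataset({"temperature": temperature, "pressure": temperature + 1.0})

# DataTree: a collection of datasets. The class is missing from older
# xarray releases, so fall back to None when it is unavailable.
tree = xr.DataTree.from_dict({"/surface": ds}) if hasattr(xr, "DataTree") else None

print(temperature.dims)   # dimension names of the single array
print(list(ds.data_vars)) # variable names held by the dataset
```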

Pandera validates these objects with the same patterns it uses for its dataframe backends: schema objects, optional Check instances, and global configuration.

Installation

pip install 'pandera[xarray]'

Quick start

import numpy as np
import xarray as xr
import pandera.xarray as pa

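# Declare the expected data variables with their dtypes and dimensions.
schema = pa.DatasetSchema(
    data_vars={
        "temperature": pa.DataVar(dtype=np.float64, dims=("x", "y")),
        "pressure": pa.DataVar(dtype=np.float64, dims=("x", "y")),
    },
    # The "x" coordinate must hold float64 labels.
    coords={"x": pa.Coordinate(dtype=np.float64)},
)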

# Build a dataset that matches the schema.
ds = xr.Dataset(
    {
        "temperature": (("x", "y"), np.random.rand(3, 4)),
        "pressure": (("x", "y"), np.random.rand(3, 4)),
    },
    coords={"x": np.arange(3, dtype=np.float64)},
)
# Validation returns the dataset, shown below.
schema.validate(ds)
<xarray.Dataset> Size: 216B
Dimensions:      (x: 3, y: 4)
Coordinates:
  * x            (x) float64 24B 0.0 1.0 2.0
Dimensions without coordinates: y
Data variables:
    temperature  (x, y) float64 96B 0.4043 0.6347 0.6288 ... 0.9956 0.1815
    pressure     (x, y) float64 96B 0.1103 0.1267 0.7906 ... 0.8932 0.6363
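To make the declaration concrete, here is a minimal sketch in plain xarray of the conditions the schema above states: each named variable exists, has the given dtype, and (where declared) the given dimensions. The helper matches_declaration is hypothetical and is not part of pandera. It only mirrors the declared constraints and says nothing about how pandera implements validation or which other checks it runs.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "temperature": (("x", "y"), np.random.rand(3, 4)),
        "pressure": (("x", "y"), np.random.rand(3, 4)),
    },
    coords={"x": np.arange(3, dtype=np.float64)},
)


def matches_declaration(dataset, name, dtype, dims=None):
    """Hypothetical helper: check one variable or coordinate by hand."""
    if name not in dataset.variables:
        return False
    var = dataset[name]
    if var.dtype != np.dtype(dtype):
        return False
    # dims is optional because the coordinate above declares only a dtype.
    return dims is None or var.dims == tuple(dims)


ok = all(
    [
        matches_declaration(ds, "temperature", np.float64, ("x", "y")),
        matches_declaration(ds, "pressure", np.float64, ("x", "y")),
        matches_declaration(ds, "x", np.float64),
    ]
)
print(ok)
```

Writing these conditions out by hand grows quickly with the number of variables, which is the repetition a declarative schema object removes.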

Dataset Model

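from pandera.typing.xarray import Coordinate

# Class-based declaration of the same constraints as the Quick start schema.
class Surface(pa.DatasetModel):
    # Data variables: the annotation gives the dtype, Field gives the dims.
    temperature: np.float64 = pa.Field(dims=("x", "y"))
    pressure: np.float64 = pa.Field(dims=("x", "y"))
    # Coordinate[...] marks "x" as a coordinate rather than a data variable.
    x: Coordinate[np.float64]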

Surface.validate(ds)
<xarray.Dataset> Size: 216B
Dimensions:      (x: 3, y: 4)
Coordinates:
  * x            (x) float64 24B 0.0 1.0 2.0
Dimensions without coordinates: y
Data variables:
    temperature  (x, y) float64 96B 0.4043 0.6347 0.6288 ... 0.9956 0.1815
    pressure     (x, y) float64 96B 0.1103 0.1267 0.7906 ... 0.8932 0.6363

Guide contents

See also