DataArray Schemas

DataArraySchema validates a single DataArray. It is the xarray counterpart of SeriesSchema, but for labelled arrays of arbitrary rank rather than 1-D series.

You can also express the same constraints with the declarative DataArrayModel.

Basic usage

import numpy as np
import xarray as xr
import pandera.xarray as pa

schema = pa.DataArraySchema(
    dtype=np.float64,
    dims=("x", "y"),
    name="temperature",
)

da = xr.DataArray(
    np.random.rand(3, 4),
    dims=("x", "y"),
    name="temperature",
)
schema.validate(da)
<xarray.DataArray 'temperature' (x: 3, y: 4)> Size: 96B
array([[0.91547261, 0.72095847, 0.18207123, 0.43649667],
       [0.8363929 , 0.80296296, 0.18692264, 0.51210528],
       [0.18127073, 0.72020102, 0.42334195, 0.6468169 ]])
Dimensions without coordinates: x, y

Dtype validation

The dtype is resolved through NumPy's type hierarchy. You can pass a Python type, a NumPy dtype, or a string alias:

da_float32 = xr.DataArray(np.zeros(3, dtype=np.float32), dims="x")

pa.DataArraySchema(dtype=float).validate(da)
pa.DataArraySchema(dtype=np.float32).validate(da_float32)
pa.DataArraySchema(dtype="float32").validate(da_float32)
<xarray.DataArray (x: 3)> Size: 12B
array([0., 0., 0.], dtype=float32)
Dimensions without coordinates: x

If dtype is None, any dtype is accepted.

Dimension validation

dims enforces dimension names in order. None entries act as wildcards that match any name:

pa.DataArraySchema(dims=("x", "y")).validate(da)
pa.DataArraySchema(dims=("x", None)).validate(da)
<xarray.DataArray 'temperature' (x: 3, y: 4)> Size: 96B
array([[0.91547261, 0.72095847, 0.18207123, 0.43649667],
       [0.8363929 , 0.80296296, 0.18692264, 0.51210528],
       [0.18127073, 0.72020102, 0.42334195, 0.6468169 ]])
Dimensions without coordinates: x, y

The tuple length also constrains the rank (ndim).

try:
    pa.DataArraySchema(dims=("x", "y", "z")).validate(da)
except pa.errors.SchemaError as exc:
    print(exc)
expected ndim/dims length 3 ('x', 'y', 'z'), got 2 ('x', 'y')
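The following plain-Python sketch makes the rule concrete. `dims_match` is a hypothetical helper written for illustration, not part of pandera. It assumes only the behavior described above: names are compared in order, `None` matches any name, and the tuple length must equal the array's rank.

```python
from typing import Optional, Sequence


def dims_match(expected: Sequence[Optional[str]], actual: Sequence[str]) -> bool:
    """Hypothetical illustration of the `dims` rule, not pandera's code."""
    # The tuple length fixes the rank: a 3-tuple never matches a 2-D array.
    if len(expected) != len(actual):
        return False
    # Names are compared position by position; None matches any name.
    return all(e is None or e == a for e, a in zip(expected, actual))


print(dims_match(("x", "y"), ("x", "y")))       # True
print(dims_match(("x", None), ("x", "y")))      # True
print(dims_match(("y", "x"), ("x", "y")))       # False: order matters
print(dims_match(("x", "y", "z"), ("x", "y")))  # False: rank mismatch
```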

Sizes and shape

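sizes is the idiomatic xarray way to constrain dimension lengths. It maps dimension names to lengths, like DataArray.sizes. shape does the same thing positionally, with None accepting any length. The two are mutually exclusive, so pass one or the other: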

da_sized = xr.DataArray(
    np.zeros((12, 180, 360)),
    dims=("time", "lat", "lon"),
)

pa.DataArraySchema(
    dims=("time", "lat", "lon"),
    sizes={"lat": 180, "lon": 360},
).validate(da_sized)

pa.DataArraySchema(
    dims=("time", "lat", "lon"),
    shape=(None, 180, 360),
).validate(da_sized)
<xarray.DataArray (time: 12, lat: 180, lon: 360)> Size: 6MB
array([[[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
...
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]], shape=(12, 180, 360))
Dimensions without coordinates: time, lat, lon
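`sizes` is keyed by dimension name and `shape` by position. Once `dims` is fixed, a `sizes` constraint can therefore be restated as a `shape`. The sketch below shows that correspondence using two hypothetical helpers, `sizes_to_shape` and `shape_matches`. They illustrate the equivalence and are not pandera functions. The error raised for an unknown dimension name is an assumption made for the sketch.

```python
from typing import Dict, Optional, Sequence, Tuple


def sizes_to_shape(
    dims: Sequence[str], sizes: Dict[str, int]
) -> Tuple[Optional[int], ...]:
    """Hypothetical: turn a name-keyed `sizes` dict into a positional `shape`."""
    unknown = set(sizes) - set(dims)
    if unknown:
        raise ValueError(f"sizes refers to unknown dims: {sorted(unknown)}")
    # Dimensions without a size constraint become None (any length).
    return tuple(sizes.get(d) for d in dims)


def shape_matches(expected: Sequence[Optional[int]], actual: Sequence[int]) -> bool:
    """Hypothetical: None entries accept any length at that position."""
    return len(expected) == len(actual) and all(
        e is None or e == a for e, a in zip(expected, actual)
    )


shape = sizes_to_shape(("time", "lat", "lon"), {"lat": 180, "lon": 360})
print(shape)                                 # (None, 180, 360)
print(shape_matches(shape, (12, 180, 360)))  # True
print(shape_matches(shape, (12, 90, 360)))   # False
```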

Coordinate validation

Pass a dict[str, Coordinate] to validate coordinate arrays, or a list[str] as shorthand for "these coordinates must exist":

da_with_coords = xr.DataArray(
    np.random.rand(3, 4),
    dims=("x", "y"),
    coords={
        "x": np.arange(3, dtype=np.float64),
        "y": np.arange(4, dtype=np.float64),
        "label": ("x", ["a", "b", "c"]),
    },
)

schema = pa.DataArraySchema(
    dims=("x", "y"),
    coords={
        "x": pa.Coordinate(dtype=np.float64, dimension=True),
        "y": pa.Coordinate(dtype=np.float64, dimension=True),
        "label": pa.Coordinate(dimension=False),
    },
)
schema.validate(da_with_coords)
<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[0.99219903, 0.00650998, 0.46675993, 0.40539898],
       [0.7469778 , 0.9323137 , 0.31524351, 0.42855526],
       [0.88666229, 0.98565366, 0.65992507, 0.66354354]])
Coordinates:
  * x        (x) float64 24B 0.0 1.0 2.0
    label    (x) <U1 12B 'a' 'b' 'c'
  * y        (y) float64 32B 0.0 1.0 2.0 3.0

Coordinate is documented in detail under Dataset Schemas.

Strict coordinates

With strict_coords=True, the schema fails if the DataArray has coordinates not listed in coords:

strict_schema = pa.DataArraySchema(
    coords={"x": pa.Coordinate()},
    strict_coords=True,
)

da_x_only = xr.DataArray(
    np.ones(3),
    dims="x",
    coords={"x": np.arange(3, dtype=np.float64)},
)
strict_schema.validate(da_x_only)
<xarray.DataArray (x: 3)> Size: 24B
array([1., 1., 1.])
Coordinates:
  * x        (x) float64 24B 0.0 1.0 2.0
try:
    strict_schema.validate(da_with_coords)
except pa.errors.SchemaError as exc:
    print(exc)
unexpected coordinate 'y'
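Conceptually, `strict_coords=True` is a set difference between the coordinate names present on the DataArray and the names declared in `coords`. The hypothetical helper `unexpected_names` below sketches that idea in plain Python. It returns every extra name, whereas the pandera message above mentions only `'y'`, so do not read it as a description of pandera's exact reporting. The same idea applies to `strict_attrs`, described below, with attribute keys in place of coordinate names.

```python
from typing import Iterable, List


def unexpected_names(present: Iterable[str], declared: Iterable[str]) -> List[str]:
    """Hypothetical: names present on the object but absent from the schema."""
    return sorted(set(present) - set(declared))


# Coordinate names of da_with_coords vs. the strict schema's coords={"x": ...}
print(unexpected_names(["x", "y", "label"], ["x"]))  # ['label', 'y']
print(unexpected_names(["x"], ["x"]))                # []
```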

Attribute validation

attrs validates the DataArray's .attrs dict. Each value in the schema's attrs dict determines how the corresponding attribute is checked:

  • Literal values: matched by equality (==).

  • Regex patterns: strings that start with ^ are treated as regular expressions and matched against str(actual_value) via re.fullmatch.

  • Callable predicates: any callable (value) -> bool is invoked with the actual attribute value; validation passes when the function returns True.

  • Pydantic model: pass a pydantic.BaseModel class to validate the full attrs dict using pydantic's type system.
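The three per-key modes can be pictured as one dispatch function. The sketch below is a hypothetical `match_attr` helper that follows the rules listed above:

  • A string starting with `^` is a regex applied with `re.fullmatch` to `str(value)`.

  • Any other callable is a predicate.

  • Everything else is compared with `==`.

It is not pandera's code. It also leaves out the pydantic mode, because that mode validates the whole dict at once rather than a single key.

```python
import re
from typing import Any


def match_attr(expected: Any, actual: Any) -> bool:
    """Hypothetical illustration of the per-key attrs rules; not pandera's code."""
    # Strings starting with "^" are regular expressions, matched in full
    # against the string form of the actual value.
    if isinstance(expected, str) and expected.startswith("^"):
        return re.fullmatch(expected, str(actual)) is not None
    # Any other callable is a predicate invoked with the actual value.
    if callable(expected):
        return bool(expected(actual))
    # Everything else is compared by equality.
    return expected == actual


print(match_attr("K", "K"))                     # True  (equality)
print(match_attr("^(K|degC|degF)$", "degC"))    # True  (regex)
print(match_attr("^(K|degC|degF)$", "meters"))  # False (regex)
print(match_attr(lambda v: v >= 2, 3))          # True  (callable)
```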

Equality matching

da_attrs = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"units": "K", "standard_name": "air_temperature"},
)

pa.DataArraySchema(
    attrs={"units": "K", "standard_name": "air_temperature"},
).validate(da_attrs)
<xarray.DataArray (x: 3)> Size: 24B
array([1., 1., 1.])
Dimensions without coordinates: x
Attributes:
    units:          K
    standard_name:  air_temperature

Regex matching

Use a regex pattern (starting with ^) to validate an attribute against a set of acceptable values:

schema = pa.DataArraySchema(
    attrs={"units": "^(K|degC|degF)$"},
)

da_units = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"units": "K"},
)
schema.validate(da_units)
<xarray.DataArray (x: 3)> Size: 24B
array([1., 1., 1.])
Dimensions without coordinates: x
Attributes:
    units:    K
da_bad_units = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"units": "meters"},
)

try:
    schema.validate(da_bad_units)
except pa.errors.SchemaError as exc:
    print(exc)
attribute mismatch 'units': expected '^(K|degC|degF)$', got 'meters'
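Because the pattern is applied with `re.fullmatch`, it has to cover the entire string: `^deg` alone does not accept `degC`. The snippet below uses Python's standard `re` module directly to show this. It also shows the effect of matching against `str(value)` for non-string attributes. To express "starts with", append `.*`, as the `^float.*` pattern in the encoding example on this page does.

```python
import re

# fullmatch requires the pattern to cover the whole string.
print(re.fullmatch(r"^deg", "degC") is not None)             # False
print(re.fullmatch(r"^deg.*", "degC") is not None)           # True
print(re.fullmatch(r"^(K|degC|degF)$", "degF") is not None)  # True

# Non-string attributes are matched through str(), so a numeric
# attribute value of 3 is tested as the text "3".
print(re.fullmatch(r"^[0-9]+$", str(3)) is not None)         # True
```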

Callable predicates

Pass a function that receives the attribute value and returns a boolean:

schema = pa.DataArraySchema(
    attrs={
        "version": lambda v: isinstance(v, int) and v >= 2,
    },
)

da_v3 = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"version": 3},
)
schema.validate(da_v3)
<xarray.DataArray (x: 3)> Size: 24B
array([1., 1., 1.])
Dimensions without coordinates: x
Attributes:
    version:  3
da_v1 = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"version": 1},
)

try:
    schema.validate(da_v1)
except pa.errors.SchemaError as exc:
    print(exc)
attribute mismatch 'version': expected <function <lambda> at 0x7e717f651940>, got 1
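The message above embeds the predicate's repr, which for a lambda carries no descriptive name. A named function should make such messages easier to read (an inference from the output above, not a documented guarantee). It can also be tested on its own. `version_at_least_2` is a hypothetical example that uses the same logic as the lambda, so it also rejects floats such as `3.0`.

```python
def version_at_least_2(v: object) -> bool:
    """Attribute predicate: an integer version number of at least 2."""
    return isinstance(v, int) and v >= 2


print(version_at_least_2(3))    # True
print(version_at_least_2(1))    # False
print(version_at_least_2(3.0))  # False: floats are rejected
```

It would be passed in place of the lambda, as `attrs={"version": version_at_least_2}`.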

Pydantic model

For complex attribute schemas you can pass a pydantic.BaseModel class instead of a dict. Pandera delegates validation to pydantic and converts each pydantic error into a pandera SchemaError, so lazy validation reports every pydantic error as a separate failure:

from pydantic import BaseModel, Field as PydanticField

class ArrayAttrs(BaseModel):
    units: str
    standard_name: str
    version: int = PydanticField(ge=2)

schema = pa.DataArraySchema(attrs=ArrayAttrs)

da_ok = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"units": "K", "standard_name": "air_temperature", "version": 3},
)
schema.validate(da_ok)
<xarray.DataArray (x: 3)> Size: 24B
array([1., 1., 1.])
Dimensions without coordinates: x
Attributes:
    units:          K
    standard_name:  air_temperature
    version:        3

When validation fails, the error messages surface the pydantic error details:

da_bad = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"units": "K", "version": 1},  # version < 2, standard_name missing
)

try:
    schema.validate(da_bad, lazy=True)
except pa.errors.SchemaErrors as exc:
    print(exc)
{
    "SCHEMA": {
        "SCHEMA_COMPONENT_CHECK": [
            {
                "schema": "schema",
                "column": null,
                "check": "attrs",
                "error": "standard_name: Field required [type=missing]"
            },
            {
                "schema": "schema",
                "column": null,
                "check": "attrs",
                "error": "version: Input should be greater than or equal to 2 [type=greater_than_equal]"
            }
        ]
    }
}

All four modes also work on DatasetSchema; see Dataset Schemas.

Strict attributes

With strict_attrs=True, extra attributes cause a validation error. When attrs is a pydantic model class, the set of allowed keys is derived from the model's fields.

da_extra = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"units": "K", "extra_key": 42},
)

try:
    pa.DataArraySchema(
        attrs={"units": "K"}, strict_attrs=True
    ).validate(da_extra)
except pa.errors.SchemaError as exc:
    print(exc)
unexpected attribute 'extra_key'

Name validation

named_da = xr.DataArray(np.ones(3), dims="x", name="temperature")
pa.DataArraySchema(name="temperature").validate(named_da)
<xarray.DataArray 'temperature' (x: 3)> Size: 24B
array([1., 1., 1.])
Dimensions without coordinates: x

The DataArray's .name must match exactly.

try:
    pa.DataArraySchema(name="pressure").validate(named_da)
except pa.errors.SchemaError as exc:
    print(exc)
expected name 'pressure', got 'temperature'

Null values

By default nullable=False: any NaN or null value raises a SchemaError. Set nullable=True to allow them:

da_with_nan = xr.DataArray([1.0, np.nan, 3.0], dims="x")

pa.DataArraySchema(dtype=float, nullable=True).validate(da_with_nan)
<xarray.DataArray (x: 3)> Size: 24B
array([ 1., nan,  3.])
Dimensions without coordinates: x
try:
    pa.DataArraySchema(dtype=float, nullable=False).validate(da_with_nan)
except pa.errors.SchemaError as exc:
    print(exc)
non-nullable DataArray contains null values
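For floating-point data a null is NaN. NaN is the one value that never compares equal to itself, so detecting it needs a dedicated test such as `math.isnan`. The hypothetical `has_nulls` helper below sketches the `nullable=False` rule on a plain list of floats. It is an illustration only and covers NaN, not every missing-value representation pandera may recognize.

```python
import math
from typing import Iterable


def has_nulls(values: Iterable[float]) -> bool:
    """Hypothetical sketch: True if any value is NaN (float data only)."""
    return any(math.isnan(v) for v in values)


nan = float("nan")
print(nan == nan)                  # False: NaN never equals itself
print(has_nulls([1.0, nan, 3.0]))  # True: would be rejected when nullable=False
print(has_nulls([1.0, 2.0, 3.0]))  # False
```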

Coercing dtypes

When coerce=True, the DataArray is cast to dtype before validation:

schema = pa.DataArraySchema(dtype=np.float32, coerce=True)
da_int = xr.DataArray(np.array([1, 2, 3]), dims="x")
validated = schema.validate(da_int)
print(f"original: {da_int.dtype} -> coerced: {validated.dtype}")
original: int64 -> coerced: float32
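The ordering matters. Because the cast happens first, a dtype constraint that the raw input would fail is checked against the converted values instead. The hypothetical `validate_with_coerce` function below mimics that ordering, with plain Python types standing in for NumPy dtypes. It is a sketch of the idea, not pandera's implementation.

```python
from typing import Any, List


def validate_with_coerce(values: List[Any], target: type, coerce: bool) -> List[Any]:
    """Hypothetical sketch: cast (optionally), then check element types."""
    if coerce:
        # Cast first, so the type check below sees the converted values.
        values = [target(v) for v in values]
    bad = [v for v in values if type(v) is not target]
    if bad:
        raise TypeError(f"expected {target.__name__}, got {type(bad[0]).__name__}")
    return values


print(validate_with_coerce([1, 2, 3], float, coerce=True))  # [1.0, 2.0, 3.0]
try:
    validate_with_coerce([1, 2, 3], float, coerce=False)
except TypeError as exc:
    print(exc)  # expected float, got int
```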

Encoding validation

The encoding parameter validates the DataArray's .encoding dict, which is populated when reading from netCDF or Zarr:

da_encoded = xr.DataArray(np.ones(3), dims="x")
da_encoded.encoding = {"_FillValue": -999.0, "dtype": "float32"}

pa.DataArraySchema(
    encoding={"_FillValue": -999.0, "dtype": "^float.*"},
).validate(da_encoded)
<xarray.DataArray (x: 3)> Size: 24B
array([1., 1., 1.])
Dimensions without coordinates: x

Encoding supports the same matching modes as attrs (equality, regex, callable) plus pydantic models. See Encoding Validation for full details.

Chunked / array type

Control whether the underlying storage is lazy (Dask) or eager (NumPy):

pa.DataArraySchema(chunked=True)       # must be Dask-backed
pa.DataArraySchema(chunked=False)      # must be eager
pa.DataArraySchema(array_type=np.ndarray)  # must be a numpy array

See Dask and Duck Arrays for comprehensive Dask integration documentation, and Configuration for how chunked interacts with validation depth.

Data-level checks

Use Check for value-level assertions:

schema = pa.DataArraySchema(
    dtype=np.float64,
    checks=[
        pa.Check(lambda da: float(da.min()) >= 0),
        pa.Check(lambda da: float(da.max()) <= 100),
    ],
)

da_checked = xr.DataArray(np.linspace(0, 50, 10), dims="x")
schema.validate(da_checked)
<xarray.DataArray (x: 10)> Size: 80B
array([ 0.        ,  5.55555556, 11.11111111, 16.66666667, 22.22222222,
       27.77777778, 33.33333333, 38.88888889, 44.44444444, 50.        ])
Dimensions without coordinates: x

See Checks and Parsers for built-in check helpers and details on how checks interact with lazy / chunked data.

Parsers

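Parser objects run before checks and can transform the array. In this example the parser fills NaN values with 0, so the result passes nullable=False even though the input contains a null: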

schema = pa.DataArraySchema(
    parsers=pa.Parser(lambda da: da.fillna(0)),
    nullable=False,
)

da_nulls = xr.DataArray([1.0, np.nan, 3.0], dims="x")
validated = schema.validate(da_nulls)
validated
<xarray.DataArray (x: 3)> Size: 24B
array([1., 0., 3.])
Dimensions without coordinates: x
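The parse-then-check ordering can be sketched with plain lists. `run_pipeline`, `fill_nan`, and `no_nulls` are hypothetical stand-ins, not pandera APIs. They play the roles of the schema's `parsers`, of `da.fillna(0)`, and of the `nullable=False` check, respectively.

```python
import math
from typing import Callable, List, Sequence, Tuple

Values = List[float]


def run_pipeline(
    values: Values,
    parsers: Sequence[Callable[[Values], Values]],
    checks: Sequence[Callable[[Values], bool]],
) -> Tuple[Values, List[int]]:
    """Hypothetical sketch: apply parsers in order, then run every check."""
    for parse in parsers:
        values = parse(values)
    # Checks see the parsed data; return the indices of the failing checks.
    failed = [i for i, check in enumerate(checks) if not check(values)]
    return values, failed


def fill_nan(values: Values) -> Values:
    """Plain-list stand-in for da.fillna(0)."""
    return [0.0 if math.isnan(v) else v for v in values]


def no_nulls(values: Values) -> bool:
    """Plain-list stand-in for the nullable=False check."""
    return not any(math.isnan(v) for v in values)


data = [1.0, float("nan"), 3.0]
print(run_pipeline(data, [], [no_nulls]))          # ([1.0, nan, 3.0], [0])
print(run_pipeline(data, [fill_nan], [no_nulls]))  # ([1.0, 0.0, 3.0], [])
```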

Validation options

schema.validate(da) accepts several keyword arguments:

  • lazy: collect all failures into SchemaErrors instead of raising on the first one.

  • head / tail / sample: subsample along the first dimension before running heavy checks.

  • inplace: if True, coercion mutates the original object.

schema = pa.DataArraySchema(
    dtype=np.float64,
    dims=("x",),
    name="values",
    checks=pa.Check(lambda da: bool((da > 0).all())),
)

da_bad = xr.DataArray([-1, 2, 3], dims="x", name="wrong_name")

try:
    schema.validate(da_bad, lazy=True)
except pa.errors.SchemaErrors as exc:
    print(exc)
{
    "SCHEMA": {
        "WRONG_FIELD_NAME": [
            {
                "schema": "values",
                "column": "values",
                "check": "name",
                "error": "expected name 'values', got 'wrong_name'"
            }
        ],
        "WRONG_DATATYPE": [
            {
                "schema": "values",
                "column": "values",
                "check": "dtype(<class 'numpy.float64'>)",
                "error": "expected dtype <class 'numpy.float64'>, got int64"
            }
        ]
    },
    "DATA": {
        "DATAFRAME_CHECK": [
            {
                "schema": "values",
                "column": "values",
                "check": "<lambda>",
                "error": "DataArraySchema 'values' failed series or dataframe validator 0: <Check <lambda>>"
            }
        ]
    }
}
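The difference between eager and lazy validation can be sketched as a loop that either raises on the first failure or keeps going and groups failures by category. `validate` and `CheckSpec` below are hypothetical. The category names are borrowed from the report above. Pandera's real report additionally nests them under `SCHEMA` and `DATA` and records more detail per failure.

```python
from typing import Any, Callable, Dict, List, Tuple

# (error category, check name, predicate) triples. The categories mirror the
# keys in the report above; the predicates are simplified stand-ins.
CheckSpec = Tuple[str, str, Callable[[Dict[str, Any]], bool]]


def validate(
    obj: Dict[str, Any], checks: List[CheckSpec], lazy: bool = False
) -> Dict[str, List[str]]:
    """Hypothetical sketch of eager vs. lazy error handling."""
    errors: Dict[str, List[str]] = {}
    for category, name, predicate in checks:
        if predicate(obj):
            continue
        if not lazy:
            # Eager: the first failure aborts validation.
            raise ValueError(f"{category}: {name} failed")
        # Lazy: record the failure and keep checking.
        errors.setdefault(category, []).append(name)
    return errors


checks: List[CheckSpec] = [
    ("WRONG_FIELD_NAME", "name", lambda o: o["name"] == "values"),
    ("WRONG_DATATYPE", "dtype", lambda o: o["dtype"] == "float64"),
    ("DATAFRAME_CHECK", "positive", lambda o: all(v > 0 for v in o["data"])),
]
bad = {"name": "wrong_name", "dtype": "int64", "data": [-1, 2, 3]}

print(validate(bad, checks, lazy=True))
try:
    validate(bad, checks)
except ValueError as exc:
    print(exc)  # WRONG_FIELD_NAME: name failed
```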

See also