DataArray Schemas

DataArraySchema validates a single DataArray. It is the xarray counterpart of SeriesSchema, extended from 1-D series to labelled arrays of arbitrary rank.

You can also express the same constraints with the declarative DataArrayModel.

Basic usage

import numpy as np
import xarray as xr
import pandera.xarray as pa

schema = pa.DataArraySchema(
    dtype=np.float64,
    dims=("x", "y"),
    name="temperature",
)

da = xr.DataArray(
    np.random.rand(3, 4),
    dims=("x", "y"),
    name="temperature",
)
schema.validate(da)

<xarray.DataArray 'temperature' (x: 3, y: 4)> Size: 96B
array([[0.91547261, 0.72095847, 0.18207123, 0.43649667],
       [0.8363929 , 0.80296296, 0.18692264, 0.51210528],
       [0.18127073, 0.72020102, 0.42334195, 0.6468169 ]])
Dimensions without coordinates: x, y

Dtype validation

The dtype is resolved through NumPy’s type hierarchy. You can pass a Python type (float), a NumPy scalar type (np.float32), or a string alias ("float32"):

da_float32 = xr.DataArray(np.zeros(3, dtype=np.float32), dims="x")

pa.DataArraySchema(dtype=float).validate(da)
pa.DataArraySchema(dtype=np.float32).validate(da_float32)
pa.DataArraySchema(dtype="float32").validate(da_float32)

<xarray.DataArray (x: 3)> Size: 12B
array([0., 0., 0.], dtype=float32)
Dimensions without coordinates: x

If dtype is None, any dtype is accepted.
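To see what the three spellings have in common, the NumPy-only snippet below (no pandera involved) normalizes each one with np.dtype. It also shows that float32 and float64 share a parent in NumPy's type hierarchy while remaining distinct dtypes. This page does not say whether pandera ever accepts a subtype instead of requiring an exact match, so treat that part as background only.

```python
import numpy as np

# Three spellings of the same dtype, as accepted by the `dtype` argument above.
spellings = [float, np.float64, "float64"]

# np.dtype() normalizes each spelling to one canonical dtype object.
resolved = {np.dtype(s) for s in spellings}
print(resolved)  # {dtype('float64')}

# NumPy's type hierarchy: float32 and float64 are both np.floating subtypes,
# but they are distinct dtypes, so an exact comparison tells them apart.
print(np.issubdtype(np.float32, np.floating))       # True
print(np.dtype("float32") == np.dtype(np.float64))  # False
```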

Dimension validation

dims enforces dimension names in order. None entries act as wildcards that match any name:

pa.DataArraySchema(dims=("x", "y")).validate(da)
pa.DataArraySchema(dims=("x", None)).validate(da)

<xarray.DataArray 'temperature' (x: 3, y: 4)> Size: 96B
array([[0.91547261, 0.72095847, 0.18207123, 0.43649667],
       [0.8363929 , 0.80296296, 0.18692264, 0.51210528],
       [0.18127073, 0.72020102, 0.42334195, 0.6468169 ]])
Dimensions without coordinates: x, y

The length of the dims tuple also constrains the rank (ndim):

try:
    pa.DataArraySchema(dims=("x", "y", "z")).validate(da)
except pa.errors.SchemaError as exc:
    print(exc)

expected ndim/dims length 3 ('x', 'y', 'z'), got 2 ('x', 'y')
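The dims rule can be summarized as follows: the tuple length fixes the rank, and each position must match by name unless it is None. The function below is a hypothetical, pure-Python restatement of that rule for illustration. dims_match is not part of pandera, and pandera's own implementation and error messages may differ.

```python
from typing import Optional, Sequence


def dims_match(expected: Sequence[Optional[str]], actual: Sequence[str]) -> bool:
    """Hypothetical sketch of the documented `dims` rule (not pandera's code)."""
    if len(expected) != len(actual):
        return False  # rank (ndim) mismatch
    # Same position, same name, unless the expected entry is a None wildcard.
    return all(e is None or e == a for e, a in zip(expected, actual))


print(dims_match(("x", "y"), ("x", "y")))       # True
print(dims_match(("x", None), ("x", "y")))      # True: None matches "y"
print(dims_match(("y", "x"), ("x", "y")))       # False: order matters
print(dims_match(("x", "y", "z"), ("x", "y")))  # False: rank 3 vs 2
```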

Sizes and shape

sizes constrains dimension lengths by name, which is the idiomatic xarray approach; shape does the same positionally. The two are mutually exclusive. Dimensions omitted from sizes, and None entries in shape, are left unconstrained:

da_sized = xr.DataArray(
    np.zeros((12, 180, 360)),
    dims=("time", "lat", "lon"),
)

pa.DataArraySchema(
    dims=("time", "lat", "lon"),
    sizes={"lat": 180, "lon": 360},
).validate(da_sized)

pa.DataArraySchema(
    dims=("time", "lat", "lon"),
    shape=(None, 180, 360),
).validate(da_sized)

<xarray.DataArray (time: 12, lat: 180, lon: 360)> Size: 6MB
array([[[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
...
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]], shape=(12, 180, 360))
Dimensions without coordinates: time, lat, lon
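Both schemas above accept the same array because they express the same constraint, once by name and once by position. The helpers sizes_ok and shape_ok below are hypothetical pure-Python sketches of that equivalence, not pandera functions. They assume that a dimension named in sizes but missing from the array counts as a failure.

```python
from typing import Dict, Optional, Sequence


def sizes_ok(expected: Dict[str, int], dims: Sequence[str], shape: Sequence[int]) -> bool:
    """Hypothetical: check lengths by dimension name; unlisted dims are free."""
    actual = dict(zip(dims, shape))  # the same mapping xarray exposes as DataArray.sizes
    return all(actual.get(name) == n for name, n in expected.items())


def shape_ok(expected: Sequence[Optional[int]], shape: Sequence[int]) -> bool:
    """Hypothetical: check lengths by position; None is a wildcard."""
    return len(expected) == len(shape) and all(
        e is None or e == s for e, s in zip(expected, shape)
    )


dims, shape = ("time", "lat", "lon"), (12, 180, 360)
print(sizes_ok({"lat": 180, "lon": 360}, dims, shape))  # True
print(shape_ok((None, 180, 360), shape))                # True
print(sizes_ok({"lat": 90}, dims, shape))               # False
```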

Attribute validation

attrs validates the DataArray’s .attrs dict. Each value in the schema’s attrs dict determines how the corresponding attribute is checked:

Literal values — matched by equality (==).
Regex patterns — strings that start with ^ are treated as regular expressions and matched against str(actual_value) via re.fullmatch.
Callable predicates — any callable (value) -> bool is invoked with the actual attribute value; validation passes when the function returns True.
Pydantic model — pass a pydantic.BaseModel class to validate the full attrs dict using pydantic’s type system.
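The per-key dispatch among the first three modes can be sketched as below. attr_matches is a hypothetical helper, not pandera's code. It assumes callables are tested before the regex and equality rules, an order the list above does not specify. The pydantic mode validates the whole attrs dict at once, so it is not modeled here.

```python
import re
from typing import Any


def attr_matches(expected: Any, actual: Any) -> bool:
    """Hypothetical sketch of per-key attribute matching (not pandera's code)."""
    if callable(expected):
        # Callable predicate: passes when it returns True.
        return bool(expected(actual))
    if isinstance(expected, str) and expected.startswith("^"):
        # Regex: the whole of str(actual) must match the pattern.
        return re.fullmatch(expected, str(actual)) is not None
    # Anything else: plain equality, with no string conversion.
    return expected == actual


print(attr_matches("K", "K"))                                    # True
print(attr_matches("^(K|degC|degF)$", "degC"))                   # True
print(attr_matches("^(K|degC|degF)$", "meters"))                 # False
print(attr_matches("^[0-9]+$", 42))                              # True: str(42) == "42"
print(attr_matches(42, "42"))                                    # False: 42 != "42"
print(attr_matches(lambda v: isinstance(v, int) and v >= 2, 3))  # True
```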

Equality matching

da_attrs = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"units": "K", "standard_name": "air_temperature"},
)

pa.DataArraySchema(
    attrs={"units": "K", "standard_name": "air_temperature"},
).validate(da_attrs)

<xarray.DataArray (x: 3)> Size: 24B
array([1., 1., 1.])
Dimensions without coordinates: x
Attributes:
    units:          K
    standard_name:  air_temperature

Regex matching

Use a regex pattern (starting with ^) to validate an attribute against a set of acceptable values:

schema = pa.DataArraySchema(
    attrs={"units": "^(K|degC|degF)$"},
)

da_units = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"units": "K"},
)
schema.validate(da_units)

<xarray.DataArray (x: 3)> Size: 24B
array([1., 1., 1.])
Dimensions without coordinates: x
Attributes:
    units:    K

da_bad_units = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"units": "meters"},
)

try:
    schema.validate(da_bad_units)
except pa.errors.SchemaError as exc:
    print(exc)

attribute mismatch 'units': expected '^(K|degC|degF)$', got 'meters'

Callable predicates

Pass a function that receives the attribute value and returns a boolean:

schema = pa.DataArraySchema(
    attrs={
        "version": lambda v: isinstance(v, int) and v >= 2,
    },
)

da_v3 = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"version": 3},
)
schema.validate(da_v3)

<xarray.DataArray (x: 3)> Size: 24B
array([1., 1., 1.])
Dimensions without coordinates: x
Attributes:
    version:  3

da_v1 = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"version": 1},
)

try:
    schema.validate(da_v1)
except pa.errors.SchemaError as exc:
    print(exc)

attribute mismatch 'version': expected <function <lambda> at 0x7e717f651940>, got 1
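The lambda's repr in the message above is not very informative. If pandera formats the expected predicate with str(), as the output suggests, then a named function produces a more readable error. The snippet below only demonstrates the Python side of that. version_at_least_2 is a hypothetical name.

```python
def version_at_least_2(v):
    """Named predicate: accept integer versions >= 2."""
    return isinstance(v, int) and v >= 2


anonymous = lambda v: isinstance(v, int) and v >= 2  # noqa: E731

# str() of a function includes its name, so a named predicate is easier to
# recognize in an error message than "<lambda>".
print(str(anonymous))
print(str(version_at_least_2))
```

The named function would then be passed as attrs={"version": version_at_least_2}.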

Pydantic model

For complex attribute schemas you can pass a pydantic.BaseModel class instead of a dict. Pandera delegates validation to pydantic and converts each pydantic error into a pandera SchemaError, so lazy validation collects every failing field as a separate error:

from pydantic import BaseModel, Field as PydanticField

class ArrayAttrs(BaseModel):
    units: str
    standard_name: str
    version: int = PydanticField(ge=2)

schema = pa.DataArraySchema(attrs=ArrayAttrs)

da_ok = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"units": "K", "standard_name": "air_temperature", "version": 3},
)
schema.validate(da_ok)

<xarray.DataArray (x: 3)> Size: 24B
array([1., 1., 1.])
Dimensions without coordinates: x
Attributes:
    units:          K
    standard_name:  air_temperature
    version:        3

When validation fails, the error messages surface the pydantic error details:

da_bad = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"units": "K", "version": 1},  # version < 2, standard_name missing
)

try:
    schema.validate(da_bad, lazy=True)
except pa.errors.SchemaErrors as exc:
    print(exc)

{
    "SCHEMA": {
        "SCHEMA_COMPONENT_CHECK": [
            {
                "schema": "schema",
                "column": null,
                "check": "attrs",
                "error": "standard_name: Field required [type=missing]"
            },
            {
                "schema": "schema",
                "column": null,
                "check": "attrs",
                "error": "version: Input should be greater than or equal to 2 [type=greater_than_equal]"
            }
        ]
    }
}

All four modes also work on DatasetSchema — see Dataset Schemas.

Strict attributes

With strict_attrs=True, extra attributes cause a validation error. When attrs is a pydantic model class, the set of allowed keys is derived from the model’s fields.

da_extra = xr.DataArray(
    np.ones(3), dims="x",
    attrs={"units": "K", "extra_key": 42},
)

try:
    pa.DataArraySchema(
        attrs={"units": "K"}, strict_attrs=True
    ).validate(da_extra)
except pa.errors.SchemaError as exc:
    print(exc)

unexpected attribute 'extra_key'
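When attrs is a plain dict, the strict check amounts to a set difference between the array's attribute keys and the schema's keys. unexpected_attrs is a hypothetical helper that sketches this. For a pydantic v2 model class, the field names are the keys of Model.model_fields, which is one plausible source for the allowed set. This page does not say exactly how pandera derives it.

```python
from typing import Any, Dict, Iterable, List


def unexpected_attrs(allowed: Iterable[str], actual: Dict[str, Any]) -> List[str]:
    """Hypothetical: attribute keys present on the array but not in the schema."""
    allowed_keys = set(allowed)  # iterating a dict yields its keys
    return sorted(key for key in actual if key not in allowed_keys)


schema_attrs = {"units": "K"}
print(unexpected_attrs(schema_attrs, {"units": "K", "extra_key": 42}))  # ['extra_key']
print(unexpected_attrs(schema_attrs, {"units": "K"}))                   # []
```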

Name validation

named_da = xr.DataArray(np.ones(3), dims="x", name="temperature")
pa.DataArraySchema(name="temperature").validate(named_da)

<xarray.DataArray 'temperature' (x: 3)> Size: 24B
array([1., 1., 1.])
Dimensions without coordinates: x

The DataArray’s .name must match exactly.

try:
    pa.DataArraySchema(name="pressure").validate(named_da)
except pa.errors.SchemaError as exc:
    print(exc)

expected name 'pressure', got 'temperature'

Null values

By default nullable=False — any NaN or null value raises a SchemaError. Set nullable=True to allow them:

da_with_nan = xr.DataArray([1.0, np.nan, 3.0], dims="x")

pa.DataArraySchema(dtype=float, nullable=True).validate(da_with_nan)

<xarray.DataArray (x: 3)> Size: 24B
array([ 1., nan,  3.])
Dimensions without coordinates: x

try:
    pa.DataArraySchema(dtype=float, nullable=False).validate(da_with_nan)
except pa.errors.SchemaError as exc:
    print(exc)

non-nullable DataArray contains null values
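The nullable=False rule is an "any missing value" test over every element. has_nulls below is a hypothetical pure-Python sketch of that test for a flat sequence. For real arrays, xarray's DataArray.isnull() followed by .any() expresses the same idea. This page does not cover which missing-value markers pandera recognizes beyond NaN.

```python
import math
from typing import Iterable, Optional


def has_nulls(values: Iterable[Optional[float]]) -> bool:
    """Hypothetical: True if any element is None or a float NaN."""
    return any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values)


print(has_nulls([1.0, float("nan"), 3.0]))  # True: rejected when nullable=False
print(has_nulls([1.0, 2.0, 3.0]))           # False
print(has_nulls([1, None, 3]))              # True
```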

Coercing dtypes

When coerce=True, the DataArray is cast to dtype before validation:

schema = pa.DataArraySchema(dtype=np.float32, coerce=True)
da_int = xr.DataArray(np.array([1, 2, 3]), dims="x")
validated = schema.validate(da_int)
print(f"original: {da_int.dtype} -> coerced: {validated.dtype}")

original: int64 -> coerced: float32
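Coercion means "cast, then check". The NumPy-only sketch below assumes an astype-style cast, since this page does not show which call pandera uses. It also highlights a caveat: when the cast is lossy, the dtype check that follows cannot tell that values were truncated.

```python
import numpy as np

original = np.array([1, 2, 3])
coerced = original.astype(np.float32)  # cast first ...
print(coerced.dtype == np.float32)     # ... so a float32 dtype check now passes: True
print(original.dtype.kind)             # i: astype returned a new array, original unchanged

# Casting can lose information: float -> int truncates toward zero.
lossy = np.array([1.7, -2.2]).astype(np.int64)
print(lossy.tolist())                  # [1, -2]
```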

Encoding validation

The encoding parameter validates the DataArray’s .encoding dict, which is populated when reading from netCDF or Zarr:

da_encoded = xr.DataArray(np.ones(3), dims="x")
da_encoded.encoding = {"_FillValue": -999.0, "dtype": "float32"}

pa.DataArraySchema(
    encoding={"_FillValue": -999.0, "dtype": "^float.*"},
).validate(da_encoded)

<xarray.DataArray (x: 3)> Size: 24B
array([1., 1., 1.])
Dimensions without coordinates: x

Encoding supports the same matching modes as attrs (equality, regex, callable) plus pydantic models. See Encoding Validation for full details.
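Assuming encoding uses the same re.fullmatch rule as attrs, a pattern must account for the whole value. That is why the example above writes "^float.*" rather than "^float". The standard-library snippet below shows the difference. Because matching is done against str(actual_value), the same pattern would also accept a NumPy dtype object, since str(np.dtype("float32")) is "float32".

```python
import re

# re.fullmatch requires the pattern to cover the entire string.
print(re.fullmatch("^float.*", "float32") is not None)  # True
print(re.fullmatch("^float", "float32") is not None)    # False: "32" is left over
print(re.fullmatch("^float", "float") is not None)      # True
```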

Chunked / array type

Control whether the underlying storage is lazy (Dask) or eager (NumPy):

pa.DataArraySchema(chunked=True)       # must be Dask-backed
pa.DataArraySchema(chunked=False)      # must be eager
pa.DataArraySchema(array_type=np.ndarray)  # must be a numpy array

See Dask and Duck Arrays for comprehensive Dask integration documentation, and Configuration for how chunked interacts with validation depth.

Data-level checks

Use Check for value-level assertions:

schema = pa.DataArraySchema(
    dtype=np.float64,
    checks=[
        pa.Check(lambda da: float(da.min()) >= 0),
        pa.Check(lambda da: float(da.max()) <= 100),
    ],
)

da_checked = xr.DataArray(np.linspace(0, 50, 10), dims="x")
schema.validate(da_checked)

<xarray.DataArray (x: 10)> Size: 80B
array([ 0.        ,  5.55555556, 11.11111111, 16.66666667, 22.22222222,
       27.77777778, 33.33333333, 38.88888889, 44.44444444, 50.        ])
Dimensions without coordinates: x

See Checks and Parsers for built-in check helpers and details on how checks interact with lazy / chunked data.

Parsers

Parser objects run before checks and can transform the array:

schema = pa.DataArraySchema(
    parsers=pa.Parser(lambda da: da.fillna(0)),
    nullable=False,
)

da_nulls = xr.DataArray([1.0, np.nan, 3.0], dims="x")
validated = schema.validate(da_nulls)
validated

<xarray.DataArray (x: 3)> Size: 24B
array([1., 0., 3.])
Dimensions without coordinates: x
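The ordering is what makes the example work: fillna runs before the nullable=False check sees the data. run_pipeline below is a hypothetical pure-Python model of that parse-then-check ordering, not pandera's implementation.

```python
import math
from typing import Callable, List, Sequence

Values = List[float]


def run_pipeline(
    values: Values,
    parsers: Sequence[Callable[[Values], Values]],
    checks: Sequence[Callable[[Values], bool]],
) -> Values:
    """Hypothetical: apply every parser in order, then run every check."""
    for parse in parsers:
        values = parse(values)
    for i, check in enumerate(checks):
        if not check(values):
            raise ValueError(f"check {i} failed")
    return values


def fill_nan(values: Values) -> Values:
    """Parser: replace NaN with 0.0 (like da.fillna(0))."""
    return [0.0 if math.isnan(v) else v for v in values]


def no_nans(values: Values) -> bool:
    """Check: the non-nullable rule."""
    return not any(math.isnan(v) for v in values)


data = [1.0, float("nan"), 3.0]
print(run_pipeline(data, [fill_nan], [no_nans]))  # [1.0, 0.0, 3.0]

try:
    run_pipeline(data, [], [no_nans])  # no parser: the NaN reaches the check
except ValueError as exc:
    print(exc)  # check 0 failed
```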

Validation options

schema.validate(da) accepts several keyword arguments:

lazy — collect all failures into SchemaErrors instead of raising on the first one.
head / tail / sample — subsample along the first dimension before running heavy checks.
inplace — if True, coercion mutates the original object.

schema = pa.DataArraySchema(
    dtype=np.float64,
    dims=("x",),
    name="values",
    checks=pa.Check(lambda da: bool((da > 0).all())),
)

da_bad = xr.DataArray([-1, 2, 3], dims="x", name="wrong_name")

try:
    schema.validate(da_bad, lazy=True)
except pa.errors.SchemaErrors as exc:
    print(exc)

{
    "SCHEMA": {
        "WRONG_FIELD_NAME": [
            {
                "schema": "values",
                "column": "values",
                "check": "name",
                "error": "expected name 'values', got 'wrong_name'"
            }
        ],
        "WRONG_DATATYPE": [
            {
                "schema": "values",
                "column": "values",
                "check": "dtype(<class 'numpy.float64'>)",
                "error": "expected dtype <class 'numpy.float64'>, got int64"
            }
        ]
    },
    "DATA": {
        "DATAFRAME_CHECK": [
            {
                "schema": "values",
                "column": "values",
                "check": "<lambda>",
                "error": "DataArraySchema 'values' failed series or dataframe validator 0: <Check <lambda>>"
            }
        ]
    }
}
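With lazy=True, the example above reports three independent failures (name, dtype, and the data check) in one SchemaErrors. Without it, validation raises on the first one. The sketch below is a hypothetical, pandera-free model of that difference. validate and CollectedErrors are illustrative names only, and the checks mirror the schema above, with plain Python values standing in for the DataArray.

```python
from typing import Any, Callable, Dict, List, Tuple

NamedCheck = Tuple[str, Callable[[Dict[str, Any]], bool]]


class CollectedErrors(Exception):
    """Hypothetical stand-in for an exception that carries every failure."""

    def __init__(self, failures: List[str]) -> None:
        super().__init__(f"{len(failures)} failure(s): {failures}")
        self.failures = failures


def validate(
    obj: Dict[str, Any], checks: List[NamedCheck], lazy: bool = False
) -> Dict[str, Any]:
    """Eager mode stops at the first failure; lazy mode runs every check."""
    failures: List[str] = []
    for name, check in checks:
        if not check(obj):
            if not lazy:
                raise ValueError(f"failed: {name}")
            failures.append(name)
    if failures:
        raise CollectedErrors(failures)
    return obj


checks: List[NamedCheck] = [
    ("name", lambda o: o["name"] == "values"),
    ("dtype", lambda o: all(isinstance(v, float) for v in o["data"])),
    ("positive", lambda o: all(v > 0 for v in o["data"])),
]
bad = {"name": "wrong_name", "data": [-1, 2, 3]}

try:
    validate(bad, checks)
except ValueError as exc:
    print(exc)  # failed: name

try:
    validate(bad, checks, lazy=True)
except CollectedErrors as exc:
    print(exc.failures)  # ['name', 'dtype', 'positive']
```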