Configuration

Validation depth and Dask / chunked data

Pandera uses ValidationDepth for xarray the same way it does for Polars lazy frames:

SCHEMA_ONLY — only structural validation (dims, dtype, coords, attrs, name, shape). Data-level Check objects are skipped.
DATA_ONLY — only data-level checks.
SCHEMA_AND_DATA — full validation (default for eager arrays).
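As a rough, library-free model of what each level means, the three levels map onto two validation phases. The Depth enum and phases() function below are hypothetical stand-ins that only mirror the member names of ValidationDepth; this is not pandera's own code.

```python
from enum import Enum


class Depth(Enum):
    """Hypothetical stand-in mirroring the member names of ValidationDepth."""
    SCHEMA_ONLY = "SCHEMA_ONLY"
    DATA_ONLY = "DATA_ONLY"
    SCHEMA_AND_DATA = "SCHEMA_AND_DATA"


def phases(depth: Depth) -> tuple[bool, bool]:
    """Return (run_structural_checks, run_data_checks) for a given depth."""
    run_schema = depth in (Depth.SCHEMA_ONLY, Depth.SCHEMA_AND_DATA)
    run_data = depth in (Depth.DATA_ONLY, Depth.SCHEMA_AND_DATA)
    return run_schema, run_data


for d in Depth:
    print(d.name, phases(d))
# SCHEMA_ONLY (True, False)
# DATA_ONLY (False, True)
# SCHEMA_AND_DATA (True, True)
```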

Chunked (Dask-backed) arrays

When an array is backed by Dask (i.e. da.chunks is not None), data-level checks would trigger .compute(), which may be expensive. To avoid surprises, chunked arrays default to SCHEMA_ONLY when no explicit depth is set. Eager (NumPy-backed) arrays default to SCHEMA_AND_DATA.
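The per-object default can be pictured as a duck-typed test on the chunks attribute. This is a hypothetical sketch, not pandera's implementation: EagerArray and ChunkedArray are made-up stand-ins for a NumPy-backed and a Dask-backed DataArray, whose chunks attribute is None and a tuple of per-dimension chunk sizes, respectively.

```python
class EagerArray:
    """Hypothetical stand-in: a NumPy-backed DataArray reports chunks=None."""
    chunks = None


class ChunkedArray:
    """Hypothetical stand-in: a Dask-backed DataArray reports chunk sizes per dim."""
    chunks = ((2, 2, 1),)


def default_depth(obj) -> str:
    """Pick a default depth when nothing is configured (illustrative only)."""
    # Chunked data skips data-level checks by default to avoid a .compute().
    if getattr(obj, "chunks", None) is not None:
        return "SCHEMA_ONLY"
    return "SCHEMA_AND_DATA"


print(default_depth(EagerArray()))    # SCHEMA_AND_DATA
print(default_depth(ChunkedArray()))  # SCHEMA_ONLY
```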

Opting in to data checks on Dask arrays

Set the validation depth explicitly:

import numpy as np
import xarray as xr
import pandera.xarray as pa
from pandera.config import ValidationDepth, config_context

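schema = pa.DataArraySchema(
    dtype=np.float64,
    dims=("x",),
    # Data-level check: every value must be non-negative.
    checks=pa.Check(lambda data: float(data.min()) >= 0),
)

# .chunk() makes the array Dask-backed (requires dask to be installed).
da = xr.DataArray(np.ones(5), dims="x").chunk({"x": 2})

# Without this context, the chunked array would default to SCHEMA_ONLY
# and the Check above would be skipped.
with config_context(validation_depth=ValidationDepth.SCHEMA_AND_DATA):
    schema.validate(da)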

Or set the environment variable before running your program:

export PANDERA_VALIDATION_DEPTH=SCHEMA_AND_DATA

Resolution order

get_validation_depth() resolves the depth in this order:

1. Active config_context(validation_depth=...): highest priority.
2. Global config (PANDERA_VALIDATION_DEPTH env var or PanderaConfig.validation_depth).
3. Per-object default: SCHEMA_ONLY for chunked data, SCHEMA_AND_DATA for eager data.
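The lookup can be modeled as a chain of fallbacks. The resolve_depth() function below is a hypothetical, library-free sketch of the order described above, with depths as plain strings; it is not the body of pandera's get_validation_depth().

```python
from typing import Optional


def resolve_depth(
    context_depth: Optional[str],
    global_depth: Optional[str],
    is_chunked: bool,
) -> str:
    """Hypothetical model of the three-step depth lookup."""
    # 1. An active config_context(validation_depth=...) wins.
    if context_depth is not None:
        return context_depth
    # 2. Otherwise use the global setting (env var or PanderaConfig).
    if global_depth is not None:
        return global_depth
    # 3. Fall back to the per-object default.
    return "SCHEMA_ONLY" if is_chunked else "SCHEMA_AND_DATA"


print(resolve_depth(None, None, is_chunked=True))                      # SCHEMA_ONLY
print(resolve_depth(None, "SCHEMA_AND_DATA", is_chunked=True))         # SCHEMA_AND_DATA
print(resolve_depth("SCHEMA_ONLY", "SCHEMA_AND_DATA", is_chunked=False))  # SCHEMA_ONLY
```

Because the per-object default is consulted last, setting PANDERA_VALIDATION_DEPTH globally also overrides the SCHEMA_ONLY default for chunked arrays.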

Disabling validation

Set PANDERA_VALIDATION_ENABLED=false (env var) or use config_context(validation_enabled=False) to make validate() a no-op that returns the input unchanged:

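with config_context(validation_enabled=False):
    # Integer dtype instead of float64, dim "z" instead of "x", and a
    # negative value: none of it is checked while validation is disabled.
    bad_da = xr.DataArray([-999], dims="z", name="wrong")
    result = schema.validate(bad_da)
    print(f"Validation skipped, returned: {result.values}")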

Validation skipped, returned: [-999]