Schema Inference

Automatically infer a schema from an existing DataArray or Dataset:

import numpy as np
import xarray as xr
import pandera.xarray as pa

da = xr.DataArray(
    np.random.rand(3, 4),
    dims=("x", "y"),
    coords={"x": [1.0, 2.0, 3.0]},
    name="temperature",
)

schema = pa.infer_schema(da)
print(type(schema).__name__)
print(f"dims={schema.dims}, name={schema.name}")

DataArraySchema
dims=('x', 'y'), name=temperature

The same function accepts a Dataset and returns a DatasetSchema:

ds = xr.Dataset(
    {
        "temperature": (("x", "y"), np.random.rand(3, 4)),
        "pressure": (("x", "y"), np.random.rand(3, 4)),
    },
    coords={"x": [1.0, 2.0, 3.0]},
)

ds_schema = pa.infer_schema(ds)
print(type(ds_schema).__name__)
print(f"data_vars: {list(ds_schema.data_vars.keys())}")

DatasetSchema
data_vars: ['temperature', 'pressure']

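The inferred schema captures dtype, dims, coords, and nullable status, plus min/max bounds for numeric data. Because these constraints are derived from the data itself, validating the original objects against the inferred schemas should pass: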

schema.validate(da)
ds_schema.validate(ds)

<xarray.Dataset> Size: 216B
Dimensions:      (x: 3, y: 4)
Coordinates:
  * x            (x) float64 24B 1.0 2.0 3.0
Dimensions without coordinates: y
Data variables:
    temperature  (x, y) float64 96B 0.1209 0.9323 0.8148 ... 0.616 0.5962 0.7314
    pressure     (x, y) float64 96B 0.1558 0.4252 0.6478 ... 0.2206 0.8422 0.772
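
To make the list of captured properties more concrete, the sketch below gathers the same kinds of facts by hand using only xarray and NumPy. It is not pandera's inference code: the helper name `summarize_dataarray` and the dictionary keys are hypothetical, chosen for illustration, and the real inferred schema may represent these properties differently.

```python
import numpy as np
import xarray as xr


def summarize_dataarray(da: xr.DataArray) -> dict:
    """Collect the kinds of properties a schema inference step might record.

    Hypothetical helper for illustration only; not part of pandera.
    """
    summary = {
        "name": da.name,
        "dtype": str(da.dtype),
        "dims": da.dims,
        "coords": list(da.coords),
        # True if any element is missing (NaN for float data).
        "nullable": bool(da.isnull().any()),
    }
    # Min/max bounds are only meaningful for numeric data.
    if np.issubdtype(da.dtype, np.number):
        summary["min"] = float(da.min())
        summary["max"] = float(da.max())
    return summary


# Deterministic data so the summary is reproducible.
da = xr.DataArray(
    np.arange(12, dtype="float64").reshape(3, 4),
    dims=("x", "y"),
    coords={"x": [1.0, 2.0, 3.0]},
    name="temperature",
)
summary = summarize_dataarray(da)
print(summary)
```

Printing a summary like this next to the inferred schema can help when checking which constraints inference picked up, in particular data-derived bounds that may be too tight for future inputs.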

See also

IO Serialization for saving and loading schemas as YAML or JSON.