Schema Inference

Automatically infer a schema from an existing DataArray or Dataset:

import numpy as np
import xarray as xr
import pandera.xarray as pa

da = xr.DataArray(
    np.random.rand(3, 4),
    dims=("x", "y"),
    coords={"x": [1.0, 2.0, 3.0]},
    name="temperature",
)

schema = pa.infer_schema(da)
print(type(schema).__name__)
print(f"dims={schema.dims}, name={schema.name}")

DataArraySchema
dims=('x', 'y'), name=temperature

The same function accepts a Dataset and returns a DatasetSchema:

ds = xr.Dataset(
    {
        "temperature": (("x", "y"), np.random.rand(3, 4)),
        "pressure": (("x", "y"), np.random.rand(3, 4)),
    },
    coords={"x": [1.0, 2.0, 3.0]},
)

ds_schema = pa.infer_schema(ds)
print(type(ds_schema).__name__)
print(f"data_vars: {list(ds_schema.data_vars.keys())}")

DatasetSchema
data_vars: ['temperature', 'pressure']

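The inferred schema captures dtype, dims, coords, nullable status, and, for numeric data, min/max bounds. The data the schema was inferred from passes validation, and validate returns the validated object (only the result of the last call is displayed below):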

schema.validate(da)
ds_schema.validate(ds)

<xarray.Dataset> Size: 216B
Dimensions:      (x: 3, y: 4)
Coordinates:
  * x            (x) float64 24B 1.0 2.0 3.0
Dimensions without coordinates: y
Data variables:
    temperature  (x, y) float64 96B 0.1209 0.9323 0.8148 ... 0.616 0.5962 0.7314
    pressure     (x, y) float64 96B 0.1558 0.4252 0.6478 ... 0.2206 0.8422 0.772
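
To make the captured properties more concrete, the following sketch collects the same kinds of facts (dtype, dims, coords, presence of nulls, min/max) by hand, using only numpy and xarray. It is an illustration under stated assumptions, not pandera's implementation: the helper name summarize and the keys of the returned dictionary are hypothetical, and the way pandera stores these values on the schema may differ.

```python
import numpy as np
import xarray as xr


def summarize(da: xr.DataArray) -> dict:
    """Hypothetical helper: gather by hand the kinds of properties
    that schema inference is described as capturing."""
    values = da.values
    numeric = np.issubdtype(values.dtype, np.number)
    return {
        "name": da.name,
        "dtype": str(values.dtype),
        "dims": da.dims,
        "coords": list(da.coords),
        # True if any element is missing (NaN for float data)
        "has_nulls": bool(da.isnull().any()),
        # Bounds only make sense for numeric data; NaNs are ignored
        "min": float(np.nanmin(values)) if numeric else None,
        "max": float(np.nanmax(values)) if numeric else None,
    }


# Deterministic data so the summary is reproducible
da = xr.DataArray(
    np.arange(12, dtype="float64").reshape(3, 4),
    dims=("x", "y"),
    coords={"x": [1.0, 2.0, 3.0]},
    name="temperature",
)

summary = summarize(da)
print(summary)
```

Because min/max bounds are taken from the sample data, a schema inferred from one array may reject later data that falls outside the observed range. Inference is best treated as a starting point to review and loosen, rather than a finished contract.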