Schema Inference
Automatically infer a schema from an existing DataArray or Dataset:
import numpy as np
import xarray as xr
import pandera.xarray as pa
da = xr.DataArray(
    np.random.rand(3, 4),
    dims=("x", "y"),
    coords={"x": [1.0, 2.0, 3.0]},
    name="temperature",
)
schema = pa.infer_schema(da)
print(type(schema).__name__)
print(f"dims={schema.dims}, name={schema.name}")
DataArraySchema
dims=('x', 'y'), name=temperature
For datasets:
ds = xr.Dataset(
    {
        "temperature": (("x", "y"), np.random.rand(3, 4)),
        "pressure": (("x", "y"), np.random.rand(3, 4)),
    },
    coords={"x": [1.0, 2.0, 3.0]},
)
ds_schema = pa.infer_schema(ds)
print(type(ds_schema).__name__)
print(f"data_vars: {list(ds_schema.data_vars.keys())}")
DatasetSchema
data_vars: ['temperature', 'pressure']
The inferred schema captures the dtype, dims, coords, nullability, and, for numeric data, min/max bounds. Validating the original data against its inferred schema succeeds:
schema.validate(da)
ds_schema.validate(ds)
<xarray.Dataset> Size: 216B
Dimensions: (x: 3, y: 4)
Coordinates:
* x (x) float64 24B 1.0 2.0 3.0
Dimensions without coordinates: y
Data variables:
temperature (x, y) float64 96B 0.1209 0.9323 0.8148 ... 0.616 0.5962 0.7314
pressure (x, y) float64 96B 0.1558 0.4252 0.6478 ... 0.2206 0.8422 0.772
See also
IO Serialization for saving and loading schemas as YAML or JSON.
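The properties that inference captures (dtype, dims, coords, nullability, min/max bounds) can be illustrated without pandera. The following is a minimal sketch using only NumPy and xarray. It does not reproduce pandera's implementation: `summarize_inferable_properties` and `conforms` are hypothetical helpers, not part of `pandera.xarray`. It also assumes that the bounds are the observed minimum and maximum of the sample and that nullability reflects whether the sample actually contains nulls.

```python
import numpy as np
import xarray as xr


def summarize_inferable_properties(da: xr.DataArray) -> dict:
    """Hypothetical helper (not part of pandera): collect the kinds of
    properties the text says schema inference captures."""
    props = {
        "dtype": da.dtype,
        "dims": da.dims,
        # Coordinate names mapped to their dtypes.
        "coords": {name: coord.dtype for name, coord in da.coords.items()},
        # Assumption: nullable only if the sample itself contains nulls.
        "nullable": bool(da.isnull().any()),
    }
    # Assumption: bounds are the observed min/max, recorded for numeric dtypes only.
    if np.issubdtype(da.dtype, np.number):
        props["min"] = float(da.min())  # NaNs are skipped for float data
        props["max"] = float(da.max())
    return props


def conforms(props: dict, other: xr.DataArray) -> list:
    """Hypothetical check (not pandera's validate): return a list of
    failure messages; an empty list means `other` conforms."""
    failures = []
    if other.dtype != props["dtype"]:
        failures.append(f"dtype {other.dtype} != {props['dtype']}")
    if other.dims != props["dims"]:
        failures.append(f"dims {other.dims} != {props['dims']}")
    if set(other.coords) != set(props["coords"]):
        failures.append(f"coords {sorted(other.coords)} != {sorted(props['coords'])}")
    if not props["nullable"] and bool(other.isnull().any()):
        failures.append("contains nulls but was inferred as non-nullable")
    if "min" in props:
        lo, hi = float(other.min()), float(other.max())
        if lo < props["min"]:
            failures.append(f"min {lo} < inferred min {props['min']}")
        if hi > props["max"]:
            failures.append(f"max {hi} > inferred max {props['max']}")
    return failures


rng = np.random.default_rng(0)
sample = xr.DataArray(
    rng.random((3, 4)),
    dims=("x", "y"),
    coords={"x": [1.0, 2.0, 3.0]},
    name="temperature",
)
props = summarize_inferable_properties(sample)

# A sample always fits bounds taken from itself.
print(conforms(props, sample))  # → []

# Shifting every value by 1.0 pushes the maximum past the observed bound.
print(conforms(props, sample + 1.0))

# Introducing a NaN violates the inferred non-nullable property.
with_nan = sample.copy()
with_nan[0, 0] = np.nan
print(conforms(props, with_nan))
```

Under these assumptions, bounds taken from a single sample are tight. New data from the same source can therefore fall outside them, so an inferred schema is probably best treated as a starting point to review and loosen before validating other data.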