Data Models
DataArrayModel and
DatasetModel provide a class-based,
pydantic-style API for defining xarray schemas, the same pattern as
DataFrameModel for pandas.
Type annotations and Field()
descriptors define the schema; call validate() or to_schema() to use it.
The imperative counterparts are DataArray Schemas and Dataset Schemas.
Annotation Coordinate vs component Coordinate
Two objects share the name "Coordinate". To avoid confusion, this guide prefixes the name with its module path wherever the context is ambiguous:
Coordinate (pandera.typing.xarray.Coordinate): a typing marker used in model annotations, analogous to Index in a DataFrameModel.
Coordinate (pandera.api.xarray.components.Coordinate, also available as pa.Coordinate): the schema component you pass to DataArraySchema(coords=...) or DatasetSchema(coords=...).
from pandera.typing.xarray import Coordinate # annotation marker
import pandera.xarray as pa
pa.Coordinate(dtype=float) # imperative component
<pandera.api.xarray.components.Coordinate at 0x75b3fdb2f550>
DataArrayModel
Basic usage
Every DataArrayModel must define a data field whose type annotation is
the array dtype. Other fields use Coordinate[dtype] to declare coordinate
schemas. Schema-level options live on a nested Config class.
import numpy as np
import xarray as xr
class Temperature(pa.DataArrayModel):
    data: np.float64 = pa.Field()
    time: Coordinate[np.float64]
    lat: Coordinate[np.float64]
    lon: Coordinate[np.float64]

    class Config:
        dims = ("time", "lat", "lon")
        name = "temperature"

da = xr.DataArray(
    np.random.rand(12, 180, 360),
    dims=("time", "lat", "lon"),
    coords={
        "time": np.arange(12, dtype=np.float64),
        "lat": np.linspace(-89.5, 89.5, 180),
        "lon": np.linspace(-179.5, 179.5, 360),
    },
    name="temperature",
)
Temperature.validate(da)
<xarray.DataArray 'temperature' (time: 12, lat: 180, lon: 360)> Size: 6MB
array([[[0.63974457, 0.26524558, 0.63006648, ..., 0.4321435 ,
0.36229183, 0.59286676],
[0.27191546, 0.94152827, 0.58565502, ..., 0.14886646,
0.12175208, 0.3450944 ],
[0.85260456, 0.4214471 , 0.35091739, ..., 0.85773377,
0.09952705, 0.62496683],
...,
[0.14123247, 0.16678671, 0.27509324, ..., 0.64963559,
0.11571173, 0.50582554],
[0.71946057, 0.8219462 , 0.98002777, ..., 0.91143668,
0.56405048, 0.55188396],
[0.24675554, 0.90353178, 0.05141815, ..., 0.89156497,
0.21416824, 0.77706951]],
[[0.24145534, 0.66254542, 0.55530239, ..., 0.05122572,
0.07354845, 0.23920661],
[0.92069292, 0.78467761, 0.11412356, ..., 0.59072757,
0.25972443, 0.79464703],
[0.10336437, 0.69829642, 0.80650221, ..., 0.74404321,
0.23735362, 0.10710934],
...
[0.52703942, 0.1395948 , 0.21657352, ..., 0.63805663,
0.75164889, 0.32302268],
[0.26169817, 0.12997801, 0.5754408 , ..., 0.55745522,
0.51233137, 0.91236648],
[0.13537324, 0.78387427, 0.01710389, ..., 0.12313294,
0.01862873, 0.42942545]],
[[0.53172355, 0.30484472, 0.26774834, ..., 0.21620514,
0.64003276, 0.11498758],
[0.00349957, 0.7468368 , 0.65371721, ..., 0.06059852,
0.00382282, 0.33405755],
[0.58819199, 0.46274337, 0.68517959, ..., 0.03491337,
0.15580067, 0.38139553],
...,
[0.8155553 , 0.27790271, 0.96884469, ..., 0.03557298,
0.1179469 , 0.22276687],
[0.11270351, 0.461159 , 0.69143605, ..., 0.23625459,
0.81585242, 0.14759047],
[0.3871465 , 0.51139789, 0.34188184, ..., 0.18824142,
0.25880001, 0.18137871]]], shape=(12, 180, 360))
Coordinates:
* time (time) float64 96B 0.0 1.0 2.0 3.0 4.0 ... 7.0 8.0 9.0 10.0 11.0
* lat (lat) float64 1kB -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
* lon (lon) float64 3kB -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
Field names as attributes
Accessing a class attribute on the model returns the coordinate or variable name as a string, useful for programmatic indexing:
print(Temperature.time)
print(Temperature.lat)
time
lat
Config options
DataArrayModel.Config
(DataArrayConfig) accepts:
dtype, dims, sizes, shape, name, coerce, nullable,
strict_coords, strict_attrs, attrs, chunked, array_type.
These mirror the keyword arguments on
DataArraySchema.
Using Field on data
The data field can carry the same per-field structural constraints that
you would pass as DataArraySchema constructor arguments:
class Grid(pa.DataArrayModel):
    data: np.float64 = pa.Field(
        dims=("x", "y"),
        sizes={"x": 3, "y": 4},
    )
    x: Coordinate[np.float64]
    y: Coordinate[np.float64]

    class Config:
        name = "grid"

da_grid = xr.DataArray(
    np.random.rand(3, 4),
    dims=("x", "y"),
    coords={
        "x": np.arange(3, dtype=np.float64),
        "y": np.arange(4, dtype=np.float64),
    },
    name="grid",
)
Grid.validate(da_grid)
<xarray.DataArray 'grid' (x: 3, y: 4)> Size: 96B
array([[0.93350701, 0.10289634, 0.18830035, 0.58045176],
[0.28439894, 0.20824653, 0.47871903, 0.96441649],
[0.91228247, 0.14490973, 0.27572526, 0.0962955 ]])
Coordinates:
* x (x) float64 24B 0.0 1.0 2.0
* y (y) float64 32B 0.0 1.0 2.0 3.0
When both Field(dims=...) and Config.dims are set, the Field value
takes precedence.
Using Field on coordinates
Coordinate fields accept the same built-in check keywords as
Field() (eq, ge, le, in_range, isin, etc.), plus nullable and coerce.
class Geo(pa.DataArrayModel):
    data: np.float64 = pa.Field()
    lat: Coordinate[np.float64] = pa.Field(ge=-90, le=90)
    lon: Coordinate[np.float64] = pa.Field(ge=-180, le=180)

    class Config:
        dims = ("lat", "lon")

da_geo = xr.DataArray(
    np.ones((5, 10)),
    dims=("lat", "lon"),
    coords={
        "lat": np.linspace(-45, 45, 5),
        "lon": np.linspace(-90, 90, 10),
    },
)
Geo.validate(da_geo)
<xarray.DataArray (lat: 5, lon: 10)> Size: 400B
array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
Coordinates:
* lat (lat) float64 40B -45.0 -22.5 0.0 22.5 45.0
* lon (lon) float64 80B -90.0 -70.0 -50.0 -30.0 ... 30.0 50.0 70.0 90.0
to_schema() and validate()
schema = Temperature.to_schema()
print(type(schema))
Temperature.validate(da)
<class 'pandera.api.xarray.container.DataArraySchema'>
<xarray.DataArray 'temperature' (time: 12, lat: 180, lon: 360)> Size: 6MB
array([[[0.63974457, 0.26524558, 0.63006648, ..., 0.4321435 ,
0.36229183, 0.59286676],
[0.27191546, 0.94152827, 0.58565502, ..., 0.14886646,
0.12175208, 0.3450944 ],
[0.85260456, 0.4214471 , 0.35091739, ..., 0.85773377,
0.09952705, 0.62496683],
...,
[0.14123247, 0.16678671, 0.27509324, ..., 0.64963559,
0.11571173, 0.50582554],
[0.71946057, 0.8219462 , 0.98002777, ..., 0.91143668,
0.56405048, 0.55188396],
[0.24675554, 0.90353178, 0.05141815, ..., 0.89156497,
0.21416824, 0.77706951]],
[[0.24145534, 0.66254542, 0.55530239, ..., 0.05122572,
0.07354845, 0.23920661],
[0.92069292, 0.78467761, 0.11412356, ..., 0.59072757,
0.25972443, 0.79464703],
[0.10336437, 0.69829642, 0.80650221, ..., 0.74404321,
0.23735362, 0.10710934],
...
[0.52703942, 0.1395948 , 0.21657352, ..., 0.63805663,
0.75164889, 0.32302268],
[0.26169817, 0.12997801, 0.5754408 , ..., 0.55745522,
0.51233137, 0.91236648],
[0.13537324, 0.78387427, 0.01710389, ..., 0.12313294,
0.01862873, 0.42942545]],
[[0.53172355, 0.30484472, 0.26774834, ..., 0.21620514,
0.64003276, 0.11498758],
[0.00349957, 0.7468368 , 0.65371721, ..., 0.06059852,
0.00382282, 0.33405755],
[0.58819199, 0.46274337, 0.68517959, ..., 0.03491337,
0.15580067, 0.38139553],
...,
[0.8155553 , 0.27790271, 0.96884469, ..., 0.03557298,
0.1179469 , 0.22276687],
[0.11270351, 0.461159 , 0.69143605, ..., 0.23625459,
0.81585242, 0.14759047],
[0.3871465 , 0.51139789, 0.34188184, ..., 0.18824142,
0.25880001, 0.18137871]]], shape=(12, 180, 360))
Coordinates:
* time (time) float64 96B 0.0 1.0 2.0 3.0 4.0 ... 7.0 8.0 9.0 10.0 11.0
* lat (lat) float64 1kB -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
* lon (lon) float64 3kB -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
Error on missing data field
If the data field is omitted, calling to_schema() raises
SchemaInitError:
class Bad(pa.DataArrayModel):
    x: Coordinate[np.float64]

    class Config:
        dims = ("x",)

try:
    Bad.to_schema()
except pa.errors.SchemaInitError as exc:
    print(exc)
DataArrayModel requires a 'data' field.
DatasetModel
Basic usage
Data variable fields are annotated with a dtype, and coordinate fields use
Coordinate[dtype]:
class Surface(pa.DatasetModel):
    temperature: np.float64 = pa.Field(dims=("x", "y"))
    pressure: np.float64 = pa.Field(dims=("x", "y"))
    x: Coordinate[np.float64]
    y: Coordinate[np.float64]

    class Config:
        strict = True

ds = xr.Dataset(
    {
        "temperature": (("x", "y"), np.random.rand(3, 4)),
        "pressure": (("x", "y"), np.random.rand(3, 4)),
    },
    coords={
        "x": np.arange(3, dtype=np.float64),
        "y": np.arange(4, dtype=np.float64),
    },
)
Surface.validate(ds)
<xarray.Dataset> Size: 248B
Dimensions: (x: 3, y: 4)
Coordinates:
* x (x) float64 24B 0.0 1.0 2.0
* y (y) float64 32B 0.0 1.0 2.0 3.0
Data variables:
temperature (x, y) float64 96B 0.106 0.8133 0.4142 ... 0.7453 0.03316
pressure (x, y) float64 96B 0.6136 0.7724 0.6049 ... 0.4675 0.332 0.5909
Config options
DatasetModel.Config
(DatasetConfig) accepts:
strict, strict_coords, strict_attrs, dims, sizes, plus the
common name, title, description, coerce.
Field on data variables
Field on a data-variable annotation supports dims, sizes, shape,
aligned_with, broadcastable_with, required, and all the built-in check
keywords:
class BoundedGrid(pa.DatasetModel):
    temperature: np.float64 = pa.Field(dims=("x", "y"), ge=150, le=350)
    x: Coordinate[np.float64]
    y: Coordinate[np.float64]

ds_bounded = xr.Dataset(
    {"temperature": (("x", "y"), np.full((3, 4), 273.15))},
    coords={
        "x": np.arange(3, dtype=np.float64),
        "y": np.arange(4, dtype=np.float64),
    },
)
BoundedGrid.validate(ds_bounded)
<xarray.Dataset> Size: 152B
Dimensions: (x: 3, y: 4)
Coordinates:
* x (x) float64 24B 0.0 1.0 2.0
* y (y) float64 32B 0.0 1.0 2.0 3.0
Data variables:
temperature (x, y) float64 96B 273.1 273.1 273.1 ... 273.1 273.1 273.1
Nested DataArrayModel
Instead of a bare dtype, annotate a data variable with a DataArrayModel
subclass to reuse a full array schema:
class TemperatureArray(pa.DataArrayModel):
    data: np.float64 = pa.Field()
    time: Coordinate[np.float64]

    class Config:
        dims = ("time",)
        name = "temperature"

class Climate(pa.DatasetModel):
    temperature: TemperatureArray
    time: Coordinate[np.float64]

ds_climate = xr.Dataset(
    {"temperature": (("time",), np.ones(12))},
    coords={"time": np.arange(12, dtype=np.float64)},
)
Climate.validate(ds_climate)
<xarray.Dataset> Size: 192B
Dimensions: (time: 12)
Coordinates:
* time (time) float64 96B 0.0 1.0 2.0 3.0 4.0 ... 8.0 9.0 10.0 11.0
Data variables:
temperature (time) float64 96B 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
The nested model compiles to a
DataArraySchema inside the dataset's
data_vars.
Optional variables
Use T | None with Field(required=False):
class Flexible(pa.DatasetModel):
    required_var: np.float64 = pa.Field(dims=("x",))
    optional_var: np.float64 | None = pa.Field(dims=("x",), required=False)
    x: Coordinate[np.float64]

ds_minimal = xr.Dataset(
    {"required_var": (("x",), np.ones(3))},
    coords={"x": np.arange(3, dtype=np.float64)},
)
Flexible.validate(ds_minimal)
<xarray.Dataset> Size: 48B
Dimensions: (x: 3)
Coordinates:
* x (x) float64 24B 0.0 1.0 2.0
Data variables:
required_var (x) float64 24B 1.0 1.0 1.0
to_schema() and validate()
schema = Surface.to_schema()
print(type(schema))
Surface.validate(ds)
<class 'pandera.api.xarray.container.DatasetSchema'>
<xarray.Dataset> Size: 248B
Dimensions: (x: 3, y: 4)
Coordinates:
* x (x) float64 24B 0.0 1.0 2.0
* y (y) float64 32B 0.0 1.0 2.0 3.0
Data variables:
temperature (x, y) float64 96B 0.106 0.8133 0.4142 ... 0.7453 0.03316
pressure (x, y) float64 96B 0.6136 0.7724 0.6049 ... 0.4675 0.332 0.5909
Schema inheritance
Models support regular Python inheritance. Child classes inherit fields and
Config options, and can override them:
class BaseGrid(pa.DataArrayModel):
    data: np.float64 = pa.Field()
    x: Coordinate[np.float64]

    class Config:
        dims = ("x",)

class DetailedGrid(BaseGrid):
    y: Coordinate[np.float64]

    class Config:
        dims = ("x", "y")
        name = "detailed"

da_detailed = xr.DataArray(
    np.ones((3, 4)),
    dims=("x", "y"),
    coords={
        "x": np.arange(3, dtype=np.float64),
        "y": np.arange(4, dtype=np.float64),
    },
    name="detailed",
)
DetailedGrid.validate(da_detailed)
<xarray.DataArray 'detailed' (x: 3, y: 4)> Size: 96B
array([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
Coordinates:
* x (x) float64 24B 0.0 1.0 2.0
* y (y) float64 32B 0.0 1.0 2.0 3.0
Excluded attributes
Class variables starting with an underscore (_) are excluded from the
model. Config is a reserved name.
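The exclusion rule can be pictured with a plain-Python sketch that collects annotated names from a class while skipping underscore-prefixed names and the reserved name Config. This is not pandera's implementation; the helper public_field_names and the Example class are hypothetical and use only the standard library.

```python
def public_field_names(model_cls):
    """Return annotated names that would count as schema fields."""
    names = []
    # Walk the MRO from base to subclass so inherited fields come first.
    for klass in reversed(model_cls.__mro__):
        for name in getattr(klass, "__annotations__", {}):
            # Skip private names and the reserved Config name.
            if name.startswith("_") or name == "Config":
                continue
            if name not in names:
                names.append(name)
    return names


class Example:
    data: float
    x: float
    _units: str = "K"  # leading underscore: excluded

    class Config:  # reserved name: holds options, never a field
        name = "example"


print(public_field_names(Example))
```

Under this rule a model can carry private helper attributes, such as the _units constant here, without them appearing in the compiled schema.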
See also
DataArray Schemas / Dataset Schemas: imperative API
DataTree Validation: DataTreeSchema and DataTreeModel
Checks and Parsers: checks, parsers, lazy validation
Decorators: check_input, check_output, check_io, and check_types
Configuration: ValidationDepth, ValidationScope, Dask, environment variables
DataFrame Models: dataframe class-based API (same patterns)
Xarray: full API reference for all xarray classes