Data Models

DataArrayModel and DatasetModel provide a class-based, pydantic-style API for defining xarray schemas, following the same pattern as DataFrameModel for pandas.

Type annotations and Field() descriptors define the schema; call validate() or to_schema() to use it.

The imperative counterparts are DataArray Schemas and Dataset Schemas.

Annotation Coordinate vs component Coordinate

Two objects share the name "Coordinate". To avoid confusion, this guide prefixes the name with its module path whenever the context is ambiguous:

  • Coordinate (pandera.typing.xarray.Coordinate): a typing marker used in model annotations, analogous to Index in a DataFrameModel.

  • Coordinate (pandera.api.xarray.components.Coordinate, also available as pa.Coordinate): the schema component you pass to DataArraySchema(coords=...) or DatasetSchema(coords=...).

from pandera.typing.xarray import Coordinate  # annotation marker
import pandera.xarray as pa

pa.Coordinate(dtype=float)  # imperative component
<pandera.api.xarray.components.Coordinate at 0x75b3fdb2f550>

DataArrayModel

Basic usage

Every DataArrayModel must define a data field whose type annotation is the array dtype. Other fields use Coordinate[dtype] to declare coordinate schemas. Schema-level options live on a nested Config class.

import numpy as np
import xarray as xr

class Temperature(pa.DataArrayModel):
    data: np.float64 = pa.Field()
    time: Coordinate[np.float64]
    lat: Coordinate[np.float64]
    lon: Coordinate[np.float64]

    class Config:
        dims = ("time", "lat", "lon")
        name = "temperature"

da = xr.DataArray(
    np.random.rand(12, 180, 360),
    dims=("time", "lat", "lon"),
    coords={
        "time": np.arange(12, dtype=np.float64),
        "lat": np.linspace(-89.5, 89.5, 180),
        "lon": np.linspace(-179.5, 179.5, 360),
    },
    name="temperature",
)
Temperature.validate(da)
<xarray.DataArray 'temperature' (time: 12, lat: 180, lon: 360)> Size: 6MB
array([[[0.63974457, 0.26524558, 0.63006648, ..., 0.4321435 ,
         0.36229183, 0.59286676],
        [0.27191546, 0.94152827, 0.58565502, ..., 0.14886646,
         0.12175208, 0.3450944 ],
        [0.85260456, 0.4214471 , 0.35091739, ..., 0.85773377,
         0.09952705, 0.62496683],
        ...,
        [0.14123247, 0.16678671, 0.27509324, ..., 0.64963559,
         0.11571173, 0.50582554],
        [0.71946057, 0.8219462 , 0.98002777, ..., 0.91143668,
         0.56405048, 0.55188396],
        [0.24675554, 0.90353178, 0.05141815, ..., 0.89156497,
         0.21416824, 0.77706951]],

       [[0.24145534, 0.66254542, 0.55530239, ..., 0.05122572,
         0.07354845, 0.23920661],
        [0.92069292, 0.78467761, 0.11412356, ..., 0.59072757,
         0.25972443, 0.79464703],
        [0.10336437, 0.69829642, 0.80650221, ..., 0.74404321,
         0.23735362, 0.10710934],
...
        [0.52703942, 0.1395948 , 0.21657352, ..., 0.63805663,
         0.75164889, 0.32302268],
        [0.26169817, 0.12997801, 0.5754408 , ..., 0.55745522,
         0.51233137, 0.91236648],
        [0.13537324, 0.78387427, 0.01710389, ..., 0.12313294,
         0.01862873, 0.42942545]],

       [[0.53172355, 0.30484472, 0.26774834, ..., 0.21620514,
         0.64003276, 0.11498758],
        [0.00349957, 0.7468368 , 0.65371721, ..., 0.06059852,
         0.00382282, 0.33405755],
        [0.58819199, 0.46274337, 0.68517959, ..., 0.03491337,
         0.15580067, 0.38139553],
        ...,
        [0.8155553 , 0.27790271, 0.96884469, ..., 0.03557298,
         0.1179469 , 0.22276687],
        [0.11270351, 0.461159  , 0.69143605, ..., 0.23625459,
         0.81585242, 0.14759047],
        [0.3871465 , 0.51139789, 0.34188184, ..., 0.18824142,
         0.25880001, 0.18137871]]], shape=(12, 180, 360))
Coordinates:
  * time     (time) float64 96B 0.0 1.0 2.0 3.0 4.0 ... 7.0 8.0 9.0 10.0 11.0
  * lat      (lat) float64 1kB -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * lon      (lon) float64 3kB -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5

Field names as attributes

Accessing a class attribute on the model returns the coordinate or variable name as a string, useful for programmatic indexing:

print(Temperature.time)
print(Temperature.lat)
time
lat

Config options

DataArrayModel.Config (DataArrayConfig) accepts: dtype, dims, sizes, shape, name, coerce, nullable, strict_coords, strict_attrs, attrs, chunked, array_type.

These mirror the keyword arguments on DataArraySchema.

Using Field on data

The data field can carry the same per-field structural constraints that you would pass as DataArraySchema constructor arguments:

class Grid(pa.DataArrayModel):
    data: np.float64 = pa.Field(
        dims=("x", "y"),
        sizes={"x": 3, "y": 4},
    )
    x: Coordinate[np.float64]
    y: Coordinate[np.float64]

    class Config:
        name = "grid"

da_grid = xr.DataArray(
    np.random.rand(3, 4),
    dims=("x", "y"),
    coords={
        "x": np.arange(3, dtype=np.float64),
        "y": np.arange(4, dtype=np.float64),
    },
    name="grid",
)
Grid.validate(da_grid)
<xarray.DataArray 'grid' (x: 3, y: 4)> Size: 96B
array([[0.93350701, 0.10289634, 0.18830035, 0.58045176],
       [0.28439894, 0.20824653, 0.47871903, 0.96441649],
       [0.91228247, 0.14490973, 0.27572526, 0.0962955 ]])
Coordinates:
  * x        (x) float64 24B 0.0 1.0 2.0
  * y        (y) float64 32B 0.0 1.0 2.0 3.0

When both Field(dims=...) and Config.dims are set, the Field value takes precedence.

Using Field on coordinates

Coordinate fields accept Field() with the same built-in check keywords (eq, ge, le, in_range, isin, etc.), plus nullable and coerce.

class Geo(pa.DataArrayModel):
    data: np.float64 = pa.Field()
    lat: Coordinate[np.float64] = pa.Field(ge=-90, le=90)
    lon: Coordinate[np.float64] = pa.Field(ge=-180, le=180)

    class Config:
        dims = ("lat", "lon")

da_geo = xr.DataArray(
    np.ones((5, 10)),
    dims=("lat", "lon"),
    coords={
        "lat": np.linspace(-45, 45, 5),
        "lon": np.linspace(-90, 90, 10),
    },
)
Geo.validate(da_geo)
<xarray.DataArray (lat: 5, lon: 10)> Size: 400B
array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
Coordinates:
  * lat      (lat) float64 40B -45.0 -22.5 0.0 22.5 45.0
  * lon      (lon) float64 80B -90.0 -70.0 -50.0 -30.0 ... 30.0 50.0 70.0 90.0

to_schema() and validate()

schema = Temperature.to_schema()
print(type(schema))

Temperature.validate(da)
<class 'pandera.api.xarray.container.DataArraySchema'>
<xarray.DataArray 'temperature' (time: 12, lat: 180, lon: 360)> Size: 6MB
array([[[0.63974457, 0.26524558, 0.63006648, ..., 0.4321435 ,
         0.36229183, 0.59286676],
        [0.27191546, 0.94152827, 0.58565502, ..., 0.14886646,
         0.12175208, 0.3450944 ],
        [0.85260456, 0.4214471 , 0.35091739, ..., 0.85773377,
         0.09952705, 0.62496683],
        ...,
        [0.14123247, 0.16678671, 0.27509324, ..., 0.64963559,
         0.11571173, 0.50582554],
        [0.71946057, 0.8219462 , 0.98002777, ..., 0.91143668,
         0.56405048, 0.55188396],
        [0.24675554, 0.90353178, 0.05141815, ..., 0.89156497,
         0.21416824, 0.77706951]],

       [[0.24145534, 0.66254542, 0.55530239, ..., 0.05122572,
         0.07354845, 0.23920661],
        [0.92069292, 0.78467761, 0.11412356, ..., 0.59072757,
         0.25972443, 0.79464703],
        [0.10336437, 0.69829642, 0.80650221, ..., 0.74404321,
         0.23735362, 0.10710934],
...
        [0.52703942, 0.1395948 , 0.21657352, ..., 0.63805663,
         0.75164889, 0.32302268],
        [0.26169817, 0.12997801, 0.5754408 , ..., 0.55745522,
         0.51233137, 0.91236648],
        [0.13537324, 0.78387427, 0.01710389, ..., 0.12313294,
         0.01862873, 0.42942545]],

       [[0.53172355, 0.30484472, 0.26774834, ..., 0.21620514,
         0.64003276, 0.11498758],
        [0.00349957, 0.7468368 , 0.65371721, ..., 0.06059852,
         0.00382282, 0.33405755],
        [0.58819199, 0.46274337, 0.68517959, ..., 0.03491337,
         0.15580067, 0.38139553],
        ...,
        [0.8155553 , 0.27790271, 0.96884469, ..., 0.03557298,
         0.1179469 , 0.22276687],
        [0.11270351, 0.461159  , 0.69143605, ..., 0.23625459,
         0.81585242, 0.14759047],
        [0.3871465 , 0.51139789, 0.34188184, ..., 0.18824142,
         0.25880001, 0.18137871]]], shape=(12, 180, 360))
Coordinates:
  * time     (time) float64 96B 0.0 1.0 2.0 3.0 4.0 ... 7.0 8.0 9.0 10.0 11.0
  * lat      (lat) float64 1kB -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * lon      (lon) float64 3kB -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5

Error on missing data field

If the data field is omitted, calling to_schema() raises SchemaInitError:

class Bad(pa.DataArrayModel):
    x: Coordinate[np.float64]

    class Config:
        dims = ("x",)

try:
    Bad.to_schema()
except pa.errors.SchemaInitError as exc:
    print(exc)
DataArrayModel requires a 'data' field.

DatasetModel

Basic usage

Data variable fields are annotated with a dtype, and coordinate fields use Coordinate[dtype]:

class Surface(pa.DatasetModel):
    temperature: np.float64 = pa.Field(dims=("x", "y"))
    pressure: np.float64 = pa.Field(dims=("x", "y"))
    x: Coordinate[np.float64]
    y: Coordinate[np.float64]

    class Config:
        strict = True

ds = xr.Dataset(
    {
        "temperature": (("x", "y"), np.random.rand(3, 4)),
        "pressure": (("x", "y"), np.random.rand(3, 4)),
    },
    coords={
        "x": np.arange(3, dtype=np.float64),
        "y": np.arange(4, dtype=np.float64),
    },
)
Surface.validate(ds)
<xarray.Dataset> Size: 248B
Dimensions:      (x: 3, y: 4)
Coordinates:
  * x            (x) float64 24B 0.0 1.0 2.0
  * y            (y) float64 32B 0.0 1.0 2.0 3.0
Data variables:
    temperature  (x, y) float64 96B 0.106 0.8133 0.4142 ... 0.7453 0.03316
    pressure     (x, y) float64 96B 0.6136 0.7724 0.6049 ... 0.4675 0.332 0.5909

Config options

DatasetModel.Config (DatasetConfig) accepts: strict, strict_coords, strict_attrs, dims, sizes, plus the common name, title, description, coerce.

Field on data variables

Field on a data-variable annotation supports dims, sizes, shape, aligned_with, broadcastable_with, required, and all the built-in check keywords:

class BoundedGrid(pa.DatasetModel):
    temperature: np.float64 = pa.Field(dims=("x", "y"), ge=150, le=350)
    x: Coordinate[np.float64]
    y: Coordinate[np.float64]

ds_bounded = xr.Dataset(
    {"temperature": (("x", "y"), np.full((3, 4), 273.15))},
    coords={
        "x": np.arange(3, dtype=np.float64),
        "y": np.arange(4, dtype=np.float64),
    },
)
BoundedGrid.validate(ds_bounded)
<xarray.Dataset> Size: 152B
Dimensions:      (x: 3, y: 4)
Coordinates:
  * x            (x) float64 24B 0.0 1.0 2.0
  * y            (y) float64 32B 0.0 1.0 2.0 3.0
Data variables:
    temperature  (x, y) float64 96B 273.1 273.1 273.1 ... 273.1 273.1 273.1

Nested DataArrayModel

Instead of a bare dtype, annotate a data variable with a DataArrayModel subclass to reuse a full array schema:

class TemperatureArray(pa.DataArrayModel):
    data: np.float64 = pa.Field()
    time: Coordinate[np.float64]

    class Config:
        dims = ("time",)
        name = "temperature"

class Climate(pa.DatasetModel):
    temperature: TemperatureArray
    time: Coordinate[np.float64]

ds_climate = xr.Dataset(
    {"temperature": (("time",), np.ones(12))},
    coords={"time": np.arange(12, dtype=np.float64)},
)
Climate.validate(ds_climate)
<xarray.Dataset> Size: 192B
Dimensions:      (time: 12)
Coordinates:
  * time         (time) float64 96B 0.0 1.0 2.0 3.0 4.0 ... 8.0 9.0 10.0 11.0
Data variables:
    temperature  (time) float64 96B 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0

The nested model compiles to a DataArraySchema inside the dataset's data_vars.

Optional variables

To make a data variable optional, annotate it as T | None and pass required=False to Field():

class Flexible(pa.DatasetModel):
    required_var: np.float64 = pa.Field(dims=("x",))
    optional_var: np.float64 | None = pa.Field(dims=("x",), required=False)
    x: Coordinate[np.float64]

ds_minimal = xr.Dataset(
    {"required_var": (("x",), np.ones(3))},
    coords={"x": np.arange(3, dtype=np.float64)},
)
Flexible.validate(ds_minimal)
<xarray.Dataset> Size: 48B
Dimensions:       (x: 3)
Coordinates:
  * x             (x) float64 24B 0.0 1.0 2.0
Data variables:
    required_var  (x) float64 24B 1.0 1.0 1.0

to_schema() and validate()

schema = Surface.to_schema()
print(type(schema))

Surface.validate(ds)
<class 'pandera.api.xarray.container.DatasetSchema'>
<xarray.Dataset> Size: 248B
Dimensions:      (x: 3, y: 4)
Coordinates:
  * x            (x) float64 24B 0.0 1.0 2.0
  * y            (y) float64 32B 0.0 1.0 2.0 3.0
Data variables:
    temperature  (x, y) float64 96B 0.106 0.8133 0.4142 ... 0.7453 0.03316
    pressure     (x, y) float64 96B 0.6136 0.7724 0.6049 ... 0.4675 0.332 0.5909

Schema inheritance

Models support regular Python inheritance. Child classes inherit fields and Config options, and can override them:

class BaseGrid(pa.DataArrayModel):
    data: np.float64 = pa.Field()
    x: Coordinate[np.float64]

    class Config:
        dims = ("x",)

class DetailedGrid(BaseGrid):
    y: Coordinate[np.float64]

    class Config:
        dims = ("x", "y")
        name = "detailed"

da_detailed = xr.DataArray(
    np.ones((3, 4)),
    dims=("x", "y"),
    coords={
        "x": np.arange(3, dtype=np.float64),
        "y": np.arange(4, dtype=np.float64),
    },
    name="detailed",
)
DetailedGrid.validate(da_detailed)
<xarray.DataArray 'detailed' (x: 3, y: 4)> Size: 96B
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
Coordinates:
  * x        (x) float64 24B 0.0 1.0 2.0
  * y        (y) float64 32B 0.0 1.0 2.0 3.0

Excluded attributes

Class variables starting with an underscore (_) are excluded from the model. Config is a reserved name.

See also