DataTree Validation
DataTree is a hierarchical tree of datasets, useful for
organising related multi-dimensional data under a single structure. Pandera
validates trees with DataTreeSchema
(imperative API) and DataTreeModel
(declarative API).
DataTreeSchema
Basic usage
DataTreeSchema validates node-level attributes and child nodes. Each child
schema can be a DatasetSchema or another
DataTreeSchema for recursive nesting.
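The recursive shape described above can be pictured with a small plain-Python sketch in which nested dictionaries stand in for tree nodes and schemas. This is an illustration of the idea only, not pandera's implementation: the `check_node` helper, the dictionary layout, and the error strings are all hypothetical.

```python
# Conceptual sketch only: plain dicts stand in for tree nodes and schemas.
# `check_node` is a hypothetical helper, not part of pandera.


def check_node(node, schema, path="/"):
    """Return a list of error strings for `node` and its declared children."""
    errors = []
    # Node-level attributes: every declared key must match exactly.
    for key, expected in schema.get("attrs", {}).items():
        actual = node.get("attrs", {}).get(key)
        if actual != expected:
            errors.append(
                f"{path}: attr {key!r} is {actual!r}, expected {expected!r}"
            )
    # Child nodes: recurse so nested schemas are checked the same way.
    for name, child_schema in schema.get("children", {}).items():
        child = node.get("children", {}).get(name)
        child_path = path.rstrip("/") + "/" + name
        if child is None:
            errors.append(f"{child_path}: missing child node")
        else:
            errors.extend(check_node(child, child_schema, child_path))
    return errors


toy_tree = {
    "attrs": {"conventions": "CF-1.8"},
    "children": {"surface": {"attrs": {}, "children": {}}},
}
toy_schema = {
    "attrs": {"conventions": "CF-1.8"},
    "children": {"surface": {"children": {"diagnostics": {}}}},
}

errors = check_node(toy_tree, toy_schema)
print(errors)  # ['/surface/diagnostics: missing child node']
```

The example that follows does the same with actual pandera schemas and an xarray DataTree.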
import numpy as np
import xarray as xr
import pandera.xarray as pa

schema = pa.DataTreeSchema(
    attrs={"conventions": "CF-1.8"},
    children={
        "surface": pa.DatasetSchema(
            data_vars={
                "temperature": pa.DataVar(dtype=np.float64, dims=("x",)),
            },
        ),
        "upper": pa.DatasetSchema(
            data_vars={
                "wind": pa.DataVar(dtype=np.float64, dims=("x",)),
            },
        ),
    },
)

dt = xr.DataTree.from_dict({
    "/": xr.Dataset(attrs={"conventions": "CF-1.8"}),
    "/surface": xr.Dataset(
        {"temperature": (("x",), np.ones(3, dtype=np.float64))},
        coords={"x": np.arange(3, dtype=np.float64)},
    ),
    "/upper": xr.Dataset(
        {"wind": (("x",), np.ones(3, dtype=np.float64))},
        coords={"x": np.arange(3, dtype=np.float64)},
    ),
})
schema.validate(dt)

<xarray.DataTree>
Group: /
│   Attributes:
│       conventions:  CF-1.8
├── Group: /surface
│       Dimensions:  (x: 3)
│       Coordinates:
│         * x        (x) float64 24B 0.0 1.0 2.0
│       Data variables:
│           temperature  (x) float64 24B 1.0 1.0 1.0
└── Group: /upper
        Dimensions:  (x: 3)
        Coordinates:
          * x        (x) float64 24B 0.0 1.0 2.0
        Data variables:
            wind     (x) float64 24B 1.0 1.0 1.0

Path-based children
Children can reference nested nodes using /-separated paths, just like
xr.DataTree.from_dict():
nested_dt = xr.DataTree.from_dict({
    "/": xr.Dataset(attrs={"conventions": "CF-1.8"}),
    "/surface": xr.Dataset(
        {"temperature": (("x",), np.ones(3, dtype=np.float64))},
        coords={"x": np.arange(3, dtype=np.float64)},
    ),
    "/surface/diagnostics": xr.Dataset(
        {"rmse": (("x",), np.ones(3, dtype=np.float64))},
        coords={"x": np.arange(3, dtype=np.float64)},
    ),
})

schema = pa.DataTreeSchema(
    children={
        "surface/diagnostics": pa.DatasetSchema(
            data_vars={"rmse": pa.DataVar(dtype=np.float64)},
        ),
    },
)
schema.validate(nested_dt)

<xarray.DataTree>
Group: /
│   Attributes:
│       conventions:  CF-1.8
└── Group: /surface
    │   Dimensions:  (x: 3)
    │   Coordinates:
    │     * x        (x) float64 24B 0.0 1.0 2.0
    │   Data variables:
    │       temperature  (x) float64 24B 1.0 1.0 1.0
    └── Group: /surface/diagnostics
            Dimensions:  (x: 3)
            Data variables:
                rmse     (x) float64 24B 1.0 1.0 1.0

Root node dataset
Use the dataset parameter to validate the dataset attached to the root node:
schema = pa.DataTreeSchema(
    dataset=pa.DatasetSchema(attrs={"conventions": "CF-1.8"}),
    children={
        "surface": pa.DatasetSchema(
            data_vars={
                "temperature": pa.DataVar(dtype=np.float64, dims=("x",)),
            },
        ),
    },
)
schema.validate(dt)

<xarray.DataTree>
Group: /
│   Attributes:
│       conventions:  CF-1.8
├── Group: /surface
│       Dimensions:  (x: 3)
│       Coordinates:
│         * x        (x) float64 24B 0.0 1.0 2.0
│       Data variables:
│           temperature  (x) float64 24B 1.0 1.0 1.0
└── Group: /upper
        Dimensions:  (x: 3)
        Coordinates:
          * x        (x) float64 24B 0.0 1.0 2.0
        Data variables:
            wind     (x) float64 24B 1.0 1.0 1.0

Strict mode
When strict=True, unexpected child nodes raise a validation error:
schema = pa.DataTreeSchema(
    children={
        "surface": pa.DatasetSchema(),
        "upper": pa.DatasetSchema(),
    },
    strict=True,
)
schema.validate(dt)

<xarray.DataTree>
Group: /
│   Attributes:
│       conventions:  CF-1.8
├── Group: /surface
│       Dimensions:  (x: 3)
│       Coordinates:
│         * x        (x) float64 24B 0.0 1.0 2.0
│       Data variables:
│           temperature  (x) float64 24B 1.0 1.0 1.0
└── Group: /upper
        Dimensions:  (x: 3)
        Coordinates:
          * x        (x) float64 24B 0.0 1.0 2.0
        Data variables:
            wind     (x) float64 24B 1.0 1.0 1.0

strict_schema = pa.DataTreeSchema(
    children={"surface": pa.DatasetSchema()},
    strict=True,
)
try:
    strict_schema.validate(dt)
except pa.errors.SchemaError as exc:
    print(exc)

unexpected child node 'upper'
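
The strict check amounts to comparing the child names present on a node with the names the schema declares. A minimal plain-Python sketch of that comparison follows. The helpers `find_unexpected` and `check_children` are hypothetical names used for illustration, and the sketch makes no claim about how pandera implements the check internally.

```python
# Conceptual sketch only: hypothetical helpers, not part of pandera.


def find_unexpected(declared, present):
    """Names in `present` that were not declared, in sorted order."""
    return sorted(set(present) - set(declared))


def check_children(declared, present, strict=False):
    """Mimic strict mode: reject undeclared names only when strict is set."""
    unexpected = find_unexpected(declared, present)
    if strict and unexpected:
        raise ValueError(f"unexpected child node {unexpected[0]!r}")
    return list(present)


declared = {"surface"}
present = ["surface", "upper"]

# Lenient mode tolerates the extra "upper" node.
lenient_result = check_children(declared, present)

# Strict mode rejects it.
try:
    check_children(declared, present, strict=True)
    message = None
except ValueError as exc:
    message = str(exc)

print(lenient_result)  # ['surface', 'upper']
print(message)         # unexpected child node 'upper'
```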
Nested DataTreeSchema
Children can themselves be DataTreeSchema instances for deep validation:
schema = pa.DataTreeSchema(
    attrs={"conventions": "CF-1.8"},
    children={
        "surface": pa.DataTreeSchema(
            dataset=pa.DatasetSchema(
                data_vars={
                    "temperature": pa.DataVar(dtype=np.float64, dims=("x",)),
                },
            ),
            children={
                "diagnostics": pa.DatasetSchema(
                    data_vars={"rmse": pa.DataVar(dtype=np.float64)},
                ),
            },
        ),
    },
)
schema.validate(nested_dt)

<xarray.DataTree>
Group: /
│   Attributes:
│       conventions:  CF-1.8
└── Group: /surface
    │   Dimensions:  (x: 3)
    │   Coordinates:
    │     * x        (x) float64 24B 0.0 1.0 2.0
    │   Data variables:
    │       temperature  (x) float64 24B 1.0 1.0 1.0
    └── Group: /surface/diagnostics
            Dimensions:  (x: 3)
            Data variables:
                rmse     (x) float64 24B 1.0 1.0 1.0

DataTreeModel
Basic usage
DataTreeModel uses class attributes
annotated with DatasetModel subclasses to
declare child node schemas:
from pandera.typing.xarray import Coordinate


class SurfaceModel(pa.DatasetModel):
    temperature: np.float64 = pa.Field(dims=("x",))
    x: Coordinate[np.float64]


class UpperModel(pa.DatasetModel):
    wind: np.float64 = pa.Field(dims=("x",))
    x: Coordinate[np.float64]


class ClimateTree(pa.DataTreeModel):
    surface: SurfaceModel
    upper: UpperModel

    class Config:
        strict = True


ClimateTree.validate(dt)

<xarray.DataTree>
Group: /
│   Attributes:
│       conventions:  CF-1.8
├── Group: /surface
│       Dimensions:  (x: 3)
│       Coordinates:
│         * x        (x) float64 24B 0.0 1.0 2.0
│       Data variables:
│           temperature  (x) float64 24B 1.0 1.0 1.0
└── Group: /upper
        Dimensions:  (x: 3)
        Coordinates:
          * x        (x) float64 24B 0.0 1.0 2.0
        Data variables:
            wind     (x) float64 24B 1.0 1.0 1.0

Config options
DataTreeModel.Config (DataTreeConfig) accepts the options strict, attrs, and name.
Field name access
Printing a declared child attribute of the model class shows its node name:
print(ClimateTree.surface)
print(ClimateTree.upper)
surface
upper
to_schema() and validate()
schema = ClimateTree.to_schema()
print(type(schema))
ClimateTree.validate(dt)

<class 'pandera.api.xarray.container.DataTreeSchema'>
<xarray.DataTree>
Group: /
│   Attributes:
│       conventions:  CF-1.8
├── Group: /surface
│       Dimensions:  (x: 3)
│       Coordinates:
│         * x        (x) float64 24B 0.0 1.0 2.0
│       Data variables:
│           temperature  (x) float64 24B 1.0 1.0 1.0
└── Group: /upper
        Dimensions:  (x: 3)
        Coordinates:
          * x        (x) float64 24B 0.0 1.0 2.0
        Data variables:
            wind     (x) float64 24B 1.0 1.0 1.0

@check_types with DataTree
Use DataTree[Model] from pandera.typing.xarray with the @check_types
decorator:
from pandera.typing.xarray import DataTree


@pa.check_types
def process_tree(tree: DataTree[ClimateTree]) -> DataTree[ClimateTree]:
    return tree


process_tree(dt)

<xarray.DataTree>
Group: /
│   Attributes:
│       conventions:  CF-1.8
├── Group: /surface
│       Dimensions:  (x: 3)
│       Coordinates:
│         * x        (x) float64 24B 0.0 1.0 2.0
│       Data variables:
│           temperature  (x) float64 24B 1.0 1.0 1.0
└── Group: /upper
        Dimensions:  (x: 3)
        Coordinates:
          * x        (x) float64 24B 0.0 1.0 2.0
        Data variables:
            wind     (x) float64 24B 1.0 1.0 1.0

See also
DataArray Schemas / Dataset Schemas: imperative API
Data Models: DataArrayModel and DatasetModel
Decorators: decorator-based validation