CF Convention Checks¶
The Climate and Forecast (CF) conventions are a widely used metadata standard for geoscience data. Pandera provides built-in checks that validate CF metadata attributes on xarray objects.
There are two categories:
Category |
Requires |
What it checks |
|---|---|---|
Lightweight ( |
No |
Inspects |
Accessor-based ( |
Yes |
Uses the |
Lightweight CF checks¶
These checks inspect .attrs directly and do not need any extra dependencies.
Check.cf_standard_name()¶
Require that .attrs["standard_name"] equals a specific value:
import numpy as np
import xarray as xr
import pandera.xarray as pa
da = xr.DataArray(
np.ones((3, 4)),
dims=("x", "y"),
attrs={"standard_name": "air_temperature", "units": "K"},
)
pa.DataArraySchema(
checks=pa.Check.cf_standard_name("air_temperature"),
).validate(da)
<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
Dimensions without coordinates: x, y
Attributes:
standard_name: air_temperature
units: KWhen the standard name doesn’t match:
try:
pa.DataArraySchema(
checks=pa.Check.cf_standard_name("sea_surface_temperature"),
).validate(da)
except pa.errors.SchemaError as exc:
print(exc)
DataArraySchema 'None' failed series or dataframe validator 0: <Check cf_standard_name: cf_standard_name('sea_surface_temperature')>
Check.cf_units()¶
Require that .attrs["units"] equals a specific value:
pa.DataArraySchema(
checks=pa.Check.cf_units("K"),
).validate(da)
<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
Dimensions without coordinates: x, y
Attributes:
standard_name: air_temperature
units: Ktry:
pa.DataArraySchema(
checks=pa.Check.cf_units("degC"),
).validate(da)
except pa.errors.SchemaError as exc:
print(exc)
DataArraySchema 'None' failed series or dataframe validator 0: <Check cf_units: cf_units('degC')>
Check.cf_has_cell_methods()¶
Require that .attrs["cell_methods"] equals a specific string:
da_cell = xr.DataArray(
np.ones(12),
dims="time",
attrs={"cell_methods": "time: mean"},
)
pa.DataArraySchema(
checks=pa.Check.cf_has_cell_methods("time: mean"),
).validate(da_cell)
<xarray.DataArray (time: 12)> Size: 96B
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
Dimensions without coordinates: time
Attributes:
cell_methods: time: meanCombining CF checks¶
Multiple CF checks can be combined in a single schema:
schema = pa.DataArraySchema(
dims=("time", "lat", "lon"),
checks=[
pa.Check.cf_standard_name("air_temperature"),
pa.Check.cf_units("K"),
],
)
da_3d = xr.DataArray(
np.ones((12, 5, 10)),
dims=("time", "lat", "lon"),
attrs={
"standard_name": "air_temperature",
"units": "K",
},
)
schema.validate(da_3d)
<xarray.DataArray (time: 12, lat: 5, lon: 10)> Size: 5kB
array([[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]],
[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]],
[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]],
[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
...
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]],
[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]],
[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]],
[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]]])
Dimensions without coordinates: time, lat, lon
Attributes:
standard_name: air_temperature
units: Kcf_xarray accessor check¶
Check.cf_has_standard_names()¶
This check requires the cf_xarray
package and verifies that each listed standard name is resolvable via the
.cf accessor:
# pip install cf_xarray
import cf_xarray # noqa: F401
ds = xr.Dataset({
"T": (("x",), np.ones(3), {"standard_name": "air_temperature"}),
"P": (("x",), np.ones(3), {"standard_name": "air_pressure"}),
})
schema = pa.DatasetSchema(
checks=pa.Check.cf_has_standard_names(
("air_temperature", "air_pressure"),
),
)
schema.validate(ds)
When a standard name cannot be resolved, the check fails:
try:
pa.DatasetSchema(
checks=pa.Check.cf_has_standard_names(
("air_temperature", "sea_surface_height"),
),
).validate(ds)
except pa.errors.SchemaError as exc:
print(exc)
Note
If cf_xarray is not installed, Check.cf_has_standard_names() raises an
ImportError with installation instructions. The lightweight checks
(cf_standard_name, cf_units, cf_has_cell_methods) never require
cf_xarray.
Dataset-level CF validation¶
CF checks work on both DataArray and Dataset objects. On a Dataset,
the check receives the entire dataset:
ds_cf = xr.Dataset(
{"temp": (("x",), np.ones(3))},
attrs={"standard_name": "air_temperature", "units": "K"},
)
pa.DatasetSchema(
data_vars={"temp": pa.DataVar(dims=("x",))},
checks=[
pa.Check.cf_standard_name("air_temperature"),
pa.Check.cf_units("K"),
],
).validate(ds_cf)
<xarray.Dataset> Size: 24B
Dimensions: (x: 3)
Dimensions without coordinates: x
Data variables:
temp (x) float64 24B 1.0 1.0 1.0
Attributes:
standard_name: air_temperature
units: KPer-variable CF checks¶
To validate CF attributes on individual data variables, attach checks to the
DataVar:
ds_vars = xr.Dataset({
"temperature": xr.DataArray(
np.ones((3, 4)),
dims=("x", "y"),
attrs={"standard_name": "air_temperature", "units": "K"},
),
"pressure": xr.DataArray(
np.ones((3, 4)),
dims=("x", "y"),
attrs={"standard_name": "air_pressure", "units": "Pa"},
),
})
schema = pa.DatasetSchema(
data_vars={
"temperature": pa.DataVar(
dims=("x", "y"),
checks=[
pa.Check.cf_standard_name("air_temperature"),
pa.Check.cf_units("K"),
],
),
"pressure": pa.DataVar(
dims=("x", "y"),
checks=[
pa.Check.cf_standard_name("air_pressure"),
pa.Check.cf_units("Pa"),
],
),
},
)
schema.validate(ds_vars)
<xarray.Dataset> Size: 192B
Dimensions: (x: 3, y: 4)
Dimensions without coordinates: x, y
Data variables:
temperature (x, y) float64 96B 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
pressure (x, y) float64 96B 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0Combining CF checks with attrs validation¶
CF checks complement the attrs parameter. Use attrs for structural
attribute requirements and CF checks for semantic validation:
schema = pa.DataArraySchema(
attrs={
"standard_name": "air_temperature",
"units": "^(K|degC|degF)$",
},
checks=[
pa.Check.cf_standard_name("air_temperature"),
],
)
da_combined = xr.DataArray(
np.ones(3),
dims="x",
attrs={"standard_name": "air_temperature", "units": "K"},
)
schema.validate(da_combined)
<xarray.DataArray (x: 3)> Size: 24B
array([1., 1., 1.])
Dimensions without coordinates: x
Attributes:
standard_name: air_temperature
units: KSee also¶
Checks and Parsers — general check and parser usage
DataArray Schemas — attribute validation modes
Dataset Schemas — dataset attributes and
DataVarchecksEncoding Validation — encoding validation (related metadata)
CF Conventions — the CF metadata standard
cf_xarray documentation — the
cf_xarraypackage