pandera.check_output

pandera.check_output(schema, obj_getter=None, head=None, tail=None, sample=None, random_state=None, lazy=False)

Validate function output.

Similar to the input validator (check_input), but validates the output of the decorated function. Note that a transformer function supplied to the DataFrameSchema has no effect when the schema is used with check_output.

Parameters
  • schema (Union[DataFrameSchema, SeriesSchema]) – dataframe/series schema object

  • obj_getter (Union[int, str, Callable, None]) – (Default value = None) if None, the output itself is validated. If int, assumes that the output of the decorated function is a list-like object, where obj_getter is the index of the pandas dataframe/series to be validated. If str, expects that the output is a dict-like object, where obj_getter is the key pointing to the dataframe/series to be validated. If a callable is supplied, it receives the output of the decorated function and should return the dataframe/series to be validated.

  • head (Optional[int]) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.

  • tail (Optional[int]) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.

  • sample (Optional[int]) – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.

  • random_state (Optional[int]) – random seed for the sample argument.

  • lazy (bool) – if True, lazily evaluates the dataframe against all validation checks and raises a SchemaErrorReport. Otherwise, raises a SchemaError as soon as one occurs.

Return type

Callable

Returns

wrapped function

Example

Check the output of a decorated function.

>>> import pandas as pd
>>> import pandera as pa
>>>
>>>
>>> schema = pa.DataFrameSchema(
...     columns={"doubled_column": pa.Column(pa.Int)},
...     checks=pa.Check(
...         lambda df: df["doubled_column"] == df["column"] * 2
...     )
... )
>>>
>>> @pa.check_output(schema)
... def transform_data(df: pd.DataFrame) -> pd.DataFrame:
...     df["doubled_column"] = df["column"] * 2
...     return df
>>>
>>> df = pd.DataFrame({"column": range(5)})
>>>
>>> transform_data(df)
   column  doubled_column
0       0               0
1       1               2
2       2               4
3       3               6
4       4               8

See the pandera user guide on decorators for more usage details.