pandera.decorators.check_output

pandera.decorators.check_output(schema, obj_getter=None, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)

Validate function output.

Similar to the input validator (check_input), but validates the output of the decorated function instead of one of its arguments.

Parameters
  • schema (Union[DataFrameSchema, SeriesSchema]) – dataframe/series schema object

  • obj_getter (Union[str, int, Callable, None]) – (Default value = None) if None, the output itself is validated. If int, assumes that the output of the decorated function is a list-like object, where obj_getter is the index of the pandas dataframe/series to be validated. If str, expects that the output is a dict-like object, where obj_getter is the key pointing to the dataframe/series to be validated. If a callable is supplied, it receives the output of the decorated function and should return the dataframe/series to be validated.

  • head (Optional[int]) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.

  • tail (Optional[int]) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.

  • sample (Optional[int]) – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.

  • random_state (Optional[int]) – random seed for the sample argument.

  • lazy (bool) – if True, evaluates the dataframe against all validation checks and raises a single SchemaErrors exception that collects every failure. Otherwise, raises a SchemaError as soon as the first failure occurs.

  • inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.

Return type

Callable[[~F], ~F]

Returns

wrapped function

Example

Check the output of a decorated function.

>>> import pandas as pd
>>> import pandera as pa
>>>
>>>
>>> schema = pa.DataFrameSchema(
...     columns={"doubled_column": pa.Column(int)},
...     checks=pa.Check(
...         lambda df: df["doubled_column"] == df["column"] * 2
...     )
... )
>>>
>>> @pa.check_output(schema)
... def transform_data(df: pd.DataFrame) -> pd.DataFrame:
...     df["doubled_column"] = df["column"] * 2
...     return df
>>>
>>> df = pd.DataFrame({"column": range(5)})
>>>
>>> transform_data(df)
   column  doubled_column
0       0               0
1       1               2
2       2               4
3       3               6
4       4               8

See here for more usage details.