pandera.check_input

pandera.check_input(schema, obj_getter=None, head=None, tail=None, sample=None, random_state=None, lazy=False)

Validate function argument when function is called.

This decorator validates a dataframe or series argument of the decorated function against a schema each time the function is called. Note that if the schema specifies a transformer, the decorator passes the transformed dataframe, not the original, into the decorated function.

Parameters
  • schema (Union[DataFrameSchema, SeriesSchema]) – dataframe/series schema object

  • obj_getter (Union[str, int, None]) – (Default value = None) if int, obj_getter refers to the index of the pandas dataframe/series to be validated in the args part of the function signature. If str, obj_getter refers to the argument name of the pandas dataframe/series in the function signature. This works even if the series/dataframe is passed in as a positional argument when the function is called. If None, assumes that the dataframe/series is the first argument of the decorated function.

  • head (Optional[int]) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.

  • tail (Optional[int]) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.

  • sample (Optional[int]) – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.

  • random_state (Optional[int]) – random seed for the sample argument.

  • lazy (bool) – if True, lazily evaluates the dataframe against all validation checks and raises a SchemaErrorReport. Otherwise, raises a SchemaError as soon as one occurs.

Return type

Callable

Returns

wrapped function

Example

Check the input of a decorated function.

>>> import pandas as pd
>>> import pandera as pa
>>>
>>>
>>> schema = pa.DataFrameSchema({"column": pa.Column(pa.Int)})
>>>
>>> @pa.check_input(schema)
... def transform_data(df: pd.DataFrame) -> pd.DataFrame:
...     df["doubled_column"] = df["column"] * 2
...     return df
>>>
>>> df = pd.DataFrame({
...     "column": range(5),
... })
>>>
>>> transform_data(df)
   column  doubled_column
0       0               0
1       1               2
2       2               4
3       3               6
4       4               8

See here for more usage details.