pandera.decorators.check_input

pandera.decorators.check_input(schema, obj_getter=None, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)

Validate a function argument when the function is called.

This decorator validates a dataframe or series argument of the decorated function against the given schema each time the function is called.

Parameters

schema (Union[DataFrameSchema, SeriesSchema]) – dataframe/series schema object
obj_getter (Union[str, int, None]) – (Default value = None) if int, obj_getter refers to the index of the pandas dataframe/series to be validated in the args part of the function signature. If str, obj_getter refers to the argument name of the pandas dataframe/series in the function signature. This works even if the series/dataframe is passed in as a positional argument when the function is called. If None, assumes that the dataframe/series is the first argument of the decorated function.
head (Optional[int]) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.
tail (Optional[int]) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.
sample (Optional[int]) – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.
random_state (Optional[int]) – random seed for the sample argument.
lazy (bool) – if True, evaluates the dataframe against all validation checks and raises a single SchemaErrors exception that collects the failures. Otherwise, raises a SchemaError as soon as the first failure occurs.
inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.

Return type

Callable[[~F], ~F]

Returns

wrapped function

Example

Check the input of a decorated function.

>>> import pandas as pd
>>> import pandera as pa
>>>
>>>
>>> schema = pa.DataFrameSchema({"column": pa.Column(int)})
>>>
>>> @pa.check_input(schema)
... def transform_data(df: pd.DataFrame) -> pd.DataFrame:
...     df["doubled_column"] = df["column"] * 2
...     return df
>>>
>>> df = pd.DataFrame({
...     "column": range(5),
... })
>>>
>>> transform_data(df)
   column  doubled_column
0       0               0
1       1               2
2       2               4
3       3               6
4       4               8

See here for more usage details.