pandera.DataFrameSchema.__init__

DataFrameSchema.__init__(columns=None, checks=None, index=None, transformer=None, coerce=False, strict=False, name=None)

Initialize DataFrameSchema validator.

Parameters
  • columns (mapping of column names to column schema components) – a dict where keys are column names and values are Column objects specifying the datatype and properties of each column.

  • checks (Union[Check, Hypothesis, List[Union[Check, Hypothesis]], None]) – dataframe-wide checks.

  • index – specify the datatypes and properties of the index.

  • transformer (Optional[Callable]) – a callable with signature: pandas.DataFrame -> pandas.DataFrame. If specified, calling validate will verify properties of the columns and return the transformed dataframe object.

  • coerce (bool) – whether or not to coerce all of the columns on validation.

  • strict – whether to reject columns in the dataframe that aren’t in the DataFrameSchema. If True, validation fails when such columns are present.

  • name (Optional[str]) – name of the schema.

Raises

SchemaInitError – if the schema cannot be built from the given parameters.

Examples

>>> import pandera as pa
>>>
>>> schema = pa.DataFrameSchema({
...     "str_column": pa.Column(pa.String),
...     "float_column": pa.Column(pa.Float),
...     "int_column": pa.Column(pa.Int),
...     "date_column": pa.Column(pa.DateTime),
... })

Use the pandas API to define checks. Each Check takes a function with the signature pd.Series -> Union[bool, pd.Series], where a returned series must contain boolean values.

>>> from pandera import Check
>>>
>>> schema_withchecks = pa.DataFrameSchema({
...     "probability": pa.Column(
...         pa.Float, pa.Check(lambda s: (s >= 0) & (s <= 1))),
...
...     # check that the "category" column contains a few discrete
...     # values, and the majority of the entries are dogs.
...     "category": pa.Column(
...         pa.String, [
...             pa.Check(lambda s: s.isin(["dog", "cat", "duck"])),
...             pa.Check(lambda s: (s == "dog").mean() > 0.5),
...         ]),
... })

See the pandera documentation for more usage details.

Return type

None