DataFrameSchema.__init__(columns=None, checks=None, index=None, pandas_dtype=None, transformer=None, coerce=False, strict=False, name=None, ordered=False)

Initialize DataFrameSchema validator.

  • columns (mapping of column names to column schema components) – a dict whose keys are column names and whose values are Column objects specifying the datatype and properties of each column.

  • checks (Union[Check, Hypothesis, List[Union[Check, Hypothesis]], None]) – dataframe-wide checks.

  • index – specify the datatypes and properties of the index.

  • pandas_dtype (Union[str, type, PandasDtype, ExtensionDtype, None]) – datatype of the dataframe. This overrides the data types specified in any of the columns. If a string is specified, it is assumed to be one of the valid pandas dtype strings.

  • transformer (Optional[Callable]) – a callable with signature: pandas.DataFrame -> pandas.DataFrame. If specified, calling validate will verify properties of the columns and return the transformed dataframe object.

  • coerce (bool) – whether or not to coerce all of the columns on validation.

  • strict – if True, validation fails when the dataframe contains columns that aren’t defined in the DataFrameSchema.

  • name (Optional[str]) – name of the schema.

  • ordered (bool) – whether or not to validate the column order.


Raises: SchemaInitError – if it is impossible to build the schema from the given parameters.


>>> import pandera as pa
>>> schema = pa.DataFrameSchema({
...     "str_column": pa.Column(pa.String),
...     "float_column": pa.Column(pa.Float),
...     "int_column": pa.Column(pa.Int),
...     "date_column": pa.Column(pa.DateTime),
... })

Use the pandas API to define checks. A Check takes a function with the signature pd.Series -> Union[bool, pd.Series], where an output series must contain boolean values.

>>> schema_withchecks = pa.DataFrameSchema({
...     "probability": pa.Column(
...         pa.Float, pa.Check(lambda s: (s >= 0) & (s <= 1))),
...     # check that the "category" column contains a few discrete
...     # values, and the majority of the entries are dogs.
...     "category": pa.Column(
...         pa.String, [
...             pa.Check(lambda s: s.isin(["dog", "cat", "duck"])),
...             pa.Check(lambda s: (s == "dog").mean() > 0.5),
...         ]),
... })

See the pandera documentation on DataFrame schemas for more usage details.