pandera.DataFrameSchema

class pandera.DataFrameSchema(columns=None, checks=None, index=None, transformer=None, coerce=False, strict=False, name=None)

A light-weight pandas DataFrame validator.

Initialize DataFrameSchema validator.

Parameters
  • columns (mapping of column names to column schema components) – a dict where keys are column names and values are Column objects specifying the datatype and properties of each column.

  • checks (Union[Check, Hypothesis, List[Union[Check, Hypothesis]], None]) – dataframe-wide checks.

  • index – specify the datatypes and properties of the index.

  • transformer (Optional[Callable]) – a callable with signature: pandas.DataFrame -> pandas.DataFrame. If specified, calling validate will verify properties of the columns and return the transformed dataframe object.

  • coerce (bool) – whether or not to coerce all of the columns on validation.

  • strict – whether or not to accept columns in the dataframe that aren’t in the DataFrameSchema.

  • name (Optional[str]) – name of the schema.

Raises

SchemaInitError – if it is impossible to build the schema from the given parameters.

Examples

>>> import pandera as pa
>>>
>>> schema = pa.DataFrameSchema({
...     "str_column": pa.Column(pa.String),
...     "float_column": pa.Column(pa.Float),
...     "int_column": pa.Column(pa.Int),
...     "date_column": pa.Column(pa.DateTime),
... })

Use the pandas API to define checks. A Check takes a function with the signature pd.Series -> Union[bool, pd.Series], where the output series contains boolean values.

>>> from pandera import Check
>>>
>>> schema_withchecks = pa.DataFrameSchema({
...     "probability": pa.Column(
...         pa.Float, pa.Check(lambda s: (s >= 0) & (s <= 1))),
...
...     # check that the "category" column contains a few discrete
...     # values, and the majority of the entries are dogs.
...     "category": pa.Column(
...         pa.String, [
...             pa.Check(lambda s: s.isin(["dog", "cat", "duck"])),
...             pa.Check(lambda s: (s == "dog").mean() > 0.5),
...         ]),
... })

See here for more usage details.

Attributes

coerce

Whether to coerce series to specified type.

dtype

A pandas style dtype dict where the keys are column names and values are pandas dtype for the column.

Methods

__init__

Initialize DataFrameSchema validator.

add_columns

Create a copy of the DataFrameSchema with extra columns.

from_yaml

Create DataFrameSchema from yaml file.

get_dtype

Same as the dtype property, but expands columns where regex == True based on the supplied dataframe.

remove_columns

Removes columns from a DataFrameSchema and returns a new copy.

rename_columns

Rename columns using a dictionary of key-value pairs.

select_columns

Select subset of columns in the schema.

to_yaml

Write DataFrameSchema to yaml file.

update_column

Create copy of a DataFrameSchema with updated column properties.

validate

Check if all columns in a dataframe have a column in the Schema.

__call__

Alias for DataFrameSchema.validate() method.