pandera.api.pandas.model.DataFrameModel

class pandera.api.pandas.model.DataFrameModel(*args, **kwargs)

Definition of a DataFrameSchema.

New in version 0.5.0.

Important

This class is the new name for SchemaModel, which will be deprecated in pandera version 0.20.0.

See the User Guide for more.

Check that every column in the dataframe has a corresponding column in the schema.

Parameters
  • check_obj (pd.DataFrame) – the dataframe to be validated.

  • head – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.

  • tail – validate the last n rows. Rows overlapping with head or sample are de-duplicated.

  • sample – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.

  • random_state – random seed for the sample argument.

  • lazy – if True, evaluates the dataframe against all validation checks and raises a single SchemaErrors that collects every failure. Otherwise, raises SchemaError as soon as the first failure occurs.

  • inplace – if True, applies coercion to the object of validation, otherwise creates a copy of the data.

Returns

validated DataFrame

Raises

SchemaError – when the DataFrame violates built-in or custom checks. With lazy=True, a SchemaErrors is raised instead.

Example

Calling schema.validate returns the dataframe.

>>> import pandas as pd
>>> import pandera as pa
>>>
>>> df = pd.DataFrame({
...     "probability": [0.1, 0.4, 0.52, 0.23, 0.8, 0.76],
...     "category": ["dog", "dog", "cat", "duck", "dog", "dog"]
... })
>>>
>>> schema_withchecks = pa.DataFrameSchema({
...     "probability": pa.Column(
...         float, pa.Check(lambda s: (s >= 0) & (s <= 1))),
...
...     # check that the "category" column contains a few discrete
...     # values, and the majority of the entries are dogs.
...     "category": pa.Column(
...         str, [
...             pa.Check(lambda s: s.isin(["dog", "cat", "duck"])),
...             pa.Check(lambda s: (s == "dog").mean() > 0.5),
...         ]),
... })
>>>
>>> schema_withchecks.validate(df)[["probability", "category"]]
   probability category
0         0.10      dog
1         0.40      dog
2         0.52      cat
3         0.23     duck
4         0.80      dog
5         0.76      dog

Methods

example

Generate an example DataFrame using the model's hypothesis strategy.

pydantic_validate

Verify that the input is a compatible dataframe model.

strategy

Create a hypothesis strategy for generating a DataFrame.

to_schema

Create DataFrameSchema from the DataFrameModel.

to_yaml

Convert Schema to yaml using io.to_yaml.

validate

Check that every column in the dataframe has a corresponding column in the schema.