pandera.api.dataframe.model.DataFrameModel
- class pandera.api.dataframe.model.DataFrameModel(*args, **kwargs)
Base class for the DataFrame model.
See the User Guide for more.
Validate a DataFrame based on the schema specification.
- Parameters:
check_obj (pd.DataFrame) – the dataframe to be validated.
head – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.
tail – validate the last n rows. Rows overlapping with head or sample are de-duplicated.
sample – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.
random_state – random seed for the sample argument.
lazy – if True, lazily evaluates the dataframe against all validation checks and raises a SchemaErrors. Otherwise, raises SchemaError as soon as one occurs.
inplace – if True, applies coercion to the object of validation, otherwise creates a copy of the data.
- Returns:
validated DataFrame
- Raises:
SchemaError – when DataFrame violates built-in or custom checks.
Methods
- classmethod empty(*_args)
Create an empty DataFrame instance.
- Return type:
DataFrame[~TDataFrameModel]
- classmethod example(cls, **kwargs)
Generate an example of a particular size.
- Parameters:
size – number of elements in the generated DataFrame.
- Return type:
DataFrameBase[~TDataFrameModel]
- Returns:
DataFrame object.
- classmethod pydantic_validate(schema_model)
Verify that the input is a compatible dataframe model.
- Return type:
- classmethod strategy(cls, **kwargs)
Create a hypothesis strategy for generating a DataFrame.
- Parameters:
size – number of elements to generate.
n_regex_columns – number of regex columns to generate.
- Returns:
a strategy that generates DataFrame objects.
- classmethod to_schema()
Create DataFrameSchema from the DataFrameModel.
- Return type:
~TSchema
- classmethod validate(check_obj, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)
Validate a DataFrame based on the schema specification.
- Parameters:
check_obj (pd.DataFrame) – the dataframe to be validated.
head (Optional[int]) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.
tail (Optional[int]) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.
sample (Optional[int]) – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.
random_state (Optional[int]) – random seed for the sample argument.
lazy (bool) – if True, lazily evaluates the dataframe against all validation checks and raises a SchemaErrors. Otherwise, raises SchemaError as soon as one occurs.
inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.
- Return type:
DataFrameBase[~TDataFrameModel]
- Returns:
validated DataFrame
- Raises:
SchemaError – when DataFrame violates built-in or custom checks.