pandera.api.dataframe.model.DataFrameModel

class pandera.api.dataframe.model.DataFrameModel(*args, **kwargs)[source]

Base class for the DataFrame model.

See the User Guide for more.

Calling the class with a dataframe validates it against the schema specification, using the same arguments as validate().

Parameters:
  • check_obj (pd.DataFrame) – the dataframe to be validated.

  • head – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.

  • tail – validate the last n rows. Rows overlapping with head or sample are de-duplicated.

  • sample – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.

  • random_state – random seed for the sample argument.

  • lazy – if True, evaluates the dataframe against all validation checks and raises a single SchemaErrors exception that collects every failure. Otherwise, raises SchemaError as soon as the first failure occurs.

  • inplace – if True, applies coercion to the object of validation, otherwise creates a copy of the data.

Returns:

validated DataFrame

Raises:

SchemaError – when DataFrame violates built-in or custom checks.

Methods

classmethod build_schema_(**kwargs)[source]

Return type:

~TSchema

classmethod empty(*_args)[source]

Create an empty DataFrame instance.

Return type:

DataFrameBase[Self]

classmethod example(cls, **kwargs)[source]

Generate an example of this data model specification.

Return type:

DataFrameBase[Self]

classmethod from_json(source)[source]

Load a schema from JSON.

Parameters:

source – str, Path, or file stream with JSON content.

Returns:

the backend-specific schema object.

classmethod from_yaml(yaml_schema)[source]

Load a schema from YAML.

Parameters:

yaml_schema – str, Path, or file stream with YAML content.

Returns:

the backend-specific schema object.

classmethod get_metadata()[source]

Provide metadata at the column level and the schema level.

Return type:

UnionType[dict, None]

classmethod pydantic_validate(schema_model)[source]

Verify that the input is a compatible dataframe model.

Return type:

DataFrameModel

classmethod strategy(cls, **kwargs)[source]

Create a data synthesis strategy.

classmethod to_json(target=None, **kwargs)[source]

Convert this model’s schema to JSON.

classmethod to_json_schema()[source]

Serialize schema metadata into JSON Schema format.

classmethod to_schema()[source]

Create DataFrameSchema from the DataFrameModel.

Return type:

~TSchema

classmethod to_yaml(stream=None)[source]

Convert this model’s schema to YAML.

classmethod validate(check_obj, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)[source]

Validate a DataFrame based on the schema specification.

Parameters:
  • check_obj (pd.DataFrame) – the dataframe to be validated.

  • head (UnionType[int, None]) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.

  • tail (UnionType[int, None]) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.

  • sample (UnionType[int, None]) – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.

  • random_state (UnionType[int, None]) – random seed for the sample argument.

  • lazy (bool) – if True, evaluates the dataframe against all validation checks and raises a single SchemaErrors exception that collects every failure. Otherwise, raises SchemaError as soon as the first failure occurs.

  • inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.

Return type:

DataFrameBase[Self]

Returns:

validated DataFrame

Raises:

SchemaError – when DataFrame violates built-in or custom checks.