pandera.api.dataframe.components.ComponentSchema

class pandera.api.dataframe.components.ComponentSchema(dtype=None, checks=None, parsers=None, nullable=False, unique=False, report_duplicates='all', coerce=False, name=None, title=None, description=None, default=None, metadata=None, drop_invalid_rows=False)

Base class for data container components, e.g. columns.

Initialize array schema.

Parameters:

dtype (UnionType[Any, None]) – datatype of the column.
checks (Union[Check, list[Union[Check, Hypothesis]], None]) – If a check's element_wise is True, the callable signature should be Callable[[Any], bool], where the Any input is a scalar element in the column. Otherwise, the input is assumed to be the data object (Series, DataFrame).
nullable (bool) – Whether or not column can contain null values.
unique (bool) – Whether column values must be unique. If True, duplicate values are reported as errors.
report_duplicates (Union[Literal['exclude_first'], Literal['exclude_last'], Literal['all']]) – how to report unique errors:
- exclude_first: report all duplicates except the first occurrence
- exclude_last: report all duplicates except the last occurrence
- all: (default) report all duplicates
coerce (bool) – If True, when schema.validate is called the column will be coerced into the specified dtype. This has no effect on columns where dtype=None.
name (Any) – column name in dataframe to validate.
title (UnionType[str, None]) – A human-readable label for the series.
description (UnionType[str, None]) – An arbitrary textual description of the series.
metadata (UnionType[dict, None]) – Optional key-value data.
default (UnionType[Any, None]) – The default value for missing values in the series.
drop_invalid_rows (bool) – if True, drop invalid rows on validation.

Attributes

`BACKEND_REGISTRY`
`properties` – Get the properties of the schema for serialization purposes.

Methods

__init__(dtype=None, checks=None, parsers=None, nullable=False, unique=False, report_duplicates='all', coerce=False, name=None, title=None, description=None, default=None, metadata=None, drop_invalid_rows=False)

Initialize array schema.

Parameters:

dtype (UnionType[Any, None]) – datatype of the column.
checks (Union[Check, list[Union[Check, Hypothesis]], None]) – If a check's element_wise is True, the callable signature should be Callable[[Any], bool], where the Any input is a scalar element in the column. Otherwise, the input is assumed to be the data object (Series, DataFrame).
nullable (bool) – Whether or not column can contain null values.
unique (bool) – Whether column values must be unique. If True, duplicate values are reported as errors.
report_duplicates (Union[Literal['exclude_first'], Literal['exclude_last'], Literal['all']]) – how to report unique errors:
- exclude_first: report all duplicates except the first occurrence
- exclude_last: report all duplicates except the last occurrence
- all: (default) report all duplicates
coerce (bool) – If True, when schema.validate is called the column will be coerced into the specified dtype. This has no effect on columns where dtype=None.
name (Any) – column name in dataframe to validate.
title (UnionType[str, None]) – A human-readable label for the series.
description (UnionType[str, None]) – An arbitrary textual description of the series.
metadata (UnionType[dict, None]) – Optional key-value data.
default (UnionType[Any, None]) – The default value for missing values in the series.
drop_invalid_rows (bool) – if True, drop invalid rows on validation.
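
The three report_duplicates options can be easier to compare side by side. The following is a plain-Python sketch of the semantics described above, not pandera's implementation; the helper name `duplicate_positions` is hypothetical.

```python
def duplicate_positions(values, report_duplicates="all"):
    """Return the positions that would be reported as duplicates.

    Illustrative stand-in only: mirrors the documented meaning of
    'exclude_first', 'exclude_last' and 'all'.
    """
    if report_duplicates not in ("exclude_first", "exclude_last", "all"):
        raise ValueError(f"unknown option: {report_duplicates!r}")

    first_seen, last_seen, counts = {}, {}, {}
    for i, value in enumerate(values):
        first_seen.setdefault(value, i)  # keep the earliest position
        last_seen[value] = i             # overwrite to keep the latest
        counts[value] = counts.get(value, 0) + 1

    flagged = []
    for i, value in enumerate(values):
        if counts[value] < 2:
            continue  # value occurs once, so it is not a duplicate
        if report_duplicates == "exclude_first" and i == first_seen[value]:
            continue
        if report_duplicates == "exclude_last" and i == last_seen[value]:
            continue
        flagged.append(i)
    return flagged


values = ["a", "b", "a", "c", "a"]
all_dupes = duplicate_positions(values, "all")
skip_first = duplicate_positions(values, "exclude_first")
skip_last = duplicate_positions(values, "exclude_last")
```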

coerce_dtype(check_obj)

Coerce the data to the type specified in dtype.

Parameters: check_obj (~TDataObject) – data to coerce
Return type: ~TDataObject
Returns: data of the same type as the input

set_checks(checks)

Create a new SeriesSchema with a new set of Checks.

Caution

This method will be deprecated in favor of update_checks in v0.15.0.

Parameters: checks (Union[Check, list[Union[Check, Hypothesis]]]) – checks to set on the new schema
Returns: a new SeriesSchema with a new set of checks

update_checks(checks)

Create a new SeriesSchema with a new set of Checks.

Parameters: checks (Union[Check, list[Union[Check, Hypothesis]]]) – checks to set on the new schema
Returns: a new SeriesSchema with a new set of checks

validate(check_obj, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)

Validate a series or specific column in dataframe.

Parameters:

check_obj – data object to validate.
head (UnionType[int, None]) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.
tail (UnionType[int, None]) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.
sample (UnionType[int, None]) – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.
random_state (UnionType[int, None]) – random seed for the sample argument.
lazy (bool) – if True, lazily evaluates the data against all validation checks and raises a SchemaErrors exception that collects every failure. Otherwise, raises a SchemaError as soon as one occurs.
inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.

Returns:

validated DataFrame or Series.
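
To make the de-duplication of head, tail and sample concrete, here is a plain-Python sketch of how the selected row positions could combine. It is an illustration of the documented behaviour under simplifying assumptions, not pandera's code. The helper `rows_to_validate` is hypothetical, and its random draw will not match what pandas produces for the same random_state.

```python
import random


def rows_to_validate(n_rows, head=None, tail=None, sample=None, random_state=None):
    """Return sorted, de-duplicated row positions picked by head/tail/sample."""
    if head is None and tail is None and sample is None:
        return list(range(n_rows))  # no subsetting requested: use every row

    selected = set()  # a set drops rows picked by more than one option
    if head is not None:
        selected.update(range(min(head, n_rows)))
    if tail is not None:
        selected.update(range(max(n_rows - tail, 0), n_rows))
    if sample is not None:
        rng = random.Random(random_state)  # seeded for reproducibility
        selected.update(rng.sample(range(n_rows), min(sample, n_rows)))
    return sorted(selected)


disjoint = rows_to_validate(10, head=3, tail=3)
overlapping = rows_to_validate(4, head=3, tail=3)
seeded_a = rows_to_validate(100, sample=5, random_state=0)
seeded_b = rows_to_validate(100, sample=5, random_state=0)
```

With four rows, head=3 and tail=3 overlap on rows 1 and 2, so only four distinct rows are validated rather than six.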

__call__(check_obj, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)

Alias for validate method.

Return type: ~TDataObject