pandera.api.dataframe.components.ComponentSchema
- class pandera.api.dataframe.components.ComponentSchema(dtype=None, checks=None, parsers=None, nullable=False, unique=False, report_duplicates='all', coerce=False, name=None, title=None, description=None, default=None, metadata=None, drop_invalid_rows=False)
Base class for data container components, e.g. columns.
Initialize array schema.
- Parameters:
checks (Union[Check, List[Union[Check, Hypothesis]], None]) – If element_wise is True, then the callable signature should be Callable[[Any], bool], where the Any input is a scalar element in the column. Otherwise, the input is assumed to be the data object (Series, DataFrame).
nullable (bool) – Whether or not the column can contain null values.
unique (bool) – Whether or not the column values must be unique. If True, duplicate values fail validation.
report_duplicates (Union[Literal['exclude_first'], Literal['exclude_last'], Literal['all']]) – How to report unique errors:
  - exclude_first: report all duplicates except the first occurrence
  - exclude_last: report all duplicates except the last occurrence
  - all: (default) report all duplicates
coerce (bool) – If True, when schema.validate is called the column will be coerced into the specified dtype. This has no effect on columns where dtype=None.
name (Any) – Column name in the dataframe to validate.
title (Optional[str]) – A human-readable label for the series.
description (Optional[str]) – An arbitrary textual description of the series.
default (Optional[Any]) – The default value for missing values in the series.
drop_invalid_rows (bool) – If True, drop invalid rows on validation.
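The three report_duplicates modes can be pictured with plain pandas. The sketch below only illustrates which rows each mode would report, using Series.duplicated as an analogy; it is not presented as pandera's own implementation. The sample data and the `reported` dictionary are invented for this example.

```python
import pandas as pd

# Hypothetical column in which the value "a" appears three times.
values = pd.Series(["a", "b", "a", "a"])

# Index labels that each mode would flag as duplicates.
reported = {
    # every duplicate except the first occurrence
    "exclude_first": values[values.duplicated(keep="first")].index.tolist(),
    # every duplicate except the last occurrence
    "exclude_last": values[values.duplicated(keep="last")].index.tolist(),
    # every row that takes part in a duplicate group
    "all": values[values.duplicated(keep=False)].index.tolist(),
}

for mode, rows in reported.items():
    print(mode, rows)
```

Here "a" sits at positions 0, 2 and 3, so exclude_first flags 2 and 3, exclude_last flags 0 and 2, and all flags all three positions.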
Attributes
BACKEND_REGISTRY
properties – Get the properties of the schema for serialization purposes.
Methods
- __init__(dtype=None, checks=None, parsers=None, nullable=False, unique=False, report_duplicates='all', coerce=False, name=None, title=None, description=None, default=None, metadata=None, drop_invalid_rows=False)
Initialize array schema.
- Parameters:
Same as the class parameters listed above.
- coerce_dtype(check_obj)
Coerce the data to the type specified in dtype.
- Parameters:
check_obj (~TDataObject) – data to coerce
- Return type:
~TDataObject
- Returns:
data of the same type as the input
- set_checks(checks)
Create a new SeriesSchema with a new set of Checks.
Caution
This method will be deprecated in favor of update_checks in v0.15.0.
- validate(check_obj, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)
Validate a series or a specific column in a dataframe.
- Parameters:
check_obj – data object to validate.
head (Optional[int]) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.
tail (Optional[int]) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.
sample (Optional[int]) – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.
random_state (Optional[int]) – random seed for the sample argument.
lazy (bool) – if True, lazily evaluates the data against all validation checks and raises a SchemaErrors exception. Otherwise, raises SchemaError as soon as one occurs.
inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.
- Returns:
validated DataFrame or Series.