pandera.schema_components.Column

class pandera.schema_components.Column(dtype=None, checks=None, nullable=False, unique=False, report_duplicates='all', coerce=False, required=True, name=None, regex=False, title=None, description=None)

Validate types and properties of DataFrame columns.

Create column validator object.

Parameters

dtype (Union[str, type, DataType, Type, ExtensionDtype, dtype, None]) – datatype of the column, used for type-checking it in a dataframe. If a string is specified, it is assumed to be one of the valid pandas dtype strings: http://pandas.pydata.org/pandas-docs/stable/basics.html#dtypes
checks (Union[Check, Hypothesis, List[Union[Check, Hypothesis]], None]) – checks to verify validity of the column
nullable (bool) – Whether or not column can contain null values.
unique (bool) – whether column values should be unique.
report_duplicates (Union[Literal['exclude_first'], Literal['exclude_last'], Literal['all']]) – how to report unique errors:
    - exclude_first: report all duplicates except the first occurrence
    - exclude_last: report all duplicates except the last occurrence
    - all: (default) report all duplicates
coerce (bool) – If True, when schema.validate is called the column will be coerced into the specified dtype. This has no effect on columns where dtype=None.
required (bool) – Whether the column must be present in the dataframe. If False, the column is allowed to be missing.
name (Union[str, Tuple[str, …], None]) – column name in dataframe to validate.
regex (bool) – whether the name attribute should be treated as a regex pattern to apply to multiple columns in a dataframe.
title (Optional[str]) – A human-readable label for the column.
description (Optional[str]) – An arbitrary textual description of the column.

Raises

SchemaInitError – if impossible to build schema from parameters

Example

>>> import pandas as pd
>>> import pandera as pa
>>>
>>>
>>> schema = pa.DataFrameSchema({
...     "column": pa.Column(str)
... })
>>>
>>> schema.validate(pd.DataFrame({"column": ["foo", "bar"]}))
  column
0    foo
1    bar

See here for more usage details.

Attributes

`checks`	Return list of checks or hypotheses.
`coerce`	Whether to coerce series to specified type.
`description`	An arbitrary textual description of the series.
`dtype`	Get the pandas dtype.
`name`	Get SeriesSchema name.
`nullable`	Whether the series is nullable.
`properties`	Get column properties.
`regex`	True if `name` attribute should be treated as a regex pattern.
`title`	A human-readable label for the series.
`unique`	Whether to check for duplicates in the check object.

Methods

`__init__`	Create column validator object.
`coerce_dtype`	Coerce dtype of a column, handling duplicate column names.
`example`	Generate an example of a particular size.
`get_regex_columns`	Get matching column names based on regex column name pattern.
`set_name`	Used to set or modify the name of a column object.
`strategy`	Create a `hypothesis` strategy for generating a Column.
`strategy_component`	Generate column data object for use by DataFrame strategy.
`validate`	Validate a Column in a DataFrame object.
`__call__`	Alias for `validate` method.