pandera.schema_components.Column

class pandera.schema_components.Column(dtype=None, checks=None, nullable=False, unique=False, report_duplicates='all', coerce=False, required=True, name=None, regex=False, title=None, description=None)

Validate types and properties of DataFrame columns.

Create column validator object.

Parameters
  • dtype (Union[str, type, DataType, Type, ExtensionDtype, dtype, None]) – datatype of the column, used for type-checking the dataframe. If a string is specified, it is assumed to be one of the valid pandas dtype strings: http://pandas.pydata.org/pandas-docs/stable/basics.html#dtypes

  • checks (Union[Check, Hypothesis, List[Union[Check, Hypothesis]], None]) – checks to verify validity of the column

  • nullable (bool) – Whether or not column can contain null values.

  • unique (bool) – whether column values should be unique.

  • report_duplicates (Union[Literal['exclude_first'], Literal['exclude_last'], Literal['all']]) – how to report unique errors. exclude_first: report all duplicates except the first occurrence; exclude_last: report all duplicates except the last occurrence; all (default): report all duplicates.

  • coerce (bool) – If True, when schema.validate is called the column will be coerced into the specified dtype. This has no effect on columns where dtype=None.

  • required (bool) – Whether the column must be present in the dataframe. If False, the column is allowed to be missing.

  • name (Union[str, Tuple[str, …], None]) – column name in dataframe to validate.

  • regex (bool) – whether the name attribute should be treated as a regex pattern to apply to multiple columns in a dataframe.

  • title (Optional[str]) – A human-readable label for the column.

  • description (Optional[str]) – An arbitrary textual description of the column.

Raises

SchemaInitError – if impossible to build schema from parameters

Example

>>> import pandas as pd
>>> import pandera as pa
>>>
>>>
>>> schema = pa.DataFrameSchema({
...     "column": pa.Column(str)
... })
>>>
>>> schema.validate(pd.DataFrame({"column": ["foo", "bar"]}))
  column
0    foo
1    bar

See the pandera user guide on column validation for more usage details.

Attributes

checks

Return list of checks or hypotheses.

coerce

Whether to coerce series to specified type.

description

An arbitrary textual description of the series.

dtype

Get the pandas dtype

name

Get SeriesSchema name.

nullable

Whether the series is nullable.

properties

Get column properties.

regex

True if name attribute should be treated as a regex pattern.

title

A human-readable label for the series.

unique

Whether to check for duplicate values in the column.

Methods

__init__

Create column validator object.

coerce_dtype

Coerce dtype of a column, handling duplicate column names.

example

Generate an example of a particular size.

get_regex_columns

Get matching column names based on regex column name pattern.

set_name

Used to set or modify the name of a column object.

strategy

Create a hypothesis strategy for generating a Column.

strategy_component

Generate column data object for use by DataFrame strategy.

validate

Validate a Column in a DataFrame object.

__call__

Alias for validate method.