pandera.schema_components.Column

class pandera.schema_components.Column(dtype=None, checks=None, nullable=False, unique=False, allow_duplicates=None, coerce=False, required=True, name=None, regex=False, pandas_dtype=None)

Validate types and properties of DataFrame columns.

Create column validator object.

Parameters
  • dtype (Union[str, type, DataType, ExtensionDtype, dtype, None]) – expected data type of the column, used to type-check the dataframe. If a string is specified, it is assumed to be one of the valid pandas dtype strings: http://pandas.pydata.org/pandas-docs/stable/basics.html#dtypes

  • checks (Union[Check, Hypothesis, List[Union[Check, Hypothesis]], None]) – checks to verify validity of the column

  • nullable (bool) – Whether or not column can contain null values.

  • unique (bool) – whether column values should be unique

  • allow_duplicates (Optional[bool]) –

    Whether or not column can contain duplicate values.

    Warning

    This option will be deprecated in 0.8.0. Use the unique argument instead.

  • coerce (bool) – If True, the column is coerced into the specified dtype when schema.validate is called. This has no effect on columns where dtype=None.

  • required (bool) – whether the column must be present in the dataframe. If False, the column is allowed to be missing.

  • name (Union[str, Tuple[str, …], None]) – column name in dataframe to validate.

  • regex (bool) – whether the name attribute should be treated as a regex pattern to apply to multiple columns in a dataframe.

  • pandas_dtype (Union[str, type, DataType, ExtensionDtype, dtype, None]) –

    alias of dtype for backwards compatibility.

    Warning

    This option will be deprecated in 0.8.0. Use the dtype argument instead.

Raises

SchemaInitError – if impossible to build schema from parameters

Example

>>> import pandas as pd
>>> import pandera as pa
>>>
>>>
>>> schema = pa.DataFrameSchema({
...     "column": pa.Column(str)
... })
>>>
>>> schema.validate(pd.DataFrame({"column": ["foo", "bar"]}))
  column
0    foo
1    bar

See here for more usage details.

Attributes

allow_duplicates

Whether to allow duplicate values.

checks

Return list of checks or hypotheses.

coerce

Whether to coerce series to specified type.

dtype

Get the pandas dtype.

name

Get SeriesSchema name.

nullable

Whether the series is nullable.

properties

Get column properties.

regex

True if name attribute should be treated as a regex pattern.

unique

Whether to check for duplicates in the check object.

Methods

__init__

Create column validator object.

coerce_dtype

Coerce dtype of a column, handling duplicate column names.

example

Generate an example of a particular size.

get_regex_columns

Get matching column names based on regex column name pattern.

set_name

Used to set or modify the name of a column object.

strategy

Create a hypothesis strategy for generating a Column.

strategy_component

Generate column data object for use by DataFrame strategy.

validate

Validate a Column in a DataFrame object.

__call__

Alias for validate method.