pandera.schema_components.Column
- class pandera.schema_components.Column(dtype=None, checks=None, nullable=False, unique=False, report_duplicates='all', coerce=False, required=True, name=None, regex=False, title=None, description=None)
Validate types and properties of DataFrame columns.
Create column validator object.
- Parameters
dtype (Union[str, type, DataType, Type, ExtensionDtype, dtype, None]) – datatype of the column, used for type-checking a dataframe. If a string is specified, it is assumed to be one of the valid pandas string values: http://pandas.pydata.org/pandas-docs/stable/basics.html#dtypes
checks (Union[Check, Hypothesis, List[Union[Check, Hypothesis]], None]) – checks to verify validity of the column.
nullable (bool) – whether or not the column can contain null values.
unique (bool) – whether column values should be unique.
report_duplicates (Union[Literal['exclude_first'], Literal['exclude_last'], Literal['all']]) – how to report unique errors:
  - exclude_first: report all duplicates except the first occurrence
  - exclude_last: report all duplicates except the last occurrence
  - all: (default) report all duplicates
coerce (bool) – if True, the column is coerced into the specified dtype when schema.validate is called. This has no effect on columns where dtype=None.
required (bool) – whether or not the column is allowed to be missing.
name (Union[str, Tuple[str, …], None]) – column name in the dataframe to validate.
regex (bool) – whether the name attribute should be treated as a regex pattern to apply to multiple columns in a dataframe.
title (Optional[str]) – a human-readable label for the column.
description (Optional[str]) – an arbitrary textual description of the column.
- Raises
SchemaInitError – if impossible to build schema from parameters
- Example
>>> import pandas as pd
>>> import pandera as pa
>>>
>>> schema = pa.DataFrameSchema({
...     "column": pa.Column(str)
... })
>>>
>>> schema.validate(pd.DataFrame({"column": ["foo", "bar"]}))
  column
0    foo
1    bar
See here for more usage details.
Attributes
checks
Return list of checks or hypotheses.
coerce
Whether to coerce series to specified type.
description
An arbitrary textual description of the series.
dtype
Get the pandas dtype.
name
Get SeriesSchema name.
nullable
Whether the series is nullable.
properties
Get column properties.
regex
True if name attribute should be treated as a regex pattern.
title
A human-readable label for the series.
unique
Whether to check for duplicates in the check object.
Methods
Create column validator object.
Coerce dtype of a column, handling duplicate column names.
Generate an example of a particular size.
Get matching column names based on regex column name pattern.
Used to set or modify the name of a column object.
Create a hypothesis strategy for generating a Column.
Generate column data object for use by DataFrame strategy.
Validate a Column in a DataFrame object.
Alias for validate method.