pandera.api.pandas.components.Column
- class pandera.api.pandas.components.Column(dtype=None, checks=None, parsers=None, nullable=False, unique=False, report_duplicates='all', coerce=False, required=True, name=None, regex=False, title=None, description=None, default=None, metadata=None, drop_invalid_rows=False)
Validate types and properties of pandas DataFrame columns.
Create column validator object.
- Parameters:
dtype (Union[str, type, DataType, Type, ExtensionDtype, dtype]) – datatype of the column, used for type-checking a dataframe. If a string is specified, it is assumed to be one of the valid pandas string values: http://pandas.pydata.org/pandas-docs/stable/basics.html#dtypes
checks (Union[Check, List[Union[Check, Hypothesis]], None]) – checks to verify validity of the column.
parsers (Union[Parser, List[Parser], None]) – parsers to apply to the column.
nullable (bool) – whether or not the column can contain null values.
unique (bool) – whether column values should be unique.
report_duplicates (Union[Literal['exclude_first'], Literal['exclude_last'], Literal['all']]) – how to report unique errors:
  - exclude_first: report all duplicates except the first occurrence
  - exclude_last: report all duplicates except the last occurrence
  - all: (default) report all duplicates
coerce (bool) – if True, the column is coerced into the specified dtype when schema.validate is called. This has no effect on columns where dtype=None.
required (bool) – whether the column must be present in the dataframe. If False, the column is allowed to be missing.
name (Union[str, Tuple[str, …], None]) – name of the column in the dataframe to validate.
regex (bool) – whether the name attribute should be treated as a regex pattern to apply to multiple columns in a dataframe.
title (Optional[str]) – a human-readable label for the column.
description (Optional[str]) – an arbitrary textual description of the column.
default (Optional[Any]) – the default value for missing values in the column.
metadata (Optional[dict]) – optional key-value data.
drop_invalid_rows (bool) – if True, drop invalid rows on validation.
- Raises:
SchemaInitError – if impossible to build schema from parameters
- Example:
>>> import pandas as pd
>>> import pandera as pa
>>>
>>> schema = pa.DataFrameSchema({
...     "column": pa.Column(str)
... })
>>>
>>> schema.validate(pd.DataFrame({"column": ["foo", "bar"]}))
  column
0    foo
1    bar
See here for more usage details.
Attributes
BACKEND_REGISTRY
dtype
Get the pandas dtype
properties
Get column properties.
Methods
- __init__(dtype=None, checks=None, parsers=None, nullable=False, unique=False, report_duplicates='all', coerce=False, required=True, name=None, regex=False, title=None, description=None, default=None, metadata=None, drop_invalid_rows=False)
Create column validator object.
- example(size=None)
Generate an example of a particular size.
- Parameters:
size – number of elements in the generated Index.
- Return type:
DataFrame
- Returns:
pandas DataFrame object.
- get_regex_columns(check_obj)
Get matching column names based on regex column name pattern.
- Parameters:
check_obj – the object whose columns are matched against the regex pattern
- Return type:
- Returns:
matching columns
- set_name(name)
Used to set or modify the name of a column object.
- Parameters:
name (str) – the name of the column object