pandera.api.polars.components.Column
- class pandera.api.polars.components.Column(dtype=None, checks=None, nullable=False, unique=False, coerce=False, required=True, name=None, regex=False, title=None, description=None, default=None, metadata=None, drop_invalid_rows=False, **column_kwargs)
Polars column schema component.
Create column validator object.
- Parameters:
dtype (Union[str, type, DataTypeClass]) – datatype of the column, used for type-checking a dataframe. Accepts all polars datatypes, built-in Python types supported by polars, and the pandera polars engine datatypes.
checks (Union[Check, List[Union[Check, Hypothesis]], None]) – checks to verify validity of the column.
nullable (bool) – whether or not the column can contain null values.
unique (bool) – whether column values should be unique.
coerce (bool) – if True, when schema.validate is called the column will be coerced into the specified dtype. This has no effect on columns where dtype=None.
required (bool) – whether the column must be present in the dataframe. If False, the column is allowed to be missing.
name (Optional[str]) – column name in the dataframe to validate. Names in the format "^{regex_pattern}$" are treated as regular expressions. During validation, this schema will be applied to any columns matching this pattern.
regex (bool) – whether the name attribute should be treated as a regex pattern to apply to multiple columns in a dataframe. If the name is a regular expression, this attribute will automatically be set to True.
title (Optional[str]) – a human-readable label for the column.
description (Optional[str]) – an arbitrary textual description of the column.
default (Optional[Any]) – the default value for missing values in the column.
metadata (Optional[dict]) – optional key-value data.
drop_invalid_rows (bool) – if True, drop invalid rows on validation.
- Raises:
SchemaInitError – if it is impossible to build a schema from the parameters
- Example:
>>> import polars as pl
>>> import pandera.polars as pa
>>>
>>> schema = pa.DataFrameSchema({
...     "column": pa.Column(str)
... })
>>>
>>> validated = schema.validate(pl.DataFrame({"column": ["foo", "bar"]}))
See here for more usage details.
Attributes
BACKEND_REGISTRY
dtype
properties
Get column properties.
selector
Methods
- __init__(dtype=None, checks=None, nullable=False, unique=False, coerce=False, required=True, name=None, regex=False, title=None, description=None, default=None, metadata=None, drop_invalid_rows=False, **column_kwargs)
Create column validator object.
- example(size=None)
Generate an example of a particular size.
- Parameters:
size – number of elements in the generated Index.
- Returns:
pandas DataFrame object.
Warning
This method is not implemented in the polars backend.
- static register_default_backends(check_obj_cls)
Register default backends.
This method is invoked in the get_backend method so that the appropriate validation backend is loaded at validation time instead of schema-definition time.
This method needs to be implemented by the schema subclass.
- set_name(name)
Set the name of the schema.
If the name is a regex starting with "^" and ending with "$", set the regex attribute to True.
- strategy(*, size=None)
Create a hypothesis strategy for generating a Column.
- Parameters:
size – number of elements to generate
- Returns:
a dataframe strategy for a single column.
Warning
This method is not implemented in the polars backend.
- strategy_component()
Generate column data object for use by DataFrame strategy.
Warning
This method is not implemented in the polars backend.
- validate(check_obj, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)
Validate a Column in a DataFrame object.
- Parameters:
check_obj (Union[LazyFrame, DataFrame]) – polars LazyFrame or DataFrame to validate.
head (Optional[int]) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.
tail (Optional[int]) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.
sample (Optional[int]) – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.
random_state (Optional[int]) – random seed for the sample argument.
lazy (bool) – if True, lazily evaluates the dataframe against all validation checks and raises a SchemaErrors. Otherwise, raises SchemaError as soon as one occurs.
inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.
- Return type:
Union[LazyFrame, DataFrame]
- Returns:
validated DataFrame.