pandera.api.polars.components.Column

class pandera.api.polars.components.Column(dtype=None, checks=None, nullable=False, unique=False, coerce=False, required=True, name=None, regex=False, title=None, description=None, default=None, metadata=None, drop_invalid_rows=False, **column_kwargs)

Polars column schema component.

Create column validator object.

Parameters:
  • dtype (Union[str, type, DataTypeClass]) – datatype of the column, used for type-checking a dataframe. Accepts all polars datatypes, built-in python types that are supported by polars, and the pandera polars engine datatypes.

  • checks (Union[Check, List[Union[Check, Hypothesis]], None]) – checks to verify validity of the column

  • nullable (bool) – Whether or not column can contain null values.

  • unique (bool) – whether column values should be unique

  • coerce (bool) – If True, when schema.validate is called the column will be coerced into the specified dtype. This has no effect on columns where dtype=None.

  • required (bool) – whether the column must be present in the dataframe. If False, the column is allowed to be missing.

  • name (Optional[str, None]) – column name in dataframe to validate. Names in the format '^{regex_pattern}$' are treated as regular expressions. During validation, this schema will be applied to any columns matching this pattern.

  • regex (bool) – whether the name attribute should be treated as a regex pattern to apply to multiple columns in a dataframe. If the name is a regular expression, this attribute will automatically be set to True.

  • title (Optional[str, None]) – A human-readable label for the column.

  • description (Optional[str, None]) – An arbitrary textual description of the column.

  • default (Optional[Any, None]) – The default value for missing values in the column.

  • metadata (Optional[dict, None]) – optional key-value data.

  • drop_invalid_rows (bool) – if True, drop invalid rows on validation.

Raises:

SchemaInitError – if impossible to build schema from parameters

Example:

>>> import pandas as pd
>>> import pandera as pa
>>>
>>>
>>> schema = pa.DataFrameSchema({
...     "column": pa.Column(str)
... })
>>>
>>> schema.validate(pd.DataFrame({"column": ["foo", "bar"]}))
  column
0    foo
1    bar

See here for more usage details.

Attributes

BACKEND_REGISTRY

dtype

properties

Get column properties.

selector

Methods

__init__(dtype=None, checks=None, nullable=False, unique=False, coerce=False, required=True, name=None, regex=False, title=None, description=None, default=None, metadata=None, drop_invalid_rows=False, **column_kwargs)

Create column validator object.

example(size=None)

Generate an example of a particular size.

Parameters:

size – number of elements in the generated Index.

Returns:

pandas DataFrame object.

Warning

This method is not implemented in the polars backend.

static register_default_backends(check_obj_cls)

Register default backends.

This method is invoked in the get_backend method so that the appropriate validation backend is loaded at validation time instead of schema-definition time.

This method needs to be implemented by the schema subclass.

set_name(name)

Set the name of the schema.

If the name is a regex starting with '^' and ending with '$', set the regex attribute to True.

set_regex()

strategy(*, size=None)

Create a hypothesis strategy for generating a Column.

Parameters:

size – number of elements to generate

Returns:

a dataframe strategy for a single column.

Warning

This method is not implemented in the polars backend.

strategy_component()

Generate column data object for use by DataFrame strategy.

Warning

This method is not implemented in the polars backend.

validate(check_obj, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)

Validate a Column in a DataFrame object.

Parameters:
  • check_obj (Union[LazyFrame, DataFrame]) – polars LazyFrame or DataFrame to validate.

  • head (Optional[int, None]) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.

  • tail (Optional[int, None]) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.

  • sample (Optional[int, None]) – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.

  • random_state (Optional[int, None]) – random seed for the sample argument.

  • lazy (bool) – if True, evaluates the dataframe against all validation checks and raises a single SchemaErrors exception that collects every failure. Otherwise, raises a SchemaError as soon as the first failure occurs.

  • inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.

Return type:

Union[LazyFrame, DataFrame]

Returns:

validated DataFrame.

__call__(check_obj, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)

Alias for validate method.

Return type:

TDataObject