pandera.api.pandas.components.MultiIndex

class pandera.api.pandas.components.MultiIndex(indexes, coerce=False, strict=False, name=None, ordered=True, unique=None)

Validate types and properties of a pandas DataFrame MultiIndex.

This class inherits from DataFrameSchema to leverage its validation logic.

Create MultiIndex validator.

Parameters:
  • indexes (List[Index]) – list of Index validators, one for each level of the MultiIndex.

  • coerce (bool) – whether or not to coerce the MultiIndex to the specified dtypes before validation.

  • strict (bool) – if True, reject levels in the MultiIndex that aren’t defined in the indexes argument.

  • name (Optional[str]) – name of the schema component.

  • ordered (bool) – whether or not to validate the order of the index levels.

  • unique (Union[str, List[str], None]) – name, or list of names, of the index levels that should be jointly unique.

Example:

>>> import pandas as pd
>>> import pandera.pandas as pa
>>>
>>>
>>> schema = pa.DataFrameSchema(
...     columns={"column": pa.Column(int)},
...     index=pa.MultiIndex([
...         pa.Index(str,
...               pa.Check(lambda s: s.isin(["foo", "bar"])),
...               name="index0"),
...         pa.Index(int, name="index1"),
...     ])
... )
>>>
>>> df = pd.DataFrame(
...     data={"column": [1, 2, 3]},
...     index=pd.MultiIndex.from_arrays(
...         [["foo", "bar", "foo"], [0, 1, 2]],
...         names=["index0", "index1"],
...     )
... )
>>>
>>> schema.validate(df)
               column
index0 index1
foo    0            1
bar    1            2
foo    2            3


Attributes

BACKEND_REGISTRY

coerce

Whether or not to coerce data types.

dtype

Get the dtype property.

dtypes

A dict where the keys are column names and values are DataTypes for the column.

named_indexes

Get named indexes.

names

Get index names in the MultiIndex schema component.

properties

Get the properties of the schema for serialization purposes.

unique

List of columns that should be jointly unique.

Methods

__init__(indexes, coerce=False, strict=False, name=None, ordered=True, unique=None)

Create MultiIndex validator.

example(size=None)

Generate an example of a particular size.

Parameters:

size – number of elements in the generated MultiIndex.

Return type:

MultiIndex

Returns:

pandas MultiIndex object.

strategy(*, size=None)

Create a hypothesis strategy for generating a MultiIndex.

Parameters:
  • size – number of elements to generate.

Returns:

a strategy that generates pandas MultiIndex objects.

__call__(dataframe, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)

Alias for DataFrameSchema.validate() method.

Parameters:
  • dataframe (pd.DataFrame) – the dataframe to be validated.

  • head (int) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.

  • tail (int) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.

  • sample (Optional[int]) – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.

  • random_state (Optional[int]) – random seed for the sample argument.

  • lazy (bool) – if True, evaluates the dataframe against all validation checks and raises a single SchemaErrors exception that collects every failure. Otherwise, raises a SchemaError as soon as the first failure occurs.

  • inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.

Return type:

TDataObject