pandera.api.pandas.components.MultiIndex

class pandera.api.pandas.components.MultiIndex(indexes, coerce=False, strict=False, name=None, ordered=True, unique=None)

Validate types and properties of a pandas DataFrame MultiIndex.

This class inherits from DataFrameSchema to leverage its validation logic.

Create MultiIndex validator.

Parameters:

indexes (List[Index]) – list of Index validators, one for each level of the MultiIndex.
coerce (bool) – whether or not to coerce the MultiIndex to the specified dtypes before validation.
strict (bool) – if True, reject index levels in the MultiIndex that aren’t defined in the indexes argument.
name (Optional[str]) – name of the schema component.
ordered (bool) – whether or not to validate the order of the index levels.
unique (Union[str, List[str], None]) – an index name, or a list of index names, that should be jointly unique.

Example:

>>> import pandas as pd
>>> import pandera.pandas as pa
>>>
>>>
>>> schema = pa.DataFrameSchema(
...     columns={"column": pa.Column(int)},
...     index=pa.MultiIndex([
...         pa.Index(str,
...               pa.Check(lambda s: s.isin(["foo", "bar"])),
...               name="index0"),
...         pa.Index(int, name="index1"),
...     ])
... )
>>>
>>> df = pd.DataFrame(
...     data={"column": [1, 2, 3]},
...     index=pd.MultiIndex.from_arrays(
...         [["foo", "bar", "foo"], [0, 1, 2]],
...         names=["index0", "index1"],
...     )
... )
>>>
>>> schema.validate(df)
               column
index0 index1
foo    0            1
bar    1            2
foo    2            3

See the pandera documentation on MultiIndex validation for more usage details.

Attributes

`BACKEND_REGISTRY`
`coerce`	Whether or not to coerce data types.
`dtype`	Get the dtype property.
`dtypes`	A dict where the keys are column names and values are `DataType`s for the column.
`named_indexes`	Get named indexes.
`names`	Get index names in the MultiIndex schema component.
`properties`	Get the properties of the schema for serialization purposes.
`unique`	List of columns that should be jointly unique.

Methods

__init__(indexes, coerce=False, strict=False, name=None, ordered=True, unique=None)

Create MultiIndex validator.

Parameters:

indexes (List[Index]) – list of Index validators, one for each level of the MultiIndex.
coerce (bool) – whether or not to coerce the MultiIndex to the specified dtypes before validation.
strict (bool) – if True, reject index levels in the MultiIndex that aren’t defined in the indexes argument.
name (Optional[str]) – name of the schema component.
ordered (bool) – whether or not to validate the order of the index levels.
unique (Union[str, List[str], None]) – an index name, or a list of index names, that should be jointly unique.

example(size=None)

Generate an example of a particular size.

Parameters: size – number of elements in the generated MultiIndex.
Return type: MultiIndex
Returns: pandas MultiIndex object.

strategy(*, size=None)

Create a hypothesis strategy for generating a MultiIndex.

Parameters:

size – number of elements to generate

Returns:

a strategy that generates pandas MultiIndex objects.

__call__(dataframe, head=None, tail=None, sample=None, random_state=None, lazy=False, inplace=False)

Alias for DataFrameSchema.validate() method.

Parameters:

dataframe (pd.DataFrame) – the dataframe to be validated.
head (int) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.
tail (int) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.
sample (Optional[int]) – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.
random_state (Optional[int]) – random seed for the sample argument.
lazy (bool) – if True, lazily evaluates the dataframe against all validation checks and raises SchemaErrors. Otherwise, raises SchemaError as soon as one occurs.
inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.

Return type:

TDataObject