pandera.api.pandas.container.DataFrameSchema.set_index

DataFrameSchema.set_index(keys, drop=True, append=False)

Set the Index of a DataFrameSchema from an existing Column or list of columns.

Parameters
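  • keys (List[str]) – list of labels naming the schema columns to use as the index

  • drop (bool) – whether to remove the columns used as the new index from the schema's columns; default True

  • append (bool) – whether to append the columns to the existing index rather than replace it; default False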

Return type

DataFrameSchema

Returns

a new DataFrameSchema with specified column(s) in the index.

Raises

SchemaInitError if any label in keys is not a column in the schema.

Examples

Just as you would set the index in a pandas DataFrame from an existing column, you can set an index within the schema from an existing column in the schema.

>>> import pandera as pa
>>>
>>> example_schema = pa.DataFrameSchema({
...     "category" : pa.Column(str),
...     "probability": pa.Column(float)})
>>>
>>> print(example_schema.set_index(['category']))
<Schema DataFrameSchema(
    columns={
        'probability': <Schema Column(name=probability, type=DataType(float64))>
    },
    checks=[],
    coerce=False,
    dtype=None,
    index=<Schema Index(name=category, type=DataType(str))>,
    strict=False
    name=None,
    ordered=False,
    unique_column_names=False
)>

If your schema already has an index and you would like to append a column to it as an additional index level (yielding a MultiIndex), pass append=True to set_index, just as you would in pandas.

>>> example_schema = pa.DataFrameSchema(
...     {
...         "column1": pa.Column(str),
...         "column2": pa.Column(int)
...     },
...     index=pa.Index(name = "column3", dtype = int)
... )
>>>
>>> print(example_schema.set_index(["column2"], append = True))
<Schema DataFrameSchema(
    columns={
        'column1': <Schema Column(name=column1, type=DataType(str))>
    },
    checks=[],
    coerce=False,
    dtype=None,
    index=<Schema MultiIndex(
        indexes=[
            <Schema Index(name=column3, type=DataType(int64))>
            <Schema Index(name=column2, type=DataType(int64))>
        ]
        coerce=False,
        strict=False,
        name=None,
        ordered=True
    )>,
    strict=False
    name=None,
    ordered=False,
    unique_column_names=False
)>

See also

reset_index()