pandera.schemas.DataFrameSchema.set_index

DataFrameSchema.set_index(keys, drop=True, append=False)

Set the Index of a DataFrameSchema from one or more of its existing columns, analogous to setting the index of a pandas DataFrame.

Parameters
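  • keys (List[str]) – names of the existing columns to move into the index

  • drop (bool) – if True (default), remove the key columns from the schema's columns after moving them into the index; if False, keep them as columns as well

  • append (bool) – if True, append the key columns to the existing index, yielding a MultiIndex; if False (default), replace the existing index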

Return type

DataFrameSchema

Returns

A new DataFrameSchema with the specified column(s) in the index.

Raises

SchemaInitError if any of the keys is not a column in the schema.

Examples

Just as you would set the index of a pandas DataFrame from an existing column, you can set the index of a schema from one of its existing columns.

>>> import pandera as pa
>>>
>>> example_schema = pa.DataFrameSchema({
...     "category" : pa.Column(pa.String),
...     "probability": pa.Column(pa.Float)})
>>>
>>> print(example_schema.set_index(['category']))
DataFrameSchema(
    columns={
        "probability": "<Schema Column: 'probability' type=float>"
    },
    checks=[],
    index=<Schema Index: 'category'>,
    coerce=False,
    strict=False
)

If your schema already has an index and you want to append a column to it (yielding a MultiIndex), pass append=True, just as you would in pandas.

>>> example_schema = pa.DataFrameSchema(
...     {
...         "column1": pa.Column(pa.String),
...         "column2": pa.Column(pa.Int)
...     },
...     index=pa.Index(name="column3", pandas_dtype=pa.Int)
... )
>>>
>>> print(example_schema.set_index(["column2"], append=True))
DataFrameSchema(
    columns={
        "column1": "<Schema Column: 'column1' type=str>"
    },
    checks=[],
    index=MultiIndex(
    columns={
        "column3": "<Schema Column: 'column3' type=int>",
        "column2": "<Schema Column: 'column2' type=int>"
    },
    checks=[],
    index=None,
    coerce=False,
    strict=False
),
    coerce=False,
    strict=False
)

See also

reset_index()