pandera.schemas.DataFrameSchema.set_index
- DataFrameSchema.set_index(keys, drop=True, append=False)
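A method for setting the Index of a DataFrameSchema via an existing Column or list of columns.

- Parameters
  keys: the existing column name, or list of column names, to move into the index.
  drop: defaults to True.
  append: defaults to False.
- Return type
  DataFrameSchema
- Returns
  a new DataFrameSchema with the specified column(s) in the index.
- Raises
  SchemaInitError if a column is not in the schema.
- Examples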
Just as you would set the index in a pandas DataFrame from an existing column, you can set an index within the schema from an existing column in the schema.

>>> import pandera as pa
>>>
>>> example_schema = pa.DataFrameSchema({
...     "category" : pa.Column(str),
...     "probability": pa.Column(float)})
>>>
>>> print(example_schema.set_index(['category']))
<Schema DataFrameSchema(
    columns={
        'probability': <Schema Column(name=probability, type=DataType(float64))>
    },
    checks=[],
    coerce=False,
    dtype=None,
    index=<Schema Index(name=category, type=DataType(str))>,
    strict=False
    name=None,
    ordered=False,
    unique_column_names=False
)>
If you have an existing index in your schema and you would like to append a new column to it as an additional index level (yielding a MultiIndex), just use set_index as you would in pandas.

>>> example_schema = pa.DataFrameSchema(
...     {
...         "column1": pa.Column(str),
...         "column2": pa.Column(int)
...     },
...     index=pa.Index(name = "column3", dtype = int)
... )
>>>
>>> print(example_schema.set_index(["column2"], append = True))
<Schema DataFrameSchema(
    columns={
        'column1': <Schema Column(name=column1, type=DataType(str))>
    },
    checks=[],
    coerce=False,
    dtype=None,
    index=<Schema MultiIndex(
        indexes=[
            <Schema Index(name=column3, type=DataType(int64))>
            <Schema Index(name=column2, type=DataType(int64))>
        ]
        coerce=False,
        strict=False,
        name=None,
        ordered=True
    )>,
    strict=False
    name=None,
    ordered=False,
    unique_column_names=False
)>
See also