pandera.api.pandas.container.DataFrameSchema.select_columns

DataFrameSchema.select_columns(columns)

Select a subset of columns in the schema.

New in version 0.4.5

Parameters: columns (List[Any]) – list of column names to select.
Return type: DataFrameSchema
Returns: DataFrameSchema (copy of original) with only the selected columns.
Raises: SchemaInitError if a column is not in the schema.
Example

To subset a schema by column, and return a new schema:

>>> import pandera as pa
>>>
>>> example_schema = pa.DataFrameSchema({
...     "category" : pa.Column(str),
...     "probability": pa.Column(float)
... })
>>>
>>> print(example_schema.select_columns(['category']))
<Schema DataFrameSchema(
    columns={
        'category': <Schema Column(name=category, type=DataType(str))>
    },
    checks=[],
    coerce=False,
    dtype=None,
    index=None,
    strict=False
    name=None,
    ordered=False,
    unique_column_names=False
)>

Note

If an index is present in the schema, it will also be included in the new schema.