Pydantic
new in 0.8.0
Using Pandera Schemas in Pydantic Models
DataFrameModel is fully compatible with pydantic. You can specify a DataFrameModel in a pydantic BaseModel as you would any other field:
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series
import pydantic


class SimpleSchema(pa.DataFrameModel):
    str_col: Series[str] = pa.Field(unique=True)


class PydanticModel(pydantic.BaseModel):
    x: int
    df: DataFrame[SimpleSchema]


valid_df = pd.DataFrame({"str_col": ["hello", "world"]})
PydanticModel(x=1, df=valid_df)

invalid_df = pd.DataFrame({"str_col": ["hello", "hello"]})
PydanticModel(x=1, df=invalid_df)
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Cell In[1], line 20
17 PydanticModel(x=1, df=valid_df)
19 invalid_df = pd.DataFrame({"str_col": ["hello", "hello"]})
---> 20 PydanticModel(x=1, df=invalid_df)
File ~/checkouts/readthedocs.org/user_builds/pandera/envs/latest/lib/python3.11/site-packages/pydantic/main.py:214, in BaseModel.__init__(self, **data)
212 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
213 __tracebackhide__ = True
--> 214 validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
215 if self is not validated_self:
216 warnings.warn(
217 'A custom validator is returning a value other than `self`.\n'
218 "Returning anything other than `self` from a top level model validator isn't supported when validating via `__init__`.\n"
219 'See the `model_validator` docs (https://docs.pydantic.dev/latest/concepts/validators/#model-validators) for more details.',
220 stacklevel=2,
221 )
ValidationError: 1 validation error for PydanticModel
df
Value error, series 'str_col' contains duplicate values:
0 hello
1 hello
Name: str_col, dtype: object [type=value_error, input_value= str_col
0 hello
1 hello, input_type=DataFrame]
For further information visit https://errors.pydantic.dev/2.10/v/value_error
Other pandera components are also compatible with pydantic.
Note
The SeriesSchema, DataFrameSchema and schema_components types validate the type of a schema object, not a pandas object. This applies when, for example, your pydantic BaseModel contains a schema object as a field.
Using Pydantic Models in Pandera Schemas
new in 0.10.0
You can also use a pydantic BaseModel in a pandera schema. Suppose you had a Record model:
from pydantic import BaseModel
import pandera as pa


class Record(BaseModel):
    name: str
    xcoord: int
    ycoord: int
The PydanticModel datatype enables you to specify the Record model as a row-wise type.
import pandas as pd
from pandera.engines.pandas_engine import PydanticModel


class PydanticSchema(pa.DataFrameModel):
    """Pandera schema using the pydantic model."""

    class Config:
        """Config with dataframe-level data type."""

        dtype = PydanticModel(Record)
        coerce = True  # this is required, otherwise a SchemaInitError is raised
Note
By combining dtype=PydanticModel(...) and coerce=True, pandera will apply the pydantic model validation process to each row of the dataframe, converting the model back to a dictionary with the BaseModel.dict() method (BaseModel.model_dump() under pydantic v2).
The equivalent pandera schema would look like this:
class PanderaSchema(pa.DataFrameModel):
    """Pandera schema that's equivalent to PydanticSchema."""

    name: pa.typing.Series[str]
    xcoord: pa.typing.Series[int]
    ycoord: pa.typing.Series[int]
Note
Since the PydanticModel datatype applies the BaseModel constructor to each row of the dataframe, using PydanticModel might not scale well with larger datasets.
If you want to help benchmark, consider contributing a benchmark script.