Supported DataFrame Libraries

Pandera started out as a pandas-specific dataframe validation library, and its core functionality will continue to support pandas. As adoption grew, however, it became clear that pandera can be a much more powerful tool by supporting other dataframe-like formats.

Domain-specific Data Validation

The pandas ecosystem includes libraries for domain-specific data manipulation. By extension, pandera can provide access to the data types, methods, and data container types specific to these libraries.


Geopandas

An extension of pandas that adds geospatial data processing capabilities.

Accelerated Data Validation

Pandera provides multiple ways of scaling up data validation to dataframes that don’t fit into memory. Fortunately, pandera doesn’t have to reinvent the wheel. Standing on the shoulders of giants, it integrates with the existing ecosystem of libraries that allow you to perform validations on out-of-memory dataframes.


Dask

Apply pandera schemas to Dask dataframe partitions.


Fugue

Apply pandera schemas to distributed dataframe partitions with Fugue.


Modin

A pandas drop-in replacement, distributed using a Ray or Dask backend.


Polars

Validate dataframes from Polars, the blazingly fast dataframe library.

Pyspark Pandas

Exposes a pyspark.pandas module, distributed using a Spark backend.

Pyspark SQL

A data processing library for large-scale data.


Don’t see a library that you want supported? Check out the GitHub issues to see if that library is on the roadmap. If it isn’t, open a new issue to request support for it!