Supported DataFrame Libraries (New)

Pandera started out as a pandas-specific dataframe validation library, and its core functionality will continue to support pandas. However, as pandera’s adoption has grown, it has become clear that it can be a much more powerful tool if it also supports other dataframe-like formats.

Scaling Up Data Validation

Pandera provides multiple ways of scaling up data validation to dataframes that don’t fit into memory. Rather than reinventing the wheel, pandera stands on the shoulders of giants: it integrates with existing libraries that let you validate out-of-memory dataframes.

Dask

Apply pandera schemas to Dask dataframe partitions.

Fugue

Apply pandera schemas to distributed dataframe partitions with Fugue.

Koalas

A pandas drop-in replacement, distributed using a Spark backend.

Modin

A pandas drop-in replacement, distributed using a Ray or Dask backend.

Note

Don’t see a library that you want supported? Check the GitHub issues to see whether that library is on the roadmap. If it isn’t, open a new issue to request support for it!