pandera.PandasDtype

class pandera.PandasDtype(value)[source]

Bases: enum.Enum

Enumerate all valid pandas data types.

pandera follows the numpy data types that pandas uses and, by default, supports numpy data type string aliases for validating DataFrame or Series dtypes.

This class simply enumerates the valid numpy dtypes for pandas arrays. For convenience, every PandasDtype member can also be accessed from the top-level pandera namespace under the same enum name (for example, pa.Int).

Examples

>>> import pandas as pd
>>> import pandera as pa
>>>
>>>
>>> pa.SeriesSchema(pa.Int).validate(pd.Series([1, 2, 3]))
0    1
1    2
2    3
dtype: int64
>>> pa.SeriesSchema(pa.Float).validate(pd.Series([1.1, 2.3, 3.4]))
0    1.1
1    2.3
2    3.4
dtype: float64
>>> pa.SeriesSchema(pa.String).validate(pd.Series(["a", "b", "c"]))
0    a
1    b
2    c
dtype: object

Alternatively, you can use built-in Python scalar types for integers, floats, booleans, and strings:

>>> pa.SeriesSchema(int).validate(pd.Series([1, 2, 3]))
0    1
1    2
2    3
dtype: int64

You can also use the pandas string aliases in the schema definition:

>>> pa.SeriesSchema("int").validate(pd.Series([1, 2, 3]))
0    1
1    2
2    3
dtype: int64
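The spellings above are interchangeable because each one resolves to the same underlying dtype. The following is a minimal sketch that uses only pandas (not pandera) and its `pandas.api.types.pandas_dtype` helper to show the equivalence. How pandera resolves these values internally is not shown here and may differ.

```python
import pandas as pd
from pandas.api.types import pandas_dtype

# A Python scalar type and its string alias resolve to the same numpy dtype.
from_python_type = pandas_dtype(int)
from_string_alias = pandas_dtype("int")
same_int_dtype = from_python_type == from_string_alias

# A Series built from Python floats carries the "float64" dtype.
float_series = pd.Series([1.1, 2.3, 3.4])
float_matches = float_series.dtype == pandas_dtype("float64")

print(same_int_dtype, float_matches)
```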

Note

pandera also offers limited support for pandas extension types. Since the release of pandas 1.0.0, however, some extension types, such as the Integer array, are backwards incompatible. Extension types (e.g. pd.Int64Dtype()) and their string aliases should work when supplied to the pandas_dtype argument, unless otherwise specified below, but this functionality is only tested for pandas >= 1.0.0. Extension types in earlier versions are not guaranteed to work as the pandas_dtype argument in schemas or schema components.
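To illustrate the distinction this note draws, the sketch below contrasts a numpy-backed integer Series with the nullable Int64 extension type. It uses plain pandas rather than pandera and assumes pandas 1.0.0 or later. The variable names are illustrative only.

```python
import pandas as pd

# numpy-backed integers: the dtype alias is lowercase "int64".
numpy_ints = pd.Series([1, 2, 3], dtype="int64")

# Nullable extension integers: the alias is capitalized "Int64",
# and missing values are allowed without casting to float.
nullable_ints = pd.Series([1, None, 3], dtype=pd.Int64Dtype())

print(numpy_ints.dtype)
print(nullable_ints.dtype)
print(nullable_ints.isna().sum())
```

The capitalization of the alias is what separates the two families, which is why the attributes below list both Int64 and INT64.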

Attributes

Bool

"bool" numpy dtype

Category

pandas "category" (categorical) datatype

DateTime

"datetime64[ns]" numpy dtype

Float

"float" numpy dtype

Float16

"float16" numpy dtype

Float32

"float32" numpy dtype

Float64

"float64" numpy dtype

INT16

"Int16" pandas dtype: pandas 0.24.0+

INT32

"Int32" pandas dtype: pandas 0.24.0+

INT64

"Int64" pandas dtype: pandas 0.24.0+

INT8

"Int8" pandas dtype: pandas 0.24.0+

Int

"int" numpy dtype

Int16

"int16" numpy dtype

Int32

"int32" numpy dtype

Int64

"int64" numpy dtype

Int8

"int8" numpy dtype

Object

"object" numpy dtype

String

The string datatype doesn’t map to a first-class pandas datatype and is represented as a numpy "object" array.

Timedelta

"timedelta64[ns]" numpy dtype

UINT16

"UInt16" pandas dtype: pandas 0.24.0+

UINT32

"UInt32" pandas dtype: pandas 0.24.0+

UINT64

"UInt64" pandas dtype: pandas 0.24.0+

UINT8

"UInt8" pandas dtype: pandas 0.24.0+

UInt16

"uint16" numpy dtype

UInt32

"uint32" numpy dtype

UInt64

"uint64" numpy dtype

UInt8

"uint8" numpy dtype

str_alias

Get datatype string alias.

classmethod from_str_alias(str_alias)[source]

Get PandasDtype from string alias.

Parameters

str_alias (str) – pandas dtype string alias from https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-dtypes

Return type

PandasDtype

Returns

pandas dtype
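Conceptually, a lookup by string alias can be pictured as an ordinary enum.Enum value lookup. The sketch below is a hypothetical, stripped-down stand-in: MiniDtype is not part of pandera, its members are a small illustrative subset, and the real implementation may handle more aliases and edge cases.

```python
from enum import Enum


class MiniDtype(Enum):
    """Hypothetical stand-in for a dtype enum; not pandera's class."""

    Int64 = "int64"      # numpy dtype alias
    INT64 = "Int64"      # pandas nullable extension alias
    Float64 = "float64"  # numpy dtype alias

    @property
    def str_alias(self) -> str:
        # The member's value doubles as its string alias.
        return self.value

    @classmethod
    def from_str_alias(cls, str_alias: str) -> "MiniDtype":
        # Enum value lookup raises ValueError for unknown values;
        # re-raising as TypeError is an illustrative choice.
        try:
            return cls(str_alias)
        except ValueError:
            raise TypeError(f"unrecognized dtype alias: {str_alias!r}") from None


numpy_member = MiniDtype.from_str_alias("int64")
nullable_member = MiniDtype.from_str_alias("Int64")
print(numpy_member, nullable_member)
```

Because the lookup is case-sensitive, "int64" and "Int64" map to different members, mirroring the numpy and pandas extension variants listed under Attributes.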

classmethod from_pandas_api_type(pandas_api_type)[source]

Get PandasDtype enum from pandas api type.

Parameters

pandas_api_type (str) – string output from https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.types.infer_dtype.html

Return type

PandasDtype

Returns

pandas dtype
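For context, the strings that this method accepts are the labels produced by pandas.api.types.infer_dtype. The sketch below only shows what those labels look like for a few simple inputs. It uses pandas directly, and the sample data and variable names are illustrative.

```python
from pandas.api.types import infer_dtype

# Small illustrative inputs, one per scalar kind.
samples = {
    "integers": [1, 2, 3],
    "floats": [1.0, 2.5],
    "strings": ["a", "b"],
    "booleans": [True, False],
}

# infer_dtype returns a descriptive label rather than a dtype alias.
inferred = {name: infer_dtype(values) for name, values in samples.items()}
print(inferred)
```

Note that these labels ("integer", "floating", "string", "boolean") differ from dtype string aliases such as "int64", which is why a separate constructor exists for them.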