pandera.strategies
Generate synthetic data from a schema definition.
new in 0.6.0
This module is responsible for generating data based on the type and check constraints specified in a pandera schema. It's built on top of the hypothesis package to compose strategies given multiple checks specified in a schema.
See the user guide for more details.
- pandera.strategies.column_strategy(pandera_dtype, strategy=None, *, checks=None, unique=False, name=None)
Create a data object describing a column in a DataFrame.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
checks (Optional[Sequence]) – sequence of Checks to constrain the values of the data in the column/index.
unique (bool) – whether or not generated Series contains unique values.
- Returns
a column object.
- pandera.strategies.dataframe_strategy(pandera_dtype=None, strategy=None, *, columns=None, checks=None, unique=None, index=None, size=None, n_regex_columns=1)
Strategy to generate a pandas DataFrame.
- Parameters
pandera_dtype (Optional[DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – if specified, this will raise a BaseStrategyOnlyError, since it cannot be chained to a prior strategy.
columns (Optional[Dict]) – a dictionary where keys are column names and values are Column objects.
checks (Optional[Sequence]) – sequence of Checks to constrain the values of the data at the dataframe level.
unique (Optional[List[str]]) – a list of column names that should be jointly unique.
index (Optional[Any]) – Index or MultiIndex schema component.
n_regex_columns (int) – number of regex columns to generate.
- Returns
hypothesis strategy.
- pandera.strategies.eq_strategy(pandera_dtype, strategy=None, *, value)
Strategy to generate a single value.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
value (Any) – value to generate.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.field_element_strategy(pandera_dtype, strategy=None, *, checks=None)
Strategy to generate elements of a column or index.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
checks (Optional[Sequence]) – sequence of Checks to constrain the values of the data in the column/index.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.ge_strategy(pandera_dtype, strategy=None, *, min_value)
Strategy to generate values greater than or equal to a minimum value.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
min_value (Union[int, float]) – generate values greater than or equal to this.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.gt_strategy(pandera_dtype, strategy=None, *, min_value)
Strategy to generate values greater than a minimum value.
- Parameters
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.in_range_strategy(pandera_dtype, strategy=None, *, min_value, max_value, include_min=True, include_max=True)
Strategy to generate values within a particular range.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
min_value (Union[int, float]) – lower bound of the generated values; whether the bound itself can be generated is controlled by include_min.
max_value (Union[int, float]) – upper bound of the generated values; whether the bound itself can be generated is controlled by include_max.
include_min (bool) – include min_value in generated data.
include_max (bool) – include max_value in generated data.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.index_strategy(pandera_dtype, strategy=None, *, checks=None, nullable=False, unique=False, name=None, size=None)
Strategy to generate a pandas Index.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
checks (Optional[Sequence]) – sequence of Checks to constrain the values of the data in the column/index.
nullable (bool) – whether or not generated Series contains null values.
unique (bool) – whether or not generated Series contains unique values.
- Returns
hypothesis strategy.
- pandera.strategies.isin_strategy(pandera_dtype, strategy=None, *, allowed_values)
Strategy to generate values within a finite set.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.le_strategy(pandera_dtype, strategy=None, *, max_value)
Strategy to generate values less than or equal to a maximum value.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
max_value (Union[int, float]) – generate values less than or equal to this.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.lt_strategy(pandera_dtype, strategy=None, *, max_value)
Strategy to generate values less than a maximum value.
- Parameters
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.multiindex_strategy(pandera_dtype=None, strategy=None, *, indexes=None, size=None)
Strategy to generate a pandas MultiIndex object.
- Parameters
pandera_dtype (Optional[DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
- Returns
hypothesis strategy.
- pandera.strategies.ne_strategy(pandera_dtype, strategy=None, *, value)
Strategy to generate anything except for a particular value.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
value (Any) – value to avoid.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.notin_strategy(pandera_dtype, strategy=None, *, forbidden_values)
Strategy to generate values excluding a set of forbidden values.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.numpy_complex_dtypes(dtype, min_value=0j, max_value=None, allow_infinity=None, allow_nan=None)
Create numpy strategy for complex numbers.
- pandera.strategies.numpy_time_dtypes(dtype, min_value=None, max_value=None)
Create numpy strategy for datetime and timedelta data types.
- Parameters
dtype (Union[dtype, DatetimeTZDtype]) – numpy datetime or timedelta datatype
min_value – minimum value of the datatype to create
max_value – maximum value of the datatype to create
- Returns
hypothesis strategy
- pandera.strategies.pandas_dtype_strategy(pandera_dtype, strategy=None, **kwargs)
Strategy to generate data from a pandera.dtypes.DataType.
- Parameters
pandera_dtype (DataType) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
- Kwargs
keyword arguments passed into hypothesis.extra.numpy.from_dtype. For datetime, timedelta, and complex number datatypes, these arguments are passed into numpy_time_dtypes() and numpy_complex_dtypes().
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.register_check_strategy(strategy_fn)
Decorate a Check method with a strategy.
This should be applied to a built-in Check method.
- Parameters
strategy_fn (Callable[…, SearchStrategy]) – add strategy to a check, using check statistics to generate a hypothesis strategy.
- pandera.strategies.series_strategy(pandera_dtype, strategy=None, *, checks=None, nullable=False, unique=False, name=None, size=None)
Strategy to generate a pandas Series.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
checks (Optional[Sequence]) – sequence of Checks to constrain the values of the data in the column/index.
nullable (bool) – whether or not generated Series contains null values.
unique (bool) – whether or not generated Series contains unique values.
- Returns
hypothesis strategy.
- pandera.strategies.str_contains_strategy(pandera_dtype, strategy=None, *, pattern)
Strategy to generate strings that contain a particular pattern.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
pattern (str) – regex pattern.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.str_endswith_strategy(pandera_dtype, strategy=None, *, string)
Strategy to generate strings that end with a specific string pattern.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
string (str) – string pattern.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.str_length_strategy(pandera_dtype, strategy=None, *, min_value, max_value)
Strategy to generate strings of a particular length.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
min_value (int) – minimum string length.
max_value (int) – maximum string length.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.str_matches_strategy(pandera_dtype, strategy=None, *, pattern)
Strategy to generate strings that match a regex pattern.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
pattern (str) – regex pattern.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.str_startswith_strategy(pandera_dtype, strategy=None, *, string)
Strategy to generate strings that start with a specific string pattern.
- Parameters
pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
string (str) – string pattern.
- Return type
SearchStrategy
- Returns
hypothesis strategy