pandera.strategies
Generate synthetic data from a schema definition.
New in 0.6.0.
This module generates data based on the type and check constraints specified in a pandera schema. It is built on top of the hypothesis package, which it uses to compose strategies from the multiple checks specified in a schema.
See the user guide for more details.
- pandera.strategies.column_strategy(pandas_dtype, strategy=None, *, checks=None, allow_duplicates=True, name=None)
Create a data object describing a column in a DataFrame.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
checks (Optional[Sequence]) – sequence of Checks to constrain the values of the data in the column/index.
allow_duplicates (Optional[bool]) – whether or not generated Series contains duplicates.
- Returns
a column object.
- pandera.strategies.dataframe_strategy(pandas_dtype=None, strategy=None, *, columns=None, checks=None, index=None, size=None, n_regex_columns=1)
Strategy to generate a pandas DataFrame.
- Parameters
pandas_dtype (Optional[PandasDtype]) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – if specified, this will raise a BaseStrategyOnlyError, since it cannot be chained to a prior strategy.
columns (Optional[Dict]) – a dictionary where keys are column names and values are Column objects.
checks (Optional[Sequence]) – sequence of Checks to constrain the values of the data at the dataframe level.
index (Optional[Any]) – Index or MultiIndex schema component.
n_regex_columns (int) – number of regex columns to generate.
- Returns
hypothesis strategy.
- pandera.strategies.eq_strategy(pandas_dtype, strategy=None, *, value)
Strategy to generate a single value.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
value (Any) – value to generate.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.field_element_strategy(pandas_dtype, strategy=None, *, checks=None)
Strategy to generate elements of a column or index.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
checks (Optional[Sequence]) – sequence of Checks to constrain the values of the data in the column/index.
- Return type
SearchStrategy
- Returns
hypothesis strategy
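The recurring phrase "chained onto this strategy" can be pictured with plain hypothesis strategies. The sketch below is not pandera's code; it only shows the general idea, under the assumption that a first constraint is expressed as a base strategy and a later constraint is applied to it as a filter.

```python
from hypothesis import strategies as st

# First constraint: expressed directly as a bounded base strategy.
base = st.integers(min_value=0, max_value=1000)

# Second constraint: chained onto the base strategy as a filter,
# so every drawn value satisfies both constraints.
chained = base.filter(lambda x: x % 2 == 0)

# example() is for interactive exploration; hypothesis warns about this.
value = chained.example()
print(value)
```

Filters that reject most candidates slow generation down, which is one reason to put the most restrictive constraint into the base strategy when possible.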
- pandera.strategies.ge_strategy(pandas_dtype, strategy=None, *, min_value)
Strategy to generate values greater than or equal to a minimum value.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
min_value (Union[int, float]) – generate values greater than or equal to this.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.gt_strategy(pandas_dtype, strategy=None, *, min_value)
Strategy to generate values greater than a minimum value.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
min_value (Union[int, float]) – generate values larger than this.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.in_range_strategy(pandas_dtype, strategy=None, *, min_value, max_value, include_min=True, include_max=True)
Strategy to generate values within a particular range.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
min_value (Union[int, float]) – generate values greater than this (or equal to it, if include_min is True).
max_value (Union[int, float]) – generate values less than this (or equal to it, if include_max is True).
include_min (bool) – include min_value in generated data.
include_max (bool) – include max_value in generated data.
- Return type
SearchStrategy
- Returns
hypothesis strategy
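How the four range arguments interact can be spelled out as a plain predicate. This is a standard-library sketch of the semantics described by the parameters above, not pandera's implementation; the function name `in_range` is hypothetical.

```python
import operator


def in_range(value, min_value, max_value, include_min=True, include_max=True):
    """Return True if value lies in the range described by the arguments."""
    # Inclusive bounds use >= / <=, exclusive bounds use > / <.
    min_op = operator.ge if include_min else operator.gt
    max_op = operator.le if include_max else operator.lt
    return min_op(value, min_value) and max_op(value, max_value)


print(in_range(0, 0, 10))                      # True: lower bound included
print(in_range(0, 0, 10, include_min=False))   # False: lower bound excluded
print(in_range(10, 0, 10, include_max=False))  # False: upper bound excluded
```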
- pandera.strategies.index_strategy(pandas_dtype, strategy=None, *, checks=None, nullable=False, allow_duplicates=True, name=None, size=None)
Strategy to generate a pandas Index.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
checks (Optional[Sequence]) – sequence of Checks to constrain the values of the data in the column/index.
nullable (Optional[bool]) – whether or not generated Index contains null values.
allow_duplicates (Optional[bool]) – whether or not generated Index contains duplicates.
- Returns
hypothesis strategy.
- pandera.strategies.isin_strategy(pandas_dtype, strategy=None, *, allowed_values)
Strategy to generate values within a finite set.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.le_strategy(pandas_dtype, strategy=None, *, max_value)
Strategy to generate values less than or equal to a maximum value.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
max_value (Union[int, float]) – generate values less than or equal to this.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.lt_strategy(pandas_dtype, strategy=None, *, max_value)
Strategy to generate values less than a maximum value.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
max_value (Union[int, float]) – generate values less than this.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.multiindex_strategy(pandas_dtype=None, strategy=None, *, indexes=None, size=None)
Strategy to generate a pandas MultiIndex object.
- Parameters
pandas_dtype (Optional[PandasDtype]) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
- Returns
hypothesis strategy.
- pandera.strategies.ne_strategy(pandas_dtype, strategy=None, *, value)
Strategy to generate anything except for a particular value.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
value (Any) – value to avoid.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.notin_strategy(pandas_dtype, strategy=None, *, forbidden_values)
Strategy to generate values excluding a set of forbidden values.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.numpy_complex_dtypes(dtype, min_value=0j, max_value=None, allow_infinity=None, allow_nan=None)
Create numpy strategy for complex numbers.
- pandera.strategies.numpy_time_dtypes(dtype, min_value=None, max_value=None)
Create numpy strategy for datetime and timedelta data types.
- Parameters
dtype – numpy datetime or timedelta datatype
min_value – minimum value of the datatype to create
max_value – maximum value of the datatype to create
- Returns
hypothesis strategy
- pandera.strategies.pandas_dtype_strategy(pandas_dtype, strategy=None, **kwargs)
Strategy to generate data from a pandera.dtypes.PandasDtype.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
- Kwargs
keyword arguments passed into hypothesis.extra.numpy.from_dtype. For datetime, timedelta, and complex number datatypes, these arguments are passed into numpy_time_dtypes() and numpy_complex_dtypes().
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.register_check_strategy(strategy_fn)
Decorate a Check method with a strategy.
This should be applied to a built-in Check method.
- Parameters
strategy_fn (Callable[…, SearchStrategy]) – add strategy to a check, using check statistics to generate a hypothesis strategy.
- pandera.strategies.series_strategy(pandas_dtype, strategy=None, *, checks=None, nullable=False, allow_duplicates=True, name=None, size=None)
Strategy to generate a pandas Series.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
checks (Optional[Sequence]) – sequence of Checks to constrain the values of the data in the column/index.
nullable (Optional[bool]) – whether or not generated Series contains null values.
allow_duplicates (Optional[bool]) – whether or not generated Series contains duplicates.
- Returns
hypothesis strategy.
- pandera.strategies.str_contains_strategy(pandas_dtype, strategy=None, *, pattern)
Strategy to generate strings that contain a particular pattern.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
pattern (str) – regex pattern.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.str_endswith_strategy(pandas_dtype, strategy=None, *, string)
Strategy to generate strings that end with a specific string pattern.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
string (str) – string pattern.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.str_length_strategy(pandas_dtype, strategy=None, *, min_value, max_value)
Strategy to generate strings of a particular length.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
min_value (int) – minimum string length.
max_value (int) – maximum string length.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.str_matches_strategy(pandas_dtype, strategy=None, *, pattern)
Strategy to generate strings that match a regex pattern.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
pattern (str) – regex pattern.
- Return type
SearchStrategy
- Returns
hypothesis strategy
- pandera.strategies.str_startswith_strategy(pandas_dtype, strategy=None, *, string)
Strategy to generate strings that start with a specific string pattern.
- Parameters
pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.
strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
string (str) – string pattern.
- Return type
SearchStrategy
- Returns
hypothesis strategy