pandera.strategies

Generate synthetic data from a schema definition.

New in version 0.6.0.

This module generates data from the type and check constraints specified in a pandera schema. It is built on top of the hypothesis package, which it uses to compose a single strategy from the multiple checks specified in a schema.

See the user guide for more details.

pandera.strategies.column_strategy(pandas_dtype, strategy=None, *, checks=None, allow_duplicates=True, name=None)

Create a data object describing a column in a DataFrame.

Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • checks (Optional[Sequence]) – sequence of Check objects that constrain the values of the data in the column.

  • allow_duplicates (Optional[bool]) – whether the generated column may contain duplicate values.

  • name (Optional[str]) – name of the column.

Returns

a column object.

pandera.strategies.dataframe_strategy(pandas_dtype=None, strategy=None, *, columns=None, checks=None, index=None, size=None, n_regex_columns=1)

Strategy to generate a pandas DataFrame.

Parameters
  • pandas_dtype (Optional[PandasDtype]) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – if specified, this will raise a BaseStrategyOnlyError, since it cannot be chained to a prior strategy.

  • columns (Optional[Dict]) – a dictionary where keys are column names and values are Column objects.

  • checks (Optional[Sequence]) – sequence of Check objects that constrain the values of the data at the DataFrame level.

  • index (Optional[Any]) – Index or MultiIndex schema component.

  • size (Optional[int]) – number of rows in the DataFrame.

  • n_regex_columns (int) – number of columns to generate for each column whose name is a regex pattern.

Returns

hypothesis strategy.

pandera.strategies.eq_strategy(pandas_dtype, strategy=None, *, value)

Strategy to generate a single value.

Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • value (Any) – value to generate.

Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.field_element_strategy(pandas_dtype, strategy=None, *, checks=None)

Strategy to generate elements of a column or index.

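Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • checks (Optional[Sequence]) – sequence of Check objects that constrain the values of the data in the column/index.
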
Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.ge_strategy(pandas_dtype, strategy=None, *, min_value)

Strategy to generate values greater than or equal to a minimum value.

Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • min_value (Union[int, float]) – generate values greater than or equal to this.

Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.gt_strategy(pandas_dtype, strategy=None, *, min_value)

Strategy to generate values greater than a minimum value.

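Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • min_value (Union[int, float]) – generate values greater than this.
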
Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.in_range_strategy(pandas_dtype, strategy=None, *, min_value, max_value, include_min=True, include_max=True)

Strategy to generate values within a particular range.

Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • min_value (Union[int, float]) – lower bound of the generated values.

  • max_value (Union[int, float]) – upper bound of the generated values.

  • include_min (bool) – whether min_value itself may be generated; if False, values are strictly greater than min_value.

  • include_max (bool) – whether max_value itself may be generated; if False, values are strictly less than max_value.

Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.index_strategy(pandas_dtype, strategy=None, *, checks=None, nullable=False, allow_duplicates=True, name=None, size=None)

Strategy to generate a pandas Index.

Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • checks (Optional[Sequence]) – sequence of Check objects that constrain the values of the data in the index.

  • nullable (Optional[bool]) – whether the generated Index may contain null values.

  • allow_duplicates (Optional[bool]) – whether the generated Index may contain duplicate values.

  • name (Optional[str]) – name of the Index.

  • size (Optional[int]) – number of elements in the Index.

Returns

hypothesis strategy.

pandera.strategies.isin_strategy(pandas_dtype, strategy=None, *, allowed_values)

Strategy to generate values within a finite set.

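Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • allowed_values (Sequence[Any]) – set of allowed values.
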
Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.le_strategy(pandas_dtype, strategy=None, *, max_value)

Strategy to generate values less than or equal to a maximum value.

Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • max_value (Union[int, float]) – generate values less than or equal to this.

Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.lt_strategy(pandas_dtype, strategy=None, *, max_value)

Strategy to generate values less than a maximum value.

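Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • max_value (Union[int, float]) – generate values less than this.
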
Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.multiindex_strategy(pandas_dtype=None, strategy=None, *, indexes=None, size=None)

Strategy to generate a pandas MultiIndex object.

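Parameters
  • indexes (Optional[List]) – list of Index schema components, one per level of the MultiIndex.

  • size (Optional[int]) – number of elements in the MultiIndex.
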
Returns

hypothesis strategy.

pandera.strategies.ne_strategy(pandas_dtype, strategy=None, *, value)

Strategy to generate any value except a particular one.

Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • value (Any) – value to avoid.

Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.notin_strategy(pandas_dtype, strategy=None, *, forbidden_values)

Strategy to generate values excluding a set of forbidden values.

Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • forbidden_values (Sequence[Any]) – set of forbidden values.

Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.numpy_complex_dtypes(dtype, min_value=0j, max_value=None, allow_infinity=None, allow_nan=None)

Create numpy strategy for complex numbers.

Parameters
  • dtype – numpy complex number datatype

  • min_value (complex) – minimum value; must be a complex number.

  • max_value (Optional[complex]) – maximum value; must be a complex number.

Returns

hypothesis strategy

pandera.strategies.numpy_time_dtypes(dtype, min_value=None, max_value=None)

Create numpy strategy for datetime and timedelta data types.

Parameters
  • dtype – numpy datetime or timedelta datatype

  • min_value – minimum value to generate.

  • max_value – maximum value to generate.

Returns

hypothesis strategy

pandera.strategies.pandas_dtype_strategy(pandas_dtype, strategy=None, **kwargs)

Strategy to generate data from a pandera.dtypes.PandasDtype.

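Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.
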
Kwargs

keyword arguments passed to hypothesis.extra.numpy.from_dtype. For datetime and timedelta datatypes, these arguments are instead passed to numpy_time_dtypes(); for complex number datatypes, they are passed to numpy_complex_dtypes().

Return type

SearchStrategy

Returns

hypothesis strategy
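`pandas_dtype_strategy` forwards its keyword arguments to `hypothesis.extra.numpy.from_dtype`. The sketch below calls that hypothesis function directly, with no pandera involved, to show what such arguments do. Here `min_value` and `max_value` bound the integers drawn for an `int64` dtype. `hypothesis.find` returns the simplest value that satisfies a condition, which keeps the results deterministic.

```python
import numpy as np
from hypothesis import find
from hypothesis.extra.numpy import from_dtype

# Keyword arguments of the kind pandas_dtype_strategy forwards to from_dtype:
# they bound the integers generated for an int64 dtype to 0..9.
bounded_ints = from_dtype(np.dtype("int64"), min_value=0, max_value=9)

lowest = find(bounded_ints, lambda v: True)                # simplest value overall
smallest_at_least_5 = find(bounded_ints, lambda v: v >= 5)  # simplest value >= 5
```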

pandera.strategies.register_check_strategy(strategy_fn)

Decorate a Check method with a strategy.

This should be applied to a built-in Check method.

Parameters

strategy_fn (Callable[…, SearchStrategy]) – function that uses the check's statistics to generate a hypothesis strategy, which is then attached to the decorated check.

pandera.strategies.series_strategy(pandas_dtype, strategy=None, *, checks=None, nullable=False, allow_duplicates=True, name=None, size=None)

Strategy to generate a pandas Series.

Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • checks (Optional[Sequence]) – sequence of Check objects that constrain the values of the data in the Series.

  • nullable (Optional[bool]) – whether the generated Series may contain null values.

  • allow_duplicates (Optional[bool]) – whether the generated Series may contain duplicate values.

  • name (Optional[str]) – name of the Series.

  • size (Optional[int]) – number of elements in the Series.

Returns

hypothesis strategy.

pandera.strategies.str_contains_strategy(pandas_dtype, strategy=None, *, pattern)

Strategy to generate strings that contain a particular pattern.

Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • pattern (str) – regex pattern.

Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.str_endswith_strategy(pandas_dtype, strategy=None, *, string)

Strategy to generate strings that end with a specific string pattern.

Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • string (str) – string pattern.

Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.str_length_strategy(pandas_dtype, strategy=None, *, min_value, max_value)

Strategy to generate strings whose length lies within a particular range.

Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • min_value (int) – minimum string length.

  • max_value (int) – maximum string length.

Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.str_matches_strategy(pandas_dtype, strategy=None, *, pattern)

Strategy to generate strings that match a regex pattern.

Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • pattern (str) – regex pattern.

Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.str_startswith_strategy(pandas_dtype, strategy=None, *, string)

Strategy to generate strings that start with a specific string pattern.

Parameters
  • pandas_dtype (PandasDtype) – pandera.dtypes.PandasDtype instance.

  • strategy (Optional[SearchStrategy]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • string (str) – string pattern.

Return type

SearchStrategy

Returns

hypothesis strategy

pandera.strategies.verify_pandas_dtype(pandas_dtype, schema_type, name)

Verify that the pandas_dtype argument is not None.