pandera.strategies.pandas_strategies

Generate synthetic data from a schema definition.

New in version 0.6.0.

This module is responsible for generating data based on the type and check constraints specified in a pandera schema. It’s built on top of the hypothesis package to compose strategies given multiple checks specified in a schema.

See the user guide for more details.

pandera.strategies.pandas_strategies.column_strategy(pandera_dtype, strategy=None, *, checks=None, unique=False, name=None)

Create a data object describing a column in a DataFrame.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • checks (UnionType[Sequence, None]) – sequence of Check objects that constrain the values of the data in the column/index.

  • unique (bool) – whether or not generated Series contains unique values.

  • name (UnionType[str, None]) – name of the Series.

Returns:

a column object.

pandera.strategies.pandas_strategies.compile_field_strategy(pandera_dtype, constraints)

Compile a merged FieldConstraints into a single hypothesis strategy.

All dtype-specific bridging (datetime tz, complex, time resolutions, surrogate handling) is delegated to pandas_dtype_strategy(); this helper only translates the aggregated constraints into the appropriate kwargs and parent strategy.

Parameters:
  • pandera_dtype (DataType) – the field’s dtype.

  • constraints (FieldConstraints) – merged FieldConstraints produced by bucketing every check on the field.

Return type:

SearchStrategy

Returns:

a hypothesis strategy with at most one trailing .filter per residual predicate (built-in checks contribute zero residuals).

Raises:

ConstraintConflictError – when the merged constraint set is jointly unsatisfiable (e.g. isin set pruned to empty by bounds + notin).
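
The pruning described under Raises can be pictured with a small, library-independent sketch. Merged and ConflictError below are hypothetical stand-ins, not pandera's FieldConstraints or ConstraintConflictError, and the field names are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


class ConflictError(ValueError):
    """Hypothetical stand-in for ConstraintConflictError."""


@dataclass(frozen=True)
class Merged:
    """Hypothetical, simplified stand-in for a merged FieldConstraints."""

    min_value: Optional[float] = None
    max_value: Optional[float] = None
    isin: Optional[frozenset] = None
    notin: frozenset = frozenset()


def allowed_values(c: Merged) -> Optional[frozenset]:
    """Prune the isin set by bounds and notin; None means no isin constraint."""
    if (
        c.min_value is not None
        and c.max_value is not None
        and c.min_value > c.max_value
    ):
        raise ConflictError("bounds describe an empty range")
    if c.isin is None:
        return None
    kept = frozenset(
        v
        for v in c.isin
        if (c.min_value is None or v >= c.min_value)
        and (c.max_value is None or v <= c.max_value)
        and v not in c.notin
    )
    if not kept:
        # Nothing survives: the constraints are jointly unsatisfiable.
        raise ConflictError("isin set pruned to empty by bounds and notin")
    return kept
```

Detecting the conflict while merging, rather than while drawing examples, lets the error name the offending constraints instead of surfacing as an exhausted filter.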

pandera.strategies.pandas_strategies.convert_dtype(array, col_dtype)

Convert datatypes of an array (series or index).

pandera.strategies.pandas_strategies.convert_dtypes(df, col_dtypes)

Convert datatypes of a dataframe.

pandera.strategies.pandas_strategies.dataframe_strategy(pandera_dtype=None, strategy=None, *, columns=None, checks=None, unique=None, index=None, size=None, n_regex_columns=1)

Strategy to generate a pandas DataFrame.

Returns:

hypothesis strategy.

pandera.strategies.pandas_strategies.eq_strategy(pandera_dtype, strategy=None, *, value)

Strategy to generate a single value.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • value (Any) – value to generate.

Return type:

SearchStrategy

Returns:

hypothesis strategy

pandera.strategies.pandas_strategies.field_element_strategy(pandera_dtype, strategy=None, *, checks=None)

Strategy to generate elements of a column or index.

Buckets checks into:

  1. constraint-providing checks (built-in or via Check.constraint / register_check_method(constraint=...)): aggregated into a single FieldConstraints and compiled to one hypothesis strategy via compile_field_strategy(). No .filter chaining.

  2. legacy check.strategy callables: applied as a chained strategy on top of the merged base, preserving today’s behaviour for users who haven’t migrated.

  3. element-wise checks with no strategy / constraint: lowered to a single residual filter, with a warning to the user that this is slow.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. Reserved; passing a non-None value raises BaseStrategyOnlyError.

  • checks (UnionType[Sequence, None]) – sequence of Check objects that constrain the values of the data in the column/index.

Return type:

SearchStrategy

Returns:

hypothesis strategy.
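
The three buckets listed above can be sketched in plain Python. FakeCheck and bucket_checks are hypothetical names introduced for illustration and do not mirror pandera's internals. The sketch also assumes that a check exposing both a constraint and a legacy strategy goes to the constraint bucket, and it simply skips checks that fit none of the three cases, which the text above does not cover.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FakeCheck:
    """Hypothetical stand-in for a pandera Check; only the fields used here."""

    name: str
    constraint: Optional[Callable] = None  # constraint adapter, if any
    strategy: Optional[Callable] = None  # legacy strategy callable, if any
    element_wise: bool = False


def bucket_checks(checks):
    """Split checks into (constraint, legacy, residual) lists."""
    constraint, legacy, residual = [], [], []
    for check in checks:
        if check.constraint is not None:
            constraint.append(check)  # bucket 1: merged into one strategy
        elif check.strategy is not None:
            legacy.append(check)  # bucket 2: chained onto the merged base
        elif check.element_wise:
            residual.append(check)  # bucket 3: one residual filter
    return constraint, legacy, residual


checks = [
    FakeCheck("ge", constraint=lambda **stats: stats),
    FakeCheck("custom_legacy", strategy=lambda *args, **kwargs: None),
    FakeCheck("is_even", element_wise=True),
]
constraint, legacy, residual = bucket_checks(checks)
```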

pandera.strategies.pandas_strategies.ge_strategy(pandera_dtype, strategy=None, *, min_value)

Strategy to generate values greater than or equal to a minimum value.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • min_value (Union[int, float]) – generate values greater than or equal to this.

Return type:

SearchStrategy

Returns:

hypothesis strategy

pandera.strategies.pandas_strategies.gt_strategy(pandera_dtype, strategy=None, *, min_value)

Strategy to generate values greater than a minimum value.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • min_value (Union[int, float]) – generate values larger than this.

Return type:

SearchStrategy

Returns:

hypothesis strategy

pandera.strategies.pandas_strategies.in_range_strategy(pandera_dtype, strategy=None, *, min_value, max_value, include_min=True, include_max=True)

Strategy to generate values within a particular range.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

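  • min_value (Union[int, float]) – lower bound of the range; generated values are greater than this, or equal to it when include_min is True.

  • max_value (Union[int, float]) – upper bound of the range; generated values are less than this, or equal to it when include_max is True.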

  • include_min (bool) – include min_value in generated data.

  • include_max (bool) – include max_value in generated data.

Return type:

SearchStrategy

Returns:

hypothesis strategy
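
As a minimal illustration of how include_min and include_max change the range, the sketch below converts a possibly open integer range into closed bounds. closed_int_bounds is a hypothetical helper, not part of pandera, and it assumes integer values only; floats would need a different adjustment.

```python
def closed_int_bounds(min_value, max_value, include_min=True, include_max=True):
    """Return inclusive (low, high) bounds for an integer range."""
    low = min_value if include_min else min_value + 1
    high = max_value if include_max else max_value - 1
    if low > high:
        raise ValueError("range contains no integers")
    return low, high


inclusive = closed_int_bounds(0, 10)
exclusive = closed_int_bounds(0, 10, include_min=False, include_max=False)
```

Here inclusive is (0, 10) and exclusive is (1, 9).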

pandera.strategies.pandas_strategies.index_strategy(pandera_dtype, strategy=None, *, checks=None, nullable=False, unique=False, name=None, size=None)

Strategy to generate a pandas Index.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • checks (UnionType[Sequence, None]) – sequence of Check objects that constrain the values of the data in the column/index.

  • nullable (bool) – whether or not the generated Index contains null values.

  • unique (bool) – whether or not the generated Index contains unique values.

  • name (UnionType[str, None]) – name of the Index.

  • size (UnionType[int, None]) – number of elements in the Index.

Returns:

hypothesis strategy.

pandera.strategies.pandas_strategies.isin_strategy(pandera_dtype, strategy=None, *, allowed_values)

Strategy to generate values within a finite set.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • allowed_values (Sequence[Any]) – set of allowable values.

Return type:

SearchStrategy

Returns:

hypothesis strategy

pandera.strategies.pandas_strategies.le_strategy(pandera_dtype, strategy=None, *, max_value)

Strategy to generate values less than or equal to a maximum value.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • max_value (Union[int, float]) – generate values less than or equal to this.

Return type:

SearchStrategy

Returns:

hypothesis strategy

pandera.strategies.pandas_strategies.lt_strategy(pandera_dtype, strategy=None, *, max_value)

Strategy to generate values less than a maximum value.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • max_value (Union[int, float]) – generate values less than this.

Return type:

SearchStrategy

Returns:

hypothesis strategy

pandera.strategies.pandas_strategies.multiindex_strategy(pandera_dtype=None, strategy=None, *, indexes=None, size=None)

Strategy to generate a pandas MultiIndex object.

Returns:

hypothesis strategy.

pandera.strategies.pandas_strategies.ne_strategy(pandera_dtype, strategy=None, *, value)

Strategy to generate anything except for a particular value.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • value (Any) – value to avoid.

Return type:

SearchStrategy

Returns:

hypothesis strategy

pandera.strategies.pandas_strategies.notin_strategy(pandera_dtype, strategy=None, *, forbidden_values)

Strategy to generate values excluding a set of forbidden values.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • forbidden_values (Sequence[Any]) – set of forbidden values.

Return type:

SearchStrategy

Returns:

hypothesis strategy

pandera.strategies.pandas_strategies.numpy_complex_dtypes(dtype, min_value=0j, max_value=None, allow_infinity=None, allow_nan=None)

Create numpy strategy for complex numbers.

Parameters:
  • dtype – numpy complex number datatype

  • min_value (complex) – minimum value, must be complex number

  • max_value (UnionType[complex, None]) – maximum value, must be complex number

Returns:

hypothesis strategy

pandera.strategies.pandas_strategies.numpy_time_dtypes(dtype, min_value=None, max_value=None)

Create numpy strategy for datetime and timedelta data types.

Parameters:
  • dtype (Union[dtype, DatetimeTZDtype]) – numpy datetime or timedelta datatype

  • min_value – minimum value of the datatype to create

  • max_value – maximum value of the datatype to create

Returns:

hypothesis strategy

pandera.strategies.pandas_strategies.pandas_dtype_strategy(pandera_dtype, strategy=None, **kwargs)

Strategy to generate data from a pandera.dtypes.DataType.

Parameters:
  • pandera_dtype (DataType) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

Kwargs:

keyword arguments passed into hypothesis.extra.numpy.from_dtype. For datetime, timedelta, and complex number datatypes, these arguments are passed into numpy_time_dtypes() and numpy_complex_dtypes().

Return type:

SearchStrategy

Returns:

hypothesis strategy

pandera.strategies.pandas_strategies.register_check_constraint(constraint_fn)

Decorate a Check method with a constraint adapter.

Constraint adapters take the check’s statistics as kwargs and return a FieldConstraints value describing the bounds/membership/regex constraints the check encodes. When a check has a constraint adapter, the strategy builder prefers it over check.strategy and merges its output with sibling constraints, emitting a single hypothesis strategy (no per-check .filter chaining). See specs/optimized-strategies.md.

Parameters:

constraint_fn (Callable) – callable with signature (**statistics) -> FieldConstraints.
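
The general shape of such a decorator can be sketched without pandera. attach_constraint, ge_constraint, and the plain dict return value are hypothetical: the real adapter returns a FieldConstraints, and how pandera stores the adapter on a Check is not described here.

```python
import functools


def attach_constraint(constraint_fn):
    """Hypothetical decorator: store constraint_fn on the decorated check."""

    def decorator(check_method):
        @functools.wraps(check_method)
        def wrapper(*args, **kwargs):
            return check_method(*args, **kwargs)

        # A strategy builder could look this attribute up instead of
        # calling a per-check strategy.
        wrapper.constraint_fn = constraint_fn
        return wrapper

    return decorator


def ge_constraint(*, min_value):
    """Map the check's statistics to a constraint description (a dict here)."""
    return {"min_value": min_value}


@attach_constraint(ge_constraint)
def greater_than_or_equal_to(value, min_value):
    """Validation logic stays unchanged by the decorator."""
    return value >= min_value
```

Keeping the adapter separate from the validation function means the same statistics drive both checking and data generation.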

pandera.strategies.pandas_strategies.register_check_strategy(strategy_fn)

Decorate a Check method with a strategy.

This should be applied to a built-in Check method.

Parameters:

strategy_fn (Callable[…, SearchStrategy]) – callable that uses the check's statistics to build a hypothesis strategy for that check.

pandera.strategies.pandas_strategies.series_strategy(pandera_dtype, strategy=None, *, checks=None, nullable=False, unique=False, name=None, size=None)

Strategy to generate a pandas Series.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • checks (UnionType[Sequence, None]) – sequence of Check objects that constrain the values of the data in the column/index.

  • nullable (bool) – whether or not generated Series contains null values.

  • unique (bool) – whether or not generated Series contains unique values.

  • name (UnionType[str, None]) – name of the Series.

  • size (UnionType[int, None]) – number of elements in the Series.

Return type:

SearchStrategy

Returns:

hypothesis strategy.

pandera.strategies.pandas_strategies.str_contains_strategy(pandera_dtype, strategy=None, *, pattern)

Strategy to generate strings that contain a particular pattern.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • pattern (str) – regex pattern.

Return type:

SearchStrategy

Returns:

hypothesis strategy

pandera.strategies.pandas_strategies.str_endswith_strategy(pandera_dtype, strategy=None, *, string)

Strategy to generate strings that end with a specific string pattern.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • string (str) – string pattern.

Return type:

SearchStrategy

Returns:

hypothesis strategy

pandera.strategies.pandas_strategies.str_length_strategy(pandera_dtype, strategy=None, *, min_value=None, max_value=None, exact_value=None)

Strategy to generate strings of a particular length.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • min_value (UnionType[int, None]) – minimum string length.

  • max_value (UnionType[int, None]) – maximum string length.

  • exact_value (UnionType[int, None]) – exact string length.

Return type:

SearchStrategy

Returns:

hypothesis strategy
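
One plausible way the three length arguments combine is sketched below. resolve_length_bounds is a hypothetical helper. That exact_value takes precedence over min_value and max_value, and that a missing min_value means zero, are assumptions of this sketch rather than documented behaviour.

```python
def resolve_length_bounds(min_value=None, max_value=None, exact_value=None):
    """Return (min_size, max_size); max_size of None means unbounded."""
    if exact_value is not None:
        # Assumption: an exact length overrides the other two arguments.
        return exact_value, exact_value
    low = 0 if min_value is None else min_value
    if max_value is not None and low > max_value:
        raise ValueError("min_value is greater than max_value")
    return low, max_value


exact = resolve_length_bounds(exact_value=5)
bounded = resolve_length_bounds(min_value=2, max_value=8)
open_ended = resolve_length_bounds(min_value=3)
```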

pandera.strategies.pandas_strategies.str_matches_strategy(pandera_dtype, strategy=None, *, pattern)

Strategy to generate strings that match a regex pattern.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • pattern (str) – regex pattern.

Return type:

SearchStrategy

Returns:

hypothesis strategy
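
The difference between matching and containing a pattern can be shown with the standard re module. The sketch assumes that str_matches_strategy targets strings that match from the start, as re.match and pandas str.match do, and that str_contains_strategy targets strings with a match anywhere, as re.search and pandas str.contains do. Both are assumptions about this module, not statements taken from it.

```python
import re

pattern = r"\d{3}"
values = ["123abc", "abc123", "abc"]

# re.match only succeeds when the pattern matches at the start of the string.
matching = [v for v in values if re.match(pattern, v)]

# re.search succeeds when the pattern matches anywhere in the string.
containing = [v for v in values if re.search(pattern, v)]
```

With these inputs, matching holds only "123abc", while containing holds "123abc" and "abc123".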

pandera.strategies.pandas_strategies.str_startswith_strategy(pandera_dtype, strategy=None, *, string)

Strategy to generate strings that start with a specific string pattern.

Parameters:
  • pandera_dtype (Union[DataType, DataType]) – pandera.dtypes.DataType instance.

  • strategy (UnionType[SearchStrategy, None]) – an optional hypothesis strategy. If specified, the pandas dtype strategy will be chained onto this strategy.

  • string (str) – string pattern.

Return type:

SearchStrategy

Returns:

hypothesis strategy

pandera.strategies.pandas_strategies.to_numpy_dtype(pandera_dtype)

Convert a DataType to numpy dtype compatible with hypothesis.

pandera.strategies.pandas_strategies.verify_dtype(pandera_dtype, schema_type, name)

Verify that pandera_dtype argument is not None.