pandera.Hypothesis.one_sample_ttest

classmethod Hypothesis.one_sample_ttest(popmean, sample=None, groupby=None, relationship='equal', alpha=0.01, raise_warning=False)

Calculate a t-test for the mean of one sample.

Parameters
  • sample (Optional[str]) – The sample group to test. For Column and SeriesSchema hypotheses, this refers to the groupby level that is used to subset the Column being checked. For DataFrameSchema hypotheses, this refers to a column in the DataFrame.

  • groupby (Union[str, List[str], Callable, None]) –

    If a string or list of strings is provided, these columns are used to group the Column series. If a callable is passed, the expected signature is DataFrame -> DataFrameGroupBy. The function has access to the entire dataframe, but the Column.name is selected from this DataFrameGroupBy object so that a SeriesGroupBy object is passed into fn.

    Specifying this argument changes the fn signature to: dict[str|tuple[str], Series] -> bool|pd.Series[bool]

    Where specific groups can be obtained from the input dict.

  • popmean (float) – The population mean to compare the sample to.

  • relationship (str) – The relationship condition imposed on the hypothesis test. Available relationships are: “greater_than”, “less_than”, “not_equal” and “equal”. For example, “greater_than” specifies an alternative hypothesis that the mean of the sample is greater than popmean, relative to a null hypothesis that they are equal.

  • alpha (float) – (Default value = 0.01) The significance level; the probability of rejecting the null hypothesis when it is true. For example, a significance level of 0.01 indicates a 1% risk of concluding that a difference exists when there is no actual difference.

  • raise_warning (bool) – (Default value = False) If True, the check raises a UserWarning instead of a SchemaError on validation.
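
How popmean, relationship and alpha combine into a pass/fail decision can be sketched with scipy.stats.ttest_1samp. This is a minimal illustration, not pandera's implementation: the helper name passes_greater_than is hypothetical, and halving the two-sided p-value for the one-sided “greater_than” case is an assumed convention.

```python
from scipy import stats


def passes_greater_than(sample, popmean, alpha):
    """Hypothetical helper: one-sided, one-sample t-test decision."""
    stat, pvalue = stats.ttest_1samp(sample, popmean)
    # Assumed convention: the sample mean must lie above popmean (stat > 0)
    # and the halved two-sided p-value must fall below alpha.
    return bool(stat > 0 and pvalue / 2 < alpha)


heights = [8.1, 7, 6.5, 6.7, 5.1]
stat, pvalue = stats.ttest_1samp(heights, 5)
result = passes_greater_than(heights, popmean=5, alpha=0.1)
print(f"t = {stat:.3f}, two-sided p = {pvalue:.4f}, passes = {result}")
```

A larger alpha makes the condition easier to satisfy, because a weaker departure from popmean is then enough to reject the null hypothesis.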

Example

If you want to compare one sample with a pre-defined mean:

>>> import pandas as pd
>>> import pandera as pa
>>>
>>>
>>> schema = pa.DataFrameSchema({
...     "height_in_feet": pa.Column(
...         pa.Float, [
...             pa.Hypothesis.one_sample_ttest(
...                 popmean=5,
...                 relationship="greater_than",
...                 alpha=0.1),
...     ]),
... })
>>> df = (
...     pd.DataFrame({
...         "height_in_feet": [8.1, 7, 6.5, 6.7, 5.1],
...     })
... )
>>> schema.validate(df)
   height_in_feet
0             8.1
1             7.0
2             6.5
3             6.7
4             5.1