pandera.api.pyspark.container.DataFrameSchema.__call__

DataFrameSchema.__call__(dataframe, head=None, tail=None, sample=None, random_state=None, lazy=True, inplace=False)

Alias for DataFrameSchema.validate() method.

Parameters:
  • dataframe (DataFrame) – the PySpark DataFrame to be validated.

  • head (int) – Not used, since Spark has no concept of head or tail.

  • tail (int) – Not used, since Spark has no concept of head or tail.

  • sample (Optional[int]) – validate a random sample of the rows. The value is a fraction between 0 and 1; for example, 0.1 samples about 10% of the rows. See the PySpark DataFrame.sample documentation: https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.sample.html

  • lazy (bool) – if True, lazily evaluates the dataframe against all validation checks and raises a SchemaErrors exception. Otherwise, raises a SchemaError as soon as one occurs.

  • inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.
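
The alias relationship and the lazy flag described above can be illustrated with a small, self-contained sketch. It does not use pandera or Spark: ToySchema, ToySchemaError, and ToySchemaErrors are hypothetical stand-ins invented for this example, and the "dataframe" is a plain list of dicts. It only shows the general pattern of __call__ forwarding to validate(), and of lazy mode collecting every failure while eager mode stops at the first one. It is not how pandera implements either behavior.

```python
from typing import Callable, Dict, List


class ToySchemaError(Exception):
    """Single failed check (hypothetical stand-in for SchemaError)."""


class ToySchemaErrors(Exception):
    """All failed checks (hypothetical stand-in for SchemaErrors)."""

    def __init__(self, failures: List[str]) -> None:
        super().__init__(f"{len(failures)} check(s) failed: {failures}")
        self.failures = failures


class ToySchema:
    """Hypothetical schema over a list of dict rows; not pandera's code."""

    def __init__(self, checks: Dict[str, Callable[[dict], bool]]) -> None:
        self.checks = checks

    def validate(self, rows: List[dict], lazy: bool = True) -> List[dict]:
        failures: List[str] = []
        for name, check in self.checks.items():
            if all(check(row) for row in rows):
                continue
            if not lazy:
                # Eager mode: stop at the first failing check.
                raise ToySchemaError(name)
            # Lazy mode: record the failure and keep evaluating.
            failures.append(name)
        if failures:
            raise ToySchemaErrors(failures)
        return rows

    def __call__(self, rows: List[dict], lazy: bool = True) -> List[dict]:
        # Calling the schema object is another spelling of validate().
        return self.validate(rows, lazy=lazy)


schema = ToySchema({
    "id_positive": lambda row: row["id"] > 0,
    "name_not_empty": lambda row: len(row["name"]) > 0,
})

good_rows = [{"id": 1, "name": "a"}]
bad_rows = [{"id": -1, "name": ""}]

# schema(rows) and schema.validate(rows) give the same result.
assert schema(good_rows) == schema.validate(good_rows)
```

With the toy schema above, schema(bad_rows) reports both failed checks in one ToySchemaErrors exception, while schema(bad_rows, lazy=False) raises ToySchemaError for the first failed check only.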