pandera.api.pyspark.container.DataFrameSchema.__call__

DataFrameSchema.__call__(dataframe, head=None, tail=None, sample=None, random_state=None, lazy=True, inplace=False)

Parameters:

dataframe (DataFrame) – the DataFrame object to be validated.
head (int) – not used, since Spark has no concept of head or tail.
tail (int) – not used, since Spark has no concept of head or tail.
sample (Optional[int]) – validate a random sample of the rows. The value is a fraction between 0 and 1; for example, setting it to 0.1 samples about 10% of the rows. See the PySpark DataFrame.sample documentation: https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.sample.html
lazy (bool) – if True, lazily evaluates the dataframe against all validation checks and raises a SchemaErrors exception. Otherwise, raises a SchemaError as soon as one occurs.
inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.
