pandera.api.pyspark.components.Column.validate

Column.validate(check_obj, head=None, tail=None, sample=None, random_state=None, lazy=True, inplace=False, error_handler=None)

Validate a Column in a DataFrame object.

Parameters:
  • check_obj (DataFrame) – pyspark DataFrame to validate.

  • head (Optional[int]) – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.

  • tail (Optional[int]) – validate the last n rows. Rows overlapping with head or sample are de-duplicated.

  • sample (Optional[int]) – validate a random sample of fractional rows. Rows overlapping with head or tail are de-duplicated.

  • random_state (Optional[int]) – random seed for the sample argument.

  • lazy (bool) – if True, evaluates the DataFrame against all validation checks and raises a single SchemaErrors exception covering every failure. Otherwise, raises a SchemaError as soon as the first failure occurs.

  • inplace (bool) – if True, applies coercion to the object of validation, otherwise creates a copy of the data.

  • error_handler (Optional[ErrorHandler]) – pyspark error handler object that collects validation errors and provides them in a dictionary format.

Return type:

DataFrame

Returns:

validated DataFrame.
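
The difference between lazy=True and lazy=False described in the parameters above can be illustrated with a small, library-independent sketch. It does not use pandera or pyspark and is not pandera's implementation: the exception classes are stand-ins that only borrow the names SchemaError and SchemaErrors, and validate_values, checks, and the plain Python list standing in for a column are all hypothetical.

```python
class SchemaError(Exception):
    """Stand-in for a single validation failure (not pandera's class)."""


class SchemaErrors(Exception):
    """Stand-in that bundles several failures (not pandera's class)."""

    def __init__(self, failures):
        super().__init__(f"{len(failures)} check(s) failed")
        self.failures = failures


def validate_values(values, checks, lazy=True):
    """Run every named check over the values (hypothetical helper).

    lazy=True  : run all checks, then raise one SchemaErrors listing
                 every check that failed.
    lazy=False : raise SchemaError at the first failing check.
    """
    failures = []
    for name, predicate in checks.items():
        if all(predicate(v) for v in values):
            continue  # this check passed
        if not lazy:
            raise SchemaError(name)  # fail fast
        failures.append(name)  # remember the failure and keep going
    if failures:
        raise SchemaErrors(failures)
    return values  # the validated data is returned unchanged


# Two illustrative checks applied to a toy "column" of integers.
checks = {
    "positive": lambda v: v > 0,
    "below_100": lambda v: v < 100,
}
```

With the values [-1, 500], the lazy call reports both failed checks in one exception, while the eager call stops at the first one. Collecting everything at once tends to suit batch data pipelines, where re-running validation after each individual fix is expensive.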