pandera.api.pyspark.components.Column.__init__
- Column.__init__(dtype=None, checks=None, nullable=False, coerce=False, required=True, name=None, regex=False, title=None, description=None, metadata=None)
Create column validator object.
- Parameters:
  - dtype (Union[str, int, float, bool, type, DataType, Type, BooleanType, StringType, IntegerType, DecimalType, FloatType, DateType, TimestampType, DoubleType, ShortType, ByteType, LongType, BinaryType, None]) – datatype of the column, used for type-checking the dataframe. If a string is specified, it is assumed to be one of the valid pyspark string values: https://spark.apache.org/docs/latest/sql-ref-datatypes.html
  - checks (Union[Check, List[Check], None]) – checks to verify the validity of the column
  - nullable (bool) – whether or not the column can contain null values
  - coerce (bool) – if True, the column will be coerced into the specified dtype when schema.validate is called. This has no effect on columns where dtype=None.
  - required (bool) – whether or not the column is allowed to be missing
  - name (Union[str, Tuple[str, …], None]) – column name in the dataframe to validate
  - regex (bool) – whether the name attribute should be treated as a regex pattern to apply to multiple columns in a dataframe
  - title (Optional[str]) – a human-readable label for the column
  - description (Optional[str]) – an arbitrary textual description of the column
- Raises:
SchemaInitError – if it is impossible to build a schema from the parameters
- Example:
>>> import pyspark as ps
>>> from pyspark.sql import SparkSession
>>> import pandera.pyspark as pa
>>>
>>> schema = pa.DataFrameSchema({
...     "column": pa.Column(str)
... })
>>> spark = SparkSession.builder.getOrCreate()
>>> schema.validate(spark.createDataFrame([{"column": "foo"}, {"column": "bar"}])).show()
+------+
|column|
+------+
|   foo|
|   bar|
+------+
See here for more usage details.