Reading Third-Party Schema
new in 0.7.0
Pandera now accepts schemas from other data validation frameworks. This requires a pandera installation with the io extension; see the installation instructions for more details.
Frictionless Data Schema
Note
Please see the Frictionless schema documentation for more information on this standard.
- pandera.io.from_frictionless_schema(schema)
Create a DataFrameSchema from either a frictionless json/yaml schema file saved on disk, or from a frictionless schema already loaded into memory.
Each field from the frictionless schema will be converted to a pandera column specification using FrictionlessFieldParser to map field characteristics to pandera column specifications.
- Parameters:
schema (Union[str, Path, Dict, Schema]) – the frictionless schema object (or a string/Path to the location on disk of a schema specification) to parse.
- Return type:
DataFrameSchema
- Returns:
dataframe schema with frictionless field specs converted to pandera column checks and constraints for use as normal.
- Example:
Here, we’re defining a very basic frictionless schema in memory before parsing it and then querying the resulting DataFrameSchema object as per any other Pandera schema:
>>> from pandera.io import from_frictionless_schema
>>>
>>> FRICTIONLESS_SCHEMA = {
...     "fields": [
...         {
...             "name": "column_1",
...             "type": "integer",
...             "constraints": {"minimum": 10, "maximum": 99}
...         },
...         {
...             "name": "column_2",
...             "type": "string",
...             "constraints": {"maxLength": 10, "pattern": "\\S+"}
...         },
...     ],
...     "primaryKey": "column_1"
... }
>>> schema = from_frictionless_schema(FRICTIONLESS_SCHEMA)
>>> schema.columns["column_1"].checks
[<Check in_range: in_range(10, 99)>]
>>> schema.columns["column_1"].required
True
>>> schema.columns["column_1"].unique
True
>>> schema.columns["column_2"].checks
[<Check str_length: str_length(None, 10)>, <Check str_matches: str_matches('^\S+$')>]
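The function also accepts a string or Path pointing to a schema file on disk. As a minimal sketch of that route, the snippet below writes the same schema to a JSON file using only the standard library. The helper names write_schema and load_pandera_schema are hypothetical and exist only for illustration; the pandera import is kept inside a function so the file-writing part can be run without the io extension installed.

```python
import json
import tempfile
from pathlib import Path

FRICTIONLESS_SCHEMA = {
    "fields": [
        {
            "name": "column_1",
            "type": "integer",
            "constraints": {"minimum": 10, "maximum": 99},
        },
        {
            "name": "column_2",
            "type": "string",
            "constraints": {"maxLength": 10, "pattern": "\\S+"},
        },
    ],
    "primaryKey": "column_1",
}


def write_schema(schema: dict, directory: Path) -> Path:
    """Hypothetical helper: save a frictionless schema dict as JSON."""
    path = directory / "schema.json"
    path.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return path


def load_pandera_schema(path: Path):
    """Hypothetical helper: parse the saved file into a DataFrameSchema.

    Requires pandera with the io extension; imported lazily for that reason.
    """
    from pandera.io import from_frictionless_schema

    return from_frictionless_schema(path)


with tempfile.TemporaryDirectory() as tmp:
    schema_path = write_schema(FRICTIONLESS_SCHEMA, Path(tmp))
    # Read the file back to confirm it holds the same schema definition.
    round_tripped = json.loads(schema_path.read_text(encoding="utf-8"))
    # With the io extension installed, load_pandera_schema(schema_path)
    # could be called here to obtain the pandera schema.
```

Keeping the schema in a file lets the same definition be shared between pandera and other frictionless-aware tools.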
Under the hood, this uses the FrictionlessFieldParser class to parse each frictionless field (column):
- class pandera.io.pandas_io.FrictionlessFieldParser(field, primary_keys)
Parses frictionless data schema field specifications so we can convert them to an equivalent Pandera Column schema.
For this implementation, we are using field names, constraints and types but leaving other frictionless parameters out (e.g. foreign keys, type formats, titles, descriptions).
- Parameters:
field – a field object from a frictionless schema.
primary_keys – the primary keys from a frictionless schema. These are used to ensure primary key fields are treated properly - no duplicates, no missing values etc.
- property checks: Dict | None
Convert a set of frictionless schema field constraints into checks.
This parses the standard set of frictionless constraints, as defined in the Frictionless Table Schema specification, and maps them to the equivalent pandera checks.
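To make the mapping concrete, here is a minimal standard-library sketch of how the constraints in the example above could translate into check descriptions. It is not pandera's implementation: the function name constraints_to_checks and the shape of the returned dictionary are assumptions, and only the three mappings visible in the example output (in_range, str_length, and an anchored str_matches) are covered.

```python
from typing import Any, Dict, Optional


def constraints_to_checks(constraints: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Hypothetical sketch: map frictionless constraints to check descriptions."""
    checks: Dict[str, Any] = {}

    # minimum + maximum together become a single range check.
    minimum = constraints.get("minimum")
    maximum = constraints.get("maximum")
    if minimum is not None and maximum is not None:
        checks["in_range"] = {"min_value": minimum, "max_value": maximum}

    # minLength / maxLength become a string-length check; a missing bound is None.
    min_length = constraints.get("minLength")
    max_length = constraints.get("maxLength")
    if min_length is not None or max_length is not None:
        checks["str_length"] = {"min_value": min_length, "max_value": max_length}

    # The pattern is anchored so that it must match the whole value,
    # mirroring the '^\S+$' shown in the example output.
    pattern = constraints.get("pattern")
    if pattern is not None:
        checks["str_matches"] = "^" + pattern + "$"

    # Return None when no constraint produced a check.
    return checks or None


integer_checks = constraints_to_checks({"minimum": 10, "maximum": 99})
string_checks = constraints_to_checks({"maxLength": 10, "pattern": "\\S+"})
```

A field with only one of minimum or maximum is deliberately left unhandled here, since the example above does not show how that case is mapped.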
- property coerce: bool
Determine whether values within this field should be coerced.
This currently returns True for all fields within a frictionless schema.
- Return type:
bool
- property dtype: str
Determine what type of field this is, so we can feed that into DataType. If no type is specified in the frictionless schema, we default to string values.
- Return type:
str
- Returns:
the pandas-compatible representation of this field type as a string.
- property nullable: bool
Determine whether this field can contain missing values.
If a field is a primary key, this will return False.
- Return type:
bool
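The primary-key rule can be illustrated with a short standard-library sketch. The function names are hypothetical, and the fallback for fields that are not primary keys (using the frictionless "required" constraint) is an assumption that the text above does not state.

```python
from typing import Any, Dict, List, Optional, Union


def normalise_primary_keys(primary_key: Optional[Union[str, List[str]]]) -> List[str]:
    """Hypothetical helper: accept a single key name or a list of names."""
    if primary_key is None:
        return []
    if isinstance(primary_key, str):
        return [primary_key]
    return list(primary_key)


def is_nullable(name: str, constraints: Dict[str, Any], primary_keys: List[str]) -> bool:
    """Hypothetical sketch of the nullable decision for one field."""
    # Primary key fields may never contain missing values.
    if name in primary_keys:
        return False
    # Assumption: otherwise a field is nullable unless marked as required.
    return not constraints.get("required", False)


keys = normalise_primary_keys("column_1")
column_1_nullable = is_nullable("column_1", {}, keys)
column_2_nullable = is_nullable("column_2", {}, keys)
```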
- property regex: bool
Determine whether this field name should be used for regex matches.
This currently returns False for all fields within a frictionless schema.
- Return type:
bool