Contributing¶
Whether you are a novice or experienced software developer, all contributions and suggestions are welcome!
Getting Started¶
If you are looking to contribute to the pandera codebase, the best place to start is the GitHub “issues” tab. This is also a great place for filing bug reports and making suggestions for ways in which we can improve the code and documentation.
Contributing to the Codebase¶
The code is hosted on GitHub, so you will need to use Git to clone the project and make changes to the codebase. Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment.
An excellent guide on setting up Python environments can be found here.
Pandera offers an environment.yml to set up a conda-based environment and requirements-dev.txt for a virtualenv.
Project Releases¶
Releases are organized under milestones, each of which is associated with a corresponding branch. This project uses semantic versioning, and we recommend prioritizing issues associated with the next release.
Contributing Documentation¶
Maybe the easiest, fastest, and most useful way to contribute to this project (and any other project) is to contribute documentation. If you find an API within the project that doesn’t have an example or description, or could be clearer in its explanation, contribute yours!
You can also find issues for improving documentation under the docs label. If you have ideas for documentation improvements, you can create a new issue here.
This project uses Sphinx for auto-documentation and RST syntax for docstrings. Once you have the code downloaded and you find something that is in need of some TLC, take a look at the Sphinx documentation or well-documented examples within the codebase for guidance on contributing.
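As a rough illustration of an RST-style docstring that Sphinx can render, here is a minimal sketch. The function `clip_values` is hypothetical and is not part of pandera. It only shows the common `:param:`, `:returns:`, and `:raises:` fields plus a short doctest-style example. Check existing docstrings in the codebase for the exact layout the project prefers.

```python
def clip_values(values, lower, upper):
    """Clip each number in ``values`` to the closed interval [lower, upper].

    Hypothetical helper used only to illustrate RST docstring fields;
    it is not part of pandera.

    :param values: iterable of numbers to clip.
    :param lower: smallest value allowed in the result.
    :param upper: largest value allowed in the result.
    :returns: a list with every element limited to ``[lower, upper]``.
    :raises ValueError: if ``lower`` is greater than ``upper``.

    :example:

    >>> clip_values([-1, 5, 12], lower=0, upper=10)
    [0, 5, 10]
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    # Raise values below the lower bound, then cap values above the upper bound.
    return [min(max(v, lower), upper) for v in values]
```

A docstring like this documents each argument, the return value, and the error condition, and the example doubles as a quick check that the description matches the behavior.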
You can build the HTML documentation by running nox -s docs. The built documentation can be found in docs/_build.
Contributing Bugfixes¶
Bugs are reported under the bug label, so if you find a bug, create a new issue here.
Contributing Enhancements¶
New feature issues can be found under the enhancements label. You can request a feature by creating a new issue here.
Set up pre-commit¶
This project uses pre-commit to ensure that code standard checks pass locally before pushing to the remote project repo. Follow the installation instructions, then set up hooks with pre-commit install. After that, the black, pylint, and mypy checks should run with every commit.
Run the test suite locally¶
Before submitting your changes for review, make sure to check that your changes do not break any tests by running:
# if you're working with virtualenv
$ make nox
# if you're working with conda
$ make nox-conda
Making Pull Requests¶
Once your changes are ready to be submitted, make sure to push them to your fork of the GitHub repo before creating a pull request. Depending on the type of issue the pull request resolves, it should target the appropriate branch:
Bugfixes¶
branch naming convention: bugfix/<issue number> or bugfix/<bugfix-name>
pull request to: dev
Documentation¶
branch naming convention: docs/<issue number> or docs/<doc-name>
pull request to: release/x.x.x branch if specified in the issue milestone, otherwise dev
Enhancements¶
branch naming convention: feature/<issue number> or feature/<feature-name>
pull request to: release/x.x.x branch if specified in the issue milestone, otherwise dev
We will review your changes and might ask you to make additional changes before they are ready to merge. Once they are ready, we will merge them, and you will have successfully contributed to the codebase!
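The branch conventions above can be summarized in a small sketch. The helpers `branch_name` and `pull_request_target` are hypothetical and are not part of pandera or its tooling. The sketch assumes that a milestone naming a release looks like `MAJOR.MINOR.PATCH`, which is an assumption drawn from the project's use of semantic versioning.

```python
import re

# Prefixes taken from the branch naming conventions listed above.
BRANCH_PREFIXES = ("bugfix", "docs", "feature")

# Assumption: a release milestone is written as MAJOR.MINOR.PATCH, e.g. "1.2.3".
_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def _check_kind(kind):
    if kind not in BRANCH_PREFIXES:
        raise ValueError(f"unknown contribution type: {kind!r}")


def branch_name(kind, label):
    """Build a branch name from an issue number or a short descriptive name."""
    _check_kind(kind)
    return f"{kind}/{label}"


def pull_request_target(kind, milestone=None):
    """Return the branch a pull request of this kind should be opened against."""
    _check_kind(kind)
    # Bugfixes always go to dev.
    if kind == "bugfix":
        return "dev"
    # Docs and features go to the release branch when the issue milestone names one.
    if milestone is not None and _VERSION.match(milestone):
        return f"release/{milestone}"
    return "dev"


print(branch_name("docs", "fix-typo"))
print(pull_request_target("feature", milestone="1.2.3"))
```

When in doubt about the target branch, check the milestone on the issue you are resolving, since the sketch only mirrors the rules listed above.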
Questions, Ideas, General Discussion¶
Head on over to the discussion section if you have questions or ideas, want to show off something that you did with pandera, or want to discuss a topic related to the project.
Dataframe Schema Style Guides¶
We have guidelines regarding dataframe and schema styles that are encouraged for each pull request:
If specifying a single column DataFrame, this can be expressed as a one-liner:
DataFrameSchema({"col1": Column(...)})
If specifying one column with multiple lines, or multiple columns:
DataFrameSchema(
    {
        "col1": Column(
            int,
            checks=[
                Check(...),
                Check(...),
            ]
        ),
    }
)
If specifying columns with additional arguments that fit in one line:
DataFrameSchema(
    {"a": Column(int, nullable=True)},
    strict=True
)
If specifying columns with additional arguments that don’t fit in one line:
DataFrameSchema(
    {
        "a": Column(
            int,
            nullable=True,
            coerce=True,
            ...
        ),
        "b": Column(
            ...,
        )
    },
    strict=True)