10  Package maintenance

The following discussions on this page apply to both R and Python software unless otherwise noted.

10.1 Licenses

All software (R, Python, etc.) in the WILDS should by default use the MIT license.

If there are circumstances which prevent using MIT, please do discuss with Sean or Scott.

10.1.1 R

In R, you can add the MIT license to your repository with the following code:

if (!requireNamespace("usethis", quietly = TRUE)) {
  install.packages("usethis")
}
usethis::use_mit_license()

Note that usethis::use_mit_license() adds two files to your repository (LICENSE and LICENSE.md) and adds entries to the .Rbuildignore file so that R CMD check doesn’t complain.

10.1.2 Python

There’s no widely accepted single tool for dealing with licenses in Python similar to the above for R. For Python packages, simply include the license type (e.g., MIT) in your setup.py, setup.cfg, pyproject.toml, etc., and include the text of the license in a LICENSE file in the root of your repository.

10.1.3 WDL

Similarly, there is currently no universal standard for incorporating licenses within WDL packages. We recommend stating the license type in a comment at the top of the workflow script, and including the text of the license in a LICENSE file in the root of your repository.

10.1.4 Docker

License type can be specified within the corresponding Dockerfile of each container using the “license” label:

LABEL org.opencontainers.image.licenses=MIT

10.2 Package versioning

There is a detailed discussion of versioning R packages in the lifecycle chapter of the R Packages book by Hadley Wickham and Jenny Bryan. Please follow that chapter in general for versioning of R and Python packages within the WILDS. To make it easier to grok, below are some of the highlights, and some exceptions to that chapter.

10.2.1 Package version numbers

Package version numbers involve quite a bit of nuance, and a few surprises; see the Package version number section for details. For example, comparing version strings directly gives a surprising result, while the base R function package_version(), which parses package version strings into S3 objects, compares them component by component:

"2.0" > "10.0"
#> [1] TRUE
package_version("2.0") > package_version("10.0")
#> [1] FALSE

With that example in mind, please think carefully about your package versions before setting them.
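The same pitfall exists in Python. The following is a minimal sketch, not a WILDS tool: parse_version is a hypothetical helper that only handles purely numeric, dot-separated versions. In real projects, the third-party packaging library offers a more complete Version class.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version string into a tuple of integers (hypothetical helper)."""
    return tuple(int(part) for part in version.split("."))


# Plain strings compare character by character, so "2" sorts after "1".
string_result = "2.0" > "10.0"

# Integer tuples compare component by component, which is what we want.
parsed_result = parse_version("2.0") > parse_version("10.0")

print(string_result)  # True
print(parsed_result)  # False
```

Parsing into integers before comparing is what keeps 10.0 ordered after 2.0.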

10.2.2 WILDS Conventions

All of the software development work we do is versioned, regardless of project status.

In general we try to follow the principles of Semantic Versioning 2.0.0:

  • All software starts as version 0.1.0.
  • Bug fixes and hotfixes cause the patch number to increment: 0.1.0 → 0.1.1
  • Adding backwards compatible functionality causes the minor number to increment: 0.1.0 → 0.2.0
  • Adding non-backwards compatible functionality causes the major number to increment: 0.1.0 → 1.0.0
  • Minor number incrementation resets the patch number to zero: 0.1.3 → 0.2.0
  • Major number incrementation resets the minor and patch numbers to zero: 2.4.16 → 3.0.0
  • Developmental/incremental/fix/hotfix versions of software end with a fourth number starting with 9000. For example: 0.1.0.9000. The purpose of this number is that it can be incremented rapidly during the development process. For example, an individual working on a branch might increment the version from 0.1.0.9050 to 0.1.0.9061 over the course of multiple commits. This fourth number should be removed whenever we merge from dev to main (see the Branching section).
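The increment and reset rules above can be sketched in a few lines of Python. This is an illustration only: bump is a hypothetical function, not part of any WILDS tooling, and it assumes that any fourth development component is simply dropped when computing the next release.

```python
def bump(version: str, part: str) -> str:
    """Return the next release version after bumping 'major', 'minor', or 'patch'."""
    # Keep only the first three components; a development suffix such as
    # .9000 is dropped (an assumption of this sketch).
    major, minor, patch = (int(p) for p in version.split(".")[:3])
    if part == "major":
        return f"{major + 1}.0.0"  # resets minor and patch to zero
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # resets patch to zero
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("0.1.3", "minor"))   # 0.2.0
print(bump("2.4.16", "major"))  # 3.0.0
```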

Note that project status and version numbers are unrelated.

Following the Tidyverse package version conventions, WILDS packages will use the following conventions (see the link for more details):

  • Always use . as the separator, never use -.
  • A released version number consists of three components, <major>.<minor>.<patch>
  • While a package is in between releases, there is a fourth component: the development version, starting at 9000 (e.g., 0.2.2.9000), and incrementing from there until the package has another release at which point return to three components.
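As a rough sketch, these conventions can be checked with a regular expression. The names VERSION_PATTERN and classify_version are hypothetical, and treating a fourth component below 9000 as invalid is an assumption of this example rather than a stated rule.

```python
import re

# Three dot-separated integers, optionally followed by a fourth
# development component. Hyphens are never accepted as separators.
VERSION_PATTERN = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?")


def classify_version(version: str) -> str:
    """Return 'release', 'development', or 'invalid' for a version string."""
    match = VERSION_PATTERN.fullmatch(version)
    if match is None:
        return "invalid"
    dev = match.group(4)
    if dev is None:
        return "release"
    # Development components start at 9000 and increment from there.
    return "development" if int(dev) >= 9000 else "invalid"


print(classify_version("0.2.2"))       # release
print(classify_version("0.2.2.9000"))  # development
print(classify_version("0.2.2-9000"))  # invalid
```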

10.2.3 Ignore these

We are not following or enforcing any rules about changes at the function/class/etc level below the package level. For example, the R Packages book talks about using the lifecycle package to deal with function changes.

10.3 Branching

Overall the branching lifecycle can be summed up by the following:

  1. When you want to add new content to a repository, first make sure your local dev branch is up to date with the remote dev branch, then create a new branch from the dev branch.
  2. Make changes to that branch, then open a pull request against the dev branch.
  3. Once the contributions are reviewed and you get approval from a project lead, your pull request will be merged into the dev branch.
  4. Whenever project leads want to create a new release, they merge the dev branch into the main branch and tag a new release.
  5. Urgent fixes to the main branch can be made by creating a branch from main that begins with the hotfix- prefix, which is then merged directly into main via pull request and code review. The dev branch will then be updated to reflect the changes on main via pull request. We should do our best to avoid this pattern. Since this changes the main branch, there will be a new tagged release.

We try to follow the principles set out in Gitflow (Chapter 3) with a few modifications:

  • We use the main branch instead of the master branch, and we use the dev branch instead of the develop branch.
  • We don’t use release branches; instead, we create release tags from the main branch. Therefore we shouldn’t use the release- prefix at all.

Every project should have a main branch which contains the most stable version of the software, or the version of the software that corresponds to the most recent release of the software.

Prototype status projects must have all of the above and the following:

  • At least two but no more than three designated project leads (specified in the CODEOWNERS file).
  • main branch protections such that main can only be updated by a pull request from dev or a branch that begins with hotfix-, and a review from at least one project lead.
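The main branch rule can be expressed as a small predicate. This is only an illustration of the logic: may_merge_into_main and its parameters are hypothetical, and in practice the rule would be enforced through the repository's branch protection settings, not through code like this.

```python
def may_merge_into_main(source_branch: str, approvals_from_leads: int) -> bool:
    """Check the rule: source is dev or hotfix-*, with at least one project lead review."""
    allowed_source = source_branch == "dev" or source_branch.startswith("hotfix-")
    return allowed_source and approvals_from_leads >= 1


print(may_merge_into_main("dev", 1))               # True
print(may_merge_into_main("hotfix-login-bug", 0))  # False, no lead review yet
print(may_merge_into_main("feature-x", 2))         # False, wrong source branch
```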

Stable status projects must have all of the above and the following:

  • Code releases that correspond to specific git tags.

10.4 Package releases

In general follow the Releasing to CRAN chapter in the book R Packages for R, and the Releasing and versioning chapter in the book Python Packages for Python. Those chapters don’t have to be followed to the letter, but in general they provide really good guidance.

There are a few aspects of releases we are opinionated about and would like all WILDS R and Python packages to follow:

  • Follow our versioning guidelines above (Section 10.2).
  • Every merge from dev into main should constitute a release, which should generate a tagged version of the software and an increment to the version number.
  • Every time the version number is incremented, the NEWS.md file should be updated describing what changes to the software have been made in the new version.
  • If a software package is distributed on an archive like CRAN or PyPI, wait to update the main branch until that software has been accepted by the archive.
  • How we create tagged releases on GitHub:
    • The tag itself should be called v[Version Number] (like v0.1.0 or v2.4.1).
    • Describe each update and change in the release in a bullet point.
    • To the greatest extent possible, link each update or change to an issue, pull request, and to the developers that led the implementation.
    • See https://github.com/getwilds/proof/releases/tag/v1.0 for a great example of how to structure a release.