Skip to content

Style Guide

Dylan Beaudette edited this page Mar 8, 2022 · 12 revisions

Overview

This document outlines some general guidelines for maintaining consistency in coding style, introduction of breaking changes, documentation, and overall design within aqp. We should adhere to these suggestions to the extent that the they are helpful; none of us have the time to constantly nit-pick each other's work.

Please keep the main ("production") branch reserved for working, documented, and well-tested code. This generally means code whose purpose has been discussed and generally agreed upon by main contributors.

Other People's Code

Please be considerate when making suggestions for major changes to other's code, put major change suggestions into an alternate branch, and propose via pull request. This also applies to blocks of comments that may include notes yet-to-be-moved to an issue or other such documentation.

Dependencies

Dependencies should be kept to a minimum.

Adding and removing dependencies can require significant coordination in the form of one or more Issues and Pull Requests. Also, it may be necessary to do testing with downstream packages (soilDB, sharpshootR, plotKML, GSIF), especially where compiled code dependencies may be involved.

There are some existing issues related to replacements of dependencies.

For instance:

  • replacing plyr and reshape with data.table and making the SPC {data.table}-driven #157

Code

See #140 for active discussion.

Committing to the master branch (GitHub)

Most significant commits to master should be through a Pull Request

Commits:

  • Should pass all expected tests - devtools::test(). If tests do not pass, explanation for change in test behavior.

  • Build and check documentation - devtools::check_man(). Only need to commit related changes.

  • R CMD CHECK passes w/ run-donttest examples - devtools::check(run_dont_test = TRUE)

The master branch is regularly downloaded by our users, and contributors need it as a stable "base" for their work. Do yourself and others the favor of taking time to review your contributions in a variety of ways.

The master branch /R folder is not the place to store "experimental" code, or code that has not been discussed as to how it will be integrated into the package. This stuff is "live" in the NAMESPACE as soon as it is pushed to master. You CAN put code and demos in /misc as long as it is reasonably polished such that someone else could follow it.

Folks are encouraged to make a branch (if they are part of the aqp-package-development group), or fork the package and make a branch and submit a pull request (PR) even for small changes.

Notes on Pull Requests

GitHub Actions are set up to run checks for any PR against master, so it is a great way to check, revise and even show off your work.

Simply reviewing your own code in a different format or a different data set/source may be enough to reveal something you didn't catch.

Other contributors may have ideas about how your code relates to other parts in the package, or suggestions. Suggestions from contributors are generally not hard requirements if a robust discussion is had about the needs. The main feature of GitHub for managing our version control repositories is a rich API for collaboration and code comparison/review. We use a variety of ways of sharing information and code and like to leave options open and flexible for contributors.

Requirements

At this time it is not "required" to use a PR to commit to the master (production) branch. Nor is there a hard requirement that R CMD check or other Actions pass. There is a strong suggestion that you do have all that sorted out, but the repository will not prevent the commit from occurring.

Bug "hot" fixes, limited changes that do not dramatically affect or relate to existing functionality (i.e. all tests passing for a new argument to existing function), and commits to demos in the /misc folder are all acceptable things for direct-to-master. Care should be taken with e.g. function argument naming, and changes to code contributed or recently altered by others.

A PR is strongly encouraged where any coordination with existing functionality is warranted. Discussion and concise ways of reviewing code become necessary in this case, and it is too challenging to track if things are getting mixed into the production branch. All individuals who have permission to commit to the production branch should strongly consider contacting other contributors to the package before adding anything new.

Function Design

Kent Beck gave four design rules that many programmers have accepted as valuable. An interpretation in parentheses. These are good things to consider when adding a new function to the NAMESPACE or considering adding/exposing additional features.

  • Passes the tests (write tests and then write function to specifications)
  • Reveals intention (clear, expressive, consistent)
  • No duplication (D.R.Y -- don't repeat yourself)
  • Fewest elements (be concise)

Martin Fowler summarizes them well here: https://martinfowler.com/bliki/BeckDesignRules.html. The rules are in priority order, so "passes the tests" takes priority over "reveals intention"

Aspirations:

  • optimize for intuition ("estimate a thing") and accurate code-completion (estimate...) vs. pure style adherance
  • relatively "uniform" syntax that reflects a cohesive, well-thought out package
  • minimal number of keystrokes, while still providing essential context as to what a function does
  • function names should include verbs -- they "do" something
  • arguments (names) that refer to the same data for similar functions should do similar things
  • some overlap of functionality is inevitable, but should be minimized and well justified

Documentation

Use Roxygen for all functions and datasets. We currently need to manually add to NAMESPACE, as Roxygen is not present to generate it (yet). Please strongly consider adding @importFrom where appropriate! Try to use the most recent {roxygen2} package from CRAN and keep the version the package is using up to date.

A stylistic note: do not over-emphasize that code is experimental except in so far as it may pertain to potential instabilities in the interface. All code in the production form of the package should have documentation that is good enough to show to your boss and/or someone who was skeptical of the package.

Markdown in Roxygen

Now you can use Markdown syntax in Roxygen which makes adding code and formatting a breeze.

Documentation checks

Code committed to the production branch should be passing all minimal R CMD CHECK documentation checks. You should run devtools::check_man() regularly to build the latest .Rd files and commit them along with your pull request or commits to the master branch. A good practice is to update the documentation after you have it working the way you want, and commit all affected .R/.Rds as a single related commit.

Do not make other people build your documentation for you if you can avoid it--but it does happen. If you do encounter an un-built documentation file not related to your work you are not obligated to commit it, but you can.

Use and Formatting of Roxygen Blocks and Tags

It is not required that Roxygen tags be explicitly specified where they can be inferred e.g. @title @description @details etc.; but it should be easy for a human reader to discern what is what.

You can specify anything explicitly that you want, but in general it is better not to over-specify things like @usage as much of the .Rd can be generated from the function definition adjacent to the Roxygen block. Roxygen blocks should be immediately adjacent to the object they document in the code.

Make appropriate use of white-space and new lines in Roxygen. Use Ctrl+Shift+/ to re-flow code to your RStudio set margin (Global Options >> Code >> Display >> Margin Column).

There is not currently a standard width specified for the package. Some individuals prefer to put paragraphs all on one line and rely on soft-wrapping. But most code bases using this type of in-source documentation use a fixed-width format.

A Margin Column value of 130 or so is probably reasonable balance between number of lines and readability, but not as a hard requirement -- with some manual correction for long URLs. and unusual formatting needs.

Testing

CRAN Release Cycle

Authorship

Package, functions, citations, documentation.