The following are a set of general expectations for the Taskcluster team. While some of the management-related expectations will not apply directly to community contributors, they still provide good guidance as to behavioral norms and accepted practice.
- Team members should do their utmost to adhere to expectations laid out in this document. While these are guidelines, not hard-and-fast rules, breaking them should always be accompanied by a solid rationale, communicated to those affected or the team at large.
- Team members should hold others accountable and indicate to them when they feel they are not meeting expectations. Persistent problems should be raised to management. Without collective accountability, these expectations will be ineffective.
- The Taskcluster team operates in multiple timezones on 3 different continents. Finding meeting times that work for everyone is challenging. We try to keep whole-group meetings to a minimum, but smaller meetings for subsets of interested people are encouraged. Information that comes out of smaller meetings should be shared with the group via email, bug comment, or documentation update. Decisions that come out of meetings still need to weather the RFC process (see below).
- The team commits to following relevant channels to stay informed. This includes bugmail and github notifications for important repositories like RFCs, minimum:
- bugzilla NIs
- github PRs
- Slack notifications
- Element pings
- calendar invites
- direct email
- When someone asks a question, give them the benefit of the doubt with the assumption that they’ve made an effort to figure it out. With that assumption, helpfully answer their question, rather than just directing them to the manual. “Being charitable” could describe this. This can be particularly challenging for community contributors, but the same logic applies.
- Similarly, when you need help, do at least some upfront research, and try to keep questions specific. Explaining the context of your problem is extremely helpful.
- We should switch to synchronous communication (zoom or other video chat) when asynchronous communication (email, bug comments) are artificially lengthening cycle times.
- Team members who do not work in Mozilla Standard Time (US Pacific Time) should be explicit about when they are available for meetings. They are not expected to be available in their evenings unless they choose to make themselves available.
- Interpersonal issues should be brought to management ASAP.
- Don’t suggest you will simply re-implement something as a way of shutting down discussion about a particular issue.
- Taskcluster manages major changes to the platform through "requests for comment", known as RFCs. These provide an open, transparent decision-making process and a way to track ideas from initial proposal through decision and implementation.
- Taskcluster's RFCs are stored in the taskcluster-rfcs repository. The RFC process is documented in the mechanics doc.
- RFCs should be written after establishing general agreement among the involved people about the issue and the direction to take.
- Design decisions can be revisited for security reasons or if there are significant concerns.
- Quick experiments can and should still happen in this model, but don't get too attached to them unless you are using them to strengthen a new or existing RFC. 30% reviews are a good way to gauge the appropriateness of a given experiment.
- Security concerns are an important part of the RFC process, i.e. security cannot be left to the implementation and review phase.
- New code should conform to the established Taskcluster best practices. Best efforts should be made to bring existing code up to those standards when changes are made. If best practices don't exist for a given area or language, the author should establish some with reference to other similar Taskcluster sources or industry best practices.
- For larger coding tasks, the author should seriously consider splitting it up and figuring out what can be committed from what has already been completed.
- Github Pull Requests should be assigned to two alternative reviewers, of which only one is required to perform the review, unless either the author or the reviewer explicitly request additional review from another party in the PR conversation.
- For PRs where the author does not have permission to merge the PR, the last reviewer to accept the changes should merge the PR on approval of the changes.
- For PRs where the author does have permission to merge the PR, the reviewer should NOT merge the PR, unless explicitly requested by the author in the PR conversation. This allows the PR author to be responsible for handling any issues resulting from the landing.
- Any changes landed to the main branch of the mozilla/community-tc-config github repo should be applied to the community-tc taskcluster deployment as soon as they land, by whoever landed the changes (until such time that this happens in automation).
- During code review, the author should, upon receiving feedback, assume that the reviewer had the best intentions during the review. At the same time, the reviewer should not give feedback in a negative way that could be taken personally. (See the article How to Do Code Reviews Like a Human (Part One) for more on this.)
- It is completely acceptable for the author to push back and discuss comments given during code review. The reviewer is not a gatekeeper to the “holy main branch”; we are a team working to build software. While we acknowledge that some services or areas of code have de facto owners, if there is disagreement about strategy or implementation, the larger group should be involved to decide on the best course or action. Where possible, these decisions should be reflected back into the best practices documentation.
- If there is a large or persistent disconnect between author and reviewer, setup a synchronous meeting to work through it. Management can facilitate as required.
- The author should have run the code, thoroughly tested their changes, and executed the test suite before posting their changes for review. Note: automated CI testing like travis is acceptable, even preferable, provided it exists for the repo in question AND is also augmented for your changes as required.
- The author should have reviewed their own changes critically, in the mindset of a reviewer, before formally requesting review. It is not acceptable to submit code for review immediately after the author has confirmed its functionality, as that is only part of writing good software.
- If an author needs additional help in testing, they should explicitly request it and give a reason as to why further manual testing is required. Otherwise there is no assumption that the fix/change will be tested by the reviewer. The reviewer is of course free to do so if they feel this will help the review.
- For reviewers, read the code before you review, i.e. don’t offer off-the-cuff responses based on what you assume is in the patch.
- When a reviewer approves a patch, they should be confident that they fully understand every change made. They should have carefully looked through test and doc changes. They should have looked for ways to break the code, due to bad networks, system failures, malicious users, attackers, etc. If the reviewer didn’t do a thorough review, e.g. the reviewer is requesting architectural changes, then the reviewer should note this.
- Reviewers should not demand scope creep/bloat when reviewing. File follow-up bugs for related improvements.
- It is OK to decline a review request, or bring in an alternate reviewer.
- Individual team members should set their own expectations for review turnaround, but should generally adhere to the Mozilla standard of 1 business day.
- Everyone involved in reviews should watch for potential security issues. Security-sensitive changes should involve multiple reviewers.
- Service requests should be handled same-day. Make sure the correct team is acting on the service request, e.g. releng should handle scope changes for Firefox CI.
- The QA contact is responsible for regular triage of their component(s). The weekly group triage meeting can be used to discuss “tricky” bugs.
- Open github issues will be reviewed weekly to ensure nothing is falling through the cracks.
- For bugs that won’t be tackled immediately (lack of resources, design decisions, etc), the last comment in the bug from the TC team should explain the rationale behind that decision. This applies whether the bug is to be resolved or not.
- Bugs marked as P5 are valid issues with no plans to fix. This is done to exclude these bugs from regular triage.
- Bugs marked as “blocker” are keeping the trees closed and require immediate action.
- One person should coordinate each outage using the #taskcluster Element channel. This coordinator is responsible for communicating outage status to interested parties (sheriffs, bugzilla updates, etc). The coordinator doesn’t necessarily need to come from the Taskcluster team.
- During an outage, the coordinator can call on whatever resources they need to unblock the issue.
- If an outage extends beyond the end of a coordinator’s working day, coordination should be explicitly handed off to someone else
- We will hold blameless post-mortems for all outages.
- Retrospective documentation for all outages should be documented in the retrospectives repository.