Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: add a github bot to support advanced PR review workflows #3037

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
113 changes: 113 additions & 0 deletions .github/workflows/bot.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
name: GitHub Bot

on:
# Watch for changes on PR state, assignees, labels, head branch and draft/ready status
pull_request_target:
types:
- assigned
- unassigned
- labeled
- unlabeled
- opened
- reopened
- synchronize # PR head updated
- converted_to_draft
- ready_for_review
Comment on lines +7 to +15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All existing types for reference:

- assigned
- unassigned
- labeled
- unlabeled
- opened
- edited
- closed
- reopened
- synchronize
- converted_to_draft
- locked
- unlocked
- enqueued
- dequeued
- milestoned
- demilestoned
- ready_for_review
- review_requested
- review_request_removed
- auto_merge_enabled
- auto_merge_disabled

Why don't just run on all types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only included the ones that were used by the existing conditions/requirements of the bot. It seems better to me than subscribing to all possible events for several reasons :

  • It makes it clearer which events the bot uses
  • in term of security, it follows the PoLP (even though a trigger is not exactly a privilege)
  • it prevents the workflow being triggered and launching two runners for nothing

With the idea that if we added, for example, a condition/requirement based on adding to/removing from a milestone, we would just add the corresponding trigger event to the workflow.


# Watch for changes on PR comment
issue_comment:
types: [created, edited, deleted]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same, why don't just run on all types and let the bot decide what to do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.


# Manual run from GitHub Actions interface
workflow_dispatch:
inputs:
pull-request-list:
description: "PR(s) to process: specify 'all' or a comma separated list of PR numbers, e.g. '42,1337,7890'"
required: true
default: all
type: string

jobs:
# This job creates a matrix of PR numbers based on the inputs from the various
# events that can trigger this workflow so that the process-pr job below can
# handle the parallel processing of the pull-requests
define-prs-matrix:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand this job. Do you mind to explain it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sorry if it's not clear enough, so there are two constraints:

  • First, we need to prevent multiple runners from processing the same PR at the same time (to avoid any bug)
  • Second, there are different events that can trigger this workflow, and not all of them provide the PR number(s) to process the same way.

So the goal of this job (define-prs-matrix) is first to establish a list of PR numbers to process from the different possible events, then to format this list as a matrix so that the next job (process-pr) can launch a runner for each PR number in the matrix while ensuring (through the 'concurrency' feature of GitHub Actions) that no PR is processed concurrently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aeddi some additional questions here:

  • Is there a scenario where one PR can block another PR? Or concurrency only happen when manually triggered?
  • How much time does a job usually take to complete? If one PR can block another PR and take 10 minutes (e.g., long process, network issue, ...) to complete, it might start to be a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The concurrency parameter of the job is really set to prevent a PR X from being processed by two runners in parallel. For example, if PR 1111 is already being processed by one runner and a new event (any event, manual or not) triggers the job to also process PR 1111, the new runner will be queued and will wait for the first runner to finish execution before starting the job. However, the processing of PR 2222 can occur without any issues in parallel with that of 1111.
  • Never saw a run last longer than 1 min, on average the execution time is around 40 secs (and one PR can't block another one anyway ;) )

name: Define PRs matrix
# Prevent bot from retriggering itself
if: ${{ github.actor != vars.GH_BOT_LOGIN }}
runs-on: ubuntu-latest
permissions:
pull-requests: read
outputs:
pr-numbers: ${{ steps.pr-numbers.outputs.pr-numbers }}

steps:
- name: Parse event inputs
id: pr-numbers
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
# Triggered by a workflow dispatch event
if [ '${{ github.event_name }}' = 'workflow_dispatch' ]; then

# If the input is 'all', create a matrix with every open PRs
if [ '${{ inputs.pull-request-list }}' = 'all' ]; then
pr_list=`gh pr list --state 'open' --repo '${{ github.repository }}' --json 'number' --template '{{$first := true}}{{range .}}{{if $first}}{{$first = false}}{{else}}, {{end}}{{"\""}}{{.number}}{{"\""}}{{end}}'`
[ -z "$pr_list" ] && echo 'Error: no opened PR found' >&2 && exit 1
echo "pr-numbers=[$pr_list]" >> "$GITHUB_OUTPUT"

# If the input is not 'all', test for each number in the comma separated
# list if the associated PR is opened, then add it to the matrix
else
pr_list_raw='${{ inputs.pull-request-list }}'
pr_list=''
IFS=','
for number in $pr_list; do
trimed=`echo "$number" | xargs`
pr_state=`gh pr view "$trimed" --repo '${{ github.repository }}' --json 'state' --template '{{.state}}' 2> /dev/null`
[ "$pr_state" != 'OPEN' ] && echo "Error: PR with number <$trimed> is not opened" >&2 && exit 1
done
echo "pr-numbers=[$pr_list]" >> "$GITHUB_OUTPUT"
fi

# Triggered by comment event, just add the associated PR number to the matrix
elif [ '${{ github.event_name }}' = 'issue_comment' ]; then
echo 'pr-numbers=["${{ github.event.issue.number }}"]' >> "$GITHUB_OUTPUT"

# Triggered by pull request target event, just add the associated PR number to the matrix
elif [ '${{ github.event_name }}' = 'pull_request_target' ]; then
echo 'pr-numbers=["${{ github.event.pull_request.number }}"]' >> "$GITHUB_OUTPUT"

# Should never happen
else
echo 'Error: unknown event ${{ github.event_name }}' >&2 && exit 1
fi

# This job processes each pull request in the matrix individually while ensuring
# that a same PR cannot be processed concurrently by mutliple runners
process-pr:
name: Process PR
needs: define-prs-matrix
runs-on: ubuntu-latest
strategy:
matrix:
# Run one job for each PR to process
pr-number: ${{ fromJSON(needs.define-prs-matrix.outputs.pr-numbers) }}
concurrency:
# Prevent running concurrent jobs for a given PR number
group: ${{ matrix.pr-number }}

steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Install Go
uses: actions/setup-go@v5
with:
go-version-file: go.mod

- name: Run GitHub Bot
working-directory: contribs/github-bot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More than a bot is a script, right? I was expecting a server handling events through the GitHub API with a state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we defined the specifications with @moul and @thehowl (I can't remember who else was on the call we had in Turin about it), one of the requirements was that the bot shouldn't run on a server but that everything should be processed through GitHub Actions.

It makes things more complicated to test and the manual checks more complicated to implement using the comment body as their state instead of a DB or similar, but it has the advantage of having a piece of code that runs in a serverless way.

But if the name is misleading, we can rename it.

env:
GITHUB_TOKEN: ${{ secrets.GH_BOT_PAT }}
run: go run . -pr-numbers '${{ matrix.pr-number }}' -verbose
51 changes: 51 additions & 0 deletions contribs/github-bot/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
# GitHub Bot

## Overview

The GitHub Bot is designed to automate and streamline the process of managing pull requests. It can automate certain tasks such as requesting reviews, assigning users or applying labels, but it also ensures that certain requirements are satisfied before allowing a pull request to be merged. Interaction with the bot occurs through a comment on the pull request, providing all the information to the user and allowing them to check boxes for the manual validation of certain rules.

## How It Works

### Configuration

The bot operates by defining a set of rules that are evaluated against each pull request passed as parameter. These rules are categorized into automatic and manual checks:

- **Automatic Checks**: These are rules that the bot evaluates automatically. If a pull request meets the conditions specified in the rule, then the corresponding requirements are exectued. For example, ensuring that changes to specific directories are reviewed by specific team members.
- **Manual Checks**: These require human intervention. If a pull request meets the conditions specified in the rule, then a checkbox that can be checked only by specified teams is displayed on the bot comment. For example, determining if infrastructure needs to be updated based on changes to specific files.

The bot configuration is defined in Go and is located in the file [config.go](./config.go).

### GitHub Token

For the bot to make requests to the GitHub API, it needs a Personal Access Token. The fine-grained permissions to assign to the token for the bot to function are:

- `pull_requests` scope to read is the bare minimum to run the bot in dry-run mode
- `pull_requests` scope to write to be able to update bot comment, assign user, apply label and request review
- `contents` scope to read to be able to check if the head branch is up to date with another one
- `commit_statuses` scope to write to be able to update pull request bot status check

## Usage

```bash
> go install github.com/gnolang/gno/contribs/github-bot@latest
// (go: downloading ...)

> github-bot --help
This tool checks if the requirements for a PR to be merged are satisfied (defined in config.go) and displays PR status checks accordingly.
A valid GitHub Token must be provided by setting the GITHUB_TOKEN environment variable.

-dry-run
print if pull request requirements are satisfied without updating anything on GitHub
-owner string
owner of the repo to process, if empty, will be retrieved from GitHub Actions context
-pr-all
process all opened pull requests
-pr-numbers value
pull request(s) to process, must be a comma separated list of PR numbers, e.g '42,1337,7890'. If empty, will be retrieved from GitHub Actions context
-repo string
repo to process, if empty, will be retrieved from GitHub Actions context
-timeout uint
timeout in milliseconds
-verbose
set logging level to debug
```
Loading
Loading