MegaLinter speed optimization #3461
wesley-dean-flexion started this conversation in General
Replies: 1 comment
@wesley-dean-flexion this is a perfect illustration that MegaLinter is a tool, but each project owns the strategy around the tool :) I think nobody uses 100% of the default MegaLinter configuration, but we have to start somewhere :) I think you and your team have worked on your configuration enough to optimize it very well, but if you share the table with the list of linters and their execution times, maybe we can find even more time savings ^^
Overview
I was working with a team on a project that involved a SaaS solution in the Sales space (one that's quite Forceful, for what it's worth). The engagement was primarily about cost-cutting, specifically the amount of billable time GitHub Actions was using. The organization's 6,000-minute monthly quota was being used up before the end of the month, so the team was asked to look for ways to cut back on the billable time spent running GitHub Actions.
Methodology
First, we looked at the Elapsed Time column as reported by the GitHub Comment reporter.
(The example image of the reporter's output came from the MegaLinter repository (docs/assets/images/), not from the project being optimized; it's provided only to show the columns in sample output.)
The linters that took longer to run received the most attention. Linters that only took a second or two received no attention.
Then, we looked at the number under the Found column. If there were hundreds or thousands of findings consistently across many runs and the number didn't go down, we quickly concluded that the results of the scanner weren't being used and the linter could likely be disabled. This wasn't intended as a commentary on the quality of the work or the quality of the scanner -- we took the pragmatic perspective of what yielded the most benefit for the least cost (so, a Return on Investment decision).
Changes
These are a few of the changes we made to bring the costs down:
Switch the trigger from push to pull_request
Initially, the team had MegaLinter trigger when developers pushed code up to the repo. This was helpful in keeping the code in compliance with their coding style guidelines; however, it meant that every time anyone did anything, MegaLinter ran. We changed it to only run when a Pull Request is opened to merge branches into `main` or `master`, or when manually initiated. This change was made in the project's `.github/workflows/megalinter.yml` file.
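For illustration, a minimal sketch of such a trigger in standard GitHub Actions syntax, assuming the branch names mentioned above. This is not the project's actual file, and the rest of the workflow (jobs, permissions, etc.) is omitted.

```yaml
# Hypothetical excerpt of .github/workflows/megalinter.yml (not the project's actual file)
on:
  # Run only when a pull request targets main or master
  pull_request:
    branches:
      - main
      - master
  # Allow the workflow to be started manually from the Actions tab
  workflow_dispatch:
```

Dropping the `push` trigger is what removes the "every time anyone did anything" runs. `workflow_dispatch` keeps an escape hatch for on-demand scans.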
This was a huge reduction in the number of times MegaLinter ran, especially when `APPLY_FIXES` was set.

We also found that having `APPLY_FIXES` set meant that every time a linter fixed something, the developer would have to pull from the repo after MegaLinter finished in order to pick up the most recent changes; when they didn't, they would receive messages saying that their (local) branch was out of date.

Switch to a smaller flavor
The team had already been using a flavor rather than the full image (the full v7.10.0 image is about 3.34 GB). That flavor ran the linters they actually needed, but it also included 4 linters that, while relevant to the project, weren't being used. Because the project hadn't used MegaLinter from the start (i.e., it was adopted after several years of development), there were a bunch (!!!) of linter findings that the team had no intention of addressing. The findings were reasonable and correct; they just weren't relevant to the project in its current state. The `ci_light` flavor included the tools they wanted to use (e.g., GitLeaks, Grype, Secretlint, Trivy, and TruffleHog, among others). We considered using the `security` flavor (1.05 GB) but decided against it: there was no IaC in the repo, so there was no need to run KICS, Checkov, tflint, etc.

As a result, going from the flavor they had been using down to the `ci_light` flavor cut the size of the image being pulled from 1.45 GB down to 0.49 GB.
This change was made in the project's `.github/workflows/megalinter.yml` file. In short, they eliminated the tools they weren't using and cut the size of the image by about two thirds.
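The following sketch shows what the corresponding job might look like. It assumes MegaLinter's per-flavor GitHub Action path (`oxsecurity/megalinter/flavors/ci_light`) and a `@v7` tag. The post doesn't give the team's exact version pin or any other environment variables they set, so treat those details as placeholders.

```yaml
# Hypothetical excerpt of .github/workflows/megalinter.yml (not the project's actual file)
jobs:
  megalinter:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          # Fetch full Git history, as MegaLinter's workflow template does
          fetch-depth: 0

      # Pull the much smaller ci_light flavor instead of the full MegaLinter image
      - name: MegaLinter
        uses: oxsecurity/megalinter/flavors/ci_light@v7
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```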
Tell GitLeaks to only scan the current commit
GitLeaks is used to detect secrets (credentials, tokens, API keys, passwords, etc.) stored in files in the repository. Generally speaking -- and this is just my personal opinion -- it's usually not great to store secrets in the source code for an application.
By default, GitLeaks detects whether the content being scanned is a Git project (generally a safe assumption given that it was running as a GitHub Action and had a `.git/` directory). As a result, it scans the repository and its entire history for secrets.

Once we established that there were no secrets in the history of the repository, we decided to accept the risk of having MegaLinter scan only the current state of the repository and not its entire history. We judged that the risk was acceptable given that the project was closed-source, only signed commits were accepted, and the `main` branch required approved PRs before other branches could be merged in.

This tweak cut GitLeaks runtime from 50 seconds down to 4 seconds.
To implement this decision, we configured MegaLinter to pass the `--no-git` flag to GitLeaks in the project's `.mega-linter.yml` file.
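Here is a minimal sketch of that setting. It assumes MegaLinter's per-linter `<LINTER_KEY>_ARGUMENTS` option and the `REPOSITORY_GITLEAKS` linter key; check both against the MegaLinter documentation for your version.

```yaml
# Hypothetical excerpt of .mega-linter.yml (not the project's actual file)
# Pass --no-git so GitLeaks treats the checkout as a plain directory and
# scans the files currently present instead of walking the full Git history
REPOSITORY_GITLEAKS_ARGUMENTS: "--no-git"
```

The flag is quoted so the leading dashes are unambiguously read as a string value.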
Only scan updated files
The team's concern with only scanning updated files was that they wanted the security-related tooling to run on all of the files all of the time, not just on updated files, so that as the tooling improved and was able to detect more potentially problematic situations, those improvements would apply to the whole codebase.
The security-related scanners we were using were generally in the `REPOSITORY_*` group. Reading the documentation for these linters showed that the ones we were using typically included a note to the effect that, even if `VALIDATE_ALL_CODEBASE` was set to `false`, they would still run. The team decided that this was acceptable and updated the `.mega-linter.yml` file accordingly.
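A minimal sketch of the relevant `.mega-linter.yml` entry, assuming the standard `VALIDATE_ALL_CODEBASE` option; all other settings are omitted.

```yaml
# Hypothetical excerpt of .mega-linter.yml (not the project's actual file)
# Lint only updated files; per the linters' documentation, the REPOSITORY_*
# security linters still run regardless of this setting
VALIDATE_ALL_CODEBASE: false
```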
Other tweaks
We made some other tweaks, such as disabling Trivy-SBOM (we weren't building anything that would consume an SBOM) and limiting the scope of the formatting linters (jsonlint, v8r, Prettier, etc.). However, these changes did not yield a noticeable improvement.
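The next sketch shows how these tweaks might be expressed in `.mega-linter.yml`. The `REPOSITORY_TRIVY_SBOM` and `JSON_JSONLINT` linter keys and the `<LINTER_KEY>_FILTER_REGEX_INCLUDE` option are assumptions to verify against the MegaLinter documentation. The `config/` path is purely hypothetical.

```yaml
# Hypothetical excerpt of .mega-linter.yml (not the project's actual file)

# Skip Trivy's SBOM generation: nothing in the project consumes an SBOM
DISABLE_LINTERS:
  - REPOSITORY_TRIVY_SBOM

# Limit jsonlint to files under a hypothetical config/ directory
JSON_JSONLINT_FILTER_REGEX_INCLUDE: "(config/)"
```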
Overall
Does anyone have any thoughts on ways to further improve runtime performance?