Counting the number of CompatHelper installations #56

Open
DilumAluthge opened this issue May 11, 2021 · 3 comments

DilumAluthge commented May 11, 2021

I'm interested in answering the following questions for the packages in the General registry.

For the purpose of this analysis, let us define "package X has CompatHelper installed" to mean that, on the default branch of the Git repo for package X, there is a file named .github/workflows/CompatHelper.yml. (I'm open to a better definition.)

  1. How many packages in the General registry have CompatHelper installed?
  2. How many unique individual user accounts (i.e. not organization accounts) have CompatHelper installed on one or more of their packages?
  3. How many unique organization accounts (i.e. not individual user accounts) have CompatHelper installed on one or more of their packages?

Would it be possible for me to use PackageAnalyzer.jl to help me answer these questions?

giordano commented May 11, 2021

We ignore it:

# Exclude TagBot and CompatHelper
filter(f -> lowercase(f) ∉ ("compathelper.yml", "tagbot.yml"), files)

😛
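
For reference, here is a minimal standalone sketch of that exclusion, run on a made-up list of workflow file names (the list is purely illustrative and not taken from any package). One thing worth keeping in mind: `filter` returns a new vector and leaves its argument untouched, so the result has to be assigned, or `filter!` used, for the exclusion to affect later code.

```julia
# Illustrative input only; not the workflows of any real package.
files = ["CI.yml", "CompatHelper.yml", "TagBot.yml"]
excluded = ("compathelper.yml", "tagbot.yml")

# Non-mutating: returns a new vector, `files` itself is unchanged.
ci_files = filter(f -> lowercase(f) ∉ excluded, files)

# Mutating variant: removes the excluded entries from `files` in place.
filter!(f -> lowercase(f) ∉ excluded, files)
```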

ericphanson commented May 12, 2021

It would be pretty easy to modify PackageAnalyzer to answer (1). All the "analysis" takes place in analyze_path:

"""
    analyze_path(dir::AbstractString; repo = "", reachable=true, subdir="", auth::GitHub.Authorization=github_auth(), sleep=0) -> Package

Analyze the package whose source code is located at the local path `dir`. If
the package's repository is hosted on GitHub and `auth` is a non-anonymous
GitHub authentication, wait for `sleep` seconds before collecting the list of
its contributors.
"""
function analyze_path(dir::AbstractString; repo = "", reachable=true, subdir="", auth::GitHub.Authorization=github_auth(), sleep=0)
    # we will look for docs, tests, license, and count lines of code
    # in the `pkgdir`; we will look for CI in the `dir`.
    pkgdir = joinpath(dir, subdir)
    name, uuid, licenses_in_project = parse_project(pkgdir)
    docs = isfile(joinpath(pkgdir, "docs", "make.jl")) || isfile(joinpath(pkgdir, "doc", "make.jl"))
    runtests = isfile(joinpath(pkgdir, "test", "runtests.jl"))
    travis = isfile(joinpath(dir, ".travis.yml"))
    appveyor = isfile(joinpath(dir, "appveyor.yml"))
    cirrus = isfile(joinpath(dir, ".cirrus.yml"))
    circle = isfile(joinpath(dir, ".circleci", "config.yml"))
    drone = isfile(joinpath(dir, ".drone.yml"))
    azure_pipelines = isfile(joinpath(dir, "azure-pipelines.yml"))
    buildkite = isfile(joinpath(dir, ".buildkite", "pipeline.yml"))
    gitlab_pipeline = isfile(joinpath(dir, ".gitlab-ci.yml"))
    github_workflows = joinpath(dir, ".github", "workflows")
    if isdir(github_workflows)
        # Find all workflows
        files = readdir(github_workflows)
        # Exclude TagBot and CompatHelper
        filter(f -> lowercase(f) ∉ ("compathelper.yml", "tagbot.yml"), files)
        # Assume all other files are GitHub Actions for CI. May not
        # _always_ be the case, but it's a good first-order approximation.
        github_actions = length(files) > 0
    else
        github_actions = false
    end
    license_files = find_licenses(dir)
    if isdir(pkgdir)
        if !isempty(subdir)
            # Look for licenses at top-level and in the subdirectory
            subdir_licenses_files = [(; license_filename = joinpath(subdir, row.license_filename), row.licenses_found, row.license_file_percent_covered) for row in find_licenses(pkgdir)]
            license_files = [subdir_licenses_files; license_files]
        end
        lines_of_code = count_loc(pkgdir)
    else
        license_files = LicenseTableEltype[]
        lines_of_code = LoCTableEltype[]
    end
    # If the repository is on GitHub and we have a non-anonymous GitHub
    # authentication, get the list of contributors
    contributors = if !(auth isa GitHub.AnonymousAuth) && occursin("github.com", repo)
        Base.sleep(sleep)
        repo_name = replace(replace(repo, r"^https://github\.com/" => ""), r"\.git$" => "")
        contribution_table(repo_name; auth)
    else
        ContributionTableElType[]
    end
    Package(name, uuid, repo; subdir, reachable, docs, runtests, travis, appveyor, cirrus,
        circle, drone, buildkite, azure_pipelines, gitlab_pipeline, github_actions,
        license_files, licenses_in_project, lines_of_code, contributors)
end
At that point of the pipeline, we've cloned the package to a local directory and are populating a Package struct with information gained by inspecting the repo. So we could add a compat_helper field to the struct and check in that function whether the workflow exists, either by matching the file name directly or by looping through the workflow file contents and looking for the string "CompatHelper" or similar. (I'd be in favor of such an addition since I think it's useful to know!)
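
A rough sketch of what such a check could look like, covering both variants (by file name, or by scanning workflow contents). The name `has_compathelper` and the `by_content` keyword are hypothetical and are not part of PackageAnalyzer; the result would feed the proposed compat_helper field.

```julia
# Hypothetical helper, not part of PackageAnalyzer's API.
# `dir` is the local clone of the repository.
function has_compathelper(dir::AbstractString; by_content::Bool=false)
    workflows = joinpath(dir, ".github", "workflows")
    isdir(workflows) || return false
    for file in readdir(workflows)
        path = joinpath(workflows, file)
        isfile(path) || continue
        if by_content
            # Any workflow file that mentions CompatHelper counts.
            occursin("CompatHelper", read(path, String)) && return true
        else
            # Match the conventional file name, ignoring case.
            lowercase(file) == "compathelper.yml" && return true
        end
    end
    return false
end
```

Matching by name follows the definition proposed at the top of this issue; matching by content would also catch repositories that run CompatHelper from a differently named workflow, at the cost of reading every workflow file.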

For (2) and (3), I think that should be doable by combining the results of (1) with queries to the GitHub API to check whether each account is an organization or not. That step is probably outside the purview of PackageAnalyzer itself, though.
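
As a sketch of that post-processing step: given the repository URLs of the packages found in (1), extract the unique owners and classify each one. The functions `repo_owner` and `count_owners` are hypothetical names, and `account_type` is a caller-supplied lookup that is assumed to return "User" or "Organization" for a login (as far as I know, those are the values of the `type` field on account objects returned by the GitHub REST API). No API call is made here.

```julia
# Extract the owner login from a GitHub repository URL, or `nothing`
# if the URL does not point at github.com.
function repo_owner(repo::AbstractString)
    m = match(r"^https://github\.com/([^/]+)/", repo)
    return m === nothing ? nothing : String(m.captures[1])
end

# `account_type` is a hypothetical lookup: login -> "User" or "Organization".
# It is called once per unique owner, which keeps API usage low.
function count_owners(repos, account_type)
    owners = unique(filter(!isnothing, map(repo_owner, repos)))
    types = Dict(o => account_type(o) for o in owners)
    return (users = count(==("User"), values(types)),
            organizations = count(==("Organization"), values(types)))
end
```

Injecting the lookup as a function keeps the counting logic testable offline; an authenticated API query (or a cached table of account types) can be plugged in later.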

@giordano

I think that in the future we may want to collect all filenames in .github/workflows, and exclude compathelper.yml and tagbot.yml only during the data munging step, much like what we do now for the contributors (we were initially filtering out the bots, including @staticfloat). Honestly, though, I'd like to avoid changing the data structure before JuliaCon.
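
To make that idea concrete, a small sketch under the assumption that the raw list of workflow file names is stored at collection time and the flags are derived afterwards. All function names and the `bots` keyword are hypothetical and do not exist in PackageAnalyzer.

```julia
# Collection time: record every workflow file name, unfiltered.
function workflow_files(dir::AbstractString)
    workflows = joinpath(dir, ".github", "workflows")
    return isdir(workflows) ? readdir(workflows) : String[]
end

# Analysis time: derive flags from the stored names.
# `bots` lists the workflows that should not count as CI.
function has_ci_workflow(files; bots=("compathelper.yml", "tagbot.yml"))
    return any(f -> lowercase(f) ∉ bots, files)
end

has_compathelper_workflow(files) = any(f -> lowercase(f) == "compathelper.yml", files)
```

With this split, changing what counts as a bot workflow only requires rerunning the analysis on the stored names, not re-cloning the packages.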
