Initial list of top dependencies #2440

Closed
ryscheng-mobile opened this issue Oct 31, 2024 · 5 comments
Labels
P0 Highest priority issue

Comments

@ryscheng-mobile
Contributor

What is it?

Jonas is asking for a preliminary list of top packages in the Optimism collective.

  • Basically start with all repos in the Optimism collection.
  • Crawl dependencies
  • Sort the list of dependencies by number of direct dependents.

Just looking to sanity-check the initial dataset we have; this can just be a CSV dump.
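The three steps above could be sketched roughly as follows. This is only an illustration: the input shape (a mapping from repo to its direct dependency names), the function names, and the sample repos are assumptions, not the actual OSO pipeline.

```python
import csv
import io
from collections import defaultdict


def rank_dependencies(direct_deps):
    """Rank dependencies by how many repos depend on them directly.

    direct_deps: mapping of repo name -> iterable of direct dependency names
    (a hypothetical input shape, assumed for this sketch).
    """
    dependents = defaultdict(set)
    for repo, deps in direct_deps.items():
        for dep in deps:
            dependents[dep].add(repo)
    # Most dependents first; ties broken alphabetically for stable output.
    return sorted(
        ((dep, len(repos)) for dep, repos in dependents.items()),
        key=lambda row: (-row[1], row[0]),
    )


def to_csv(rows):
    """Serialize (dependency, count) rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["dependency", "direct_dependents"])
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample data, for illustration only.
sample = {
    "repo-a": ["ethers", "viem"],
    "repo-b": ["ethers"],
    "repo-c": ["ethers", "viem", "hardhat"],
}
ranked = rank_dependencies(sample)
print(to_csv(ranked))
```

Using a set per dependency means a repo that lists the same package twice is still counted as one dependent.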

@JSeiferth

The motivation for this is to get an initial sense of what the data looks like and the possible complexity we need to tackle.

ccerv1 self-assigned this Nov 5, 2024
ccerv1 added the P0 Highest priority issue label Nov 5, 2024
ccerv1 moved this from Backlog to Up Next in OSO Nov 5, 2024
@ccerv1
Member

ccerv1 commented Nov 5, 2024

OK, I think this is looking pretty solid.

Here is a sheet of top dependencies with some basic filters applied.

Here is the notebook used to generate the analysis. It includes some charts that I couldn't copy over here due to slow internet (✈ 🤕 )

Quick description of the methodology:

  1. We start with the SBOM (Software Bill of Materials) for every project on OSO (2000+ projects). More than 80K dependencies are captured this way.

  2. We drill down on the projects that are part of one or more of our OP collections (anything in a past RF round, as well as many other onchain projects and grant recipients). A total of 638 projects have at least one dependency. NPM and Rust are the clear favorites, with a little bit of Go and Python (PIP) as well. Now we are down to around 50K dependencies.

  3. Then, we can look specifically at the onchain projects. This gets us to 348 projects, where NPM is still by far the most popular, with Rust a distant second. This is similar to what Faina predicted. This still leaves about 40K dependencies.

  4. Finally, I added some simple filters to catch some of the common web2 packages that are not really relevant. We can refine this if necessary, but the end result is around 8K packages with at least 3 dependents. If you check out the notebook, there's a scatterplot at the bottom showing the filtering technique. The projects that are above the line and farther to the right are effectively "popular dependencies for onchain projects". For example, ethers is in 90% of onchain projects and 79% of all OP projects, showing that it is hugely popular and more popular with onchain projects than with other types of projects. Meanwhile, ipfs-pubsub-peer-monitor is only in 3 projects, so it is much more niche.
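A minimal sketch of the comparison described in step 4, for anyone who wants to reproduce it without the notebook. The data shapes and function names here are hypothetical, and the sketch assumes "the line" on the scatterplot is the diagonal where a package's share of onchain projects equals its share of all OP projects; the notebook may draw it differently.

```python
def dependency_shares(project_deps, onchain_projects):
    """Compute, per dependency, its share of all projects and of onchain projects.

    project_deps: mapping of project -> set of dependency names (assumed shape).
    onchain_projects: set of project names considered onchain (assumed input).
    """
    all_projects = set(project_deps)
    onchain = all_projects & set(onchain_projects)
    counts_all, counts_onchain = {}, {}
    for project, deps in project_deps.items():
        for dep in set(deps):
            counts_all[dep] = counts_all.get(dep, 0) + 1
            if project in onchain:
                counts_onchain[dep] = counts_onchain.get(dep, 0) + 1
    rows = []
    for dep, n_all in counts_all.items():
        n_onchain = counts_onchain.get(dep, 0)
        rows.append({
            "dependency": dep,
            "dependents": n_all,
            "share_all": n_all / len(all_projects),
            "share_onchain": n_onchain / len(onchain) if onchain else 0.0,
        })
    return rows


def popular_onchain(rows, min_dependents=3):
    """Keep packages with enough dependents that skew toward onchain projects."""
    kept = [
        r for r in rows
        if r["dependents"] >= min_dependents      # threshold taken from step 4
        and r["share_onchain"] > r["share_all"]   # assumed: above the diagonal
    ]
    return sorted(kept, key=lambda r: -r["share_onchain"])


# Made-up sample data, for illustration only.
sample_deps = {
    "p1": {"ethers", "react", "niche-lib"},
    "p2": {"ethers", "react"},
    "p3": {"ethers", "react"},
    "p4": {"react"},
}
shares = dependency_shares(sample_deps, onchain_projects={"p1", "p2", "p3"})
popular = popular_onchain(shares)
print([r["dependency"] for r in popular])
```

In the sample, a package used by every project is dropped because it is no more common among onchain projects than overall, which is the kind of web2 package the filter is meant to catch.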

@ccerv1
Member

ccerv1 commented Nov 5, 2024

See also #2364

ccerv1 moved this from Up Next to Needs Review in OSO Nov 5, 2024
@JSeiferth

JSeiferth commented Nov 6, 2024

@ccerv1 this is great! Some things I'm curious about:

  1. Dedup submodules: Ethers has many of these? How might we dedup them?
  2. Weighting by time: When was the repo created that used the package? How recently has it been contributed to in meaningful ways? I'm trying to understand if we see clear trends of "up and coming" packages.
  3. Weight by Gas fees: Which packages are most popular with the top Gas generators?
  4. Compare to GH Stars: Use a "trusted" GH star model from open rank to look at the GH repos relating to the NPM packages. Do we see any stark differences? Could this be valuable in understanding "utility function"?
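On question 1, one possible approach (a sketch, not something already in the dataset) is to collapse scoped npm packages to their scope before counting, so that the `@ethersproject/*` subpackages count as one family. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def package_family(name):
    """Collapse a scoped npm name like '@scope/pkg' to '@scope'.

    Unscoped names are returned unchanged. This is a heuristic assumed for
    illustration; some scopes bundle unrelated packages.
    """
    if name.startswith("@") and "/" in name:
        return name.split("/", 1)[0]
    return name


def dedup_dependents(dependents_by_package):
    """Count distinct dependent projects per package family.

    dependents_by_package: mapping of package name -> set of project names.
    """
    families = defaultdict(set)
    for package, projects in dependents_by_package.items():
        families[package_family(package)].update(projects)
    return {family: len(projects) for family, projects in families.items()}


# Made-up sample data, for illustration only.
sample_dependents = {
    "@ethersproject/abi": {"p1", "p2"},
    "@ethersproject/providers": {"p2", "p3"},
    "ethers": {"p1"},
}
family_counts = dedup_dependents(sample_dependents)
print(family_counts)
```

Taking the union of projects avoids double counting a project that uses several subpackages. Note that `ethers` and `@ethersproject` still come out as separate families here; merging them would need a hand-maintained alias table.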

Some general callouts:

  1. Surprised to see Viem/Wagmi so low on the list
  2. Surprised to see OpenZeppelin so low on the list
  3. We can probably use the ranking of dependencies from Retro Funding 3 as our first "qualitative expert input" to compare against.
  4. OpenZeppelin has 10x the GitHub stars of some of the most popular libraries

What's the path forward? I'd love to move fast in understanding what we can get out of dependency data.

@ccerv1
Member

ccerv1 commented Nov 18, 2024

Update @JSeiferth

I have joined the initial dependency list to the projects that we already have in OSO; v1 is here.

In this version, we can see the popularity of a variety of OpenZeppelin packages:

Image

We can also see that OpenZeppelin contracts is near the top of the list in terms of onchain projects (ignore some of the web2 libraries like babel).

Image

I'm going to close out this exploratory work and create a new issue that tracks some of the experimental metrics for dev tooling.

cc @Jabolol here was my mapping script for the SBOM <-> repo <-> project logic. I did this with lots of distractions and flaky internet last week, but it shows the basics of how to connect the data.
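For readers without access to the linked script, the SBOM <-> repo <-> project join can be pictured roughly like this. It is not the script itself: the input shapes, names, and sample values are all assumptions made for illustration.

```python
def map_dependencies_to_projects(sbom_rows, repo_to_project):
    """Roll repo-level SBOM entries up to project-level dependency sets.

    sbom_rows: iterable of (repo, package) pairs (assumed shape).
    repo_to_project: mapping of repo -> project name (assumed shape).
    """
    project_deps = {}
    for repo, package in sbom_rows:
        project = repo_to_project.get(repo)
        if project is None:
            continue  # repo is not attached to any known project; skip it
        project_deps.setdefault(project, set()).add(package)
    return project_deps


# Made-up sample data, for illustration only.
sample_sbom = [
    ("org/app", "ethers"),
    ("org/contracts", "ethers"),
    ("org/contracts", "hardhat"),
    ("other/unmapped", "lodash"),
]
sample_repo_to_project = {"org/app": "project-x", "org/contracts": "project-x"}
mapped = map_dependencies_to_projects(sample_sbom, sample_repo_to_project)
print(mapped)
```

Collecting packages into a set per project means a dependency shared by several repos of the same project is counted once at the project level.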

Projects
Status: Done
Development

No branches or pull requests

3 participants