Most popular dependencies of projects in each category #87

Open
andrew opened this issue Dec 6, 2023 · 17 comments
Labels
enhancement New feature or request

Comments

@andrew
Member

andrew commented Dec 6, 2023

For each category, get a list of direct dependencies for each project and group them up to show the top 50 most used dependencies.

We may need to filter out some very popular dependencies that show up for every category, potentially making a top 20 overall that can include them instead.

@andrew andrew added the enhancement New feature or request label Dec 6, 2023
@Ly0n

Ly0n commented Dec 6, 2023

To make the whole analysis reproducible and more standardized, it would be great to create the list of projects we filter for in an automated way. Would it be possible to create a top 50 of dependencies for each programming language, which we can then use as a filter for the different categories? This would give us the top 50 dependencies that are unique to each category.

andrew added a commit that referenced this issue Dec 7, 2023
@andrew
Member Author

andrew commented Dec 7, 2023

Deployed a very experimental version: https://ost.ecosyste.ms/projects/dependencies

It takes the top 50 direct dependencies across all reviewed projects (listed in the first section) and excludes them from each category. Each category then lists its top 50 dependencies that are used by at least two different projects within that category.

Currently not grouping by ecosystem/language but can investigate that as an alternative filtering method.

If this seems reasonable I can start pulling in more info about each package, link, description etc.
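As a rough illustration of the filtering described above (not the actual OST implementation, which lives in the Rails app), the logic could be sketched in Python as follows. The input shape, with `category` and `dependencies` keys, and the function name `top_dependencies` are assumptions made for this example only.

```python
from collections import Counter


def top_dependencies(projects, top_n=50, min_projects=2):
    """Return (overall_top, per_category_top).

    projects: list of dicts with hypothetical keys
      'category'     -> category name
      'dependencies' -> list of direct dependency names
    """
    # Count each dependency once per project, across all projects.
    overall = Counter()
    for project in projects:
        overall.update(set(project["dependencies"]))
    common = [name for name, _ in overall.most_common(top_n)]
    excluded = set(common)

    # Count per category, skipping the globally common dependencies.
    per_category = {}
    for project in projects:
        counts = per_category.setdefault(project["category"], Counter())
        counts.update(set(project["dependencies"]) - excluded)

    # Keep only dependencies used by at least `min_projects` projects.
    result = {}
    for category, counts in per_category.items():
        ranked = [(name, n) for name, n in counts.most_common() if n >= min_projects]
        result[category] = ranked[:top_n]
    return common, result
```

Grouping by ecosystem or language instead would only change which set is passed as `excluded`.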

@andrew
Member Author

andrew commented Dec 7, 2023

I also added a sidebar with a breakdown of which package manager ecosystems are most used in each category. Unsurprisingly, Python and R are very popular, but GitHub Actions and Docker are also very heavily used for infrastructure.

@andrew
Member Author

andrew commented Dec 7, 2023

In fact, I would be tempted to exclude actions, docker and homebrew, as they are very low-level dependencies and highly unlikely to be specific to sustainability and climate change projects.

edit: I've made this change now, easy to add or remove more here: https://github.com/ecosyste-ms/ost/blob/main/app/models/project.rb#L428
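For readers following along outside the Rails codebase, the exclusion and the per-ecosystem breakdown might look roughly like this in Python. This is only a sketch: the `(ecosystem, package_name)` tuple shape and the exact ecosystem identifiers are assumptions, not taken from `project.rb`.

```python
from collections import Counter

# Ecosystems treated as low-level infrastructure (identifiers are assumed).
EXCLUDED_ECOSYSTEMS = {"actions", "docker", "homebrew"}


def ecosystem_breakdown(dependencies, excluded=EXCLUDED_ECOSYSTEMS):
    """Drop excluded ecosystems and count the remaining ones.

    dependencies: iterable of (ecosystem, package_name) tuples (hypothetical shape).
    Returns (kept_dependencies, [(ecosystem, count), ...]) sorted by count.
    """
    kept = [(eco, name) for eco, name in dependencies if eco.lower() not in excluded]
    counts = Counter(eco for eco, _ in kept)
    return kept, counts.most_common()
```

Adding or removing an ecosystem is then a one-line change to the set.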

@Ly0n

Ly0n commented Dec 8, 2023

Unfortunately, the data confirms what I already suspected: the different domains don't have strong domain-specific modules to rely on, besides rasterio, terra, pvlib and the packages for working with NetCDF and xarray. There is simply not enough collaboration between the different projects. They do not depend on each other and do not build larger projects from individual developments.

It also makes no sense to add these projects to ClimateTriage as most projects are not domain specific.

What might still make sense is to count how often open source projects listed on OST are used as dependencies. However, it makes more sense to rank all projects directly by number of downloads to get a better understanding of how much the projects are reused. Is that hard to do? We could always write a blog post about the whole approach with download counts, then at least we'd have something like a chart that people would want to look at.

@andrew
Member Author

andrew commented Dec 8, 2023

@Ly0n here's a page detailing all the projects that have been detected as being published on a package manager, sorted by most downloads: https://ost.ecosyste.ms/projects/packages

note: not every package manager currently has downloads support

@Ly0n

Ly0n commented Dec 8, 2023

This is awesome data. People will love this! I will check this out in more detail at the weekend. Loading this data into a pandas DataFrame should not be that hard!

@Ly0n

Ly0n commented Dec 8, 2023

At least for Julia you should get the Download numbers ;)
https://discourse.julialang.org/t/julia-downloads-stats-and-julia-downloads-badges/74712

@andrew
Member Author

andrew commented Dec 8, 2023

@Ly0n would you like a JSON version of the page?

@Ly0n

Ly0n commented Dec 8, 2023

Yes, I just wanted to ask for this ;).

@Ly0n

Ly0n commented Dec 8, 2023

I think it would be extremely useful for Ecosyste.ms to create a Python package that automatically loads data from a JSON interface into a pandas DataFrame. This would be very handy for a lot of developers and data scientists.

@andrew
Member Author

andrew commented Dec 8, 2023

A quick and dirty JSON API: https://ost.ecosyste.ms/api/v1/projects/packages

It lists all projects that have one or more packages, sorted by total downloads. Some of the summed values in the HTML page aren't there as individual fields, but they can be calculated from the raw data.
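A minimal sketch of pulling this endpoint into flat rows using only the standard library is shown here. The response is assumed to be a JSON list of projects, and the key names (`url`, `packages`, `ecosystem`, `name`, `downloads`) are assumptions about its shape, so check them against the live output before relying on them.

```python
import json
from urllib.request import urlopen

API_URL = "https://ost.ecosyste.ms/api/v1/projects/packages"


def fetch_projects(url=API_URL):
    """Download and parse the JSON list of projects."""
    with urlopen(url) as response:
        return json.load(response)


def package_rows(projects):
    """Flatten projects into one row per package.

    All key names used here are assumptions about the API response.
    """
    rows = []
    for project in projects:
        for package in project.get("packages") or []:
            rows.append({
                "project_url": project.get("url"),
                "ecosystem": package.get("ecosystem"),
                "package": package.get("name"),
                "downloads": package.get("downloads"),
            })
    return rows
```

With pandas installed, `pandas.DataFrame(package_rows(fetch_projects()))` would turn the rows into a DataFrame.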

@andrew
Member Author

andrew commented Dec 8, 2023

> At least for Julia you should get the Download numbers ;) https://discourse.julialang.org/t/julia-downloads-stats-and-julia-downloads-badges/74712

There should be download stats for Julia packages; this must be a bug in my code.

@andrew
Member Author

andrew commented Dec 9, 2023

Also relevant to this thread ecosyste-ms/packages#366

@Ly0n

Ly0n commented Dec 10, 2023

Here are the first analytics: https://github.com/protontypes/ost-ecosystems-analytic
For some projects the download numbers are None in the API but not in the frontend. Here is one example: https://ost.ecosyste.ms/projects/19933

@andrew
Member Author

andrew commented Dec 10, 2023

> For some projects the download numbers are None in the API but not in the frontend. Here is one example: https://ost.ecosyste.ms/projects/19933

That's because you are only taking downloads from the first package of each project; you need to sum the downloads of all packages for each project.
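A small sketch of that summation, assuming each project dict has a `packages` list whose entries carry a `downloads` value that may be `None` (these key names are assumptions about the API response):

```python
def total_downloads(project):
    """Sum downloads over every package; missing or None counts are treated as 0."""
    packages = project.get("packages") or []
    return sum(package.get("downloads") or 0 for package in packages)


def rank_by_downloads(projects):
    """Return projects sorted by summed downloads, highest first."""
    return sorted(projects, key=total_downloads, reverse=True)
```

Treating `None` as 0 keeps projects whose first package has no download count from showing up as `None` overall, at the cost of not distinguishing "no data" from "zero downloads".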

@Ly0n

Ly0n commented Dec 11, 2023

Got it! I'll fix that later.

2 participants