Most popular dependencies of projects in each category #87
To make the whole analysis reproducible and more standardized, it would be great to create the list of projects we filter for in an automated way. Would it be possible to create a top 50 of dependencies for the different programming languages, which we could then use as a filter for the different categories? This would give us the top 50 dependencies that are unique to each category.
Deployed a very experimental version: https://ost.ecosyste.ms/projects/dependencies It takes the top 50 direct dependencies across all reviewed projects (listed in the first section) and excludes them from each category; each category then lists its top 50 dependencies that are used by at least two different projects within that category. It currently doesn't group by ecosystem/language, but I can investigate that as an alternative filtering method. If this seems reasonable, I can start pulling in more info about each package: link, description, etc.
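For anyone who wants to reproduce this filter offline, here is a minimal Python sketch of the logic described above. It is not the code behind the site; the input shape (a list of dicts with hypothetical `category` and `dependencies` keys) is an assumption, and the defaults simply mirror the "top 50" and "used by at least two projects" rules.

```python
from collections import Counter


def top_dependencies(projects, n=50, min_projects=2):
    """Per-category top dependencies, excluding the overall top N.

    `projects` is a list of dicts with hypothetical keys "category" and
    "dependencies" (a list of direct dependency names).
    """
    overall = Counter()
    per_category = {}
    for project in projects:
        # Count each dependency once per project so one project can't inflate it.
        deps = set(project["dependencies"])
        overall.update(deps)
        per_category.setdefault(project["category"], Counter()).update(deps)

    # The N most used dependencies across all projects are treated as generic.
    generic = {name for name, _ in overall.most_common(n)}

    result = {}
    for category, counts in per_category.items():
        candidates = [
            (name, count)
            for name, count in counts.most_common()
            if name not in generic and count >= min_projects
        ]
        result[category] = candidates[:n]
    return result
```

With a tiny sample and `n=1`, `numpy` (used everywhere) is dropped as generic, `pandas` is dropped because only one project per category uses it, and each category keeps its domain-specific package.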
I also added a sidebar with breakdowns of which package manager ecosystems are most used in each category. Unsurprisingly, Python and R are very popular, but GitHub Actions and Docker are also both very heavily used for infrastructure.
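A breakdown like the sidebar's can be approximated by counting, per category, how many projects use at least one package from each ecosystem. This is a sketch under assumed inputs: the `category`/`dependencies` keys and the `(ecosystem, name)` tuples are hypothetical, and the site may weight things differently.

```python
from collections import Counter, defaultdict


def ecosystem_breakdown(projects):
    """Count, per category, how many projects use each package ecosystem.

    Each project is a dict with hypothetical keys "category" and
    "dependencies", where every dependency is an (ecosystem, name) tuple.
    """
    breakdown = defaultdict(Counter)
    for project in projects:
        # A project counts once per ecosystem, however many packages it uses there.
        ecosystems = {ecosystem for ecosystem, _ in project["dependencies"]}
        breakdown[project["category"]].update(ecosystems)
    return breakdown
```

Calling `.most_common()` on a category's counter then gives the ranking shown in a sidebar.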
In fact, I would be tempted to exclude … Edit: I've made this change now; it's easy to add or remove more here: https://github.com/ecosyste-ms/ost/blob/main/app/models/project.rb#L428
Unfortunately, the data confirms what I already suspected: the different Ecosystems domains don't have strong domain-specific modules to rely on, apart from rasterio, terra, pvlib and the packages for working with NetCDF and xarray. There is simply not enough collaboration between the different projects: they do not depend on each other and do not build larger projects out of individual developments. It also makes no sense to add these projects to ClimateTriage, as most of them are not domain-specific. What might still make sense is to count how often open source projects listed on OST are used as dependencies. However, it probably makes more sense to rank all projects directly by number of downloads to get a better understanding of how the projects are reused. Is that hard to do? We could always write a blog post about the whole approach with download counts; then at least we'd have something like a chart that people would want to look at.
@Ly0n here's a page listing all the projects that have been detected as published on a package manager, sorted by most downloads: https://ost.ecosyste.ms/projects/packages Note: not every package manager currently has download support.
This is awesome data. People will love this! I will look at it in more detail at the weekend. Loading this data into a pandas DataFrame should not be that hard!
At least for Julia you should get the download numbers ;)
@Ly0n would you like a JSON version of the page?
Yes, I was just about to ask for this ;)
I think it would be extremely useful for Ecosyste.ms to create a Python package that automatically loads data from a JSON interface into a pandas DataFrame. This would be very handy for a lot of developers and data scientists.
A quick and dirty JSON API: https://ost.ecosyste.ms/api/v1/projects/packages It lists all projects that have one or more packages, sorted by total downloads. Some of the summed values on the HTML page aren't there as individual fields, but they can be calculated from the raw data.
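Until such a package exists, a few lines of standard-library Python are enough to pull the JSON and flatten it into one row per package. The URL is the one above; the keys `url`, `packages`, `name`, `ecosystem` and `downloads` are assumptions about the response shape and should be checked against the real output.

```python
import json
from urllib.request import urlopen

API_URL = "https://ost.ecosyste.ms/api/v1/projects/packages"


def fetch_projects(url=API_URL):
    """Download and decode the JSON list of projects (network access required)."""
    with urlopen(url) as response:
        return json.load(response)


def flatten_packages(projects):
    """Turn nested project -> packages JSON into one flat row per package.

    The keys used here are assumptions about the response shape.
    """
    rows = []
    for project in projects:
        for package in project.get("packages") or []:
            rows.append({
                "project": project.get("url"),
                "package": package.get("name"),
                "ecosystem": package.get("ecosystem"),
                # Some package managers have no download support, so default to 0.
                "downloads": package.get("downloads") or 0,
            })
    return rows
```

With pandas installed, `pandas.DataFrame(flatten_packages(fetch_projects()))` gives a frame with one row per package, ready for `groupby`.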
There should be download stats for Julia packages; it must be a bug in my code.
Also relevant to this thread: ecosyste-ms/packages#366
Here are the first analytics: https://github.com/protontypes/ost-ecosystems-analytic
That's because you are only taking downloads from the first package of each project; you need to sum the downloads of all packages for each project.
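The fix amounts to summing over every package before ranking. A minimal sketch, again assuming hypothetical `url`, `packages` and `downloads` keys in the JSON:

```python
def rank_by_total_downloads(projects):
    """Rank projects by the summed downloads of all their packages.

    Reading only project["packages"][0] undercounts projects that publish
    several packages. Keys "url", "packages" and "downloads" are assumed.
    """
    totals = {}
    for project in projects:
        packages = project.get("packages") or []
        # Missing download counts (unsupported registries) count as 0.
        totals[project["url"]] = sum(p.get("downloads") or 0 for p in packages)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

In the sample used to test it, project `a` has packages with 5 and 100 downloads: first-package-only logic would rank it below `b` (50), while the summed total of 105 puts it on top. In pandas, the equivalent on a per-package frame is `df.groupby("project")["downloads"].sum().sort_values(ascending=False)`.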
Got it! I'll fix that later.
For each category, get a list of direct dependencies for each project and group them up to show the top 50 most used dependencies.
We may need to filter out some very popular dependencies that show up for every category, potentially making a top 20 overall that can include them instead.