BrunoVernay/tw-tools-update

tw-tools-update

Description

Use the GitHub API to update the list of tools and extensions related to TaskWarrior. The list will be displayed on the web site: http://taskwarrior.org/tools/

This is linked to the project for the future Tools page: http://brunovernay.github.io/taskwarrior-site-test/

The idea is to use the GitHub API to search for projects related to TaskWarrior, and to update the list of tools displayed on the TaskWarrior site from the search results.

The project started in Java, but I created a Python branch, as Python is more idiomatic to the TaskWarrior community. It should be compatible with Python 2 and 3 (http://pythonclock.org/).

I use PyGithub (https://github.com/PyGithub/PyGithub). There are many Python projects addressing GitHub, and even a book, Mining the Social Web.

Usage

  • cp Config.py.example Config.py, then edit Config.py to add your GitHub token
  • The old tool list is in data-tools-old.json
  • python3 Main.py > log-$(date -Iminutes).txt (takes about 5 min)
  • The new data is in data-tools.json

Status

  • It works
  • We still have to set the category manually
  • There is no API yet to get the license (GitHub is working on it)
  • You have to provide your GitHub token because of the number of requests required (https://github.com/settings/tokens)
  • It currently covers only GitHub projects (BitBucket maybe one day ...)
  • We might apply a diff after the update, to keep manual changes

Note:

  • The text description is plain text, no HTML.
  • Some names are duplicated, so I use url_src as the unique identifier. However, some projects changed their URL: for example, xtw changed its login name, so its URL is different. In that case I output a warning and create a duplicate entry.
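The url_src rule above can be sketched as follows. This is a minimal illustration, not the code in Main.py: the function name merge_tools, the choice to keep old values when the new value is None, and the warning text are all assumptions, and it targets Python 3.7+ for ordered dicts.

```python
def merge_tools(old_tools, new_tools):
    """Merge tool entries keyed by url_src (hypothetical helper).

    Returns (merged_list, warnings). An entry whose url_src is new but
    whose name is already known is kept as a duplicate, with a warning.
    """
    merged = {tool["url_src"]: dict(tool) for tool in old_tools}
    # Remember which url_src first used each name.
    names = {}
    for tool in old_tools:
        names.setdefault(tool["name"], tool["url_src"])

    warnings = []
    for tool in new_tools:
        key = tool["url_src"]
        if key in merged:
            # Same project: refresh fields, but keep old values (for
            # example a manually set category) when the new one is None.
            merged[key].update(
                {k: v for k, v in tool.items() if v is not None}
            )
        else:
            if tool["name"] in names:
                warnings.append(
                    "duplicate name %r: %s vs %s"
                    % (tool["name"], names[tool["name"]], key)
                )
            merged[key] = dict(tool)
            names.setdefault(tool["name"], key)
    return list(merged.values()), warnings
```

With this shape, a project that moved to a new URL shows up twice in the result and once in the warnings, which matches the behaviour described in the note.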

The mapping (tool list field: GitHub API field):

  • category: manual
  • name: name
  • description: description
  • url: homepage
  • url_src: html_url
  • license: ???
  • language: language (gives only the primary language; we have to request languages_url to know more)
  • author: owner/login (+ collaborators, contributors, teams ...). We have to make multiple requests to get the real name instead of the login.
  • theme: best guess from description
  • verified: today
  • last_update: updated_at (pushed_at would be more conservative, but would miss commits in non-master branches)
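As an illustration of the mapping, here is a minimal sketch that converts one repository object, as returned in JSON by the GitHub REST API, into a tool entry. The function name repo_to_tool, the use of None for the manual and unknown fields, and the ISO date format for verified are assumptions, not necessarily what Main.py does.

```python
from datetime import date


def repo_to_tool(repo, today=None):
    """Map a GitHub repository dict to a tool-list entry (hypothetical)."""
    today = today or date.today()
    owner = repo.get("owner") or {}
    return {
        "category": None,                  # set manually afterwards
        "name": repo.get("name"),
        "description": repo.get("description") or "",
        "url": repo.get("homepage"),
        "url_src": repo.get("html_url"),   # used as unique identifier
        "license": None,                   # no source for it in the mapping
        "language": repo.get("language"),  # primary language only
        "author": owner.get("login"),      # login, not the real name
        "theme": None,                     # best guess, not attempted here
        "verified": today.isoformat(),     # assumed format: YYYY-MM-DD
        "last_update": repo.get("updated_at"),
    }
```

Keeping the mapping in one small function makes it easy to change a single field later, for example switching last_update to pushed_at.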

Automatic classification

I fetch all the README files in order to perform some machine learning. The first idea is to classify the projects by category. The Python library for this seems to be SciKit (scikit-learn). There is a more active NLTK library, but since I only need simple text feature extraction and no complex natural language processing, I will stick to SciKit. Some ref:
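To make the idea of simple text feature extraction concrete, here is a dependency-free sketch. It is not scikit-learn: it is a plain bag-of-words with a nearest-centroid rule using only the standard library, as a stand-in for what a SciKit vectorizer and classifier would do. The function names and the example category names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase words of three letters or more."""
    return Counter(re.findall(r"[a-z]{3,}", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def train(labelled_readmes):
    """Sum the word counts of all READMEs in each category."""
    centroids = {}
    for category, text in labelled_readmes:
        centroids.setdefault(category, Counter()).update(bag_of_words(text))
    return centroids


def classify(centroids, text):
    """Return the category whose summed counts are most similar."""
    features = bag_of_words(text)
    return max(centroids, key=lambda cat: cosine(features, centroids[cat]))
```

The already categorized entries of the old tool list could serve as the labelled examples, with the predicted category treated as a suggestion to review manually.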
