GitHub has many developers, and following the influential ones is highly beneficial because they usually spread promising repositories. Influential developers can promote a repository on GitHub simply by starring it, since their followers may star it in turn. This survey uses the well-known PageRank algorithm, the watch events from the GitHub Archive, and users' following relationships from the GitHub API to mine the most influential developers on GitHub.
The result is based on limited data (2014/1/1 ~ 2014/8/26) and does not represent GitHub. The ranking may change as more data is collected.
- Top 25 Influential Developers in General
- Top 25 Influential Developers in Python
- Top 25 Influential Developers in JavaScript
- Top 25 Influential Developers in Go
- Top 25 Influential Developers in Ruby
- Top 25 Influential Developers in PHP
- Top 25 Influential Developers in Perl
- Top 25 Influential Developers in CSS
- Top 25 Influential Developers in C
- Top 25 Influential Developers in C++
- Top 25 Influential Developers in Java
- Top 25 Influential Developers in C#
- Top 25 Influential Developers in Objective-C
- Top 25 Influential Developers in Swift
- Top 25 Influential Developers in Haskell
- Top 25 Influential Developers in Scala
- Top 25 Influential Developers in Clojure
- Top 25 Influential Developers in Erlang
The watch events were collected from the GitHub Archive for 2014/1/1 to 2014/8/26; for each event, the repository's name, the actor's name, and the event's creation time were extracted. The users' following relationships were collected from the GitHub API.
To collect the data, run `python task_grab_watch_events`. Make sure MongoDB is already running; this task creates a database named `github`.
Since the task consumes the GitHub API, add the robots' login names and passwords to `config.py` in the same directory.
To build the graphs, make sure the watch events have already been collected into MongoDB, then run `python task_gen_events_graphs`.
Each watch event of a repository can be represented as a vertex in the form of a 3-tuple (event's created time, repository's name, actor's name). Each vertex has directed edges to the vertices formed by the watch events of users whom the actor follows and who starred the repository before the actor did. In other words, a graph represents the cascade of one repository's watch events, and the watch events of all GitHub repositories form many graphs.
In addition, the owner of the repository receives edges from those of its followers who starred the repository, to capture the influence of open source.
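The cascade construction can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name `build_cascade_edges`, the in-memory `following` dictionary, and the edge direction (from the later event to the earlier event of a followed user) are assumptions made for the example, and the extra edges to the repository owner are left out.

```python
from collections import defaultdict
from datetime import datetime


def build_cascade_edges(events, following):
    """Return {repo: [(later_event, earlier_event), ...]}.

    events: iterable of (created_at, repo, actor) tuples.
    following: dict mapping a user to the set of users they follow
               (hypothetical in-memory stand-in for the stored relationships).
    """
    by_repo = defaultdict(list)
    for event in events:
        by_repo[event[1]].append(event)

    graphs = {}
    for repo, repo_events in by_repo.items():
        repo_events.sort()  # chronological, because created_at comes first
        edges = []
        for i, later in enumerate(repo_events):
            followed = following.get(later[2], set())
            for earlier in repo_events[:i]:
                # Link only to strictly earlier stars by users the actor follows.
                if earlier[2] in followed and earlier[0] < later[0]:
                    edges.append((later, earlier))
        graphs[repo] = edges
    return graphs


events = [
    (datetime(2014, 1, 1), "a/r", "alice"),
    (datetime(2014, 1, 2), "a/r", "bob"),
    (datetime(2014, 1, 3), "a/r", "carol"),
]
following = {"bob": {"alice"}, "carol": {"bob", "dave"}}
graphs = build_cascade_edges(events, following)
```

Here `bob` follows `alice` and starred after her, so his event links to hers; `carol` links to `bob` but not to `dave`, who never starred the repository.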
An actor is assumed to be less likely to influence followers as time passes. To diminish the influence over time, the edges are weighted by a Fibonacci function, `1.0 / fib(interval + 2)`, where `fib` is the Fibonacci series starting from 0 and `interval` is measured in days. The longer the interval between two events, the weaker the connection between them.
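As a small worked example of the weighting formula, the sketch below evaluates `1.0 / fib(interval + 2)` for a few intervals. The helper names `fib` and `edge_weight` are chosen for illustration, and the interval is assumed to be a whole number of days.

```python
def fib(n):
    """Return the n-th Fibonacci number, with fib(0) == 0 and fib(1) == 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def edge_weight(interval_days):
    """Weight of an edge between two events that are interval_days apart."""
    return 1.0 / fib(interval_days + 2)


# Same day: 1/fib(2) = 1.0, one day: 1/fib(3) = 0.5, three days: 1/fib(5) = 0.2
weights = [edge_weight(d) for d in (0, 1, 3)]
```

The `+ 2` offset keeps the weight of a same-day event at 1.0 and avoids dividing by `fib(0)`, which is 0.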
Run `python task_cal_pagerank`, then `python task_cal_influence`.
Since the cascade of watch events can be represented as a directed graph, PageRank can score the influence among users, and a user's overall influence is obtained by combining the scores the user received in every graph the user is involved in. To reduce noise, scores equal to the unit `1` are removed before combining.
PageRank is a link analysis algorithm that assigns a numerical weight to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. In this survey, the elements are the watch events and the links are the following relationships among actors.
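To make the idea concrete, here is a plain power-iteration sketch of PageRank on a weighted directed graph. It is a stand-in written for illustration, not the implementation used by the project's tasks; the function name, the damping factor of 0.85, the fixed iteration count, and the uniform redistribution of rank from vertices without outgoing edges are all assumptions.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Compute PageRank by power iteration.

    edges: list of (source, target, weight) tuples.
    Returns a dict mapping each vertex to its score; scores sum to 1.
    """
    nodes = list({n for s, t, _ in edges for n in (s, t)})
    count = len(nodes)
    out_weight = {n: 0.0 for n in nodes}
    for s, _, w in edges:
        out_weight[s] += w

    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for s, t, w in edges:
            new_rank[t] += damping * rank[s] * w / out_weight[s]
        rank = new_rank
    return rank


# Two later stargazers both point to the user they follow.
rank = pagerank([("b", "a", 1.0), ("c", "a", 1.0)])
```

In this toy graph `a` collects rank from both `b` and `c`, so it ends up with the highest score.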
Since the original PageRank is specific to a single graph, the PageRanks from multiple graphs must be made comparable before they can be combined; that is, the PageRank has to be normalized. The PageRank can be normalized by dividing each original PageRank by the least PageRank in its graph. There is a gentle introduction to the Normalized PageRank.
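The normalization and the noise filter can be sketched as below. The text does not say how the per-graph scores are combined, so summation is an assumption here, as are the function names and the simplification of keying scores by actor instead of by event vertex.

```python
from collections import defaultdict


def normalize(pagerank):
    """Divide every score in one graph by that graph's least score."""
    least = min(pagerank.values())
    return {actor: score / least for actor, score in pagerank.items()}


def combine(graph_scores):
    """Combine per-graph scores into one influence value per actor.

    graph_scores: iterable of {actor: pagerank} dicts, one per graph.
    Normalized scores equal to 1 are dropped as noise; the rest are summed
    (summation is an assumption, not confirmed by the text).
    """
    influence = defaultdict(float)
    for scores in graph_scores:
        if not scores:
            continue
        for actor, score in normalize(scores).items():
            if score == 1.0:
                continue
            influence[actor] += score
    return dict(influence)


influence = combine([
    {"a": 0.5, "b": 0.25, "c": 0.25},  # a -> 2.0, b and c -> 1.0 (dropped)
    {"a": 0.125, "d": 0.875},          # a -> 1.0 (dropped), d -> 7.0
])
```

After normalization the least score in every graph is exactly 1, so dropping it removes the vertices that gained nothing beyond the baseline.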
Besides the general ranking, we can also rank by language, since the GitHub API provides metadata that includes the language of a repository; the PageRank algorithm is then applied only to the repositories of the same language. However, the result might not make much sense because of the naive classification of languages.
For better performance, use `python task_gen_events_graphs-cal_pagerank-cal_influence.py`, which integrates the processes from `task_gen_events_graphs` to `task_cal_influence`.
Our goal is to measure the total influence of a star. The maximum number of stars a user can directly influence is `(starred + repos) * followers`, where `repos` stands for the user's number of public repositories, so this product can help to analyze the performance of our method. To make the histogram more readable, the plotted values are the square roots of the products. To get this histogram, run `python task_draw_histogram General` after calculating the influence.
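For reference, the plotted quantity can be computed as in this short sketch. The function name and the sample numbers are made up for illustration; only the formula comes from the text.

```python
import math


def direct_influence(starred, repos, followers):
    """Square root of the product (starred + repos) * followers."""
    return math.sqrt((starred + repos) * followers)


# Hypothetical user: 90 starred repositories, 10 public repositories,
# 100 followers, so the product is 10000 and its square root is 100.0.
value = direct_influence(starred=90, repos=10, followers=100)
```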
According to the histogram, the products show a falling gradient, which suggests that the PageRank method works!
The evolving graph animation captures the time series of watch events and their connections, so we can analyze the compactness of a repository's community by observing the clusters that form in the animation. The animation has one frame per hour of the timeline, with gaps that contain no events collapsed. To make the animation, run `python task_draw_graphs {repository's full name}`.
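One way to read "one frame per hour, with empty gaps collapsed" is to bucket the event timestamps by hour and keep only the hours that contain events. The sketch below follows that reading; the function name `hourly_frames` is hypothetical, and the actual drawing task may build its frames differently.

```python
from datetime import datetime


def hourly_frames(timestamps):
    """Group timestamps into per-hour frames, skipping hours with no events."""
    frames = {}
    for ts in sorted(timestamps):
        hour = ts.replace(minute=0, second=0, microsecond=0)
        frames.setdefault(hour, []).append(ts)
    return [frames[hour] for hour in sorted(frames)]


stamps = [
    datetime(2014, 7, 15, 19, 30),
    datetime(2014, 7, 15, 19, 45),
    datetime(2014, 7, 16, 2, 5),
]
frames = hourly_frames(stamps)  # two frames; the six empty hours are skipped
```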
The popular repository josephmisiti/awesome-machine-learning was created at `2014-07-15T19:11:19Z`, so the animation can cover its growth. The clusters in the graph might be communities; there is a main cluster in the center that grows with the passage of time. In some frames, most parts of the graph grow simultaneously, perhaps because the repository was spreading outside GitHub.
The sebyddd/YouAreAwesome repository was found because of its strange presentation. It was created at `2014-08-18T18:50:57Z` with an accompanying post, How to get #1 trending on GitHub or "GitHub's security flaws". According to the post, the stargazers were fabricated and arrived in a burst. The animation offers some clues: it is much shorter than that of josephmisiti/awesome-machine-learning because of the burst, and it lacks scattered clusters because the fabricated stargazers have no natural connections.
The Venn diagram of the top 25 in C++ and JavaScript has an intersection of six developers because many repositories for Node.js were written in C++; reviewing the developers in the intersection confirms this. The intersection contains mcollina, jeresig, sindresorhus, hughsk, andrew and visionmedia.
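The intersection can be checked with plain Python sets. The two name lists below are copied from the rankings in this document, on the assumption that they are the C++ and JavaScript top 25; the variable names are chosen for the example.

```python
cpp_top25 = set("""
jeresig rogerwang r-lyeh osteele zcbenz jwerle sindresorhus fabpot
satoruhiga andrew ideawu BYVoid hughsk vczh hij1nx chenshuo visionmedia
kylemcdonald patriciogonzalezvivo youxiachai mcollina creationix
eugeneware JacksonTian indutny
""".split())

js_top25 = set("""
visionmedia sindresorhus substack turingou andrew MatthewMueller
juliangruber jeresig addyosmani maxogden paulirish studiomohawk azu
cheeaun feross mathiasbynens mafintosh TooTallNate yyx990803 mcollina
fengmk2 hughsk ianstormtaylor igrigorik hakimel
""".split())

# Developers who appear in both rankings.
common = cpp_top25 & js_top25
print(sorted(common))
```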
- Python 2.7
- MongoDB 2.6
- PyMongo 2.7
- PyGithub 1.25
- graph-tool 2.2
- matplotlib 1.4
- matplotlib-venn
- Gevent
- urlgrabber
- lxml
- underscore.py
- funcy
- more-itertools
- arrow
Top 25 Influential Developers in General
- visionmedia
- sindresorhus
- mattt
- steipete
- daimajia
- andrew
- JakeWharton
- Trinea
- substack
- onevcat
- lexrus
- stormzhang
- turingou
- myell0w
- youxiachai
- addyosmani
- igrigorik
- jeresig
- MatthewMueller
- ManuelPeinado
- juliangruber
- mattn
- azu
- romaonthego
- xhzengAIB
Top 25 Influential Developers in Python
- kennethreitz
- mitsuhiko
- rochacbruno
- avelino
- jezdez
- lepture
- visionmedia
- pydanny
- saghul
- vinta
- clowwindy
- dahlia
- fengmk2
- tangqiaoboy
- jd
- numbbbbb
- osantana
- ionelmc
- jefftriplett
- tonyseek
- Zulko
- reduxionist
- turingou
- ellisonleao
- dcramer
Top 25 Influential Developers in JavaScript
- visionmedia
- sindresorhus
- substack
- turingou
- andrew
- MatthewMueller
- juliangruber
- jeresig
- addyosmani
- maxogden
- paulirish
- studiomohawk
- azu
- cheeaun
- feross
- mathiasbynens
- mafintosh
- TooTallNate
- yyx990803
- mcollina
- fengmk2
- hughsk
- ianstormtaylor
- igrigorik
- hakimel
Top 25 Influential Developers in Go
- visionmedia
- mattn
- dgryski
- Unknwon
- codegangsta
- rakyll
- daaku
- igrigorik
- bradfitz
- c4milo
- mitchellh
- astaxie
- lunny
- spf13
- mreiferson
- andrew
- philips
- crosbymichael
- fatih
- samuel
- codahale
- pengwynn
- michaelhood
- armon
- takuan-osho
Top 25 Influential Developers in Ruby
- ankane
- mattt
- andrew
- JuanitoFatas
- igrigorik
- goshakkk
- hsbt
- amatsuda
- chloerei
- josh
- flyerhzm
- huacnlee
- fgrehm
- futoase
- defunkt
- rkh
- parkr
- joker1007
- ryanb
- pengwynn
- mitchellh
- kenn
- r7kamura
- maccman
- takkanm
Top 25 Influential Developers in PHP
- GrahamCampbell
- sebastianbergmann
- fabpot
- Ocramius
- vojtech-dobes
- msurguy
- barryvdh
- JeffreyWay
- philsturgeon
- nikic
- laracasts
- lsmith77
- panique
- phalcon
- Zauberfisch
- taylorotwell
- dg
- igorw
- pminnieur
- harikt
- cfoellmann
- Ph3nol
- pippinsplugins
- jasonlewis
- Anahkiasen
Top 25 Influential Developers in Perl
- tokuhirom
- kraih
- miyagawa
- moznion
- DHowett
- visionmedia
- turingou
- agentzh
- lulzlabs
- kazeburo
- ingydotnet
- skx
- brendangregg
- gugod
- jberger
- sjackman
- oetiker
- rjbs
- pjf
- naoya
- goccy
- hirose31
- mattn
- wireghoul
- jonreid
Top 25 Influential Developers in CSS
- mdo
- sindresorhus
- addyosmani
- andrew
- zenorocha
- mrmrs
- visionmedia
- sahat
- turingou
- jxnblk
- gabrielecirulli
- jeresig
- umaar
- daneden
- csswizardry
- mreiferson
- sofish
- youxiachai
- necolas
- daimajia
- vitorbritto
- studiomohawk
- goshakkk
- joewalnes
- cheeaun
Top 25 Influential Developers in C
- torvalds
- cloudwu
- visionmedia
- mattn
- antirez
- julycoding
- igrigorik
- jwerle
- c9s
- steipete
- laruence
- andrew
- huangz1990
- phalcon
- clowwindy
- cloudhead
- mattt
- saghul
- r-lyeh
- winocm
- tmm1
- orangeduck
- Constellation
- pengwynn
- Trinea
Top 25 Influential Developers in C++
- jeresig
- rogerwang
- r-lyeh
- osteele
- zcbenz
- jwerle
- sindresorhus
- fabpot
- satoruhiga
- andrew
- ideawu
- BYVoid
- hughsk
- vczh
- hij1nx
- chenshuo
- visionmedia
- kylemcdonald
- patriciogonzalezvivo
- youxiachai
- mcollina
- creationix
- eugeneware
- JacksonTian
- indutny
Top 25 Influential Developers in Java
- daimajia
- JakeWharton
- Trinea
- stormzhang
- ManuelPeinado
- jgilfelt
- dodola
- jpardogo
- youxiachai
- kyze8439690
- chrisbanes
- mcxiaoke
- soarcn
- flavienlaurent
- baoyongzhang
- sd6352051
- snowdream
- castorflex
- hotchemi
- romannurik
- pedrovgs
- vbauer
- nostra13
- RomainPiel
- johnkil
Top 25 Influential Developers in C#
- shanselman
- tugberkugurlu
- jamesmontemagno
- paulcbetts
- prime31
- madskristensen
- robconery
- filipw
- keijiro
- leekelleher
- Haacked
- pierceboggan
- Cheesebaron
- migueldeicaza
- ayende
- davidfowl
- mythz
- yreynhout
- Rohansi
- Chandu
- adamralph
- neuecc
- punker76
- UnityPatterns
- daimajia
Top 25 Influential Developers in Objective-C
- steipete
- mattt
- myell0w
- onevcat
- lexrus
- xhzengAIB
- romaonthego
- jessesquires
- iiiyu
- krzysztofzablocki
- jamztang
- supermarin
- nicklockwood
- soffes
- neonichu
- cyndibaby905
- 0xced
- indragiek
- EvgenyKarkan
- chroman
- mps
- nst
- tangqiaoboy
- andreamazz
- jpsim
Top 25 Influential Developers in Swift
- mattt
- lexrus
- onevcat
- iiiyu
- soffes
- robb
- romaonthego
- krzysztofzablocki
- jspahrsummers
- indragiek
- AshFurrow
- neonichu
- tangqiaoboy
- hollance
- chroman
- qiaoxueshi
- myell0w
- rnystrom
- JacksonTian
- fastred
- jessesquires
- jakemarsh
- andreamazz
- jpsim
- youxiachai
Top 25 Influential Developers in Haskell
- sdiehl
- ekmett
- puffnfresh
- bos
- bitemyapp
- jfischoff
- cartazio
- cloudhead
- darinmorrison
- CodeBlock
- rehno-lindeque
- chrisdone
- jgm
- egonSchiele
- ocharles
- TimothyKlim
- rockymadden
- Gabriel439
- feuerbach
- vincenthz
- copumpkin
- adinapoli
- maxpow4h
- jonsterling
- Heather
Top 25 Influential Developers in Scala
- ryanlecompte
- xuwei-k
- jboner
- lihaoyi
- softprops
- paulp
- non
- ktoso
- krasserm
- puffnfresh
- takezoe
- milessabin
- mateiz
- rockymadden
- dlwh
- TimothyKlim
- mandubian
- tototoshi
- mrdoob
- ornicar
- hexx
- rxin
- jamieowen
- jsuereth
- pathikrit