
Types of Data to Collect

Ben Klein edited this page Nov 13, 2019 · 2 revisions

Lizard-provided stats

  • LOC count (excluding comments)
  • CCN (cyclomatic complexity number)
  • Token count per function
  • Parameter count per function

File-based stats

  • last modified
  • size
    • lines
    • bytes
  • encoding
  • mode
    • executable? (does it have a shebang?)
    • symlink?
  • non-text?
    • linker stats
    • compiler stats
    • architecture
  • path
    • project path length
    • filename length
    • casing (snake, camel, etc.)
    • extension (and does it match contents?)
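
A minimal sketch of how several of the file-based stats above could be gathered with the Python standard library. The names `collect_file_stats` and `classify_casing` are hypothetical, the casing check is a rough heuristic, and the NUL-byte test for non-text files is an assumption rather than a settled rule. Encoding detection, linker/compiler stats, and extension-versus-contents matching are left out.

```python
import re
import stat
from pathlib import Path


def classify_casing(stem: str) -> str:
    """Guess the naming convention of a filename stem (heuristic only)."""
    if re.fullmatch(r"[a-z0-9]+(_[a-z0-9]+)+", stem):
        return "snake"
    if re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)+", stem):
        return "kebab"
    if re.fullmatch(r"[a-z][a-z0-9]*([A-Z][a-z0-9]*)+", stem):
        return "camel"
    if re.fullmatch(r"([A-Z][a-z0-9]+)+", stem):
        return "pascal"
    if re.fullmatch(r"[a-z0-9]+", stem):
        return "lower"
    return "other"


def collect_file_stats(path: Path, project_root: Path) -> dict:
    """Collect a subset of the file-based stats for one file."""
    info = path.lstat()  # lstat so a symlink is reported, not followed
    is_symlink = stat.S_ISLNK(info.st_mode)
    data = b"" if is_symlink else path.read_bytes()
    relative = path.relative_to(project_root).as_posix()
    return {
        "last_modified": info.st_mtime,          # seconds since the epoch
        "bytes": info.st_size,
        "lines": len(data.splitlines()),
        "executable_bit": bool(info.st_mode & stat.S_IXUSR),
        "has_shebang": data.startswith(b"#!"),
        "symlink": is_symlink,
        "non_text": b"\0" in data[:8192],        # assumption: NUL byte means binary
        "project_path_length": len(relative),
        "filename_length": len(path.name),
        "casing": classify_casing(path.stem),
        "extension": path.suffix,
    }
```

The executable bit and the shebang are reported separately because a file can have one without the other, and that mismatch may itself be worth recording.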

Git-based stats

  • number of contributors to file
    • per-line ownership of file contents
    • per-token ownership of contents
    • line/token "heat" (frecency of changes)
  • meta files
    • .gitkeep usage
    • lfs objects
    • binary objects
    • other git extensions in use
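
Per-line ownership could be derived from the output of `git blame --line-porcelain`, which repeats the full commit header, including an `author` field, for every blamed line. The sketch below only parses that text; the function name `ownership_from_blame` is hypothetical, and running git and handling per-token ownership are left out.

```python
from collections import Counter


def ownership_from_blame(porcelain: str) -> Counter:
    """Count blamed lines per author in `git blame --line-porcelain` output."""
    owners = Counter()
    # Split on "\n" only, so unusual characters inside file content
    # cannot create extra pseudo-lines.
    for line in porcelain.split("\n"):
        # File content lines start with a tab, so a line beginning with
        # "author " is always the author header field of a blamed line.
        if line.startswith("author "):
            owners[line[len("author "):]] += 1
    return owners
```

The number of contributors to the file is then `len(owners)`, and dividing each count by the total gives each author's share of the current contents.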

GitHub-based stats

  • number of issues
    • tags
    • number of comments (frecency)
  • contributor heat index
    • commits vs. GitHub activity (issues, PRs, reviews)
    • see GitHub's small profile diagram of issues, PRs, reviews, and commits
    • compute that same breakdown per repo, per user
  • the same data the repository's Insights tab provides
  • relating stargazers/watchers to activity in the repo
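
"Frecency" (frequency combined with recency) appears above for both line/token heat and issue comments, but the page does not define a formula. One possible sketch, assuming an exponential decay with a configurable half-life: each event (a change or a comment) contributes a weight that halves every `half_life_days`. The function name and the 30-day default are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def frecency(events, now, half_life_days=30.0):
    """Sum of exponentially decayed weights, one per event timestamp."""
    score = 0.0
    for timestamp in events:
        age_days = (now - timestamp).total_seconds() / 86400.0
        # Events dated in the future are treated as happening "now".
        score += 0.5 ** (max(age_days, 0.0) / half_life_days)
    return score


now = datetime(2019, 11, 13, tzinfo=timezone.utc)
events = [now, now - timedelta(days=30)]
print(frecency(events, now))  # 1.0 for today's event plus 0.5 for the 30-day-old one
```

With this scoring, many old events and a few recent ones can produce similar values, so the raw event count may be worth storing alongside the score.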