The following entities are directly retrievable from the database. Every query has to start with one of them:
- Projects (each is a repository identified by a user/project pair; most queries start here)
- Commits
- Users
- Snapshots (each is the contents of a single file at a specific point in time; the same snapshot can occur in many repos, under many paths)
- Paths (each is a specific file location, e.g. `"src/main.c"`; the same path can occur in many repos)

Each entity has a number of attributes. Some attributes are optional, which means they may be missing from the data. Each entity has a unique ID that can be relied on for comparisons and similar operations.

Rust type primer:
- `bool` is Boolean
- `u64`, `u32`, `usize` are unsigned integers
- `i64`, `i32` are signed integers
- `String`, `&str`, `Cow<str>` are strings
- `Vec<...>` is a collection/a vector of some type
- `Vec<u8>` is a byte array
- `Duration`, `Instant` are types related to dates, time and timestamps

Also our objects, for reference:
- `Language` is an enum that factorizes strings representing languages, full details below
- `Path`, `Snapshot`, `Commit`, `User`, `Project` are the aforementioned entities
- `PathId`, `SnapshotId`, `CommitId`, `UserId`, `ProjectId` are the IDs of those (wrapped `u64`, for the curious)
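
Since the IDs are described as wrapped `u64` values, comparing and deduplicating entities by ID is cheap. The following is a minimal sketch of that idea, not the library's actual definitions: the struct layouts, the derived traits and the `dedup_ids` helper are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an ID type: a thin wrapper around `u64`.
/// The derived traits are an assumption; they are what makes IDs usable
/// in comparisons, sorting and hash-based collections.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ProjectId(pub u64);

/// Hypothetical stand-in for a commit ID, defined the same way.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct CommitId(pub u64);

/// Removes duplicate IDs while keeping the first occurrence of each one.
pub fn dedup_ids(ids: &[CommitId]) -> Vec<CommitId> {
    let mut seen = HashSet::new();
    ids.iter()
        .copied()
        // `insert` returns false when the ID was already present.
        .filter(|id| seen.insert(*id))
        .collect()
}
```

Because each ID kind is its own type, a `ProjectId` cannot be compared with a `CommitId` by accident; the compiler rejects it.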

Project attributes:
- id (type: `ProjectId`)
- url (type: `String`)
- is_fork (type: `bool`, optional)
- is_archived (type: `bool`, optional)
- is_disabled (type: `bool`, optional)
- star_count (type: `usize`, optional)
- watcher_count (type: `usize`, optional)
- size (type: `usize`, optional)
- open_issue_count (type: `usize`, optional)
- fork_count (type: `usize`, optional)
- subscriber_count (type: `usize`, optional)
- license (type: `String`, optional)
- language (type: `Language`, optional)
- description (type: `String`, optional)
- homepage (type: `String`, optional)
- heads (type: `Vec<(CommitId, String)>`, optional)
- head_count (type: `usize`, optional)
- commit_ids (type: `Vec<CommitId>`, optional)
- commits (type: `Vec<Commit>`, optional)
- commit_count (type: `usize`, optional)
- author_ids (type: `Vec<UserId>`, optional)
- authors (type: `Vec<User>`, optional)
- author_count (type: `usize`, optional)
- path_ids (type: `Vec<PathId>`, optional)
- paths (type: `Vec<Path>`, optional)
- path_count (type: `usize`, optional)
- snapshot_ids (type: `Vec<SnapshotId>`, optional)
- snapshots (type: `Vec<Snapshot>`, optional)
- snapshot_count (type: `usize`, optional)
- committer_ids (type: `Vec<UserId>`, optional)
- committers (type: `Vec<User>`, optional)
- committer_count (type: `usize`, optional)
- user_ids (type: `Vec<UserId>`, optional)
- users (type: `Vec<User>`, optional)
- user_count (type: `usize`, optional)
- lifetime (type: `Duration`, optional)
- has_issues (type: `bool`, optional)
- has_downloads (type: `bool`, optional)
- has_wiki (type: `bool`, optional)
- has_pages (type: `bool`, optional)
- created (type: `i64`, optional)
- updated (type: `i64`, optional)
- pushed (type: `i64`, optional)
- default_branch (type: `String`, optional)
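
Because most project attributes are optional, any filter over them has to decide what to do when a value is missing. The sketch below shows one way to handle that, assuming optional attributes surface as `Option<T>`. The `ProjectView` struct and `popular_originals` function are hypothetical and are not part of the library.

```rust
/// Hypothetical, trimmed-down view of a project. Optional attributes are
/// modelled as `Option<T>` (an assumption about how missing data surfaces).
#[derive(Debug, Clone)]
pub struct ProjectView {
    pub id: u64,
    pub url: String,
    pub star_count: Option<usize>,
    pub is_fork: Option<bool>,
}

/// Keeps projects that are known not to be forks and are known to have at
/// least `min_stars` stars. Projects with a missing `is_fork` or
/// `star_count` are skipped rather than guessed at.
pub fn popular_originals(projects: &[ProjectView], min_stars: usize) -> Vec<&ProjectView> {
    projects
        .iter()
        .filter(|p| p.is_fork == Some(false))
        .filter(|p| p.star_count.map_or(false, |stars| stars >= min_stars))
        .collect()
}
```

Treating a missing value as "does not match" is a design choice; a query that should keep projects with unknown star counts would use `map_or(true, ...)` instead.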

User attributes:
- id (type: `UserId`)
- email (type: `String`)
- authored_commit_ids (type: `Vec<CommitId>`, optional)
- authored_commits (type: `Vec<Commit>`, optional)
- authored_commit_count (type: `usize`, optional)
- committed_commit_ids (type: `Vec<CommitId>`, optional)
- committed_commits (type: `Vec<Commit>`, optional)
- committed_commit_count (type: `usize`, optional)
- committer_experience (type: `Duration`, optional)
- author_experience (type: `Duration`, optional)
- experience (type: `Duration`, optional)

Commit attributes:
- id (type: `CommitId`)
- committer_id (type: `UserId`)
- author_id (type: `UserId`)
- parent_ids (type: `&Vec<CommitId>`)
- parent_count (type: `usize`)
- committer (type: `User`, optional)
- author (type: `User`, optional)
- parents (type: `Vec<Commit>`, optional)
- hash (type: `String`, optional)
- message (type: `String`, optional)
- message_length (type: `usize`, optional)
- author_timestamp (type: `i64`, optional)
- committer_timestamp (type: `i64`, optional)
- change_ids (type: `Vec<(PathId, SnapshotId)>`, optional)
- changed_path_ids (type: `Vec<PathId>`, optional)
- changed_snapshot_ids (type: `Vec<SnapshotId>`, optional)
- changed_paths (type: `Vec<Path>`, optional)
- changed_path_count (type: `usize`, optional)
- changed_snapshots (type: `Vec<Snapshot>`, optional)
- changed_snapshot_count (type: `usize`, optional)
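
The `change_ids` attribute pairs each changed path with the snapshot it points to, while `changed_path_ids` and `changed_snapshot_ids` expose the two halves separately. The sketch below illustrates how such pairs relate to the two ID lists. It is an illustration only: the ID structs and `split_changes` are hypothetical, and whether the library preserves order or removes duplicates is not stated here.

```rust
/// Hypothetical stand-ins for the ID wrappers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PathId(pub u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SnapshotId(pub u64);

/// Splits (path, snapshot) pairs into two parallel vectors,
/// preserving the order of the input and keeping duplicates.
pub fn split_changes(change_ids: &[(PathId, SnapshotId)]) -> (Vec<PathId>, Vec<SnapshotId>) {
    change_ids.iter().copied().unzip()
}
```

With this shape, the length of either returned vector equals the number of changes in the commit.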

Snapshot attributes:
- id (type: `SnapshotId`)
- raw_contents (type: `&Vec<u8>`)
- contents (type: `Cow<str>`)
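
A snapshot offers both raw bytes and a `Cow<str>` view of the same file. One standard way to get from bytes to `Cow<str>` is `String::from_utf8_lossy`, which borrows when the bytes are valid UTF-8 and allocates only when invalid sequences have to be replaced. The sketch below assumes that behaviour for illustration; how the library actually derives `contents` is not stated here, and both function names are hypothetical.

```rust
use std::borrow::Cow;

/// Decodes raw bytes as text. Returns a borrowed `&str` when the bytes are
/// valid UTF-8, and an owned `String` with U+FFFD replacement characters
/// when they are not.
pub fn decode_contents(raw_contents: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(raw_contents)
}

/// Counts the lines in a snapshot after lossy decoding.
pub fn line_count(raw_contents: &[u8]) -> usize {
    decode_contents(raw_contents).lines().count()
}
```

For binary files the lossy text view is rarely meaningful, so working with the raw bytes is usually the better choice there.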

Path attributes:
- id (type: `PathId`)
- location (type: `String`)
- language (type: `Option<Language>`, yields `None` if the language is not recognized; see below for details)

We recognize the following languages:
- C
- Cpp
- ObjectiveC
- Go
- Java
- CoffeeScript
- JavaScript
- TypeScript
- Ruby
- Rust
- PHP
- Python
- Perl
- Clojure
- Erlang
- Haskell
- Scala

If a language is not on that list, it will be listed as Other.
Languages are translated from file extensions as follows. In the case of C and C++, note how headers are treated: `h` maps to C, while `hpp` maps to C++.
"c" | "h" => Some(Language::C),
"C" | "cc" | "cpp" | "cxx" | "c++" | "hpp" => Some(Language::Cpp),
"m" | "mm" | "M" => Some(Language::ObjectiveC),
"go" => Some(Language::Go),
"java" => Some(Language::Java),
"coffee" | "litcoffee" => Some(Language::CoffeeScript),
"js" | "mjs" => Some(Language::JavaScript),
"ts" | "tsx" => Some(Language::TypeScript),
"rb" => Some(Language::Ruby),
"rs" => Some(Language::Rust),
"py" | "pyi" | "pyc" | "pyd" | "pyo" | "pyw" | "pyz" => Some(Language::Python),
"plx" | "pl" | "pm" | "xs" | "t" | "pod" => Some(Language::Perl),
"clj" | "cljs" | "cljc" | "edn" => Some(Language::Clojure),
"erl" | "hrl" => Some(Language::Erlang),
"hs" | "lhs" => Some(Language::Haskell),
"scala" | "sc" => Some(Language::Scala),
"php" | "phtml" | "php3" | "php4" | "php5" |
"php7" | "phps" | "php-s" | "pht" | "phar" => Some(Language::PHP),
_ => None,
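
To show how the mapping above applies to a whole path rather than a bare extension, here is a minimal, self-contained sketch. It copies only a few of the arms listed above, and the trimmed `Language` enum and the two function names are hypothetical. Extracting the extension with `std::path::Path::extension` is also an assumption; the library may do this differently.

```rust
use std::path::Path;

/// Trimmed-down, hypothetical version of the language enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    C,
    Cpp,
    Python,
    Rust,
}

/// Maps a file extension (without the dot) to a language.
/// Matching is case-sensitive: "c" is C, but "C" is C++.
pub fn language_from_extension(extension: &str) -> Option<Language> {
    match extension {
        "c" | "h" => Some(Language::C),
        "C" | "cc" | "cpp" | "cxx" | "c++" | "hpp" => Some(Language::Cpp),
        "py" | "pyi" | "pyc" | "pyd" | "pyo" | "pyw" | "pyz" => Some(Language::Python),
        "rs" => Some(Language::Rust),
        _ => None,
    }
}

/// Looks up the language for a path such as "src/main.c".
/// Paths without an extension yield `None`.
pub fn language_from_path(location: &str) -> Option<Language> {
    Path::new(location)
        .extension()
        .and_then(|ext| ext.to_str())
        .and_then(|ext| language_from_extension(ext))
}
```

Under these assumptions, `language_from_path("include/util.h")` gives `Some(Language::C)` even in a C++ project, which is the header caveat mentioned above.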