Git
For users not familiar with git, we provide some references and a cheat sheet that will cover the most common use cases:
Git is a distributed VCS (version control system), meaning that it has a very flexible approach to collaborating with one, many or even no other parties (remotes). It is conceptually very different from the likes of CVS or SVN. Please forget everything that you know about SVN and carefully read the following table to understand the key concepts in git.
Concept | Meaning |
---|---|
git specific lang | |
work-tree | the files actually present in your filesystem (at the top level where also the .git folder is) |
repository | storage of objects and references to objects (stored in .git/ folder) |
object reference | SHA-1 hash of the object payload |
object | most important: blob, tree, commit |
object: blob | payload: contents of a file |
object: tree | payload: list of names, plus references to the corresponding files and subtrees |
object: commit | payload: message, author, date, reference to tree, reference to parent(s), other metadata |
history | directed acyclic graph (DAG) of all ancestor commits of a commit |
branch | reference to commit |
index (staging area) | virtual tree that will become the tree of the next commit |
stash | temporary storage for changes, can be stacked |
remote | a related repository located elsewhere on the network or filesystem |
General terms | |
merge | joining multiple branches |
(un-)tracked | whether a file is registered for version control |
checkout | instance of the tracked files in the working tree (created/updated from the object repository) |
clone | full copy of a git repository |
fork | derived project, with different authors or development goals |
At this point, take a few minutes and think about the following questions for fun and to tighten your understanding:
- what is a repository and how does it differ from a working-tree?
- will all changes in the work-tree automatically be part of the next commit?
- how expensive is it to create a branch?
- do all clones of a repository have equal rights or is there an upstream repository that plays a special role?
- is it possible to have remote repositories that have no commits in common?
- is it possible that a commit has an earlier date field than its parent?
- why should the history be a DAG?
- how is this property enforced on a technical level, i.e. could you manually construct a circular graph of commits?
- is a commit represented internally as a snapshot of the files or as a diff?
- if a file with the same data is present in two commits, is it stored twice on disk?
You can check your understanding by referring to the section about Git internals.
Rather than enforcing one opinionated work-flow, git has a lot of commands that can be used in a very modular fashion. While this may be overwhelming at first, you will find that most of it is quite natural to learn. We list the most common ones here.
Note: Most git commands are safe, i.e. revertible. This means that they will either gracefully handle any uncommitted changes or refuse to execute – depending on the command and whether there will be conflicts between the local changes and the performed action.
Command | Purpose |
---|---|
Basic commands for working with a single repository: | |
init | create empty repository (.git/) in current directory |
add | copy changes: work-tree → index |
rm | remove file from index + work-tree |
mv | move file (index + work-tree) |
commit | create commit: index → commit |
branch | manage branches (create, delete, rename, move) |
tag | manage tags (releases) |
merge | merge two or more branches: branch+ → commit |
revert | create a commit to cancel the diff of a previous commit (safe) |
config | change settings |
Informational commands: | |
help | get help about a topic or command |
status | show general status |
log | show history |
diff | show diff between work-tree, index, commits |
show | show info about a commit |
blame | show who committed what, when |
bisect | find the commit that introduced a bug |
Commands that can be dangerous: | |
clean | remove untracked files, etc. |
reset | usually: clear index, move current branch pointer (safe) OR: |
reset --hard | also: overwrite local changes (DANGEROUS) |
checkout COMMIT | move HEAD and checkout files (safe!) OR: |
checkout BRANCH | also: switch branches (safe!) OR: |
checkout -- FILE | also: overwrite local changes (DANGEROUS) |
stash | save local changes + reset --hard |
stash drop | delete an item from the stash (DANGEROUS) |
Working with remotes (network access): | |
fetch | download data: remote → .git/ |
push | upload data: .git/ → remote |
pull | fetch + merge |
clone | init + remote add + fetch + checkout |
Interact with foreign repositories: | |
remote | manage remotes (add, rm, rename, set-url) |
submodule | manage submodules (add, rm, update) |
subtree | manage subtrees |
Editing history: | |
commit --amend | index → previous commit |
cherry-pick | apply existing commit at current location |
rebase | rewrite history (reorder, modify, merge, insert or drop commits) |
filter-branch | apply changes to all existing commits |
Note: These commands are examples of so-called porcelain commands, meaning they are comfortable for regular use. There is another set of so-called plumbing commands that allow lower-level access to git internals. These can be useful in certain less common situations, for example when doing automated history rewrites or when writing your own subcommands.
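To make the distinction concrete, here is a minimal sketch that runs one porcelain command next to a few plumbing commands in a throwaway repository. The temporary directory, the file name and the identity are assumptions chosen for illustration only.

```shell
set -eu
# Throwaway repository with a local (hypothetical) identity.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file"

# Porcelain: output meant for humans.
git --no-pager log -1 --oneline

# Plumbing: small building blocks that are convenient in scripts.
head_commit=$(git rev-parse HEAD)      # resolve a name to a full hash
git cat-file -t "$head_commit"         # print the object type
git ls-tree --name-only HEAD           # list the entries of the commit's tree
tracked=$(git ls-files)                # files currently in the index
```

Plumbing commands such as rev-parse and ls-files are the ones you would typically reach for when writing your own subcommands.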
We employ the following workflow for MAD-X:
- Fork the official MAD-X repository onto your own github username
- Create a new branch for your feature/bugfix
- Work on your branch until the feature/bugfix is ready for inclusion
- Push your changes to your github
- Create a pull-request for inclusion into the official MAD-X repository
As a very first step, configure your name and email:
git config --global user.name "Full Name"
git config --global user.email "your.email@example.com"
Now fork the MAD-X repository onto your own github username.
Clone your fork and add the official MAD-X repository as upstream:
git clone git@github.com:USERNAME/MAD-X
cd MAD-X
git remote add upstream git@github.com:MethodicalAcceleratorDesign/MAD-X
You're now ready to work on your local copy of the MAD-X repository.
(See also the Cloning subsection of this document.)
First, make sure that your upstream is up-to-date:
git fetch upstream
(Note that this command only downloads data into the .git folder and never interferes with your local checkout.)
Now create and checkout a new branch for your new feature/bugfix. Most of the time you will want to branch off the upstream master:
git checkout -b my_feature upstream/master
Associate the branch with a branch on your github fork:
git push -u origin my_feature
After making a coherent unit of changes, build and test your changes, and then add relevant changes to the INDEX:
git add FILES...
Note that with the -p or -e options, you gain more fine-grained control over what goes into the index.
Commit changes to your current branch in the local repository:
git commit -m "Implement awesome feature XYZ"
The commit message describes the change made by the commit and starts with a capitalized verb. The first line should be no more than 80 characters. If you want to provide more information (which is very welcome!), leave an empty line after the subject line, e.g.:
Fuse twcpgo and twchgo routines

Tracking *common* and *chromatic* optical functions independently was
more error prone (no "single source of truth") and meant that many
computations had to be performed twice.

Resolves #735
The Resolves #NUM line can be used to automatically close issues on github upon merge.
Important: Regularly inspect your status, diff, log and last commit to check whether they contain exactly the changes you intended, see Inspection.
If/when you want, publish your changes to github:
git push
Go to the github website and create a pull request. This creates a thread where further review and discussion can take place.
In case of merge conflicts, see Conflict resolution.
For history modifications, see History alteration.
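The workflow above can be tried end to end without network access. The following is a minimal sketch in which two bare repositories on the local filesystem stand in for the official repository and for your github fork. The directory names (seed, upstream.git, fork.git), the file names and the identity are assumptions made for illustration only; the pull request step has no local equivalent and is left out.

```shell
set -eu
# Hypothetical identity, so that commits work without global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d) && cd "$work"

# Stand-in for the official repository: one commit on master.
git init -q seed
cd seed
echo "initial" > README
git add README
git commit -q -m "Add README"
git branch -M master
cd "$work"
git clone -q --bare seed upstream.git
# Stand-in for your fork on github: a copy of upstream.
git clone -q --bare upstream.git fork.git

# Clone the fork and register the official repository as "upstream".
git clone -q fork.git MAD-X
cd MAD-X
git remote add upstream "$work/upstream.git"
git fetch -q upstream

# Branch off upstream/master and associate the branch with the fork.
git checkout -q -b my_feature upstream/master
git push -q -u origin my_feature

# Work, stage, commit, publish.
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Implement awesome feature XYZ"
git push -q

local_tip=$(git rev-parse my_feature)
fork_tip=$(git --git-dir="$work/fork.git" rev-parse my_feature)
```

After the final push, the branch in the fork points to the same commit as the local branch, which is the state from which a pull request would be opened.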
This section contains basic usage instructions for the most common git operations. For more details please refer to the builtin help system (git help). See also Getting help.
git is very well documented. The man page for every subcommand can be accessed by typing:
git help COMMAND
Note that git also has great official online resources such as tutorials and comprehensive documentation that you should absolutely refer to in case of problems:
- book
- documentation
- crash course for SVN users
There are multiple ways to query information about the current branch/work-tree/commit/history and so on. This is especially important directly before and after committing:
command | what it shows |
---|---|
git status | general status |
git diff | differences between local files and index |
git diff --cached | differences between index and previous commit |
git show [COMMIT] | commit message + diff |
git log [BRANCH] | commit history of branch |
git log --all | commit history of all branches |
git log --graph | graph structure of history |
git blame FILE | for each line: when it was last changed, in which commit and by whom |
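The difference between git diff and git diff --cached is easiest to see with one staged and one unstaged change. The following is a small sketch in a throwaway repository; the file names and the identity are assumptions for illustration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
printf 'one\n' > a.txt
printf 'one\n' > b.txt
git add a.txt b.txt
git commit -q -m "Add a.txt and b.txt"

printf 'two\n' >> a.txt     # this change will be staged
printf 'two\n' >> b.txt     # this change stays in the work-tree only
git add a.txt

git status --short                          # compact overview of both states
unstaged=$(git diff --name-only)            # work-tree vs index
staged=$(git diff --cached --name-only)     # index vs last commit
echo "unstaged: $unstaged"
echo "staged:   $staged"
```

Only a.txt would become part of the next commit; b.txt shows up in the plain git diff until it is added as well.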
Cloning means copying a remote repository. Note that (unlike a checkout in SVN) a clone fetches the entire history and creates a fully-fledged repository on which you can work and commit locally without having to push your changes to the upstream repository.
For example, cloning the upstream MAD-X repository on github to your local PC can be achieved as follows:
For users without github account or SSH key:
git clone https://github.com/MethodicalAcceleratorDesign/MAD-X
For users with github account and SSH key:
git clone git@github.com:MethodicalAcceleratorDesign/MAD-X
Note that a clone is basically equivalent to a series of commands:
mkdir MAD-X && cd MAD-X
git init
git remote add origin git@github.com:MethodicalAcceleratorDesign/MAD-X
git fetch origin
git checkout -b master origin/master
If you plan to contribute, you should first login to your github account and fork the MAD-X repository on github to your own username (using the button somewhere in the upper right). I recommend accessing your fork via SSH (See Adding an SSH key).
Now clone your fork using:
git clone git@github.com:USERNAME/MAD-X
If you had already cloned the upstream repository, you can add your own fork as an additional remote instead:
git remote add myfork git@github.com:USERNAME/MAD-X
git fetch myfork
Many git operations allow or require specifying a commit ID. One way to call these commands is with the full SHA-1 hash of the commit. However, it is often more convenient to refer to the desired commit in another way:
- usually the first 8 or so digits (or any other unique prefix of the commit's hash) can identify a commit
- HEAD is a symbolic reference to the active commit
- the name of a branch can be used to refer to the last commit in the branch
- COMMIT^ refers to the first parent of a given commit
- COMMIT^N refers to the N'th parent of a given commit (only useful for merge commits, i.e. when a commit has multiple parents)
- COMMIT~N refers to the N'th ancestor going back in the commit's history via the first parents
Example:
0a4775ca    Abbreviation of 0a4775caa631b3fb99e75becf8fbb6683b68cf0c
HEAD~3      Go back 3 commits from current commit
HEAD^2~3    Go back 3 commits from the second parent of current commit
master^     The first parent of master
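These notations can be checked with git rev-parse, which resolves any of them to a full hash. The following is a minimal sketch on a linear history of four commits; the repository, file name and identity are assumptions for illustration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
for i in 1 2 3 4; do
    echo "$i" > file.txt
    git add file.txt
    git commit -q -m "Commit $i"
done

full=$(git rev-parse HEAD)                  # full hash of the active commit
short=$(git rev-parse --short=8 HEAD)       # abbreviated hash (at least 8 digits)
parent=$(git rev-parse HEAD^)               # first parent
same=$(git rev-parse HEAD~1)                # the same commit as HEAD^
subject=$(git log -1 --format=%s HEAD~3)    # subject line three commits back
echo "$short resolves to $(git rev-parse "$short")"
```

On a linear history, COMMIT^ and COMMIT~1 always agree; they only start to differ in usefulness once merge commits with several parents are involved.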
Branches are diverging lines of development, for example for bug fixes or new features that can later be merged, see Merging.
In git a branch is merely a pointer, i.e. reference, to a commit. Creating a branch is an extremely lightweight, atomic operation and involves no file copying. When you are on a branch and create a new commit (git commit), the reference is automatically advanced. It is not necessary to have an active branch. In this case you operate in so-called detached HEAD mode.
Similar to branches, tags (releases) in git point to a specified commit, but can hold additional metadata, like commits.
Note that this is very different from SVN, which doesn't have a true builtin concept of branches but relies on conventions upheld by the user instead: Branches and tags in SVN are created as copies of a directory. And while copying a directory on the server side is cheap in SVN as well, it can lead to large network load and disk usage if checking out the entire repository including all tags and branches.
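That a branch is only a movable reference can be observed directly. The sketch below, run in a throwaway repository with assumed names, creates a branch, commits on it, and then detaches HEAD.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "1" > file.txt
git add file.txt
git commit -q -m "First commit"

git branch topic                        # new pointer to the current commit
before=$(git rev-parse topic)
git checkout -q topic
echo "2" >> file.txt
git commit -q -a -m "Second commit"     # advances the active branch
after=$(git rev-parse topic)
parent=$(git rev-parse topic^)

git checkout -q --detach HEAD           # no active branch: detached HEAD
detached=$(git rev-parse --abbrev-ref HEAD)
echo "topic moved from $before to $after"
```

In detached HEAD mode, rev-parse --abbrev-ref reports the literal name HEAD because there is no branch to name.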
Branches: | |
git branch -a | list all branches (including remote ones) |
git branch NAME [BASE] | create a new branch (at BASE) |
git branch -m [OLD] NEW | rename branch |
git branch -d NAME | delete branch |
git checkout -b NAME | create and activate new branch |
git push -u origin NAME | set upstream tracking branch |
Tags: | |
git tag NAME [BASE] | create a new tag |
git tag -d NAME | delete tag |
git push --tags | push tags |
Common operations: | |
git fetch -p | delete references to remote branches that were deleted upstream |
As a distributed version control system, git features a very flexible approach to working with multiple parties. This is captured by the concept of remotes, which specify the URLs (or filesystem paths) of remote repositories.
git remote -v | list your remotes with URL (verbose) |
git remote add NAME URL | add a new remote |
git fetch NAME | download data into your .git folder (does not affect the work-tree) |
git remote rename OLD NEW | rename an existing remote |
git remote rm NAME | forget a remote (does not delete the upstream repository) |
Merging is the process of rejoining two or more lines of development.
Before merging branches, you should first stash (or better commit) uncommitted changes:
git stash
Then download and merge the newest version of the other branch:
git pull upstream master
Note that this is the same as the two commands:
git fetch upstream
git merge upstream/master
If your current branch refers to a commit in the history of the other branch, git by default merely fast-forwards the pointer without creating a new commit. You can enforce the creation of a new merge commit (to get a more clearly structured history graph):
git merge --no-ff other-branch
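The difference between a fast-forward and a forced merge commit can be reproduced in a few lines. This is a minimal sketch in a throwaway repository; branch and file names are assumptions for illustration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m "Base"
git branch -M master

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
feature_tip=$(git rev-parse feature)

# master is an ancestor of feature: a plain merge only moves the pointer.
git checkout -q master
git merge -q feature
ff_tip=$(git rev-parse master)

# Undo the fast-forward, then force the creation of a merge commit.
git reset -q --hard HEAD~1
git merge -q --no-ff -m "Merge feature" feature
second_parent=$(git rev-parse master^2)
git --no-pager log --graph --oneline
```

After the fast-forward, master and feature are the same commit; after the --no-ff merge, master is a new commit whose second parent is the tip of feature.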
In some cases there will be merge conflicts, in which case git complains and tells you to resolve the conflict. See Conflict resolution.
If there is a merge conflict, check the status to see which files need your care:
git status
In those files, search for the conflicted sections in between the <<<<<<<, ======= and >>>>>>> marker lines and manually replace them with the correct code (remove the markers!).
When you're done resolving, double check your changes (!), then add the changes and commit:
git add FILE
git diff --cached
git status
git commit
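To practise this without risk, a conflict can be provoked on purpose. The following sketch changes the same line on two branches and then walks through the resolution steps; the file name, branch names and the chosen resolution are assumptions for illustration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
printf 'value = 1\n' > settings.txt
git add settings.txt
git commit -q -m "Add settings"
git branch -M master

git checkout -q -b other
printf 'value = 2\n' > settings.txt
git commit -q -a -m "Set value to 2"

git checkout -q master
printf 'value = 3\n' > settings.txt
git commit -q -a -m "Set value to 3"

# The merge stops with a conflict; "|| true" keeps this script running.
git merge other > /dev/null 2>&1 || true
conflicted=$(git diff --name-only --diff-filter=U)   # files that need care
grep -c '^<<<<<<<' settings.txt                      # marker lines in the file

# Resolve by writing the intended content (markers removed), then commit.
printf 'value = 4\n' > settings.txt
git add settings.txt
git --no-pager diff --cached
git commit -q -m "Merge branch 'other'"
```

The final commit has two parents, just like a merge that went through without conflicts.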
Apart from committing itself, git has very powerful mechanisms to rewrite history, i.e. modify existing commits. One can freely insert, drop, modify and even fuse commits before publishing them.
However, this will change the commits and hence their hashes. Therefore, as a rule of thumb you should never do this with commits that have been seen or created by others. Most importantly, never alter commits that have been merged to master.
If you just want to add additional changes to the most recent commit, you can add them to the index and simply type:
git commit --amend
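Amending replaces the commit rather than editing it in place, which is why the hash changes. A minimal sketch in a throwaway repository (file name and identity are assumptions; --no-edit keeps the message so that no editor opens):

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "first" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
old=$(git rev-parse HEAD)

echo "forgotten line" >> notes.txt
git add notes.txt
git commit -q --amend --no-edit        # fold the index into the previous commit
new=$(git rev-parse HEAD)
count=$(git rev-list --count HEAD)     # still a single commit in the history
echo "before: $old"
echo "after:  $new"
```

The history still contains one commit, but under a new ID, which is exactly why already published commits should not be amended.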
If you want to reorder/join/drop/fuse/modify commits in your recent history or if you have a branch that is based on an old version of master and you want to move (reapply) the commits onto the new master (or another branch):
git fetch upstream
git rebase -i upstream/master
I recommend always using the -i option!
In certain situations, you might need:
git rebase -i --onto master next topic
This transforms a commit graph like this:
o---o---o---o---o  master
     \
      o---o---o---o---o  next
                       \
                        o---o---o  topic
into a graph that looks like this:
o---o---o---o---o  master
    |            \
    |             o'--o'--o'  topic
     \
      o---o---o---o---o  next
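The same transformation can be reproduced on a small scale. The sketch below builds a master, a next and a topic branch in a throwaway repository and then moves topic onto master. It omits -i so that it runs unattended, and commit_file is a hypothetical helper defined only for this example.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

commit_file() {    # hypothetical helper: create one file and commit it
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "Add $1"
}

commit_file m1
git branch -M master
git checkout -q -b next       # next branches off master
commit_file n1
commit_file n2
git checkout -q -b topic      # topic branches off next
commit_file t1
commit_file t2
git checkout -q master
commit_file m2

# Replay the commits in next..topic on top of master.
git rebase -q --onto master next topic

base=$(git merge-base master topic)
git --no-pager log --graph --oneline --all
```

Afterwards topic sits directly on top of master and no longer contains the commits of next, while next itself is untouched.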
Rebasing is extremely useful to clean up your history and to fold bug fixes into the commits that introduced the bugs, before publishing your changes. Having a clean bug-free history is important for peer-review and especially for Bisecting. I usually rebase many times every day.
A less common operation is the git filter-branch command, which allows you to rewrite the history of one or many branches. It is useful mainly before publishing a previously private repository or extracting code and history from an existing repository. For example, it can be used to:
- remove certain files from all previous commits
- move all files in a folder to a different location
- change commit messages in a systematic way
- extract the history of a subfolder or a specific file
In fact, filter-branch was used during the migration of the MAD-X SVN repository to git to add the revision numbers to the commit messages.
git bisect is a convenient tool to find the commit that introduced a bug (or more generally: changed a behaviour) using a logarithmic number of steps.
git bisect start NEW OLD | start bisecting; you provide a commit that you know to have the new behaviour and one that you know to behave the old way. |
git bisect (new|old) | after testing, manually mark the current commit as having the new/old behaviour and checkout the next commit. |
git bisect run COMMAND | execute a script that determines whether commits are old/new based on exit status for the rest of all bisected commits. |
git bisect reset | reset after bisecting |
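An automated bisection can be tried on a synthetic history. In the sketch below, ten commits write increasing numbers to a file and the "bug" is defined as the value reaching 7; the file name, the test command and the identity are assumptions for illustration. It uses the default bad/good terms, for which git records the result in refs/bisect/bad.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "$i" > value.txt
    git add value.txt
    git commit -q -m "Set value to $i"
done
expected=$(git rev-parse HEAD~3)    # commit 7, known here only for checking

# First argument: commit with the new (bad) behaviour, second: old (good).
git bisect start HEAD HEAD~9 > /dev/null
# Exit status 0 marks a commit as old/good, status 1 as new/bad.
git bisect run sh -c 'test "$(cat value.txt)" -lt 7' > /dev/null
found=$(git rev-parse refs/bisect/bad)
git --no-pager log -1 --format='first bad commit: %h %s' "$found"
git bisect reset > /dev/null 2>&1
```

With ten commits, only a handful of checkouts are tested before the first commit with the new behaviour is identified.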
The following is a very brief summary of some of the git internals that may help your understanding of git:
- git repositories are a collection of objects stored inside the .git folder.
- git objects consist of some blob of data and are referenced by their SHA-1 hash. The most important objects and their data blobs are:
  - file: the file content
  - tree: list of filenames + references to the corresponding objects
  - commit:
    - commit message, author, date and other metadata
    - reference to the file tree
    - references to parent commit(s)
- note that this has the following important implications:
  - commits are snapshots that have direct knowledge of the entire working directory, they are not implemented as changesets.
  - since hashes are deterministic, commit IDs are deterministic based on contents, parents and other metadata, and:
  - tree/file IDs are deterministic based on contents – which means that identical files will only be stored once on disk
- using a cryptographic hash (more or less):
  - ensures that it is computationally infeasible to generate cyclic histories
  - can be used to verify data integrity
  - allows detecting attempts at tampering
- git branches are just pointers (references) to commits. As such, branching is an extremely lightweight operation. (Compare this to SVN, where a branch is a copy of a directory. Here, while creating and switching branches are cheap operations as well, branches can lead to significant overhead when checking out the entire repository including all branches and tags.)
- git has a garbage collector that can delete objects if they become unreferenced for too long (e.g. more than a week). Objects referenced directly or indirectly by a branch or tag will never be deleted by git.
- As long as objects are not deleted they can always be checked out or queried otherwise. git reflog is a useful tool to get the hashes of commits that were deleted by accident.
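Several of these points can be verified with plumbing commands. The following minimal sketch commits two files with identical content and inspects the resulting objects; the file names and the identity are assumptions for illustration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
printf 'same content\n' > a.txt
cp a.txt b.txt
git add a.txt b.txt
git commit -q -m "Add two identical files"

blob_a=$(git rev-parse HEAD:a.txt)     # ID of the blob behind a.txt
blob_b=$(git rev-parse HEAD:b.txt)     # identical content, identical ID
direct=$(git hash-object a.txt)        # ID computed from the file content alone

git cat-file -t HEAD                   # object type of the commit
git cat-file -t 'HEAD^{tree}'          # object type of its tree
git cat-file -t "$blob_a"              # object type of the file content
git cat-file -p HEAD                   # tree reference, author, committer, message
git cat-file -p 'HEAD^{tree}'          # names plus references to the blobs
```

Both tree entries point to the same blob, so the content is stored once, and the blob ID depends only on the content, not on the file name.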
The most important differences between git and SVN are summarized by the following table:
Aspect | SVN | git |
---|---|---|
history structure | linear | DAG |
commit id | revision number | commit hash |
tags/branches | copy of a directory | pointer to a commit |
main branch | trunk | master |
staging area | no | yes |
creating commits | on server | locally |
conflict resolution | on every commit | only when merging |
server / remotes | exactly one SVN server | fully dynamic |
local data | checkout (only files, only one branch) | fully fledged clone of the entire repository |
network access | most operations | only for synchronizing |
history alteration | mostly static | yes, fully dynamic |
If you plan to contribute, I recommend accessing the repository via SSH rather than HTTPS. This means that you will not have to enter your password on every push. If you haven't already, create an SSH key:
ssh-keygen -t rsa -b 4096
You can leave the password blank to store the key unencrypted (!) to your home folder. In this case, you won't need to enter password when pushing (in my opinion, if someone gets access to your harddrive, all your data is compromised anyway). If you dislike storing the key without further protection, you can enter a password and consider
Copy the SSH public key (cat ~/.ssh/id_rsa.pub) and add it on your github settings page.
You may find it convenient to add a few basic aliases and settings to your ~/.gitconfig. I encourage you to try and find out how these aliases work and what they do:
[core]
editor = vim
    excludesfile = ~/.gitignore_global
filemode = true
[alias]
co = checkout
cp = cherry-pick
st = status
re = remote -v
# fixup current index to the previous commit
amend = commit --amend
amenda = commit --amend -a
# merge and create a structural merge-commit (no fast-forward)
mm = merge --no-ff
# useful diffs:
cdiff = diff --cached
wdiff = diff --word-diff=color
wcdiff = diff --cached --word-diff=color
# log with graph structure:
alog = log --graph --all --format=cmedium
# unstage (remove from index) some files:
unstage = reset HEAD --
# checkout pull request by issue number (and remote name):
copr = "!f() { git fetch -fu ${2:-origin} refs/pull/$1/head:pr/$1 && git checkout pr/$1; }; f"
# enable/specify colors:
[color]
ui = true
[color "branch"]
current = yellow reverse
local = yellow
remote = green
[color "diff"]
meta = yellow bold
frag = magenta bold
old = red bold
new = green bold
[color "status"]
added = yellow
updated = green
changed = magenta
untracked = cyan
branch = green bold
[diff]
tool = vimdiff
[difftool]
prompt = false
# required for the nice graph log:
[pretty]
cmedium =\
%C(yellow)%h%C(cyan)% an %C(green)(%ar)%C(red)%d%n\
%C(white)%s%n\