Thomas Gläßle edited this page Jun 20, 2017 · 16 revisions

For users not familiar with git, we provide some references and a cheat sheet that will cover the most common use cases:

Git is a distributed VCS (version control system), meaning that it has a very flexible approach to collaborating with one, many or even no other parties (remotes). It is conceptually very different from the likes of CVS or SVN. Please forget everything that you know about SVN and carefully read the following table to understand the key concepts in git.

Concept               Meaning
git-specific terms:
work-tree             the files actually present in your filesystem (in the top-level folder that also contains the .git folder)
repository            storage of objects and references to objects (stored in the .git/ folder)
object reference      SHA-1 hash of the object payload
object                most important types: blob, tree, commit
object: blob          payload: contents of a file
object: tree          payload: list of names, plus references to the corresponding files and subtrees
object: commit        payload: message, author, date, reference to tree, reference to parent(s), other metadata
history               directed acyclic graph (DAG) of all ancestor commits of a commit
branch                reference to a commit
index (staging area)  virtual tree that will become the tree of the next commit
stash                 temporary storage for changes, can be stacked
remote                a related repository located elsewhere on the network or filesystem
General terms:
merge                 joining multiple branches
(un-)tracked          whether a file is registered for version control
checkout              instance of the tracked files in the work-tree (created/updated from the object repository)
clone                 full copy of a git repository
fork                  derived project, with different authors or development goals
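The three object types from the table can be inspected directly. The following is a minimal sketch, assuming a throwaway repository in a temporary directory; the file name, the commit message and the identity are made up for illustration:

```shell
set -eu
cd "$(mktemp -d)"                        # throwaway directory (assumes mktemp is available)
git init -q
git config user.name "Example User"      # hypothetical identity, only for this demo
git config user.email "user@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

commit_type=$(git cat-file -t HEAD)               # the object HEAD resolves to
tree_type=$(git cat-file -t 'HEAD^{tree}')        # the tree referenced by that commit
blob_type=$(git cat-file -t HEAD:greeting.txt)    # the object holding the file contents
blob_content=$(git cat-file -p HEAD:greeting.txt)

echo "$commit_type / $tree_type / $blob_type"     # prints "commit / tree / blob"
git cat-file -p HEAD             # commit payload: tree reference, author, committer, message
git cat-file -p 'HEAD^{tree}'    # tree payload: mode, type, reference and name per entry
```

Here git cat-file -t prints the type of an object and git cat-file -p pretty-prints its payload.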

At this point, take a few minutes and think about the following questions for fun and to tighten your understanding:

  • what is a repository and how does it differ from a work-tree?
  • will all changes in the work-tree automatically be part of the next commit?
  • how expensive is it to create a branch?
  • do all clones of a repository have equal rights or is there an upstream repository that plays a special role?
  • is it possible to have remote repositories that have no commits in common?
  • is it possible that a commit has an earlier date field than its parent?
  • why should the history be a DAG?
  • how is this property enforced on a technical level, i.e. could you manually construct a circular graph of commits?
  • is a commit represented internally as a snapshot of the files or as a diff?
  • if a file with the same data is present in two commits, is it stored twice on disk?

You can check your understanding by referring to the section about Git internals.

Rather than enforcing one opinionated work-flow, git has a lot of commands that can be used in a very modular fashion. While this may be overwhelming at first, you will find that most of it is quite natural to learn. We list the most common ones here.

Note: Most git commands are safe, i.e. revertible. This means that they will either gracefully handle any uncommitted changes or refuse to execute – depending on the command and whether there will be conflicts between the local changes and the performed action.

Command           Purpose
Basic commands for working with a single repository:
init              create empty repository (.git/) in current directory
add               copy changes: work-tree → index
rm                remove file from index + work-tree
mv                move file (index + work-tree)
commit            create commit: index → commit
branch            manage branches (create, delete, rename, move)
tag               manage tags (releases)
merge             merge two or more branches: branch+ → commit
revert            create a commit to cancel the diff of a previous commit (safe)
config            change settings
Informational commands:
help              get help about a topic or command
status            show general status
log               show history
diff              show diff between work-tree, index, commits
show              show info about a commit
blame             show who committed what, when
bisect            find the commit that introduced a bug
Commands that can be dangerous:
clean             remove untracked files, etc.
reset             usually: clear index, move current branch pointer (safe) OR:
reset --hard      also: overwrite local changes (DANGEROUS)
checkout COMMIT   move HEAD and checkout files (safe!) OR:
checkout BRANCH   also: switch branches (safe!) OR:
checkout -- FILE  also: overwrite local changes (DANGEROUS)
stash             save local changes + reset --hard
stash drop        delete an item from the stash (DANGEROUS)
Working with remotes (network access):
fetch             download data: remote → .git/
push              upload data: .git/ → remote
pull              fetch + merge
clone             init + remote add + fetch + checkout
Interact with foreign repositories:
remote            manage remotes (add, rm, rename, set-url)
submodule         manage submodules (add, rm, update)
subtree           manage subtrees
Editing history:
commit --amend    index → previous commit
cherry-pick       apply existing commit at current location
rebase            rewrite history (reorder, modify, merge, insert or drop commits)
filter-branch     apply changes to all existing commits

Note: These commands are examples of so-called porcelain commands, meaning they are comfortable for regular use. There is another set of so-called plumbing commands that allow lower-level access to git internals. These can be useful in certain less common situations, for example when doing automated history rewrites or when writing your own subcommands.
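To give an impression of the plumbing layer, the following sketch builds a commit by hand, roughly what git add followed by git commit do for you. It assumes a fresh throwaway repository; the file name, message and identity are illustrative only:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"      # hypothetical identity, only for this demo
git config user.email "user@example.com"

echo "hello" > greeting.txt
blob=$(git hash-object -w greeting.txt)                  # store the file contents as a blob
git update-index --add --cacheinfo 100644 "$blob" greeting.txt   # register the blob in the index
tree=$(git write-tree)                                   # turn the index into a tree object
commit=$(echo "Add greeting" | git commit-tree "$tree")  # wrap the tree in a parentless commit
git update-ref HEAD "$commit"                            # point the current branch at it

git log --format='%h %s'
```

In daily work the porcelain commands are preferable; the sketch is only meant to show that each porcelain step maps onto a few simple object operations.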

We employ the following workflow for MAD-X:

  1. Fork the official MAD-X repository onto your own github username
  2. Create a new branch for your feature/bugfix
  3. Work on your branch until the feature/bugfix is ready for inclusion
  4. Push your changes to your github
  5. Create a pull-request for inclusion into the official MAD-X repository

As a very first step, configure your name and email:

git config --global user.name "Full Name"
git config --global user.email "you@example.com"

Now fork the MAD-X repository onto your own github username.

Clone your fork and add the official MAD-X repository as upstream:

git clone git@github.com:USERNAME/MAD-X
cd MAD-X
git remote add upstream git@github.com:MethodicalAcceleratorDesign/MAD-X

You're now ready to work on your local copy of the MAD-X repository.

(See also the Cloning subsection of this document.)

First, make sure that your upstream is up-to-date:

git fetch upstream

(Note this command just fetches data to be stored into the .git folder but never interferes with your local checkout.)

Now create and checkout a new branch for your new feature/bugfix. Most of the time you will want to branch off the upstream master:

git checkout -b my_feature upstream/master

Associate the branch to a branch on your github fork:

git push -u origin my_feature

After making a coherent unit of changes, build and test your changes, and then add relevant changes to the INDEX:

git add FILES...

Note that with the -p or -e options, you gain more fine grained control over what goes into the index.

Commit changes to your current branch in the local repository:

git commit -m "Implement awesome feature XYZ"

The commit message describes the change made by the commit. It starts with a verb and an uppercase letter, and its first line should be no longer than 80 characters. If you want to provide more information (which is very welcome!), leave an empty line after the subject line, e.g.:

Fuse twcpgo and twchgo routines

Tracking *common* and *chromatic* optical functions independently was more
error prone (no "single source of truth") and meant that many computations
had to be performed twice.

Resolves #735

The Resolves #NUM line can be used to automatically close issues on github upon merge.

Important: Regularly inspect your status, diff, log and last commit to check whether they contain exactly the changes you intended, see Inspection.

If/when you want, publish your changes to github:

git push

Go to the github website and create a pull request. This creates a thread where further review and discussion can take place.

In case of merge conflicts, see Conflict resolution.

For history modifications, see History alteration.
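The whole workflow can be rehearsed offline. The following is a sketch under the assumption that two local bare repositories stand in for the official repository and for your fork on github; all paths, file names and the identity are hypothetical:

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-ins for the official repository and your fork (hypothetical local paths).
git init -q --bare upstream.git
git init -q --bare fork.git
git --git-dir=upstream.git symbolic-ref HEAD refs/heads/master
git --git-dir=fork.git symbolic-ref HEAD refs/heads/master

# Seed both with one commit on master.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "initial" > README
git add README
git commit -q -m "Initial commit"
git push -q "$work/upstream.git" master
git push -q "$work/fork.git" master
cd "$work"

# The workflow itself: clone the fork, add upstream, branch, commit, push.
git clone -q "$work/fork.git" MAD-X
cd MAD-X
git config user.name "Example User"
git config user.email "user@example.com"
git remote add upstream "$work/upstream.git"
git fetch -q upstream
git checkout -q -b my_feature upstream/master
git push -q -u origin my_feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Implement awesome feature XYZ"
git push -q
git branch -vv    # my_feature should be listed as tracking origin/my_feature
```

After the last push, the new commit exists on the fork only; getting it into the official repository is what the pull request is for.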

This section contains basic usage instructions for the most common git operations. For more details, please refer to the builtin help system (git help). See also Getting help.

git is very well documented. The man page for every subcommand can be accessed by typing:

git help COMMAND

Note that git also has great official online resources, such as tutorials and comprehensive documentation, that you should absolutely refer to in case of problems.

There are multiple ways to query information about the current branch/work-tree/commit/history and so on. This is especially important directly before and after committing:

command             what it shows
git status          general status
git diff            differences between local files and index
git diff --cached   differences between index and previous commit
git show [COMMIT]   commit message + diff
git log [BRANCH]    commit history of branch
git log --all       commit history of all branches
git log --graph     graph structure of history
git blame FILE      which line was changed last when, in which commit and by whom
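The difference between git diff and git diff --cached is easiest to see with one staged and one unstaged change in the same file. A minimal sketch, assuming a throwaway repository with made-up file contents:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"      # hypothetical identity, only for this demo
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "Add file"

echo "two" >> file.txt
git add file.txt             # "two" is now staged in the index
echo "three" >> file.txt     # "three" exists only in the work-tree

unstaged=$(git diff)             # work-tree vs index: shows "+three" only
staged=$(git diff --cached)      # index vs last commit: shows "+two" only
git status --short               # prints "MM file.txt": staged and unstaged modifications
```

A plain git commit at this point would record only the line "two".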

Cloning means copying a remote repository. Note that (unlike a checkout in SVN) a clone fetches the entire history and creates a fully-fledged repository on which you can work and commit locally without having to push your changes to the upstream repository.

For example, cloning the upstream MAD-X repository on github to your local PC can be achieved as follows:

For users without github account or SSH key:

git clone https://github.com/MethodicalAcceleratorDesign/MAD-X

For users with github account and SSH key:

git clone git@github.com:MethodicalAcceleratorDesign/MAD-X

Note that a clone is basically equivalent to a series of commands:

mkdir MAD-X && cd MAD-X
git init
git remote add origin git@github.com:MethodicalAcceleratorDesign/MAD-X
git fetch origin
git checkout -b master origin/master
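This equivalence can be checked without network access. The sketch below assumes a small local repository (the hypothetical folder source) in place of the github URL and compares a regular clone with one assembled by hand:

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# A small local repository standing in for the remote one (hypothetical content).
git init -q source
cd source
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "initial" > README
git add README
git commit -q -m "Initial commit"
cd "$work"

# Variant 1: a regular clone.
git clone -q "$work/source" via-clone

# Variant 2: the same result, step by step.
mkdir by-hand
cd by-hand
git init -q
git remote add origin "$work/source"
git fetch -q origin
git checkout -q -b master origin/master
cd "$work"

git -C via-clone log --oneline
git -C by-hand log --oneline
```

Both folders end up with the same commit checked out and with origin pointing at the source repository.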

If you plan to contribute, you should first log in to your github account and fork the MAD-X repository on github to your own username (using the button somewhere in the upper right). I recommend accessing your fork via SSH (see Adding an SSH key).

Now clone your fork using:

git clone git@github.com:USERNAME/MAD-X

If you had already cloned the upstream repository, you can add your own fork as an additional remote instead:

git remote add myfork git@github.com:USERNAME/MAD-X
git fetch myfork

Many git operations allow or require specifying the ID of a commit. One way to call these commands is to use the full SHA-1 hash of the commit. However, it is often more convenient to refer to the desired commit in another way:

  • usually the first 8 or so hex digits of the commit's hash (any prefix that is unique within the repository) can identify a commit
  • HEAD is a symbolic reference to the active commit
  • the name of a branch can be used to refer to the last commit in the branch
  • COMMIT^ refers to the first parent of a given commit
  • COMMIT^N refers to the N'th parent of a given commit (only useful for merge commits, i.e. when a commit has multiple parents)
  • COMMIT~N refers to the N'th ancestor, going back in the commit's history via the first parents

Example:

0a4775ca        Abbreviation of 0a4775caa631b3fb99e75becf8fbb6683b68cf0c
HEAD~3          Go back 3 commits from current commit
HEAD^2~3        Go back 3 commits from the second parent of current commit
master^         The first parent of master
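To try these notations out, the following sketch builds a tiny history with one merge commit and resolves a few expressions to commit subjects. The commit names A, B, C, D, M, the branch name side and the two helper functions are made up for this example:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example User"
git config user.email "user@example.com"

# Helpers (hypothetical): create a one-file commit, print the subject of a commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }
subject() { git log -1 --format=%s "$1"; }

commit A
commit B
git checkout -q -b side HEAD~1    # branch off at A
commit D
git checkout -q master
commit C
git merge -q --no-ff -m M side    # M has the parents C (first) and D (second)

subject HEAD        # M
subject HEAD^       # C  (first parent)
subject HEAD^2      # D  (second parent)
subject HEAD~2      # B  (two steps back along first parents)
subject HEAD^2~1    # A  (one step back from the second parent)
```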

Branches are diverging lines of development, for example for bug fixes or new features that can later be merged, see Merging.

In git, a branch is merely a pointer, i.e. a reference, to a commit. Creating a branch is an extremely lightweight, atomic operation and involves no file copying. When you are on a branch and create a new commit (git commit), the branch reference is automatically advanced to the new commit. It is not necessary to have an active branch; without one, you operate in so-called detached HEAD mode.

Similar to branches, tags (releases) in git point to a specified commit, but can, like commits, hold additional metadata.

Note that this is very different from SVN, which doesn't have a true builtin concept of branches but relies on conventions upheld by the user instead: branches and tags in SVN are created as copies of a directory. And while copying a directory on the server side is cheap in SVN as well, it can lead to large network load and disk usage when checking out the entire repository including all tags and branches.

Branches:
git branch -a             list all branches (including remote ones)
git branch NAME [BASE]    create a new branch (at BASE)
git branch -m [OLD] NEW   rename branch
git branch -d NAME        delete branch
git checkout -b NAME      create and activate new branch
git push -u origin NAME   set upstream tracking branch
Tags:
git tag NAME [BASE]       create a new tag
git tag -d NAME           delete tag
git push --tags           push tags
Common operations:
git fetch -p              delete references to remote branches that were deleted upstream
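That a branch is only a pointer can be observed directly. The sketch below assumes a throwaway repository that uses git's default file-based reference storage, so that a freshly created branch appears as a small file under .git/refs/heads; the branch name topic is made up:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First"

git branch topic                  # creates a pointer, copies no files
pointer=$(git rev-parse topic)
cat .git/refs/heads/topic         # the branch is a file containing the commit hash

echo "two" > file.txt
git commit -q -a -m "Second"      # advances master (the active branch), not topic

git checkout -q --detach          # keep the commit checked out, but leave the branch
head_name=$(git rev-parse --abbrev-ref HEAD)   # "HEAD" when no branch is active
```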

As a distributed version control system, git features a very flexible approach to working with multiple parties. This is captured by the concept of remotes, which specify the URLs (or filesystem paths) of remote repositories.

git remote -v               list your remotes with URL (verbose)
git remote add NAME URL     add a new remote
git fetch NAME              download data into your .git folder (does not affect the work-tree)
git remote rename OLD NEW   rename an existing remote
git remote rm NAME          forget a remote (does not delete the upstream repository)

Merging is the process of rejoining two or more lines of development.

Before merging branches, you should first stash (or better commit) uncommitted changes:

git stash

Then download and merge the newest version of the other branch:

git pull upstream master

Note that this is the same as the two commands:

git fetch upstream
git merge upstream/master

If your current branch refers to a commit in the history of the other branch, git by default merely fast-forwards the pointer without creating a new commit. You can enforce the creation of a new merge commit (to get a more cleanly structured history graph):

git merge --no-ff other-branch
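The difference between the two merge styles shows up in the parents of the resulting commit. A minimal sketch, assuming a throwaway repository with default merge settings; the branch names feature, with-ff and without-ff are illustrative:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"      # hypothetical identity, only for this demo
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
base=$(git rev-parse HEAD)

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature"

git checkout -q -b with-ff "$base"
git merge -q feature                              # fast-forward: only the pointer moves

git checkout -q -b without-ff "$base"
git merge -q --no-ff -m "Merge feature" feature   # creates a commit with two parents

git log --graph --oneline --all
```

After this, with-ff points at the very same commit as feature, while without-ff points at a new merge commit whose parents are the base commit and the feature commit.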

In some cases there will be merge conflicts, in which case git complains and tells you to resolve the conflict. See Conflict resolution.

If there is a merge conflict, check the status to see which files need your care:

git status

In those files, search for the conflicted sections between the <<<<<<<, ======= and >>>>>>> marker lines and manually replace them with the correct code (remove the markers!).

When you're done resolving, double check your changes (!), then add the changes and commit:

git add FILE
git diff --cached
git status
git commit
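A conflict can be provoked on purpose to practice these steps. The sketch below assumes a throwaway repository in which two branches change the same line; the resolution is written by a printf command here, whereas in practice you would edit the file by hand:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example User"
git config user.email "user@example.com"

echo "original" > file.txt
git add file.txt
git commit -q -m "Add file"

git checkout -q -b other
echo "theirs" > file.txt
git commit -q -a -m "Change on other"

git checkout -q master
echo "ours" > file.txt
git commit -q -a -m "Change on master"

git merge other || echo "merge stopped, conflicts need to be resolved"
conflicted=$(git diff --name-only --diff-filter=U)   # files that still need your care
markers=$(grep -c '^<<<<<<<' file.txt)               # number of conflicted sections

printf 'ours and theirs\n' > file.txt    # stand-in for resolving manually in an editor
git add file.txt
git diff --cached
git commit -q -m "Merge branch 'other'"
```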

Apart from committing itself, git has very powerful mechanisms to rewrite history, i.e. modify existing commits. One can freely insert, drop, modify and even fuse commits before publishing them.

However, this will change the commits and hence their hashes. Therefore, as a rule of thumb you should never do this with commits that have been seen or created by others. Most importantly, never alter commits that have been merged to master.

amend

If you just want to add additional changes to the most recent commit, you can add them to the index and simply type:

git commit --amend
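Amending does not edit the old commit in place but replaces it with a new one that has a different hash. A minimal sketch, assuming a throwaway repository with made-up file names; --no-edit reuses the existing message instead of opening an editor:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"      # hypothetical identity, only for this demo
git config user.email "user@example.com"

echo "feature" > a.txt
git add a.txt
git commit -q -m "Add feature"
old=$(git rev-parse HEAD)

echo "forgotten" > b.txt
git add b.txt
git commit -q --amend --no-edit     # fold the index into the previous commit
new=$(git rev-parse HEAD)

git log --oneline                   # still a single commit
git ls-tree --name-only HEAD        # lists a.txt and b.txt
```

The changed hash is the reason why amending is only safe for commits that nobody else has seen yet.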

rebase

If you want to reorder/join/drop/fuse/modify commits in your recent history or if you have a branch that is based on an old version of master and you want to move (reapply) the commits onto the new master (or another branch):

git fetch upstream
git rebase -i upstream/master

I recommend always using the -i option!

In certain situations, you might need:

git rebase -i --onto master next topic

This transforms a commit graph like this:

o---o---o---o---o  master
     \
      o---o---o---o---o  next
                       \
                        o---o---o  topic

into a graph that looks like this:

o---o---o---o---o  master
    |            \
    |             o'--o'--o'  topic
     \
      o---o---o---o---o  next
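The transformation can be reproduced on a miniature version of this graph. The sketch below assumes a throwaway repository with one commit per branch (the file names and the helper function are made up) and omits -i so that no editor opens:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example User"
git config user.email "user@example.com"

# Helper (hypothetical): create a commit that adds one file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit m1
git checkout -q -b next
commit n1
git checkout -q -b topic
commit t1
git checkout -q master
commit m2
next_before=$(git rev-parse next)

# Take the commits in next..topic and reapply them on top of master.
git rebase -q --onto master next topic

git log --graph --oneline --all
```

Afterwards topic sits directly on top of master and no longer contains the commit from next, while next itself is untouched.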

Rebasing is extremely useful to clean up your history and to fold bug fixes into the commits that introduced the bugs, before publishing your changes. Having a clean, bug-free history is important for peer review and especially for Bisecting. I usually rebase many times every day.

filter-branch

A less common operation is the git filter-branch command, which rewrites the history of one or many branches. It is useful mainly before publishing a previously private repository or when extracting code and history from an existing repository. For example, it can be used to:

  • remove certain files from all previous commits
  • move all files in a folder to a different location
  • change commit messages in a systematic way
  • extract the history of a subfolder or a specific file

In fact, filter-branch was used during the migration of the MAD-X SVN repository to git to add the revision numbers to the commit messages.
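As an illustration of a systematic message change, the following sketch appends a line to every commit message. It is not the command used for the MAD-X migration; the appended text is a made-up placeholder, and the environment variable merely silences the warning (and pause) that recent git versions show before running filter-branch:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"      # hypothetical identity, only for this demo
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First"
echo "two" > file.txt
git commit -q -a -m "Second"
before=$(git rev-parse HEAD)

# The message filter receives each commit message on stdin and prints the new one.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
    --msg-filter 'cat; echo; echo "Imported-from: placeholder"' -- --all >/dev/null

git log --format='%h %s%n%b'
```

All commits keep their subject line but get new hashes, since the message is part of the commit payload.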

git bisect is a convenient tool to find the commit that introduced a bug (or more generally: changed a behaviour) using a logarithmic number of steps.

git bisect start NEW OLD  start bisecting; you provide a commit that you know to have the new behaviour and one that you know to behave the old way
git bisect (new|old)      after testing, manually mark the current commit as having the new/old behaviour and check out the next commit
git bisect run COMMAND    execute a script that decides, based on its exit status, whether each remaining commit is old or new
git bisect reset          reset after bisecting
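An automated bisection might look as follows. This is a sketch on a made-up history of eight commits in which the sixth one "breaks" a file; it uses git's default terms bad/good, which correspond to new/old above, and a test command that exits with 0 for the old behaviour and with 1 for the new one:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"      # hypothetical identity, only for this demo
git config user.email "user@example.com"

for i in 1 2 3 4 5 6 7 8; do
    if [ "$i" -ge 6 ]; then echo "broken" > state.txt; else echo "working" > state.txt; fi
    echo "$i" > counter.txt
    git add .
    git commit -q -m "Commit $i"
done
first=$(git rev-list --max-parents=0 HEAD)    # the root commit, known to be fine

git bisect start HEAD "$first" >/dev/null     # HEAD is bad/new, the root is good/old
git bisect run sh -c 'grep -q working state.txt' >/dev/null
culprit=$(git log -1 --format=%s refs/bisect/bad)   # first commit with the new behaviour
git bisect reset >/dev/null 2>&1

echo "$culprit"    # prints "Commit 6"
```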

The following is a very brief summary of some of the git internals that may help your understanding of git:

  • git repositories are a collection of objects stored inside the .git folder.
  • git objects consist of some blob of data and are referenced by their SHA-1 hash. The most important objects and their data blobs are:
    • file (blob): the file content
    • tree: list of filenames + references to the corresponding objects
    • commit:
      • commit message, author, date and other metadata
      • reference to the file tree
      • references to parent commit(s)
  • note that this has the following important implications:
    • commits are snapshots that have direct knowledge of the entire working directory; they are not implemented as changesets.
    • since hashes are deterministic, commit IDs are determined by contents, parents and other metadata, and:
    • tree/file IDs are determined by contents, which means that identical files will only be stored once on disk
    • using a cryptographic hash (more or less):
      • ensures that it is computationally infeasible to generate cyclic histories
      • can be used to verify data integrity
      • allows detecting attempts at tampering
  • git branches are just pointers (references) to commits. As such, branching is an extremely lightweight operation. (Compare this to SVN, where a branch is a copy of a directory. There, while creating and switching branches are cheap operations as well, branches can lead to significant overhead when checking out the entire repository including all branches and tags.)
  • git has a garbage collector that can delete objects once they have been unreferenced for too long (by default, unreachable objects older than two weeks). Objects referenced directly or indirectly by a branch or tag will never be deleted by git.
  • As long as objects are not deleted, they can always be checked out or queried otherwise. git reflog is a useful tool to get the hashes of commits that were deleted by accident.
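Two of these points can be checked in a few lines: identical file contents map to one and the same blob, and a commit that was dropped by accident can still be found through the reflog. A minimal sketch, assuming a throwaway repository with made-up file names:

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"      # hypothetical identity, only for this demo
git config user.email "user@example.com"

# Identical contents are stored as a single blob.
echo "same content" > a.txt
cp a.txt b.txt
git add a.txt b.txt
git commit -q -m "Two identical files"
blob_a=$(git rev-parse HEAD:a.txt)
blob_b=$(git rev-parse HEAD:b.txt)     # same hash as blob_a

# A commit removed from the branch is still available via the reflog.
echo "more" > c.txt
git add c.txt
git commit -q -m "Soon lost"
lost=$(git rev-parse HEAD)
git reset -q --hard HEAD~1             # the branch no longer references the commit
recovered=$(git rev-parse 'HEAD@{1}')  # previous position of HEAD, taken from the reflog
git reflog
```

With the recovered hash, the commit could be restored, for example with git branch NAME HASH.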

The most important differences between git and SVN are summarized by the following table:

Aspect                SVN                                     git
history structure     linear                                  DAG
commit id             revision number                         commit hash
tags/branches         copy of a directory                     pointer to a commit
main branch           trunk                                   master
staging area          no                                      yes
creating commits      on server                               locally
conflict resolution   on every commit                         only when merging
server / remotes      exactly one SVN server                  fully dynamic
local data            checkout (only files, only one branch)  fully fledged clone of the entire repository
network access        most operations                         only for synchronizing
history alteration    mostly static                           yes, fully dynamic

If you plan to contribute, I recommend accessing the repository via SSH rather than HTTPS. This means that you will not have to enter your password on every push. If you haven't already, create an SSH key:

ssh-keygen -t rsa -b 4096

You can leave the password blank to store the key unencrypted (!) in your home folder. In this case, you won't need to enter a password when pushing (in my opinion, if someone gets access to your hard drive, all your data is compromised anyway). If you dislike storing the key without further protection, you can enter a password and consider

Copy the SSH public key (cat ~/.ssh/id_rsa.pub) and add it on your github settings page.

You may find it convenient to add a few basic aliases and settings to your ~/.gitconfig. I encourage you to try and find out how these aliases work and what they do:

[core]
    editor = vim
    excludesfile = /home/thomas/.gitignore_global
    filemode = true

[alias]
    co = checkout
    cp = cherry-pick
    st = status
    re = remote -v

    # fixup current index to the previous commit
    amend = commit --amend
    amenda = commit --amend -a

    # merge and create a structural merge-commit (no fast-forward)
    mm = merge --no-ff

    # useful diffs:
    cdiff = diff --cached
    wdiff = diff --word-diff=color
    wcdiff = diff --cached --word-diff=color

    # log with graph structure:
    alog = log  --graph --all --format=cmedium

    # unstage (remove from index) some files:
    unstage = reset HEAD --

    # checkout pull request by issue number (and remote name):
    copr = "!f() { git fetch -fu ${2:-origin} refs/pull/$1/head:pr/$1 && git checkout pr/$1; }; f"

# enable/specify colors:
[color]
    ui = true
[color "branch"]
    current = yellow reverse
    local = yellow
    remote = green
[color "diff"]
    meta = yellow bold
    frag = magenta bold
    old = red bold
    new = green bold
[color "status"]
    added = yellow
    updated = green
    changed = magenta
    untracked = cyan
    branch = green bold

[diff]
    tool = vimdiff
[difftool]
    prompt = false

# required for the nice graph log:
[pretty]
    cmedium =\
%C(yellow)%h%C(cyan)% an %C(green)(%ar)%C(red)%d%n\
%C(white)%s%n\