# Version control with Git
Modern software development would be impossible without version control systems,
and the same goes for building analytical pipelines that are reproducible and
robust. It doesn’t really matter what the output of the pipeline is: a simple
graph, a report with a statistical analysis, a scientific publication, a trained
machine learning model that you want to hook to an API... if the code to the
project is not versioned, you incur major risks and the pipeline is not
reproducible.
But what is version control anyway?
Version control tools make it easy to keep track of the changes that were made
to text files (like R scripts). Any change made to any file of a project is
catalogued, making it possible to trace back how the file changed, who made the
changes, and when these changes were made. Using version control it is also
quite easy to collaborate on a project by forcing team members to deal
explicitly with the potential conflicts that might arise when the same file got
changed by different people at the same time. Should your computer get lost,
stolen, or explode, your projects are safely backed up on a server: this is
because version control tools make use of a server which keeps track of all the
changes (and in some cases, this *server* is actually your team-mates’
computers!)
Version control tools also make it easy to experiment with new ideas. You can
start new *branches* which essentially make a copy of your current project. In
this new branch, you can safely experiment with new features, and if the
experiments are not conclusive, you can simply discard this branch: the
*original* copy of your project will remain untouched. We will also use branches
to implement features, fix bugs quickly, and manage the project in a paradigm
called *trunk-based development*.
There are several version control tools out there, but Git is undoubtedly the
most popular one. You might have heard of Github; this is a service that hosts
repositories for your projects, and provides other project management tools such
as an issue tracker, project wiki, feature requests... and also very importantly
continuous integration. Don’t worry if this all sounds very abstract: by the end
of the next chapter you will have all the basic knowledge to use Git and
Github.com for your projects.
Git is a tool that you must install on your computer to get started. Once Git is
installed, you can immediately start using it; you don’t need to open an account
on Github (or a similar service), but it is recommended to make collaboration
easier (it is possible to collaborate with several people using Git without a
service like Github, by setting up a bare repository on a server or on a network
drive you control, but this is outside the scope of this book).
You should know that Github offers private repositories for free, so if you
don’t want your work to be accessible to the public, that is possible. Only
people that you invite to your private repositories will be able to see the code
and collaborate with you. It is also possible that your workplace has set up a
self-hosted Git platform: ask your IT department! Usually these self-hosted
platforms are Gitea or Gitlab instances. Gitea, Gitlab, Bitbucket and Codeberg
are all services similar to Github, each with its own advantages and
disadvantages.
The advantages of Github are twofold:
- It has a very large community of users;
- Its continuous integration service is incredibly useful, and free for up to 2000 minutes a month.
Disadvantages are:
- It was bought by Microsoft in 2018;
- It is not possible to self-host an instance of Github (not for free at least).
The fact it is owned by Microsoft may not seem like an issue, but Microsoft’s
track record of previous acquisitions is open to question (Nokia, Skype), and the
[recent discussions about using source code hosted on Github to train machine
learning models
(Copilot)](https://web.archive.org/web/20230130103241/https://www.theverge.com/2021/7/7/22561180/github-copilot-legal-copyright-fair-use-public-code)^[https://is.gd/rQgCj8]
can make one uneasy about relying too much on Github.
So while we are going to use Github to host our projects in the remainder of
this book, almost everything you are going to learn will be easily transferable
to another code hosting platform such as Gitlab or Bitbucket, should you want to
switch (or if your workplace has a self-hosted instance from one of Github’s
competitors). Installing and configuring Git will be exactly the same regardless
of the hosting service we use, and all the commands we will use to actually
interact with our repositories will be the same as well. So why did I write
*almost everything* is the same across any of the code hosting platforms? Well,
the two advantages I cited above really give Github an edge; many developers,
researchers and data scientists have a Github account already and so if one day
you need to collaborate with people, chances are they have an account on Github
and not on another code hosting platform.
But what really sets Github.com apart is Github Actions, Github’s continuous
integration service. Github Actions is literally a computer in the cloud that
you can use to run a set of actions each time you interact with the repository
(or at defined moments as well). For example, it would be possible to run
automated tests each time a collaborator uploads some changes to the project.
This way, we can make sure that no change introduced a bug. Or take this book;
each time I write and push a new section or chapter to Github, the website, PDF
and Epub of this book get re-generated and updated automatically. Each Github
account gets 2000 minutes a month of free computing time, which is
really a lot. In part 2, we will make use of Github Actions to run our RAP in
the cloud, by simply pushing updates to our code on Github.
By the way, if you're using a cloud service like Dropbox, Onedrive, and the
like, DO NOT put projects tracked by Git in them! I really need to stress this:
do not track projects with both something like Dropbox and Git. This is because
Dropbox and similar services do not deal gracefully with conflicts: if two
collaborators change the same file, Dropbox makes two copies of the files. One
of the collaborators then has to manually deal with the conflict. The issue is
that inside a project that is being tracked by Git, there is a hidden folder
with many files that get used for synching the project and making sure that
everything runs smoothly. If you put a Git-enabled project inside a Dropbox
folder, these files will get accessed simultaneously by different people, and
Dropbox will start making copies of these because of conflicts. This really
messes up the project and can lead to data loss. Let Git handle the tracking and
the collaborating for you. It might seem more complex than a service like
Dropbox, and it is, but it is immensely more powerful, and however steep its
learning curve might be, it more than makes up for it with the many features it
puts at your fingertips. Unlike Dropbox (or similar services), Git
deals with conflicts not on a per-file basis, but on a per-line basis. So if two
collaborators change the same file, but different lines of this same file, there
will be no conflict: Git will handle the merge on its own.
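To make this per-line behaviour concrete, here is a minimal sketch that you can run in a throwaway directory. The branch names `alice` and `bob`, the file `notes.txt` and the user details are made up for illustration, and branches are only properly introduced later on; the point is simply that two edits to different lines of the same file get merged without any conflict:

```bash
set -e
repo=$(mktemp -d)            # throwaway folder, nothing here touches your projects
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for this repo only
git config user.email "user@example.com"
base=$(git symbolic-ref --short HEAD)      # "master" or "main", depending on your setup

printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Alice changes the first line on her own branch
git checkout -q -b alice
printf 'line 1 (edited by alice)\nline 2\nline 3\nline 4\nline 5\n' > notes.txt
git commit -q -am "Alice edits line 1"

# Bob starts from the same commit and changes the last line
git checkout -q "$base"
git checkout -q -b bob
printf 'line 1\nline 2\nline 3\nline 4\nline 5 (edited by bob)\n' > notes.txt
git commit -q -am "Bob edits line 5"

# Bring both sets of changes back into the default branch
git checkout -q "$base"
git merge -q alice
git merge -q -m "Merge bob's edit" bob

cat notes.txt   # contains both Alice's and Bob's edits
```

Had Alice and Bob both edited the very same line, the second `git merge` would have stopped and asked for the conflict to be resolved by hand.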
Finally, before starting, there is something important that you need to
understand, and people sometimes get confused by it: if a repository is public,
this does not mean that anyone can make changes to the code. What this means is
that anyone can fork the repository (essentially making a copy of the repository
to their Github account) and then *suggest* some changes in a so-called pull
request. The maintainer and owner of the original project can then accept these
edits or not.
In the remainder of this chapter, you are going to learn how to set up Git on
your machine, open a Github account and start using it right away. Then, I’m
going to discuss several scenarios:
- how to collaborate, as a team, on a project;
- how to contribute to someone else’s project.
## Installing Git and opening a Github account
Git is a program that you install on your computer. If you’re running a Linux
distribution, chances are Git is already installed. Try to run the following
command in a terminal to see if this is the case:
```bash
which git
```
If a path like `/usr/bin/git` gets shown, congratulations, you can skip the rest
of this paragraph. If something like:
::: {.content-hidden when-format="pdf"}
```bash
/usr/bin/which: no git in (/home/username/.local/bin:/home/username/bin:etc...)
```
:::
::: {.content-visible when-format="pdf"}
```bash
/usr/bin/which: no git in (/home/username/.local/bin: /home/username/bin:etc...)
```
:::
gets shown instead, then this means that Git is not installed on your system. To
install Git, use your distribution’s package manager, as it is very likely that
Git is packaged for your system. On Ubuntu, arguably the most popular Linux
distribution, this means running:
```bash
sudo apt-get update
sudo apt-get install git
```
If you're using Ubuntu, you may use `apt` instead of `apt-get`. Both commands
are basically interchangeable; use whatever you're used to. I first used
Ubuntu in 2008, and even though I don't use it anymore as my daily Linux distro
(that honor goes to openSUSE), I still use `apt-get` out of habit.
On macOS and Windows, follow the instructions from the [Git
Book](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)^[https://is.gd/9HZqW4].
It should be as easy as running an installer for any program.
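Whichever operating system you are on, a quick way to check that the installation worked is to ask Git for its version from a terminal (or from Git Bash on Windows). This is only a sanity check, and the exact version number you see will depend on your machine:

```bash
# Prints something like "git version X.Y.Z" if Git is installed and on your PATH
git --version
```

If you get a "command not found" error instead, the installation did not succeed or the terminal needs to be restarted.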
Depending on your operating system, a graphical user interface might have been
installed with Git, making it possible to interact with Git outside of the
command line. It is also possible to use Git from within RStudio and many other
editors have interfaces to Git as well. We are not going to use any graphical
user interface, however. This is because there is no common, universal graphical
user interface; they all work slightly differently. The only universal is the
command line. Also, learning how to use Git via the command line will make it
easier the day you need to use it from a server, which will very likely
happen. It also makes my job easier: it is simpler to tell you which commands
to run and explain them to you than littering the book with dozens upon dozens
of screenshots that might get outdated as soon as a new version of the interface
gets released.
Don’t worry, using the command line is not as hard as it sounds.
If you don't already have a Github account, now is the time to create one. Just
go over to [https://github.com/](https://github.com/) and simply follow the
instructions and select the free tier to open your account.
::: {.content-hidden when-format="pdf"}
<figure>
<img src="images/git_1.png"
alt="This is your Github dashboard."></img>
<figcaption>This is your Github dashboard.</figcaption>
</figure>
:::
::: {.content-visible when-format="pdf"}
```{r, echo = F}
#| fig-cap: "This is your Github dashboard."
knitr::include_graphics("images/git_1.png")
```
:::
In the next section, we are going to learn some basic Git commands by versioning
the two scripts that we wrote before.
## Git superbasics
We are going to use the two scripts that we wrote in the previous section. If
you want to follow along, create a folder called `housing` and put the two
scripts we developed before in there:
- save_data.R: [https://is.gd/7PhUjd](https://is.gd/7PhUjd)
- analysis.R: [https://is.gd/qCJEbi](https://is.gd/qCJEbi)
Open the folder that contains the two scripts in a file explorer. On most Linux
desktop environments you should be able to right-click inside that folder
anywhere on a blank space and select an option titled something like "Open
Terminal here". If you're using Windows, you can pretty much do the same but
look instead for the option titled "Open Git Bash here". On macOS, you need to
first activate this option. Simply google for "open terminal at folder macOS"
and follow the instructions. It is also possible to drag and drop a folder into
a terminal which will then open the correct path in the terminal. Another
option, of course, is to simply open a terminal and navigate to the correct
folder using `cd` (**c**hange **d**irectory, this should work the same on
Windows, macOS and Linux):
```bash
cd /home/user/housing/
```
Make sure that you are in the right folder by **l**i**s**ting the contents of
the folder:
```bash
ls
```
From now on, make sure to type the commands you see in the terminal (on Linux
and macOS) or in the Git Bash terminal on Windows. To distinguish the terminal
from the R command line prompt, the prompt of a terminal (or Git Bash terminal
on Windows) will start with `owner@localhost`. `owner` is the username of the
project manager in our examples from now on, and the computer used by
this project manager is called `localhost` (this prompt can look different on
your machine, sometimes the full path to the current working directory is listed
instead). So here is what happens when `owner` runs `ls` on the root directory
of the project:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ ls
analysis.R save_data.R
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ ls
analysis.R save_data.R
```
:::
(On Linux you could also try `ll` which is often available. It is an alias for
`ls -l` which provides a more detailed view. There's also `ls -la` which also
lists hidden files.)
Make sure that you see the two scripts being listed when running `ls`. If not,
this means that you are in the wrong directory, so make sure that you open the
terminal in the correct folder.
It's now time to start tracking these files using Git. In the same terminal in
which we ran `ls`, now run the following `git` command:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git init
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git init
```
:::
```bash
hint: Using 'master' as the name for the initial branch.
hint: This default branch name is subject to change.
hint: To configure the initial branch name to use in all of your
hint: new repositories, which will suppress this warning, call:
hint:
hint: git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main',
hint: 'trunk' and 'development'. The just-created branch can be
hint: renamed via this command:
hint:
hint: git branch -m <name>
Initialized empty Git repository in /home/user/housing/.git/
```
Take some time to read the hints. Many `git` commands give you hints and it's
always a good idea to read them. This hint here tells us that the default branch
name is "master" and that this is subject to change. Think of a branch as a
*version* of your code. The "master" branch will hold the default version of
your code. But you could create a branch called "dev" that would contain a
version of the code with features that are still in development. There is
nothing special about the default, "master" branch, and it could have been
called anything else. For example, if you create a repository on Github first,
instead of creating it on your computer, the default branch will be called
"main". You need to pay attention to this, because when we start
interacting with our Github repository, we need to make sure that we have the
right branch name in mind. Also, note that because the "master" branch is the
most important branch, it is sometimes referred to as the "trunk". Some teams
that use trunk-based development (which I will discuss in the next chapter) even
name this branch "trunk".
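If you would rather have your local default branch match Github's "main", the hints printed by `git init` already give you the commands. Here is a small sketch in a throwaway directory (the user details are placeholders, and the empty first commit is only there so the example also behaves on older Git versions):

```bash
set -e
repo=$(mktemp -d)            # throwaway folder for the experiment
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for this repo only
git config user.email "user@example.com"

# Show whichever default branch name this Git installation picked
git symbolic-ref --short HEAD

# Make a first (empty) commit, then rename the current branch to "main"
git commit -q --allow-empty -m "Project start"
git branch -m main
git symbolic-ref --short HEAD

# To use "main" for every new repository, the hint suggests running
# (not executed here, because it changes your global configuration):
#   git config --global init.defaultBranch main
```

Whether you rename the branch or not is up to you; the examples in the rest of this chapter keep using "master".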
Let's now run this other `git` command:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git status
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git status
```
:::
```bash
On branch master
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
analysis.R
save_data.R
nothing added to commit but untracked files present (use "git add" to track)
```
Git tells us quite clearly that it sees two files, but that they're currently
not being tracked. So if we modified them, Git would not keep track of the
changes. So it's a good idea to just do what Git tells us to do: let's *add*
them so that Git can track them:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git add
```
```bash
Nothing specified, nothing added.
hint: Maybe you wanted to say 'git add .'?
hint: Turn this message off by running
hint: "git config advice.addEmptyPathspec false"
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git add
```
```bash
Nothing specified, nothing added.
hint: Maybe you wanted to say 'git add .'?
hint: Turn this message off by running
hint: "git config advice.addEmptyPathspec false"
```
:::
Shoot, simply running `git add` does not do us any good. We need to specify
which files we want to add. We can name them one by one, for example `git add
file1.R file2.txt`, but if we simply want to track all the files in the folder,
we can simply use the `.` placeholder:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git add .
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git add .
```
:::
No message this time... is that a good thing? Let's run `git status` and see
what's going on:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git status
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git status
```
:::
```bash
On branch master
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: analysis.R
new file: save_data.R
```
Nice! Our two files are being tracked now, so we can commit the changes.
*Committing* means that we are happy with our work, and we can snapshot it.
These snapshots then get uploaded to Github by pushing them. This way, the
changes will be available for our coworkers for them to pull. I’ll explain what
this means later, so don't worry if this is confusing, it won't be by the end of
the chapter. Also, you should know that there is a special file, called
`.gitignore`, that allows you to list files or folders that you want Git to
ignore. This can be useful in cases where you are working with sensitive data
and don’t want it to be uploaded to Github. We will not use the `.gitignore`
file just yet, but will do so in part two of the book. So for now, just remember
that this is an option.
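To give you a first idea of how `.gitignore` behaves, here is a minimal sketch in a throwaway directory. The file `secret_data.csv` and its contents are hypothetical stand-ins for whatever sensitive data you might have lying around in a project:

```bash
set -e
repo=$(mktemp -d)            # throwaway folder for the experiment
cd "$repo"
git init -q

# A script we want to version and a (hypothetical) sensitive data file
echo 'library(dplyr)' > analysis.R
echo 'id,salary' > secret_data.csv

# One pattern per line; here we ignore a single file by its name
echo 'secret_data.csv' > .gitignore

# Git now only reports .gitignore and analysis.R as untracked
git status --short
```

Note that `.gitignore` itself is a regular file that you add and commit like any other, so that your collaborators ignore the same files.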
We are now ready to commit our files. Each commit must have a commit message,
and we can write this message as an option to the `git commit` command:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git commit -m "Project start"
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git commit -m "Project start"
```
:::
The `-m` option is there to specify the message for the commit. Before
pushing the commit, let's run `git status` again:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git status
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git status
```
:::
```bash
On branch master
nothing to commit, working tree clean
```
This means that every change is accounted for in a commit. So if we were to push
now, we could then set our computer on fire: every change would be safely backed
up on Github.com. We can also choose to not push yet, and keep working and
committing. For example, we could commit 5 times and just push once: all of the
5 commits would be pushed to Github.com.
Let's do just that by changing one file. Open `analysis.R` in any editor and
simply change the start of the script by adding one line. So go from:
```r
library(dplyr)
library(ggplot2)
library(purrr)
library(tidyr)
```
To:
```r
# This script analyses housing data for Luxembourg
library(dplyr)
library(ggplot2)
library(purrr)
library(tidyr)
```
and now run `git status` again:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git status
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git status
```
:::
```bash
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: analysis.R
no changes added to commit (use "git add" and/or "git commit -a")
```
Because the file is being tracked, Git can now tell us that something changed
and that we did not commit this change. So if our computer were to self-combust,
these changes would get lost forever. Better commit them and push them to
Github.com as soon as possible!
Remember, first, we need to add these changes to a commit using `git add .`:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git add .
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git add .
```
:::
(You can run `git status` at this point to check if the file was correctly added
to be committed.)
Then, we need to commit the changes and add a nice commit message:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git commit -m "Added a comment to analysis.R"
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git commit -m "Added a comment to analysis.R"
```
:::
Try to keep commit messages as short and as explicit as possible. This is not
always easy, but it really pays off to strive for short, clear messages. Also,
you would want to keep commits as small as possible, ideally one commit
per change. For example, if you're adding and amending comments in scripts, once
you're done with that make this a commit. Then, maybe clean up some code. That's
another, separate commit. This makes rolling back changes or reviewing them much
easier. This will be crucial later on when we use trunk-based development
to collaborate with our teammates on a project. It is generally not a good idea
to code all day and then only push one single big fat commit at the end of the
day, but that is what happens very often...
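Here is a sketch of what "one commit per change" can look like in practice, assuming two unrelated edits are sitting in the working directory at the same time. The file contents are shortened placeholders rather than the real scripts, and the user details are made up. Instead of `git add .`, each file is named explicitly so that it lands in its own commit:

```bash
set -e
repo=$(mktemp -d)            # throwaway folder for the experiment
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for this repo only
git config user.email "user@example.com"

# Placeholder versions of the two scripts
printf 'library(dplyr)\n' > analysis.R
printf 'library(readr)\n' > save_data.R
git add .
git commit -q -m "Project start"

# Two unrelated edits, made before committing anything
printf '# This script analyses housing data\nlibrary(dplyr)\n' > analysis.R
printf '# This script saves the data\nlibrary(readr)\n' > save_data.R

# Stage and commit them one at a time
git add analysis.R
git commit -q -m "Added a comment to analysis.R"
git add save_data.R
git commit -q -m "Added a comment to save_data.R"

# One line per commit, most recent first
git log --oneline
```

Each of the two later commits can now be reviewed, or walked back, without touching the other.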
By the way, even if our changes are still not on Github.com, we can still
roll back to previous commits. For example, suppose that I delete the file
accidentally by running `rm analysis.R`:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ rm analysis.R
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ rm analysis.R
```
:::
Let’s run `git status` and look for the changes (it’s a line starting with the
word `deleted`):
```bash
On branch master
Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
deleted: analysis.R
no changes added to commit (use "git add" and/or "git commit -a")
```
Yep, `analysis.R` is gone. And deleting on the console usually means that the
file is gone forever. Well technically no, there are still ways to recover
deleted files using certain tools, but since we were using Git we can use it to
recover the files! Because we did not commit the deletion of the file, we can
simply tell Git to discard our changes. One way to achieve this is to stash
the changes, and then *drop* (or delete) the stash:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git stash
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git stash
```
:::
```bash
Saved working directory and index state WIP on master: \
ab43b4b Added a comment to analysis.R
```
So the deletion was stashed away (in case we want it back, we could get it
back with `git stash pop`), and our project was rolled back to the previous
commit. Simply take a look at the files:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ ls
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ ls
```
:::
```bash
analysis.R save_data.R
```
There it is! You can get rid of the stash with `git stash drop`. But what if we
had deleted the file and committed the change? In this scenario, we could not
use `git stash`, but we would need to revert to a commit. Let's try, first let
me remove the file:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ rm analysis.R
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ rm analysis.R
```
:::
and check the status with `git status`:
```bash
On branch master
Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
deleted: analysis.R
no changes added to commit (use "git add" and/or "git commit -a")
```
Let's add these changes and commit them:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git add .
owner@localhost ➤ git commit -m "Removed analysis.R"
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git add .
owner@localhost $ git commit -m "Removed analysis.R"
```
:::
```bash
[master 8e51867] Removed analysis.R
1 file changed, 131 deletions(-)
delete mode 100644 analysis.R
```
What’s the status now?
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git status
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git status
```
:::
```bash
On branch master
nothing to commit, working tree clean
```
Now, we've done it! `git stash` won't be of any help now. So how do we recover
our file? For this, we need to know to which commit we want to roll back. Each
commit not only has a message, but also a unique identifier that you can access
with `git log`:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git log
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git log
```
:::
```bash
commit 8e51867dc5ae89e5f2ab2798be8920e703f73455 (HEAD -> master)
Author: User <[email protected]>
Date: Sun Feb 5 17:54:30 2023 +0100
Removed analysis.R
commit ab43b4b1069cd987685253632827f19d7a402b27
Author: User <[email protected]>
Date: Sun Feb 5 17:41:52 2023 +0100
Added a comment to analysis.R
commit df2beecba0101304f1b56e300a3cd713ce7366e5
Author: User <[email protected]>
Date: Sun Feb 5 17:32:26 2023 +0100
Project start
```
The first one from the top is the last commit we've made. We would like to go
back to the one with the message "Added a comment to analysis.R". See the very
long string of characters after "commit"? That's the commit’s unique identifier,
called a hash. You need to copy it (or only the first 10 or so characters,
which is enough as well). By the way, depending on your terminal and operating
system, `git log` may open `less` to view the log. `less` is a program that
makes it easy to view long documents. Quit it by simply pressing `q` on your
keyboard. We are now ready to revert to the right commit with the following
command:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git revert ab43b4b1069cd98768..HEAD
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git revert ab43b4b1069cd98768..HEAD
```
:::
and we're done! Check that all is right by running `ls` to see that the file
magically returned, and `git log` to read the log of what happened:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git log
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git log
```
:::
```bash
commit b7f82ee119df52550e9ca1a8da2d81281e6aac58 (HEAD -> master)
Author: User <[email protected]>
Date: Sun Feb 5 18:03:37 2023 +0100
Revert "Removed analysis.R"
This reverts commit 8e51867dc5ae89e5f2ab2798be8920e703f73455.
commit 8e51867dc5ae89e5f2ab2798be8920e703f73455
Author: User <[email protected]>
Date: Sun Feb 5 17:54:30 2023 +0100
Removed analysis.R
commit ab43b4b1069cd987685253632827f19d7a402b27
Author: User <[email protected]>
Date: Sun Feb 5 17:41:52 2023 +0100
Added a comment to analysis.R
commit df2beecba0101304f1b56e300a3cd713ce7366e5
Author: User <[email protected]>
Date: Sun Feb 5 17:32:26 2023 +0100
Project start
```
Passing a range such as `A..B` to `git revert` reverts every commit that comes
after `A` (which is itself excluded) up to and including `B`, creating one new
revert commit for each of them. In this example, the only commit in the range
was the one starting with `8e51867dc5`, so only this commit was reverted. You
could have achieved the same result with `git revert 8e51867dc5`.
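If you would like to try this without touching your own project, here is a minimal sketch that replays the same story in a throwaway repository. The temporary directory, the `Demo User` identity and the contents of `analysis.R` are made up for the demonstration; `--no-edit` simply accepts the default revert message instead of opening an editor.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository in a temporary directory (hypothetical demo project)
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Three commits mirroring the history shown above
echo "x <- 1" > analysis.R
git add analysis.R
git commit -q -m "Project start"

echo "# a comment" >> analysis.R
git commit -q -am "Added a comment to analysis.R"

git rm -q analysis.R
git commit -q -m "Removed analysis.R"

# Abbreviated (10 character) hash of the commit just before the last one
target=$(git rev-parse --short=10 HEAD~1)

# Revert everything after $target, up to and including HEAD
git revert --no-edit "$target"..HEAD

# analysis.R is back, and the history gained a "Revert" commit
ls
git --no-pager log --oneline
```

Here the range contains a single commit, exactly as in the example above. `git --no-pager log --oneline` prints a compact log with abbreviated hashes and without opening `less`.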
This small example illustrates how useful Git is, even without using Github, and
even if working alone on a project. At the very least it offers you a way to
simply walk back changes and gives you a nice timeline of your project. Maybe
this does not impress you much, because we live in a world where cloud services
like Dropbox made things like this very accessible. But where Git (with the help
of a service like Github) really shines is when collaboration is needed. Git and
code hosting services like Github make it possible to collaborate at very large
scale: thousands of developers contribute to the Linux kernel, arguably the most
successful open-source project ever, powering most of today’s smartphones,
servers, supercomputers and embedded
computers.^[https://www.zdnet.com/article/who-writes-linux-almost-10000-developers/]
You can use these tools to collaborate very efficiently at a smaller scale
as well.
## Git and Github
So we got some work done on our machine and made some commits. We are now ready
to push these commits to Github. "Pushing" means essentially uploading these
changes to Github. This makes them available to your coworkers if you're pushing
to a private repository, or makes them available to the world if you're pushing
to a public repository.
Before pushing anything to Github though, we need to create a new repository.
This repository will contain the code for our project, as well as all the
changes that Git has been tracking on our machine. So if, for example, a new
team member joins, they will be able to clone the repository to their
computer and have access to every change, every commit message and every single
bit of history of the project. If it's a public repository, anyone will be able
to clone the repository and propose contributions to it. We are going to walk you
through some examples of how to collaborate with Git using Github in the
remainder of this chapter.
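To make the claim about history concrete, here is a small sketch you can run locally. It assumes nothing about Github: a local path stands in for the repository URL, and the directory names `project` and `newcomer`, as well as the `Demo User` identity, are invented for the illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)

# A small repository with two commits stands in for the team project
git init -q "$work/project"
git -C "$work/project" config user.name "Demo User"
git -C "$work/project" config user.email "demo@example.com"
echo "x <- 1" > "$work/project/analysis.R"
git -C "$work/project" add analysis.R
git -C "$work/project" commit -q -m "Project start"
echo "# a comment" >> "$work/project/analysis.R"
git -C "$work/project" commit -q -am "Added a comment to analysis.R"

# A new team member clones it (a local path replaces the Github URL here)
git clone -q "$work/project" "$work/newcomer"

# Both copies report the same number of commits
original_count=$(git -C "$work/project" rev-list --count HEAD)
clone_count=$(git -C "$work/newcomer" rev-list --count HEAD)
echo "original: $original_count commits, clone: $clone_count commits"

# The clone can browse every commit message of the project
git -C "$work/newcomer" --no-pager log --oneline
```

The clone is a full copy of the repository, not just a snapshot of the latest files, which is why the newcomer sees the same log as everyone else.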
So, let's first go back to [https://github.com/](https://github.com/) and create
a new repository:
::: {.content-hidden when-format="pdf"}
<figure>
<img src="images/git_new_repo.png"
alt="Creating a new repository from your dashboard."></img>
<figcaption>Creating a new repository from your dashboard.</figcaption>
</figure>
:::
::: {.content-visible when-format="pdf"}
```{r, echo = F}
#| fig-cap: "Creating a new repository from your dashboard."
knitr::include_graphics("images/git_new_repo.png")
```
:::
::: {.content-visible when-format="pdf"}
\newpage
:::
You will then land on this page:
::: {.content-hidden when-format="pdf"}
<figure>
<img src="images/git_new_repo_2.png"
alt="Name your repository and choose whether it's a public or private repository."></img>
<figcaption>Name your repository and choose whether it's a public or private repository.</figcaption>
</figure>
:::
::: {.content-visible when-format="pdf"}
```{r, echo = F}
#| fig-cap: "Name your repository and choose whether it's a public or private repository."
knitr::include_graphics("images/git_new_repo_2.png")
```
:::
Name your repository (1), and choose whether it should be open to the world or
if it should be private and only accessible to your coworkers (2). We are going
to make it a public repository, but you could make it private and follow along:
this would change nothing in what we're going to learn.
Click on *Create repository* (3). You then land on this page:
::: {.content-hidden when-format="pdf"}
<figure>
<img src="images/git_repo_start.png"
alt="Some instructions to get you started."></img>
<figcaption>Some instructions to get you started.</figcaption>
</figure>
:::
::: {.content-visible when-format="pdf"}
```{r, echo = F}
#| fig-cap: "Some instructions to get you started."
knitr::include_graphics("images/git_repo_start.png")
```
:::
::: {.content-visible when-format="pdf"}
\newpage
:::
We get some instructions on how to actually get started with our project. The
first thing you need to do though is to click on "SSH":
::: {.content-hidden when-format="pdf"}
<figure>
<img src="images/git_repo_start_ssh.png"
alt="Make sure to select 'SSH'."></img>
<figcaption>Make sure to select 'SSH'.</figcaption>
</figure>
:::
::: {.content-visible when-format="pdf"}
```{r, echo = F}
#| fig-cap: "Make sure to select 'SSH'."
knitr::include_graphics("images/git_repo_start_ssh.png")
```
:::
This will change the links in the instructions from `https` to `ssh`. I will
explain why this is important in a couple of paragraphs. For now, let's read the
instructions. Since we have already started working, we need to follow the
instructions titled "...or push an existing repository from the command line".
Let's review these commands. This is what Github suggests we run:
```bash
git remote add origin git@github.com:rap4all/housing.git
git branch -M main
git push -u origin main
```
The first and last commands are the important ones. The first command
adds a remote (referred to as *origin*) that points to our repository. If you’re
following along, you should copy the link from your repository here. It would
look exactly the same, but the user name `rap4all` would be replaced by your
Github username. From now on, pushing to *origin* uploads our changes to
Github. The second line renames the branch from "master" to "main". You are of
course free to do so. I don’t like changing the defaults from Git, so I will
keep using the name "master". The last command pushes our changes to the "main"
branch of *origin*, and its `-u` flag records that remote branch as the default
target for later pushes and pulls. Since we are keeping "master", we need to
change "main" to "master" in this command.
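If you want to see what these commands do before involving Github at all, the following sketch rehearses them in a sandbox. It is only an illustration under assumptions: a local bare repository (`remote.git`, a made-up name) plays the role of the Github repository, and the `Demo User` identity and file contents are invented.

```bash
#!/usr/bin/env bash
set -euo pipefail

sandbox=$(mktemp -d)

# A bare repository plays the role of the Github repository (assumption)
git init -q --bare "$sandbox/remote.git"

# A local repository with one commit on a branch named "master"
git init -q "$sandbox/housing"
cd "$sandbox/housing"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "x <- 1" > analysis.R
git add analysis.R
git commit -q -m "Project start"

# Register the remote under the name "origin"; nothing is uploaded yet
git remote add origin "$sandbox/remote.git"
git remote -v

# Upload master and, because of -u, remember origin/master as its upstream
git push -q -u origin master

# The upstream is now recorded, so a plain `git push` knows where to go
upstream=$(git rev-parse --abbrev-ref --symbolic-full-name '@{u}')
echo "upstream: $upstream"
```

For the real repository, the same two commands apply, with the SSH link copied from Github as the remote URL.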
Let's do just that:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git remote add origin git@github.com:rap4all/housing.git
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git remote add origin git@github.com:rap4all/housing.git
```
:::
This produces no output. We're now ready to push:
::: {.content-hidden when-format="pdf"}
```bash
owner@localhost ➤ git push -u origin master
```
:::
::: {.content-visible when-format="pdf"}
```bash
owner@localhost $ git push -u origin master
```
:::
and it fails:
```bash
ERROR: Permission to rap4all/housing.git denied to b-rodrigues.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.