\chapter{Curvature} \label{ch:curvature}
Having applied the idea of vectors (and tensors) to flat surfaces, we now ponder how to apply the same idea to curved ones. This presents a new set of challenges that throw the whole concept into fresh paroxysms of doubt.
Much of the mathematical equipment introduced here may be somewhat baffling without an understanding of the motivations, which have to do with the way that so many of the assumptions that we can safely make in flat space are wildly wrong in a curved space. Rather than building the equipment up from the most minimal low-level axioms, we will instead try to naively use our flat space assumptions on curved surfaces and see what problems we encounter. Those problems will require us to rethink our assumptions and drive the introduction of new mathematical tools.
We will then work our way back to more fundamental concepts, as when working with examples there is always a risk that we depend on some special feature of that example --- the dreaded "loss of generality."
\section{Mapping the Globe}
The most symmetrical example of a curved surface is the sphere, and the Earth's global coordinate system provides an example of how to label the points on it. It may be tempting to think that because there is evidently a practical way to associate pairs of coordinates with points on the surface of the planet, therefore we can treat it much the same as a flat geometrical space, or that the standard global coordinate system is fundamental.
But that coordinate system is only a pragmatic compromise, as is any such attempt to map the sphere. There is no single "right" way to cover a sphere with coordinates. Furthermore, as we shall see, even on a surface as symmetrical and simple as a sphere, normally safe assumptions of geometry are overturned, and vectors fail us altogether.
On our way to resolving these problems, we will have to invent a generalisation of the idea of a space that can be associated with coordinates, of which the familiar flat plane is only one example.
\subsection{Coordinates on the sphere}
The Earth, controversially, is roughly spherical. It is only very slightly oblate so we'll talk about it as an ideal sphere.
\begin{figure}[h]
\caption{Standard global coordinate system}
\begin{subfigure}{0.5\textwidth}
\centering
\begin{tikzpicture}[tdplot_main_coords, scale = 2]
\shade[ball color = lightgray,
opacity = 0.4
] (0,0,0) circle (1cm);
\tdplotsetrotatedcoords{0}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{70}{230}{}{}
\tdplotsetrotatedcoords{203}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{270}{80}{}{}
\tdplotsetrotatedcoords{225}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{220}{70}{}{}
\tdplotsetrotatedcoords{247}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{220}{50}{}{}
\tdplotsetrotatedcoords{270}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{200}{40}{}{}
\tdplotsetrotatedcoords{292}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{200}{40}{}{}
\tdplotsetrotatedcoords{315}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{200}{40}{}{}
\tdplotsetrotatedcoords{337}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{200}{40}{}{}
\end{tikzpicture}
\caption{Longitude} \label{fig:globe-longitude}
\end{subfigure}
\begin{subfigure}{0.5\textwidth}
\centering
\begin{tikzpicture}[tdplot_main_coords, scale = 2]
\shade[ball color = lightgray,
opacity = 0.4
] (0,0,0) circle (1cm);
\tdplotsetrotatedcoords{0}{0}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0.924)}{0.383}{-130}{180}{}{}
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0.707)}{0.707}{-80}{140}{}{}
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0.383)}{0.924}{-70}{130}{}{}
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{-50}{110}{}{}
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,-0.383)}{0.924}{-50}{110}{}{}
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,-0.707)}{0.707}{-30}{90}{}{}
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,-0.924)}{0.383}{-20}{50}{}{}
\end{tikzpicture}
\caption{Latitude} \label{fig:globe-latitude}
\end{subfigure}
\end{figure}
If you examine a globe you'll find it is marked with circular paths. Some pass through both the North and South Poles, and are called lines of equal \textit{longitude} (Figure \ref{fig:globe-longitude}). Some, including the equator, do not pass through either pole, and are commonly called lines of equal \textit{latitude} (Figure \ref{fig:globe-latitude}). Of course in modern geometry the word \textit{line} has a stricter meaning\footnote{In modern geometrical terminology, a line is always straight and extends to infinity in both directions. What Euclid called a straight line (according to his English translators) is nowadays called a line segment.} so we'll call them curves. Along these curves one of our coordinates is held constant and the other is allowed to vary. (For greater clarity we'll also say constant rather than equal.)
There is a mapping from coordinate pairs onto points on the surface. It is \textit{surjective}, meaning that no point is without a coordinate pair. But it is not \textit{injective}, because some points have multiple coordinate pairs. These troublesome points are the poles, and in fact they have infinitely many coordinate pairs because the latitude coordinate must have a specific value but the longitude can have any value. If we could find a way to match coordinate pairs up with points on the sphere that was one-to-one in both directions, it would be a \textit{bijection}, but no such matching can be continuous in both directions over the whole sphere.
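As a concrete sketch of this mapping, assume purely for illustration that the sphere sits in ordinary three-dimensional space with Cartesian coordinates. The symbols $R$ (radius), $\phi$ (latitude) and $\lambda$ (longitude, both in radians) are choices made for this example. The standard mapping can then be written as
\[
\mathcal{P}(\phi, \lambda) = \left( R\cos\phi\cos\lambda,\; R\cos\phi\sin\lambda,\; R\sin\phi \right),
\qquad -\frac{\pi}{2} \le \phi \le \frac{\pi}{2}, \quad -\pi < \lambda \le \pi .
\]
Setting $\phi = \pm\frac{\pi}{2}$ gives $\mathcal{P} = (0, 0, \pm R)$ whatever the value of $\lambda$, which is exactly the failure of injectivity at the poles.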
The two kinds of curve commonly used to map the globe, latitude and longitude, are very different in nature. The difference is that every curve of longitude is the same length, the longest a circular path can be on the surface of a sphere, called a \textit{great circle}, whereas curves of latitude vary in length. The equator is the only curve of latitude that is a great circle; the others are all smaller circles and thus shorter routes back to any starting point. At the poles the circles of latitude vanish: if you vary your longitude coordinate at the poles, you don't move at all.
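To put a number on this, here is a sketch assuming an ideal sphere of radius $R$, with $\phi$ denoting latitude measured in radians from the equator (both symbols are introduced only for this illustration). The curve of constant latitude $\phi$ is a circle of radius $R\cos\phi$, so its length is
\[
C(\phi) = 2\pi R\cos\phi .
\]
This equals the great-circle length $2\pi R$ only at the equator ($\phi = 0$) and shrinks to zero at the poles ($\phi = \pm\frac{\pi}{2}$), whereas every great circle through both poles has the full length $2\pi R$.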
Suppose latitude worked like longitude, in that the equator remains a great circle, and some nearby curve of latitude is another great circle tilted so that one side rises to the north and the other side dips to the south (Figure \ref{fig:globe-alt-latitude}).
\begin{figure}[h]
\caption{Non-standard global coordinate system}
\begin{subfigure}{0.5\textwidth}
\centering
\begin{tikzpicture}[tdplot_main_coords, scale = 2]
\shade[ball color = lightgray,
opacity = 0.4
] (0,0,0) circle (1cm);
\tdplotsetrotatedcoords{90}{0}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{210}{390}{}{}
\tdplotsetrotatedcoords{90}{22}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{190}{360}{}{}
\tdplotsetrotatedcoords{90}{45}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{180}{350}{}{}
\tdplotsetrotatedcoords{90}{67}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{170}{340}{}{}
\tdplotsetrotatedcoords{90}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{160}{340}{}{}
\tdplotsetrotatedcoords{90}{113}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{140}{320}{}{}
\tdplotsetrotatedcoords{90}{135}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{140}{320}{}{}
\tdplotsetrotatedcoords{90}{157}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{150}{320}{}{}
\end{tikzpicture}
\caption{Alternative latitude} \label{fig:globe-alt-latitude}
\end{subfigure}
\begin{subfigure}{0.5\textwidth}
\centering
\begin{tikzpicture}[tdplot_main_coords, scale = 2]
\shade[ball color = lightgray,
opacity = 0.4
] (0,0,0) circle (1cm);
\tdplotsetrotatedcoords{0}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{70}{230}{}{}
\tdplotsetrotatedcoords{203}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{270}{80}{}{}
\tdplotsetrotatedcoords{225}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{220}{70}{}{}
\tdplotsetrotatedcoords{247}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{220}{50}{}{}
\tdplotsetrotatedcoords{270}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords]{(0,0,0)}{1}{200}{40}{}{}
\tdplotsetrotatedcoords{292}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{200}{40}{}{}
\tdplotsetrotatedcoords{315}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{200}{40}{}{}
\tdplotsetrotatedcoords{337}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{200}{40}{}{}
\tdplotsetrotatedcoords{90}{0}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{210}{390}{}{}
\tdplotsetrotatedcoords{90}{22}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{190}{360}{}{}
\tdplotsetrotatedcoords{90}{45}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{180}{350}{}{}
\tdplotsetrotatedcoords{90}{67}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{170}{340}{}{}
\tdplotsetrotatedcoords{90}{113}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{140}{320}{}{}
\tdplotsetrotatedcoords{90}{135}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{140}{320}{}{}
\tdplotsetrotatedcoords{90}{157}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed]{(0,0,0)}{1}{150}{320}{}{}
\end{tikzpicture}
\caption{Four poles} \label{fig:globe-terrible}
\end{subfigure}
\end{figure}
The result would be another pair of poles. The Earth would have four poles, two in the usual locations where the longitude curves all cross, and another two on opposite points of the equator, where the latitude curves all cross. Let's suppose these poles to be where the equator intersects the International Date Line and, on the opposite side of the globe, the prime meridian. This would leave two non-polar regions of the Earth where the curves locally take on the appearance of a helpful coordinate grid, one centred near a group of islands to the west of Ecuador, the other south of the Bay of Bengal.
The deal breaker for this system is that the International Date Line (the solid curve in Figure \ref{fig:globe-terrible}) is simultaneously a curve of constant latitude and constant longitude, so that it is impossible to give the coordinates of any specific point on that curve. The mapping between points and coordinates is neither injective nor surjective.
In any case, the regular approach to drawing curves of constant latitude has an obvious practical motivation: the Earth rotates, and the poles are positioned on the axis of rotation. If you stand rooted to one spot on the surface, over the course of 24 hours, relative to the Sun, you will travel along a curve of constant latitude. If you stand on either of the poles, you won't move at all.\footnote{You will rotate on the spot, but the direction you are facing is really a different degree of freedom, available wherever you might find yourself.}
In addition, if a planet's axis of rotation is orientated roughly normal to the plane of its orbit, the poles will receive less light and so be much colder and less hospitable. As few if any people live permanently on the poles, they are the ideal place to hide problems with your coordinate system.
\subsection{Drawing a rectangle}
On a flat Euclidean plane, drawing a rectangle is the simplest of challenges. But what about on the sphere?
Returning to the deceptive simplicity of the standard global coordinate system, we might naively attempt to trace a pseudo-rectangular route on the surface by moving first north (keeping longitude constant), then east (latitude constant), then south (longitude again) and finally west (latitude). There is one thing this gets right: each of the corners, locally, is a right-angle, which is one of the things we expect of a rectangle. But what about the edges?
As always, we are more interested in geometrical reality, distinct from any choice of coordinate system. There is something intrinsically important about a great circle, which is that if you pick any two points on a sphere and draw a great circle through them, it will have two segments, and the shorter of the two is the shortest possible path on the surface between those two points (if the two points are \textit{antipodes}, exact opposite points, both segments are the shortest path). This gives it something in common with a straight line between two points on a flat surface. If you are flying an aircraft around the globe, there is such a thing as "flying straight", steering neither left nor right. Such a flightpath will trace out part of a great circle.
For this reason, we give special significance to curves on a sphere that follow part of a great circle, as they are the closest thing we have to a straight line in that environment. We also regard them as "locally straight", in the sense that a small creature (such as a person) walking on the surface of the sphere, at a scale where it appears locally flat, in what they regard as a straight line, turning neither left nor right, will in fact be following the curve of a great circle. The technical name for such a curve is a \textit{geodesic}, from the Greek word for surveying the Earth, but used in the same general sense on all curved surfaces regardless of shape.
With the concept of the geodesic, or locally straight path, we can now be clear about what a person has to do to follow a curve of latitude in our usual global coordinate system: unless they do this at the equator, they can't be walking in a straight path even from their own local perspective. They have to constantly veer from the locally straight path.
This is comically clear when the person is standing a metre away from the North Pole, where if they walk due west (constant latitude, increasing longitude) they will be walking in a circle two metres in diameter, and will arrive back where they started after perhaps ten paces. To go east, they walk around the same circle in the opposite direction. Were they to walk (and swim) straight forwards, they would follow a great circle that eventually brought them to a location exactly one metre from the South Pole.
So our first attempt at a rectangle had right-angle corners, but a person travelling on the latitude-following edges would need to continuously steer away from their locally straight path, so we cannot seriously accept this as a rectangle.
Perhaps we can get closer to a rectangle by tracing a shape that has four sides that are all parts of geodesics. Our non-standard coordinate system will help, as long as we stick to one of the areas away from the four poles.
But it's no use: we just trade one problem for another.
\begin{figure}[h]
\caption{Hardly a rectangle}
\centering
\begin{tikzpicture}[tdplot_main_coords, scale = 2]
\shade[ball color = lightgray,
opacity = 0.4
] (0,0,0) circle (1cm);
\tdplotsetrotatedcoords{0}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed, color = gray]{(0,0,0)}{1}{70}{230}{}{}
\tdplotsetrotatedcoords{203}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed, color = gray]{(0,0,0)}{1}{270}{80}{}{}
\tdplotsetrotatedcoords{225}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed, color = gray]{(0,0,0)}{1}{220}{70}{}{}
\tdplotsetrotatedcoords{247}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed, color = gray]{(0,0,0)}{1}{220}{50}{}{}
\tdplotsetrotatedcoords{270}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed, color = gray]{(0,0,0)}{1}{200}{40}{}{}
\tdplotsetrotatedcoords{292}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed, color = gray]{(0,0,0)}{1}{200}{40}{}{}
\tdplotsetrotatedcoords{315}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed, color = gray]{(0,0,0)}{1}{200}{40}{}{}
\tdplotsetrotatedcoords{337}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed, color = gray]{(0,0,0)}{1}{200}{40}{}{}
\tdplotsetrotatedcoords{90}{0}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed, color = gray]{(0,0,0)}{1}{210}{390}{}{}
\tdplotsetrotatedcoords{30}{22}{450};
\tdplotdrawarc[tdplot_rotated_coords, dashed, color = gray]{(0,0,0)}{1}{190}{360}{}{}
\tdplotsetrotatedcoords{30}{45}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed, color = gray]{(0,0,0)}{1}{300}{410}{}{}
\tdplotsetrotatedcoords{30}{113}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed, color = gray]{(0,0,0)}{1}{100}{270}{}{}
\tdplotsetrotatedcoords{30}{135}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed, color = gray]{(0,0,0)}{1}{100}{270}{}{}
\tdplotsetrotatedcoords{30}{157}{0};
\tdplotdrawarc[tdplot_rotated_coords, dashed, color = gray]{(0,0,0)}{1}{100}{270}{}{}
\tdplotsetrotatedcoords{30}{113}{0};
\tdplotdrawarc[tdplot_rotated_coords, solid]{(0,0,0)}{1}{163.5}{207}{}{}
\tdplotsetrotatedcoords{30}{22}{450};
\tdplotdrawarc[tdplot_rotated_coords, solid]{(0,0,0)}{1}{219}{305}{}{}
\tdplotsetrotatedcoords{247}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, solid]{(0,0,0)}{1}{145}{76}{}{}
\tdplotsetrotatedcoords{337}{90}{0};
\tdplotdrawarc[tdplot_rotated_coords, solid]{(0,0,0)}{1}{152}{72}{}{}
\end{tikzpicture}
\end{figure}
To gain geodesic edges, we've lost all our right-angles. Also both our attempts have the problem that the opposite sides aren't the same length: equal changes in one of the coordinates do not imply equal distances travelled.
The shocking truth is that on a sphere there are no rectangles. There aren't even parallelograms, which is a disaster for vectors.
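One way to see why, offered here as a supporting sketch rather than a derivation, is a standard result for a sphere of radius $R$ (a symbol introduced for this illustration). A simple four-sided figure whose edges are all geodesic segments, enclosing an area $A$, has interior angles $\theta_1, \dots, \theta_4$ satisfying
\[
\theta_1 + \theta_2 + \theta_3 + \theta_4 = 2\pi + \frac{A}{R^2} .
\]
Four right angles would sum to exactly $2\pi$, which requires $A = 0$, so no geodesic quadrilateral of nonzero area can have four right angles.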
\subsection{Failure of vectors on the sphere}
Our intuitive picture of vectors as arrows is only going to mislead us now. An arrow has direction and length. Clearly at any point on the sphere we can choose a direction leading away from that point, and we know that the closest thing to a straight line on the surface of a sphere is a great circle, and we can measure distance around such a circle if we know the radius, which for a great circle is the same as the radius of the sphere itself. So what if we think of vectors as arcs of great circles? These curved arrows will have a direction and a length, and this suggests we can add them by placing them head to tail as usual, so points on the sphere correspond to elements of a vector space.
But this fails even the simplest requirement of a vector space. Vector addition must be commutative: $\vec{a} + \vec{b} = \vec{b} + \vec{a}$, and we usually picture this as a parallelogram giving us two equivalent routes between opposite corners. Suppose you stand on the equator facing north. You are going to make a journey by turning 90 degrees to the right and walking straight for 100 metres, then 90 degrees to the left and travelling 10,000 metres. That is, you will walk along the equator a short distance and then north along a curve of longitude for a much longer distance. You will therefore end up at a latitude that is exactly 10,000 metres from the equator. This is like adding two of our would-be vectors: the first leg is $\vec{a}$, the second is $\vec{b}$, and so the final destination is $\vec{a} + \vec{b}$. This is version 1 of the journey.
What if we create a version 2 by swapping the stages? There is no need to make any initial turn; just start walking 10,000 metres north (that is, along $\vec{b}$), then turn 90 degrees to the right before travelling 100 metres (along $\vec{a}$). This means walking "straight" along a geodesic away from the equator, so you will diverge from the local curve of latitude and end up slightly less than 10,000 metres from the equator, further south than the end of the first version of the journey. You will also be further to the east, because curves of longitude converge, so in version 1 the eastward translation by 100 metres was shrunk slightly by the journey north.
We are forced to conclude that $\vec{a} + \vec{b} \ne \vec{b} + \vec{a}$. Curved arrows are not elements of a vector space. There is no such thing as a curved vector space.
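The size of the discrepancy can be estimated with a short calculation. This is a sketch under stated assumptions: the Earth is an ideal sphere embedded in three-dimensional space, and its radius is taken as $R \approx 6371$ km. The symbols $\alpha = a/R$ and $\beta = b/R$ are introduced only for this illustration; they are the angles subtended by the two legs, of lengths $a = 100$ m and $b = 10{,}000$ m. Version 1 ends at latitude $\phi_1 = \beta$ and longitude $\lambda_1 = \alpha$ east of the start. For version 2, scale the sphere to unit radius and place the start of the second leg at $(\cos\beta, 0, \sin\beta)$, heading due east along $(0, 1, 0)$. After turning through the angle $\alpha$ along the great circle, the position is
\[
(\cos\beta\cos\alpha,\; \sin\alpha,\; \sin\beta\cos\alpha),
\]
so the final latitude $\phi_2$ and longitude $\lambda_2$ satisfy
\[
\sin\phi_2 = \sin\beta\cos\alpha, \qquad \tan\lambda_2 = \frac{\tan\alpha}{\cos\beta} .
\]
For small angles these give
\[
\phi_1 - \phi_2 \approx \frac{\alpha^2\beta}{2}, \qquad \lambda_2 - \lambda_1 \approx \frac{\alpha\beta^2}{2},
\]
which correspond to distances on the ground of roughly
\[
\frac{a^2 b}{2R^2} \approx 1.2\times 10^{-6}\ \mathrm{m} \quad \mbox{(southward)}, \qquad
\frac{a b^2}{2R^2} \approx 1.2\times 10^{-4}\ \mathrm{m} \quad \mbox{(eastward)} .
\]
The mismatch is tiny at this scale, but it is not zero, and it grows as the legs become longer relative to $R$.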
However, we began with what felt like a promising idea: at any point there is a real choice of directions we can face in, which is equivalent to choosing one of the infinite set of great circles through that point. There is nothing uncertain or subjective about this. If we can't have vectors, can we salvage something objective about directions?
To associate a number with a direction, we need to pick an origin direction that we label as zero, and $\pi$ is the exact opposite direction. An angle is a piece of objective information. We can carry that kind of information with us as we travel. That is, we move in direction $x$ while always holding an arrow that points in direction $y$, or we can note the angle between $x$ and $y$ and thus easily recreate $y$ at any point along our journey by using a protractor to measure the angle from the direction we're facing.
So is the angle $y$ an objective fact at every location? Absolutely not, and this time we don't even need to compare the results of two journeys. Once again, start on the equator and head north. Keep going until you go right through the North Pole and hit the equator again on the other side of the globe, heading south. This whole time you've been carrying an arrow pointing in the direction you're travelling. Now you're going to turn 90 degrees to the right, but you take care to not change the direction of the arrow you're carrying, so now it points to your left. You travel around the equator back to your starting point. When you set off, the arrow you were carrying was pointing north, but now after two semicircular journeys it's pointing south. You were very careful never to rotate it even slightly, and as a result it is pointing the opposite way.
Alternatively you could start on the equator and head north but this time stop at the North Pole. Turn 90 degrees to the right (but keeping the arrow you're carrying pointing the same way as before, so now it's pointing to your left) and move down to the equator. One more turn 90 degrees to the right (the arrow is now pointing backward) and then travel back along the equator to your starting point. In this scenario the arrow will have rotated 90 degrees during the journey. Note how the effect is altered by the path we took.
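Both outcomes are consistent with a standard result for the sphere, quoted here without proof: an arrow carried around a closed loop of geodesic segments, without ever being rotated relative to the path, comes back turned through an angle equal to the enclosed area divided by the square of the radius. Writing $R$ for the radius and $A$ for the enclosed area (symbols introduced for this sketch),
\[
\Delta\theta = \frac{A}{R^2} .
\]
The first journey encloses a quarter of the sphere, $A = \pi R^2$, giving $\Delta\theta = \pi$ (the arrow is reversed). The second encloses an eighth, $A = \frac{1}{2}\pi R^2$, giving $\Delta\theta = \frac{\pi}{2}$ (a quarter turn).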
Directions are not objective facts that can be moved around the space. If you take a vector on a journey, it's not possible to objectively say that you have taken the same vector to all the places you visit, no matter how careful you are to avoid rotating it. To return the vector to your starting point unchanged, you would need to exactly retrace your steps in reverse. If you took a different path back, your vector would twist out of recognition.
To add one vector to another, we have to transport the origin of one of the vectors to where the other vector ends, so they are head-to-tail. On what basis can we claim that the transported vector is still the same vector, given that we can't reliably say it continues to point in the same direction?
\section{Rebuilding vectors}
It seems for the moment we have to abandon any attempt to relate vectors at different locations to one another on a curved surface. But we can recover a form of this capability by stealth if we tread carefully. For now, forget any notion of distances, geodesics, trying to travel in a locally straight line or carrying a vector around on a journey like a pet being taken out for a day-trip to the beach, however adorable that may seem.
\subsection{Coordinate systems as scalar fields}
Scalars are reliable, simple, uncontroversial and easy to work with. We can certainly accept that there could be a scalar field on a curved surface, that is, a scalar-valued function of points, $x(\mathcal{P})$ that varies smoothly with position. Such fields describe familiar physical phenomena, such as temperature. The scalar field's value at a point can be measured and agreed upon by independent observers, regardless of how they arrived there, and obviously you can carry a scalar value with you on any journey over the surface without it mysteriously having changed when you get home --- a reassuring contrast with the twistiness of vectors.
One way to recognise that you are moving on a surface is to note how the environment is changing. If it is completely featureless then this will be difficult; this was surely a problem faced by early seafarers, as once you have sailed out of sight of land, one region of the ocean looks much like any other, so how do you know where you are? Even on a clear night with the map of the stars above you, you still need to know the time accurately to know your longitude, and unfortunately the behaviour of pendulum clocks was erratic at sea.
If you could measure various scalar values at each point that depended only on position, you might find that two of these fixed scalar fields vary with position in such a way that each point in some region has its own unique pair of scalar values. The values measured at a point would label that point; that is, they could be used as the coordinates of that point. Want to know where you are? Measure those two observable scalars --- now you know. Why do you need two? If you measure one scalar observable, that is likely not enough to say where you are, because there will probably be a contour consisting of all the points with the same value. If you measure a second observable, that will have its own contour that crosses the first one at your location. Both numbers together identify that crossing point. Note that the coordinate system may fail in various ways over a large area: suppose the contours for your two coordinates are two circles that cross each other in two places; if you wrote down those coordinates in your log book, there are two places you might have been. But we can assume there is an area, large enough for the coordinate system to be practically useful, within which such problems do not occur.
A two-dimensional coordinate system is just a pair of scalar fields $x^i(\mathcal{P})$ where $i$ is $1$ or $2$. Obviously the curves along which $x^1$ is constant (the contours of $x^1$) are also curves along which only $x^2$ is changing, there being only two coordinates. In three dimensions we can hold one of the three coordinates constant and this will highlight a single two-dimensional sheet of points (a contour surface), on which we can again play with the remaining two coordinates to find contour curves and ultimately their crossing points.
The way these coordinate fields vary has nothing necessarily to do with any intrinsic curvature of the surface. So the coordinates don't necessarily tell us anything about the surface itself at all. A polar coordinate system can be overlaid on a perfectly flat surface, and may be a helpful way to analyse some situations, especially those involving rotating systems. Intuitively we can say that the coordinate system is free to be "more curved than necessary", regardless of the surface.
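For instance, on a flat plane with Cartesian coordinates $(u, v)$ (names chosen here only for illustration), polar coordinates are the pair of scalar fields $x^1 = r$ and $x^2 = \theta$ defined by
\[
r = \sqrt{u^2 + v^2}, \qquad u = r\cos\theta, \qquad v = r\sin\theta .
\]
The contours of $x^1$ are concentric circles and the contours of $x^2$ are rays from the origin, yet the surface they label is perfectly flat. As with the poles of the globe, the labelling breaks down at one point, the origin, where $\theta$ can take any value.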
\subsection{Each point has its own vector space}
As our scalar fields vary smoothly with position, at each point there is a direction in which a field's value increases most steeply. So we have a natural reason to think not only of directions existing at each point, but of each scalar field providing us with an objective direction that any observer in that vicinity can determine by measurement: the direction of that field's steepest increase.
If there is a smoothly varying scalar field in the neighbourhood of a point, there is \textit{unavoidably} a gradient vector at that (and every) point. Furthermore there are infinitely many possible scalar fields that could be overlaid on the surface, each causing a different gradient to exist at a given point. We say that there is an entirely separate, isolated vector space at each point called the \textit{tangent space} at $\mathcal{P}$, whose elements are the vector gradients that would be associated with every possible scalar field around that point. This is sometimes pictured as a flat plane touching the surface at the point, and thus tangent to it. It is a perfectly good intuitive picture, but it's important to remember that we are interested here in describing the curved surface without reference to anything "outside" it. By tangent we really mean the idea in calculus that the derivative of a function gives us the linear approximation to the function around some location, the gradient of a tangent line. A vector in the tangent space is a direction along which we could take the derivative of a scalar field over the actual space.
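In symbols, here is a sketch using coordinates $x^i$ of the kind described in the previous subsection. The scalar field $f$, the curve parameter $t$ and the components $v^i$ are names introduced for this illustration. If a curve through $\mathcal{P}$ is described by coordinate functions $x^i(t)$, the rate of change of $f$ along it follows from the chain rule:
\[
\frac{df}{dt} = \sum_i \frac{dx^i}{dt}\,\frac{\partial f}{\partial x^i} = \sum_i v^i\,\frac{\partial f}{\partial x^i},
\qquad v^i = \frac{dx^i}{dt} .
\]
The numbers $v^i$ say which direction the derivative is taken along, and the expression is linear in them, which is what allows such directions to be added and scaled like vectors at that one point.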
Even before we choose our coordinate fields $x^i$, a vector space already "exists" at every point merely because scalar fields are a theoretical possibility. When two particular scalar fields have been overlaid on the surface, this picks out two basis vectors $\vec{e}_i$ at each point, pointing along the direction of motion due to an infinitesimal increase of the $i$th coordinate, and we can use these as the basis vectors within each point's tangent space.
Near to some area of interest these basis vectors are hopefully almost orthonormal, although further away, if the surface is curved, they will have to diverge more and more from orthonormality (either by a change in angles, or relative length, or both.) But in any case, they provide each point's vector space with a pair of basis vectors, as we assume they are linearly independent (at least over some useful area of the surface) and can therefore be blended in a linear combination to generate every possible vector in that point's vector space.
We can make all these concepts more concrete by imagining that every point on the surface of a sphere has a pair of tangent arrows sprouting from it: a red arrow (the first basis vector) and a blue arrow (the second). At some location they may happen to be perfectly orthonormal, but at neighbouring points they will gradually diverge from orthonormality. In some places they may even become colinear, or one of them may become the zero vector, either of which is a disaster, but this kind of problem is inescapable when trying to cover a whole sphere with one coordinate system. It doesn't matter; we just need there to be a point where the red arrow points in a different direction from the blue arrow, and for it to be surrounded by similar such points.
What is the magnitude of these basis vectors? Of course it depends what you are measuring them against. If you measure them against themselves (and what else do you have, so far?) evidently they must be unit vectors in the coordinate system they arose from. But they can be pictured as getting longer in areas where it is necessary to travel farther to make a coordinate increase by 1.
\subsection{Basis vector notation}
One fairly standard notation for these basis vectors is curiously familiar, but is liable to trigger confusion if used everywhere:
$$
\frac{\partial}{\partial x^i} = \vec{e}_i
$$
It's exactly the same notation as a partial derivative with respect to one of the coordinates, except with nothing to differentiate, so it's just an operator. This very precisely captures the description of the $i$th basis vector as pointing along the direction of motion due to an infinitesimal increase of the $i$th coordinate. Such a motion could cause a corresponding infinitesimal change in the value of some unspecified field of scalars.
Some authors stick doggedly to this notation and never relinquish it, so any vector $\vec{v}$ in a point's tangent space can be constructed in terms of the basis written like this (in summation convention):
$$
\vec{v}
=
v^i \frac{\partial}{\partial x^i}
=
v^i \vec{e}_i
$$
Note that the resulting $\vec{v}$ is also an operator waiting for a field to operate on --- nothing has been differentiated, not even partially! Compare these two entirely different things:
$$
a \frac{\partial}{\partial x^1}
\quad \quad
\frac{\partial a}{\partial x^1}
$$
On the left, we are scaling the first basis vector by the factor $a$, the result being a vector at each point that is colinear with that basis vector. On the right we are finding the rate of change of some scalar $a$ as we change only the first coordinate, holding any other coordinate(s) constant, the result being a scalar at each point. It may be that $a$ doesn't vary due to that coordinate so the result is always $0$. It may be that $a = x^1$ everywhere, so the result everywhere is $1$. But it is a scalar, not a vector.
Also note that in this notation there is an $x^i$ on the bottom of the partial derivative. At first glance, due to the presence of a superscript index, you might mistake it for a basis covector $\vec{e}^i$, but it's not. It's a basis vector $\vec{e}_i$. For the purposes of tensor index manipulation, this $i$ must be treated as a lower index. In a topic already ripe with opportunities for confusion, this feels like an act of aggression, but we won't be able to avoid it when digging further into partial derivative expressions.
Some authors prefer the abbreviation:
$$
\partial_i = \frac{\partial}{\partial x^i} = \vec{e}_i
$$
which has the advantage of being slightly less effort to write out, and clarifies the index's lower position. But this notation is also freely used as an abbreviation for taking the partial derivative of whatever is on the right with respect to the $i$th coordinate, so is subject to much of the same potential for confusion.
To sidestep these issues, we'll stick with $\vec{e}_i$ for basis vectors except when we actually need to remember that they are the partial derivative operation with respect to a coordinate. This is a fact that can be easily substituted when required. There is such a thing as "cognitive load", the number of things you have to hold in your head at the same time to understand a topic. Feel free to lighten the load by mostly forgetting the origin story of our vector spaces as derivatives.
Speaking of covectors, we know they are needed to pull a vector $\vec{v}$ apart to get the coordinates $v^i$, at least in an awkward basis as ours is bound to be. For these (again, it's going to look familiar) some authors use the notation:
$$
dx^i
$$
It looks like what you'd see in an integral, but in this context it's a linear function that takes a vector from the tangent space and returns a scalar, essentially measuring it along the $i$th coordinate direction. As always with basis covectors, when they act on a basis vector the result is either $0$ or $1$, according to the rule:
$$
dx^i \left( \frac{\partial}{\partial x^j} \right)
=
\delta^i_j
$$
We may as well stick with $\vec{e}^{i}$ as our notation for basis covectors too, so we can express the relationship between the dual bases in the usual way:
$$
\langle \vec{e}^i , \vec{e}_j \rangle = \delta^i_j
$$
\subsection{The metric as a field}
For there to be an isomorphism between vectors and covectors, we need a metric tensor (§\ref{inner-product}). More precisely, as every point has its own independent vector space with its own basis vectors, we need a different metric tensor for each point: a metric tensor field. Indeed, this only makes sense because each point has its own separate vector space, because a vector space can only have one metric.
But at each point, in that point's private vector space, it works as usual. Given a metric $g_{ij}$ at some point, we can convert the coordinates of any vector $\vec{v}$ chosen from that point's tangent space into the corresponding covector coordinates:
$$
\omega_i = g_{ij} v^j
$$
Thus we have all the regular equipment of the vector space, the dual covector space, and any tensor spaces we want to invent by combining these, and if we prefer, we can use abstract index notation to avoid even needing a notation for basis vectors: $v^i$ is interpreted as a vector, with no need to write $v^i \vec{e}_i$ explicitly. Some authors use Latin indices $a$, $b$ ... for this abstract notation, and reserve Greek indices (typically $\mu$, $\nu$...) for denoting actual numerical coordinates.
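As a small worked illustration (a sketch assuming the familiar plane polar coordinates $x^1 = r$, $x^2 = \theta$, whose metric has components $g_{rr} = 1$, $g_{\theta\theta} = r^2$ and $g_{r\theta} = g_{\theta r} = 0$), lowering the index of a vector with coordinates $(v^r, v^\theta)$ gives:
$$
\omega_r = g_{rr} v^r + g_{r\theta} v^\theta = v^r
\quad \quad
\omega_\theta = g_{\theta r} v^r + g_{\theta\theta} v^\theta = r^2 v^\theta
$$
The result depends on $r$, so the same pair of vector coordinates converts to different covector coordinates at different points, which is why the metric has to be treated as a field.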
The metric will turn out to be the key to connecting our coordinate system to the physical reality. A change of the $x^i$ coordinate induces a real physical displacement by some distance. The metric at the starting point gives us that distance, or rather the linear approximation of it at the starting point, which becomes less reliable the larger the change of the coordinate.
One crucial distinction is that the metric field in geometric terms may be constant everywhere in the surface, and yet its description in coordinates will depend on position. This may cause confusion, so be sure to play with this concept.
Over an arbitrarily short distance, the effect of any curvature becomes arbitrarily small. To put it another way, all surfaces are flat if you zoom in close enough! This means that the one metric at an exact point, which only describes that point and has no extent, does not describe curvature at all. A vector space has only one metric, and is always flat. We know that in a vector space with a metric, we can always choose a basis in which the metric is represented by the identity matrix.
But the symmetry of the sphere dictates that the metric is the same everywhere. If the metric could be represented by the identity matrix everywhere, how is the sphere any different from a flat space? The resolution to this apparent paradox lies in all the agonising we experienced when trying to cover the sphere with a consistent coordinate system spanning long distances. At some point on the sphere we can make something that is locally a square coordinate grid, and at that point the metric could be described by the identity matrix.
But as we move away from that point the coordinate grid becomes distorted by the curvature, so the same metric has to be described by a different matrix. The nature of the curvature is described by the change in the metric's matrix representation as we move around the surface. Geometrically speaking, it's the same metric everywhere, but described by different numbers due to the necessary warping of the coordinate grid --- it can't be square everywhere. If the metric is described by the identity matrix in one location, it will have to diverge from that description in neighbouring locations.
In the (latitude, longitude) coordinate system the longitude basis vectors get shorter toward the poles, and this has to be captured in the metric's representation in coordinates.
Alternatively in a more haphazardly warped surface, the metric itself may really be changing.
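To make the sphere example concrete, here is a sketch under the assumption that the sphere has radius $R$, with latitude $\lambda$ and longitude $\phi$ measured in radians (these symbols are introduced only for this illustration). A small change $d\lambda$ moves us a distance $R \, d\lambda$ along a meridian, while a small change $d\phi$ moves us a distance $R \cos\lambda \, d\phi$ along a circle of latitude, and the two directions are perpendicular, so the metric's matrix representation is:
$$
g_{ij} =
\begin{pmatrix}
R^2 & 0 \\
0 & R^2 \cos^2 \lambda
\end{pmatrix}
$$
The lower-right entry shrinks toward zero as $\lambda \to \pm \pi/2$, which is the shortening of the longitude basis vector toward the poles. The geometry of the sphere is the same at every point, yet the numbers describing the metric depend on latitude.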
\subsection{Other potential sources of confusion}
We've previously thought of covectors as functions acting on vectors, and vectors as something more basic and elemental. Now (dredging up their underlying meaning briefly) the vectors also appear ready to act on something, but it's important to note that they are not functions that act on vectors to produce scalars, so they cannot be confused with covectors. In this case, a vector is able to operate on a scalar field in a way that a mere function cannot. The result depends on how the field changes in the immediate vicinity of the point, not just its value exactly at the point.
The tangent space is the set of all possible directions along which we can take the gradient of \textit{anything}. The basis tangent vectors are just a couple of directions from which we can construct all other possible directions. Vectors have a backstory as partial derivative operators, but we can set that aside when it feels confusing. The important thing is that we've found fully functioning vector spaces, but a completely separate one at every point. We still have nothing to relate their vectors, and no way to take a vector on a journey away from a point.
Another cause of potential confusion, stemming from mental habits acquired in thinking about flat surfaces that can be equated with a single vector space, is to think of a vector as being able to reach between distant points on any surface. After all, points have coordinates, and we can take the difference in the coordinates between two points, and that pair of numbers could be regarded as a vector in $\mathbb{R}^N$. But this is just the failed "curved arrow" idea again. The metric allows us to convert coordinate changes into real distances, but the metric may also vary smoothly from place to place, so it is not able to accurately tell us anything if we take a single quantum jump from one coordinate pair to another. Everything must be done in a smooth, infinitesimal way.
If we have functions $p^i(t)$ giving the $N$ coordinates of a moving object on the surface, we can obtain the coordinates of the velocity vector $\vec{v}$ at time $t$ by taking the time derivative of the position coordinates:
$$
v^i(t) = \frac{d p^i}{dt}
$$
This is an entirely valid vector, but only at the point $p^i(t)$. We could define the path as:
$$
p^1(t) = t \quad \quad p^2(t) = 0
$$
That is, the object is travelling along the first coordinate's curve, and its speed happens to be just so that the first coordinate tells you how many seconds have elapsed. Therefore at any point on the curve, the curve is aligned with the tangent vector given by the coordinates $(1, 0)$, which is just the first basis vector.
Alternatively the motion given by $p^i(t)$ could be far more complicated, and the derivative velocity would vary in both coordinates, but at each point the velocity vector would be an arrow tangent to the path as it passes through that point, expressed in tangent vector coordinates that are the individual derivatives of the position coordinates with respect to $t$.
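For instance (an illustrative path chosen only for this example), if the first position coordinate is $t$ and the second is $t^2$, then differentiating each with respect to $t$ gives the velocity coordinates:
$$
\frac{d}{dt} \, t = 1
\quad \quad
\frac{d}{dt} \, t^2 = 2t
$$
so at $t = 3$ the object is at the point with coordinates $(3, 9)$ and its velocity is the vector with coordinates $(1, 6)$ in the tangent space at that point.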
\subsection{Distance along a curve}
The distance between two substantially separated points will depend on what happens along the journey between them. There must be some chosen curve that connects the two points, and we want to find the length of that curve.
The parameter $t$ that guides our journey along a curve $\mathcal{P}(t)$ can be thought of as time passing as we move. Of course, we could speed up and slow down during our journey, but actually it doesn't matter whether we travel at constant speed or not.\footnote{If we somehow knew that our speed was constant, we wouldn't need the metric to find the total distance for our journey --- we'd just need a clock.}
In our journey along some curve, in the time interval $\Delta t$ we move from $\mathcal{P}(t)$ to $\mathcal{P}(t + \Delta t)$. Our coordinates, given by $x^i(\mathcal{P})$, change by:
$$
x^i(\mathcal{P}(t + \Delta t))
-
x^i(\mathcal{P}(t))
$$
As we're immediately converting points $\mathcal{P}$ to coordinates $x^i$, we may as well directly say that the coordinates along the curve are functions $x^i(t)$, and so the coordinates change by:
$$
x^i(t + \Delta t)
-
x^i(t)
$$
But if we try to invoke the metric to convert these separate coordinate changes into a single scalar distance, we have a problem, because there are two metrics in play: the one at $t$ and the one at $t + \Delta t$. Which metric do we use to convert these coordinate changes into a distance? This is a job for calculus. We shrink $\Delta t$ toward zero so that we can use the metric at $t$, so the time interval involved is the infinitesimal $dt$.
We differentiate the $i$th coordinate with respect to $t$:
$$
v^i(t) = \frac{d}{dt} x^i(t)
$$
In keeping with the interpretation of $t$ as the elapsed time, and $x^i(t)$ as the coordinates of something moving along the curve, $v^i(t)$ would be the velocity at $t$. Equivalently it is the tangent vector to the curve at $x^i(t)$.
The metric $g_{ij}$ at each point is a $(0, 2)$ tensor, evaluating to a scalar when supplied with two vector inputs. To find the squared-length of a vector, we simply supply that vector as the input for both slots:
$$
g_{ij}(t)
v^i(t)
v^j(t)
$$
Here we're using the notation $g_{ij}(t)$ to remind ourselves that $g_{ij}$ is not a constant but is smoothly varying across our journey. Also note that we're using Einstein summation notation as usual, so the above is the sum over all pairs of $(i, j)$ values.
The integral of the distances yielded by the smoothly changing metric along the curve will be the total length. The infinitesimal contributions need only be the linear approximation at each point, so the metric at a point can be fully characterised by a matrix, $g_{ij}$, though this is necessarily a function of $t$, $g_{ij}(t)$, and \textit{that} function is decidedly not linear in general. Even so, we can write the integral down as:
$$
L =
\int
\sqrt{
\mathop{g_{ij}(t)}
\mathop{v^i(t)}
\mathop{v^j(t)}
}
\mathop{dt}
$$
That is, we've put the same vector into both the $i$ and $j$ slots of the metric tensor, and both that vector (the infinitesimal tangent displacement, the velocity) and the metric tensor are smoothly varying functions of the position along the curve. The scalar result of evaluating the tensor is the square of the infinitesimal distance, so we take the square root to get the infinitesimal distance. We can then integrate that over some portion of the curve, say $0 \le t \le 1$, to find the distance traversed.
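As a check on this formula, consider a sketch that assumes the usual metric of a sphere of radius $R$ in latitude $\lambda$ and longitude $\phi$ (both in radians, symbols chosen for this illustration), namely $g_{\lambda\lambda} = R^2$ and $g_{\phi\phi} = R^2 \cos^2 \lambda$, with the off-diagonal components zero. Take the curve to be the circle of constant latitude $\lambda_0$, traversed once as $t$ runs from $0$ to $1$:
$$
x^1(t) = \lambda_0 \quad \quad x^2(t) = 2 \pi t
$$
The velocity coordinates are $v^1 = 0$ and $v^2 = 2\pi$, so the only surviving term of the sum is:
$$
g_{ij} v^i v^j = R^2 \cos^2 \lambda_0 \, (2\pi)^2
$$
and therefore:
$$
L = \int_0^1 2 \pi R \cos \lambda_0 \mathop{dt} = 2 \pi R \cos \lambda_0
$$
This is the circumference of a circle of radius $R \cos \lambda_0$, as expected: $2 \pi R$ at the equator, shrinking to zero at the poles.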
\section{Calculus on a curved surface}
Our goal will be to formalise the idea of a geodesic, which we've previously likened to "flying straight" on a globe. To get to that goal we'll need a way of figuring out how a vector changes as we move it. Now we have basis vectors from our coordinate system, we need to confront the fact that whatever coordinate system we've chosen will have something arbitrary about it, and the truth about the geometry of the space is obscured by the behaviour of the coordinates, and the way the basis vectors change from place to place.
\subsection{How basis vectors change}
Ideally (for at least some good sized region) the coordinates are independent parameters that together specify a unique point. It is possible to vary one coordinate while holding other coordinates constant, and so trace out a specific path. It goes without saying that the partial derivative of one coordinate with respect to another is $0$ if they are different coordinates, or $1$ if they are the same:
$$
\frac{\partial x^i}{\partial x^j} = \delta^i_j
$$
But this independence does not apply to the basis vectors. In the standard global coordinate system, as we move north from the equator along a meridian (a curve of constant longitude), our longitude is constant. But the longitude basis vector associated with each point becomes ever smaller, because if we were to increase the longitude coordinate by a fixed amount, the resulting displacement would be smaller the further north we'd reached. This is a relatively simple example because at least the basis vectors remain orthogonal and only change length; in general they could twist relative to each other as well.
With the standard equipment provided by a vector space, we can describe numerically how the basis vectors vary with reference to each other. How does the $i$th basis vector change as we change only the $j$th coordinate?
$$
\frac{\partial \vec{e}_i}{\partial x^j}
$$
The answer is an ordinary vector. But that means we can describe it with ordinary numbers, by acting on it with the basis covectors $\vec{e}^k$ to fetch the coordinates in terms of the local basis:
$$
\Gamma\indices{^k_i_j} = \biggl< \vec{e}^k , \frac{\partial \vec{e}_i}{\partial x^j} \biggr>
$$
This $\Gamma$ is called the Christoffel symbol. For two dimensions, it is a collection of $2^3 = 8$ numbers defined at each point. Structurally (as suggested by how we've indexed it) it is like a $(1, 2)$-tensor (one up, two down). But it's not really a tensor, because it encodes information about the coordinate basis we've chosen, rather than anything geometrically intrinsic. It will nevertheless play an essential role in accounting for the arbitrary way the basis vectors change, so we need to know what it is when working with coordinates.
In summary, the way the $i$th basis vector changes as we change only the $j$th coordinate is described by a vector that can be written as a linear combination of the basis vectors:
$$
\frac{\partial \vec{e}_i}{\partial x^j}
= \Gamma\indices{^k_i_j} \vec{e}_k
$$
It should be noted that although we've given a symbol to the object that describes how the basis vectors work, we haven't said how to determine what it is. We have great freedom to set the numbers however we like, which means we don't yet really know how to describe the twisting of our basis vectors. There is seemingly an infinity of possible ways to do so. Patience --- we are going to find a single, unique, natural definition for the Christoffel symbol that works perfectly for physics.
One thing we can observe right away, and which will be very useful in a while, is that the two lower indices can be interchanged. This is clear if you consider the underlying meaning of $\vec{e}_i$, which is itself the act of differentiating in the direction of the $i$th coordinate: differentiating it again with respect to $x^j$ builds an operator that differentiates twice, and the order of taking partial derivatives doesn't matter for smooth functions. So:
$$
\Gamma\indices{^k_i_j} = \Gamma\indices{^k_j_i}
$$
That is, among the numbers that make up this pseudo-tensor there is much duplication, in a similar manner to a symmetric matrix. If you fetch a value with the $i$ and $j$ indices reversed by accident, it wouldn't matter because you'll get the same number.
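A worked example may help here. This is a sketch assuming plane polar coordinates $x^1 = r$, $x^2 = \theta$ on a flat plane, where (unlike on a general curved surface) we can write the basis vectors in terms of fixed Cartesian components and differentiate those directly:
$$
\vec{e}_r = (\cos\theta, \sin\theta)
\quad \quad
\vec{e}_\theta = (-r \sin\theta, r \cos\theta)
$$
Differentiating each with respect to each coordinate, and expressing the results in the local basis:
$$
\frac{\partial \vec{e}_r}{\partial r} = 0
\quad \quad
\frac{\partial \vec{e}_r}{\partial \theta} = (-\sin\theta, \cos\theta) = \frac{1}{r} \vec{e}_\theta
$$
$$
\frac{\partial \vec{e}_\theta}{\partial r} = (-\sin\theta, \cos\theta) = \frac{1}{r} \vec{e}_\theta
\quad \quad
\frac{\partial \vec{e}_\theta}{\partial \theta} = (-r \cos\theta, -r \sin\theta) = -r \vec{e}_r
$$
Reading off the coefficients, the only non-zero values among the eight are:
$$
\Gamma\indices{^\theta_r_\theta} = \Gamma\indices{^\theta_\theta_r} = \frac{1}{r}
\quad \quad
\Gamma\indices{^r_\theta_\theta} = -r
$$
The first pair illustrates the symmetry in the lower indices. None of this reflects curvature, as the plane is flat; it only records how the polar basis vectors turn and stretch.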
Also, just as with a tensor, we can raise and lower its indices if we have a metric. So for example:
$$
\Gamma\indices{_k_i_j}
=
g_{kl}
\Gamma\indices{^l_i_j}
$$
The $k$ index now denotes a basis vector instead of a basis covector, so instead of a covector acting on the vector derivative to extract a scalar coordinate, the inner product is used to do the same thing with a vector:
$$
\Gamma\indices{_k_i_j} = \left( \vec{e}_k , \frac{\partial \vec{e}_i}{\partial x^j} \right)
$$
\subsection{Using the metric to get the connection coefficients}
The Christoffel symbol $\Gamma\indices{^k_i_j}$ contains information about how the basis vectors change with position, whereas the metric tells you how to relate the coordinate representation of vectors to genuine distances and angles. If you know your velocity at an instant at some point in the space, but it's expressed as the rate at which each coordinate is changing (that is, as coordinates of the velocity vector in the basis), the metric allows you to convert these multiple coordinates into a single scalar: your actual speed, which has nothing to do with the structure of the coordinate system. This is of course the whole point of tensors: they compute scalars out of vectors so we can compare predictions with reality.
But like everything else in this topic, the metric is really a field: a smooth function of position, $g(\mathcal{P})\indices{_i_j}$ --- we don't bother to include the parameter because it is implicit in all the objects we're considering. Because it's a smooth function, it's possible to take a derivative of it with respect to a coordinate direction, and this produces precisely the information we need for the Christoffel symbol.
The Christoffel symbol is defined as:
\begin{equation}
\Gamma\indices{^k_i_j}
=
\biggl<
\vec{e}^k,
\frac{\partial \vec{e}_i}{\partial x^j}
\biggr>
\end{equation}
That is, the $k$th basis covector acts on the vector that is the derivative of the $i$th basis vector with respect to the $j$th coordinate. Having reduced these to a pseudo-tensor, we can raise and lower the indices with the metric tensor as we saw before.
Meanwhile, the metric tensor is just the inner product of all pairs of basis vectors:
\begin{equation}
g_{ij} = \left( \vec{e}_i,\vec{e}_j \right)
\end{equation}
We can take the derivative of it with respect to coordinate direction $k$:
\begin{equation}
\frac{\partial g_{ij}}{\partial x^k}
=
\frac{\partial}{\partial x^k}
\left(
\vec{e}_i,\vec{e}_j
\right)
\end{equation}
The result of the metric tensor applied to two vectors, i.e. the inner product, is a scalar, so we are taking the derivative of a scalar function.
We will assume for the moment that something like the product rule works over the inner product. It is not obvious why, so we will return to this later:
\begin{equation}
\frac{\partial g_{ij}}{\partial x^k}
=
\left(
\frac{\partial \vec{e}_i}{\partial x^k}
,\vec{e}_j
\right)
+
\left(
\frac{\partial \vec{e}_j}{\partial x^k}
,\vec{e}_i
\right)
\end{equation}
And then we notice that these inner product expressions are just Christoffel symbols with all the indices lower (following a standard order becomes very important now, so the first index is the one that is not interchangeable with the other two):
\begin{equation}
\frac{\partial g_{ij}}{\partial x^k}
=
\Gamma_{jik}+\Gamma_{ijk}
\end{equation}
It follows that we can produce equivalents for all the combinations of $i$, $j$, $k$.
\begin{equation}
\frac{\partial g_{kj}}{\partial x^i}
=
\Gamma_{jki}+\Gamma_{kji}
\end{equation}
\begin{equation}
\frac{\partial g_{ik}}{\partial x^j}
=
\Gamma_{kij}+\Gamma_{ikj}
\end{equation}
But the last two indices are interchangeable, $\Gamma_{kij} = \Gamma_{kji}$, which means we can get most of the terms to cancel by adding and subtracting appropriately:
\begin{equation}
\begin{split}
\frac{\partial g_{kj}}{\partial x^i}
+
\frac{\partial g_{ik}}{\partial x^j}
-
\frac{\partial g_{ij}}{\partial x^k}
&=
\Gamma_{jki}+\Gamma_{kji}
+
\Gamma_{kij}+\Gamma_{ikj}
-
\Gamma_{jik}-\Gamma_{ijk} \\
&=
(\Gamma_{jik}-\Gamma_{jik})
+
(\Gamma_{ijk}-\Gamma_{ijk})
+
(\Gamma_{kij}+\Gamma_{kij}) \\
&= 2\Gamma_{kij}
\end{split}
\end{equation}
And so:
\begin{equation}
\Gamma_{kij}
=
\frac{1}{2}
\left(
\frac{\partial g_{kj}}{\partial x^i}
+
\frac{\partial g_{ik}}{\partial x^j}
-
\frac{\partial g_{ij}}{\partial x^k}
\right)
\end{equation}
And therefore we can get the original form of the Christoffel symbol by using the raising metric to hoist the first index:
\begin{equation}
\Gamma\indices{^l_i_j}
=
g^{kl}
\frac{1}{2}
\left(
\frac{\partial g_{kj}}{\partial x^i}
+
\frac{\partial g_{ik}}{\partial x^j}
-
\frac{\partial g_{ij}}{\partial x^k}
\right)
\end{equation}
The beauty of this is that if we have the metric as a function of position, it's just a collection of numbers addressed by indices, so each of the partial derivative terms is an ordinary scalar derivative, producing an ordinary number. We haven't yet had to take a derivative of a vector field, but are now equipped with the tool we need to tackle that challenge.
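As an illustration of the recipe, here is a sketch that assumes the usual metric of a sphere of radius $R$ in latitude $\lambda$ and longitude $\phi$ (symbols chosen for this example): $g_{\lambda\lambda} = R^2$ and $g_{\phi\phi} = R^2 \cos^2 \lambda$, with the off-diagonal components zero. The only non-zero derivative of a metric component is:
$$
\frac{\partial g_{\phi\phi}}{\partial \lambda} = -2 R^2 \sin\lambda \cos\lambda
$$
Substituting into the formula for the all-lower form, every combination of indices gives zero except:
$$
\Gamma_{\lambda\phi\phi}
= -\frac{1}{2} \frac{\partial g_{\phi\phi}}{\partial \lambda}
= R^2 \sin\lambda \cos\lambda
\quad \quad
\Gamma_{\phi\lambda\phi} = \Gamma_{\phi\phi\lambda}
= \frac{1}{2} \frac{\partial g_{\phi\phi}}{\partial \lambda}
= -R^2 \sin\lambda \cos\lambda
$$
The inverse metric is also diagonal, with $g^{\lambda\lambda} = 1/R^2$ and $g^{\phi\phi} = 1/(R^2 \cos^2 \lambda)$, so raising the first index gives:
$$
\Gamma\indices{^\lambda_\phi_\phi} = \sin\lambda \cos\lambda
\quad \quad
\Gamma\indices{^\phi_\lambda_\phi} = \Gamma\indices{^\phi_\phi_\lambda} = -\tan\lambda
$$
The radius $R$ has cancelled, and $\tan\lambda$ grows without bound toward the poles, where this coordinate system breaks down.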
\subsection{Derivative of a vector field}
We previously took partial derivatives of scalar fields, using this to find our basis vectors, and by linear combinations of those basis vectors we can describe any vector in the tangent space existing at each point. These could correspond to the gradients of any scalar fields, beyond those that serve as our coordinate system. But this of course means that we can envisage vector fields --- indeed, that's what the basis vectors are.
We can consider how we would calculate the derivative of a vector field, or indeed any tensor field, just because the challenge is there. But there is a more urgent motivation: the ability to take the derivative of a vector field is the key to solving the problem of relating vectors in different tangent spaces. The derivative is notionally the difference between the vectors at two points separated by an infinitesimal distance. If the difference is zero, they aren't different --- they are \textit{connected}.
The normal notation for the geometrical derivative of a scalar field $a$ is $\grad a$. In simplified discussions it is loosely described as the gradient vector field. However it would be more accurate to call it a covector field, a distinction we don't care about if we have the luxury of an orthonormal basis, allowing us to interchangeably talk about the total derivative or the gradient. But now we are unable to assume orthonormality, we must remember that $\grad a$ produces the total derivative, a covector that can act on a vector $\vec{v}$ to produce the scalar change in $a$ due to a displacement by $\vec{v}$.
We can generalise the $\grad$ operator to operate on any tensor field, whether a scalar, a vector, a covector or any higher type of tensor. For every number describing the input, we will find how it changes with respect to each coordinate of the space separately. Therefore if the input field $a$ is a tensor at each point of type $(k, l)$ then $\grad a$ is of type $(k, l + 1)$. That is, in coordinate form in $N$ dimensions, the input field is described by $N^{(k+l)}$ numbers. Taking the derivative involves finding $N$ numbers for every number in the input, so $N^{(k+l+1)}$ numbers are required to describe the output.
We can indicate this increase in the covariant tensor type by writing the operator with its own lower index $\grad_i$. That is, an expression such as $\grad_i v^j$ stands for $N^2$ expressions (like a matrix), one vector (column) for each coordinate direction $i$ it takes the derivative along and one row for each vector coordinate $j$. Of course we are not limited to inquiring about the coordinate basis directions: as this extra index is covariant (lower), we can contract it with a vector (upper) to get the derivative in the direction of that vector. This is sometimes abbreviated by putting the vector of interest as the lower index:
$$
\grad_{\vec{a}} \vec{v} = a^i (\grad_i v^j) \vec{e}_j
$$
As an operator we can think of (and write down) $\grad$ geometrically, operating directly on vectors and such, or algebraically, as a means of taking the derivatives of numerical quantities labelled by indices. Both are valid notations, as shown in the above example. The operator without a subscript (though rarely used from here on) can apply to a geometrical vector to produce a geometrical tensor, which itself may be factored into an algebraic/numeric equivalent multiplied by a tensor basis:
$$
\grad \vec{v} = (\grad_i v^j) \vec{e}^i \otimes \vec{e}_j
$$
But using a kind of doublethink to switch between interpretations, we can merely write:
$$
\grad_i v^j
$$
Depending on our frame of mind, this is either a matrix of numbers addressed by covariant $i$ and contravariant $j$, supporting contraction with opposite-sense indices according to the usual rules, or else it's abstract index notation for a geometric object, a $V^* \otimes V$ tensor -- a two-slot machine accepting a vector and a covector to produce a scalar value.
If all this operator did was to take the partial derivative of every component in the input tensor with respect to the $i$th coordinate (sometimes called the \textit{ordinary derivative}), it would be influenced by the perhaps twisty behaviour of the coordinate basis. Consider a vector field expressed in the basis at each point:
$$
\vec{v} = v^i \vec{e}_i
$$
This is a product of two things that could \textit{both} change as we vary one coordinate. So to take the partial derivative with respect to each coordinate $j$, we need to use the product rule:
$$
\frac{\partial \vec{v}}{\partial x^j}
= \frac{\partial v^i}{\partial x^j} \vec{e}_i
+ v^i\frac{\partial \vec{e}_i}{\partial x^j}
$$
The second term includes the partial derivative of a basis vector, which we already found we could write as a linear combination in terms of the Christoffel symbol as:
$$
\frac{\partial \vec{e}_i}{\partial x^j}
= \Gamma\indices{^k_i_j} \vec{e}_k
$$
So we can substitute that:
$$
\frac{\partial \vec{v}}{\partial x^j}
= \frac{\partial v^i}{\partial x^j} \vec{e}_i
+ v^i \Gamma\indices{^k_i_j} \vec{e}_k
$$
Now in the first term we're using $i$ as the summation index but that's an arbitrary label. The summation convention means that the first term is actually $N$ terms (two if we're working in two dimensions) that will be evaluated and added, and we can freely switch labels. If we switch that index to $k$:
$$
\frac{\partial \vec{v}}{\partial x^j}
= \frac{\partial v^k}{\partial x^j} \vec{e}_k
+ v^i \Gamma\indices{^k_i_j} \vec{e}_k
$$
we can pull out the $\vec{e}_k$ basis vector common factor:
$$
\frac{\partial \vec{v}}{\partial x^j}
= \left(
\frac{\partial v^k}{\partial x^j}
+ v^i \Gamma\indices{^k_i_j}
\right)
\vec{e}_k
$$
This is still a summation over $k$, and is now more clearly building a vector out of a linear combination of the basis vectors. The factor in parentheses is just a number, the $k$th coordinate of the vector being computed. It is the ordinary derivative but "corrected" by the information in $\Gamma$ that concerns the behaviour of the basis vectors alone. This is the true \textit{covariant derivative} of a vector field. Geometrically (though still neatly factored):
$$
\grad_j \vec{v}
= \left(
\frac{\partial v^k}{\partial x^j}
+ v^i \Gamma\indices{^k_i_j}
\right)
\vec{e}_k
$$
or leaving it as a type $(1, 1)$-tensor:
$$
\grad \vec{v}
= \left(
\frac{\partial v^k}{\partial x^j}
+ v^i \Gamma\indices{^k_i_j}
\right)
\vec{e}^j \otimes \vec{e}_k
$$
or trimming away all vestiges of explicit geometry:
$$
\grad_j v^k
=
\frac{\partial v^k}{\partial x^j}
+ v^i \Gamma\indices{^k_i_j}
$$
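To make the implied summation concrete, here is the same formula written out for $N = 2$, with the sum over $i$ expanded (a sketch that only restates the definition above):
$$
\grad_j v^k
=
\frac{\partial v^k}{\partial x^j}
+ v^1 \Gamma\indices{^k_1_j}
+ v^2 \Gamma\indices{^k_2_j}
$$
Each choice of $j$ and $k$ gives one number, so in two dimensions there are four components, each an ordinary derivative plus two correction terms.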
The covariant derivative connects vectors between different locations. Given two nearby points, $\mathcal{P}$ and $\mathcal{Q}$, and a vector $\vec{v}$ at $\mathcal{P}$ and a vector $\vec{w}$ at $\mathcal{Q}$, can we say that $\vec{v}$ and $\vec{w}$ are the same vector? No, this isn't a meaningful concept by itself. We have to specify a curve along which we travel between $\mathcal{P}$ and $\mathcal{Q}$. Each infinitesimal movement gives us an unambiguous connection to the next equivalent vector, and it could be the case that by transporting $\vec{v}$ from $\mathcal{P}$ to $\mathcal{Q}$, it arrives identical to $\vec{w}$, depending on the curve.
\subsection{Identifying geodesic curves}
Suppose that there is a vector field in the space, $\vec{f}(\mathcal{P})$, which we can represent in coordinates as $N$ smooth functions $\mathbb{R}^N \mapsto \mathbb{R}$ of position coordinates $x^i$:
$$
f^i(x) = f^i(x^1, x^2 \dots x^N)
$$
As always, feel free to assume $N = 2$ for visualisation purposes, so that this is simply a way to map a pair of position coordinates to a pair of vector coordinates at that position, a field of arrows all over the surface, with lengths and directions that smoothly vary with position.
Meanwhile, a curve is a function $\mathbb{R} \mapsto \mathcal{P}$ from some real parameter, $t$. Again, this is sometimes likened to time passing as we make the journey along the curve, but really it's just an independent variable that we can smoothly adjust however we like, like a control knob that makes a point move back and forth on the curve. But we'll say "at time $t$" for short.
The coordinates of the points on the curve are defined by $c^i(t)$, which can be thought of as both a function $\mathbb{R} \mapsto \mathbb{R}^N$ and also $N$ functions $\mathbb{R} \mapsto \mathbb{R}$, giving the coordinates of the point on the curve $c$ at time $t$.
We now have a way to get \textit{two} vectors at each point. First, we can hand the coordinates $c^i(t)$ straight to our vector field functions $f^i(x)$:
$$
f^i(c(t)) = f^i(c^1(t), c^2(t) \dots c^N(t))
$$
So we have a field vector for every point along the curve. Second, we can get the "velocity" vector $\vec{v}$ at time $t$ in our journey, which is simply the ordinary derivative of the coordinates of our curved path:
$$
v^i(t) = \frac{d c^i(t)}{dt}
$$
Note that this is not the same as taking the derivative of a vector field: the coordinates of points are not vectors, because the surface is not itself a vector space.
Now, it may be that along the curve, either of these vectors, e.g. $f$, always has the same coordinates:
$$
\frac{d f^i(t)}{dt} = 0
$$
But this would not by itself be of any significance: if we changed to a different coordinate system, naturally these same vectors would be represented by different coordinates that might not remain constant along the curve. It depends entirely on what the coordinate system is doing in that region. On the globe with the usual coordinate system, if the field of vectors happens to be identical to the basis vectors aligned with the curves of constant longitude, they will have the coordinates $(1, 0)$ at every point, and so along any curved path the ordinary derivative of the coordinates will be $0$. But if we change the coordinate system, the field vectors will no longer match it and the derivative of their coordinates will depend on the path taken.
Far more interesting is the covariant derivative of $\vec{f}(t)$ in the direction of $\vec{v}(t)$. What if that is zero everywhere along the curve?
$$
\grad_{\vec{v}(t)} \vec{f}(t)
=
v^i(t) \grad_i f^j(t) = 0
$$
That would be an intrinsic geometric fact about the vectors, regardless of the coordinate system. If the covariant derivative of the vectors along the curve is zero in one coordinate system, it must be zero in all coordinate systems, although the raw coordinates of the vectors themselves along the curve may be changing.
In other words, the covariant derivative may say that there is no change to the vector, even though the coordinates are changing. Conversely, the coordinates might remain constant and yet the covariant derivative could be non-zero. This is the effect of the extra Christoffel term in the definition of the derivative, but it has the effect of producing a result that is independent of the coordinate system.
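It may help to see where the Christoffel term enters. As a sketch, assuming the component formula $\grad_i f^j = \partial f^j / \partial x^i + f^k \Gamma\indices{^j_k_i}$ (the definition given earlier with the indices relabelled), the chain rule turns the derivative along the curve into:
$$
v^i(t) \grad_i f^j(t)
=
\frac{\partial f^j}{\partial x^i} \frac{d c^i}{dt}
+ \Gamma\indices{^j_k_i} f^k v^i
=
\frac{d f^j(t)}{dt}
+ \Gamma\indices{^j_k_i} f^k(t) v^i(t)
$$
So requiring the covariant derivative to vanish is the same as requiring $d f^j / dt = -\Gamma\indices{^j_k_i} f^k v^i$: the coordinates of the vector must change at exactly the rate needed to cancel the change in the basis vectors.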
If we do happen to find a curve along which the covariant derivative is always zero, then we say that as the parameter $t$ goes from (say) $0$ to $1$, and therefore we move along the curve from $c^i(0)$ to $c^i(1)$, the vector $f^i(0)$ has been \textit{parallel transported} to $f^i(1)$. It's not that we can say that $f^i(1)$ is parallel to $f^i(0)$ in some general sense, only that it is parallel along this curve.
Does this mean we've found some kind of "straight" path? Not at all. The curve we are following may have to wind this way and that, as we feel our way through a messy vector field looking for "the same" vector right next to our current position.
But we can get as close as possible to a straight path. We are at last ready to say exactly what a geodesic is:
$$
\grad_{\vec{v}(t)} \vec{v}(t)
=
v^i(t) \grad_i v^j(t) = 0
$$
We're no longer paying any attention to the field $\vec{f}$, as we're interested only in a property of the spatial geometry itself. The velocity vector $\vec{v}$ takes the place of $\vec{f}$ in our requirement. As $t$ increases, $\vec{v}(t)$ remains "the same" vector as it transports itself forward. If our curve does this, it is a geodesic.
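Written out in coordinates, this condition becomes a differential equation for the curve itself. As a sketch, assuming the component formula for the covariant derivative given earlier, and treating $\vec{v}$ as if it were extended to a field near the curve so that the chain rule $v^i \, \partial v^j / \partial x^i = d v^j / dt$ applies:
$$
\frac{d v^j}{dt} + \Gamma\indices{^j_k_i} v^k v^i = 0
$$
Since $v^j = d c^j / dt$, this is the same as:
$$
\frac{d^2 c^j}{dt^2}
+ \Gamma\indices{^j_k_i} \frac{d c^k}{dt} \frac{d c^i}{dt}
= 0
$$
This is $N$ coupled second-order equations, one for each $j$. Wherever all the components of $\Gamma$ vanish it reduces to $d^2 c^j / dt^2 = 0$, a straight line traversed at constant speed.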
Moving around the equator on the globe is a geodesic, but it's a misleading example in the (latitude, longitude) coordinate system because it exactly coincides with a curve of constant latitude, varying longitude, and \textit{that} curve happens to be a geodesic. As a result the velocity vector in coordinates is fixed as $(0, \lambda)$ where $\lambda$ is some constant angular speed around the globe, and the covariant derivative agrees that it is constant. It's similar if we move around any of the curves of constant longitude, which are also geodesics. This is misleading because we might conclude that constant velocity coordinates implies a geodesic path.
But what about a geodesic that we get by slightly tilting the equator, so on one side of the globe it dips below the real equator and on the other side rises above it? The latitude position coordinate is evidently undulating up and down in a sinusoidal-like way, and therefore the latitude coordinate of the velocity vector must be undulating likewise, albeit out of phase with the position. The coordinates of the velocity vector are definitely not constant around the curve. And yet the covariant derivative will insist that it remains the "same" vector, because $\grad_{\vec{v}(t)} \vec{v}(t) = 0$.
And what if we follow a curve of constant latitude north of the equator? This is the same as saying that the velocity vector in coordinates is fixed as $(0, \lambda)$ where $\lambda$ is some constant angular speed around the globe. But an aircraft following such a flight path to the east will have to continually steer slightly to the left. It is not following a geodesic, and the covariant derivative will tell us so: it will describe the way the pilot has to steer to add a sideways twist to the velocity vector as it transports the aircraft.
Furthermore if the pilot happens to be holding an arrow throughout the flight, she may very slowly turn it to the right, i.e. in the opposite direction to which the aircraft is steering, but at exactly the same angular speed. Logically this means the arrow is not turning - it's only turning relative to the aircraft, and so it is being parallel transported along the flight path. However if we observed the arrow from space, travelling along the path of constant latitude, what we'd see is an arrow slowly rotating clockwise! This is what happens if you parallel transport a vector along a curve that isn't a geodesic.
\subsection{Example: identifying geodesics on the sphere}
To make all this more concrete and eliminate any lingering mystery, it will be worth going through some examples on the sphere, working out the actual definitions of the functions and derivatives involved.
Let's use the usual global coordinate system but in radians $(\theta, \phi)$, with latitude $\theta$ as $0$ at the North Pole, $\pi/2$ at the equator and $\pi$ at the South Pole, and longitude $\phi$ taking us all the way around the globe from $0$ to $2\pi$. There's a tangent space everywhere, and each space has a pair of basis vectors, the first pointing in the direction of increasing latitude, and the second along increasing longitude.
A note about the standard symbols: they are excellent visual mnemonics in that $\theta$ resembles a circle with a horizontal line across it, like a curve of constant latitude on a sphere with its polar axis vertically aligned, and the $\phi$ symbol is similar but with a (roughly) vertical line, like a curve of constant longitude. We can use them (instead of integers) as labels for index values, so that $v^i$ means either $v^\theta$ or $v^\phi$. But do bear in mind that the corresponding basis vectors are at right angles to their contours, so $\vec{e}_{\phi}$ points in the direction of increasing longitude, which is along a curve of constant latitude. If we were just labelling the unit vectors, $(\vec{e}_{\downarrow}, \vec{e}_{\rightarrow})$ might be better than $(\vec{e}_{\theta}, \vec{e}_{\phi})$, but as long as we remember the standard meaning we'll be fine.
Also note that when giving coordinates we don't need to say $x^{\theta}$ and $x^{\phi}$ out of some dogmatic consistency. When the definition of a tensor talks about $x^i$, that means either $\theta$ or $\phi$.
The first important fact about this system is that the basis vectors are orthogonal everywhere. The second is that they are not ortho\textit{normal}, except at the equator, where a change in one of the coordinates by some angle produces the same actual displacement. But north or south of the equator, a circle of constant latitude is a smaller circle, so an angular change in longitude must produce a smaller displacement. The very fact that we can confidently assert something about the actual length implied by coordinate changes means that we have an intuitive sense of a metric and its representation in coordinates.
We can quantify this exactly. For a unit sphere, the radius of the circle of constant latitude $\theta$ is $\sin \theta$, so it is $0$ at the poles and reaches a maximum of $1$ at the equator. We can use plain old $\mathbb{R}^2$ as a canvas for the tangent space, allowing us to describe the basis vectors as pairs of numbers. It's like we're carrying a rigid square grid to every point on the globe so we can measure the local tangent $(\vec{e}_{\theta}, \vec{e}_{\phi})$ basis vectors against the grid. We'll say the latitude vector $\vec{e}_{\theta}$ is due south, $(1, 0)$ everywhere,\footnote{As long as we stay away from the poles, where things get awkward.} as there is a fixed relationship between the latitude angle $\theta$ and distance travelled on the surface. The longitude vector is $(0, \sin \theta)$, due east, its length shrinking as we get further away from the equator.
Because the basis vectors here are being described numerically on $\mathbb{R}^2$, against a notional underlying orthonormal coordinate system, we can just use the dot product as the inner product. The metric is given by:
$$
g_{ij} = (\vec{e}_i,\vec{e}_j)
$$
so we have:
$$
g_{ij} = \begin{bmatrix}1 & 0 \\ 0 & \sin^2 \theta\end{bmatrix}
$$
It (or specifically $g_{\phi\phi}$) is a function of position, although it only depends on latitude $\theta$. The inverse (raising) metric will also be needed:
$$
g^{ij} = \begin{bmatrix}1 & 0 \\ 0 & \frac{1}{\sin^2 \theta}\end{bmatrix}
$$
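As a check on the entries of $g_{ij}$, here are the dot products written out, using the basis vectors $\vec{e}_{\theta} = (1, 0)$ and $\vec{e}_{\phi} = (0, \sin \theta)$ described above:
$$
g_{\theta\theta} = 1 \cdot 1 + 0 \cdot 0 = 1
\qquad
g_{\theta\phi} = g_{\phi\theta} = 1 \cdot 0 + 0 \cdot \sin \theta = 0
\qquad
g_{\phi\phi} = 0 \cdot 0 + \sin \theta \cdot \sin \theta = \sin^2 \theta
$$
The length of a small step $d\phi$ along a curve of constant latitude is then $\sqrt{g_{\phi\phi}} \, d\phi = \sin \theta \, d\phi$, and a full circuit of $2\pi$ in $\phi$ has length $2\pi \sin \theta$, the circumference of a circle of radius $\sin \theta$, as expected.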
Next we need the Christoffel symbol:
$$
\Gamma\indices{_i_j^l}
=
g^{kl}
\frac{1}{2}
\left(
\frac{\partial g_{kj}}{\partial x^i}
+
\frac{\partial g_{ik}}{\partial x^j}
-
\frac{\partial g_{ij}}{\partial x^k}
\right)
$$
It's so handy using indices to describe it, but now we face the harsh reality that we have four indices labelling a choice of coordinate, which suggests $2^4=16$ combinations to figure out. What a chore. But in this example things simplify considerably.
Let's give the part in parentheses a temporary symbol:
$$
D_{ijk}
=
\frac{\partial g_{kj}}{\partial x^i}
+
\frac{\partial g_{ik}}{\partial x^j}
-
\frac{\partial g_{ij}}{\partial x^k}
$$
The outer contraction is then just:
\begin{equation}
\begin{split}
\Gamma\indices{_i_j^l}
&= \frac{1}{2} g^{kl} D_{ijk} \\