GitHub Arctic Code Vault: Tech Tree

Introduction: A Guide To The Tech Tree

What follows, which we call the Tech Tree, is a selection of works intended to describe how the world makes and uses software today, as well as an overview of how computers work and the foundational technologies required to make and use computers. The purpose of the GitHub Archive Program is to preserve open source software for future generations. This implies also preserving the knowledge of other technologies on which open-source software runs, along with a depiction of the open-source movement which brought this software into being.

This initial version of the Tech Tree will consist almost entirely of copies of pre-existing works, none of which were written for an unknown audience a long way into the future. As such it is not so much a guide as a collection of resources that we hope will be historically interesting and/or useful. We have tried to strike a balance between abstract/theoretical and concrete/practical work, and to provide at least an overview of the entire technical stack on which modern software engineering rests.

In addition to this technical documentation, we have also included a selection of artistic, cultural, and historical works, to help describe the overall cultural context in which this archive was created. A data dump of an entire 2020 snapshot of Wikipedia in the archive's five primary languages, along with a snapshot of Stack Overflow and sundry other smaller data collections, is also stored alongside the works itemized below.

The current iteration of the Tech Tree is loosely divided into the following thirteen sections:

  1. Fundamentals of computing and the Internet: the essentials of how computers work, and, at least as important to today's world, how they are connected together into a single planetary network which includes most of the computers on Earth.
  2. Algorithms and data structures: processes, sets of rules, and methods of arranging data to solve common categories of problems in efficient ways. Metaphorically, algorithms are the intelligence in a software program, and data structures are its storage.
  3. Compilers, assembler, and operating systems: how written source code becomes the machine code which causes the electrical signals inside a computer to change in a controlled manner, and the theory of operating systems, the software which supports a computer's basic functions and provides the fundamental, low-level functionality that all other software ultimately calls upon.
  4. Programming languages: some of the world's most popular and widely used programming languages described in detail. While, fundamentally, any program can be written in any language, certain languages are better or worse at particular tasks.
  5. Networking and connectivity: how computers connect to one another, via physical wires and radio signals, both one-on-one and in larger networks. Includes descriptions of the structure of the global "network of networks" known as the Internet, which connects most of the computers on Earth.
  6. Modern software development: the processes and procedures of dealing with software projects, tools, and services at scale, with constant monitoring and communication, at assured levels of quality.
  7. Modern software applications: in-depth description of applications such as Web development (the Web is, essentially, that part of the Internet used to display output and receive input from human beings); scientific research and analysis; image processing; pattern recognition and generation via neural networks; software distributed across many different computers; cryptocurrencies, which can be used as a platform for trustless decentralized software; and the new field of quantum computing.
  8. Hardware architectures: the concepts, structures, and layout of computer hardware. Hardware refers to physical electronic components; hardware architecture refers to how those components are structured and connected in order to run software; and software ultimately becomes ephemeral patterns of electricity within those physical components.
  9. Hardware development: how to build simple computers from collections of electronic components.
  10. Electronic components, transistors, semiconductor manufacturing: those electronic components which predated computers, along with individual transistors, the component from which computers are made, and an overview of the technologies and processes of fabricating interconnected transistors at scale.
  11. Pre-industrial technologies: technologies of eras which predated electricity.
  12. Fiction, culture, and history: human histories and changing human cultures, mostly through the lens of celebrated fictional narratives written over the last 150 years.
  13. Cultural context: information about humanity at the time the Tech Tree was created; in particular, a snapshot of Wikipedia, a collectively generated repository of all sorts of information about our world. Due to Wikipedia's enormous size, this section is provided as encoded data, like the rest of the archive, rather than as visual/readable pages.

The first seven sections are devoted to software, the purpose and content of the GitHub Arctic Code Vault, and its uses and applications. The next four sections describe the technologies required to construct computers on which software might run. The remaining two are intended to illustrate the human context in which these technologies have been developed, the stories the cultures of our era told, the languages in which we told them, and the factual background and descriptions of the world in which we lived.

The Tech Tree is part of the much larger GitHub Arctic Code Vault. As such, it also includes, as an appendix, visual copies of the Guide to the GitHub Code Vault, along with an index of the archive's fifteen thousand most significant code repositories, including brief descriptions and locations within the archive.

It is perhaps worth noting that our advisory board stressed that ours is likely to be the best-documented era in human history by far, so bundling the Tech Tree with the archive is likely to be more convenient than essential for its inheritors. As such, it is entirely possible -- indeed quite likely -- that its value will lie largely in providing context about the era and culture in which the archive was created, rather than in supplying new or otherwise unavailable knowledge, though of course there are imaginable futures in which it plays the latter role.

What follows is a brief summary of each section, describing both the general topics it covers, and the works the Tech Tree includes to document our current understanding of those topics.

Fundamentals of computing and the Internet

These books describe what computers are, from the silicon up -- electricity, transistors, binary logic, digital gates, bits, bytes, chips, ALUs, microprocessors, software -- and introduce what they can do. This section also includes books which describe, at a high level, how computers can be connected together, and what that means. The works in question are:

The Pattern On The Stone by W. Daniel Hillis (Basic Books)

But How Do It Know? by J. Clark Scott (John C Scott)

Code: The Hidden Language of Computer Hardware and Software by Charles Petzold (Pearson Education)

Computer Fundamentals by Anita Goel (Pearson Education)

How The Internet Really Works by Ulrike Uhlig, Mallory Knodel, Niels ten Oever, Corinne Cath, and Catnip (No Starch)

Algorithms and data structures

These are the fundamentals of computer science, and hence the foundation of software engineering: how data is structured and stored, and the most effective and efficient ways in which it can be processed.

The Art of Computer Programming by Donald Knuth (Pearson)

Algorithmic Thinking by Daniel Zingaro (No Starch)

Sequential and Parallel Algorithms and Data Structures by Peter Sanders, Kurt Mehlhorn, Martin Dietzfelbinger, Roman Dementiev (Springer)

Cryptography by Simon Rubinstein-Salzedo (Springer)

Mastering SciPy by Francisco J. Blanco-Silva (Packt)

Everyday Data Structures by William Smith (Packt)

Database Internals by Alex Petrov (O'Reilly)

Introduction to the Theory of Computation by Michael Sipser (Cengage)

Think Like A Programmer by V. Anton Spraul (No Starch)

Write Great Code by Randall Hyde (No Starch)

Compilers, assembler, and operating systems

The purpose of the Archive Program is to preserve software, and these are the fundamental building blocks of software. These books help to explain how high-level written software becomes low-level electrical impulses:

A Practical Approach To Compiler Construction by Des Watson (Springer)

The Secret Life Of Programs by Jonathan Steinhart (No Starch)

The Art of Assembly Language by Randall Hyde (No Starch)

Understanding the Linux Kernel by Daniel P. Bovet and Marco Cesati (O'Reilly)

Mastering Linux Kernel Development by Raghu Bharadwaj (Packt)

The Linux Programming Interface by Michael Kerrisk (No Starch)

Programming languages

There are hundreds of programming languages (the enormous chart visualizing their evolution at the Computer History Museum is worth visiting if you're a developer), and we don't intend to document them all. Still, accessible book-length descriptions of a selection of the world's major languages seem desirable.

Introducing Python by Bill Lubanovic (O'Reilly)

Comprehensive Ruby Programming by Jordan Hudgens (Packt)

LISP, Lore, and Logic by W. Richard Stark (Springer)

The C Programming Language by Kernighan and Ritchie (Pearson)

Learn C The Hard Way by Zed Shaw (Pearson)

Head First C by David Griffiths, Dawn Griffiths (O'Reilly)

Effective C by Robert Seacord (No Starch)

The C++ Primer by Stanley B. Lippman, Josée Lajoie, and Barbara E. Moo (Pearson)

Programming Rust by Jim Blandy and Jason Orendorff (O'Reilly)

The Rust Programming Language by Steve Klabnik and Carol Nichols (No Starch Press)

The Go Programming Language by Alan A. A. Donovan and Brian W. Kernighan (Pearson)

Head First Go by Jay McGavren (O'Reilly)

Learning Java by Patrick Niemeyer and Daniel Leuck (O'Reilly)

The Java Virtual Machine Specification by Tim Lindholm, Frank Yellin, Gilad Bracha, and Alex Buckley (Pearson)

JavaScript: The Definitive Guide by David Flanagan (O'Reilly)

JavaScript: The Good Parts by Douglas Crockford (O'Reilly)

Automate the Boring Stuff with Python by Al Sweigart (No Starch)

Learning Swift by Jonathon Manning, Paris Buttfield-Addison, and Tim Nugent (O'Reilly)

Introducing Erlang by Simon St. Laurent (O'Reilly)

Clojure Programming by Chas Emerick, Brian Carper, and Christophe Grand (O'Reilly)

Clojure for the Brave and True by Daniel Higginbotham (No Starch)

The Art of R Programming by Norman Matloff (No Starch)

Mastering Scientific Computing with R by Paul Gerrard and Radia M. Johnson (Packt)

Learn You A Haskell For Great Good by Miran Lipovaca (No Starch)

Networking and connectivity

Computers are great, but in a way, so 20th century; it's networked computers which are, at least arguably, the real technical revolution of the 21st. As such, our networking protocols and technologies deserve considerable attention. We might hope our inheritors will either have long surpassed our networking, or will have the freedom to design anew rather than be shackled by all the compromises we've needed to make for the sake of backwards compatibility; either way, hopefully they can learn something from what we've done, which is described by:

Cabling: The Complete Guide To Copper and Fiber-Optic Networking by Andrew Oliviero and Bill Woodward (Wiley)

Ethernet: The Definitive Guide by Charles E. Spurgeon and Joann Zimmerman (O'Reilly)

Understanding TCP/IP by Alena Kabelová and Libor Dostálek (Packt)

TCP/IP Essentials by Shivendra S. Panwar, Shiwen Mao, Jeong-dong Ryoo, and Yihan Li (Cambridge)

DNS and BIND by Cricket Liu and Paul Albitz (O'Reilly)

BGP by Iljitsch van Beijnum (O'Reilly)

HTTP: The Definitive Guide by David Gourley, Brian Totty, Marjorie Sayer, Anshu Aggarwal, and Sailu Reddy (O'Reilly)

Implementing SSL / TLS Using Cryptography and PKI by Joshua Davies (Wiley)

Nginx HTTP Server by Martin Fjordvald and Clement Nedelcu (Packt)

sendmail by Bryan Costales, Claus Assmann, George Jansen, and Gregory Neil Shapiro (O'Reilly)

Programming Internet Email by David Wood (O'Reilly)

Data Communications and Networking by Behrouz A. Forouzan (McGraw-Hill)

Computer Networking: Principles, Protocols, and Practice by Olivier Bonaventure (Pearson)

Computer Networking: A Top-down Approach by Jim Kurose (Pearson)

Modern software development

The line-by-line act of writing software is quite different from the team-by-team process of developing, testing, integrating, and deploying it. A few key approaches, tools, and roles are described here, including, for obvious reasons, unpacking Git itself.

Working in Public: The Making and Maintenance of Open Source Software by Nadia Eghbal (Stripe Press)

The Manager's Path by Camille Fournier (O'Reilly)

The Missing README by Chris Riccomini and Dmitriy Ryaboy (No Starch)

Learning Agile by Andrew Stellman and Jennifer Greene (O'Reilly)

Professional Git by Brent Laster (Wiley)

The Tangled Web: A Guide to Securing Modern Web Applications by Michal Zalewski (No Starch)

Metasploit by David Kennedy, Jim O'Gorman, Devon Kearns, and Mati Aharoni (No Starch)

Effective DevOps by Jennifer Davis and Ryn Daniels (O'Reilly)

Site Reliability Engineering edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy (O'Reilly)

Designing Distributed Systems by Brendan Burns (O'Reilly)

Designing Data-Intensive Applications by Martin Kleppmann (O'Reilly)

Exercises in Programming Style by Cristina Videira Lopes (CRC Press)

Modern software applications

It would take a tech forest, not a tree, to even try to describe all of the uses to which software is put. However, some depictions of how individual projects and libraries are knit together into powerful networked applications seem valuable, as do overviews of e.g. virtualization, "big data" software, and especially machine learning.

Web development

Web Development with Node and Express by Ethan Brown (O'Reilly)

Flask Web Development by Miguel Grinberg (O'Reilly)

RESTful Web APIs by Leonard Richardson, Mike Amundsen, Sam Ruby (O'Reilly)

Ruby on Rails Tutorial by Michael Hartl (Pearson)

Django for Professionals: Production Websites with Python & Django by William S. Vincent (Still River)

Machine learning

Deep Learning from Scratch by Seth Weidman (O'Reilly)

Deep Learning: A Visual Approach by Andrew Glassner

Fundamentals of Deep Learning by Nikhil Buduma and Nicholas Locascio (O'Reilly)

Practical Convolutional Neural Networks by Mohit Sewak, Md. Rezaul Karim, and Pradeep Pujari (Packt)

Pattern Recognition and Machine Learning by Christopher Bishop (Springer)

Generative Deep Learning by David Foster (O'Reilly)

Strengthening Deep Neural Networks by Katy Warr (O'Reilly)

Virtualization and containers

Mastering Docker by Scott Gallagher (Packt)

Kubernetes: Up and Running by Brendan Burns, Joe Beda, and Kelsey Hightower (O'Reilly)

Spark: The Definitive Guide by Bill Chambers, Matei Zaharia (O'Reilly)

Reliability and scaling

Database Reliability Engineering by Laine Campbell and Charity Majors (O'Reilly)

The Art of Capacity Planning by Arun Kejariwal and John Allspaw (O'Reilly)

Economics and sociotechnical systems

The Economics of Information Technology by Hal Varian, Joseph Farrell and Carl Shapiro (Cambridge University Press)

Mastering Bitcoin by Andreas Antonopoulos

Hardware architectures

The spectrum of complexity from a single analog transistor to a modern multicore processor is, needless to say, difficult to summarize. This section tries to describe the basics of digital circuits and microprocessors, along with a few key references, before going on to hardware architectures and hardware design languages.

Modern hardware architecture

Microprocessor Design by Grant McFarland (McGraw-Hill)

Microprocessor Architecture by Jean-Loup Baer (Cambridge)

Inside the Machine by Jon Stokes

Introduction to Parallel Processing by Behrooz Parhami (Springer)

HDLs

IEEE Standard VHDL Language Reference Manual (IEEE)

IEEE Standard for SystemVerilog (IEEE)

Example architecture details

Arduino: A Technical Reference by J. M. Hughes (O'Reilly)

RISC-V Specifications by the RISC-V International Technical Committee

Learning FPGAs by Justin Rajewski (O'Reilly)

Hardware development

Digital Computer Electronics by Albert P. Malvino and Jerald A. Brown (Career Education)

Computer Time Travel by JS Walker (Oldfangled)

Theory, Design, and Applications of Unmanned Aerial Vehicles by A.R. Jha (CRC Press)

Modern Robotics by Kevin Lynch and Frank Park (Cambridge University Press)

Mastering ROS for Robotics Programming by Lentin Joseph (Packt)

Electronic components, transistors, semiconductor manufacturing

This section offers a lower-level analysis of fundamental electronic components and transistor-based circuitry, along with textbooks describing lithography and chip manufacturing. Obviously such manufacturing is essentially impossible to recreate from scratch (Moore's lesser-known second law describes how the cost of fabrication plants rises as chip density increases), but these works could conceivably be of historical or even practical significance.

Fundamentals of Semiconductor Manufacturing by Gary S. May and Simon M. Sze (Wiley)

Semiconductor Manufacturing Handbook (both editions) by Hwaiyu Geng (McGraw-Hill)

Pre-industrial technologies

These are the works which address the "romantic catastrophe" image of the archive's inheritors, who seek to reboot all of modern technological civilization from pre-industrial scratch. Such possible futures do exist, although they seem unlikely; furthermore, it seems possible that these works might help fill in gaps which arise in historical knowledge.

The Knowledge by Lewis Dartnell (Penguin)

Caveman Chemistry by Kevin Dunn (Universal)

Practical Blacksmithing by M.T. Richardson (Weathervane)

Materials Handbook by George S. Brady, Henry R. Clauser, and John A. Vaccari (McGraw-Hill)

Practical Self-Sufficiency by Dick and James Strawbridge (DK)

Oxford Handbook of Infectious Diseases and Microbiology by Estée Török, Ed Moran, and Fiona Cooke (OUP)

Fiction, culture, and history

It is our belief that culture is often best expressed through great works of fiction. As such, we sought to assemble a list of notable literary works (including a few books of nonfiction) to convey, on a human level, the history and culture of our time. These are:

Chapman's Homer

The Complete Works of William Shakespeare

The Tale of Genji by Murasaki Shikibu

Crime and Punishment by Fyodor Dostoevsky

Extraordinary Popular Delusions and the Madness Of Crowds by Charles Mackay

I, Claudius by Robert Graves

Brave New World by Aldous Huxley (Harper Perennial)

1984 by George Orwell (Signet Classics)

Cyberiad by Stanislaw Lem (Mariner)

One Hundred Years Of Solitude by Gabriel García Márquez (Harper Perennial)

Foucault's Pendulum by Umberto Eco (Mariner)

Anathem by Neal Stephenson (William Morrow)

Magic for Beginners by Kelly Link

Cultural context

This section of the Tech Tree is intended to convey both useful practical information from our culture, and a depiction of what it was like at the time the archive was written. It will consist of encoded data, rather than imaged pages, largely because its centerpiece, a snapshot of Wikipedia, is far too large for the latter format.

Wikipedia, while not without its flaws and omissions, is the most readily available proxy for "a written summary of our world." Note that this section is by no means intended as a complete depiction of humanity today: as our advisors stressed, this era is likely to be the best documented in all of human history, and such information is very unlikely to be difficult to find. Rather, it is intended as a convenience to indicate to the archive's inheritors the specific, particular context of the era in which the archive was written.

This section will also include several other data sources recommended by GitHub's community:

  • Wiktionary
  • Wikispecies
  • The File Formats Archive

The GitHub Arctic Code Vault

As the Tech Tree is a companion piece to the GitHub Arctic Code Vault, it will contain an index with the name, brief description, and film reel number for all of the GitHub repositories stored in the Arctic Code Vault, i.e. every active public GitHub repo as of 02/02/2020.

This index will also highlight the 15,000 GitHub repositories which are the most-starred or most-depended-on at the time the archive was written. (These are also the repositories which will be stored in the two-reel "greatest hits" subsets of the archive, to be kept with partners such as Oxford's Bodleian Library.)

It is worth noting that every individual reel of the Arctic Code Vault also has its own index itemizing its contents, along with all of the instructions and information required to decode the information stored in that reel. This master index will be a superset of all of those indexes, to serve as a backup and a convenience for the archive's inheritors.
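
The relationship between the per-reel indexes and the master index can be illustrated with a minimal sketch. This is not the archive's actual format or tooling: the `IndexEntry` type, the `build_master_index` function, and the sample repositories are all hypothetical, and the sketch assumes each repository is stored on exactly one reel. Only the three fields (name, brief description, reel number) come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One repository record with the three fields the text describes."""
    name: str
    description: str
    reel: int


def build_master_index(reel_indexes):
    """Merge per-reel indexes into one master index keyed by repository name.

    `reel_indexes` maps a reel number to the list of entries on that reel.
    """
    master = {}
    for reel, entries in sorted(reel_indexes.items()):
        for entry in entries:
            # Assumption (not from the source): a repository lives on one reel.
            if entry.name in master:
                raise ValueError(f"duplicate repository: {entry.name}")
            master[entry.name] = entry
    return master


# Hypothetical sample data, not taken from the real archive.
reel_indexes = {
    1: [IndexEntry("example/alpha", "A sample library", 1)],
    2: [
        IndexEntry("example/beta", "A sample tool", 2),
        IndexEntry("example/gamma", "A sample app", 2),
    ],
}

master = build_master_index(reel_indexes)

# The master index contains every entry of every per-reel index.
for entries in reel_indexes.values():
    assert all(master[e.name] == e for e in entries)

print(master["example/beta"].reel)  # prints 2
```

Keying the master index by repository name makes the backup role concrete: a reader holding only the master index can find which reel to load for any repository, while each reel's own index remains sufficient on its own.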