# WORKING DRAFT - Another Build System

You are right to be skeptical. Why do we need yet another competing solution for building our code that could fragment our community even further? I have spent a majority of my career as a software engineer trying to convince others not to reinvent what can instead be borrowed and extended from others. I think that is why it has taken me so long to put down in words my reasoning for creating this project. However, I truly believe that we are at a unique point in the lifetime of C++ where we can finally create a build system that resolves all of the major issues holding the language back from being a best-in-class collaborative experience for everyone, from the programmer writing their very first "Hello World!" to the most weathered of coders.

With C++ 20 coming out this year we will finally get our hands on the long awaited (and controversial) Modules support. It is this feature that will allow C++ builds to finally have a clean binary separation between individual projects, which will open the door to fixing many of the problems present in building and sharing C++ code today. At the same time, migrating our code to support Modules will require a substantial amount of work, which means this is the ideal time to consider a major shift in what tooling we use as a community. In this post I give a general overview of the key issues present in building and sharing our code today and then present a new build system that leverages Modules at its core to create a new way to collaborate within the open source community.
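
To make the Modules discussion concrete, here is a minimal sketch of a module interface and a consumer. The module name and file layout are illustrative only; real file extensions and build flags vary by toolchain.

```cpp
// math.ixx - a module interface unit (file naming varies by compiler)
export module math;

// Only exported declarations are visible to importers; macros and
// implementation details do not leak across the module boundary.
export int add(int a, int b)
{
    return a + b;
}
```

```cpp
// main.cpp - a consumer imports the module instead of including a header
import math;

int main()
{
    return add(40, 2) == 42 ? 0 : 1;
}
```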
## Sharing Code

Beyond the normal complexity present in building any programming language, C++ has extra aspects that make it especially hard to build and even harder to share those builds with others. There are three primary issues that make C++ a hard language to share builds for: it has a single specification with multiple compiler implementations, it is a compiled language, and it inherited the legacy of the C preprocessor.
### Specification

Unlike many other languages out there today that have both a language specification and a single first-party implementation, the C++ language is only the specification and has no first-party compiler. This means that we get to have multiple compilers from different vendors, which allows for targeting many different architectures. It also means that if I want to share my code with the C++ community as a whole, I have to take care around platform-specific logic and have a unique setup for each compiler to ensure the build works correctly. This is not too difficult a problem to handle with a good build system, but it does require some integration work to support new compiler vendors. This is also perhaps where C++ has made the largest improvements, with the continued evolution of the standard library as an abstraction over common platform-specific functionality.
### Compiled

The overhead of having many different compiler implementations is compounded by the fact that C++ is compiled directly to the assembly for the target machine that will execute the code. C++ puts no constraints on how a compiler does this mapping, which means that the [Application Binary Interface (ABI)](https://en.wikipedia.org/wiki/Application_binary_interface) of two compilers (and sometimes of two versions of the same compiler) is not compatible. Because of this we have to ensure that all of our objects were generated using the same compiler, or take special care to work around these incompatibilities using strict design practices. There have been multiple approaches in the past to combat binary compatibility issues when sharing C++ code.
Perhaps the oldest way to share native code is to pre-build the binaries and distribute a single dynamic library along with a set of public header files. One way to get around the binary compatibility issue is to expose only a C-style public binary layer, taking advantage of the fact that C **does** have a standardized binary layer. This requires that all C++ implementation code be wrapped in a public C layer, and if a client wishes to use modern C++ practices, the C layer can then be wrapped in yet another C++ layer that is compiled within the consumer project itself. A second pattern that allows for the distribution of pre-built native binaries is to expose a single C-style entry point and, from then on, use only interfaces when communicating across the boundary (Note: beware of exceptions or standard library objects passing over the boundary!). While it is not technically a requirement that C++ interface definitions have a standard ABI, Microsoft has effectively standardized this approach through the sheer number of projects that utilize it via [COM](https://en.wikipedia.org/wiki/Component_Object_Model). Both of these approaches will produce fully compatible binaries that can be distributed to others; however, the overhead of either approach is often not worth the effort unless your shared component is very large.
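
As a rough sketch of the first pattern (all names here are hypothetical, not taken from any real library): the public surface exposes only C types behind an opaque handle, and the consumer can wrap it back up in a thin C++ class compiled in their own project.

```cpp
// logger.h - the public C-style boundary shipped with the pre-built binary.
// Only C types cross the boundary, so the consumer's compiler does not have
// to match the one that produced the dynamic library.
#ifdef __cplusplus
extern "C" {
#endif

typedef struct LoggerHandle LoggerHandle;   // opaque handle, layout stays hidden

LoggerHandle* logger_create(void);
void logger_write(LoggerHandle* logger, const char* message);
void logger_destroy(LoggerHandle* logger);

#ifdef __cplusplus
}
#endif
```

```cpp
// logger_wrapper.h - an optional modern C++ wrapper compiled by the consumer.
#include <string>
#include "logger.h"

class Logger
{
public:
    Logger() : m_handle(logger_create()) {}
    ~Logger() { logger_destroy(m_handle); }
    Logger(const Logger&) = delete;
    Logger& operator=(const Logger&) = delete;

    void Write(const std::string& message)
    {
        logger_write(m_handle, message.c_str());
    }

private:
    LoggerHandle* m_handle;
};
```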
Another approach to binary compatibility issues is to have no binaries at all. Some communities that maintain smaller projects have taken to embedding both the interface and the implementation into header-only libraries. When including the headers in your project you are effectively building the library for them. Due to the constraint that all of the source must now live in the public headers, these headers can grow unwieldy and become unmanageable for large projects. These large headers can also have a negative impact on build performance, as they are re-parsed in every translation unit that consumes them.
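
A toy example of the pattern (the namespace and function are made up): everything lives in the header, so each consuming translation unit compiles it again, and `inline` is what keeps the duplicated definitions legal.

```cpp
// clamp.hpp - a header-only utility; consumers just #include it.
#pragma once

namespace tiny
{
    // `inline` allows the same definition to appear in every translation
    // unit that includes this header without violating the one definition rule.
    inline int clamp_positive(int value)
    {
        return value < 0 ? 0 : value;
    }
}
```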
A relatively new approach to consuming external dependencies is through package managers. A package manager either distributes the raw source along with the build definition required to integrate with your project, automatically injecting the child dependency into your build as long as the two build systems are compatible, or it downloads pre-built binaries that were carefully cataloged to match your compiler, architecture and configuration. This approach works well, but it does require that the package manager be able to generate the required build definitions for consumers or be directly integrated with a build system.
### Preprocessor

The preprocessor has, until now, been a point of failure that no build system could protect against when integrating external source. Until C++ 20, the only way to share a symbol was to place a declaration in a header file that would be included by both the implementation and all of the translation units that wish to use it. This can lead to unforeseen compatibility issues when the preprocessor comes into play. When a header file is included with a different set of preprocessor definitions between usage and implementation, bad things can occur. At best this will result in a compiler or linker error, and at worst you will have a fun [one definition rule](https://en.wikipedia.org/wiki/One_Definition_Rule) violation to track down! This is where Modules shine, and it is the primary driver behind why I believe we can finally make C++ the best open source, collaborative language!
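
A contrived sketch of that failure mode (the macro and struct names are invented for illustration): if a library and its consumer include the same header under different preprocessor definitions, the two translation units silently disagree about the layout of the same type.

```cpp
// widget.h - shared between the library and its consumers
struct Widget
{
#ifdef ENABLE_STATS
    long long stat_counter;   // only present when ENABLE_STATS is defined
#endif
    int id;
};

// library.cpp is compiled with -DENABLE_STATS while consumer.cpp is compiled
// without it. Both translation units now contain differing definitions of
// `Widget`, which violates the one definition rule: the linker will happily
// accept it, and the consumer will read `id` from the wrong offset at runtime.
```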
Another major issue with sharing code between different projects is incompatible language standards. In general it is straightforward to pull source that targets an earlier version of the language into a project with a newer version (unless the old code uses a removed standard library feature). Header-only libraries can use preprocessor conditionals for different language versions, and a C layer can help alleviate this issue. However, this may be another instance where C++ Modules can utilize the binary interface layer to let libraries maintain a compatible surface between modules while still allowing different language versions to be used internally. (Epochs anyone?!)
## Proposal

It is not enough to say that Modules will solve all of our problems. We will also have to define clear priorities for a collaboration-first build system. The remainder of this document outlines the core Requirements and Goals for the proposed build system and gives a brief overview of the core design.
### Requirements

The set of requirements cannot be compromised. They are not in priority order since they cannot conflict with each other; if the concepts prove incompatible, the final system would be deemed a failure.
1) Reproducible - Core to any build system is the requirement that builds be deterministic and reproducible. This requirement is highest on the list because no matter how well a system is designed and implemented, teams will not be able to utilize it unless they can trust that it will always produce the same result no matter who builds it and when.

2) Extensible - A build system should be able to support the requirements of all projects. It must have an extensibility framework that allows build architects to write their own custom build logic when the built-in functionality does not meet their needs.

3) Isolation - This is a unique requirement for C++, following from the above overview of sharing C++ code today. Isolated builds mean that one project cannot influence, or be influenced by, another build, intentionally or by accident, except through explicit structured channels.
### Goals

While the goals are not hard requirements, they are always kept front of mind when making any design or implementation decision. These items are in priority order.
1) Collaborative - Writing code is very rarely done in isolation. The largest goal for this build system is to work seamlessly within a team and with external dependencies.

2) Simple - When fulfilling the above requirements, the secondary priority is always simplicity and usability. This means that the standard user will get the best experience possible for both setup and usage. Some extra complexity is allowed in exchange for performance gains in the internal implementation and the extensibility framework.

3) Fast - The inner developer loop is very important to the productivity of an engineer. To this end, the build system should focus heavily on the performance of an incremental build and, to a lesser extent, ensure the full build is as fast as possible.

4) Customizable - How a project is built is often a very personal matter of preference (or legacy requirement). Where allowable, the build system should be customizable to allow for overriding default settings where doing so does not conflict with the ability to easily build single projects as a part of the greater ecosystem.
## Design

This build system, called Soup, will utilize a declarative Recipe file as an easy-to-understand definition for an individual Package. This file will be the main way to tell Soup about your project. The core command line application will be used to invoke the build and provide extra configuration parameters. Internally, Soup uses a Task execution engine to run Tasks in their correct order and exposes a registration mechanism that allows C++ "Extension" DLLs to run arbitrary code during the build. The Tasks are expected to generate a [Directed Acyclic Graph (DAG)](https://en.wikipedia.org/wiki/Directed_acyclic_graph) of build Operations that make up the actual build. These Operations will be executed to produce the final build result. The primary design consists of five key components: the Command Line Interface (CLI), the build definition, the build engine, the operation evaluation engine and the package repository.
### CLI

The [Command Line Interface](CLI.md) is the first thing a user sees when they interact with the build system. The CLI is primarily there to take user input through a set of parameters and flags and pass temporary configuration values into the build execution. While important, it is fairly straightforward to design and will be left open to evolve through use.
### Definition

The build definition, which we will call a Recipe, is how the user will configure their project through a declarative configuration file. The Recipe file will utilize the [toml](https://github.com/toml-lang/toml) language as a clean, human-readable configuration definition that supports a core set of data types. The file can be thought of as a simple property bag for passing shared parameters into the build system for an individual package. There are a few "known" property values that will be used within the build engine itself; however, the entire contents will be provided as initial input to the build engine.
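
As a rough illustration only, a Recipe might look something like the following. The `Dependencies` and `DevDependencies` keys are the known properties described in the Engine section below; every other key and value here is a hypothetical placeholder, not a final schema.

```toml
# Recipe.toml - a hypothetical package definition
Name = "MyLibrary"
Version = "1.2.3"

# Known properties consumed by the build engine itself; references may be
# local directories or name/version pairs resolved from the repository
Dependencies = [ "../SharedUtilities/", "JsonParser@2.1.0" ]
DevDependencies = [ "MyBuildExtension@0.1.0" ]

# Everything else is handed to the build Tasks as entries in the property bag
Defines = [ "TRACE_ENABLED" ]
```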
### Engine

The build Engine has two jobs: recursively building all transitive dependencies, and registering and executing the build Tasks that make up the core build functionality. All build functionality will be contained in a Task. A build Task will consist of a unique name, lists of other Tasks that must be run before or after it, and a single Execute entry point. These build Tasks will be registered through Dynamic Libraries that expose a single pre-defined C method. The build Tasks will then communicate with the build Engine itself through a strict interface layer that provides a compatible ABI, allowing the CLI executable that contains the build Engine implementation to work with the source-compiled development dependencies. This work can be broken down into five phases.
1. **Parse Recipe** - The Recipe toml file is read from disk and parsed into a property bag.
2. **Build Dependencies** - The Engine uses the known property lists "Dependencies" and "DevDependencies" to recursively build all transitive runtime and development dependencies, starting at phase one of the build. The Engine maintains a communication channel between parent and child project builds to allow shared parameters to be passed down and output objects to be passed back up.
3. **Build Extensions** - The Engine then invokes the predefined C method that is exported from all known Extension DLLs. The Engine discovers these DLLs from the Development Dependencies list, along with a single predefined Extension DLL that is distributed with the CLI executable and contains the Tasks that convert a standard Recipe definition into the required compile commands for the initial known set of Compiler implementations.
4. **Run Tasks** - The build Engine invokes all registered build Tasks in their requested order. The Tasks can influence each other by reading and writing properties to and from the active state (a shared property bag). A build Task should not actually perform any commands itself; instead it generates build Operations, which are self-contained executable definitions with input/output files (sketched below).
5. **Run Operations** - The final stage of the build is to execute the build Operations that were generated by the build Tasks. These commands contain the executable and parameters to invoke, as well as the input and output files that will be used to perform incremental builds. (Note: There is currently a very simple time-stamp based incremental build that relies on the compiler-generated include list. There is an open question of which project will be used to replace this temporary solution. The current best choices are either [BuildXL](https://github.com/microsoft/BuildXL) or possibly [Ninja](https://github.com/ninja-build/ninja).)
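
To make the Task and Operation concepts a little more concrete, here is a rough sketch of what the extension boundary could look like under the constraints above: only interfaces plus a single C entry point cross the DLL boundary. Every name in this sketch is illustrative rather than a committed API, and the run-before/run-after lists are omitted for brevity.

```cpp
#include <cstdint>

// A self-contained executable definition generated by a Task; the input and
// output files are what drive incremental builds.
struct IOperation
{
    virtual const char* Executable() const = 0;
    virtual const char* Arguments() const = 0;
    virtual uint64_t InputFileCount() const = 0;
    virtual const char* InputFile(uint64_t index) const = 0;
    virtual uint64_t OutputFileCount() const = 0;
    virtual const char* OutputFile(uint64_t index) const = 0;
};

// The active state: a shared property bag that Tasks read and write to
// influence each other, plus the sink for the Operations they generate.
struct IBuildState
{
    virtual const char* GetProperty(const char* name) const = 0;
    virtual void SetProperty(const char* name, const char* value) = 0;
    virtual void AddOperation(const IOperation& operation) = 0;
};

// A single unit of build functionality registered by an Extension DLL.
struct IBuildTask
{
    virtual const char* Name() const = 0;
    virtual void Execute(IBuildState& state) = 0;
};

// The Engine-side registration surface handed to each Extension DLL.
struct IBuildSystem
{
    virtual void RegisterTask(IBuildTask& task) = 0;
};

// The single pre-defined C method each Extension DLL exports; a C entry
// point plus pure interfaces keeps the boundary ABI-compatible between the
// CLI executable and the source-compiled development dependencies.
extern "C" int RegisterBuildExtension(IBuildSystem& buildSystem);
```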
### Repository

You may have noticed that nothing about the build explicitly knows about integrating with a public feed of packages. The key concept is that, because each individual project's build is isolated and self-contained, a dependency reference can easily be migrated from a direct directory reference for local projects to a name/version pair that will be resolved to a published snapshot of a public project. The CLI application can then consume a REST API from a service that allows users to install other projects and publish the code they would like to share with ease.
Check out some [Samples](Samples.md) to get a better idea of how all of this would work in practice!
## Summary

It would take a huge amount of time and effort to transition the entire C++ community to a new ecosystem of build tooling. However, C++ 20 presents a unique opportunity. Migrating to take advantage of Modules is a non-trivial breaking change. I believe that by transitioning at the same time to a build system that was designed explicitly for use in this new world we can finally get to a place where C++ is an amazing language for collaborating with others.