2022/11/23: just, a build system

The project I've been working on over the past few years has recently been open-sourced. It is a new build system called just.

One new technical idea is to use intensional equality, i.e., a target is completely determined by its definition (which includes the rule as well as the definitions of all direct and implicit dependencies), irrespective of where it is defined. One consequence of this approach is that each target has to define for itself where in the logical space it places its artifacts; since the place of the definition must not matter, the target cannot inherit a location from there. Here we differ from most build systems, which usually (probably for backwards compatibility with make) insist that each artifact be assigned a place in the file system. In fact, we treat these paths purely logically and do not require the output paths of different targets to be disjoint; a target consuming dependencies with conflicting output paths can still decide what to do with them, e.g., move each dependency to a subdirectory, restage individual files, or error out. An example of simple restaging is given in the initial section of the tutorial.


{ "greeter":
  { "type": "generic"
  , "cmds": ["echo -n 'Hello ' > out.txt", "cat name.txt >> out.txt"]
  , "outs": ["out.txt"]
  , "deps": ["name.txt"]
  }
, "upper":
  { "type": "generic"
  , "cmds": ["cat name.txt | tr a-z A-Z > out.txt"]
  , "outs": ["out.txt"]
  , "deps": ["name.txt"]
  }
, "both":
  {"type": "install", "files": {"hello.txt": "greeter", "upper.txt": "upper"}}
}
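To make the conflict handling concrete, here is a minimal Python sketch that models a target's outputs as a map from logical path to artifact content (a simplification; real artifacts are content-addressed blobs and trees). `union_stage` and `install` are hypothetical helpers: the first mimics naively overlaying the outputs of `greeter` and `upper`, the second mimics the restaging done by the `both` target above. The file content "World\n" for name.txt is likewise an assumption.

```python
def union_stage(*stages: dict) -> dict:
    """Overlay several logical stages; conflicting paths are an error."""
    result = {}
    for stage in stages:
        for path, artifact in stage.items():
            if path in result and result[path] != artifact:
                raise ValueError(f"conflicting artifacts at {path!r}")
            result[path] = artifact
    return result


def install(files: dict) -> dict:
    """Restage the single artifact of each dependency under a new path."""
    result = {}
    for dest, stage in files.items():
        if len(stage) != 1:
            raise ValueError(f"{dest!r}: expected exactly one artifact")
        (artifact,) = stage.values()
        result[dest] = artifact
    return result


# Outputs of "greeter" and "upper", assuming name.txt contains "World\n".
greeter = {"out.txt": "Hello World\n"}
upper = {"out.txt": "WORLD\n"}

# Naively overlaying both outputs fails: both claim out.txt.
try:
    union_stage(greeter, upper)
except ValueError as err:
    print(err)  # conflicting artifacts at 'out.txt'

# Restaging, as the "both" target does, resolves the conflict.
both = install({"hello.txt": greeter, "upper.txt": upper})
print(both)  # {'hello.txt': 'Hello World\n', 'upper.txt': 'WORLD\n'}
```

The point of the sketch is that overlapping output paths are only an error if the consumer chooses to treat them as one; `install` sidesteps the conflict by picking the destination names itself.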
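Intensional equality can be illustrated with a minimal Python sketch, assuming a target's identity is a hash over a canonical serialization of its definition together with the identities of its dependencies. The function `target_id`, the dictionary layout, and the `"file"` pseudo-rule are hypothetical; this is not just's actual data model or hashing scheme. It reuses the `upper` target from the example above.

```python
import hashlib
import json


def target_id(definition: dict, deps: dict) -> str:
    """Identity of a target, computed only from what defines it.

    `definition` describes the rule and its arguments; `deps` maps each
    dependency's logical input path to that dependency's own identity.
    Where the target is declared (directory, module, repository name)
    is deliberately not a parameter.
    """
    payload = {"definition": definition, "deps": deps}
    # Canonical JSON (sorted keys, fixed separators), so equal
    # definitions always serialize, and hence hash, identically.
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


# The "upper" target from the example above, minus its dependency.
UPPER = {
    "type": "generic",
    "cmds": ["cat name.txt | tr a-z A-Z > out.txt"],
    "outs": ["out.txt"],
}

name_file = target_id({"type": "file", "content": "World\n"}, {})

# The same definition written down in two different places...
upper_here = target_id(UPPER, {"name.txt": name_file})
upper_there = target_id(dict(UPPER), {"name.txt": name_file})
# ...denotes one and the same target.
assert upper_here == upper_there

# Changing anything in the definition, including a dependency,
# yields a different target.
other_name = target_id({"type": "file", "content": "Alice\n"}, {})
assert target_id(UPPER, {"name.txt": other_name}) != upper_here
```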

Obtaining that flexibility is not that hard. In fact, with remote execution we get it for free, as the protocol requires us to specify the location of every input of every action explicitly anyway. And for building locally, well, we can just as well create a fresh directory for every build action.
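For local builds, a fresh directory per action can be sketched in Python as follows, assuming inputs are given as raw bytes keyed by their logical path. `run_action` is a hypothetical helper, not just's local executor; it ignores sandboxing, environment control, and caching, all of which a real implementation needs.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_action(cmd: list, inputs: dict, outs: list) -> dict:
    """Run one action in a fresh, private directory.

    `inputs` maps logical paths (relative to the action root) to file
    contents as bytes; `outs` lists the logical paths to collect.
    """
    with tempfile.TemporaryDirectory() as root:
        # Stage every input exactly where the action wants it.
        for rel, content in inputs.items():
            path = Path(root, rel)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(content)
        subprocess.run(cmd, cwd=root, check=True)
        # Collect declared outputs before the directory is deleted.
        return {rel: Path(root, rel).read_bytes() for rel in outs}


# Like the "upper" target, but portable: the command is a Python one-liner.
UPPERCASE = (
    "import pathlib as p; "
    "p.Path('out.txt').write_bytes(p.Path('name.txt').read_bytes().upper())"
)
result = run_action(
    [sys.executable, "-c", UPPERCASE],
    inputs={"name.txt": b"World\n"},
    outs=["out.txt"],
)
print(result)  # {'out.txt': b'WORLD\n'}
```

Because every action gets its own root, two actions can both read a `name.txt` or write an `out.txt` without ever seeing each other's files.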

A minor advantage of this flexible staging is that the command lines of actions become a lot simpler, as inputs can be staged to, and outputs picked up from, whatever place is convenient. There is no need to interpolate paths. This can be seen, e.g., in the patch rule that is also used in a tutorial section.
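A hypothetical Python sketch of the contrast: `patch_action` describes an action as plain data with its inputs at fixed names, while `interpolated_patch_cmd` has to splice caller-chosen paths into the command. The input names `orig` and `fix.diff` are arbitrary choices, not those of just's patch rule, and the command assumes the `patch -o OUTFILE ORIGFILE PATCHFILE` form of GNU patch.

```python
def patch_action(source: bytes, patch: bytes) -> dict:
    """Describe a patch action whose inputs sit at fixed, convenient names.

    Since the inputs are always staged as `orig` and `fix.diff`, the
    command line is a constant: nothing has to be interpolated.
    """
    return {
        "inputs": {"orig": source, "fix.diff": patch},
        "cmd": ["patch", "-o", "patched", "orig", "fix.diff"],
        "outs": ["patched"],
    }


def interpolated_patch_cmd(src_path: str, patch_path: str, out_dir: str) -> list:
    """For contrast: a command that must splice in caller-chosen paths."""
    out_path = f"{out_dir}/{src_path.rsplit('/', 1)[-1]}"
    return ["patch", "-o", out_path, src_path, patch_path]


# Whatever file is being patched, the command line stays the same.
first = patch_action(b"int x;\n", b"...")
second = patch_action(b"int y;\n", b"...")
assert first["cmd"] == second["cmd"]

print(interpolated_patch_cmd("third_party/foo/foo.c", "patches/foo.diff", "out"))
# ['patch', '-o', 'out/foo.c', 'third_party/foo/foo.c', 'patches/foo.diff']
```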

A more important advantage, however, is that combining many repositories in a build becomes fully transparent: from a target we can't tell where it's coming from anyway; in particular, no paths like external/libfoo leak into action inputs. So consumers of a repository don't have to agree on what to call it (each one can use its own name to refer to it), and it is the same regardless of whether it is the "main" repository or a dependency (literally the same: it produces identical actions in both roles).
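Why local names don't matter can be sketched in Python, assuming each consumer carries a table binding its own repository names to a content identifier (for instance a git tree id; here just a placeholder string). `resolve`, `target_key`, the binding tables, and the use of `""` for the main repository are all hypothetical.

```python
import hashlib


def resolve(ref: tuple, bindings: dict) -> tuple:
    """Turn a consumer-local reference into a location-free one.

    `ref` is (local_repo_name, target_name); `bindings` maps this
    consumer's own names for repositories to content identifiers.
    """
    local_name, target = ref
    return (bindings[local_name], target)


def target_key(resolved: tuple) -> str:
    """Key of a resolved target; no consumer-chosen name enters it."""
    content_id, target = resolved
    return hashlib.sha256(f"{content_id}\0{target}".encode()).hexdigest()


LIBFOO = "content-id-of-libfoo"  # placeholder for, e.g., a git tree id

# Three consumers, three different names for the same repository;
# "" stands (hypothetically) for "the main repository itself".
app = {"libfoo": LIBFOO}
tool = {"foo": LIBFOO}
main = {"": LIBFOO}

keys = {
    target_key(resolve(("libfoo", "lib"), app)),
    target_key(resolve(("foo", "lib"), tool)),
    target_key(resolve(("", "lib"), main)),
}
assert len(keys) == 1  # one and the same target, however it is called
```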

As the physical location of the sources doesn't matter, we don't even need a checkout of the repositories we're building; specifying a git tree identifier (together with a, possibly bare, repository containing that tree) is good enough. In fact, it is even better, as we immediately get an identifier for the whole content of the repository. Using this, together with the fact that a repository always looks the same regardless of the caller, we can use the reachable part of the repository graph (modulo bisimulation, i.e., taking the minimal DFA), together with the target and its configuration, as the cache key for high-level caching. This way, we can even skip analyzing dependencies from repositories that haven't changed. Thus short analysis and build times can be achieved while still having a standard command-line tool (one that does not need a background daemon or similar tricks).
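The cache key described above can be sketched in Python under several assumptions, all hypothetical: a repository is a pair of a tree identifier (placeholder strings here) and a map from binding names to other repositories; bisimilarity is computed by naive partition refinement rather than an efficient algorithm such as Hopcroft's; and the key hashes a canonical breadth-first serialization of the reachable part of the quotient graph, together with the target name and configuration. This is not just's actual cache-key format.

```python
import hashlib
import json
from collections import deque


def _number(names, signature) -> dict:
    """Give each distinct signature a small integer, in first-seen order."""
    ids = {}
    return {n: ids.setdefault(signature(n), len(ids)) for n in names}


def bisimulation_classes(repos: dict) -> dict:
    """Coarsest partition of repositories into bisimilar classes.

    `repos` maps a name to (tree_id, bindings), `bindings` mapping
    binding names to repository names. Bisimilar repositories have the
    same tree and, under every binding name, bisimilar targets.
    """
    def initial(name):
        tree, bindings = repos[name]
        return (tree, tuple(sorted(bindings)))

    cls = _number(repos, initial)
    while True:
        def refine(name, cls=cls):
            _, bindings = repos[name]
            return (cls[name],
                    tuple((b, cls[t]) for b, t in sorted(bindings.items())))

        new = _number(repos, refine)
        # Refinement only splits classes; no split means we are stable.
        if len(set(new.values())) == len(set(cls.values())):
            return new
        cls = new


def cache_key(root: str, target: str, config: dict, repos: dict) -> str:
    """Hash the reachable quotient graph, the target, and its config."""
    cls = bisimulation_classes(repos)
    rep = {}
    for name in repos:
        rep.setdefault(cls[name], name)  # any member represents its class
    # Breadth-first walk from the root, bindings in sorted order; classes
    # are numbered by discovery, which makes the serialization canonical.
    number = {cls[root]: 0}
    queue = deque([cls[root]])
    states = []
    while queue:
        tree, bindings = repos[rep[queue.popleft()]]
        edges = []
        for b, t in sorted(bindings.items()):
            if cls[t] not in number:
                number[cls[t]] = len(number)
                queue.append(cls[t])
            edges.append([b, number[cls[t]]])
        states.append([tree, edges])
    blob = json.dumps({"repos": states, "target": target, "config": config},
                      sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()


repos_a = {
    "app": ("tree-app", {"foo": "libfoo", "bar": "libbar"}),
    "libfoo": ("tree-foo", {"base": "base"}),
    "libbar": ("tree-bar", {"base": "base-copy"}),
    "base": ("tree-base", {}),
    # Same tree, same bindings, different name: bisimilar to "base".
    "base-copy": ("tree-base", {}),
}
repos_b = {  # the same setup without the duplicate
    "app": ("tree-app", {"foo": "libfoo", "bar": "libbar"}),
    "libfoo": ("tree-foo", {"base": "base"}),
    "libbar": ("tree-bar", {"base": "base"}),
    "base": ("tree-base", {}),
}
config = {"OS": "linux"}
assert cache_key("app", "main", config, repos_a) == \
    cache_key("app", "main", config, repos_b)
```

Minimization is what makes the key insensitive to how the graph happens to be written down: a repository bound to itself and a pair of identical repositories bound to each other also collapse to the same key.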

Finally, if the name of a target does not matter, there isn't even a need for a "proper" name. A target can equally well be a pair of an abstract dependency graph and bindings for rule names, as is the case for our anonymous targets, which provide a clean solution for language-independent interface generation as provided, e.g., by protobuf.
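A minimal Python sketch of a name-free target, assuming an abstract node is described by its kind, its own data, and its dependency nodes, and the rule bindings map node kinds to concrete rules. `anonymous_target_id`, the field names, and rule names such as `cc_proto_library` are hypothetical, not just's actual rules or data layout.

```python
import hashlib
import json


def anonymous_target_id(node: dict, rules: dict) -> str:
    """Identify an anonymous target by (abstract node, rule bindings).

    `node` is a node of an abstract dependency graph: its kind, its own
    data, and (recursively) its dependency nodes. `rules` maps each node
    kind to the concrete rule to apply; the same mapping is used for the
    whole graph, so it covers the dependencies too. No name is involved.
    """
    blob = json.dumps({"node": node, "rules": rules}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()


# File names stand in for the actual .proto sources.
timestamp = {"kind": "proto", "srcs": ["timestamp.proto"], "deps": []}
event = {"kind": "proto", "srcs": ["event.proto"], "deps": [timestamp]}

cc_rules = {"proto": "cc_proto_library"}  # hypothetical rule names
py_rules = {"proto": "py_proto_library"}

# Two independent C++ consumers of event.proto ask for the same target...
first = anonymous_target_id(event, cc_rules)
second = anonymous_target_id(event, dict(cc_rules))
assert first == second
# ...while asking for Python bindings yields a different one.
assert anonymous_target_id(event, py_rules) != first
```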

Quite nice consequences, which we get simply by taking seriously the concept of intensional logic (known from introductory lectures on mathematical logic).


