2023/10/25: Generating git trees

In the build system I'm currently working on, the generic way of specifying a repository to fetch is the "git tree" repository. It is basically specified by "I want this git tree, and in case you don't know what it looks like, here is a command that, when executed, produces it somewhere below the working directory". Provided the necessary credentials are available, this works for essentially any version control system.
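
The contract just described can be sketched roughly as follows. How the build tool actually implements it is not covered here; `ensure_tree`, `known_trees`, and `run_and_import` are names made up for illustration, and a plain set of strings stands in for a real git object store.

```python
def ensure_tree(tree_id, cmd, *, known_trees, run_and_import):
    """Make tree_id available, running cmd only if the tree is unknown.

    known_trees:    set of tree ids already present locally (a stand-in
                    for the git object store; an assumption of this sketch)
    run_and_import: callable that runs cmd in a fresh working directory and
                    returns the ids of all trees found below that directory
    """
    if tree_id in known_trees:
        return "already known"
    produced = run_and_import(cmd)
    # The requested tree may be anywhere below the working directory,
    # but it has to be there; otherwise the description is wrong.
    if tree_id not in produced:
        raise RuntimeError(f"{cmd!r} did not produce tree {tree_id}")
    known_trees.update(produced)
    return "fetched"


# Stub that records its calls instead of talking to a real server.
calls = []

def fake_run(cmd):
    calls.append(cmd)
    return {"aaa", "bbb"}

store = set()
first = ensure_tree("bbb", ["some", "command"],
                    known_trees=store, run_and_import=fake_run)
second = ensure_tree("bbb", ["some", "command"],
                     known_trees=store, run_and_import=fake_run)
print(first, second, len(calls))
```

The point of the sketch is that the command is only a fallback: once the tree is known, it is never run again, which is why the tree identifier is the part that really pins the dependency.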

I noticed that people seem to wonder how one comes up with and maintains such a repository description. The answer is simple: the file is generated, as it basically acts as a lock file anyway. Now, if we generate the repos.json anyway, it is enough to maintain a description of the trees to check out.

Such a description might, e.g., look as follows.
{ "foo":
  { "cmd":
    [ "cvs"
    , "-d"
    , ":ext:me@cvs.example.org:/var/cvs/fooproject"
    , "export"
    , "-D"
    , "Oct 23, 2023 08:00 UTC"
    , "foo"
    ]
  , "subdir": "foo"
  , "env": {"CVS_RSH": "ssh"}
  }
, "bar":
  { "cmd":
    [ "cvs"
    , "-d"
    , ":ext:cvs@cvs.example.org:/var/cvs/barproject"
    , "export"
    , "-r"
    , "RELEASE_1_0"
    , "bar-stable"
    ]
  , "subdir": "bar-stable"
  , "env": {"CVS_RSH": "ssh"}
  }
}
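
Since this file is maintained by hand, a quick sanity check before running any fetch command can save some time. The following is a minimal sketch, assuming only the three keys used above ("cmd", "subdir", "env"); `check_description` is a hypothetical helper, and the example entries are abbreviated and made up.

```python
import json

def check_description(tasks):
    """Return a list of problems found; empty if the file looks sane."""
    problems = []
    for name, desc in tasks.items():
        cmd = desc.get("cmd")
        if (not isinstance(cmd, list) or not cmd
                or not all(isinstance(arg, str) for arg in cmd)):
            problems.append(f"{name}: 'cmd' must be a non-empty list of strings")
        subdir = desc.get("subdir")
        if subdir is not None and not isinstance(subdir, str):
            problems.append(f"{name}: 'subdir' must be a string")
        env = desc.get("env", {})
        if (not isinstance(env, dict)
                or not all(isinstance(v, str) for v in env.values())):
            problems.append(f"{name}: 'env' must map names to strings")
    return problems

# Abbreviated, made-up entries; "broken" gives the command as a single string.
example = json.loads("""
{ "foo":
  { "cmd": ["cvs", "export", "-D", "Oct 23, 2023 08:00 UTC", "foo"]
  , "subdir": "foo"
  , "env": {"CVS_RSH": "ssh"}
  }
, "broken": {"cmd": "cvs export foo"}
}
""")

for problem in check_description(example):
    print(problem)
```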

Of course, the tags and dates have to be updated manually, but it is still pretty maintainable. The rest is then done by a simple Python script.
#!/usr/bin/env python3

import json
import os
import subprocess
import sys
import tempfile

def log(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)

def fail(s, exit_code=1):
    log(f"Error: {s}")
    sys.exit(exit_code)

def run_cmd(cmd,
            *,
            env=None,
            stdout=subprocess.DEVNULL,
            stdin=None,
            cwd):
    # Run cmd in cwd and abort the whole script if it fails.
    result = subprocess.run(cmd,
                            cwd=cwd,
                            env=env,
                            stdout=stdout,
                            stdin=stdin)
    if result.returncode != 0:
        fail("Command %s in %s failed" % (cmd, cwd))
    return result.stdout

def get_root(desc):
    cmd = desc["cmd"]
    # Run the fetch command with the variables from "env" added to our
    # own environment, so that, e.g., CVS_RSH is honored here as well.
    env = dict(os.environ, **desc.get("env", {}))
    with tempfile.TemporaryDirectory() as d:
        run_cmd(cmd, cwd=d, env=env)
        # Commit everything the command produced to learn its git tree.
        run_cmd(["git", "init"], cwd=d)
        run_cmd(["git", "add", "."], cwd=d)
        run_cmd(["git", "commit", "-m", "%r" % (cmd,)], cwd=d)
        tree = run_cmd(["git", "log", "-n", "1", "--format=%T"],
                       cwd=d, stdout=subprocess.PIPE).decode('utf-8').strip()
        subdir = desc.get("subdir")
        if subdir not in [None, "", "."]:
            # Only the tree of the given subdirectory is of interest;
            # rev-parse fails (and we abort) if that path does not exist.
            tree = run_cmd(["git", "rev-parse", "%s:%s" % (tree, subdir)],
                           cwd=d, stdout=subprocess.PIPE).decode('utf-8').strip()
    root = {"type": "git tree",
            "id": tree,
            "cmd": cmd}
    if "env" in desc:
        root["env"] = desc["env"]
    return {"repository": root}

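def main():
    if len(sys.argv) != 2:
        fail("usage: %s <description file>" % (sys.argv[0],))
    # The configuration generated so far arrives on stdin ...
    config = json.load(sys.stdin)
    with open(sys.argv[1]) as f:
        tasks = json.load(f)
    # ... and gets one "git tree" repository added per description entry.
    repos = config.setdefault("repositories", {})
    for name, desc in tasks.items():
        repos[name] = get_root(desc)
    print(json.dumps(config))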


if __name__ == "__main__":
    main()

That script is simply chained as
... |  "${ROOT}/bin/import-actions.py" "${ROOT}/etc/actions.template" | ...

in the script describing how to get the direct dependencies.
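
To make the effect of that pipeline stage concrete, here is a small sketch of how one generated entry ends up in the configuration passed along the pipe. It mirrors the shape built by the script above, but `make_entry` is a hypothetical helper, the tree identifier is a placeholder rather than a real hash, and the pre-existing "other" entry is made up.

```python
import json

def make_entry(desc, tree_id):
    """Build the repository entry for one description, given its tree id."""
    root = {"type": "git tree", "id": tree_id, "cmd": desc["cmd"]}
    if "env" in desc:
        root["env"] = desc["env"]
    return {"repository": root}

# Configuration as it might arrive on stdin from the previous stage (made up).
config = {"repositories": {"other": {}}}

desc = {
    "cmd": ["cvs", "-d", ":ext:me@cvs.example.org:/var/cvs/fooproject",
            "export", "-D", "Oct 23, 2023 08:00 UTC", "foo"],
    "subdir": "foo",
    "env": {"CVS_RSH": "ssh"},
}

# Placeholder only; the real script computes this id with git.
placeholder_tree = "0" * 40

config["repositories"]["foo"] = make_entry(desc, placeholder_tree)
print(json.dumps(config, indent=2))
```

Note that "subdir" does not show up in the output: it only determines which tree identifier is recorded, while entries already present in the configuration are passed through unchanged.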


