AMBuild Tutorial

Writing project files with AMBuild is fairly easy. This tutorial will guide you through making simple AMBuild scripts to compile and package a C++ project.

Simple Project

To begin, let's say we have a sample project with the following files:

$ ls
main.cpp  helpers.cpp  README.txt

First, we need to generate a default AMBuild configure script. This is the script that will perform the "configure" step for your build. You can generate one with the following command:

$ ambuild --gen-configure
$ ls
configure.py  main.cpp  helpers.cpp  README.txt

The configure script simply invokes AMBuild. It can be modified (as we'll see later) to take extra command line options.

$ cat configure.py
# vim: set sts=2 ts=8 sw=2 tw=99 noet:
import sys
from ambuild2 import run

prep = run.PrepareBuild(sys.path[0])
prep.Configure()

Now, we're ready to actually make a build script for our project. The master build script must be a file called AMBuildScript, and it must be written in Python syntax. The full Python API on your system is available to AMBuild scripts, but the important aspect we'll deal with here is the AMBuild API.

The first step is to tell AMBuild to detect the first available C or C++ compiler. This is done with the following line:

builder.DetectCompilers()

With just this line in your build script, you can now try to configure a build. You should see something like:

$ mkdir build
$ cd build
$ python ../configure.py
Checking CC compiler (vendor test gcc)... ['cc', 'test.c', '-o', 'test']
found gcc version 4.7
Checking CXX compiler (vendor test gcc)... ['c++', '-fno-exceptions', '-fno-rtti', 'test.cpp', '-o', 'testp']
found gcc version 4.7
$

If you get an error, either you don't have a compatible C or C++ compiler installed, or AMBuild has a bug (please report it!).

Now, we're ready to complete our AMBuildScript:

program = builder.compiler.Program("hello")
program.sources = [
  'main.cpp',
  'helpers.cpp',
]
builder.Add(program)

The builder object is an instance of an AMBuild context; more about this is in the AMBuild API documentation. Every AMBuild script has access to a builder. The builder.compiler object has information about the C/C++ compiler for the configure session. The Program() method returns an object used to create C++ compilation tasks. In this case, we're asking to build an executable that will be named 'hello' (or hello.exe on Windows). You can also specify shared libraries with Library, and static libraries with StaticLibrary.

You can attach a list of source files to your Program via the sources attribute. Finally, use builder.Add to take your C++ configuration and construct the necessary dependency graph and build steps.
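
Shared and static libraries are declared the same way; only the factory method changes. As a quick sketch (the 'utils' target name here is hypothetical, not part of the sample project), a shared library could be declared like so:

library = builder.compiler.Library('utils')
library.sources = [
  'helpers.cpp',
]
builder.Add(library)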

Now, we can actually attempt to build. First, let's make sure AMBuild computed our graph and dependencies correctly:

$ python ../configure.py
$ ambuild --show-graph
 : mkdir "hello"
 - hello/hello
   - c++ main.o helpers.o -o hello
     - hello/main.o
       - [gcc] -> c++ -H -c /home/dvander/projects/ambuild/ambuild2/main.cpp -o main.o
         - /home/dvander/projects/ambuild/ambuild2/main.cpp
     - hello/helpers.o
       - [gcc] -> c++ -H -c /home/dvander/projects/ambuild/ambuild2/helpers.cpp -o helpers.o
         - /home/dvander/projects/ambuild/ambuild2/helpers.cpp
$ ambuild --show-steps
mkdir -p hello
task 0: [gcc] -> c++ -H -c /home/dvander/projects/ambuild/ambuild2/main.cpp -o main.o
  -> hello/main.o
task 1: [gcc] -> c++ -H -c /home/dvander/projects/ambuild/ambuild2/helpers.cpp -o helpers.o
  -> hello/helpers.o
task 2: c++ main.o helpers.o -o hello
  -> hello/hello

It looks good! Now we can build:

$ ambuild
mkdir -p hello
Spawned task master (pid: 15563)
Spawned worker (pid: 15564)
Spawned worker (pid: 15565)
[15564] c++ -H -c /home/dvander/projects/ambuild/ambuild2/helpers.cpp -o helpers.o
[15565] c++ -H -c /home/dvander/projects/ambuild/ambuild2/main.cpp -o main.o
[15565] c++ main.o helpers.o -o hello
[15565] Child process terminating normally.
[15564] Child process terminating normally.
[15563] Child process terminating normally.
Build succeeded.
$ ./hello/hello
Hello!

Note that AMBuild gives each C++ binary its own folder. For example, if you build a static library called egg.a, a shared library called egg.so, and an executable called egg all in the same folder, AMBuild will actually perform each of these builds in separate folders, and the binary paths will look like:

  • egg.a/egg.a
  • egg.so/egg.so
  • egg/egg

This is to allow complex build scenarios where the same files are rebuilt multiple times.
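
As a hypothetical sketch of that scenario, the three 'egg' targets above could be declared side by side in one script, and each builder.Add() call would get its own output folder as listed:

static = builder.compiler.StaticLibrary('egg')
static.sources = ['egg.cpp']
builder.Add(static)

shared = builder.compiler.Library('egg')
shared.sources = ['egg.cpp']
builder.Add(shared)

program = builder.compiler.Program('egg')
program.sources = ['egg.cpp']
builder.Add(program)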

Packaging

Now that our project builds, let's add to our build script so that we can create a build package. We'd like to make a folder we can zip or tar for distribution, with the following files:

  • README.txt, our readme
  • hello, our final binary

First, we have to add a step to the build to create the distribution folder:

dist_folder = builder.AddFolder('dist')

The return value from AddFolder is a dependency graph node that we can use as an input to future steps.

Now we can copy our files:

import os

outputs = builder.Add(program)

dist_folder = builder.AddFolder('dist')
builder.AddCopy(os.path.join(builder.sourcePath, 'README.txt'), dist_folder)
builder.AddCopy(outputs.binary, dist_folder)

README.txt can be copied directly from the source tree. To copy the executable, we use the return value of builder.Add(). We could construct its path ourselves, but having the dependency object already available is much more convenient.

Now, when we build, we see:

[5952] cp "/home/dvander/projects/ambuild/ambuild2/README.txt" "./dist/README.txt"
Spawned worker (pid: 5953)
[5952] c++ -H -c /home/dvander/projects/ambuild/ambuild2/helpers.cpp -o helpers.o
[5954] c++ -H -c /home/dvander/projects/ambuild/ambuild2/main.cpp -o main.o
[5954] c++ main.o helpers.o -o hello
[5954] cp "hello/hello" "./dist/hello"

Since copying README.txt has no dependencies, it can execute in parallel with other jobs, even before compilation has finished. It won't be copied again unless README.txt changes. However, the copy of hello/hello has to occur last. We can see that it succeeded with:

$ ls -l dist/
total 12
-rwxr-xr-x 1 dvander dvander 7036 Oct 16 22:32 hello
-rw-r--r-- 1 dvander dvander   23 Oct 16 22:32 README.txt

It is also possible to add a step that executes a command like "tar" or "zip", but there's a complication: the packaging command must have a dependency on every file it would include, otherwise the steps might run out of order. We are still looking into easier ways to automate this.

Multiple Scripts

Non-trivial projects usually need more than one build script. AMBuild allows build scripts to nest; any script can run another script. Each script gets its own builder, known internally as a context. All jobs are created within a context and associated with that context. This is intended to let AMBuild reparse a minimal number of build scripts when a build script changes (though, as described under Reconfiguring below, AMBuild currently performs full reparses).

Contexts, by default, are associated with the folder they exist in relative to the source tree. For example, a build script in /source-tree/src/game/AMBuild will have a context associated with src/game. This folder structure is mirrored in the build folder, and all jobs occur within the context's local folder. For example, let's move our packaging into a separate script, PackageScript:

# PackageScript
import os
 
builder.SetBuildFolder('dist')
builder.AddCopy(os.path.join(builder.sourcePath, 'README.txt'), '.')
builder.AddCopy(Hello.binary, '.')

Then we modify our main AMBuildScript:

# AMBuildScript
builder.DetectCompilers()
 
program = builder.compiler.Program("hello")
program.sources = [
  'main.cpp',
  'helpers.cpp',
]
outputs = builder.Add(program)
 
builder.RunBuildScripts(
  ['PackageScript'],
  { 'Hello': outputs }
)

The first parameter is an array of script paths to run, and the second is a dictionary of global variables to give each script. Note that since our PackageScript is in the root of the source tree, its build folder would default to '.', so we override it with SetBuildFolder('dist').

When PackageScript is parsed during the configure step, all of its jobs will automatically be configured to occur inside a dist folder within the build folder, so '.' actually refers to ./dist/.

Custom Options

It is possible to add custom options to the configure step using Python's optparse module. Recall the default configure.py that AMBuild generates:

# vim: set sts=2 ts=8 sw=2 tw=99 noet:
import sys
from ambuild2 import run
 
prep = run.PrepareBuild(sys.path[0])
prep.Configure()

The prep object has an options attribute, which is an instance of optparse.OptionParser. You can add to it, for example:

# vim: set sts=2 ts=8 sw=2 tw=99 noet:
import sys
from ambuild2 import run
 
prep = run.PrepareBuild(sys.path[0])
prep.options.add_option('--enable-debug', action='store_true', dest='debug', default=False,
                        help='Enable debugging symbols')
prep.options.add_option('--enable-optimize', action='store_true', dest='opt', default=False,
                        help='Enable optimization')
prep.Configure()

These options can be accessed from any builder object, like so:

if builder.options.debug:
  builder.compiler.cflags += ['-O0', '-ggdb3']
  builder.compiler.cdefines += ['DEBUG']
if builder.options.opt:
  builder.compiler.cflags += ['-O3']
  builder.compiler.cdefines += ['NDEBUG']
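
With these options wired into configure.py, a debug build can then be configured by passing the new flag when running the configure script:

$ python ../configure.py --enable-debug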

Task Groups

Sometimes it is useful to force build steps to occur in distinct phases. Normally, this would be the antithesis of what we want: the dependency graph should precisely and perfectly represent dependencies, and there should be no need to enforce order manually. That's true, but there are situations that warrant relaxing how we construct the graph.

Generated headers in particular pose a problem. If we created dependencies on a "generate headers" task, then generating new headers would trigger recompiling every source file, even ones that never included those headers. Furthermore, if we created dependencies on each individual generated header, we'd have a huge dependency graph: 50 includes and 800 source files would mean 40,000 dependency links. AMBuild solves these problems in the same way tup does.

First, we introduce the concept of a weak dependency. A weak dependency is one that theoretically exists, and must exist for ordering, but does not propagate damage. For example, let's say that hello.cpp has a weak dependency on generated.h. If hello.cpp doesn't #include "generated.h", then no changes to generated.h should ever trigger a rebuild of hello.cpp. However, if hello.cpp is changed to include generated.h, then the weak dependency ensures those jobs are executed in the right order. (It is illegal in AMBuild to use a generated file without an explicit dependency on it.) The weak dependency can then be upgraded to a strong dependency, and possibly downgraded again later if the #include is removed.

Second, we introduce the concept of groups. A group associates a set of jobs together so they can function as one logical dependency. In the example earlier, our 50 include files would be represented by one meta-job, and the number of dependency edges would be reduced to the number of source files. Groups do not imply that all tasks in the group run at the same time, to the exclusion of all other tasks. They are simply an optimization in building weak dependencies, which enforce ordering along those dependencies.

Task groups are constructed via the AddGroup function:

import os

header_files = ['sourcemod_auto_version.h']
 
header_outputs = builder.AddCommand(
  argv = ['python', os.path.join(builder.buildPath, 'tools', 'buildbot', 'generate_headers.py')],
  outputs = header_files
)
 
headers = builder.AddGroup("headers")
for header_output in header_outputs:
  headers.Add(header_output)

In another build script, assuming the headers object has been passed through (for example, via RunBuildScripts), we could add:

library = builder.compiler.Library('cstrike')
library.sources = ['cstrike.cpp', 'smsdk_ext.cpp']
library.weak_deps += [headers]
builder.Add(library)

And now when our generated headers change, we are guaranteed that if our library's sources need to be recompiled, they will be compiled after the headers are generated.

Reconfiguring

Reconfiguring can happen for two reasons. One is that you want to change some properties of the build, for example, configuring a debug build over an existing optimized build. The other is that a build script has changed; in that case, AMBuild automatically reconfigures the build using the previous configure options.
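
For example, to switch an existing build over to a debug configuration (reusing the --enable-debug option defined earlier), simply re-run the configure script from the build folder:

$ cd build
$ python ../configure.py --enable-debug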

When a reconfigure occurs, AMBuild produces a new dependency graph alongside the old one, and the two graphs are then merged. Any generated files in the old graph that are not present in the new graph are removed from the file system. This is necessary to ensure that builds do not become inconsistent or corrupt. It is worth noting that many build systems do not remove stale files like this, and developers sometimes rely on stale files for rolling back or testing something. AMBuild, by design, aggressively removes stale build files.

Unlike the normal dependency graph, AMBuild scripts can depend on each other. For example, a nested script may propagate a graph object back to the root script, which then propagates it down again. This creates an implicit cycle: if either script changes, the entire cycle must be reparsed. Avoiding this is extremely difficult as it is easy to construct situations in which a minimal reparse algorithm would fail. Thus, at the moment, AMBuild performs full reparses instead.

If this becomes a performance bottleneck, which it might in a huge project with hundreds of build scripts, we would like to implement minimal reparsing. It would likely require transitioning to a more restricted API that disallows (or discourages) arbitrary dataflow between build scripts. To solve the problem mentioned earlier, we would need specific functions that allow AMBuild to track script inter-dependencies.

Even with full reparsing, however, AMBuild will only remove or rebuild the individual jobs that have changed. If you add a single .cpp file to a source list, a full reparse will only result in the new file being compiled and the existing link step (and any dependent steps) being re-run.