These are the slides of a talk originally given by Eddy to the Bergen Linux Users Group, on 2007/January/25th at 7pm. Its permanent URL (which I'm apt to update if I have further ideas to add to it) is – if you can persuade your user agent (a.k.a. browser) to display this as CSS media-type screen (which is currently what you're using), you'll see the relatively verbose text; in CSS media-type projection (which Opera uses when in full-screen mode – currently active) you'll see its appearance as a presentation.
I'd like to thank Eira for giving me the excuse to visit Bergen again: I spent most of a year here from late 1994 to late 1995. If any of you remember the short-lived Jugglers' cafe at Sigurds Gate 5 that summer, I was the one serving the food most of the time. However, I normally earn my living by programming computers; I started in my first such job just a few weeks over 25 years ago, at the start of January 1982; I made the transition from ForTran to C in 1988 and from VAX to Unix in about 1990.
Over that quarter century I've had occasion to struggle with a varied assortment of problems, some of which are doubtless familiar to those here today. Over the last decade I've had more than my fair share of experience with build systems, so my chosen topic is the care and feeding of make – one of the various things that I've become intimately involved with during my nearly five years in the Linux^W Unix team at Opera Software.
Improving make
11½ years ago, I started in a new job. Due
to illness, I missed most of my first week there, which constituted my overlap
with a colleague who'd spent the previous several months coping with one of
those ghastly projects where the objectives are coarsely stated, except for the
part about "and we want it by yesterday". They'd been using recursive make
and hadn't read Recursive Make Considered Harmful. Instead, they'd bodged and
bashed, in all the
industry-standard (but not best practice) ways to make it only be wrong
occasionally; and the result was taking unacceptably long to build their
product. So Kevan was asked to write a tool which would replace
make. Thankfully he did so in python so the result was in fact
maintainable by the new recruit who got to take it over when Kevan left (even if
I did need to learn python first). I am fairly sure that their
problems could have been solved better and in less time if they'd simply read
Recursive Make Considered Harmful and followed up on its advice.
Plenty of teams have responded to problems with make by
deciding to write something else to replace it. I would contend that nearly all
of them would have been better off installing GNU make on all their
build machines and taking the time to read its manual. Far too many developers
have learned a small number of simple tricks with make and suppose
that any problem they cannot solve with those tricks is a deficiency of
make when, if they would but read the manual, they'd find the
problem easy to solve. The attempts at improving make that
I've seen have, generally, not been as good or as powerful as GNU
make. Attempts at making it easier to configure make have tended
to solve one or two particular problems –
which could have been better solved by using make more competently
– while obstructing my access to features of make that I need
in order to solve problems they didn't think of.
To take one example, we had a make file at Opera which was
generated for us by a tool that was meant to make our lives easier. It only had
to look after a small bunch of source files used for an example program, but
wc reported 201 lines, 476 words and 4947 bytes. After cleaning up
by using make properly, I reduced it to 55 lines, 170 words and
1333 bytes. Running bzip2 on the original only compressed it to
1429 bytes. I dread to think how much waste that helper tool would have
subjected us to on make files for a large-scale compilation, rather than a noddy
example program.
I don't claim to be an expert on the alternatives and helpers available for
make (when I've met them I've usually given up in disgust –
and solved the problem using make – before getting familiar
enough with them to do them justice); nor do I deny that make
(even GNU make) has its deficiencies; and the software
development toolchain desperately needs something better so that we can make it
redundant. I am glad to see that there is ample activity in this area, for
example within the
Software Carpentry project. However, make (particularly GNU
make) is actually very good at its job if you take the time to
learn how to use it properly.
One of the central purposes of make is tracking dependencies
among resources so that it's possible to know when it's necessary (and
possible) to make them. For direct and obvious dependencies, make
makes this very easy. However, complex networks of dependencies arise in real
software projects; tracking these requires rather more care and effort. What
follows is a journey through some of the kinks and knots that can arise,
particularly in very large projects; ultimately, I intend to turn this into a
patch to the part of GNU make's manual that I quote below; however,
it seems worth-while to share with others before I've got it into the right form
for that, if only to give many eyeballs a chance to spot any mistakes.
One configures make by writing plain
text files (always better than some impenetrable magic format intelligible only
to one's IDE): these specify how to go about turning some source code (and other
materials) into a final deliverable product. The make utility has
become pretty complex over the three decades or so since it was invented –
I shan't be surprised if GNU make is Turing complete – but the
basic idea behind it hasn't changed much: each file you need to generate is
created by running a command; that command uses some pre-existing files; when
those files are newer than the file created by the command (or the latter
doesn't exist) you need to run the command. The file to be created is called a
target, the files it's made from are called prerequisites (which may be source
files or the targets of other rules); and the part of a
make file that says the former depends on the latter, and how to build it when
needed, is called a rule.
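As a minimal sketch of those terms, here is a single rule. The file names report.txt, data.csv and summarise.sh are made up for illustration and belong to no real project:

```make
# Target: report.txt. Prerequisites: data.csv and summarise.sh (hypothetical).
# make runs the command when report.txt is missing, or is older than either
# prerequisite. The command line must begin with a tab character.
report.txt: data.csv summarise.sh
	sh summarise.sh data.csv > report.txt
```

Running make twice in a row should do the work only once; touching data.csv should make the next run repeat it.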
To save a lot of repetition, make supports some pattern
rules, using a % to stand for an arbitrary text to appear both in the
target's name and in the name of a prerequisite; and the command in a rule can
refer to the names of target, $@, and prerequisites in various ways, so
that the command for a pattern rule can specify a general recipe for building
targets matching a given pattern from sources matching a related pattern,
without the make file needing to be encumbered with information about what exact
text the % matched in both cases. Thus, for example, a rule like
%.o: %.c
	$(COMPILE.c) -o $@ -c $<
specifies (subject to some configuration causing
$(COMPILE.c) to expand to something suitable) how to build an
object file from a C source file. The space before the command has to be an
actual tab character (not some equivalent number of spaces) for
make to recognise that line as a command; hopefully, it's styled
such that you can distinguish it from the ordinary spaces elsewhere on the
example – this page uses that style consistently for tab characters.
However, to take the above rule as example, the generated object file may
actually depend on a great deal more than the C source file that's referenced
in the command to build the object file. The C source file can, by way of
#include directives, pull in code from diverse other files. If any
of these changes, even when the C source file hasn't changed, you (potentially)
need to regenerate the object file. A make rule is actually
allowed to omit the command used for building the target and merely declare that
the target depends on some other file: so one approach to this problem is to
supplement the above pattern rule (which provides the command we'll need) with a
set of dependency declarations, like
this.o: that.h ../other/thing.h
Then, if either that.h or
../other/thing.h changes, make shall know that it needs
to run the command it gets from its %.o: %.c rule to regenerate
this.o; however, on a large project, it gets quite laborious to
keep track of all the things that each object file depends on – especially
as this changes and varies extensively as the project's various .h
files (which can themselves use #include directives) change which
of one another they pull in to each compilation. In practice, the only sensible
way to handle this is to generate this dependency information automatically.
Fortunately, with a little help from one's compiler, this is entirely
possible.
The GNU make manual (4.14: Generating Prerequisites Automatically) has this to say (inter alia):
The practice we recommend for automatic prerequisite generation is to have one makefile corresponding to each source file. For each source file `NAME.c' there is a makefile `NAME.d' which lists what files the object file `NAME.o' depends on. That way only the source files that have changed need to be rescanned to produce the new prerequisites.
Here is the pattern rule to generate a file of prerequisites (i.e., a makefile) called `NAME.d' from a C source file called `NAME.c':
%.d: %.c
	@set -e; rm -f $@; \
	$(CC) -M $(CPPFLAGS) $< > $@.$$$$; \
	sed 's,\($*\)\.o[ :]*,\1.o $@ : ,g' < $@.$$$$ > $@; \
	rm -f $@.$$$$
*Note Pattern Rules::, for information on defining pattern rules.
The `-e' flag to the shell causes it to exit immediately if the `$(CC)' command (or any other command) fails (exits with a nonzero status).
With the GNU C compiler, you may wish to use the `-MM' flag instead of `-M'. This omits prerequisites on system header files. *Note Options Controlling the Preprocessor: (gcc.info)Preprocessor Options, for details.
The purpose of the `sed' command is to translate (for example):
main.o : main.c defs.h
into:
main.o main.d : main.c defs.h
This makes each `.d' file depend on all the source and header files that the corresponding `.o' file depends on. `make' then knows it must regenerate the prerequisites whenever any of the source or header files changes.
Once you've defined the rule to remake the `.d' files, you then use the `include' directive to read them all in. *Note Include::. For example:
sources = foo.c bar.c
include $(sources:.c=.d)
That's all nice, but anyone who's read Recursive Make Considered Harmful and followed its advice wants to make files in a directory hierarchy under the control of a single make process (albeit configured by files spread throughout the hierarchy). The %.d rule we've got only does the job for a file in the current directory, since the dependency file gcc emits only names the .o file's basename, without any path. This is sensible enough: the compiler has no idea where you are planning to put your generated files – alongside your source files, or elsewhere for an out-of-source build. So we'll have to tweak that sed command a bit to take account of this.
The approach above calls for the .d file for every .o file you might need to compile – but, when the .o file doesn't exist, you don't need its .d file at all. So it's actually worth trimming the list of sources to only the ones we need. You'll still need to generate the skipped .d files for your next run of make, but this skips them on the first – and newer versions of gcc give us a way to generate the .d files as a side-effect of compiling their matching .o files.
When creating each .d file as above, we can dispense with the $@.$$$$ intermediate file by using a pipe downstream of the compilation step with -M. However, when generating dependency information as a side-effect, you don't have the option of sending it down a pipe; you just pass a filename to -MMD -MF and pick it up after the compilation has generated its .o file. So now you need a separate extension – I use .D – for the direct dependency file, which you subsequently process to produce the .d file. But this is good in any case, since it lets you put that ugly sed command (which I'm going to make a lot uglier later) in one rule, instead of having to repeat it for each type of source file – %.d: %.cpp for C++ code, as well as the rule we already have for plain C source, for example. We still need a rule to produce the .D file from sources in case it gets deleted somehow while the .o file exists, but at least we can isolate the hairy sed.
As an incidental bonus, generating .d as a side-effect of
compiling its .o ensures that we'll regenerate the former whenever
we regenerate the latter, so we don't need to declare the former to depend on
everything the latter depends on – which would have forced wanton
regeneration of the .d during make file parsing, when the
.o exists. The .d is out of date, but it tells us we
need to re-build the .o, which is all we needed to know.
So, now we need:
%.o: CPPFLAGS += -MMD -MF $(@:.o=.D)

%.d: %.D
	sed 's!.*$(@F:.d=.o) *:*!$(@:.d=.o): !g' $< > $@

$(GENROOT)/%.D: $(SRCROOT)/%.c
	$(CC) -M $(CPPFLAGS) $< >$@
$(GENROOT)/%.D: $(SRCROOT)/%.cpp
	$(CXX) -M $(CPPFLAGS) $< >$@

object := $(patsubst $(SRCROOT)/%.cpp, $(GENROOT)/%.o, \
	$(sources:$(SRCROOT)/%.c=$(GENROOT)/%.o))
gotobj := $(wildcard $(object))
include $(gotobj:.o=.d)
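The first line of that fragment is a pattern-specific variable assignment: it only takes effect while make is building a target that matches %.o, which is why it can refer to $@. Here is a small sketch of that behaviour, assuming GNU make; the target name and the -Iinclude flag are made up, and echo stands in for the compiler:

```make
# demo.mk - hypothetical; run as: make -f demo.mk gen/this.o
CPPFLAGS = -Iinclude

# Appended only while building a target that matches %.o;
# $(@:.o=.D) is that target's name with .o replaced by .D
%.o: CPPFLAGS += -MMD -MF $(@:.o=.D)

# echo stands in for the real compilation command
gen/this.o:
	@echo 'CPPFLAGS for $@: $(CPPFLAGS)'
```

This should report -Iinclude -MMD -MF gen/this.D as the flags for gen/this.o, so each compilation writes its dependency file beside its own object file.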
OK, so you've done a build. When you make changes, only the files that need it get re-compiled. Everything is nice. So, time to update your source tree and find out what your colleagues have broken today; cvs up. But what if one of the header files gets removed ? Obviously, whoever removed it has probably also removed all the #include directives that referenced it, so we should have no problem. However, our .d files say they and their .o files depend on the lost header file. We run make and it pulls in our .d files; then it checks to see if any of them need to be regenerated, due to changes in things they depend on. It wants to rebuild any that need it and re-start loading its make files with up-to-date versions. But it finds that there has been a change in something some .d files depend on, and it can't regenerate them because they depend on something that's gone missing. So make barfs.
Of course, you can remove the offending .d files and regenerate them; they won't depend on the missing header after that, so it'll all be fine. However, you do need to remove them, since they're what's saying they depend on the missing files. This is fine in a tiny project, but not in a large-scale project. So we need to hack our .d files a bit more. The file tells make about things that, if they change, need us to recreate our target. So we still need to respond to changes in any of these that does exist, but we need to be able to ignore any that have gone missing. Helpfully, make provides a function to do that: $(wildcard ...) expands to just the files that exist, among those listed as its parameters. So we just need to wrap the list of files our .d and .o depend on in that. The list of files may be spread over many lines, using \ on the end of each to continue onto the next; so we put $(wildcard after the : that follows the names of our two targets, and a final ) on the line that doesn't end in a \.
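To see what $(wildcard ...) does with a list of plain file names, here is a small sketch. The name gone.h is assumed not to exist, and the make file lists itself as the one file that certainly does:

```make
# wild.mk - hypothetical demo; run as: make -f wild.mk
# gone.h is assumed to be missing; this make file itself exists.
listed := gone.h $(firstword $(MAKEFILE_LIST))

# Print the list as written, then the list as filtered by $(wildcard ...)
$(info listed:  $(listed))
$(info present: $(wildcard $(listed)))

# A do-nothing default target, so make has something to build
all: ;
```

The second line printed should name only wild.mk: the missing file simply drops out of the list, which is the behaviour we want for prerequisites that have been deleted.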
But hang on, what if some of the things we depend on can be regenerated ? They may go missing when we make clean and we need to exercise some rule to bring them back. If we try to compile our .o, or regenerate our .D, without them the compiler's going to fail. So we still need to regenerate them – which means we need our targets to depend on them even if they don't exist. So we need to leave them out of our $(wildcard ...), which means we need to close the parenthesis before, and re-open the wild-card after, each file we know how to regenerate. So set up a variable, GENSRC, that lists them. Then we can hack that sed command a bit:
SPACE := $(EMPTY) # a single space character
%.d: %.D
	sed \
	-e 's|^.*$(@F:.d=).*: *|$(@:.d=.o): $$(wildcard |' \
	-e 's!\([^ ]*$(*F)\.cp*$(subst $(SPACE),,$(GENSRC:%=\|%))\) *!) \1 $$(wildcard !g' \
	-e 's|\([^\\]\)$$|\1 )|' -e 's|\$$(wildcard *) *||g' $< > $@
Pretty it ain't, but it works. (But the second
-e's parameter can get quite long – FreeBSD's
sed silently truncates expressions over 1066 bytes long, it would
appear, which forced me to restructure this expression, making it even uglier.)
Note that the last bit is taking out any stray instances of empty $(wildcard
) that've resulted from all our hackery; they did no harm, but we may as
well clean them away.
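As an illustration of what those expressions are meant to do, here is a hand-worked sketch rather than output captured from a real build. The paths are made up, GENSRC is assumed to be empty, and the compiler is assumed to have written a .D file of the usual two-line form:

```make
# Assumed content of gen/dir/this.D (shown as comments; it is the input):
#   this.o: src/dir/this.c src/dir/that.h \
#    src/other/thing.h
#
# What the sed command should then write to gen/dir/this.d: the target gains
# its path, the source file stays a hard prerequisite, and the headers are
# wrapped so that any which have gone missing are ignored.
gen/dir/this.o: src/dir/this.c $(wildcard src/dir/that.h \
 src/other/thing.h )
```

With a non-empty GENSRC, each file it lists should likewise end up outside the $(wildcard ...), alongside the source file.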
While we're at it, it's nice to keep one's generated
files separate from one's source tree; for example, you can then switch between
debug builds (under one generated directory) and optimised ones (under another)
without having to make clean and re-build everything in between.
So the .o files and .d files should go somewhere other than
where the source files are. You might remember I put paths on the source and
dependency files earlier:
$(GENROOT)/%.D: $(SRCROOT)/%.c
	$(CC) -M $(CPPFLAGS) $< >$@
$(GENROOT)/%.D: $(SRCROOT)/%.cpp
	$(CXX) -M $(CPPFLAGS) $< >$@
Your rules for .o and other generated files need similar work, of course. That's all nice and simple, but it can't possibly work unless something is going to make all the directories it calls for – which means a directory tree under $(GENROOT) mirroring the one under $(SRCROOT). You could do that brutally by running
(cd $(SRCROOT); find . -type d -print0) | (cd $(GENROOT); xargs -0r mkdir -p)
but there may be revision-control subdirectories, documentation, test data or any manner of other cruft in your source tree – it'd be cleaner to only generate the directories we need. So naturally we want to use dependencies in make to drive that for us.
The obvious approach is for each output file to depend on the directory it needs to go in; we then have a mkdir rule for each directory, and we're done. However, this doesn't work the way you'd like: it forces you to remake everything all the time. Last time you ran make, you created the directory and added a bunch of files in it. But adding a file to a directory changes the modified time of the directory. So now the directory is more recent than all but at most one of the files in it: each of which depends on the directory, so now thinks it needs to be re-built. That won't do.
So, instead, make each output file depend on a .exists touch-file in its own directory; then the rule for the touch file makes the directory before touching its target. It just remains to make everything depend on suitable touch-files. While we're at it, if we make each touch-file rule depend on its parent directory's touch-file rule, we'll be able to skip the -p flag to mkdir. The remaining problem is simply to construct the rules we need to say that each file depends on its directory's touch file; which turns out to be a bit fiddly.
%/.exists:
	mkdir -p $(@D) && touch $@
define ObjDirTemplate
$(addprefix $1, .o .D): $(dir $1).exists
endef
$(foreach D, $(object:%.o=%), $(eval $(call ObjDirTemplate,$D)))
define DirDirTemplate
$1 $1/.exists: $$(if $$(wildcard $(dir $1)),,$(dir $1).exists)
endef
$(foreach D, $(patsubst %/,%,$(sort $(dir $(object)))), \
$(eval $(call DirDirTemplate,$D)))
(and you really don't want to see the evil bodge I
needed, to achieve equivalent results in versions of make too old
to support the $(eval ...) construct relied on here).
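To check what rules such a template generates, one can swap $(eval ...) for $(info ...) and let make print the text it would otherwise have parsed. A minimal sketch, assuming GNU make 3.81 or later and a made-up two-file object list:

```make
# show.mk - hypothetical demo; run as: make -f show.mk
object := gen/util/foo.o gen/util/bar.o

define ObjDirTemplate
$(addprefix $1, .o .D): $(dir $1).exists
endef

# $(info ...) prints each generated rule instead of evaluating it
$(foreach D, $(object:%.o=%), $(info $(call ObjDirTemplate,$D)))

# A do-nothing default target, so make has something to build
all: ;
```

For this list it should print one rule per object, each saying that the .o and .D files depend on gen/util/.exists.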
In a big project, one can have so many .o
files (e.g. > 2300) that the command-line to ar ends up being
too long for the shell (> 100 kB). This is particularly apt to happen when
doing out-of-source builds because, even when using relative paths, your
$(GENDIR) is apt to add quite a lot (49 bytes in my case, adding
another >100 kB) to the length of each object file's name – and to the
library file's name.
$(GENDIR)/libhuge.a: $(object)
	$(AR) $(ARFLAGS) $@ $^
ends up being over 240 kB of text. One solution is to use make's magic library file syntax:
$(GENDIR)/libhuge.a: $(object:%=$(GENDIR)/libhuge.a(%))
	$(RANLIB) $@
$(GENDIR)/libhuge.a(%): %
	$(AR) $(ARFLAGS) $@ $<
(and actually the second rule here is one of
make's built-ins, so we could skip it) which solves the
command-line length problem – but it's disgustingly slow, even when
leaving out the r flag from $(ARFLAGS) (which is why
we need to run $(RANLIB) once we're done). If you've got some
sensible way to split up your $(object) list into smaller chunks,
you can add each chunk to the library as a single command to get a solution
part-way between the two above; but it's pretty ghastly to implement and still
fairly slow. My colleague Joakim Bengtsson deserves credit for the following
inspired piece of hackery:
$(GENDIR)/libhuge.a: libhuge-objtmpdir $(object)
	$(foreach O,$(?:libhuge-objtmpdir=),$(shell ln -f $O $(@D)/objtmp/)) \
	cd $(@D)/objtmp; \
	$(AR) $(ARFLAGS) ../$(@F) $(notdir $(^:libhuge-objtmpdir=)) || failed=yes; \
	cd ..; rm -fr objtmp; [ -z "$$failed" ]

libhuge-objtmpdir: $(GENDIR)/.exists
	rm -fr $(<D)/objtmp; mkdir $(<D)/objtmp
Note that the $(foreach ...) is
evaluated by make in the course of running the command; this causes
make to run one ln -f process per (changed) object
file, collecting the resulting (empty) output and including it as part of the
command it executes (where it gets ignored). Even with this hack, our
command-line to ar is over 41½ kB.
One problem I still haven't solved for libraries is what
happens when a .o file goes away. If a .c or
.cpp file is removed from your $(sources), your
existing libhuge.a still contains the corresponding .o
even though it no longer should. Since some of the code from the removed source
file has usually moved elsewhere (indeed, the source file may simply have been
renamed), this can lead to duplicate symbol errors from your linker. The
removed source file may have referenced symbols no longer supplied by other
files that were updated when it was removed; this can lead to missing
symbol errors from the linker. But there doesn't seem to be any natural way
to remove the lost .o file from the library, other than by manual
intervention. The brave can use ar d libhuge.a delenda.o but it's
probably safer to just rm libhuge.a and let make
regenerate it.
GNU make is mighty.
Unix is user friendly, it's just picky about who its friends are. — Tollef Fog Heen
Assorted folk had favourable things to say about various other tools to do similar jobs. Dag subsequently sent me a link to a page about CMake. I might add further links to other related tools here, if I find them interesting.
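Make deals poorly with commands that generate several files at once, and with
commands that only write their output when its content has changed;
and the combination is particularly painful. The latter leaves files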
out-of-date relative to what they depend on, so the command is always run, even
when it isn't needed. The former shall run the command once per file if one
makes each generated file a target of a rule that runs the command; so one has
to have a .PHONY rule on which they all depend, which runs the
command. It would be nice to support "only write on change", but doing so
would require an extra time-stamp on each file; make would need to
keep track of both the last time the file was changed and the latest time at
which the file was known to be up-to-date.

Written by Eddy.