Thursday, August 2, 2007

C preprocessors

Section 10. C Preprocessor

10.2: Here are some cute preprocessor macros:

#define begin {
#define end }

What do y'all think?

A: Bleah. See also section 17.

10.3: How can I write a generic macro to swap two values?

A: There is no good answer to this question. If the values are
integers, a well-known trick using exclusive-OR could perhaps be
used, but it will not work for floating-point values or
pointers, or if the two values are the same variable (and the
"obvious" supercompressed implementation for integral types
a^=b^=a^=b is illegal due to multiple side-effects; see question
3.2). If the macro is intended to be used on values of
arbitrary type (the usual goal), it cannot use a temporary,
since it does not know what type of temporary it needs (and
would have a hard time naming it if it did), and standard C does
not provide a typeof operator.

The best all-around solution is probably to forget about using a
macro, unless you're willing to pass in the type as a third
argument.
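
If you are willing to pass the type in, a minimal sketch might look like the following. The macro name SWAP, the temporary's name swap_tmp_, and the two helper functions are hypothetical, chosen only for illustration.

```c
/* Hypothetical macro: the caller passes the type as the third argument.
   The temporary's name, swap_tmp_, is an assumption and must not
   clash with either operand. */
#define SWAP(a, b, type)          \
    do {                          \
        type swap_tmp_ = (a);     \
        (a) = (b);                \
        (b) = swap_tmp_;          \
    } while (0)

/* Helper used only to exercise the macro on ints. */
static int first_after_int_swap(int x, int y)
{
    SWAP(x, y, int);
    return x;   /* now holds the original y */
}

/* The same macro works for floating-point values. */
static double first_after_double_swap(double x, double y)
{
    SWAP(x, y, double);
    return x;
}
```

This sketch still fails for types whose declarators wrap around the name (arrays, function pointers) unless a typedef is used, and it evaluates each operand twice.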

10.4: What's the best way to write a multi-statement macro?

A: The usual goal is to write a macro that can be invoked as if it
were a statement consisting of a single function call. This
means that the "caller" will be supplying the final semicolon,
so the macro body should not. The macro body cannot therefore
be a simple brace-enclosed compound statement, because syntax
errors would result if it were invoked (apparently as a single
statement, but with a resultant extra semicolon) as the if
branch of an if/else statement with an explicit else clause.

The traditional solution, therefore, is to use

#define MACRO(arg1, arg2) do {  \
        /* declarations */      \
        stmt1;                  \
        stmt2;                  \
        /* ... */               \
        } while(0)      /* (no trailing ; ) */

When the caller appends a semicolon, this expansion becomes a
single statement regardless of context. (An optimizing compiler
will remove any "dead" tests or branches on the constant
condition 0, although lint may complain.)
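
As a rough illustration of the do/while(0) form used as the if branch of an if/else, consider the sketch below. The CLAMP macro and the clamp_or_zero function are hypothetical names, not part of any library.

```c
/* Hypothetical two-statement macro: forces *p into the range [lo, hi]. */
#define CLAMP(p, lo, hi)                  \
    do {                                  \
        if (*(p) < (lo)) *(p) = (lo);     \
        if (*(p) > (hi)) *(p) = (hi);     \
    } while (0)   /* no trailing ; */

static int clamp_or_zero(int v, int enabled)
{
    if (enabled)
        CLAMP(&v, 0, 10);   /* one statement, so the else still pairs up */
    else
        v = 0;
    return v;
}
```

Had CLAMP been a bare brace-enclosed block, the semicolon after the invocation would have ended the if statement and left the else dangling.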

If all of the statements in the intended macro are simple
expressions, with no declarations or loops, another technique is
to write a single, parenthesized expression using one or more
comma operators. (For an example, see the first DEBUG() macro
in question 10.26.) This technique also allows a value to be
"returned."

References: H&S Sec. 3.3.2 p. 45; CT&P Sec. 6.3 pp. 82-3.
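
The comma-operator technique mentioned above might be sketched as follows. STORE, store_count, and store_demo are hypothetical names used only for this example.

```c
/* Counts how many times STORE has been evaluated. */
static int store_count = 0;

/* Hypothetical macro: assigns v to *p, bumps the counter, and yields
   the stored value. It is a single parenthesized expression, so it can
   appear wherever an expression is allowed. */
#define STORE(p, v) (*(p) = (v), ++store_count, *(p))

static int store_demo(void)
{
    int a = 0;
    int r = STORE(&a, 3);   /* the macro's value is its last operand */
    return r;
}
```

Since the expansion contains no declarations or loops, no do/while wrapper is needed.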

10.6: I'm splitting up a program into multiple source files for the
first time, and I'm wondering what to put in .c files and what
to put in .h files. (What does ".h" mean, anyway?)

A: As a general rule, you should put these things in header (.h)
files:

macro definitions (preprocessor #defines)
structure, union, and enumeration declarations
typedef declarations
external function declarations (see also question 1.11)
global variable declarations

It's especially important to put a declaration or definition in
a header file when it will be shared between several other
files. (In particular, never put external function prototypes
in .c files. See also question 1.7.)

On the other hand, when a definition or declaration should
remain private to one source file, it's fine to leave it there.

See also questions 1.7 and 10.7.
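
One possible split is sketched below, shown as a single listing with comments marking where each file would begin. The file names counter.h and counter.c and every identifier in them are hypothetical.

```c
/* ---- counter.h (hypothetical header shared by several source files) ---- */
#ifndef COUNTER_H
#define COUNTER_H

#define COUNTER_MAX 100                 /* macro definition */

struct counter { int value; };          /* structure declaration */
typedef struct counter Counter;         /* typedef declaration */

extern int counter_calls;               /* global variable declaration */
extern int counter_bump(Counter *c);    /* external function declaration */

#endif

/* ---- counter.c (would begin with: #include "counter.h") ---- */

int counter_calls = 0;                  /* the one definition of the global */

/* Private helper: static, so it stays out of the header. */
static int clamp(int v)
{
    return v > COUNTER_MAX ? COUNTER_MAX : v;
}

int counter_bump(Counter *c)
{
    counter_calls++;
    c->value = clamp(c->value + 1);
    return c->value;
}
```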

References: K&R2 Sec. 4.5 pp. 81-2; H&S Sec. 9.2.3 p. 267; CT&P
Sec. 4.6 pp. 66-7.

10.7: Is it acceptable for one header file to #include another?

A: It's a question of style, and thus receives considerable debate.
Many people believe that "nested #include files" are to be
avoided: the prestigious Indian Hill Style Guide (see question
17.9) disparages them; they can make it harder to find relevant
definitions; they can lead to multiple-definition errors if a
file is #included twice; and they make manual Makefile
maintenance very difficult. On the other hand, they make it
possible to use header files in a modular way (a header file can
#include what it needs itself, rather than requiring each
#includer to do so); a tool like grep (or a tags file) makes it
easy to find definitions no matter where they are; a popular
trick along the lines of:

#ifndef HFILENAME_USED
#define HFILENAME_USED
...header file contents...
#endif

(where a different bracketing macro name is used for each header
file) makes a header file "idempotent" so that it can safely be
#included multiple times; and automated Makefile maintenance
tools (which are a virtual necessity in large projects anyway;
see question 18.1) handle dependency generation in the face of
nested #include files easily. See also question 17.10.

References: Rationale Sec. 4.1.2.

10.8: Where are header ("#include") files searched for?

A: The exact behavior is implementation-defined (which means that
it is supposed to be documented; see question 11.33).
Typically, headers named with <> syntax are searched for in one
or more standard places. Header files named with "" syntax are
first searched for in the "current directory," then (if not
found) in the same standard places.

Traditionally (especially under Unix compilers), the current
directory is taken to be the directory containing the file
containing the #include directive. Under other compilers,
however, the current directory (if any) is the directory in
which the compiler was initially invoked. Check your compiler
documentation.

References: K&R2 Sec. A12.4 p. 231; ANSI Sec. 3.8.2; ISO
Sec. 6.8.2; H&S Sec. 3.4 p. 55.

10.9: I'm getting strange syntax errors on the very first declaration
in a file, but it looks fine.

A: Perhaps there's a missing semicolon at the end of the last
declaration in the last header file you're #including. See also
questions 2.18 and 11.29.

10.11: I seem to be missing a system header file. Can
someone send me a copy?

A: Standard headers exist in part so that definitions appropriate
to your compiler, operating system, and processor can be
supplied. You cannot just pick up a copy of someone else's
header file and expect it to work, unless that person is using
exactly the same environment. Ask your compiler vendor why the
file was not provided (or to send a replacement copy).

10.12: How can I construct preprocessor #if expressions which compare
strings?

A: You can't do it directly; preprocessor #if arithmetic uses only
integers. You can #define several manifest constants, however,
and implement conditionals on those.
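
A minimal sketch of the manifest-constant approach follows. All of the names here (OS_UNIX, OS_WINDOWS, OS_VMS, TARGET_OS, PATH_SEPARATOR, path_separator) are hypothetical.

```c
/* Hypothetical manifest constants standing in for the strings
   "unix", "windows", and "vms". */
#define OS_UNIX    1
#define OS_WINDOWS 2
#define OS_VMS     3

/* Normally chosen on the compiler command line, for example with
   -DTARGET_OS=OS_WINDOWS; the default here is an assumption. */
#ifndef TARGET_OS
#define TARGET_OS OS_UNIX
#endif

/* Integer comparisons work in #if even though string ones do not. */
#if TARGET_OS == OS_UNIX
#define PATH_SEPARATOR '/'
#elif TARGET_OS == OS_WINDOWS
#define PATH_SEPARATOR '\\'
#elif TARGET_OS == OS_VMS
#define PATH_SEPARATOR '.'
#else
#error "unknown TARGET_OS"
#endif

static char path_separator(void)
{
    return PATH_SEPARATOR;
}
```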

See also question 20.17.

References: K&R2 Sec. 4.11.3 p. 91; ANSI Sec. 3.8.1; ISO
Sec. 6.8.1; H&S Sec. 7.11.1 p. 225.

10.13: Does the sizeof operator work in preprocessor #if directives?

A: No. Preprocessing happens during an earlier phase of
compilation, before type names have been parsed. Instead of
sizeof, consider using the predefined constants in ANSI's
<limits.h>, if applicable, or perhaps a "configure" script.
(Better yet, try to write code which is inherently insensitive
to type sizes.)
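
For instance, a test on INT_MAX from <limits.h> can stand in for a test on sizeof(int). The typedef name int32ish and the helper function are hypothetical; this is only a sketch of the idea.

```c
#include <limits.h>

/* Pick a type that can hold at least 32 bits without asking
   sizeof(int) in an #if. */
#if INT_MAX >= 2147483647
typedef int int32ish;
#else
typedef long int32ish;   /* long is guaranteed to hold at least 32 bits */
#endif

/* Returns nonzero if a value needing more than 16 bits survives. */
static int int32ish_holds_million(void)
{
    int32ish x = 1000000L;
    return x == 1000000L;
}
```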

References: ANSI Sec. 2.1.1.2, Sec. 3.8.1 footnote 83; ISO
Sec. 5.1.1.2, Sec. 6.8.1; H&S Sec. 7.11.1 p. 225.

10.14: Can I use an #ifdef in a #define line, to define something two
different ways?

A: No. You can't "run the preprocessor on itself," so to speak.
What you can do is use one of two completely separate #define
lines, depending on the #ifdef setting.
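
In other words, the conditional goes around the #define lines rather than inside one. A small sketch, with hypothetical names USE_BIG_BUFFERS and BUFFER_SIZE:

```c
/* Two complete, separate #define lines; the #ifdef selects one. */
#ifdef USE_BIG_BUFFERS
#define BUFFER_SIZE 8192
#else
#define BUFFER_SIZE 512
#endif

/* The rest of the code just uses the macro and never sees the #ifdef. */
static char buffer[BUFFER_SIZE];

static unsigned long buffer_size(void)
{
    return (unsigned long)sizeof buffer;
}
```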

References: ANSI Sec. 3.8.3, Sec. 3.8.3.4; ISO Sec. 6.8.3,
Sec. 6.8.3.4; H&S Sec. 3.2 pp. 40-1.

10.15: Is there anything like an #ifdef for typedefs?

A: Unfortunately, no. (See also question 10.13.)

References: ANSI Sec. 2.1.1.2, Sec. 3.8.1 footnote 83; ISO
Sec. 5.1.1.2, Sec. 6.8.1; H&S Sec. 7.11.1 p. 225.

10.16: How can I use a preprocessor #if expression to tell if a machine
is big-endian or little-endian?

A: You probably can't. (Preprocessor arithmetic uses only long
integers, and there is no concept of addressing.) Are you
sure you need to know the machine's endianness explicitly?
Usually it's better to write code which doesn't care. See
also question 20.9.
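
One way to write code which doesn't care is to move multi-byte values in and out of byte buffers with shifts instead of overlaying them in memory. The function names below are hypothetical, and the choice of most-significant-byte-first order is an assumption made for the example.

```c
/* Store the low 32 bits of v as four bytes, most significant first,
   regardless of the host's byte order. */
static void put_u32_be(unsigned char *buf, unsigned long v)
{
    buf[0] = (unsigned char)((v >> 24) & 0xFF);
    buf[1] = (unsigned char)((v >> 16) & 0xFF);
    buf[2] = (unsigned char)((v >> 8) & 0xFF);
    buf[3] = (unsigned char)(v & 0xFF);
}

/* Reassemble the value from the same four bytes. */
static unsigned long get_u32_be(const unsigned char *buf)
{
    return ((unsigned long)buf[0] << 24) |
           ((unsigned long)buf[1] << 16) |
           ((unsigned long)buf[2] << 8)  |
            (unsigned long)buf[3];
}
```

Because the shifts operate on values rather than on storage, the same source gives the same bytes on big-endian and little-endian machines.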

References: ANSI Sec. 3.8.1; ISO Sec. 6.8.1; H&S Sec. 7.11.1
p. 225.

10.18: I inherited some code which contains far too many #ifdef's for
my taste. How can I preprocess the code to leave only one
conditional compilation set, without running it through the
preprocessor and expanding all of the #include's and #define's
as well?

A: There are programs floating around called unifdef, rmifdef, and
scpp ("selective C preprocessor") which do exactly this. See
question 18.16.

10.19: How can I list all of the pre#defined identifiers?

A: There's no standard way, although it is a common need. If the
compiler documentation is unhelpful, the most expedient way is
probably to extract printable strings from the compiler or
preprocessor executable with something like the Unix strings
utility. Beware that many traditional system-specific
pre#defined identifiers (e.g. "unix") are non-Standard (because
they clash with the user's namespace) and are being removed or
renamed.

10.20: I have some old code that tries to construct identifiers with a
macro like

#define Paste(a, b) a/**/b

but it doesn't work any more.

A: It was an undocumented feature of some early preprocessor
implementations (notably John Reiser's) that comments
disappeared entirely and could therefore be used for token
pasting. ANSI affirms (as did K&R1) that comments are replaced
with white space. However, since the need for pasting tokens
was demonstrated and real, ANSI introduced a well-defined token-
pasting operator, ##, which can be used like this:

#define Paste(a, b) a##b

See also question 11.17.
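
A short sketch of the ## form in use follows. The variables counter1 and counter2 and the function sum_counters are hypothetical.

```c
#define Paste(a, b) a##b

/* Hypothetical variables whose names are built by pasting. */
static int counter1 = 10;
static int counter2 = 20;

static int sum_counters(void)
{
    /* Paste(counter, 1) becomes the single identifier counter1. */
    return Paste(counter, 1) + Paste(counter, 2);
}
```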

References: ANSI Sec. 3.8.3.3; ISO Sec. 6.8.3.3; Rationale
Sec. 3.8.3.3; H&S Sec. 3.3.9 p. 52.

10.22: Why is the macro

#define TRACE(n) printf("TRACE: %d\n", n)

giving me the warning "macro replacement within a string
literal"? It seems to be expanding

TRACE(count);
as
printf("TRACE: %d\count", count);

A: See question 11.18.

10.23: How can I use a macro argument inside a string literal in the
macro expansion?

A: See question 11.18.
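
Briefly, ANSI C provides the # ("stringizing") operator for this, combined with concatenation of adjacent string literals. A minimal sketch, with the hypothetical names TRACE_FMT and format_trace:

```c
#include <stdio.h>

/* #expr turns the macro argument into a string literal, which then
   joins the adjacent literals at compile time. */
#define TRACE_FMT(expr) "TRACE: " #expr " = %d\n"

/* Formats the trace line into buf; buf must be large enough. */
static int format_trace(char *buf, int count)
{
    return sprintf(buf, TRACE_FMT(count), count);
}
```

Here TRACE_FMT(count) expands to the single literal "TRACE: count = %d\n".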

10.25: I've got this tricky preprocessing I want to do and I can't
figure out a way to do it.

A: C's preprocessor is not intended as a general-purpose tool.
(Note also that it is not guaranteed to be available as a
separate program.) Rather than forcing it to do something
inappropriate, consider writing your own little special-purpose
preprocessing tool, instead. You can easily get a utility like
make(1) to run it for you automatically.

If you are trying to preprocess something other than C, consider
using a general-purpose preprocessor. (One older one available
on most Unix systems is m4.)

10.26: How can I write a macro which takes a variable number of
arguments?

A: One popular trick is to define and invoke the macro with a
single, parenthesized "argument" which in the macro expansion
becomes the entire argument list, parentheses and all, for a
function such as printf():

#define DEBUG(args) (printf("DEBUG: "), printf args)

if(n != 0) DEBUG(("n is %d\n", n));

The obvious disadvantage is that the caller must always remember
to use the extra parentheses.

gcc has an extension which allows a function-like macro to
accept a variable number of arguments, but it's not standard.
(C99 added a standard form, using ... and __VA_ARGS__.)
Other possible solutions are to use different macros (DEBUG1,
DEBUG2, etc.) depending on the number of arguments, or to play
games with commas:

#define DEBUG(args) (printf("DEBUG: "), printf(args))
#define _ ,

DEBUG("i = %d" _ i);

It is often better to use a bona-fide function, which can take a
variable number of arguments in a well-defined way. See
questions 15.4 and 15.5.
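
A bona-fide function along those lines might look like this sketch. The function name debug and the choice of stderr as the destination are assumptions made for the example.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical debug function: prints a "DEBUG: " prefix, then hands
   the format and remaining arguments to vfprintf. Returns the number
   of characters written for the formatted part. */
static int debug(const char *fmt, ...)
{
    va_list ap;
    int n;

    fputs("DEBUG: ", stderr);
    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```

The caller then writes debug("n is %d\n", n); with no extra parentheses and no comma tricks.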
