Thursday, August 2, 2007

Structures, Unions, and Enumerations

Section 2. Structures, Unions, and Enumerations

2.1: What's the difference between these two declarations?

struct x1 { ... };
typedef struct { ... } x2;

A: The first form declares a "structure tag"; the second declares a
"typedef". The main difference is that the second declaration
is of a slightly more abstract type -- its users don't
necessarily know that it is a structure, and the keyword struct
is not used when declaring instances of it.
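A minimal sketch of the two forms in use. The member name count and the helper sum_counts() are hypothetical, chosen only for illustration:

```c
/* Form 1: a structure tag. The keyword struct is required at each use. */
struct x1 { int count; };

/* Form 2: a typedef for an untagged structure. No struct keyword is used. */
typedef struct { int count; } x2;

/* Hypothetical helper showing how each type is named in a declaration. */
int sum_counts(void)
{
    struct x1 a;    /* tag form */
    x2 b;           /* typedef form */

    a.count = 1;
    b.count = 2;
    return a.count + b.count;
}
```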

2.2: Why doesn't

struct x { ... };
x thestruct;

work?

A: C is not C++. Typedef names are not automatically generated for
structure tags. See also question 2.1 above.

2.3: Can a structure contain a pointer to itself?

A: Most certainly. See question 1.14.

2.4: What's the best way of implementing opaque (abstract) data types
in C?

A: One good way is for clients to use structure pointers (perhaps
additionally hidden behind typedefs) which point to structure
types which are not publicly defined.
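A minimal sketch of this arrangement, shown in one file for brevity. The Counter type and its functions are hypothetical; in practice the first part would live in a public header and the second in a private source file:

```c
#include <stdlib.h>

/* --- what a public header (say, counter.h) might contain --- */
typedef struct counter Counter;     /* incomplete type: layout is hidden */
Counter *counter_new(void);
void counter_increment(Counter *c);
int counter_value(const Counter *c);
void counter_free(Counter *c);

/* --- what the private source file (say, counter.c) might contain --- */
struct counter {
    int value;
};

Counter *counter_new(void)
{
    Counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = 0;
    return c;
}

void counter_increment(Counter *c) { c->value++; }
int counter_value(const Counter *c) { return c->value; }
void counter_free(Counter *c) { free(c); }
```

Clients that see only the header part can hold and pass Counter pointers, but cannot touch the members or declare a Counter directly, so the representation can change without affecting them.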

2.6: I came across some code that declared a structure like this:

struct name {
int namelen;
char namestr[1];
};

and then did some tricky allocation to make the namestr array
act like it had several elements. Is this legal or portable?

A: This technique is popular, although Dennis Ritchie has called it
"unwarranted chumminess with the C implementation." An official
interpretation has deemed that it is not strictly conforming
with the C Standard. (A thorough treatment of the arguments
surrounding the legality of the technique is beyond the scope of
this list.) It does seem to be portable to all known
implementations. (Compilers which check array bounds carefully
might issue warnings.)
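The "tricky allocation" usually looks something like the following sketch. The helper makename() is hypothetical, and, as noted above, writing past namestr[0] is not strictly conforming even though it works on common implementations:

```c
#include <stdlib.h>
#include <string.h>

struct name {
    int namelen;
    char namestr[1];
};

/* Hypothetical helper: allocate a struct name with room for a copy of s. */
struct name *makename(const char *s)
{
    size_t len = strlen(s);
    /* sizeof(struct name) already counts namestr[0], which leaves room
       for the terminating '\0'; len more bytes hold the characters. */
    struct name *ret = malloc(sizeof(struct name) + len);

    if (ret != NULL) {
        ret->namelen = (int)len;
        memcpy(ret->namestr, s, len + 1);   /* writes past namestr[0] */
    }
    return ret;
}
```

The caller releases the whole object with a single free().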

Another possibility is to declare the variable-size element very
large, rather than very small; in the case of the above example:

...
char namestr[MAXSIZE];
...

where MAXSIZE is larger than any name which will be stored.
However, it looks like this technique is disallowed by a strict
interpretation of the Standard as well.

References: Rationale Sec. 3.5.4.2.

2.7: I heard that structures could be assigned to variables and
passed to and from functions, but K&R1 says not.

A: What K&R1 said was that the restrictions on structure operations
would be lifted in a forthcoming version of the compiler, and in
fact structure assignment and passing were fully functional in
Ritchie's compiler even as K&R1 was being published. Although a
few early C compilers lacked these operations, all modern
compilers support them, and they are part of the ANSI C
standard, so there should be no reluctance to use them.

(Note that when a structure is assigned, passed, or returned,
the copying is done monolithically; anything pointed to by any
pointer fields is *not* copied.)
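The following sketch illustrates the point about pointer fields. The struct label type and the function copy_shares_text() are made up for the example:

```c
#include <string.h>

struct label {
    int id;
    char *text;     /* pointer member: only the pointer itself is copied */
};

/* Returns 1 if, after assignment, the two structures have independent
   id members but still share a single text buffer. */
int copy_shares_text(void)
{
    char buffer[] = "first";
    struct label a, b;

    a.id = 1;
    a.text = buffer;

    b = a;              /* copies id and the pointer value, not the string */
    b.id = 2;           /* independent: a.id is still 1 */
    b.text[0] = 'F';    /* shared: the change is visible through a.text */

    return a.id == 1 && a.text == b.text && strcmp(a.text, "First") == 0;
}
```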

References: K&R1 Sec. 6.2 p. 121; K&R2 Sec. 6.2 p. 129; ANSI
Sec. 3.1.2.5, Sec. 3.2.2.1, Sec. 3.3.16; ISO Sec. 6.1.2.5,
Sec. 6.2.2.1, Sec. 6.3.16; H&S Sec. 5.6.2 p. 133.

2.8: Why can't you compare structures?

A: There is no single, good way for a compiler to implement
structure comparison which is consistent with C's low-level
flavor. A simple byte-by-byte comparison could founder on
random bits present in unused "holes" in the structure (such
padding is used to keep the alignment of later fields correct;
see question 2.12). A field-by-field comparison might require
unacceptable amounts of repetitive code for large structures.

If you need to compare two structures, you'll have to write your
own function to do so, field by field.
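A sketch of such a function, for a hypothetical struct employee. Note that the name member is compared as a string, so bytes following the terminating '\0' (which a byte-by-byte comparison would examine) are ignored:

```c
#include <string.h>

struct employee {
    char name[20];
    int id;
};

/* Field-by-field comparison; returns nonzero if the structures are equal. */
int employee_equal(const struct employee *a, const struct employee *b)
{
    return strcmp(a->name, b->name) == 0
        && a->id == b->id;
}
```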

References: K&R2 Sec. 6.2 p. 129; ANSI Sec. 4.11.4.1 footnote
136; Rationale Sec. 3.3.9; H&S Sec. 5.6.2 p. 133.

2.9: How are structure passing and returning implemented?

A: When structures are passed as arguments to functions, the entire
structure is typically pushed on the stack, using as many words
as are required. (Programmers often choose to use pointers to
structures instead, precisely to avoid this overhead.) Some
compilers merely pass a pointer to the structure, though they
may have to make a local copy to preserve pass-by-value
semantics.

Structures are often returned from functions in a location
pointed to by an extra, compiler-supplied "hidden" argument to
the function. Some older compilers used a special, static
location for structure returns, although this made structure-
valued functions non-reentrant, which ANSI C disallows.

References: ANSI Sec. 2.2.3; ISO Sec. 5.2.3.

2.10: How can I pass constant values to functions which accept
structure arguments?

A: C as originally standardized has no way of generating anonymous
structure values. You will have to use a temporary structure
variable or a little structure-building function. (gcc provides
structure constants as an extension, and C99 has since added a
similar mechanism, compound literals, to the C Standard.) See
also question 4.10.
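Both workarounds can be sketched as follows. The struct point type and every function name here are hypothetical:

```c
struct point { int x, y; };

/* Hypothetical function that accepts a structure argument by value. */
int manhattan(struct point p)
{
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}

/* A little structure-building function. */
struct point makepoint(int x, int y)
{
    struct point p;
    p.x = x;
    p.y = y;
    return p;
}

/* Workaround 1: fill in a temporary structure variable, then pass it. */
int demo_temporary(void)
{
    struct point tmp;
    tmp.x = 3;
    tmp.y = -4;
    return manhattan(tmp);
}

/* Workaround 2: build the value in the argument list itself. */
int demo_builder(void)
{
    return manhattan(makepoint(3, -4));
}
```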

2.11: How can I read/write structures from/to data files?

A: It is relatively straightforward to write a structure out using
fwrite():

fwrite(&somestruct, sizeof somestruct, 1, fp);

and a corresponding fread invocation can read it back in.
(Under pre-ANSI C, a (char *) cast on the first argument is
required. What's important is that fwrite() receive a byte
pointer, not a structure pointer.) However, data files so
written will *not* be portable (see questions 2.12 and 20.5).
Note also that if the structure contains any pointers, only the
pointer values will be written, and they are most unlikely to be
valid when read back in. Finally, note that for widespread
portability you must use the "b" flag when fopening the files;
see question 12.38.

A more portable solution, though it's a bit more work initially,
is to write a pair of functions for writing and reading a
structure, field-by-field, in a portable (perhaps even human-
readable) way.
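One possible shape for such a pair of functions, using a one-line text format. The struct record type is hypothetical, and the sketch assumes that names contain no whitespace:

```c
#include <stdio.h>

struct record {
    int id;
    double weight;
    char name[32];
};

/* Write one record as a line of text; returns 0 on success, -1 on error. */
int write_record(FILE *fp, const struct record *r)
{
    /* 17 significant digits are enough to round-trip a typical
       64-bit double */
    return fprintf(fp, "%d %.17g %s\n", r->id, r->weight, r->name) < 0 ? -1 : 0;
}

/* Read one record back; returns 0 on success, -1 on error or end of file. */
int read_record(FILE *fp, struct record *r)
{
    /* %31s leaves room for the terminating '\0' in name[32] */
    return fscanf(fp, "%d %lf %31s", &r->id, &r->weight, r->name) == 3 ? 0 : -1;
}
```

Because the file holds text rather than a memory image, it does not depend on the writing machine's padding, byte order, or type sizes.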

References: H&S Sec. 15.13 p. 381.

2.12: My compiler is leaving holes in structures, which is wasting
space and preventing "binary" I/O to external data files. Can I
turn off the padding, or otherwise control the alignment of
structure fields?

A: Your compiler may provide an extension to give you this control
(perhaps a #pragma; see question 11.20), but there is no
standard method.

See also question 20.5.

References: K&R2 Sec. 6.4 p. 138; H&S Sec. 5.6.4 p. 135.

2.13: Why does sizeof report a larger size than I expect for a
structure type, as if there were padding at the end?

A: Structures may have this padding (as well as internal padding),
if necessary, to ensure that alignment properties will be
preserved when an array of contiguous structures is allocated.
Even when the structure is not part of an array, the end padding
remains, so that sizeof can always return a consistent size.
See question 2.12 above.

References: H&S Sec. 5.6.7 pp. 139-40.

2.14: How can I determine the byte offset of a field within a
structure?

A: ANSI C defines the offsetof() macro, which should be used if
available; see <stddef.h>. If you don't have it, one possible
implementation is

#define offsetof(type, mem) ((size_t) \
((char *)&((type *)0)->mem - (char *)(type *)0))

This implementation is not 100% portable; some compilers may
legitimately refuse to accept it.

See question 2.15 below for a usage hint.

References: ANSI Sec. 4.1.5; ISO Sec. 7.1.6; Rationale
Sec. 3.5.4.2; H&S Sec. 11.1 pp. 292-3.

2.15: How can I access structure fields by name at run time?

A: Build a table of names and offsets, using the offsetof() macro.
The offset of field b in struct a is

offsetb = offsetof(struct a, b);

If structp is a pointer to an instance of this structure, and
field b is an int (with offset as computed above), b's value can
be set indirectly with

*(int *)((char *)structp + offsetb) = value;
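Putting the pieces together, a table-driven version might look like the sketch below. The second member c, the fields table, and set_field() are hypothetical additions, and the sketch assumes every listed field is an int:

```c
#include <stddef.h>
#include <string.h>

struct a {
    int b;
    int c;
};

/* Table mapping field names to byte offsets (all fields here are int). */
static const struct {
    const char *name;
    size_t offset;
} fields[] = {
    { "b", offsetof(struct a, b) },
    { "c", offsetof(struct a, c) }
};

/* Set the int field called name; returns 0 on success, -1 if not found. */
int set_field(struct a *structp, const char *name, int value)
{
    size_t i;

    for (i = 0; i < sizeof fields / sizeof fields[0]; i++) {
        if (strcmp(fields[i].name, name) == 0) {
            *(int *)((char *)structp + fields[i].offset) = value;
            return 0;
        }
    }
    return -1;
}
```

A structure with fields of several types would also need a type code in each table entry, so that the right pointer cast can be chosen.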

2.18: This program works correctly, but it dumps core after it
finishes. Why?

struct list {
char *item;
struct list *next;
}

/* Here is the main program. */

main(argc, argv)
{ ... }

A: A missing semicolon causes main() to be declared as returning a
structure. (The connection is hard to see because of the
intervening comment.) Since structure-valued functions are
usually implemented by adding a hidden return pointer (see
question 2.9), the generated code for main() tries to accept
three arguments, although only two are passed (in this case, by
the C start-up code). See also questions 10.9 and 16.4.

References: CT&P Sec. 2.3 pp. 21-2.

2.20: Can I initialize unions?

A: ANSI Standard C allows an initializer for the first member of a
union. Under the original ANSI Standard there is no way of
initializing any other member, although C99 later added
designated initializers for this purpose (and under a pre-ANSI
compiler there is generally no way of initializing a union at
all).
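A short illustration of first-member initialization. The union number type and the function first_member() are made up for the example:

```c
union number {
    int i;          /* first member: the one a plain initializer sets */
    double d;
};

/* The braced initializer applies to the first member, i. */
static union number n = { 42 };

int first_member(void)
{
    return n.i;
}
```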

References: K&R2 Sec. 6.8 pp. 148-9; ANSI Sec. 3.5.7; ISO
Sec. 6.5.7; H&S Sec. 4.6.7 p. 100.

2.22: What is the difference between an enumeration and a set of
preprocessor #defines?

A: At the present time, there is little difference. Although many
people might have wished otherwise, the C Standard says that
enumerations may be freely intermixed with other integral types,
without errors. (If such intermixing were disallowed without
explicit casts, judicious use of enumerations could catch
certain programming errors.)

Some advantages of enumerations are that the numeric values are
automatically assigned, that a debugger may be able to display
the symbolic values when enumeration variables are examined, and
that they obey block scope. (A compiler may also generate
nonfatal warnings when enumerations and integers are
indiscriminately mixed, since doing so can still be considered
bad style even though it is not strictly illegal.) A
disadvantage is that the programmer has little control over
those nonfatal warnings; some programmers also resent not having
control over the sizes of enumeration variables.

References: K&R2 Sec. 2.3 p. 39, Sec. A4.2 p. 196; ANSI
Sec. 3.1.2.5, Sec. 3.5.2, Sec. 3.5.2.2, Appendix E; ISO
Sec. 6.1.2.5, Sec. 6.5.2, Sec. 6.5.2.2, Annex F; H&S Sec. 5.5
pp. 127-9, Sec. 5.11.2 p. 153.

2.24: Is there an easy way to print enumeration values symbolically?

A: No. You can write a little function to map an enumeration
constant to a string. (If all you're worried about is
debugging, a good debugger should automatically print
enumeration constants symbolically.)
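Such a function might look like this sketch. The enum color type and color_name() are hypothetical:

```c
enum color { RED, GREEN, BLUE };

/* Map an enumeration constant to its name as a string. */
const char *color_name(enum color c)
{
    switch (c) {
    case RED:   return "RED";
    case GREEN: return "GREEN";
    case BLUE:  return "BLUE";
    }
    return "(unknown)";     /* value outside the declared constants */
}
```

The function has to be kept in step with the enumeration by hand whenever a constant is added or renamed.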
