Thursday, August 2, 2007

Expressions

Section 3. Expressions

3.1: Why doesn't this code:

a[i] = i++;

work?

A: The subexpression i++ causes a side effect -- it modifies i's
value -- which leads to undefined behavior since i is also
referenced elsewhere in the same expression. (Note that
although the language in K&R suggests that the behavior of this
expression is unspecified, the C Standard makes the stronger
statement that it is undefined -- see question 11.33.)
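
Once the intent is known, the usual repair is to move the
modification of i into a statement of its own. A minimal sketch,
assuming the intent was to store the old value of i in the cell it
indexes and then advance (the helper name store_and_advance is
hypothetical, not from the FAQ):

```c
#include <stddef.h>

/* Hypothetical helper: stores the current index in a[i], then
 * advances i.  Each statement modifies exactly one object, so the
 * behavior is well defined. */
size_t store_and_advance(int *a, size_t i)
{
    a[i] = (int)i;   /* only the old value of i is used here */
    i++;             /* the increment is a separate full expression */
    return i;        /* the new index */
}
```

If the opposite intent was meant (index with the new value), the
two statements would simply be swapped.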

References: K&R1 Sec. 2.12; K&R2 Sec. 2.12; ANSI Sec. 3.3; ISO
Sec. 6.3.

3.2: Under my compiler, the code

int i = 7;
printf("%d\n", i++ * i++);

prints 49. Regardless of the order of evaluation, shouldn't it
print 56?

A: Although the postincrement and postdecrement operators ++ and --
perform their operations after yielding the former value, the
implication of "after" is often misunderstood. It is *not*
guaranteed that an increment or decrement is performed
immediately after giving up the previous value and before any
other part of the expression is evaluated. It is merely
guaranteed that the update will be performed sometime before the
expression is considered "finished" (before the next "sequence
point," in ANSI C's terminology; see question 3.8). In the
example, the compiler chose to multiply the previous value by
itself and to perform both increments afterwards.

The behavior of code which contains multiple, ambiguous side
effects has always been undefined. (Loosely speaking, by
"multiple, ambiguous side effects" we mean any combination of
++, --, =, +=, -=, etc. in a single expression which causes the
same object either to be modified twice or modified and then
inspected. This is a rough definition; see question 3.8 for a
precise one, and question 11.33 for the meaning of "undefined.")
Don't even try to find out how your compiler implements such
things (contrary to the ill-advised exercises in many C
textbooks); as K&R wisely point out, "if you don't know *how*
they are done on various machines, that innocence may help to
protect you."
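
If the result a reader expects really is 56 with i advanced twice,
it can be spelled out so that i is modified only once, in its own
statement. This is a sketch under that assumption; the function
name product_then_advance is made up for illustration:

```c
/* Returns old * (old + 1) and leaves *i advanced by 2.
 * With *i == 7 on entry this computes 7 * 8. */
int product_then_advance(int *i)
{
    int first = *i;        /* value the first i++ would yield */
    int second = *i + 1;   /* value the second would yield if the
                              first increment had already happened */
    *i += 2;               /* single modification, own statement */
    return first * second;
}
```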

References: K&R1 Sec. 2.12 p. 50; K&R2 Sec. 2.12 p. 54; ANSI
Sec. 3.3; ISO Sec. 6.3; CT&P Sec. 3.7 p. 47; PCS Sec. 9.5 pp.
120-1.

3.3: I've experimented with the code

[CENSORED]

on several compilers. Some gave i the value 3, some gave 4, but
one gave 7. I know the behavior is undefined, but how could it
give 7?

A: [I apologize for the censorship of the question, but the
expression that used to be there was indecent, and by the
newly-passed Communications Decency Act of the U.S., I am
prohibited from transmitting "indecent" material, whatever that
is. Suffice it to say that the expression tried to modify the
same variable twice between sequence points. --scs]

Undefined behavior means *anything* can happen. See questions
3.9 and 11.33. (Also, note that neither i++ nor ++i is the same
as i+1. If you want to increment i, use i=i+1 or i++ or ++i,
not some combination. See also question 3.12.)

3.4: Can I use explicit parentheses to force the order of evaluation
I want? Even if I don't, doesn't precedence dictate it?

A: Not in general.

Operator precedence and explicit parentheses impose only a
partial ordering on the evaluation of an expression. In the
expression

f() + g() * h()

although we know that the multiplication will happen before the
addition, there is no telling which of the three functions will
be called first.

When you need to ensure the order of subexpression evaluation,
you may need to use explicit temporary variables and separate
statements.
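
As a sketch of the temporary-variable approach, the following
forces the calls to happen in the order f, g, h. The bodies of
f(), g() and h() and the call log are hypothetical stand-ins added
only so the order can be observed:

```c
#include <stddef.h>

static char call_log[4];   /* records call order; stays NUL-terminated */
static size_t call_count;

static int note(char name, int value)
{
    if (call_count < sizeof call_log - 1)
        call_log[call_count++] = name;
    return value;
}

/* Hypothetical stand-ins for the FAQ's f(), g() and h() */
static int f(void) { return note('f', 2); }
static int g(void) { return note('g', 3); }
static int h(void) { return note('h', 4); }

int ordered_sum(void)
{
    int fv = f();          /* each declaration's initializer is a full */
    int gv = g();          /* expression, so the calls are sequenced   */
    int hv = h();          /* in the order written                     */
    return fv + gv * hv;   /* no side effects left to reorder */
}

const char *calls_so_far(void)
{
    return call_log;
}
```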

References: K&R1 Sec. 2.12 p. 49, Sec. A.7 p. 185; K&R2
Sec. 2.12 pp. 52-3, Sec. A.7 p. 200.

3.5: But what about the && and || operators?
I see code like "while((c = getchar()) != EOF && c != '\n')" ...

A: There is a special exception for those operators (as well as the
?: operator): left-to-right evaluation is guaranteed (as is an
intermediate sequence point, see question 3.8). Any book on C
should make this clear.
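
Two small illustrations of relying on that guarantee follow. Both
functions are hypothetical examples; the first mirrors the getchar
loop in the question but reads from a string so that it is easy to
try out:

```c
#include <stddef.h>

/* Length of the first line of s, not counting the newline.
 * The left operand of && (including the assignment to c) is
 * completely evaluated before the right operand is looked at. */
size_t first_line_length(const char *s)
{
    size_t n = 0;
    int c;

    while ((c = (unsigned char)s[n]) != '\0' && c != '\n')
        n++;
    return n;
}

/* Returns 1 if p is non-null and points at a nonzero value.
 * *p is never evaluated when p is a null pointer. */
int points_at_nonzero(const int *p)
{
    return p != NULL && *p != 0;
}
```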

References: K&R1 Sec. 2.6 p. 38, Secs. A7.11-12 pp. 190-1; K&R2
Sec. 2.6 p. 41, Secs. A7.14-15 pp. 207-8; ANSI Sec. 3.3.13,
Sec. 3.3.14, Sec. 3.3.15; ISO Sec. 6.3.13, Sec. 6.3.14,
Sec. 6.3.15; H&S Sec. 7.7 pp. 217-8, Sec. 7.8 pp. 218-20,
Sec. 7.12.1 p. 229; CT&P Sec. 3.7 pp. 46-7.

3.8: How can I understand these complex expressions? What's a
"sequence point"?

A: A sequence point is a point (at the end of a full expression;
after the first operand of the ||, &&, ?:, or comma operators;
or just before a function is called, once its arguments have
been evaluated) at which the dust has settled and all side
effects so far are guaranteed to be complete. The ANSI/ISO C
Standard states that

Between the previous and next sequence point an
object shall have its stored value modified at
most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed
only to determine the value to be stored.

The second sentence can be difficult to understand. It says
that if an object is written to within a full expression, any
and all accesses to it within the same expression must be for
the purposes of computing the value to be written. This rule
effectively constrains legal expressions to those in which the
accesses demonstrably precede the modification.
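
A few expressions that satisfy the rule, as a sketch (the function
names are invented for illustration; an expression such as
a[i++] = i, by contrast, would break the second sentence because
i is read for a purpose other than computing its new value):

```c
int increment(int i)
{
    i = i + 1;   /* i is read only to compute the value stored in i */
    return i;
}

/* Copies src[0..n-1] into dst and returns the number copied. */
int copy_ints(int *dst, const int *src, int n)
{
    int i = 0, j = 0;

    while (j < n)
        dst[i++] = src[j++];   /* i and j are each modified once, and
                                  each is read only by its own ++ */
    return i;
}

int comma_sequenced(void)
{
    int i = 0, k;

    k = (i++, i + 10);   /* the comma operator is a sequence point,
                            so the increment is complete before
                            i + 10 is evaluated */
    return k;
}
```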

See also question 3.9 below.

References: ANSI Sec. 2.1.2.3, Sec. 3.3, Appendix B; ISO
Sec. 5.1.2.3, Sec. 6.3, Annex C; Rationale Sec. 2.1.2.3; H&S
Sec. 7.12.1 pp. 228-9.

3.9: So given

a[i] = i++;

we don't know which cell of a[] gets written to, but i does get
incremented by one.

A: *No.* Once an expression or program becomes undefined, *all*
aspects of it become undefined. See questions 3.2, 3.3, 11.33,
and 11.35.

3.12: If I'm not using the value of the expression, should I use i++
or ++i to increment a variable?

A: Since the two forms differ only in the value yielded, they are
entirely equivalent when only their side effect is needed.
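
For instance, the two loops below differ only in the form of the
increment and compute the same sum (a small sketch; the function
names are hypothetical):

```c
/* Sum of 0..n-1, stepping with the postfix form. */
int sum_postfix(int n)
{
    int i, s = 0;

    for (i = 0; i < n; i++)   /* value of i++ is discarded */
        s += i;
    return s;
}

/* Same sum, stepping with the prefix form. */
int sum_prefix(int n)
{
    int i, s = 0;

    for (i = 0; i < n; ++i)   /* value of ++i is discarded */
        s += i;
    return s;
}
```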

See also question 3.3.

References: K&R1 Sec. 2.8 p. 43; K&R2 Sec. 2.8 p. 47; ANSI
Sec. 3.3.2.4, Sec. 3.3.3.1; ISO Sec. 6.3.2.4, Sec. 6.3.3.1; H&S
Sec. 7.4.4 pp. 192-3, Sec. 7.5.8 pp. 199-200.


3.14: Why doesn't the code

int a = 1000, b = 1000;
long int c = a * b;

work?

A: Both operands have type int, so the multiplication is carried
out using int arithmetic, and the result may overflow (as
1000 * 1000 does where int is 16 bits) before being converted
and assigned to the long int left-hand side. Use an explicit
cast to force long arithmetic:

long int c = (long int)a * b;

Note that (long int)(a * b) would *not* have the desired effect.

A similar problem can arise when two integers are divided, with
the result assigned to a floating-point variable.
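
Both cases can be sketched as follows; the function names are
hypothetical. In each, the cast applies to one operand, and the
other operand is then converted to match before the operation is
carried out:

```c
/* Multiplies in long arithmetic: a is converted by the cast and
 * b follows under the usual arithmetic conversions. */
long product_as_long(int a, int b)
{
    return (long)a * b;
}

/* Integer division discards the fraction *before* the result is
 * converted to double. */
double quotient_truncated(int a, int b)
{
    return a / b;
}

/* Converting one operand first makes the division floating-point. */
double quotient_exact(int a, int b)
{
    return (double)a / b;
}
```

Where long is no wider than int, the cast in product_as_long
gains nothing, and a wider type would be needed.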

References: K&R1 Sec. 2.7 p. 41; K&R2 Sec. 2.7 p. 44; ANSI
Sec. 3.2.1.5; ISO Sec. 6.2.1.5; H&S Sec. 6.3.4 p. 176; CT&P
Sec. 3.9 pp. 49-50.

3.16: I have a complicated expression which I have to assign to one of
two variables, depending on a condition. Can I use code like
this?

((condition) ? a : b) = complicated_expression;

A: No. The ?: operator, like most operators, yields a value, and
you can't assign to a value. (In other words, ?: does not yield
an "lvalue".) If you really want to, you can try something like

*((condition) ? &a : &b) = complicated_expression;

although this is admittedly not as pretty.
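
The pointer form, next to the plainer if/else it replaces, might
look like this (a sketch with hypothetical function names; value
stands in for complicated_expression):

```c
/* Assigns value to *a when condition is nonzero, else to *b.
 * ?: selects between two pointer values, and the dereference
 * of the selected pointer is an lvalue. */
void assign_selected(int condition, int *a, int *b, int value)
{
    *(condition ? a : b) = value;
}

/* The same effect written out with if/else. */
void assign_with_if(int condition, int *a, int *b, int value)
{
    if (condition)
        *a = value;
    else
        *b = value;
}
```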

References: ANSI Sec. 3.3.15 esp. footnote 50; ISO Sec. 6.3.15;
H&S Sec. 7.1 pp. 179-180.
