Author : Ian Joyner
outside world to see. The ()
operator is another bookkeeping task imposed on the C programmer. Pure
functional languages such as SML remove the variable/function
distinction altogether, by not having variables at all.
The removal of the variable/function distinction would remove the
need for a common use of C++'s inline functions. Inlines clutter the
name space of a class and add work for the programmer. All that is
required is to directly export a data member as a function.
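The common inline use referred to above might look like the following C++ sketch. The class Account and its members are hypothetical names chosen for illustration. The data member is hidden, and a one-line inline function exists only to export its value, so clients must write balance() rather than balance.

```cpp
// Hypothetical class: 'Account', 'balance_' and 'balance()' are
// illustrative names, not taken from the text.
class Account {
public:
    explicit Account(long initial) : balance_(initial) {}

    // Inline accessor written only to export the data member read-only.
    long balance() const { return balance_; }

    void deposit(long amount) { balance_ += amount; }

private:
    long balance_;  // hidden from clients; readable only via balance()
};
```

The accessor adds a second name (balance_ and balance) for what is conceptually one feature of the class.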
C also has pointers to functions. Function pointers are analogous to
the call by name facility in ALGOL, and this was recognised as having
pitfalls. Consistent application of the object-oriented paradigm avoids
these pitfalls. A common use of function pointers is to explicitly set
up jump tables. The mechanism behind virtual functions is a jump table
of function pointers. The design of a program can take advantage of
this fact, without resorting to explicit jump tables. Another use is to
jump to a function in a table that is indexed by an input character. A
switch statement caters for this case in a way that makes what is meant
explicit, while keeping the underlying mechanisms (and possible
optimisations) transparent. C++ also allows pointers to member
functions to be stored in tables and called via the .* and ->* operators.
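As a rough illustration of the three approaches mentioned above, the following C++ sketch shows an explicit jump table, the same dispatch expressed with virtual functions, and a switch on an input character. All names (op_add, jump_table, Operation and so on) are assumptions made for the example.

```cpp
#include <cstddef>

// 1. Explicit jump table: the programmer builds and indexes the table.
int op_add(int a, int b) { return a + b; }
int op_sub(int a, int b) { return a - b; }

typedef int (*BinaryOp)(int, int);
static const BinaryOp jump_table[] = { op_add, op_sub };

int apply_by_table(std::size_t index, int a, int b) {
    return jump_table[index](a, b);  // nothing checks that index is in range
}

// 2. Virtual functions: the compiler typically builds an equivalent
//    table itself, so the program never indexes one directly.
struct Operation {
    virtual ~Operation() {}
    virtual int apply(int a, int b) const = 0;
};
struct Add : Operation { int apply(int a, int b) const { return a + b; } };
struct Sub : Operation { int apply(int a, int b) const { return a - b; } };

// 3. Switch on an input character: the intent is explicit and the
//    choice of mechanism is left to the compiler.
int apply_by_char(char c, int a, int b) {
    switch (c) {
        case '+': return a + b;
        case '-': return a - b;
        default:  return 0;  // unknown operator: this sketch returns 0
    }
}
```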
6.7. Metadata in Strings
The implementation of strings in C mixes metadata with data.
Metadata is data about an object, but is not part of the data itself.
Examples of metadata are addresses, size and type information. Such
metadata is often referred to as data descriptors, and can be kept
independently of the data, with the advantage that the programmer cannot
mistakenly corrupt the metadata.
In C strings, metadata about where a string terminates is stored in
the data as a terminating byte. This means that the distinction between
data and metadata is lost. The value chosen as the terminator cannot
occur in the data itself. The common alternative implementation is to
store a length byte in a fixed location preceding the string. This
length metadata can be hidden from the programmer who does not need to
know where the length metadata is stored. This implementation also has
the advantage that the length of a string can be easily obtained,
without having to count the number of elements up to the terminating
null.
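The difference can be sketched in C++ using std::string, which keeps its length separately from the character data and so stands in here for the length-prefixed implementation described above. The function names are hypothetical. The data deliberately contains the byte value that C uses as its terminator.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Five bytes of data, one of which is the value C uses as a terminator.
std::string make_data() {
    return std::string("ab\0cd", 5);  // length is supplied explicitly
}

// Length read from the separately held metadata: no scan of the data.
std::size_t descriptor_length(const std::string& s) {
    return s.size();
}

// Length found the C way, by counting up to the first terminating byte.
std::size_t terminator_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

With these definitions descriptor_length(make_data()) is 5, while terminator_length(make_data()) is 2, because the scan stops at the embedded zero byte.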
6.8. ++, --
The increment and decrement operators are often used as an example
that C was designed as a high level assembler for PDP machines. These
operators provide a shorthand convenience, but are unnecessary. There
are no fewer than four ways to perform the same thing -
a = a + 1
a += 1
a++
++a
For full generality, only the first form is required; the others are
a mere convenience. The last two forms a++ and ++a are the postfix and
prefix forms. They are often used in the context of another expression.
Thus several updates can be performed in one expression. This is a very
powerful and convenient feature, but introduces side effects into an
expression that sometimes have surprising effects, and can lead to
program errors. The following example is given on p.46 of the C++ ARM -
i = v[i++]; // the value of 'i' is undefined
The ARM points out that compilers should detect such cases, but the
exact interpretation appears to be left to the implementation, which
contributes to non-portability. If this can't be defined for a
sequential processor, then it is even worse for a concurrent
environment.
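A small C++ sketch of the points above follows; all function names are hypothetical. The four forms leave the same value in the variable and differ only in the value they yield inside a larger expression. The last function shows one way to express a read followed by an increment with one update per statement, so that no question of evaluation order arises.

```cpp
// All four forms leave the same value in the variable.
int by_assignment(int a) { a = a + 1; return a; }
int by_compound(int a)   { a += 1;    return a; }
int by_postfix(int a)    { a++;       return a; }
int by_prefix(int a)     { ++a;       return a; }

// The last two differ only when used inside another expression.
int postfix_value(int a) { return a++; }  // yields the old value
int prefix_value(int a)  { return ++a; }  // yields the new value

// One update per statement: read the element, then advance the index.
int read_then_advance(const int* v, int& i) {
    int value = v[i];
    i = i + 1;
    return value;
}
```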
The shorthand += and -= are more powerful, as the variable can be changed
by values other than 1. It has been suggested that there should also be
&&= and ||= operators.
If it is mistakenly believed that a multiplicity of operators is
required to produce more optimal code, then it should be pointed out
that code generators, especially for expressions, can produce the best
code for a target architecture. A plethora of operators complicates the
task of an optimiser. A compiler can optimise well beyond what a
programmer can do. An optimising compiler will analyse the surrounding
code, and if an entity is used several times in a local scope, it will
keep the value of that entity handy locally at the top of a stack, or in
a register, rather than retrieve it from slow main memory several times.
The nature of such optimisations depends on the machine's architecture,
which a programmer should not have to be aware of. Open systems demand
that programs can be ported amongst diverse architectures and
environments, very different to the original machine, and not only run,
but run efficiently. Optimisers work best with simple, well defined
languages.
In fact constructs such as:
while (*s1++ = *s2++);
might look optimal to C programmers, but are the antithesis of
efficiency. Such constructs preclude compiler optimisation for
processors with specific string handling instructions. A simple
assignment is better for strings, as it will allow the compiler to
generate optimal code for different target platforms. String assignment
will also hide the implementation details of strings. If the target
processor does not have string instructions, then the compiler should be
responsible for generating the above loop code, rather than requiring
the programmer to write such low level constructs. The above loop
construct for string copying is also contrary to safety, as there is
no check that the destination does not overflow. The above code also
makes explicit the underlying C implementation of strings, which are null
terminated. Such examples show why C cannot be regarded as a high level
language, but rather as a high level assembler.
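The contrast can be sketched in C++ as follows. The function names are hypothetical, and copy_bounded is only one possible way of adding the missing overflow check. It is not something the text prescribes.

```cpp
#include <cstddef>
#include <string>

// The loop from the text, wrapped in a function. The caller must
// guarantee that s1 points to enough space; nothing here checks it.
void copy_unchecked(char* s1, const char* s2) {
    while ((*s1++ = *s2++) != '\0') {}
}

// A bounded variant: copies at most dst_size - 1 characters and always
// terminates the destination. Returns true if the whole string fitted.
bool copy_bounded(char* dst, std::size_t dst_size, const char* src) {
    if (dst_size == 0) return false;
    std::size_t i = 0;
    while (i + 1 < dst_size && src[i] != '\0') {
        dst[i] = src[i];
        ++i;
    }
    dst[i] = '\0';
    return src[i] == '\0';
}

// Simple assignment: how the characters are stored and copied is left
// to the string implementation and the compiler.
std::string copy_by_assignment(const std::string& src) {
    std::string dst;
    dst = src;
    return dst;
}
```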
As with name overloading, memory storage update is a problematic but
necessary part of programming. A language should provide it in a
consistent and expected way. Many languages recognise that memory
update is problematic, and typically only provide limited but sufficient
ways of updating, by an assignment operation. (Many languages have
block memory copies as well, but assignment can also provide block
copy.) Furthermore, many languages avoid side-effects by limiting
updates to only one per statement. C provides too many ways to update
memory. These add nothing to the generality of the language, increase
the opportunity for error, and complicate automatic optimisation.
Restrictive practices are justifiable in order to accomplish correctly
functioning and efficient software.
6.9. Defines
The define declaration -
#define d(<parameters>)
has a different effect to -
#define d (<parameters>)
The second form defines d as (<parameters>). Extra white space
between tokens should not affect semantics of constructs.
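The effect of the extra space can be seen in a short sketch. The macro names TWICE and PAREN_X are hypothetical. In the first definition x is a parameter. In the second, (x) is simply the replacement text, so it refers to whatever x happens to be in scope where the macro is used.

```cpp
// Function-like macro: no space before '(', so x is a parameter.
#define TWICE(x) ((x) + (x))

// Object-like macro: the space makes "(x)" the replacement text.
#define PAREN_X (x)

int use_function_like() {
    return TWICE(3);   // expands to ((3) + (3))
}

int use_object_like() {
    int x = 5;
    return PAREN_X;    // expands to (x), i.e. the local variable x
}
```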
#defines are poorly integrated with the language. In Classic C the '#define'
must be in column 1, and the preprocessor knows nothing about scope rules. Errors in defines
can lead to obscure errors, as the preprocessor does not detect them,
but leaves them for the compiler. Programmers must be familiar with the
particular preprocessor implementation on their system, as preprocessor
implementations are different, particularly between Classic C and ANSI
C.
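The lack of scoping can be illustrated with a minimal sketch. The names LIMIT, first and second are hypothetical. The define is written inside one function body, yet remains in effect for everything that follows it in the file.

```cpp
int first() {
#define LIMIT 10       // written inside first(), but not scoped to it
    return LIMIT;
}

int second() {
    return LIMIT;      // still defined: the preprocessor ignores the braces
}
```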
6.10. NULL vs 0
[Ellemtel 92] recommends that pointers should be compared with, or
assigned, 0 rather than NULL. Stylistically, NULL would be preferable.
It would also allow for environments where null pointers have a value
other than 0. ANSI-C, however, has subtle problems with the definition
of NULL.
6.11. Case Distinction
It is good to adopt typographic conventions for names, but
distinguishing between upper and lower case in names can cause
confusion. Confusion leads to errors and systems that are difficult to
maintain and modify. Case distinction is based on the implementation
paradigm of how character codes work. Why do we have names? To give
entities identity, and aid our memory of that identity.
Philosophically, case distinction is contrary to the fundamental purpose
of names.
Case distinction in interactive systems is a poor user interface. It
is clumsy having to continually use the shift key, and will slow a good
typist. More importantly, case distinction makes names harder to
remember, and so is contrary to the purpose of aiding memory. It is
difficult enough for users to remember command mnemonics or file names,
let alone their exact case. Names are used instead of
hard-to-remember addresses. If we did not have names, we would have to retrieve
files by addresses, or call people by their social security number.
Consider the paradigm of letters and words. Words are spelt by
assembling letters in order. There are 26 distinct letters. With the
addition of digits 0 to 9, and the underscore character, we have a
complete and correct definition for identifiers. Letters can be written
in a number of styles. They can be bold, italic, upper or lower case.
Such typographic representations, however, do not change the semantics
of a word. Thus if we write ALGOL, Algol or algol, we