A compiler must check that the source program follows both the syntactic and semantic
conventions of the source language. This checking, called static checking, ensures
that certain kinds of programming errors will be detected and reported.
A compiler should report an error if an
operator is applied to an incompatible operand; for example, if an array
variable and a function variable are added together.
Statements that cause flow of
control to leave a construct must have some place to which to transfer
to transfer the flow of control. For example, a break statement in C
causes control to leave the smallest enclosing while, for, or switch
statement; an error occurs if such an enclosing statement doesn’t exist.
There are situations in which an object must be defined exactly once.
For example, in Pascal, an identifier must be declared uniquely, labels
in a case statement must be distinct, and elements in a scalar type may
not be repeated.
Sometimes, the same name must appear
two or more times. For example, in Ada, a loop or block may have a name
that appears at the beginning and end of the construct. The compiler
must check that the same name is used at both places.
A
type checker verifies that the type of a construct matches that expected by its
context. For example, the built-in arithmetic operator mod in Pascal requires
integer operands, so a type checker must verify that the operands of mod have
type integer. Similarly, the type checker must verify that dereferencing is
applied only to a pointer, that indexing is done only on an array, that a
user-defined function is applied to the correct number and type of arguments,
and so forth.
Type information gathered by a type
checker may be needed when code is generated. For example, arithmetic operators
like
+ usually apply to either integers
or reals, perhaps to other types, and we have to look at the context of + to
determine the sense that is intended. A symbol that can represent different
operations in different contexts is said to be” overloaded”. Overloading may be
accompanied by coercion of types, where a compiler supplies an operator to
convert an operand into the type checked by the context.
The following excerpts from the Pascal report and the C
reference manual, respectively, are examples of information that a compiler
writer might have to start with.
. “ If both operands of the arithmetic operators of
addition, subtraction and multiplication are of type integer, then the result
is of type integer.”
. “The result of
unary & operator is a pointer to the object referred to by the operand. If
the type of the operand is ‘….’, the type of the result is ‘pointer to….’ ”
In both Pascal and C, types are either basic or constructed. Basic types are atomic types with no internal structure as
far as the programmer is concerned. In Pascal, the basic types are boolean, character, integer, and real. Subrange types, like 1..10,and enumerated types,
like(violet, indigo, blue, green, yellow, orange, red) Can be treated as basic types. Pascal allows a programmer to
construct types from the basic types and other constructed types, with
arrays, records, and sets being examples. In addition, pointers and functions
can also be treated as constructed types.
TYPE
EXPRESSIONS
The
type of a language construct will be denoted by a “type expression”.
Informally, a type expression is either a basic type or is formed by applying
an operator called a type constructor to other type expressions. The sets of
basic types and constructors depend on the language to be checked.
DEFINITION
OF TYPE EXPRESSION
1)
A basic type is a type expression. Among the basic types are Boolean, char,
integer, and real. A special basic type, type_error, will signal an error
during type checking. Finally, a basic type void denoting “the absence of a
value” allows statements to be checked.
2)
Since type expressions may be named, a type name is a type expression. An
example of the use of type names appears in © below.
3)
A type constructor applied to type expressions is a type expression. Constructors
include:
a)
Arrays. If T is a type expression, then array (I, T)
is a type expression denoting the type of an array with elements of type T and
index set I. I is often a range of integers.
For example, the Pascal declaration
` var A: array [1.. 10] of
integer;
associates the type expression array
(1.. 10,integer) with A.
b)
Products. If T1and T2 are
type expressions, then their Cartesian product T1´T2
is a type expression. We assume that ´ associates to the left.
c)
Records. The difference between a record and a product is that
the fields of a record have names. The record type constructor will be applied
to a tuple formed from field names and field types. (Technically, the field
names should be part of the type constructor, but it is convenient to keep
field names together with their associated types.)
For
example, the Pascal program fragment
type
row = record
address: integer;
lexeme: array [1…15]
of char
end;
var table:
array [1…. 101] of row;
declares
the type name row representing the type expression
record ((address´integer)´(lexeme´array (1…15,char)))
and
the variable table to be an array of records of this type.
d)
Pointers. If T is a type
expression, then pointer (T) is a type expression denoting the type
“Pointer to an object of type T.”
e)
Functions. Mathematically, a function maps elements of one
set, the domain, to another set, the range. We may treat funtions in
programming languages as mapping a domain type D to a range type
R. The type of such a function will be defined by the type expression D → R
For
example, the built in function mod of Pascal has domain type int´int,
ie. a pair of integers ,and a range type
int.Thus, we say mod has type1
int
´ int
int
As
another example ,the Pascal declaration
Function f (a, b: char): integer;....
Says
that the domain type of f is denoted by char´ char and range type
by pointer (integer). The type of f is thus denoted by the type expression
Char´ char pointer (integer)
Often,
for implementation reasons, there are limitations on the type that a function
may return; e.g, no arrays or functions may be returned. However, there are
languages, if which Lisp is the most prominent example, that allow functions to
return objects of arbitrary types, so, for example, we can define a function
g of type
(integer
integer)
(integer
integer)
That
is, g takes as argument a function that maps an integer to an integer
and g produces as a result another function of the same type.
4)
Type expressions may contain variables whose values are type expressions
TYPE SYSTEMS
A
type system is a collection of rules for assigning type expressions the various
parts of the program. A type checker implements a type system. Different type
systems may be used by different compilers or processors of the same language
.For example, in Pascal, the type of an array includes the index set of the
array, so a function with an array argument can only be applied to arrays with
that index set. Many Pascal compilers however allow the index set to be left unspecified
when an array is passed as an argument. Thus these compilers use different type
systems than that in the Pascal language definition. Similarly, in the UNIX
system, the lint command examines C programs for possible bugs using a more
detailed type system than the C compiler itself uses.
STATIC AND DYNAMIC CHECKING OF TYPES
Checking
done by a compiler is said to be static, while checking done when the target
program runs is termed dynamic. In principle, any check can be done dynamically,
if theTarget code carries the type of an element along with the value of the
element.
A
sound type system eliminate the need for dynamic checking for type errors
because it allows us to determine statically that these errors cannot occur
when the target program runs. That is, if a sound type system assigns a type
other than type_error to a program part, then type errors cannot occur when the
target code for the program part is run. A language is strongly typed if its
compile can guarantee that the program it accepts will execute without type
errors.
In
practice, some checks can be done only dynamically, For example, if we first
declare
table:
array [0.. 255] Of char;
i:
integer
And
then compute table [i], a compiler cannot in general guarantee that during
execution, the value of I will lie in the range 0 to 255.
ERROR
RECOVERY
Since
type checking has the potential for catching errors in the programs, it is
important for a type checker to do something reasonable when an error is
discovered .At the very least, the compiler must report the nature and location
of the error. It is desirable for the type checker to recover from the errors,
so it can check the rest of the input. Since error handling affects the type
checking rules, it has to be designed into the system right from the start; the
rules must be prepared to cope with errors.
The
inclusion of error handling may result in a type system that goes beyond the
one needed to specify correct programs. For example once an error has occurred,
we may not know the type of the incorrectly formed program fragment. Coping
with missing information requires techniques similar to those needed for
languages that do not require identifiers to be declared before they are used.
Type variables can be used to ensure consistent usage of undeclared or
apparently misdeclared identifiers.