Computer Tech News: Type Checking in Compiler

TYPE CHECKING

A compiler must check that the source program follows both the syntactic and semantic conventions of the source language. This checking, called static checking, ensures that certain kinds of programming errors will be detected and reported.

Examples of static checks include:

Type checks: A compiler should report an error if an operator is applied to an incompatible operand; for example, if an array variable and a function variable are added together.
Flow-of-control checks: Statements that cause flow of control to leave a construct must have some place to which to transfer to transfer the flow of control. For example, a break statement in C causes control to leave the smallest enclosing while, for, or switch statement; an error occurs if such an enclosing statement doesn’t exist.

Uniqueness checks: There are situations in which an object must be defined exactly once. For example, in Pascal, an identifier must be declared uniquely, labels in a case statement must be distinct, and elements in a scalar type may not be repeated.
Name-related checks: Sometimes, the same name must appear two or more times. For example, in Ada, a loop or block may have a name that appears at the beginning and end of the construct. The compiler must check that the same name is used at both places.

A type checker verifies that the type of a construct matches that expected by its context. For example, the built-in arithmetic operator mod in Pascal requires integer operands, so a type checker must verify that the operands of mod have type integer. Similarly, the type checker must verify that dereferencing is applied only to a pointer, that indexing is done only on an array, that a user-defined function is applied to the correct number and type of arguments, and so forth.

Type information gathered by a type checker may be needed when code is generated. For example, arithmetic operators like + usually apply to either integers or reals, perhaps to other types, and we have to look at the context of + to determine the sense that is intended. A symbol that can represent different operations in different contexts is said to be” overloaded”. Overloading may be accompanied by coercion of types, where a compiler supplies an operator to convert an operand into the type checked by the context.

TYPE SYSTEMS

The design of a type checker for a language is based on information about the syntactic constructs in the language, the notion of types, and the rules for assigning types to language constructs.

The following excerpts from the Pascal report and the C reference manual, respectively, are examples of information that a compiler writer might have to start with.

. “ If both operands of the arithmetic operators of addition, subtraction and multiplication are of type integer, then the result is of type integer.”

. “The result of unary & operator is a pointer to the object referred to by the operand. If the type of the operand is ‘….’, the type of the result is ‘pointer to….’ ”

Implicit in the above excerpts is the idea that each expression has a type associated with it. Types have structure, the type “pointer to …” is constructed from the type that “….”refers to.

In both Pascal and C, types are either basic or constructed. Basic types are atomic types with no internal structure as far as the programmer is concerned. In Pascal, the basic types are boolean, character, integer, and real. Subrange types, like 1..10,and enumerated types, like(violet, indigo, blue, green, yellow, orange, red) Can be treated as basic types. Pascal allows a programmer to construct types from the basic types and other constructed types, with arrays, records, and sets being examples. In addition, pointers and functions can also be treated as constructed types.

TYPE EXPRESSIONS

The type of a language construct will be denoted by a “type expression”. Informally, a type expression is either a basic type or is formed by applying an operator called a type constructor to other type expressions. The sets of basic types and constructors depend on the language to be checked.

DEFINITION OF TYPE EXPRESSION

1) A basic type is a type expression. Among the basic types are Boolean, char, integer, and real. A special basic type, type_error, will signal an error during type checking. Finally, a basic type void denoting “the absence of a value” allows statements to be checked.

2) Since type expressions may be named, a type name is a type expression. An example of the use of type names appears in © below.

3) A type constructor applied to type expressions is a type expression. Constructors include:

a) Arrays. If T is a type expression, then array (I, T) is a type expression denoting the type of an array with elements of type T and index set I. I is often a range of integers.

For example, the Pascal declaration

` var A: array [1.. 10] of integer;

associates the type expression array (1.. 10,integer) with A.

b) Products. If T₁and T₂ are type expressions, then their Cartesian product T₁´T₂ is a type expression. We assume that ´ associates to the left.

c) Records. The difference between a record and a product is that the fields of a record have names. The record type constructor will be applied to a tuple formed from field names and field types. (Technically, the field names should be part of the type constructor, but it is convenient to keep field names together with their associated types.)

For example, the Pascal program fragment

type row = record

address: integer;

lexeme: array [1…15] of char

end;

var table: array [1…. 101] of row;

declares the type name row representing the type expression

record ((address´integer)´(lexeme´array (1…15,char)))

and the variable table to be an array of records of this type.

d) Pointers. If T is a type expression, then pointer (T) is a type expression denoting the type “Pointer to an object of type T.”

e) Functions. Mathematically, a function maps elements of one set, the domain, to another set, the range. We may treat funtions in programming languages as mapping a domain type D to a range type R. The type of such a function will be defined by the type expression D → R

For example, the built in function mod of Pascal has domain type int´int, ie. a pair of integers ,and a range type int.Thus, we say mod has type¹

int ´ int

int

As another example ,the Pascal declaration

Function f (a, b: char): integer;....

Says that the domain type of f is denoted by char´ char and range type by pointer (integer). The type of f is thus denoted by the type expression

Char´ char

pointer (integer)

Often, for implementation reasons, there are limitations on the type that a function may return; e.g, no arrays or functions may be returned. However, there are languages, if which Lisp is the most prominent example, that allow functions to return objects of arbitrary types, so, for example, we can define a function g of type

(integer

integer)

(integer

integer)

That is, g takes as argument a function that maps an integer to an integer and g produces as a result another function of the same type.

4) Type expressions may contain variables whose values are type expressions

TYPE SYSTEMS

A type system is a collection of rules for assigning type expressions the various parts of the program. A type checker implements a type system. Different type systems may be used by different compilers or processors of the same language .For example, in Pascal, the type of an array includes the index set of the array, so a function with an array argument can only be applied to arrays with that index set. Many Pascal compilers however allow the index set to be left unspecified when an array is passed as an argument. Thus these compilers use different type systems than that in the Pascal language definition. Similarly, in the UNIX system, the lint command examines C programs for possible bugs using a more detailed type system than the C compiler itself uses.

STATIC AND DYNAMIC CHECKING OF TYPES

Checking done by a compiler is said to be static, while checking done when the target program runs is termed dynamic. In principle, any check can be done dynamically, if theTarget code carries the type of an element along with the value of the element.

A sound type system eliminate the need for dynamic checking for type errors because it allows us to determine statically that these errors cannot occur when the target program runs. That is, if a sound type system assigns a type other than type_error to a program part, then type errors cannot occur when the target code for the program part is run. A language is strongly typed if its compile can guarantee that the program it accepts will execute without type errors.

In practice, some checks can be done only dynamically, For example, if we first declare

table: array [0.. 255] Of char;

i: integer

And then compute table [i], a compiler cannot in general guarantee that during execution, the value of I will lie in the range 0 to 255.

ERROR RECOVERY

Since type checking has the potential for catching errors in the programs, it is important for a type checker to do something reasonable when an error is discovered .At the very least, the compiler must report the nature and location of the error. It is desirable for the type checker to recover from the errors, so it can check the rest of the input. Since error handling affects the type checking rules, it has to be designed into the system right from the start; the rules must be prepared to cope with errors.

The inclusion of error handling may result in a type system that goes beyond the one needed to specify correct programs. For example once an error has occurred, we may not know the type of the incorrectly formed program fragment. Coping with missing information requires techniques similar to those needed for languages that do not require identifiers to be declared before they are used. Type variables can be used to ensure consistent usage of undeclared or apparently misdeclared identifiers.

Computer Tech News

Search This Blog

Wednesday, 27 February 2013

Type Checking in Compiler

No comments:

Post a Comment

Pages

About Me

Total Pageviews