Collector

rascal-0.40.16

typepal-0.14.8

Synopsis

A Collector collects constraints from source code and produces an initial TModel.

Description

A Collector is a statefull object that provides all the functions described below to access and change its internal state. The global services provided by a Collector are:

Register facts, calculators, and requirements as collected from the source program.
Maintain a global (key,value) store to store global information relevant for the collection process. Typical examples are: Configuration information. The files that have been imported.
Manage scopes.
Maintain a single value per scope. This enables decoupling the collection of information from separate but related language constructs. Typical examples are: While collecting information from a function declaration: create a new function scope and associate the required return type with it so that return statements in the function body can check that (a) they occur inside a function; (b) that the type of their returned value is compatible with the required return type. While collecting information from an optionally labelled loop statement: create a new loop scope and associate the label with it so that break/continue statements can check that: (a) they occur inside a loop statement; (b) which loop statement they should (dis)continue.
Reporting.

The functions provided by a Collector are summarized below:

Three dozen functions are available that fall into the following categories:

Lifecycle of Collector: create a new Collector, use it to collect information from a source program, and produce a TModel.
Configuration: retrieve or modify configuration settings.
Scoping: enter or leave a scope, ask for current scope.
Scope Info: add information to the current scope or retrieve that information.
Nested Info: maintain nested information during collection; this is only available during collection.
Composition: add another TModel to the current one.
Reporting: report errors, warnings and info messages.
Add Path: add paths to the scope graph.
Define: define identifiers in various ways.
Use: use identifiers in various ways.
Inference: create new type variables for type inference.
Facts: establish or retrieve facts.
Calculate: define calculators.
Require: define requirements.

Technically, Collector is a datatype with a single constructur and a number of functions as fields. For instance, given a Collector named c, calling the define function amounts to: c.define(_the-arguments-of-define_). All Collector functions are prefixed with /* Collector field */ to emphasize that they are a field of the Collector datatype.

LifeCycle of Collector

A new Collector is created using the function newCollector.

Collector newCollector(str modelName, Tree pt, TypePalConfig config = tconfig());   

where

modelName is the name of the TModel to be created (used for logging);
pt is the parse tree of the source program to be type checked;
config is a Configuration.

Once a Collector has been created, the user-defined collect function is invoked with the current parse tree of a source program and the Collector as arguments. The collect function is applied recursively until all information has been collected from the source program.

Finally, run creates the desired TModel that will be used by the Solver:

/* Collector field */ TModel () run;

A typical scenario is (for a given a parse tree pt of the program to be checked):

c = newCollector("my_model", pt);   // create Collector, default TypePalConfig
collect(pt, c);                     // collect constraints
model = c.run();                    // create initial TModel to be handled by the Solver

The collect function has to be supplied by the author of the type checker and looks like this:

void collect(LanguageConstruct lc, Collector c){ ... }

where:

lc is a syntactic type from the language under consideration.
c is a Collector.

::: Caution Each collect function is responsible for collecting constraints from its subtrees. :::

Configuration

The Configuration can be retrieved or adjusted by the following two functions:

/* Collector field */ TypePalConfig () getConfig;
/* Collector field */ void (TypePalConfig cfg) setConfig;

The former returns the current TypePal configuration, the latter sets the current configuration to a new configuration.

Scoping

Scope management amounts to entering a new scope, leave the current scope and retrieving the current scope:

/* Collector field */ void (Tree inner) enterScope;
/* Collector field */ void (list[Tree] trees) enterCompositeScope;
/* Collector field */ void (Tree inner) enterLubScope;
/* Collector field */ void (list[Tree] trees) enterCompositeLubScope;
/* Collector field */ void (Tree inner) leaveScope;
/* Collector field */ void (list[Tree] trees) leaveCompositeScope;
/* Collector field */ loc () getScope,

In order to check consistency, leaveScope has the inner scope that it is supposed to be leaving as argument.

Here is a simple example how the let expression in the Fun language handles subscopes:

void collect(current: (Expression) `let <Id name> : <Type tp> = <Expression exp1> in <Expression exp2> end`, Collector c) {  
     c.enterScope(current);
         c.define("<name>", variableId(), name, defType(tp));
         c.calculate("let", current, [exp2], AType(Solver s) { return s.getType(exp2); } );
         collect(tp, exp1, exp2, c);  
     c.leaveScope(current);
}

A composite scope is a scope that corresponds with nonconsecutive parts of the source code and is represented by a list of Trees.

A lub scope is a scope in which the type of not explcitly declared variables is determined by the least-upper bound (LUB) of the types of their uses. Lub scopes are, for instance, used to implement local type inference in Rascal.

A composite lub scope is a scope that corresponds with nonconsecutive parts of the source code and locally infers the types in those scopes.

Scope Info

It is possible to associate auxiliary information with each scope. This enables the downward propagation of information during the topdown traversal of the source program by collect. Typical use cases are:

recording the return type of a function declaration and checking that all return statements in the body of that function have a type that is compatible with the declared return type.
recording the label of a loop statement for the benefit of nested break or continue statements.

Scopes are identified by their source location and ScopeRole: a user-defined data type that distinsguishes possible roles such as functionScope() or labelScope().

setScopeInfo sets the information for a scope:

/* Collector field */ void (loc scope, ScopeRole scopeRole, value info) setScopeInfo;

where

scope is the scope for which information is to be set.
scopeRole is the role of that scope.
info is the associated information.

getScopeInfo retrieves associated scope information:

/* Collector field */ lrel[loc scope, value scopeInfo]  (ScopeRole scopeRole) getScopeInfo;

where

scopeRole is the role of the scope we are looking for.

getScopeInfo returns a list relation containing scope/scopeInfo pairs (ordered from innermost to outermost scope).

Let's illustrate ScopeInfo with a stripped down version of how the Rascal type checker handles while and break statements:

data LoopInfo = loopInfo(str name);     ❶  

void collect(current: (Statement) `<Label label> while( <{Expression ","}+ conditions> ) <Statement body>`,  Collector c){
    c.enterScope(current); 
        ...
        loopName = "<label.name>";
        c.setScopeInfo(c.getScope(), loopScope(), loopInfo(loopName));     ❷  
        ...
    c.leaveScope(current);
}

void collect(current:(Statement) `break <Target target>;`, Collector c){
    ...
    loopName = "<target.name>"
    for(<scope, scopeInfo> <- c.getScopeInfo(loopScope())){     ❸  
        if(loopInfo(loopName1) := scopeInfo){
            if(loopName == "" || loopName == loopName1){
                collect(target, c);
                return;
             }
        } else {
            throw rascalCheckerInternalError(getLoc(current), "Inconsistent info from loop scope: <scopeInfo>");
        }
    }
    c.report(error(current, "Break outside a while/do/for statement"));
}

❶ Introduces a data type to represent loop information.
❷ When handling a while statement, the current scope is marked as loopScope and loopInfo is associated with it.
❸ When handling a break statement, we get all available ScopeInfo for loopScopes (innermost first) and check the associated loopInfo.

Nested Info

An arbitrary number of push down stacks can be maintained during the topdown traversal of the source code that is being type checked. A use case is recording that a certain syntax type is encountered and make children aware of this, e.g. "we are inside a parameter list".

Each stack has a string name (key) and is created on demand.

/* Collector field */ void (str key, value val) push
/* Collector field */ value (str key) pop,
/* Collector field */ value (str key) top,
/* Collector field */ list[value] (str key) getStack,
/* Collector field */ void (str key) clearStack,

push, pop, and top perform standard stack operations. push creates a stack when needed, while top and pop require the existence of the named stack. getStack returns all values in the named stack, while clearStack resets it to empty.

These stacks are stored in the global store, see Solver.

caution

With the availability of a global store come serious responsibilities. Don't overuse it, do not misuse it.

Composition

TModels can be composed by adding the information from one TModel to the other. A use case is module composition.

/* Collector field */ void (TModel tm) addTModel;

addTModel adds the information in tm to the current Collector.

Reporting

One or more reports can be added by report and reports:

/* Collector field */ void (FailMessage fmsg) report;
/* Collector field */ void (list[FailMessage] fmgs) reports;

See Reporting for a description of FailMessage.

caution

If one of the messages is error the execution of the current calculator or requirement is immediately terminated.

Add Path

TypePal is based on nested scopes and path between scopes. The former represent textual nesting as present in block structure and function scopes. The latter represent non-local semantic links between program parts as present in import statements between modules or Pascal's with statement. The following functions add to the scope graph a path from the current scope to another scope.

Add a path to a definition

/* Collector field */ void (Tree occ, set[IdRole] idRoles, PathRole pathRole) addPathToDef;

addPathToDef is typically used to create an import or extend path between program parts. occ is an occurence of a name that should be defined elsewhere in one of the given roles. The effect is to add a pathRole path between the current scope and the definition.

Here is an example taken from the language ModFun:

void collect(current: (ImportDecl) `import <ModId mid> ;`, Collector c){
     c.addPathToDef(mid, {moduleId()}, importPath());
}

Add a path to a qualified definition

/* Collector field */ void (list[str] ids, Tree occ, set[IdRole] idRoles, set[IdRole] qualifierRoles, PathRole pathRole) addPathToQualifiedDef;   

Similar to addPathToDef for the occurrence of a qualified names rather than a simple name.

Add a path to a type

/* Collector field */ void (Tree occ, PathRole pathRole) addPathToType

occ is a parse tree with has a certain type. The effect is to add a pathRole path between the current scope and the definition of that type.

A prime example is type checking of Pascal's with statement which opens the definition of a record type and makes all defined fields available in the body of the with statement. Here we create a withPath between the scope of the with statement and all definitions of the record types of the given record variables:

void collect(current: (WithStatement) `with <{RecordVariable ","}+ recordVars> do <Statement withStat>`, Collector c){
    c.enterScope(current);
        for(rv <- recordVars){
            c.addPathToType(rv, withPath());
        }
        collect(recordVars, withStat, c);
    c.leaveScope(current);
}

Define

The function define adds the definition of a name in the current scope:

/* Collector field */  void (str id, IdRole idRole, value def, DefInfo info) define;

where:

id is the textual appearance of the name.
idRole is the role played by the name.
def is the part of the parse tree that corresponds to the definition of the name.
info is the definition information to be associated with this definition. See DefInfo.

The function defineInScope adds the definition of a name in a given scope:

/* Collector field */  void (value scope, str id, IdRole idRole, value def, DefInfo info) defineInScope

Predefine

Most languages depend on predefined names such as true, false, exp, and the like. The function predefine adds the definition of such a predefined name in the current scope:

/* Collector field */  void (str id, IdRole idRole, value def, DefInfo info) predefine;

The function predefineInScope adds the definition of a predefined name in a given scope:

/* Collector field */  void (value scope, str id, IdRole idRole, value def, DefInfo info) predefineInScope

Use

A use is an occurence of a name that has to be associated with a definition of that name. We distinguish the use of unqualified names (the bare name itself) and qualified names (a name that contains some prefix/qualifier).

Use an unqualified name

There are three functions to describe the occurrence of a name in a parse tree as a use. The most elementary use of a name is described by:

/* Collector field */ void (Tree occ, set[IdRole] idRoles) use,

The parse tree occ is a use to be resolved in the current scope in one of the given roles idRoles. The use of a variable in an expression is typically modelled with this use function.

Here is an example from the Calc language:

void collect(current: (Exp) `<Id name>`, Collector c){
    c.use(name, {variableId()});
}

Use a qualified name

Next we consider the use of qualified names, i.e., a list of identifiers that will be resolved from left to right. We will call these identifiers (except the last one) qualifiers and the last one the qualified identifier.

/* Collector field */ void (list[str] ids, Tree occ, set[IdRole] idRoles, set[IdRole] qualifierRoles) useQualified;  

Here ids is the list of strings that form the qualified name, occ is the actual occurrence, and there are two sets of roles: idRoles are the possible roles for the qualified identifier itself and qualifierRoles are the possible roles for the qualifiers.

Use a name via another type

Many languages support named types and names that can be defined inside such a named type. Examples are field names in records or method names in classes. useViaType handles the use of names defined in a named type:

/* Collector field */ void (Tree container, Tree selector, set[IdRole] idRolesSel) useViaType

where

container: has a named type as type.
selector: is the name to be selected from that named type.
idRolesSel: are the IdRoles allowed for the selector.

Here is an example of field selection from a record in the Struct language:

void collect(current:(Expression)`<Expression lhs> . <Id fieldName>`, Collector c) {
    c.useViaType(lhs, fieldName, {fieldId()});     ❶  
    c.fact(current, fieldName);     ❷  
    collect(lhs, c);
}

❶ Determine the type of lhs, say T. Now look for a definition of fieldName (as fieldId) in the definition of T.
❷ The type of the whole expressions becomes the type of fieldId.

useViaType can be configured with TypePalConfig's getTypeNamesAndRole and getTypeInNamelessType that determine the precise mapping between a named or unnamed type and its fields.

UseLub

In some languages (think Rascal) local type inference and subtyping are needed to determine the type of variables: when no explicit definition is present, the type of these variables is inferred from their use and the least-upper-bound (LUB) of all the uses of a variable is taken as its type. The function useLub marks variable uses for which this regime has to be applied:

/* Collector field */ void (Tree occ, set[IdRole] idRoles) useLub

See the Rascal type checker for examples.

Inference

ATypes may contain type variables and new type variables can be created using newTypeVar:

/* Collector field */ AType (value src) newTypeVar;

Type variables can be bound via unification.

Here is an example of a call expression taken from the UntypedFun language:

void collect(current: (Expression) `<Expression exp1>(<Expression exp2>)`, Collector c) { 
     tau1 = c.newTypeVar(exp1); 
     tau2 = c.newTypeVar(exp2); 
  
     c.calculateEager("application", current, [exp1, exp2],
        AType (Solver s) { 
              s.requireUnify(functionType(tau1, tau2), exp1, error(exp1, "Function type expected, found %t", exp1));
              s.requireUnify(tau1, exp2, error(exp2, "Incorrect type actual parameter"));             
              return tau2;
            });
      collect(exp1, exp2, c);
}

calculate and require are only evaluated when all their dependencies are available and they are fully instantiated, i.e., do not contain type variables.

calculateEager and requireEager are also only evaluated when all their dependencies are available but those may contain type variables.

The bindings that are accumulated during calculateEager or requireEager are effectuated upon successfull completion of that calculateEager or requireEager.

Fact

The function fact registers known type information for a program fragment src:

/* Collector field */ void (Tree src, value atype) fact;

where atype can be either an AType or a Tree. In the latter case the type of that Tree is used when available.

Here are two examples from the Calc language:

void collect(current: (Exp) `<Integer integer>`, Collector c){
    c.fact(current, intType());     ❶  
}

void collect(current: (Exp) `( <Exp e> )`, Collector c){
    c.fact(current, e);            ❷  
    collect(e, c);
}

❶ Registers the fact that the current expression has type intType.
❷ Registers the fact that the current expression has the same type as the embedded expression e.

GetType

The function getType returns the type of a program fragment src (when available) and throws the exception TypeUnavailable otherwise:

/* Collector field */ void (Tree src) getType;

Calculate

A calculator computes the type of a subtree src by way of an AType-returning function calculator. A list of dependencies is given whose types have to be known before this calculator can be computed. There are two versions: for calculate all dependencies should be fully resolved and instantiated, while calculateEager can also handle dependencies that still contain type variables.

/* Collector field */ void (str name, Tree src, list[value] dependencies, AType(Solver s) calculator) calculate;
/* Collector field */ void (str name, Tree src, list[value] dependencies, AType(Solver s) calculator) calculateEager;

See A Calculator Language and Examples of Typecheckers for examples of calculator definitions.

See Inference for details about type variables.

Require

A requirement is a predicate regarding the type or properties of a source tree fragment src. There are two versions: for require all dependencies should be fully resolved and instantiated, while requireEager can also handle dependencies that still contain type variables.

/* Collector field */ void (str name, Tree src, list[value] dependencies, void(Solver s) pred) require;
/* Collector field */ void (str name, Tree src, list[value] dependencies, void(Solver s) pred) requireEager;

where

name is the name of the requirement (for reporting purposes).
dependencies is a list of dependencies whose types have to be known before this requirement can be computed.
pred is a function that actually checks the requirement; when it is violated this will be reported via its Solver argument.

More specific requiremens can be expressed for checking that two subtrees or types are equal, comparable, that the one is a subtype of the other, or that they can be unified:

/* Collector field */ void (value l, value r, FailMessage fmsg) requireEqual;
/* Collector field */ void (value l, value r, FailMessage fmsg) requireComparable;
/* Collector field */ void (value l, value r, FailMessage fmsg) requireSubType;
/* Collector field */ void (value l, value r, FailMessage fmsg) requireUnify; 

The arguments l and r should either be an AType or a subtree whose type is known.

See A Calculator Language and Examples of Typecheckers for examples of requirement definitions. See Inference for details about type variables.

Synopsis​

Description​

LifeCycle of Collector​

Configuration​

Scoping​

Scope Info​

Nested Info​

Composition​

Reporting​

Add Path​

Add a path to a definition​

Add a path to a qualified definition​

Add a path to a type​

Define​

Predefine​

Use​

Use an unqualified name​

Use a qualified name​

Use a name via another type​

UseLub​

Inference​

Fact​

GetType​

Calculate​

Require​

Synopsis

Description

LifeCycle of Collector

Configuration

Scoping

Scope Info

Nested Info

Composition

Reporting

Add Path

Add a path to a definition

Add a path to a qualified definition

Add a path to a type

Define

Predefine

Use

Use an unqualified name

Use a qualified name

Use a name via another type

UseLub

Inference

Fact

GetType

Calculate

Require