Collector
Synopsis
A Collector
collects constraints from source code and produces an initial TModel
.
Description
A Collector
is a statefull object that provides all the functions described below to access and change its internal state. The global services provided by a Collector
are:
- Register facts, calculators, and requirements as collected from the source program.
- Maintain a global (key,value) store to store global information relevant for the collection process. Typical examples are: Configuration information. The files that have been imported.
- Manage scopes.
- Maintain a single value per scope. This enables decoupling the collection of information from separate but related language constructs. Typical examples are: While collecting information from a function declaration: create a new function scope and associate the required return type with it so that return statements in the function body can check that (a) they occur inside a function; (b) that the type of their returned value is compatible with the required return type. While collecting information from an optionally labelled loop statement: create a new loop scope and associate the label with it so that break/continue statements can check that: (a) they occur inside a loop statement; (b) which loop statement they should (dis)continue.
- Reporting.
The functions provided by a Collector
are summarized below:
Three dozen functions are available that fall into the following categories:
- Lifecycle of Collector: create a new
Collector
, use it tocollect
information from a source program, and produce a TModel. - Configuration: retrieve or modify configuration settings.
- Scoping: enter or leave a scope, ask for current scope.
- Scope Info: add information to the current scope or retrieve that information.
- Nested Info: maintain nested information during collection; this is only available during collection.
- Composition: add another TModel to the current one.
- Reporting: report errors, warnings and info messages.
- Add Path: add paths to the scope graph.
- Define: define identifiers in various ways.
- Use: use identifiers in various ways.
- Inference: create new type variables for type inference.
- Facts: establish or retrieve facts.
- Calculate: define calculators.
- Require: define requirements.
Technically, Collector
is a datatype with a single constructur and a number of functions as fields. For instance, given a Collector
named c
, calling the define
function amounts to: c.define(_the-arguments-of-define_)
. All Collector functions are prefixed with /* Collector field */
to emphasize that they are a field of the Collector datatype.
LifeCycle of Collector
A new Collector
is created using the function newCollector
.
Collector newCollector(str modelName, Tree pt, TypePalConfig config = tconfig());
where
modelName
is the name of the TModel to be created (used for logging);pt
is the parse tree of the source program to be type checked;config
is a Configuration.
Once a Collector has been created, the user-defined collect
function is invoked with the current parse tree of a source program and the Collector as arguments. The collect
function is applied recursively until all information has been collected from the source program.
Finally, run
creates the desired TModel
that will be used by the Solver:
/* Collector field */ TModel () run;
A typical scenario is (for a given a parse tree pt
of the program to be checked):
c = newCollector("my_model", pt); // create Collector, default TypePalConfig
collect(pt, c); // collect constraints
model = c.run(); // create initial TModel to be handled by the Solver
The collect
function has to be supplied by the author of the type checker and looks like this:
void collect(LanguageConstruct lc, Collector c){ ... }
where:
lc
is a syntactic type from the language under consideration.c
is aCollector
.
::: Caution
Each collect
function is responsible for collecting constraints from its subtrees.
:::
Configuration
The Configuration can be retrieved or adjusted by the following two functions:
/* Collector field */ TypePalConfig () getConfig;
/* Collector field */ void (TypePalConfig cfg) setConfig;
The former returns the current TypePal configuration, the latter sets the current configuration to a new configuration.
Scoping
Scope management amounts to entering a new scope, leave the current scope and retrieving the current scope:
/* Collector field */ void (Tree inner) enterScope;
/* Collector field */ void (list[Tree] trees) enterCompositeScope;
/* Collector field */ void (Tree inner) enterLubScope;
/* Collector field */ void (list[Tree] trees) enterCompositeLubScope;
/* Collector field */ void (Tree inner) leaveScope;
/* Collector field */ void (list[Tree] trees) leaveCompositeScope;
/* Collector field */ loc () getScope,
In order to check consistency, leaveScope
has the inner scope that it is supposed to be leaving as argument.
Here is a simple example how the let
expression in the Fun language handles subscopes:
void collect(current: (Expression) `let <Id name> : <Type tp> = <Expression exp1> in <Expression exp2> end`, Collector c) {
c.enterScope(current);
c.define("<name>", variableId(), name, defType(tp));
c.calculate("let", current, [exp2], AType(Solver s) { return s.getType(exp2); } );
collect(tp, exp1, exp2, c);
c.leaveScope(current);
}
A composite scope is a scope that corresponds with nonconsecutive parts of the source code and is represented by a list of Trees.
A lub scope is a scope in which the type of not explcitly declared variables is determined by the least-upper bound (LUB) of the types of their uses. Lub scopes are, for instance, used to implement local type inference in Rascal.
A composite lub scope is a scope that corresponds with nonconsecutive parts of the source code and locally infers the types in those scopes.
Scope Info
It is possible to associate auxiliary information with each scope. This enables the downward propagation of information during the topdown traversal of the source program by collect
. Typical use cases are:
- recording the return type of a function declaration and checking that all return statements in the body of that function have a type that is compatible with the declared return type.
- recording the label of a loop statement for the benefit of nested break or continue statements.
Scopes are identified by their source location and ScopeRole
: a user-defined data type that distinsguishes possible
roles such as functionScope()
or labelScope()
.
setScopeInfo
sets the information for a scope:
/* Collector field */ void (loc scope, ScopeRole scopeRole, value info) setScopeInfo;
where
scope
is the scope for which information is to be set.scopeRole
is the role of that scope.info
is the associated information.
getScopeInfo
retrieves associated scope information:
/* Collector field */ lrel[loc scope, value scopeInfo] (ScopeRole scopeRole) getScopeInfo;
where
scopeRole
is the role of the scope we are looking for.
getScopeInfo
returns a list relation containing scope/scopeInfo pairs (ordered from innermost to outermost scope).
Let's illustrate ScopeInfo with a stripped down version of how the Rascal type checker handles while
and break
statements:
data LoopInfo = loopInfo(str name); ❶
void collect(current: (Statement) `<Label label> while( <{Expression ","}+ conditions> ) <Statement body>`, Collector c){
c.enterScope(current);
...
loopName = "<label.name>";
c.setScopeInfo(c.getScope(), loopScope(), loopInfo(loopName)); ❷
...
c.leaveScope(current);
}
void collect(current:(Statement) `break <Target target>;`, Collector c){
...
loopName = "<target.name>"
for(<scope, scopeInfo> <- c.getScopeInfo(loopScope())){ ❸
if(loopInfo(loopName1) := scopeInfo){
if(loopName == "" || loopName == loopName1){
collect(target, c);
return;
}
} else {
throw rascalCheckerInternalError(getLoc(current), "Inconsistent info from loop scope: <scopeInfo>");
}
}
c.report(error(current, "Break outside a while/do/for statement"));
}
- ❶ Introduces a data type to represent loop information.
- ❷ When handling a while statement, the current scope is marked as
loopScope
andloopInfo
is associated with it. - ❸ When handling a
break
statement, we get all available ScopeInfo for loopScopes (innermost first) and check the associated loopInfo.
Nested Info
An arbitrary number of push down stacks can be maintained during the topdown traversal of the source code that is being type checked. A use case is recording that a certain syntax type is encountered and make children aware of this, e.g. "we are inside a parameter list".
Each stack has a string name (key
) and is created on demand.
/* Collector field */ void (str key, value val) push
/* Collector field */ value (str key) pop,
/* Collector field */ value (str key) top,
/* Collector field */ list[value] (str key) getStack,
/* Collector field */ void (str key) clearStack,
push
, pop
, and top
perform standard stack operations. push
creates a stack when needed, while top
and pop
require the existence of the named stack. getStack
returns all values in the named stack, while clearStack
resets it to empty.
These stacks are stored in the global store, see Solver.
With the availability of a global store come serious responsibilities. Don't overuse it, do not misuse it.
Composition
TModels can be composed by adding the information from one TModel to the other. A use case is module composition.
/* Collector field */ void (TModel tm) addTModel;
addTModel
adds the information in tm
to the current Collector.
Reporting
One or more reports can be added by report
and reports
:
/* Collector field */ void (FailMessage fmsg) report;
/* Collector field */ void (list[FailMessage] fmgs) reports;
See Reporting for a description of FailMessage
.
If one of the messages is error
the execution of the current calculator or requirement is immediately terminated.
Add Path
TypePal is based on nested scopes and path between scopes. The former represent textual nesting as present in block structure and function scopes. The latter represent non-local semantic links between program parts as present in import statements between modules or Pascal's with statement. The following functions add to the scope graph a path from the current scope to another scope.
Add a path to a definition
/* Collector field */ void (Tree occ, set[IdRole] idRoles, PathRole pathRole) addPathToDef;
addPathToDef
is typically used to create an import or extend path between program parts. occ
is an occurence of a name that should be defined elsewhere in one of the given roles. The effect is to add a pathRole
path between the current scope and the definition.
Here is an example taken from the language ModFun:
void collect(current: (ImportDecl) `import <ModId mid> ;`, Collector c){
c.addPathToDef(mid, {moduleId()}, importPath());
}
Add a path to a qualified definition
/* Collector field */ void (list[str] ids, Tree occ, set[IdRole] idRoles, set[IdRole] qualifierRoles, PathRole pathRole) addPathToQualifiedDef;
Similar to addPathToDef
for the occurrence of a qualified names rather than a simple name.
Add a path to a type
/* Collector field */ void (Tree occ, PathRole pathRole) addPathToType
occ
is a parse tree with has a certain type. The effect is to add a pathRole
path between the current scope and the definition of that type.
A prime example is type checking of Pascal's with
statement which opens the definition of a record type and makes all defined fields available in the body of the with
statement. Here we create a withPath
between the scope of the with statement and all definitions of the record types of the given record variables:
void collect(current: (WithStatement) `with <{RecordVariable ","}+ recordVars> do <Statement withStat>`, Collector c){
c.enterScope(current);
for(rv <- recordVars){
c.addPathToType(rv, withPath());
}
collect(recordVars, withStat, c);
c.leaveScope(current);
}
Define
The function define
adds the definition of a name in the current scope:
/* Collector field */ void (str id, IdRole idRole, value def, DefInfo info) define;
where:
id
is the textual appearance of the name.idRole
is the role played by the name.def
is the part of the parse tree that corresponds to the definition of the name.info
is the definition information to be associated with this definition. See DefInfo.
The function defineInScope
adds the definition of a name in a given scope:
/* Collector field */ void (value scope, str id, IdRole idRole, value def, DefInfo info) defineInScope
Predefine
Most languages depend on predefined names such as true
, false
, exp
, and the like. The function predefine
adds the definition of such a predefined name in the current scope:
/* Collector field */ void (str id, IdRole idRole, value def, DefInfo info) predefine;
The function predefineInScope
adds the definition of a predefined name in a given scope:
/* Collector field */ void (value scope, str id, IdRole idRole, value def, DefInfo info) predefineInScope
Use
A use is an occurence of a name that has to be associated with a definition of that name. We distinguish the use of unqualified names (the bare name itself) and qualified names (a name that contains some prefix/qualifier).
Use an unqualified name
There are three functions to describe the occurrence of a name in a parse tree as a use. The most elementary use of a name is described by:
/* Collector field */ void (Tree occ, set[IdRole] idRoles) use,
The parse tree occ
is a use to be resolved in the current scope in one of the given roles idRoles
. The use of a variable in an expression is typically modelled with this use function.
Here is an example from the Calc language:
void collect(current: (Exp) `<Id name>`, Collector c){
c.use(name, {variableId()});
}
Use a qualified name
Next we consider the use of qualified names, i.e., a list of identifiers that will be resolved from left to right. We will call these identifiers (except the last one) qualifiers and the last one the qualified identifier.
/* Collector field */ void (list[str] ids, Tree occ, set[IdRole] idRoles, set[IdRole] qualifierRoles) useQualified;
Here ids
is the list of strings that form the qualified name, occ
is the actual occurrence, and there are two sets of roles:
idRoles
are the possible roles for the qualified identifier itself and qualifierRoles
are the possible roles for the qualifiers.
Use a name via another type
Many languages support named types and names that can be defined inside such a named type. Examples are field names in records or method names in classes. useViaType
handles the use of names defined in a named type:
/* Collector field */ void (Tree container, Tree selector, set[IdRole] idRolesSel) useViaType
where
container
: has a named type as type.selector
: is the name to be selected from that named type.idRolesSel
: are the IdRoles allowed for the selector.
Here is an example of field selection from a record in the Struct language:
void collect(current:(Expression)`<Expression lhs> . <Id fieldName>`, Collector c) {
c.useViaType(lhs, fieldName, {fieldId()}); ❶
c.fact(current, fieldName); ❷
collect(lhs, c);
}
- ❶ Determine the type of
lhs
, say T. Now look for a definition offieldName
(asfieldId
) in the definition of T. - ❷ The type of the whole expressions becomes the type of
fieldId
.
useViaType
can be configured with TypePalConfig's getTypeNamesAndRole
and getTypeInNamelessType
that
determine the precise mapping between a named or unnamed type and its fields.
UseLub
In some languages (think Rascal) local type inference and subtyping are needed to determine the type of variables: when no explicit definition is present, the type of these variables is inferred from their use and the least-upper-bound (LUB) of all the uses of a variable is taken as its type. The function useLub
marks variable uses for which this regime has to be applied:
/* Collector field */ void (Tree occ, set[IdRole] idRoles) useLub
See the Rascal type checker for examples.
Inference
ATypes may contain type variables and new type variables can be created using newTypeVar
:
/* Collector field */ AType (value src) newTypeVar;
Type variables can be bound via unification.
Here is an example of a call expression taken from the UntypedFun language:
void collect(current: (Expression) `<Expression exp1>(<Expression exp2>)`, Collector c) {
tau1 = c.newTypeVar(exp1);
tau2 = c.newTypeVar(exp2);
c.calculateEager("application", current, [exp1, exp2],
AType (Solver s) {
s.requireUnify(functionType(tau1, tau2), exp1, error(exp1, "Function type expected, found %t", exp1));
s.requireUnify(tau1, exp2, error(exp2, "Incorrect type actual parameter"));
return tau2;
});
collect(exp1, exp2, c);
}
calculate
and require
are only evaluated when all their dependencies are available and they are fully instantiated, i.e., do not contain type variables.
calculateEager
and requireEager
are also only evaluated when all their dependencies are available but those may contain type variables.
The bindings that are accumulated during calculateEager
or requireEager
are effectuated upon successfull completion of that calculateEager
or requireEager
.
Fact
The function fact
registers known type information for a program fragment src
:
/* Collector field */ void (Tree src, value atype) fact;
where atype
can be either an AType
or a Tree
. In the latter case the type of that Tree is used when available.
Here are two examples from the Calc language:
void collect(current: (Exp) `<Integer integer>`, Collector c){
c.fact(current, intType()); ❶
}
void collect(current: (Exp) `( <Exp e> )`, Collector c){
c.fact(current, e); ❷
collect(e, c);
}
- ❶ Registers the fact that the current expression has type
intType
. - ❷ Registers the fact that the current expression has the same type as the embedded expression
e
.
GetType
The function getType
returns the type of a program fragment src
(when available) and throws the exception TypeUnavailable
otherwise:
/* Collector field */ void (Tree src) getType;
Calculate
A calculator computes the type of a subtree src
by way of an AType-returning function calculator
. A list of dependencies is given whose types have to be known before this calculator can be computed. There are two versions: for calculate
all dependencies should be fully resolved and instantiated, while calculateEager
can also handle dependencies that still contain type variables.
/* Collector field */ void (str name, Tree src, list[value] dependencies, AType(Solver s) calculator) calculate;
/* Collector field */ void (str name, Tree src, list[value] dependencies, AType(Solver s) calculator) calculateEager;
See A Calculator Language and Examples of Typecheckers for examples of calculator definitions.
See Inference for details about type variables.
Require
A requirement is a predicate regarding the type or properties of a source tree fragment src
. There are two versions: for require
all dependencies should be fully resolved and instantiated, while requireEager
can also handle dependencies that still contain type variables.
/* Collector field */ void (str name, Tree src, list[value] dependencies, void(Solver s) pred) require;
/* Collector field */ void (str name, Tree src, list[value] dependencies, void(Solver s) pred) requireEager;
where
name
is the name of the requirement (for reporting purposes).dependencies
is a list of dependencies whose types have to be known before this requirement can be computed.pred
is a function that actually checks the requirement; when it is violated this will be reported via its Solver argument.
More specific requiremens can be expressed for checking that two subtrees or types are equal, comparable, that the one is a subtype of the other, or that they can be unified:
/* Collector field */ void (value l, value r, FailMessage fmsg) requireEqual;
/* Collector field */ void (value l, value r, FailMessage fmsg) requireComparable;
/* Collector field */ void (value l, value r, FailMessage fmsg) requireSubType;
/* Collector field */ void (value l, value r, FailMessage fmsg) requireUnify;
The arguments l
and r
should either be an AType or a subtree whose type is known.
See A Calculator Language and Examples of Typecheckers for examples of requirement definitions. See Inference for details about type variables.