module analysis::m3::AST
A symbolic representation for abstract syntax trees of programming languages.
Usage
import analysis::m3::AST;
Dependencies
import Message;
import Node;
import IO;
import Set;
import util::Monitor;
import analysis::m3::TypeSymbol;
Description
We provide a general set of data types for the syntactic constructs of programming languages: Expression, Statement, Declaration and Type.
Also, very common syntactic constructs are added to this, such as if, while, etc.
The idea is that parsers for different languages map to common abstract syntax elements when this can be done meaningfully. If not, these front-ends extend the existing types with new constructor definitions, or even add new kinds of types. The shared representation limits the element of surprise when working with different languages, and may make some downstream analyses reusable.
The concept of a source location is important for abstract syntax trees. The annotation src always holds a value of type loc, pointing to the physical location of the construct in the source code.
The concept of declaration is also relevant. A decl annotation points from a use of a concept to its definition, but always via an indirection (i.e. a fully qualified name). The decl annotation is also of type loc, where each location is the fully qualified name of the definition that is used.
Finally, the concept of a type is relevant for ASTs. In particular, an Expression, a variable declaration, etc. may have a typ annotation.
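As an illustration of how a front-end might extend the shared types and fill in src and decl, here is a minimal sketch. The constructors toyVar, toyLit and toyAdd, the file demo.toy and the toy+variable scheme are all hypothetical names invented for this example; they are not part of this module.

```rascal
module ToyAST

import analysis::m3::AST;

// Hypothetical constructors for a toy expression language (assumption:
// a real front-end would choose its own constructor names).
data Expression
  = toyVar(str name)
  | toyLit(int intValue)
  | toyAdd(Expression lhs, Expression rhs)
  ;

// The expression "x + 1", as if it started at offset 0 of a hypothetical file.
Expression example()
  = toyAdd(
      // src: where the use of x occurs; decl: the qualified name it refers to
      toyVar("x", src=|file:///demo.toy|(0,1), decl=|toy+variable:///demo/x|),
      // a literal uses no declared artefact, so decl keeps its default
      toyLit(1, src=|file:///demo.toy|(4,1)),
      src=|file:///demo.toy|(0,5));
```

With these definitions, example().lhs.decl yields the qualified name that was set, while example().rhs.decl falls back to the declared default |unresolved:///|.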
Benefits
- Symbolic abstract syntax trees can be analyzed and transformed easily using Rascal primitives such as patterns, comprehensions and visit.
- By re-using recognizable names for different programming languages, it's easier to switch between languages to analyze.
- Some algorithms may be reusable on different programming languages, but please be aware of the pitfalls.
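The first benefit can be sketched as follows, using a comprehension with a descendant pattern and a visit. The constructors toyVar, toyLit and toyAdd are hypothetical and only serve this example; the helper names usedDecls and fold are likewise assumptions.

```rascal
module ToyQueries

import analysis::m3::AST;

// Hypothetical constructors for a toy expression language.
data Expression
  = toyVar(str name)
  | toyLit(int intValue)
  | toyAdd(Expression lhs, Expression rhs)
  ;

// Comprehension with a descendant pattern: collect the decl of every
// variable use, at any depth of the tree.
set[loc] usedDecls(Expression e) = { v.decl | /v:toyVar(_) := e };

// visit: fold additions of two literals into one literal, keeping the
// source location of the addition that was replaced.
Expression fold(Expression e) = visit (e) {
  case s:toyAdd(toyLit(a), toyLit(b)) => toyLit(a + b, src=s.src)
};
```

Because the patterns only mention positional children, they match regardless of which src, decl or typ values a node carries.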
Pitfalls
- Even though different languages may map to the same syntactic construct, this does not mean that the semantics is the same. Downstream metrics or other analysis tools should still take semantic differences between programming languages into account.
data \AST
For metric purposes we can use a true AST declaration tree, a simple list of lines for generic metrics, or the reason why we do not have an AST.
data \AST (loc file = |unknown:///|)
= declaration(Declaration declaration)
| lines(list[str] contents)
| noAST(Message msg)
;
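A consumer of \AST values typically has to handle all three alternatives. The following is a minimal sketch of a size measure that does so; the function name roughSize and the choice of what to count are assumptions made for illustration only.

```rascal
module ToyMetrics

import analysis::m3::AST;
import Message;
import List;

// A rough size measure that degrades gracefully:
// tree nodes for a real AST, line count for plain lines, 0 when there is no AST.
int roughSize(\AST ast) {
  switch (ast) {
    // count every node nested inside the declaration tree
    case declaration(Declaration d): return (0 | it + 1 | /node _ := d);
    case lines(list[str] contents): return size(contents);
    // the Message explains why no AST is available; nothing to measure
    case noAST(Message _): return 0;
  }
  return 0;
}
```

For example, roughSize(lines(["a", "b"], file=|file:///demo.txt|)) counts the two lines, and a noAST value wrapping an error message counts as 0.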
data Declaration
Uniform name for everything that is declared in programming languages: variables, functions, classes, etc.
data Declaration (
loc src = |unknown:///|,
loc decl = |unresolved:///|,
TypeSymbol typ = unresolved()
)
Instances of the Declaration type represent the syntax of declarations in programming languages.
field name | description |
---|---|
src | the exact source location of the declaration in a source file |
decl | the resolved fully qualified name of the artefact that is being declared here |
typ | a symbolic representation of the static type of the declared artefact here (not the syntax of the type) |
data Statement
Uniform name for everything that is typically a statement in programming languages: assignments, loops, conditionals, jumps.
data Statement (
loc src = |unknown:///|,
loc decl = |unresolved:///|
)
Instances of the Statement type represent the syntax of statements in programming languages.
field name | description |
---|---|
src | the exact source location of the statement in a source file |
decl | if the statement directly represents a usage of a declared artefact, then this points to the fully qualified name of the used artefact |
data Expression
Uniform name for everything that is an expression in programming languages: arithmetic, comparisons, function invocations, ...
data Expression (
loc src = |unknown:///|,
loc decl = |unresolved:///|,
TypeSymbol typ = \unresolved()
)
Instances of the Expression type represent the syntax of expressions in programming languages.
field name | description |
---|---|
src | the exact source location of the expression in a source file |
decl | if this expression represents a usage, decl is the resolved fully qualified name of the artefact that is being used here |
typ | a symbolic representation of the static type of the result of the expression |
data Type
Uniform name for everything that is a type in programming language syntax: int, void, List<Expression>.
data Type (
loc src = |unknown:///|,
loc decl = |unresolved:///|,
TypeSymbol typ = \unresolved()
)
Instances of the Type type represent the syntax of types in programming languages.
field name | description |
---|---|
src | the exact source location of the type in a source file |
decl | the fully qualified name of the type, if resolved and if well-defined |
typ | a symbolic representation of the static type that is the meaning of this type expression |
data Modifier
Uniform name for everything that is a modifier in programming language syntax: public, static, final, etc.
data Modifier (
loc src = |unknown:///|
)
Instances of the Modifier type represent the syntax of modifiers in programming languages.
field name | description |
---|---|
src | the exact source location of the modifier in a source file |
function astNodeSpecification
Test for the consistency characteristics of an M3 annotated abstract syntax tree.
bool astNodeSpecification(node n, str language = "java", bool checkNameResolution=false, bool checkSourceLocation=true)
function astNodeSpecification
Check the AST node specification on a (large) set of ASTs and monitor the progress.
bool astNodeSpecification(set[node] toCheck, str language = "java", bool checkNameResolution=false, bool checkSourceLocation=true)