
module analysis::m3::AST

rascal-0.40.16

A symbolic representation for abstract syntax trees of programming languages.

Usage

import analysis::m3::AST;

Dependencies

import Message;
import Node;
import IO;
import Set;
import util::Monitor;
import analysis::m3::TypeSymbol;

Description

We provide a general set of data types for the syntactic constructs of programming languages: Expression, Statement, Declaration and Type. Also, very common syntactic constructs are added to this, such as if, while, etc.

The idea is that parsers for different languages map to common abstract syntax elements when this can be done meaningfully. When it cannot, these front-ends extend the existing types with new constructor definitions, or even add new kinds of types. The shared representation limits the element of surprise when working with different languages, and may make some downstream analyses reusable.
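As a minimal sketch of such an extension, assume a made-up language with a repeat-until loop. The module name `MyLanguageAST` and the constructors `noop`, `repeatUntil` and `boolLiteral` are hypothetical and not part of this library.

```rascal
module MyLanguageAST

// Reuse the shared Statement and Expression types, including src and decl.
extend analysis::m3::AST;

// Hypothetical statements that have no shared counterpart in this module.
data Statement
    = noop()
    | repeatUntil(Statement body, Expression condition)
    ;

// Hypothetical expression constructor for boolean literals.
data Expression
    = boolLiteral(bool literalValue)
    ;
```

Because the new constructors belong to the existing Statement and Expression types, they carry the same src and decl fields as every other node.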

The concept of a source location is important for abstract syntax trees. The annotation src always holds a value of type loc, pointing to the physical location of the construct in the source code.

The concept of a declaration is also relevant. A decl annotation points from a use of a concept to its definition, but always via an indirection (i.e., a fully qualified name). The decl annotation is also of type loc, where each location is the fully qualified name of the definition that is used.

Finally, the concept of a type is relevant for ASTs. In particular, an Expression may have a typ annotation, as may a variable declaration, etc.
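The following sketch shows how src and decl might be read and set on a node. It assumes a hypothetical constructor `varRef`, since this module itself defines no Expression constructors; the file path and the `java+field` name are illustrative only.

```rascal
module FieldsExample

import analysis::m3::AST;
import IO;

// Hypothetical constructor, defined here only to have a value to inspect.
data Expression = varRef(str name);

void main() {
    // Set src explicitly (offset 10, length 6); decl is left at its default.
    Expression e = varRef("answer", src=|file:///tmp/Example.java|(10,6));

    println(e.src);   // the location passed above
    println(e.decl);  // the default, |unresolved:///|

    // Setting a field yields an updated copy; e itself is unchanged.
    Expression resolved = e[decl=|java+field:///Example/answer|];
    println(resolved.decl);
}
```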

Benefits

  • Symbolic abstract syntax trees can be analyzed and transformed easily using Rascal primitives such as patterns, comprehensions and visit.
  • By re-using recognizable names for different programming languages, it's easier to switch between languages to analyze.
  • Some algorithms may be reusable on different programming languages, but please be aware of the pitfalls.
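As a rough illustration of the first point, the sketch below analyzes a tiny expression tree with a comprehension and with visit. The constructors `varRef` and `add` are hypothetical stand-ins for what a language front-end would define.

```rascal
module TraversalExample

import analysis::m3::AST;

// Hypothetical constructors, for illustration only.
data Expression
    = varRef(str name)
    | add(Expression lhs, Expression rhs)
    ;

// Comprehension with a deep match (/): the names of all variable references.
set[str] variableNames(Expression e) = { name | /varRef(str name) := e };

// visit with a pattern: count the additions in the tree.
int countAdditions(Expression e) {
    int n = 0;
    visit (e) {
        case add(_, _): n += 1;
    }
    return n;
}
```

For the tree `add(varRef("a"), add(varRef("b"), varRef("a")))`, `variableNames` yields `{"a", "b"}` and `countAdditions` yields `2`.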

Pitfalls

  • Even though different languages may map to the same syntactic construct, this does not mean that the semantics is the same. Downstream metrics or other analysis tools should still take semantic differences between programming languages into account.

data \AST

For metric purposes we can use a true AST declaration tree, a simple list of lines for generic metrics, or the reason why we do not have an AST.

data \AST (loc file = |unknown:///|) 
= declaration(Declaration declaration)
| lines(list[str] contents)
| noAST(Message msg)
;
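A metric tool might handle the three alternatives along these lines. This is only a sketch: the module name `ASTSummary` and the function `summarize` are hypothetical helpers, not part of the library.

```rascal
module ASTSummary

import analysis::m3::AST;
import Message;
import List;

// One overloaded function per alternative of \AST, selected by pattern matching.
str summarize(declaration(Declaration d)) = "AST rooted at <d.src>";
str summarize(lines(list[str] contents)) = "<size(contents)> lines of text";
str summarize(noAST(Message msg)) = "no AST: <msg>";
```

For example, `summarize(lines(["a", "b"], file=|file:///tmp/notes.txt|))` returns `"2 lines of text"`.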

data Declaration

Uniform name for everything that is declared in programming languages: variables, functions, classes, etc.

data Declaration (
loc src = |unknown:///|,
loc decl = |unresolved:///|,
TypeSymbol typ = unresolved()
)

Instances of the Declaration type represent the syntax of declarations in programming languages.

| field name | description |
| --- | --- |
| src | the exact source location of the declaration in a source file |
| decl | the resolved fully qualified name of the artefact that is being declared here |
| typ | a symbolic representation of the static type of the declared artefact here (not the syntax of the type) |

data Statement

Uniform name for everything that is typically a statement in programming languages: assignments, loops, conditionals, jumps.

data Statement (
loc src = |unknown:///|,
loc decl = |unresolved:///|
)

Instances of the Statement type represent the syntax of statements in programming languages.

| field name | description |
| --- | --- |
| src | the exact source location of the statement in a source file |
| decl | if the statement directly represents a usage of a declared artefact, then this points to the fully qualified name of the used artefact |

data Expression

Uniform name for everything that is an expression in programming languages: arithmetic, comparisons, function invocations, ...

data Expression (
loc src = |unknown:///|,
loc decl = |unresolved:///|,
TypeSymbol typ = \unresolved()
)

Instances of the Expression type represent the syntax of expressions in programming languages.

| field name | description |
| --- | --- |
| src | the exact source location of the expression in a source file |
| decl | if this expression represents a usage, decl is the resolved fully qualified name of the artefact that is being used here |
| typ | a symbolic representation of the static type of the result of the expression |

data Type

Uniform name for everything that is a type in programming language syntax: int, void, List<Expression>.

data Type (
loc src = |unknown:///|,
loc decl = |unresolved:///|,
TypeSymbol typ = \unresolved()
)

Instances of the Type type represent the syntax of types in programming languages.

| field name | description |
| --- | --- |
| src | the exact source location of the type in a source file |
| decl | the fully qualified name of the type, if resolved and if well-defined |
| typ | a symbolic representation of the static type that is the meaning of this type expression |

data Modifier

Uniform name for everything that is a modifier in programming language syntax: public, static, final, etc.

data Modifier (
loc src = |unknown:///|
)

Instances of the Modifier type represent the syntax of modifiers in programming languages.

| field name | description |
| --- | --- |
| src | the exact source location of the modifier in a source file |

function astNodeSpecification

Test for the consistency characteristics of an M3 annotated abstract syntax tree.

bool astNodeSpecification(node n, str language = "java", bool checkNameResolution=false, bool checkSourceLocation=true)

function astNodeSpecification

Check the AST node specification on a (large) set of ASTs and monitor the progress.

bool astNodeSpecification(set[node] toCheck, str language = "java", bool checkNameResolution=false, bool checkSourceLocation=true)
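Both overloads can be called as sketched below. The constructors `varRef` and `add` and the source locations are hypothetical. Which properties are actually checked is not spelled out here, so the printed results are deliberately not shown.

```rascal
module CheckExample

import analysis::m3::AST;
import IO;

// Hypothetical constructors, for illustration only.
data Expression
    = varRef(str name)
    | add(Expression lhs, Expression rhs)
    ;

void main() {
    // A small tree for the text "a + b"; each child's src lies inside the parent's src.
    Expression e = add(
        varRef("a", src=|file:///tmp/Example.java|(0,1)),
        varRef("b", src=|file:///tmp/Example.java|(4,1)),
        src=|file:///tmp/Example.java|(0,5));

    // Single node, using the default keyword parameters.
    println(astNodeSpecification(e));

    // A set of nodes, with the keyword parameters spelled out.
    println(astNodeSpecification({e}, language="java", checkNameResolution=false, checkSourceLocation=true));
}
```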