module lang::java::m3::AST

rascal-0.40.16

AST node declarations for Java.

Usage

import lang::java::m3::AST;

Dependencies

extend analysis::m3::AST;
extend analysis::m3::Core;
extend lang::java::m3::TypeSymbol;
import util::FileSystem;
import util::Reflective;
import IO;
import String;
import List;

Description

It helps to start by reading analysis::m3::AST to find out what we use to model abstract syntax trees in Rascal, namely algebraic data types with specific properties.

The "M3" label stands for a standardized set of names of types and their fields that are used similarly for different programming languages.

This M3 AST model of Java features:

  • The model below contains Declarations, Statements, Expressions, Types and Modifiers. The abstract grammar describes an over-approximation of the abstract syntax of Java: you can construct more kinds of syntax trees programmatically than there are valid Java sentences. Conversely, every Java program in existence can be mapped to this simplified tree format for downstream analysis.
  • Java 1 to 13 support
  • Name analysis, where every definition of a name and every use of a name is annotated with a fully qualified logical source location, e.g. decl=|java+interface:///java/util/List<T>|
  • Type analysis, where every definition of a type and every expression that produces a type is annotated with typ=TypeSymbol
  • Annotations, all available in the syntax tree.
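
To make the over-approximation concrete, here is a minimal sketch that builds a tree no Java compiler would accept. The module name, `impossibleField`, `isVisibility` and `hasConflictingVisibility` are hypothetical names introduced for illustration; only the constructors come from this module.

```rascal
module OverApproximationSketch // hypothetical module name

import lang::java::m3::AST;
import List;

// A field that is both private and public and declares no variables.
// This is not valid Java, yet it is a well-typed Declaration value.
Declaration impossibleField
    = \field([\private(), \public()], simpleType(\id("String")), []);

// Hypothetical helper: is this modifier a visibility modifier?
bool isVisibility(\private()) = true;
bool isVisibility(\public()) = true;
bool isVisibility(\protected()) = true;
default bool isVisibility(Modifier _) = false;

// Hypothetical sanity check for hand-built trees:
// more than one visibility modifier on the same declaration.
bool hasConflictingVisibility(list[Modifier] mods)
    = size([m | m <- mods, isVisibility(m)]) > 1;
```

Code that constructs trees by hand, rather than obtaining them from the parser, may want checks of this kind because the data types themselves do not enforce them.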

For a more global overview, a database of what is declared in Java and what is related to what, see the Core model. There you will also find fact extractors for bytecode and for jar files with .class files in them.

Benefits

  • Every AST modelled in M3 style is usually recognizable, even if you are an expert in a different language.
  • HiFi: This Java AST format is complete and completely informative about Java. For every language construct in existence there is a node in the tree. Every node also has a src field that points at the exact location in the source file where that node originated.
  • You can use handy pattern matching primitives like constructor matching, list matching and deep matching for fast analysis.
  • src and decl fields on AST nodes correspond to the M3 Core model's declarations and uses relations, and others. Combining AST analysis with lookups in an M3 core model is usually very handy.
  • One AST format for all kinds of Java versions.
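
The matching primitives mentioned in the list above can be sketched on a small hand-built tree. The module name, `example`, `calledNames` and `startsWithFoo` are hypothetical; the sketch assumes method names are represented as `\id` nodes, as they are in the tree built here.

```rascal
module PatternMatchingSketch // hypothetical module name

import lang::java::m3::AST;

// Hand-built tree for: if (x) { foo(); bar(); }
Statement example = \if(\id("x"), \block([
    \expressionStatement(\methodCall([], \id("foo"), [])),
    \expressionStatement(\methodCall([], \id("bar"), []))
]));

// Deep matching: names of all unqualified method calls, at any depth.
set[str] calledNames(Statement s)
    = {n | /\methodCall(_, \id(str n), _) := s};

// Constructor matching plus list matching:
// is the then-branch a block whose first statement calls `foo`?
bool startsWithFoo(Statement s)
    = \if(_, \block([\expressionStatement(\methodCall(_, \id("foo"), _)), *_])) := s;
```

For this tree, `calledNames(example)` yields the set containing "foo" and "bar", and `startsWithFoo(example)` holds.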

Pitfalls

  • Confusing the AST type for Type syntax with the symbolic representation of types in M3: TypeSymbol.
  • Writing algorithms that "should" work for any programming language: don't do it. Although Rascal M3 ASTs are a uniform format for abstract syntax trees, they are not a unified abstract syntax tree formalism. In other words, an \if statement can have different semantics in one language than in another, and frequently it does. AST nodes have the same name across programming languages if they have the same general intention, but their semantics are typically different.
  • Abstracting from abstract syntax. (Abstract) Syntax is the bread and butter of (static) code analysis algorithms. If you introduce functional or object-oriented abstraction layers to hide this intrinsic complexity, the entire algorithm becomes harder to understand and harder to maintain.
    • It's almost always better to repeat syntactic constructs in patterns for pattern matching, and to repeat cases several times in different contexts, than to introduce "reusable" boolean predicates yourself.
    • Such reuse is typically accidentally possible and not intrinsic to the language or the algorithm. Rascal will also help with maintenance if the constructors change over time, by providing warnings and errors.
    • If you find yourself writing many case distinctions over and over again, it's time to consider using or introducing a new intermediate language like TypeSymbol.
  • AST instances for older versions of Java may contain empty list nodes in locations where a feature was added later (say type parameters of generics). Analysis algorithms must ignore those values, and probably should know which version they are analysing. Example:
    • Before Java 5 there were no generics, and List with an empty list of type parameters just means the list type.
    • From Java 5 onward there are generics, and List with an empty list of type parameters means the "raw type" of List.
    • Conclusion: Type compatibility rules are subtly different, while the abstract syntax for both instances is the same.
    • What holds between programming languages also holds between versions of one language: just because two things look the same does not mean they mean the same thing.
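
As an illustration of the advice to spell out syntactic cases instead of hiding them behind predicates, the following sketch counts branching constructs with one explicit case per constructor. The module name, `countBranches` and the choice of what counts as a "branch" are assumptions made for this example.

```rascal
module ExplicitCasesSketch // hypothetical module name

import lang::java::m3::AST;

// Count branching constructs by listing every relevant constructor.
// Each case is written out, so a change to the constructors is
// reported exactly at the pattern that needs attention.
int countBranches(Statement body) {
    int n = 0;
    visit (body) {
        case \if(_, _): n += 1;
        case \if(_, _, _): n += 1;
        case \while(_, _): n += 1;
        case \do(_, _): n += 1;
        case \foreach(_, _, _): n += 1;
        case \for(_, _, _, _): n += 1;
        case \for(_, _, _): n += 1;
        case \conditional(_, _, _): n += 1;
    }
    return n;
}
```

For example, a hand-built `\if(\id("a"), \while(\id("b"), \empty()))` contains two such constructs.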

data Language

Datatype to configure the Java Language Standard compliance level of the parser.

data Language  
= \java(int level = 13, str version="13", bool preview=true)
;

This is the Language data-type of core M3 that we use to document the language level, as well as configure the JDK compiler before extracting the relevant facts.
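
A small sketch of how such a value can be built and inspected. The module name, `newest`, `java8` and `supportsLambdas` are hypothetical, and the version string "1.8" is an assumption about how a Java 8 level might be written, not something this page specifies.

```rascal
module LanguageLevelSketch // hypothetical module name

import lang::java::m3::AST;

// All keyword fields fall back to the defaults declared above.
Language newest = \java();

// Hypothetical configuration for a Java 8 code base without preview features.
// The version string "1.8" is an assumption made for this sketch.
Language java8 = \java(level = 8, version = "1.8", preview = false);

// Keyword fields are read like ordinary fields.
// Lambdas entered the language in Java 8.
bool supportsLambdas(Language l) = l.level >= 8;
```

In practice the JLS1() to JLS13() functions listed next are probably the more convenient way to obtain a preconfigured value.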

function JLS1

Language JLS1()

function JLS2

Language JLS2()

function JLS3

Language JLS3()

function JLS4

Language JLS4()

function JLS5

Language JLS5()

function JLS6

Language JLS6()

function JLS7

Language JLS7()

function JLS8

Language JLS8()

function JLS9

Language JLS9()

function JLS10

Language JLS10()

function JLS11

Language JLS11()

function JLS12

Language JLS12()

function JLS13

Language JLS13()

data Declaration

All kinds of declarations in Java.

data Declaration  
= \compilationUnit(list[Declaration] imports, list[Declaration] types)
| \compilationUnit(Declaration package, list[Declaration] imports, list[Declaration] types)
| \compilationUnit(Declaration \module)
| \enum(list[Modifier] modifiers, Expression name, list[Type] implements, list[Declaration] constants, list[Declaration] body)
| \enumConstant(list[Modifier] modifiers, Expression name, list[Expression] arguments, Declaration class)
| \enumConstant(list[Modifier] modifiers, Expression name, list[Expression] arguments)
| \class(list[Modifier] modifiers, Expression name, list[Declaration] typeParameters, list[Type] extends, list[Type] implements, list[Declaration] body)
| \class(list[Declaration] body)
| \interface(list[Modifier] modifiers, Expression name, list[Declaration] typeParameters, list[Type] extends, list[Type] implements, list[Declaration] body)
| \field(list[Modifier] modifiers, Type \type, list[Declaration] fragments)
| \initializer(list[Modifier] modifiers, Statement initializerBody)
| \method(list[Modifier] modifiers, list[Declaration] typeParameters, Type \return, Expression name, list[Declaration] parameters, list[Expression] exceptions, Statement impl)
| \method(list[Modifier] modifiers, list[Declaration] typeParameters, Type \return, Expression name, list[Declaration] parameters, list[Expression] exceptions)
| \constructor(list[Modifier] modifiers, Expression name, list[Declaration] parameters, list[Expression] exceptions, Statement impl)
| \import(list[Modifier] modifiers, Expression name)
| \importOnDemand(list[Modifier] modifiers, Expression name)
| \package(list[Modifier] modifiers, Expression name)
| \variables(list[Modifier] modifiers, Type \type, list[Declaration] \fragments)
| \variable(Expression name, list[Declaration] dimensionTypes)
| \variable(Expression name, list[Declaration] dimensionTypes, Expression \initializer)
| \typeParameter(Expression name, list[Type] extendsList)
| \annotationType(list[Modifier] modifiers, Expression name, list[Declaration] body)
| \annotationTypeMember(list[Modifier] modifiers, Type \type, Expression name)
| \annotationTypeMember(list[Modifier] modifiers, Type \type, Expression name, Expression defaultBlock)
| \parameter(list[Modifier] modifiers, Type \type, Expression name, list[Declaration] dimensions)
| \dimension(list[Modifier] annotations)
| \vararg(list[Modifier] modifiers, Type \type, Expression name)
;

data Declaration

These declaration types are related to the Java 9 module system.

data Declaration  
= \module(list[Modifier] open, Expression \moduleName, list[Declaration] directives)
| \opensPackage(Expression packageName, list[Expression] openedToModules)
| \providesImplementations(Expression interface, list[Expression] implementations)
| \requires(list[Modifier] mods, Expression \moduleName)
| \uses(Expression interface)
| \exports(Expression interface, list[Expression] to)
;

data Expression

Java Expressions all have a typ.

data Expression (TypeSymbol typ=\unresolved()) 
= \arrayAccess(Expression array, Expression index)
| \newArray(Type \type, list[Expression] dimensions, Expression init)
| \newArray(Type \type, list[Expression] dimensions)
| \arrayInitializer(list[Expression] elements)
| \assignment(Expression lhs, str operator, Expression rhs)
| \cast(Type \type, Expression expression)
| \characterLiteral(str charValue)
| \newObject(Expression expr, Type \type, list[Declaration] typeParameters, list[Expression] args, Declaration class)
| \newObject(Expression expr, Type \type, list[Declaration] typeParameters, list[Expression] args)
| \newObject(Type \type, list[Declaration] typeParameters, list[Expression] args, Declaration class)
| \newObject(Type \type, list[Declaration] typeParameters, list[Expression] args)
| \qualifiedName(list[Expression] identifiers)
| \conditional(Expression expression, Expression thenBranch, Expression elseBranch)
| \fieldAccess(Expression name)
| \fieldAccess(Expression qualifier, Expression name)
| \superFieldAccess(Expression expression, Expression name)
| \instanceof(Expression leftSide, Type rightSide)
| \methodCall(list[Type] typeArguments, Expression name, list[Expression] arguments)
| \methodCall(Expression receiver, list[Type] typeArguments, Expression name, list[Expression] arguments)
| \superMethodCall(list[Type] typeArguments, Expression name, list[Expression] arguments)
| \superMethodCall(Expression qualifier, list[Type] typeArguments, Expression name, list[Expression] arguments)
| \null()
| \number(str numberValue)
| \booleanLiteral(str boolValue)
| \stringLiteral(str stringValue, str literal=stringValue)
| \textBlock(str stringValue, str literal=stringValue)
| \type(Type \type)
| \bracket(Expression expression)
| \this()
| \this(Expression qualifier)
| \super()
| \declarationExpression(Declaration declaration)
| \times(Expression lhs, Expression rhs)
| \divide(Expression lhs, Expression rhs)
| \remainder(Expression lhs, Expression rhs)
| \plus(Expression lhs, Expression rhs)
| \minus(Expression lhs, Expression rhs)
| \leftShift(Expression lhs, Expression rhs)
| \rightShift(Expression lhs, Expression rhs)
| \rightShiftSigned(Expression lhs, Expression rhs)
| \less(Expression lhs, Expression rhs)
| \greater(Expression lhs, Expression rhs)
| \lessEquals(Expression lhs, Expression rhs)
| \greaterEquals(Expression lhs, Expression rhs)
| \equals(Expression lhs, Expression rhs)
| \notEquals(Expression lhs, Expression rhs)
| \xor(Expression lhs, Expression rhs)
| \or(Expression lhs, Expression rhs)
| \and(Expression lhs, Expression rhs)
| \conditionalOr(Expression lhs, Expression rhs)
| \conditionalAnd(Expression lhs, Expression rhs)
| \postIncrement(Expression operand)
| \postDecrement(Expression operand)
| \preIncrement(Expression operand)
| \preDecrement(Expression operand)
| \prePlus(Expression operand)
| \preMinus(Expression operand)
| \preComplement(Expression operand)
| \preNot(Expression operand)
| \id(str identifier)
| \switch(Expression expression, list[Statement] cases)
| \methodReference(Type \type, list[Type] typeArguments, Expression name)
| \methodReference(Expression expression, list[Type] typeArguments, Expression name)
| \creationReference(Type \type, list[Type] typeArguments)
| \superMethodReference(list[Type] typeArguments, Expression name)
| \lambda(list[Declaration] parameters, Statement block)
| \lambda(list[Declaration] parameters, Expression body)
| \memberValuePair(Expression name, Expression \value)
;
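
The typ keyword field defaults to \unresolved(), so hand-built trees, and possibly trees for which no type could be computed, carry that value. The sketch below collects such expressions; the module name, `total` and `unresolvedExpressions` are hypothetical.

```rascal
module UnresolvedTypesSketch // hypothetical module name

import lang::java::m3::AST;

// Hand-built `a + b`. Nothing was resolved, so `typ` keeps its default.
Expression total = \plus(\id("a"), \id("b"));

// Collect every expression in a tree whose `typ` is still the default.
set[Expression] unresolvedExpressions(value tree)
    = {e | /Expression e := tree, e.typ == \unresolved()};
```

For `total` this returns the addition itself and its two operands.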

data Statement

These are the Statement types of Java.

data Statement  
= \assert(Expression expression)
| \assert(Expression expression, Expression message)
| \block(list[Statement] statements)
| \break()
| \break(Expression label)
| \continue()
| \continue(Expression label)
| \do(Statement body, Expression condition)
| \empty()
| \foreach(Declaration parameter, Expression collection, Statement body)
| \for(list[Expression] initializers, Expression condition, list[Expression] updaters, Statement body)
| \for(list[Expression] initializers, list[Expression] updaters, Statement body)
| \if(Expression condition, Statement thenBranch)
| \if(Expression condition, Statement thenBranch, Statement elseBranch)
| \label(str identifier, Statement body)
| \return(Expression expression)
| \return()
| \switch(Expression expression, list[Statement] statements)
| \case(list[Expression] expressions)
| \caseRule(list[Expression] expressions)
| \defaultCase()
| \synchronizedStatement(Expression lock, Statement body)
| \throw(Expression expression)
| \try(Statement body, list[Statement] catchClauses)
| \try(Statement body, list[Statement] catchClauses, Statement \finally)
| \catch(Declaration exception, Statement body)
| \declarationStatement(Declaration declaration)
| \while(Expression condition, Statement body)
| \expressionStatement(Expression stmt)
| \constructorCall(list[Type] typeArguments, list[Expression] arguments)
| \superConstructorCall(Expression expr, list[Type] typeArguments, list[Expression] arguments)
| \superConstructorCall(list[Type] typeArguments, list[Expression] arguments)
| \yield(Expression argument)
;

data Type

These are the literal types you can find in Java programs.

data Type (TypeSymbol typ=unresolved()) 
= arrayType(Type \type)
| parameterizedType(Type \type, list[Type] typeArguments)
| qualifiedType(list[Modifier] annotations, Type typeQualifier, Expression simpleName)
| qualifiedType(list[Modifier] annotations, Expression nameQualifier, Expression simpleName)
| simpleType(Expression typeName)
| unionType(list[Type] types)
| intersectionType(list[Type] types)
| wildcard(list[Modifier] annotations)
| super(list[Modifier] annotations, Type \type)
| extends(list[Modifier] annotations, Type \type)
| \int()
| short()
| long()
| float()
| double()
| char()
| string()
| byte()
| \void()
| \boolean()
;
  • The constructors of Type represent the syntax of types in Java.
  • Their typ keyword field maps the syntax to the symbolic type representation as TypeSymbols.

Pitfalls

  • Type and TypeSymbol are easy to confuse because they are very similar in name, structure and intent. It is good to remember that TypeSymbol can express more types than one can write down in Java source code: TypeSymbol is used to compute with and analyze the Java type system, while Type only represents the syntax of types in Java source code.
  • Type closely follows the syntactic structure, while TypeSymbol follows the logical structure. For example: Node<Cons>[] in Java syntax becomes arrayType(parameterizedType(simpleType(id("Node")),[simpleType(id("Cons"))])) as an abstract syntax tree Type, which becomes this TypeSymbol: array(class(|class:///Node|,[interface(|interface:///Cons|,[])]),1)
  • TypeSymbol reduces different ways of writing types to one core canonical form.
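
Because Type mirrors the syntax, it can be turned back into source text with one function alternative per constructor. This is a minimal sketch covering only the three constructors used in the Node<Cons>[] example above; the module name, `nodeOfConsArray` and `showType` are hypothetical.

```rascal
module TypeSyntaxSketch // hypothetical module name

import lang::java::m3::AST;
import List;

// The syntax tree for Node<Cons>[] from the example above.
Type nodeOfConsArray
    = arrayType(parameterizedType(simpleType(\id("Node")), [simpleType(\id("Cons"))]));

// Print a Type back as Java syntax; only a few constructors are covered.
str showType(arrayType(Type t)) = "<showType(t)>[]";
str showType(parameterizedType(Type t, list[Type] args)) {
    str inner = intercalate(", ", [showType(a) | a <- args]);
    return "<showType(t)>\<<inner>\>";
}
str showType(simpleType(\id(str n))) = n;
default str showType(Type _) = "?"; // constructors this sketch does not handle
```

Applied to `nodeOfConsArray`, this gives back the text Node<Cons>[]. Nothing comparable is possible in general for TypeSymbol, which abstracts from how a type was written.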

data Modifier

Modifiers are additional pieces of information attached to (typically) declarations.

data Modifier  
= \private()
| \public()
| \protected()
| \friendly()
| \static()
| \final()
| \synchronized()
| \transient()
| \abstract()
| \native()
| \volatile()
| \strictfp()
| \default()
| \open()
| \transitive()
| \markerAnnotation(Expression name)
| \normalAnnotation(Expression name, list[Expression] memberValuePairs)
| \singleMemberAnnotation(Expression typeName, Expression \value)
;

This also includes "user-defined" modifiers, the so-called Java annotations.
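
Since annotations are ordinary Modifier values, they can be collected with the same pattern matching used elsewhere. The sketch below assumes the annotation name is a simple `\id` node, which holds for the hand-built tree here but may not hold for qualified annotation names; the module name, `toStringDecl` and `markerAnnotations` are hypothetical.

```rascal
module AnnotationSketch // hypothetical module name

import lang::java::m3::AST;

// Hand-built declaration for: @Override public String toString();
Declaration toStringDecl = \method(
    [\markerAnnotation(\id("Override")), \public()],
    [], simpleType(\id("String")), \id("toString"), [], []);

// Names of all marker annotations found anywhere inside a declaration.
set[str] markerAnnotations(Declaration d)
    = {n | /\markerAnnotation(\id(str n)) := d};
```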

function getPaths

set[loc] getPaths(loc dir, str suffix)

function findRoots

Utility to help configuring the createAstFromFile function.

set[loc] findRoots(set[loc] folders)

createAstFromFile works well if the source roots and library classpath parameters are configured correctly.

This helper function crawls the file system from bottom to top. Starting with a potentially interesting set of Java files or folders for analysis, it finds the "root" of the class path by inspecting the package declarations of all .java files and subtracting each package name from their source location to arrive at a set of root folders.

Benefits

  • Robust way of configuring the source paths

Pitfalls

  • Typically projects have dependencies which are not found using this function.
  • This function does a lot of IO for just a little fact extraction.
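
A possible way to combine findRoots with the parsing functions, sketched under the assumption that the project location passed in contains the .java files of interest. The module name and `parseWithRoots` are hypothetical; `find` comes from util::FileSystem.

```rascal
module SourceRootsSketch // hypothetical module name

import lang::java::m3::AST;
import util::FileSystem;

// Parse all .java files below `project`, using the discovered roots as source path.
set[Declaration] parseWithRoots(loc project) {
    set[loc] roots = findRoots({project});   // derived from package declarations
    set[loc] files = find(project, "java");  // all files with the .java extension
    return createAstsFromFiles(files, true, sourcePath = [*roots]);
}
```

The classPath is left empty here, which matches the first pitfall above: library dependencies still have to be supplied separately.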

function createAstFromFile

Creates AST from a single file.

Declaration createAstFromFile(loc file, bool collectBindings, bool errorRecovery = false, list[loc] sourcePath = [], list[loc] classPath = [], Language javaVersion = JLS13())

Wrapper around createAstsFromFiles to call it on a single file.

function createAstsFromFiles

Creates ASTs for a set of files using the Eclipse JDT compiler.

set[Declaration] createAstsFromFiles(set[loc] file, bool collectBindings, bool errorRecovery = false, list[loc] sourcePath = [], list[loc] classPath = [], Language javaVersion = JLS13())

Pitfalls

  • Although the function takes a set of locations, it ignores the positional information in each location: it analyzes the whole file, not just the part that the positional information describes.

function createAstFromString

Creates an AST from a string using the Eclipse JDT compiler.

Declaration createAstFromString(loc fileName, str source, bool collectBinding, bool errorRecovery = false, list[loc] sourcePath = [], list[loc] classPath = [], Language javaVersion = JLS13())
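
A minimal usage sketch. The file location is hypothetical and is assumed here to serve only as the name attached to the parsed source; the module name, `parseSnippet` and `methodNames` are also introduced for illustration.

```rascal
module ParseStringSketch // hypothetical module name

import lang::java::m3::AST;

// Parse a small class from a string at the Java 8 language level.
Declaration parseSnippet() {
    str source = "class Hello { void greet() { } }";
    // Hypothetical file name for the in-memory source.
    return createAstFromString(|file:///Hello.java|, source, true, javaVersion = JLS8());
}

// The name nodes of all method declarations in a tree.
set[Expression] methodNames(Declaration cu)
    = {m.name | /Declaration m := cu, m is method};
```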

function createAstsFromDirectory

Creates a set of ASTs for all Java source files in a project using Eclipse's JDT compiler.

set[Declaration] createAstsFromDirectory(loc project, bool collectBindings, bool errorRecovery = false, Language javaVersion = JLS13() )

Recursively looks for the .java files in the directory, and also looks for the dependencies (.jar files) to include them. Wraps around createAstsFromFiles.
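
As a usage sketch, the following counts method declarations per compilation unit in a directory. The module name and `methodsPerFile` are hypothetical, and the sketch assumes the src field of each returned compilation unit identifies its file. Bindings are not collected because the count needs syntax only.

```rascal
module DirectorySketch // hypothetical module name

import lang::java::m3::AST;
import List;

// Map each compilation unit's source location to its number of method declarations.
map[loc, int] methodsPerFile(loc project) {
    set[Declaration] asts = createAstsFromDirectory(project, false);
    return (cu.src : size([m | /Declaration m := cu, m is method]) | cu <- asts);
}
```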

function createAstsFromMavenProject

Creates a set of ASTs for all Java source files in a Maven project using Eclipse's JDT compiler.

set[Declaration] createAstsFromMavenProject(loc project, bool collectBindings, bool errorRecovery = false, Language javaVersion = JLS13() )

This function uses getProjectPathConfig from util::Reflective, which inspects a pom.xml to compute the dependencies and concrete locations of jar files that a Maven project depends on. The project location points to the root of the project to analyze. As a consequence, the pom.xml is expected to be at project + "pom.xml".

Wraps around createAstsFromFiles.
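
Because the pom.xml is expected directly under the project root, a caller may want to check for it first. This is a minimal sketch; the module name, `parseMavenProject` and the error message are hypothetical, and `exists` comes from the IO module.

```rascal
module MavenProjectSketch // hypothetical module name

import lang::java::m3::AST;
import IO;

// Fail early with a readable message when the project root has no pom.xml.
set[Declaration] parseMavenProject(loc project) {
    if (!exists(project + "pom.xml")) {
        throw "no pom.xml found in <project>";
    }
    return createAstsFromMavenProject(project, true);
}
```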