module lang::java::m3::Core
Extends the M3 Core model with Java-specific concepts such as inheritance and overriding.
Usage
import lang::java::m3::Core;
Dependencies
extend lang::java::m3::TypeSymbol;
import lang::java::m3::AST;
extend analysis::m3::Core;
import analysis::graphs::Graph;
import analysis::m3::Registry;
import IO;
import String;
import Relation;
import Set;
import List;
import util::FileSystem;
import util::Reflective;
data M3
Java extensions to the generic M3 model.
data M3 (
rel[loc from, loc to] extends = {},
rel[loc from, loc to] implements = {},
rel[loc from, loc to] methodInvocation = {},
rel[loc from, loc to] fieldAccess = {},
rel[loc from, loc to] typeDependency = {},
rel[loc from, loc to] methodOverrides = {},
rel[loc from, loc to] annotations = {}
)
Notice that this model also contains the core attributes from M3; in particular `containment`, `declarations`, `modifiers`, `uses`, `types`, and `messages` are all relevant for the Java M3 model.
The additional relations represent specifically static semantic links from the object-oriented programming paradigm that Java belongs to. However, this only contains facts extracted directly from source code. The actual static semantic interpretation (type hierarchies, call graphs) requires an additional analysis step with its own design choices.
| Ground truth fact kind about source code | Description |
| --- | --- |
| `rel[loc from, loc to] extends` | captures class and interface inheritance: classes extend classes and interfaces extend interfaces. Implicit inheritance (i.e. from java.lang.Object) is not included. |
| `rel[loc from, loc to] implements` | which class implements which interfaces (directly; transitive implementation via the `extends` relation must be derived) |
| `rel[loc from, loc to] methodInvocation` | which method potentially invokes which (virtual) method. For a call graph this must be composed with `methodOverrides` |
| `rel[loc from, loc to] fieldAccess` | which method (or static block or field initializer) accesses which fields from which classes |
| `rel[loc from, loc to] typeDependency` | uses of types (literally!) in methods, static blocks and field initializers. |
| `rel[loc from, loc to] methodOverrides` | captures which methods override which other methods from their parents in the inheritance/implements hierarchy. Useful for approximating call graphs. |
| `rel[loc from, loc to] annotations` | logs which declarations (classes, interfaces, parameters, methods, variables, etc.) are tagged with which annotation classes or interfaces |
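The two derivations mentioned in the table (transitive implementation, and a call graph composed from `methodInvocation` and `methodOverrides`) could be sketched as follows. This is a minimal sketch, not the library's own analysis: the module name and the functions `allImplements` and `approximateCallGraph` are hypothetical, and it assumes that `methodOverrides` maps the overriding method (`from`) to the overridden method (`to`).

```rascal
module DerivedRelationsSketch  // hypothetical module name

import lang::java::m3::Core;
import Relation;  // invert

// Every interface a class implements: directly, via a superclass,
// or via a super-interface of an implemented interface.
rel[loc, loc] allImplements(M3 m) {
  rel[loc, loc] ancestors = m.extends+;            // strict transitive closure
  rel[loc, loc] viaSuperclass = ancestors o m.implements;
  rel[loc, loc] known = m.implements + viaSuperclass;
  return known + (known o ancestors);
}

// Over-approximated call graph: a call to a virtual method may end up
// in any method that (transitively) overrides the invoked method.
rel[loc, loc] approximateCallGraph(M3 m) {
  rel[loc, loc] overriddenBy = invert(m.methodOverrides)+;
  return m.methodInvocation + (m.methodInvocation o overriddenBy);
}
```

Both results are approximations by design; whether to include or exclude particular edges is one of the design choices the text above refers to.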
These are the kinds of logical names that can be found in a Java M3 model:
- `unknown:///` is from general M3 and means that no attempt has been made to resolve the name. It is usually the default for the keyword parameters `decl` and `typ`.
- `unresolved:///` is from general M3 and means that an attempt was made to resolve the name, but it was unsuccessful. Typically this means either the Java source code was statically incorrect, or the classpath for configuring AST or M3 extraction was incomplete.
- `java+class:///` is a fully resolved and qualified class name.
- `java+interface:///` is a fully resolved and qualified interface name.
- `java+classOrInterface:///` is the fully qualified name of an external class or interface that has not been resolved (since it is not on the classpath but it is used).
- `java+module:///` the root scheme points to any or all modules, and when it has a qualified name it is a specific module (a la Java 9's module system).
- `java+compilationUnit:///` is the name of a file that contains an entire Java compilation unit. The path name starts from the root of the source path for the current analysis run, or from the jar file that contains the .class bytecode currently under analysis. Typically a source compilation unit has one class member in the `containment` relation, but this is not required. There could be more non-public classes in the same unit.
- `java+constructor:///` is the fully qualified name of a constructor of a class (including parameter types to distinguish it from the other constructors).
- `java+method:///` is the fully qualified name of a method of a class (including parameter types to distinguish it from the other methods with the same name).
- `java+initializer:///` is the unique address of an initializer expression for a field or variable.
- `java+parameter:///` is the unique address for method, constructor and lambda parameters.
- `java+variable:///` is the unique address for local variables in methods, constructors and lambdas.
- `java+field:///` is the unique address of a field in a class, interface or enum.
- `java+enum:///` is a fully resolved and qualified name of an enum class.
- `java+array:///` is the name of an array type.
- `java+typeVariable:///` is the unique address of an open type variable (of a class or method).
- `java+wildcardType:///` is the address of an anonymous type variable.
- `java+enumConstant:///` is the unique address of one of the constants of an `enum` type.
- `java+arrayLength:///` is the singleton address for the length field of all arrays in Java.
- `java+primitiveType:///` is used when type resolution points to a builtin primitive in Java. These names also often occur as elements of other schemes, for example to uniquely encode parameter types of methods.
- `java+anonymousClass:///` uniquely labels anonymous classes; however, since indexing is used, the names are not stable between different versions of the same code, or between binary and source extractions.
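Because the kind of an entity is encoded in the scheme of its logical name, a quick inventory of a model can be made by grouping declared names by scheme. The following is a minimal sketch; the module name and the function `namesPerScheme` are hypothetical, and it relies only on the `declarations` relation from core M3 and the `scheme` field of `loc` values.

```rascal
module SchemeInventorySketch  // hypothetical module name

import lang::java::m3::Core;
import Relation;  // domain

// Count how many distinct declared logical names use each scheme,
// e.g. "java+class" or "java+method".
map[str, int] namesPerScheme(M3 m) {
  map[str, int] counts = ();
  for (loc name <- domain(m.declarations)) {
    counts[name.scheme] ? 0 += 1;  // start at 0 for a scheme not seen before
  }
  return counts;
}
```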
Benefits
- Java M3 is an immutable database, which is implemented using advanced persistent hash-tries under the hood. This makes analysis fast, and if you compute new M3 models from earlier models your source data can never be influenced by the later stages. This is very good for research purposes where the provenance of every data point is essential for the sake of validity.
- Java M3 is complete in the sense that all notions of programmable artefacts that exist in Java are represented. However, every piece of information is local and static: intra-procedural, flow and path insensitive. If the M3 database does not provide enough information, then the AST model is the next source to use. There is also a flow analysis module: Java To Object Flow.
- Java M3 is aligned with the AST model for Java:
  - every `decl=` parameter on Declarations nodes corresponds to an entry in the `declarations` relation in M3.
  - every `decl=` parameter on other nodes (Expressions, Types, Statements) corresponds to an entry in the `uses` relation in M3.
  - every `src` parameter on AST nodes can be looked up in the `declarations` or `uses` relations. Although not all nodes lead to declarations or uses of declarations, if they do their source locations line up between the AST and the M3 model.
  - The `typ` parameters on nodes are a distributed version of the generic `types` relation in the M3 model.
  - scope nesting in the AST is represented one-to-one by the `containment` relation in the M3 model.
  - The `modifiers` relation in M3 collects all the modifiers for every Declaration node that may have modifiers in Java.
- Java M3 is freely composable using the Compose M3 function, to create larger databases of measurable and analyzable software artefacts.
- Java M3 has been used in countless education and research projects.
- Java M3 can be composed with M3 models of other languages (C++, C) for cross language analysis.
- Java M3 represents only exact and accurate facts, and no analysis results (yet). You can write a simple call graph analysis in one line of Rascal; but it is good to realize that such analyses are inaccurate by their very nature.
- Java M3 has all the basic facts to build basic and advanced static analyses (such as call graphs)
- Using Diff Java M3, software evolution can be tracked over syntactic and semantic objects, instead of just lines of text.
- M3 models can be extracted from source code, but also from .class files in jars and folders.
Pitfalls
- With `unresolved:///` or `unknown:///` names in the model, counting elements for the sake of software metrics is dubious. The reason is that the counts will simply be off (both over- and under-approximated). Typically it is better to iterate towards a better classpath until the list of error `messages` from the compiler is empty and all names and types have been resolved, and then start measuring or running downstream analyses.
- Composition via Compose M3 is dumb; it simply unions all the sets of tuples. For a meaningful link, there are specific analyses to run.
- Models extracted for the same project from source or from .class files can be slightly different:
  - If classpath parameters were different between the different extractors, then names and types can be resolved differently or not at all.
  - Some artefacts on the source code level are compiled away on the bytecode level (lambdas are a fine example). If you need to line up facts from source and binary, an additional analysis is required using heuristics that "decompile" JVM bytecode back to the Java level.
  - Anonymous entities such as lambdas and anonymous nested classes may be labeled by a different counter, accidentally.
- The intersection of the generic `modifiers` relation in M3 and the Java-specific `annotations` relation is not empty. All annotations are also modifiers in Java M3, but no non-annotation modifiers end up in the `annotations` relation.
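Before computing metrics it can help to check whether any unresolved or unknown names remain. The sketch below collects them from the right-hand side of a few relations; the module name, the function `unresolvedNames` and the choice of relations to inspect are assumptions made for illustration, not a complete check.

```rascal
module ResolutionCheckSketch  // hypothetical module name

import lang::java::m3::Core;
import Relation;  // range

// All referenced names that were never resolved (unknown) or failed to
// resolve (unresolved) in the relations inspected here.
set[loc] unresolvedNames(M3 m)
  = { name
    | loc name <- range(m.uses) + range(m.typeDependency)
                + range(m.methodInvocation) + range(m.fieldAccess)
    , name.scheme in {"unresolved", "unknown"}
    };
```

If the resulting set is not empty, the classpath passed to the extractor is a likely place to look first, as described above.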
data M3
Extensions for modelling the Java module system.
data M3 (
rel[loc \module, loc requiredModule] moduleRequiresModule = {},
rel[loc \module, loc package, loc to] moduleOpensPackage = {},
rel[loc \module, loc service, loc to] moduleExportsPackage = {},
rel[loc \module, loc service, loc implementation] moduleProvidesService = {},
rel[loc \module, loc service] moduleUsesService = {}
)
The Java module system was introduced with Java 9. A "module" is a group of related packages, what is typically called a "component" in general programming terms. A module definition explains:
- what other modules the current module depends on: `moduleRequiresModule`
- what packages are open to reflection by other modules: `moduleOpensPackage`
- what packages are exported to other modules (the rest is unavailable by default): `moduleExportsPackage`
- the "services" it uses from other modules: `moduleUsesService`
- the "services" it offers to other modules: `moduleProvidesService`
And so each of these aspects has its own set of facts in the extended M3 model for Java:
| Facts about Java modules | Description |
| --- | --- |
| `rel[loc \module, loc requiredModule] moduleRequiresModule` | which modules each module requires |
| `rel[loc \module, loc package, loc to] moduleOpensPackage` | what packages are open for reflection in a given module |
| `rel[loc \module, loc service, loc to] moduleExportsPackage` | which packages (the public and protected classes therein) are exported by every module |
| `rel[loc \module, loc service, loc implementation] moduleProvidesService` | what services are implemented by this module |
| `rel[loc \module, loc service] moduleUsesService` | which services are used by every module |
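As an illustration of how these relations can be queried, the following sketch computes transitive module dependencies and the services that are provided but not used within a given model. The module name and both function names are hypothetical; only the relation names come from the declaration above.

```rascal
module ModuleQueriesSketch  // hypothetical module name

import lang::java::m3::Core;
import Relation;  // range

// Which modules a module depends on, directly or indirectly.
rel[loc, loc] transitiveRequires(M3 m) = m.moduleRequiresModule+;

// Services that some module provides but no module in this model uses.
set[loc] unusedServices(M3 m)
  = { service | <_, loc service, _> <- m.moduleProvidesService }
  - range(m.moduleUsesService);
```

A service reported by `unusedServices` may still be used by a module that is not part of the model, so the result depends on how complete the composed model is.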
Benefits
- M3 models with the module system extensions are composable to generate large queryable databases for entire software ecosystems.
- The same module definitions can be extracted from both bytecode (in jar files) and source code.
Pitfalls
- Modules, although they semantically encapsulate packages, interfaces and classes, do not appear in the `containment` relation of the core M3 model. That is because `containment` represents the static scoping relation of declared source code elements. The relation between modules and what is inside them is yet another form of encapsulation; so to avoid conflating them they are stored in different relations.
- Module information extracted from .class files in jars can be different from the information extracted from source. The usual cause of this is a different `classpath` at M3-model-extraction-time. Some classes or interfaces from external projects may be resolved as `java+classOrInterface:///examplePackage/exampleClassOrInterface` in the one, while the exact type is visible as `java+interface:///examplePackage/exampleClassOrInterface` in the other. After linking all available models using Compose M3 you could write a simple analysis that resolves the unresolved references, or perhaps this information is not consequential to your analysis task.
- The Java module system does not pertain to projects and project dependencies and their versions.
function composeJavaM3
Combines a set of Java meta models by merging their relations.
M3 composeJavaM3(loc id, set[M3] models)
function diffJavaM3
Returns the difference between the first model and the others.
M3 diffJavaM3(loc id, list[M3] models)
Combines `models[1..]` into a single model and then calculates the difference between `models[0]` and this new joined model.
The `id` is the identifier for the returned model.
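A minimal sketch of how composition and difference might be used together, for example to inspect what a newer version of a project adds. The module name, the function names and the `|memory://...|` identifiers are made up for illustration.

```rascal
module ComposeDiffSketch  // hypothetical module name

import lang::java::m3::Core;

// Merge an application model with a library model into one database.
M3 withLibrary(M3 app, M3 lib)
  = composeJavaM3(|memory://combined|, {app, lib});

// Facts that occur in `newer` but not in `older`:
// models[0] is `newer`, the remaining models are subtracted from it.
M3 addedFacts(M3 older, M3 newer)
  = diffJavaM3(|memory://added|, [newer, older]);
```

Note that facts which carry source locations may show up in the difference when code has merely moved within a file, so the result usually needs some interpretation.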
function createM3FromFile
Creates an M3 from a single file.
M3 createM3FromFile(loc file, bool errorRecovery = false, list[loc] sourcePath = [], list[loc] classPath = [], Language javaVersion = JLS13())
Identical to Create M3s From Files: `createM3sFromFiles({file})`.
function createM3sFromFiles
For a set of Java files, generates matching M3s.
set[M3] createM3sFromFiles(set[loc] files, bool errorRecovery = false, list[loc] sourcePath = [], list[loc] classPath = [], Language javaVersion = JLS13())
Each M3 has the `id` filled with a matching location from `files`.
function createM3FromFiles
For a set of Java files, creates a composed M3.
M3 createM3FromFiles(loc projectName, set[loc] files, bool errorRecovery = false, list[loc] sourcePath = [], list[loc] classPath = [], Language javaVersion = JLS13())
While Create M3s From Files leaves the M3s separated, this function composes them into a single model.
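For instance, a single composed model for all Java files under a source folder could be obtained roughly as follows. This is a sketch under assumptions: the module name and `modelOfFolder` are hypothetical, the source root doubles as the model identifier, and `find` from `util::FileSystem` is used to collect the files with the `java` extension.

```rascal
module ExtractSketch  // hypothetical module name

import lang::java::m3::Core;
import util::FileSystem;  // find

// One composed M3 for every .java file below `sourceRoot`;
// `jars` lists the libraries the sources are compiled against.
M3 modelOfFolder(loc sourceRoot, list[loc] jars)
  = createM3FromFiles(sourceRoot, find(sourceRoot, "java"),
                      sourcePath = [sourceRoot], classPath = jars);
```

A call could look like `modelOfFolder(|file:///path/to/project/src|, [])`, where the path is a placeholder. An empty classpath is likely to leave names unresolved if the sources depend on external libraries.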
function createM3sAndAstsFromFiles
tuple[set[M3], set[Declaration]] createM3sAndAstsFromFiles(set[loc] files, bool errorRecovery = false, list[loc] sourcePath = [], list[loc] classPath = [], Language javaVersion = JLS13())
function createM3FromString
M3 createM3FromString(loc fileName, str contents, bool errorRecovery = false, list[loc] sourcePath = [], list[loc] classPath = [], Language javaVersion = JLS13())
function createM3FromJarClass
M3 createM3FromJarClass(loc jarClass, list[loc] classPath = [])
function createM3FromSingleClass
M3 createM3FromSingleClass(loc jarClass, str className, list[loc] classPath = [])
function createM3FromJarFile
M3 createM3FromJarFile(loc jarLoc, list[loc] classPath = [])
function createM3FromDirectory
Globs for jars, class files and java files in a directory and tries to compile all source files into an M3 model.
M3 createM3FromDirectory(loc project, bool errorRecovery = false, bool includeJarModels=false, Language javaVersion = JLS13(), list[loc] classPath = [])
function createM3FromMavenProject
Globs for jars, class files and java files in a directory and tries to compile all source files into an M3 model.
M3 createM3FromMavenProject(loc project, bool errorRecovery = false, bool includeJarModels=false, Language javaVersion = JLS13())
function createM3FromJar
Extract an M3 model from all the class files in a jar.
M3 createM3FromJar(loc jarFile, list[loc] classPath = [])
We use Create M3 From Jar to extract an initial M3 model and then a number of steps enrich the M3 towards a model that could have come from the original source.
In particular:
- `typeDependency` is enriched by adding `extends` and `implements`
- `methodOverrides` is recovered from `extends` and `implements`, but restricted to the actual overridden methods.
function unregisterJavaProject
void unregisterJavaProject(loc project)
function getMethodSignature
str getMethodSignature(loc method)
function isCompilationUnit
Checks if the logical name of the `entity` is a compilation unit.
bool isCompilationUnit(loc entity)
A compilation unit is equivalent to a `.java` file in Java.
function isPackage
Checks if the logical name of the `entity` is a package.
bool isPackage(loc entity)
function isClass
Checks if the logical name of the `entity` is a class.
bool isClass(loc entity)
function isConstructor
Checks if the logical name of the `entity` is a constructor.
bool isConstructor(loc entity)
function isMethod
Checks if the logical name of the `entity` is a method.
bool isMethod(loc entity)
Constructors and initializers are also considered methods here.
Pitfalls
If `isConstructor(entity)`, then also `isMethod(entity)`.
Note that the opposite is not true.
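When only ordinary methods are wanted, the scheme of the logical name can be used to filter out constructors and initializers. A minimal sketch, assuming that ordinary methods are exactly the names with the `java+method` scheme; the module name and `plainMethods` are hypothetical.

```rascal
module PlainMethodsSketch  // hypothetical module name

import lang::java::m3::Core;

// Methods of the model, excluding constructors and initializers.
set[loc] plainMethods(M3 m)
  = { e | loc e <- methods(m), e.scheme == "java+method" };
```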
function isParameter
Checks if the logical name of the `entity` is a parameter.
bool isParameter(loc entity)
function isVariable
Checks if the logical name of the `entity` is a variable.
bool isVariable(loc entity)
function isField
Checks if the logical name of the `entity` is a field.
bool isField(loc entity)
function isInterface
Checks if the logical name of the `entity` is an interface.
bool isInterface(loc entity)
function isEnum
Checks if the logical name of the `entity` is an enum.
bool isEnum(loc entity)
function isType
Checks if the logical name of the `entity` is a type.
bool isType(loc entity)
A type is considered to be a class, an interface, or an enum.
Pitfalls
If `isClass(entity)`, then also `isType(entity)`.
If `isInterface(entity)`, then also `isType(entity)`.
If `isEnum(entity)`, then also `isType(entity)`.
Note that the opposite is not true.
function files
Extracts all files (compilation units) that occur in `containment`.
set[loc] files(rel[loc, loc] containment)
function declaredMethods
rel[loc, loc] declaredMethods(M3 m, set[Modifier] checkModifiers = {})
function declaredFields
rel[loc, loc] declaredFields(M3 m, set[Modifier] checkModifiers = {})
function declaredFieldsX
rel[loc, loc] declaredFieldsX(M3 m, set[Modifier] checkModifiers = {})
function declaredTopTypes
For all compilation units (left side), gets the types (right side).
rel[loc, loc] declaredTopTypes(M3 m)
function declaredSubTypes
rel[loc, loc] declaredSubTypes(M3 m)
function classes
Extracts all classes (logical names) from an M3.
set[loc] classes(M3 m)
Caches the results in memory.
function interfaces
Extracts all interfaces (logical names) from an M3.
set[loc] interfaces(M3 m)
Caches the results in memory.
function packages
Extracts all packages (logical names) from an M3.
set[loc] packages(M3 m)
Caches the results in memory.
function variables
Extracts all variables (logical names) from an M3.
set[loc] variables(M3 m)
Caches the results in memory.
function parameters
Extracts all parameters (logical names) from an M3.
set[loc] parameters(M3 m)
Caches the results in memory.
function fields
Extracts all fields (logical names) from an M3.
set[loc] fields(M3 m)
Caches the results in memory.
function methods
Extracts all methods (logical names) from an M3.
set[loc] methods(M3 m)
Caches the results in memory.
function constructors
Extracts all constructors (logical names) from an M3.
set[loc] constructors(M3 m)
Caches the results in memory.
function enums
Extracts all enums (logical names) from an M3.
set[loc] enums(M3 m)
Caches the results in memory.
function types
Extracts all types (logical names) from an M3.
set[loc] types(M3 m)
Caches the results in memory.
function elements
Extracts all elements that are contained in `parent`.
set[loc] elements(M3 m, loc parent)
See M3 `containment` for the definition of contains.
function fields
Extracts all fields that are contained in `class`.
set[loc] fields(M3 m, loc class)
Filtered version of Elements.
function methods
Extracts all methods that are contained in `class`.
set[loc] methods(M3 m, loc class)
Filtered version of Elements.
function constructors
Extracts all constructors that are contained in `class`.
set[loc] constructors(M3 m, loc class)
Filtered version of Elements.
function nestedClasses
Extracts all classes that are contained in `class`.
set[loc] nestedClasses(M3 m, loc class)
Filtered version of Elements.
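The per-class accessors combine naturally with the model-wide ones. As a hedged example, a simple size metric (number of methods per class) might be written like this; the module name and `methodsPerClass` are hypothetical.

```rascal
module MethodsPerClassSketch  // hypothetical module name

import lang::java::m3::Core;
import Set;  // size

// For every class in the model, the number of methods it directly contains.
// Constructors and initializers count as methods here as well.
map[loc, int] methodsPerClass(M3 m)
  = ( c : size(methods(m, c)) | loc c <- classes(m) );
```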