Gom is a generator of tree implementations in Java, allowing the definition of a preferred canonical form, using hooks. The tree implementations Gom generates are characterized by strong typing, immutability (there is no way to manipulate them with side-effects), maximal subterm sharing and the ability to be used with strategies.
The basic functionality of Gom is to provide a tree implementation in Java corresponding to an algebraic specification.
Gom provides a syntax to concisely define abstract syntax tree, which is inspired from the algebraic type definition of ML languages.
Each Gom file consists in the definition of one or more modules. In each module we define sorts, and operators for the sorts. Additionally, it allows to define hooks modifying the default behavior of an operator.
GomGrammar | ::= | Module |
Module | ::= | ‘module’ ModuleName [Imports] Grammar |
Imports | ::= | ‘imports’ (ModuleName)* |
Grammar | ::= | ‘abstract syntax’ (TypeDefinition ∣ HookDefinition)* |
TypeDefinition | ::= | SortName ‘=’ [‘|’] OperatorDefinition (‘|’ OperatorDefinition)* |
OperatorDefinition | ::= | OperatorName ‘(’[SlotDefinition(‘,’SlotDefinition)*]‘)’ |
∣ | OperatorName ‘(’ SortName‘*’ ‘)’ | |
SlotDefinition | ::= | SlotName ‘:’ SortName |
ModuleName | ::= | Identifier |
SortName | ::= | Identifier |
OperatorName | ::= | Identifier |
SlotName | ::= | Identifier |
HookDefinition | ::= | HookType ‘:’ HookOperation |
HookType | ::= | OperatorName |
∣ | ‘module’ ModuleName | |
∣ | ‘sort’ SortName | |
HookOperation | ::= | ‘make’ ‘(’ [Identifier (‘,’ Identifier)*] ‘)’ ‘{’ TomCode ‘}’ |
∣ | ‘make_insert’ ‘(’ Identifier ‘,’ Identifier)* ‘)’ ‘{’ TomCode ‘}’ | |
∣ | ‘make_empty’ ‘(’‘)’ ‘{’ TomCode ‘}’ | |
∣ | ‘Free () {}’ | |
∣ | ‘FL () {}’ | |
∣ | (‘AU’ ∣‘ACU’) ‘()’ ‘{’ [‘‘’ term] ‘}’ | |
∣ | ‘interface()’ ‘{’ Identifier (‘,’ Identifier )* ‘}’ | |
∣ | ‘import()’ ‘{’ JavaImports ‘}’ | |
∣ | ‘block()’ ‘{’ TomCode ‘}’ | |
∣ | ‘rules()’ ‘{’ RulesCode ‘}’ | |
∣ | ‘graphrules(’ Identifier ‘,’ (‘Identity’ ∣‘Fail’)‘)’‘{’ GraphRulesCode ‘}’ | |
RulesCode | ::= | (Rule)* |
Rule | ::= | RulePattern ‘->’ TomTerm [‘if’ Condition] |
RulePattern | ::= | [ AnnotedName ‘@’ ] PlainRulePattern |
PlainRulePattern | ::= | [‘!’]VariableName [ ‘*’ ] |
∣ | [‘!’]HeadSymbolList ‘(’ [RulePattern ( ‘,’ RulePattern )*] ‘)’ | |
∣ | ‘_’ | |
∣ | ‘_*’ | |
GraphRulesCode | ::= | (GraphRule)* |
GraphRule | ::= | TermGraph ‘->’ TermGraph [‘if’ Condition] |
TermGraph | ::= | [ Label ‘:’ ] TermGraph |
∣ | ‘&’ Label | |
∣ | VariableName [ ‘*’ ] | |
∣ | HeadSymbolList ‘(’ [TermGraph ( ‘,’ TermGraph )*] ‘)’ | |
Label | ::= | Identifier |
Condition | ::= | TomTerm Operator TomTerm |
∣ | RulePattern ‘<<’ TomTerm | |
∣ | Condition ‘&&’ Condition | |
∣ | Condition ‘||’ Condition | |
∣ | ‘(’ Condition ‘)’ | |
Operator | ::= | ‘>’ ∣‘>=’ ∣‘<’ ∣‘<=’ ∣‘==’ ∣‘!=’ |
Gom supports several builtin sorts, that may be used as sort for new operators slots. To each of these builtin sorts corresponds a Java type. Native data types from Java can be used as builtin fields, as well as ATerm and ATermList structures. To use one of these builtin types in a Gom specification, it is required to add an import for the corresponding builtin.
Name | Java type |
int | int |
String | String |
char | char |
double | double |
long | long |
float | float |
ATerm | aterm.ATerm |
ATermList | aterm.ATermList |
It is not possible to define a new operator whose domain is one of the builtin sorts, since those sorts are defined in another module.
External Gom modules may be imported in a Gom specification, by adding the name of the module to import in the ‘imports’ section. Once a module imported, it is possible to use any of the sorts this module declares or imports itself as type for a new operator slot. Adding new operators to an imported sort is however not allowed.
The syntax of Gom is quite simple and can be used to define many-sorted abstract-datatypes. The module section defines the name of the signature. The imports section defines the name of the imported signatures. The abstract syntax part defines the operators with their signature. For each argument, a sort and a name (called slot-name) has to be given.
module Expressions imports String int abstract syntax Bool = True() | False() | Eq(lhs:Expr, rhs:Expr) Expr = Id(stringValue:String) | Nat(intValue:int) | Add(lhs:Expr, rhs:Expr) | Mul(lhs:Expr, rhs:Expr)
The definition of a signature in Gom has several restrictions:
A first solution to combine Tom with Gom is to use Gom as a standalone tool, using the command line tool or the ant task.
In that case, the module name of the Gom specification and the package option determine where the files are generated. To make things correct, it is sufficient to import the generated Java classes, as well as the generated Tom file. In the case of a Gom module called Module, all files are generated in a directory named module and the Tom program should do the following:
import module.*; import module.types.*; class MyClass { ... %include { module/Module.tom } ... }
A second possibility to combine Tom with Gom is to use the %gom construct offered by Tom. In that case, the Gom module can be directly included into the Tom file, using the %gom instruction:
package myPackage; import myPackage.myclass.expressions.*; import myPackage.myclass.expressions.types.*; class MyClass { %gom{ module Expressions abstract syntax Bool = True() | False() ... Expr = Mul(lhs:Expr, rhs:Expr) } ... }
Note that the Java code is generated in a package that corresponds to the current package, followed by the class-name and the module-name. This allows to define the same module in several classes, and avoid name clashes.
Gom provides hooks that allow to define properties of the data-structure, in particular canonical forms for the terms in the signature in an algebraic way.
The ‘rules’ hook ables to define a set of conditional rewrite rules over the current module signature. Those rules are applied systematically using a leftmost innermost strategy. Thus, the only terms that can be produced and manipulated in the Tom program are normal with respect to the defined system.
module Expressions imports String int abstract syntax Bool = True() | False() | Eq(lhs:Expr, rhs:Expr) Expr = Id(stringValue:String) | Nat(intValue:int) | Add(lhs:Expr, rhs:Expr) | Mul(lhs:Expr, rhs:Expr) module Expressions:rules() { Eq(x,x) -> True() Eq(x,y) -> False() if x!=y }
Since the rules do alter the behavior of the construction functions in the term structure, it is required in a module that the rules in a ‘rules’ hook have as left hand side a pattern rooted by an operator of the current module. The rules are tried in the order of their definitions, and the first matching rule is applied.
Hooks may be used to specify how operators should be created. ‘make’, ‘make_empty’ and ‘make_insert’ hooks are altering the creation operations for respectively algebraic, neutral element (empty variadic) and variadic operators. ‘make_insert’ is simply a derivative case of ‘make’, with two arguments, for variadic operators.
The hook operation type is followed by a list of arguments name between ‘()’. The creation operation takes those arguments in order to build a new instance of the operator. Thus, the arguments number has to match the slot number of the operator definition, and types are inferred from this definition.
Then the body of the hook definition is composed of Java and Tom code. The Tom code is compiled using the mapping definition for the current module, and thus allows to build and match terms from the current module. This code can also use the realMake function, which consists in the "inner" default allocation function. This function takes the same number of arguments as the hook. In any case, if the hooks code does not perform a return itself, this realMake function is called at the end of the hook execution, with the corresponding hooks arguments
Using the expression example introduced in 11.1.2, we can add hooks to implement the computation of Add and Mul when both arguments are known integers (i.e. when they are Nat(x))
module Expressions imports String int abstract syntax Bool = True() | False() | Eq(lhs:Expr, rhs:Expr) Expr = Id(stringValue:String) | Nat(intValue:int) | Add(lhs:Expr, rhs:Expr) | Mul(lhs:Expr, rhs:Expr) Add:make(l,r) { %match(Expr l, Expr r) { Nat(lvalue), Nat(rvalue) -> { return `Nat(lvalue + rvalue); } } } Mul:make(l,r) { %match(Expr l, Expr r) { Nat(lvalue), Nat(rvalue) -> { return `Nat(lvalue * rvalue); } } }
Using this definition, it is impossible to have an expression containing unevaluated expressions where a value can be calculated. Thus, a procedure doing constant propagations for Id whose value is known could simply replace the Id by the corresponding Nat, and rely on this mechanism to evaluate the expression. Note that the arguments of the make hook are themselves elements built on this signature, and thus the hooks have been applied for them. In the case of hooks encoding a rewrite system, this corresponds to using an innermost strategy.
In order to ease the use of variadic operators with the same domain and co-domain, Gom does provide hooks that enforce a particular canonical form for lists.
<op>:FL() {}
, ensures that structures
containing the operator <op> are left to right combs, with an empty
<op>() at the right.
This constitutes a particular form of associative with neutral element
canonical form, with a restriction on the application of the neutral rules.
This corresponds to the generation of the following normalisation rules:
make(Empty,tail) -> tail make(Cons(h,t),tail) -> make(h,make(t,tail)) make(head,tail) -> make(head,make(tail,Empty)) if tail!=Empty and tail!=Cons
<op>:Free() {}
, ensures the variadic
symbol remains free, i.e. deactivates the default FL hook.
<op>:AU() {}
, ensures that structures
containing the <op> operator are left to right comb, and that neutral
elements are removed. It is possible to specify an alternate neutral element
to the associative with neutral theory, using <op>:AU() {`<elem>()}
,
where <elem> is a term in the signature.
This will generate the following normalisation rules:
make(Empty,tail) -> tail make(head,Empty) -> head make(Cons(h,t),tail) -> make(h,make(t,tail))
make(Empty,tail) -> tail make(head,Empty) -> head make(Cons(h,t),tail) -> make(h,make(t,tail)) make(head,Cons(h,t)) -> make(head,Cons(h,t)) if head < h make(head,Cons(h,t)) -> make(h,make(head,t)) if head >= h
If you do not define any hook of the form AU, ACU, FL, Free, or rules, the FL hook will be automatically declared for variadic operators those domain and co-domain are equals. In practice, this makes list matching and associative matching with neutral element easy to use.
If you use a hook rules, there may be an interaction that can lead to non-termination. Therefore, no hook will be automatically added, and you are forced to declare a hook of the form AU, ACU, FL or Free.
For each Module, the Gom compiler generates an API specific of this module, possibly using the API of other modules (declared in the Imports section). This API is located in a Java package named using the ModuleName lowercased, and contains the tree implementation itself, with some additional utilities:
public aterm.ATerm toATerm(); public String symbolName(); public int compareTo(Object o); public int compareToLPO(Object o); public void toStringBuilder(java.lang.StringBuilder buffer);
public boolean is<op>(); public <SlotType> get<slotName>(); public <SortName> set<slotName>(<SlotType>); public static <SortName> fromTerm(aterm.ATerm trm); public static <SortName> fromString(String s); public static <SortName> fromStream(java.io.InputStream stream) throws java.io.IOException; public int length(); public <SortName> reverse();
public static <op> make(arg1,...,argn);
We show elements of the generated API for a very simple example, featuring variadic operator. It defines natural numbers as Zero() and Suc(n), and lists of natural numbers.
module Mod abstract syntax Nat = Zero() | Suc(pred:Nat) | Plus(lhs:Nat,rhs:Nat) | List(Nat*)
Using the command gom Mod.gom, the list of generated files is:
mod/Mod.tom (the Tom mapping) mod/ModAbstractType.java mod/types/Nat.java (abstract class for the "Nat" sort) mod/types/nat/List.java \ mod/types/nat/ConsList.java \ mod/types/nat/EmptyList.java / Implementation for the operator "List" mod/types/nat/Plus.java (Implementation for "Plus") mod/types/nat/Suc.java (Implementation for "Suc") mod/types/nat/Zero.java (Implementation for "Zero")
The ModAbstractType class declares generic methods shared by all operators in the Mod module:
public aterm.ATerm toATerm() public String symbolName() public String toString()
The mod/types/Nat.java class provides an abstract class for all operators in the Nat sort, implementing the ModAbstractType and contains the following methods. First, the methods for checking the root operator, returning false by default:
public boolean isConsList() public boolean isEmptyList() public boolean isPlus() public boolean isSuc() public boolean isZero()
Then getter methods, throwing an UnsupportedOperationException by default, as the slot may not be present in all operators. This is convenient since at the user level, we usually manipulate objects of sort Nat, without casting them to more specific types.
public mod.types.Nat getpred() public mod.types.Nat getlhs() public mod.types.Nat getHeadList() public mod.types.Nat getrhs() public mod.types.Nat getTailList()
The fromTerm static method allows Gom data structure to be
interoperable with ATerm
public static mod.types.Nat fromTerm(aterm.ATerm trm)
The operator implementations redefine all or some getters for the operator to return its subterms. It also provides a static make method to build a new tree rooted by this operator, and implements the tom.library.sl.Visitable interface. For instance, in the case of the Plus operator, the interface is:
public static Plus make(mod.types.Nat lhs, mod.types.Nat rhs) public int getChildCount() public tom.library.sl.Visitable getChildAt(int index) public tom.library.sl.Visitable setChildAt(int index, tom.library.sl.Visitable v)
completed with the methods from the Nat class and the ModAbstractType.
The operators implementing the variadic operator both extend the List class, which provides list related methods, such as length, toArray and reverse. The toArray method produces an array of object of the codomain type corresponding to the list elements, while the reverse method returns the list with all elements in reverse order. The List class for our example then contains:
public int length() public mod.types.Nat[] toArray() public mod.types.Nat reverse()
For the ConsList class, we obtain:
/* the constructor */ public static ConsList make(mod.types.Nat _HeadList, mod.types.Nat _TailList) { ... } public String symbolName() { ... } /* From the "Nat" class */ public boolean isConsList() { ... } public mod.types.Nat getHeadList() { ... } public mod.types.Nat getTailList() { ... } /* From the "ModAbstractType" class */ public aterm.ATerm toATerm() { ... } public static mod.types.Nat fromTerm(aterm.ATerm trm) { ... } /* The tom.library.sl.Visitable interface */ public int getChildCount() { ... } public tom.library.sl.Visitable getChildAt(int index) { ... } public tom.library.sl.Visitable setChildAt(int index, tom.library.sl.Visitable v) { ... } /* The MuVisitable interface */ public tom.library.sl.Visitable setChilds(tom.library.sl.Visitable[] childs) { ... }
There exist four other hooks ‘import’, ‘interface’, ‘block’ and ‘mapping’ that offer possibilities to enrich the generated API. Contrary to ‘make’ and ‘make_insert’, these hooks have no parameters. Moreover, they can be associated not only to an operator but also to a module or a sort.
HookDefinition | ::= | HookType ‘:’ HookOperation |
HookType | ::= | OperatorName |
∣ | ‘module’ ModuleName | |
∣ | ‘sort’ SortName | |
HookOperation | ::= | ‘interface()’ ‘{’ Identifier (‘,’ Identifier )* ‘}’ |
∣ | ‘import()’ ‘{’ JavaImports ‘}’ | |
∣ | ‘block()’ ‘{’ TomCode ‘}’ | |
∣ | ‘mapping()’ ‘{’ TomMapping ‘}’ |
There are few constraints on the form of the code in these hooks:
The code given in the hook is just added at the correct position in the corresponding Java class:
In the case of ‘mapping’ hooks, the corresponding code is added to the mapping generated for the signature.
module Expressions imports String int abstract syntax Bool = True() | False() | Eq(lhs:Expr, rhs:Expr) Expr = Id(stringValue:String) | Nat(intValue:int) | Add(lhs:Expr, rhs:Expr) | Mul(lhs:Expr, rhs:Expr) True:import() { import tom.library.sl.*; import java.util.HashMap; } sort Bool:interface() { Cloneable, Comparable } module Expressions:block() { %include{ util/HashMap.tom } %include{ sl.tom } %strategy CollectIds(table:HashMap) extends Identity() { visit Expr { Id(value) -> { table.put(`value,getEnvironment().getPosition()); } } } public static HashMap collect(Expr t) { HashMap table = new HashMap(); `TopDown(CollectIds(table)).apply(t); return table; } }
A term-graph is a term where subterms can be shared and where there may be cycles. Gom offers support to define term-graphs and term-graph rule systems. There exist several ways to define term-graphs but in our case, we propose to represent term-graphs by terms with pointers. These pointers are defined by a relative path inside the term. All the formal definitions can be found in this paper.
When defining a Gom algebraic signature, it is possible to construct term-graphs on these signature using the option –termgraph. In this case, the signature is automatically extended to manage labels. For every sort T, two new constructors are added:
LabT(label:String,term:T) RefT(label:String)
With these two new constructors, users can define term-graphs as labelled terms:
Term cyclicTerm = `LabTerm("l",f(RefTerm("l"))); Term termWithSharing = `g(RefTerm("a"),LabTerm("a",a()));
From this labelled term, users can obtain the term-graph representation with paths using the expand method. This method must be called before applying a term-graph strategy.
Using the hook graphrules, it is possible to define a set of term-graph rules. The left-hand and right-hand sides of these rules are term-graphs. A set of rules can only be associated to a given sort.
sort Term: graphrules(MyGraphStrat,Identity) { g(l:a(),&l) -> f(b()) f(g(g(a(),&l),l:x)) -> g(ll:b(),&ll) if b()<<x }
In the rules, sharings and cycles are not represented by the constructor LabTerm and RefTerm but using a light syntax. l:t is equivalent to LabTerm(l,t) and &l corresponds to RefTerm(l).
Contrary to classical term-graph rewriting, it is possible to reuse a label from the left-hand side in the right-hand side in order to obtain side effects. This feature is inspired from Rachid Echahed’s formalism.
sort Term: graphrules(SideEffect,Identity) { f(l:a()) -> g(&l,l:b()) }
This set of rules is translated into a Tom %strategy that can be used in a Tom program:
Term t = (Term) `g(RefTerm("a"),LabTerm("a",a())).expand(); `TopDown(Term.MyGraphStrat()).visit(t)
The data structures generated by Gom do provide support for the strategy language of Tom. We assume in this section the reader is familiar with the strategies support of Tom as described in chapter 12, and illustrated in chapter 7 of the tutorial.
The data structure generated by the Gom compiler provide support for the sl library implementing strategies for Tom. It is thus possible without any further manipulation to use the strategy language with Gom data structures.
This strategy support is extended by the Gom generator by providing congruence and construction elementary strategies for all operators of the data structure. Those strategies are made available through a Tom mapping _<module>.tom generated during Gom compilation.
The congruence strategies are generated for each operator in the Gom module. For a module containing a sort
Term = | a() | f(lt:Term,rt:Term)
congruence strategies are generated for both a and f operators, respectively called _a and _f. The semantics of those elementary strategies is as follows:
(_a)[t] | ⇒ | t if t equals a, failure otherwise |
(_f(s1,s2))[t] | ⇒ | f(t1’,t2’) if t = f(t1,t2), with (s1)[t1] ⇒t1’ and (s2)[t2] ⇒t2’, |
failure otherwise |
Thus, congruence strategies allows to discriminate terms based on how they are built, and to develop strategies adopting different behaviors depending on the shape of the term they are applied to.
Congruence strategies are commonly used to implement a specific behavior depending on the context (thus, it behaves like a complement to pattern matching). For instance, to print all first children of an operator f, it is possible to use a generic Print() strategy under a congruence operator.
Strategy specPrint = `TopDown(_f(Print(),Identity()));
Also, congruence strategies are used to implement map like strategies on tree structures:
Strategy map(Strategy arg) { return `mu(MuVar("x"), Choice(_Empty(),_Cons(arg,MuVar("x"))) ); }
The congruence strategy generated for variadic operators is similar to the map strategy, and will apply its argument to all subterms of a variadic operator.
Gom generated strategies’ purpose is to allow to build new terms for the signature at the strategy level. Those strategies do not use the terms they are applied to, and simply create a new term. Their semantics is as follows:
(Make_a)[t] | ⇒ | a |
(Make_f(s1,s2))[t] | ⇒ | f(t1,t2) if (s1)[null] ⇒t1 and (s2)[null] ⇒t2, |
failure otherwise |
We can note that as the sub-strategies for Make_f are applied to the null term, it is required that those strategies are themselves construction strategies and do not examine their argument.
These construction strategies, combined with congruence strategies can be used to implement rewrite rules as strategies. For instance, a rule f(a,b) → g(a,b) can be implemented by the strategy:
Strategy rule = `Sequence( _f(_a(),_b()), _Make_g(_Make_a(),_Make_b()) );