Chapter 10 Tom

10.1 Notations

The syntax of the language is given in BNF-like notation. Terminal symbols are set in typewriter font (‘like this’). Non-terminal symbols are set in italic font (like that). Square brackets [...] denote optional components. Parentheses with a trailing star sign (...)* denotes zero, one or several repetitions of the enclosed components. Parentheses with a trailing plus sign (...)+ denote one or several repetitions of the enclosed components. Parentheses (...) denote grouping.

10.1.1 Lexical conventions

Identifier	::=	Letter ( Letter ∣ Digit ∣ ‘`_`’ ∣ ‘`-`’ )*
Integer	::=	(Digit )+
Double	::=	(Digit)+ [‘`.`’] (Digit)* ∣ ‘`.`’ (Digit)+
String	::=	‘`"`’ (Letter ∣ (‘`\`’ (‘`n`’ ∣ ‘`t`’ ∣ ‘`b`’ ∣ ‘`r`’ ∣ ‘`f`’ ∣ ‘`\`’ ∣ ‘`’`’ ∣ ‘`"`’) ) )* ‘`"`’
Letter	::=	‘`A`’ ... ‘`Z`’ ∣ ‘`a`’ ... ‘`z`’
Digit	::=	‘`0`’ ... ‘`9`’
Char	::=	‘`’`’ (Letter ∣Digit) ‘`’`’

10.1.2 Names

SubjectName	::=	Identifier
Type	::=	Identifier
SlotName	::=	Identifier
HeadSymbol	::=	Identifier
	∣	Integer
	∣	Double
	∣	String
	∣	Char
VariableName	::=	Identifier
AnnotedName	::=	Identifier
LabelName	::=	Identifier
FileName	::=	Identifier
AttributeName	::=	Identifier
XMLName	::=	Identifier
Name	::=	Identifier

10.2 Tom constructs

A Tom program is a host language program (namely C, Java, or Caml) extended by several new constructs such as %match, %strategy, %include, %gom, or backquote. Tom is a multi-language compiler, and therefore its syntax depends on the host language syntax. But for simplicity, we only present the syntax of its constructs and explain how they can be integrated into the host language.

Using Java as a host-language, the following Tom program is correct:

public class HelloWorld {
  %include { string.tom }

  public final static void main(String[] args) {
    String who = "World";
    %match(who) {
      "World" -> { System.out.println("Hello " + who); }
      _       -> { System.out.println("Don't panic"); }
    }
  }
}

10.2.1 Tom program

A Tom program is a list of blocks, where each block is either a Tom construct, or a sequence of characters (host language code). When compiling a Tom program, the Tom constructs are transformed into host language code, and the result is a valid host language program. For instance, in the previous example, %include and %match constructs are replaced by function definitions and Java instructions, making the resulting program a correct Java program.

Syntax of a Tom program:

Tom	::=	BlockList
BlockList	::=
	(	MatchConstruct
	∣	StrategyConstruct
	∣	BackQuoteTerm
	∣	IncludeConstruct
	∣	GomConstruct
	∣	TypeTerm
	∣	Operator
	∣	OperatorList
	∣	OperatorArray
	∣	‘`{`’ BlockList ‘`}`’
	)*

MatchConstruct is translated into a list of instructions. This construct may appear anywhere a list of instructions is valid in the host language.
StrategyConstruct is translated into a class definition. Since it is translated into a class, this construct is valid only for Java.
BackQuoteTerm is translated into a function call.
IncludeConstruct is replaced by the content of the file referenced by the construct. Tom looks for include files in: ./packageName/, $TOM_HOME/share/jtom/ and <path>, where <path> is specified by option: --import <path>. If the file contains some Tom constructs, they are expanded.
GomConstruct allows to define a Gom grammar. This construct is replaced by the content of the generated mapping. See Section 10.3.4 and Chapter 11 for more details.
TypeTerm, as well as Operator, OperatorList, and OperatorArray are replaced by some functions definitions.

10.2.2 Match construct

The %match construct (MatchConstruct) is one of the main contributions of Tom. This construct can be seen as an extension of the SwitchCase construct in C or Java, except that patterns are no longer restricted to constants (chars or integers). Given an object (the subject) and a list of patterns, our goal is to find the first pattern that matches the subjects (i.e. that has a compatible shape). More formally, a pattern is a term built over variables and constructors. The latter ones describe the shape of the pattern, whereas the variables are holes that can be instantiated to capture a value. When we consider the term f(a(),g(b())), it has to be viewed as a tree based data-structure with f as a root and a() the first child. Similarly, b() is the unique child of g, which is the second child of the root f . We say that the pattern f(x,y) matches this term (called subject), because we can give values to x and y such that the pattern and the subject become equal: we just have to assign a() to x and g(b()) to y. Finding this assignment is called matching and instantiating. This is exactly what Tom is supposed to do. A pattern may of course contain subterms. Therefore, f(x,g(b())) or f(a(),g(y)) are valid patterns which match against the subject.

Assuming that s is a Java variable, referencing a term (the tree based object f(a(),g(b())) for example), the following Tom construct is valid:

%match(s) {
  f(a(),g(y)) -> { /* action 1: code that uses y */ }
  f(x,g(b())) -> { /* action 2: code that uses x */ }
  f(x,y)      -> { /* action 3: code that uses x and y */ }
}

The %match construct is defined as follows:

MatchConstruct	::=	‘`%match`’ ‘`(`’ MatchArguments ‘`)`’ ‘`{`’ ( PatternAction )* ‘`}`’
	∣	‘`%match`’ ‘`{`’ ( ConstraintAction )* ‘`}`’
MatchArguments	::=	[Type] Term ( ‘`,`’ [Type] Term )*
PatternAction	::=	[LabelName‘`:`’] PatternList ‘`->`’ ‘`{`’ BlockList ‘`}`’
PatternList	::=	Pattern( ‘`,`’ Pattern )* [ (‘`&&`’ ∣‘`\|\|`’) Constraint ]
ConstraintAction	::=	Constraint ‘`->`’ ‘`{`’ BlockList ‘`}`’
Constraint	::=	Pattern ‘`<<`’ [Type] Term
	∣	Constraint ‘`&&`’ Constraint
	∣	Constraint ‘`\|\|`’ Constraint
	∣	‘`(`’ Constraint ‘`)`’
	∣	Term Operator Term
Operator	::=	‘`>`’ ∣‘`>=`’ ∣‘`<`’ ∣‘`<=`’ ∣‘`==`’ ∣‘`!=`’

A MatchConstruct is composed of two parts:

a list of subjects (the arguments of %match),
a list of PatternAction: this is a list of pairs (pattern,action), where an action is a set of host language instructions which is executed each time a pattern matches the subjects.

Since version 2.6 of the language, an additional syntax, based on constraints, is proposed. In this case, the MatchConstruct can be simply seen as a list of pairs (constraint,action) corresponding to ConstraintAction from the above syntax. For instance, the example we presented before can now be written:

%match {
  f(a(),g(y)) << s -> { /* action 1 : code that uses y */ }
  f(x,g(b())) << s -> { /* action 2 : code that uses x */ }
  f(x,y) << s      -> { /* action 3 : code that uses x and y */ }
}

The two versions are semantically equivalent. The expression p << s denotes a match constraint between the pattern p and the subject s. The advantage of this new syntax is the modularity that it offers. For instance, multiple constraints can be combined with the boolean connectors ‘&&’ and ‘||’, leading to a greater expressivity, sometimes closed to the one offered by regular expressions.

The two syntaxes are compatible. For instance the following construct is valid:

%match(s) {
  f(a(),g(y)) && ( y << a() || y << b() ) -> { /* action 1 : code that uses y */ }
  f(x,g(b())) -> { /* action 2 : code that uses x */ }  
}

Note: The right-hand sides of the constraints may contain variables that can be found in the left-hand side of other constraints. For instance we may write:

%match(s) {
  f(a(),g(y)) && ( g(z) << y || f(z,_) << y ) -> { /* action 1 : code that uses y and z */ }
  f(x,g(b())) && f(y,a()) << someHostFunction(x) -> { /* action 2 : code that uses x and y */ }
}

where someHostFunction can be an arbitrary function from the host language.

Note: Constraints built using an operator are trivially translated into host code. For instance, the constraint x == g(a()) ... is translated into the Java code

if (x == g(a())) {
  ...
}

For expository reasons, we consider that a %match construct is evaluated in the following way:

given a list of subjects (they correspond to objects only composed of constructors, therefore without variables, usually called ground terms), the execution control is transferred to the first PatternAction whose patterns match the list of ground terms (in the case constraints are used, the execution control is transferred to the first ConstraintAction whose Constraint is evaluated to true).
Note: since version 2.4, a subject is no longer restricted to a host language variable. A Tom term built upon variables, constructors and host-language functions can be used. Note also that the sort (the type of the subjects) is now optional — it is inferred from the type of the patterns.
given such a PatternAction (respectively ConstraintAction), its variables are instantiated and the associated semantic action is executed. The instantiated variables are bound in the underlying host-language, and thus can be used in the action part.
if the execution control is transferred outside the %match instruction (by a goto, break or return for example), the matching process is finished. Otherwise, the execution control is transferred to the next PatternAction whose patterns match the list of ground terms (respectively to the next ConstraintAction whose Constraint evaluates to true).
when there is no more PatternAction whose patterns match the list of subjects (respectively no more ConstraintAction whose Constraint evaluates to true), the %match instruction is finished, and the execution control is transferred to the next instruction.

The semantics of a match construct may remind the switch/case construct. However, there is a big difference. For instance, in Tom we can have patterns (list-patterns) that can match in several ways a subject. Informally, when considering the subject conc(a(),b(),c()), and the pattern conc(_^*,x,_^*), there are three possible match for this problem: either x=a(), either x=b(), either x=c(). Note that _^* is a special hole which can capture any sublist of conc(…). The list-matching is also known as associative matching. Besides this, some other matching theories are supported in Tom.

When taking this new possibility into account, the evaluation of a %match construct gets a little bit more complex:

given a PatternAction whose patterns match the list of ground terms (or a ConstraintAction whose Constraint is evaluated to true), the list of variables is instantiated and the associated semantic action is executed.
if the execution control is not transferred outside the %match instruction, in addition to the previous explanations, if the considered matching theory may return several matches, for each match, the free variables are instantiated and the associated semantic action is executed. This means that the same action may be executed several times, but in a different context: i.e. the variables have different instantiations.
when all matches have been computed (there is at most one match in the syntactic theory, i.e. when no special theory, as associativity for instance, is associated to the symbols used in the patterns), the execution control is transferred to the next PatternAction whose patterns match the list of ground terms (respectively to the next ConstraintAction whose Constraint evaluates to true).
as before, when there is no more PatternAction whose patterns match the list of subject (or respectively no ConstraintAction whose Constraint evaluates to true), the %match instruction is finished, and the execution control is transferred to the next instruction.

As mentioned in the BNF-syntax, a label may be attached to a pattern. In that case, in C and Java, it becomes possible to exit from the current PatternAction (using a goto or a break), without exiting from the whole %match construct. The control is transferred to the next pattern which matches the subjects (respectively to the next constraint). This feature is useful to exit from a complex associative-matching for which, for any reason, we are no longer interested in getting other solutions.

Note: the behavior is not determined if inside an action the subjects of the %match instruction under evaluation are modified.

When using the new syntax based on constraints, there are several restrictions that are imposed by the compiler (an error is generated if they are not respected):

no circular references among variables are allowed. For instance, something like x << x will generate an error. This verification works even when the cycles are less evident, like for instance for the following constraint: f(g(x),a()) << y && f(b(),y) << z && g(f(z,c())) << x.
when using disjunctions, all the variables that are used in the action have to be found in each member of the disjunction (for ensuring that no non-instantiated variable is used in the action). For instance, using the following ConstraintAction generates an error: x << s1 || y << s2 -> { /* code that uses y */ }

10.2.3 Tom pattern

As we can imagine, the behavior of a %match construct strongly depends on the patterns which are involved. The formalism which defines the syntax of a pattern is also an essential component of Tom. But because there exist several ways to define patterns with equivalent behavior, its formal definition is not so simple. Although, the different shortcuts help the programmer to simplify the definitions of patterns.

Informally, a pattern is a term built with constructors and variables (please note that x denotes a variable, whereas a() is a constructor). A variable can also be anonymous, and it is denoted by _. Let’s look at some examples of patterns: x, a(), f(a()), g(a(),x), or h(a(),_,x). When a pattern matches a subject, it may be useful to keep a reference to a matched subterm. The annotation mechanism (z@g(y) for example) can be used for this purpose. Thus, considering the matching between the pattern f(x,z@g(y)) and the subject f(a(),g(h(b()))), y is instantiated by h(b()), and z is instantiated by g(h(b())). This can be useful in C, to free the memory for example.

When identical actions have to be performed for a set of patterns which share a common structure, the disjunction of symbols may be used: pattern (f∣g)(a()) is equivalent to the set {f(a()), g(a())}. The disjunction of symbols may also be used in subterms, like in h( (f∣g)(x) ).

More formally, a Tom pattern and a Tom term has the following syntax:

Term	::=	VariableName[‘`*`’]
	∣	Name ‘`(`’[Term ( ‘`,`’ Term )*]‘`)`’
Pattern	::=	[ AnnotedName ‘`@`’ ] PlainPattern
PlainPattern	::=	[‘`!`’]VariableName [ ‘`*`’ ]
	∣	[‘`!`’] HeadSymbolList (ExplicitTermList ∣ ImplicitPairList)
	∣	ExplicitTermList
	∣	‘`_`’
	∣	‘`_*`’
	∣	XMLTerm
HeadSymbolList	::=	HeadSymbol [ ‘`?`’ ]
	∣	‘`(`’ HeadSymbol ( ‘`∣`’ HeadSymbol )+ ‘`)`’
ExplicitTermList	::=	‘`(`’ [ Pattern ( ‘`,`’ Pattern )*] ‘`)`’
ImplicitPairList	::=	‘`[`’ [ PairPattern ( ‘`,`’ PairPattern )*] ‘`]`’
PairPattern	::=	SlotName ‘`=`’ Pattern

Concerning the Term syntax, both Tom and host-language variables can be used and Name can be either a function symbol declared in Tom or the name of a method from the host language.

Note: since version 2.4, negative patterns, called anti-patterns have been introduced. See section 10.2.4 for a detailed description.

A pattern is a term which could contain variables. When matching a pattern against a subject (a ground term), these variables are instantiated by the matching procedure (generated by Tom). In Tom, the variables do not have to be declared: their type is inferred automatically, depending on the context in which they appear.

As described previously, Tom offers several mechanisms to simplify the definition of a pattern:

standard notation: a pattern can be defined using a classical prefix term notation. To make a distinction between variables and constants, the latter have to be written with explicit parentheses, like x(). In this case, the corresponding Tom operator (%op x()) should have been declared. When omitting parentheses, like x, this denotes a variable.
unnamed variables: the _ notation denotes an anonymous variable. It can be used everywhere a variable name can be used. It is useful when the instance of the variable does not need to be used. Similarly, the _* notation can be used to denote an anonymous list-variable. This last notation can improve the efficiency of list-matching because the instances of anonymous list-variables do not need to be built.
annotated variable: the @ operator allows to give a variable name to a subterm. In f(x@g(_)) for example, x is a variable that will be instantiated by the instance of the subterm g(_). The variable x can then be used as any other variable.
implicit notation: as explained below, the %op operator forces to give name to arguments. Assuming that the operator f has two arguments, named arg1 and arg2, then we can write the pattern f[arg1=a()] which is equivalent to f(a(),_). This notation is interesting mostly when using constructors with many subterms. Besides that, using this notation can avoid changing the patterns when the signature slightly changes (the order of the arguments, adding a new argument etc).
unnamed list operator: it is often the case that given a list-sort, only one list-operator is defined. In this case, when there is no ambiguity, the name of the operator can be omitted. Considering the conc list-operator for example (see %oplist and %oparray below), to improve the readability, the pattern conc(_*,x,_*) can be written (_*,x,_*). This feature is particularly useful in the XML notation introduced in the following.
symbol disjunction notation: to factorize the definition of pattern which have common subterms, it is possible to describe a family of patterns using a disjunction of symbols. The pattern (f∣g)(a(),b()) corresponds to the disjunction f(a(),b()) or g(a(),b()). To be allowed in a disjunction (in standard notation), the constructors should have the same signature (arity, domain and codomain).
In practice, it is usually better to use the disjunction notation with the implicit notation Tom offers: ((f∣g)[arg1=a()]). In that case, the signatures of symbols do not have to be identical: only involved slots have to be common (same names and types). Thus, the pattern (f∣g)[arg1=a()] is correct, even if g has more slots than f: it only has to have the slot arg1, with the same sort.
Note: the disjunction of symbol can also be used in Xml notation: <(A∣B)>...</(A∣B)>.

The use of ‘?’ after a list operator enables real associative with neutral elements (AU) matchings. By using conc? for example, we simply specify that the subject may not start with a conc symbol. For instance, the following code would produce the output matched, because x and respectively y can be instantiated with the neutral element of conc (please see the documentation on Gom for further details about specifying symbols’ type and their neutral elements).

 L l = `a();
 %match(l) {
   conc?(x*,y*) -> { System.out.println("matched"); }
 }

Note that without the use of ‘?’, the subject must start with a conc in order to have a match.

Note: since version 2.5 we can use, inside a list, a subpattern that also denotes a list; moreover, this subpattern can be annotated. For instance, patterns like conc(_*,p@conc(a(),_*,b()),_*) are now valid.

10.2.4 Tom anti-pattern

The notion of anti-pattern offers more expressive power by allowing complement constructs: a pattern can describe what should not be in the matched subject.

The notion of complement is introduced by the symbol ‘!’, as illustrated by the following grammar fragment:

PlainPattern	::=	[‘`!`’] VariableName
	∣	[‘`!`’] HeadSymbolList ( ExplicitTermList ∣ ImplicitPairList )
	∣	...

The semantics of anti-patterns can be best understood when regarding them as complements. For example, a pattern like car[color=blue()] will match all the blue cars. If we add an ‘!’ symbol, the anti-pattern !car[color=blue()] will match everything that is not a blue car, i.e all objects that are not cars or the cars that have a color different from blue. The grammar allows also car[color=!blue()] - matches all cars that are not blue, or !car[color=!blue()] - either everything that is not a car, or a blue car.

Using the non-linearity combined with anti-patterns allows to express interesting searches also. For example, car[interiorColor=x,exteriorColor=x] will match the cars that have the same interior and exterior color. By using the anti-pattern car[interiorColor=x,exteriorColor=!x], the result is as one would expect: it will match all the cars with different interior - exterior colors.

It is also possible to use the anti-patterns in list constructions. Please refer to the tutorial for more examples.

Note: The use of annotations (@) is forbidden bellow a ‘!’ symbol and is prevented by a compilation error. This is due to the fact that what is under this symbol generally will not match, therefore it cannot be used in the action part of the rules. You will also get an error if you try to put ‘!’ before an ‘_’ or ‘_*’ because a construct like ‘!_’ in a pattern will never match anything, therefore it is useless.

10.2.5 Backquote construct

Backquote construct (‘) can be used to build an algebraic term or to retrieve the value of a Tom variable (a variable instantiated by pattern-matching).

BackQuoteTerm ::= [‘‘’] CompositeTerm

The syntax of CompositeTerm is not fixed since it depends on the underlying language.

However, CompositeTerm should be of the following form:

Name: to denote a Tom variable
Name‘*’: to denote a Tom list-variable
Name( ... ): to build a prefix term
( ... ): to build an expression
xml( ... ): to build an Xml term

In general, it is sufficient to add a backquote before the term you want to build to have the desired behavior. The execution of ‘f(g(a())) will build the term f(g(a)), assuming that f, g, and a are Tom operators. Suppose now that g is no longer a constructor but a function of the host-language. The construction ‘f(g(a())) is still valid, but the semantics is the following: the constant a is built, then the function g is called, and the returned result is put under the constructor f. Therefore, the result of g must be a correct term, which belongs to the right type (i.e. the domain of f).

To simplify the interaction with the host-language, it is also possible to use “unknown symbols” like f(x.g()) or f(1+x). The scope of the backquote construct is determined by the scope of the most external enclosing braces, except in two case: ‘x and ‘x* which allow you to use variables instantiated by the pattern part. In that case the scope is limited to the length of the variable name, eventually extended by the ‘’*’’. Sometimes, when writing complex expression like if(‘x==‘y || ‘x==‘z), it can be useful to introduce extra braces (if( ‘(x==y || x==z) )) in order to extend the scope of the backquote.

Note: since version 2.5, type inference is available when building lists. For instance, when building a list, the following code is now valid: ‘conc(f(…)) if the type of f is known by Tom — if previously declared using a %op. Before version 2.5, supposing that the type of f was L, one had to write: L X = ‘f(…);‘conc(X*).

10.2.6 Meta-quote construct

Tom provides the construct ‘%[’ … ‘]%’ that allows to build formatted strings without the need to encode special characters such as tabulations and carriage returns as it is usually done in Java. For example, to build a string containing the HelloWorld program, one can simply write:

String hello = %[
  public class Hello {
    public static void main(String[] args) {
      System.out.println("Hello\n\tWorld !");
    }
  }
]%

Additionally, it is possible to insert in this string the content of a String variable, or the result of a function call (if it is a String) using the ‘@’ as escape character: the code contained between ‘@’ in this construct will be evaluated and the result inserted in the surrounding formatted string. Then, to add to the HelloWorld example a version string, we can use:

String version = "v12";
String hello2=%[
  public class Hello {
    public static void main(String[] args) {
      System.out.println("Hello\n\tWorld   @version@");
    }
  }
]%;

Even if the contents of the meta-quote construct is a formatted string, it is required that this string contains correctly balanced braces.

Note: the expression between two ‘@’ can contain Tom constructs, like backquote constructs for example.

Note: use ‘@@’ to insert the character ‘@’

10.3 Tom signature constructs (*)

10.3.1 Sort definition constructs

To define the mapping between the algebraic constructors and their concrete implementation, Tom provides a signature-mapping mechanism composed of several constructs. In addition to predefined mapping for usual builtin sorts (int, long, double, boolean, string, and char), all other algebraic sorts have to be declared using the %typeterm construct.

To use predefined sorts (in a %match construct or in the definition of a new operator), it is sufficient to use the %include construct (%include { int.tom } for example).

When defining a new type with the %typeterm construct, some information has to be provided:

the implement construct describes how the new type is implemented. The host language part written between braces (’{’ and ’}’) is never parsed. It is used by the compiler to declare some functions and variables.
the is_sort(t) construct specifies how to check the sort of an object of this type (in Java this is usually done with instanceof). It is only required when using this type in a %match construct.
the equals(t1,t2) construct corresponds to a predicate (parameterized by two term variables). This predicate should return true if the terms are “equal”. The true value should correspond to the builtin true value of the considered host language. This last optional predicate is used to compare builtin values and to compile non-linear left-hand sides.

Given a Java class Person we can define an algebraic mapping for this class:

 %typeterm TomPerson {
   implement { Person }
   is_sort(t) { t instanceof Person }
   equals(t1,t2) { t1.equals(t2) }
 }

Here, we assume that the method equals implements a comparison function over instances of Person. Note that we used TomPerson to make a clear distinction between algebraic sorts (defined in Tom) and implementation sorts (defined in Java, via the use of classes). In practice, we usually use the same name to denote both the algebraic sort and the implementation sort.

The grammar is the following:

IncludeConstruct	::=	‘`%include`’ ‘`{`’ FileName ‘`}`’
GomConstruct	::=	‘`%gom`’ [‘`(`’ optionString ‘`)`’] ‘`{`’ GomGrammar ‘`}`’
GoalLanguageBlock	::=	‘`{`’ BlockList ‘`}`’
TypeTerm	::=	‘`%typeterm`’ Type ‘`{`’
		KeywordImplement [KeywordIsSort] [KeywordEquals]
		‘`}`’
KeywordImplement	::=	‘`implement`’ GoalLanguageBlock
KeywordIsSort	::=	‘`is_sort`’ GoalLanguageSortCheck
KeywordEquals	::=	‘`equals`’ ‘`(`’ Name ‘`,`’ Name ‘`)`’ GoalLanguageBlock

Note: since version 2.6, the variables used in the host-code may be prefixed by a ‘$’ sign. This allows the compiler to inline the definition, making the code often smaller and more efficient.

10.3.2 Constructor definition constructs

Once algebraic sorts are declared (using %typeterm), Tom provides a mechanism to define signatures for constructors of these sorts using %op, %oplist or %oparray constructs. When defining a new symbol with the %op construct, the user should specify the name of the operator, its codomain, and its domain. The later one is defined by a list of pairs (slot-name, sort).

Let us consider again the class Person, and let us suppose that an instance of Person has two fields (name and age), we can define the following operator:

%op TomPerson person(name:String, age:int)

In this example, the algebraic operator person has two slots (name and age) respectively of sorts String and int, where String and int are pre-defined sorts.

In addition to the signature of an operator, several auxiliary functions have to be defined:

The is_fsym(t) { predicate(t) } construct is used to check if a term t is rooted by the considered symbol. The true value should correspond to the builtin true value of the considered host language (true in Java or Caml, and something different from 0 in C for example).
The make(t1,...,tn) construct is parameterized by several variables (i.e. that should correspond to the arity of the symbol). A call to this make function should return a term rooted by the considered symbol, where each subterm correspond to the terms given in arguments to the function. When defining a constant (i.e. an operator without argument, make can be defined without braces: make { ... }).
The get_slot(slotName,t) construct has to be defined for all slots of the signature. The implementation of these constructs should be such that the corresponding subterm is returned.

Coming back to our example, checking if an object t is rooted by the symbol person can be done by checking that t is an instance of the class Person. Building a person can be done via the Java function new Person(...). Accessing to the slots name and age could be implemented by an access to the variables of the class Person. In practice, the following operator definition should work fine:

%op TomPerson person(name:String, age:int) {
  is_fsym(t)  { t instanceof Person }
  make(t1,t2) { new Person(t1,t2) }
  get_slot(name,t) { t.name } // assuming that 'name' is public
  get_slot(age,t)  { t.age  } // assuming that 'age' is public
}

When defining a new symbol with the %oplist construct, the user has to specify how the symbol is implemented. In addition, the user has to specify how a list can be built and accessed:

the make_empty() construct should return an empty list.
the make_insert(e,l) construct corresponds to a function parameterized by a list variable and a term variable. This function should return a new list l’ where the element e has been inserted at the head of the list l (i.e. equals(get_head(l’),e) and equals(get_tail(l’),l) should be true).
the get_head(l) function is parameterized by a list variable and should return the first element of the considered list.
the get_tail(l) function is parameterized by a list variable and should return the tail of the considered list.
the is_empty(l) constructs corresponds to a predicate parameterized by a list variable. This predicate should return true if the considered list contains no element.

Similarly, when defining a new symbol with the %oparray construct, the user has to specify how the symbol is implemented, how an array can be built, and accessed:

the make_empty(n) construct should return a list such that n successive make_append(e,l) can be made.
the make_append(e,l) construct corresponds to a function parameterized by a list variable and a term variable.
Warning: This function should return a list l’ such that the element e is at the end of l.
the get_element(l,n) construct is parameterized by a list variable and an integer. This should correspond to a function that return the n-th element of the considered list l.
the get_size(l) constructs corresponds to a function that returns the size of the considered list (i.e. the number of elements of the list). The size of an empty list is 0.

The %oplist or %oparray is complex but not difficult to use. Let us consider the ArrayList Java class, and let us define a Tom mapping over this data-structure. The first thing to do consists in defining the sort for the elements and the sort for the list-structure:

  %typeterm Object {
    implement     { Object        }
    equals(l1,l2) { l1.equals(l2) }
  }
  %typeterm TomList {
    implement     { ArrayList    }
    equals(l1,l2) { l1.equals(l2) }
  }

Once defined the sorts, it becomes possible to define the list-operator TomList conc( Object* ). This operator has a variadic arity: it takes several Object and returns a TomList.

  %oparray TomList conc( Object* ) {
    is_fsym(t)       { t instanceof ArrayList }
    make_empty(n)    { new ArrayList(n)       }
    make_append(e,l) { myAdd(e,(ArrayList)l)   }
    get_element(l,n) { (Object)l.get(n)        }
    get_size(l)      { l.size()                }
  }

  private static ArrayList myAdd(Object e,ArrayList l) {
    l.add(e);
    return l;
  }

An auxiliary function myAdd is used since the make_append construct should return a new list. The get_element should return an element whose sort belongs to the domain (Object) in this example. Although not needed in this example, in general, a cast ((Object)l.get(n)) is needed.

The grammar for the mapping constructs is the following:

Operator	::=	‘`%op`’ Type Name
		‘`(`’ [ SlotName ‘`:`’ Type ( ‘`,`’ SlotName ‘`:`’ Type )* ] ‘`)`’
		‘`{`’ KeywordIsFsym ( KeywordMake ∣ KeywordGetSlot )* ‘`}`’
OperatorList	::=	‘`%oplist`’ Type Name ‘`(`’ Type ‘`*`’ ‘`)`’
		‘`{`’ KeywordIsFsym ( KeywordMakeEmptyList
		∣ KeywordMakeInsert ∣ KeywordGetHead
		∣ KeywordGetTail ∣ KeywordIsEmpty )* ‘`}`’
OperatorArray	::=	‘`%oparray`’ Type Name ‘`(`’ Type ‘`*`’ ‘`)`’
		‘`{`’ KeywordIsFsym ( KeywordMakeEmptyArray ∣
		KeywordMakeAppend ∣ KeywordElement
		∣ KeywordGetSize )* ‘`}`’
KeywordIsFsym	::=	‘`is_fsym`’ ‘`(`’ Name ‘`)`’ GoalLanguageBlock
KeywordGetSlot	::=	‘`get_slot`’ ‘`(`’ Name ‘`,`’ Name ‘`)`’ GoalLanguageBlock
KeywordMake	::=	‘`make`’ [ ‘`(`’ Name ( ‘`,`’ Name )* ‘`)`’ ] GoalLanguageBlock
KeywordGetHead	::=	‘`get_head`’ ‘`(`’ Name ‘`)`’ GoalLanguageBlock
KeywordGetTail	::=	‘`get_tail`’ ‘`(`’ Name ‘`)`’ GoalLanguageBlock
KeywordIsEmpty	::=	‘`is_empty`’ ‘`(`’ Name ‘`)`’ GoalLanguageBlock
KeywordMakeEmptyList	::=	‘`make_empty`’ [ ‘`(`’ ‘`)`’ ] GoalLanguageBlock
KeywordMakeInsert	::=	‘`make_insert`’ ‘`(`’ Name ‘`,`’ Name ‘`)`’ GoalLanguageBlock
KeywordGetElement	::=	‘`get_element`’ ‘`(`’ Name ‘`,`’ Name ‘`)`’ GoalLanguageBlock
KeywordGetSize	::=	‘`get_size`’ ‘`(`’ Name ‘`)`’ GoalLanguageBlock
KeywordMakeEmptyArray	::=	‘`make_empty`’ ‘`(`’ Name ‘`)`’ GoalLanguageBlock
KeywordMakeAppend	::=	‘`make_append`’ ‘`(`’ Name ‘`,`’ Name ‘`)`’ GoalLanguageBlock

Note: since version 2.5, is_fsym and make functions are optional (therefore the %op construct could be empty). When not declared, is_fsym default value is false. When nothing is declared, there is a make default value: a call to the function which has the same name as the considered operator. This shortcut is useful to provide type information about a Java function for example.

10.3.3 Predefined sorts and operators

See Section 13.1.

10.3.4 Gom construct

The grammar for the Gom construct is as follows:

GomConstruct ::= ‘%gom’ [‘(’ optionString ‘)’] ‘{’ GomGrammar ‘}’

It allows to define a Gom signature (for more details about Gom see Chapter 11). The Gom compiler is called on the GomGrammar, and the construct is replaced by the produced Tom mapping. The optionString is composed by a list of command line options to pass to the underlying Gom compiler, as described in section 16.1.

10.4 XML pattern

To deal with XML documents, the XML notation can be used (<A></A> for example).

When manipulating XML documents, we distinguish two main kinds of operations: retrieving information and transforming a document. Tom provides three different XML notations that ought to simplify the definition of patterns: the “standard ” and the “implicit” XML notations are used to define compact (but incomplete) patterns. This notation is well suited to retrieve information. The “explicit” XML notation is used to precisely describe an XML pattern and all the variables that have to be instantiated. This notation is particularly well suited to perform XML transformation since it allows the programmer to precisely describe how variables have to be instantiated.

To make the XML notation understandable, we have to explain how XML documents are handled by Tom. To each XML document corresponds a DOM (Document Object Model) representation. In Tom, we have defined a mapping from DOM sorts to abstract algebraic sorts: TNode and TNodeList, which correspond respectively to Node and NodeList, defined by the Java DOM implementation.

Thus, a Node object becomes a ternary operator Element whose first subterm is the name of the XML node, the second subterm is a list of attributes and the third subterm is a list of subterms (which correspond to XML sub-elements). The second and the third elements are terms of sort TNodeList (because they are implemented by NodeList objects in DOM).

Thus, when considering the <A></A> XML document, the corresponding algebraic term is Element("A",[],[]), where [] denotes the empty list. Similarly, <A></A> is encoded into Element("A",[],[Element("B",[Attribute("attribute","name")],[])]).

When defining an XML pattern, the user has to introduce extra list-variables to precisely describe the XML pattern and capture the different contexts. Suppose that we are interested in finding a node which is a subterm of a node <A></A> (but not necessary the first subterm). The algebraic pattern should be Element("A",[_*],[_*,Element("B",[_*],[_*]),_*]). Using the XML notation, this pattern should be expressed as follows: <A(_*)>(_*,<B(_*)>(_*),_*)</A>. This notation (called explicit) is precise but error prone. This is why we have introduced the explicit notation, where all context variable can be removed (and () are replaced by []): <A[]><B[]>[]</A>. The last notation (called standard XML notation) allows the user to remove the [] and replace the list-separator (,) by spaces. The previous pattern can be written: <A></A>.

These three different notations allow the user to choose the level of control he wants to have on the XML pattern matching algorithm.

The formal description of the syntax is the following:

XMLTerm	::=	‘`<`’ XMLNameList XMLAttrList ‘`/>`’
	∣	‘`<`’ XMLNameList XMLAttrList ‘`>`’ XMLChilds ‘`</`’ XMLNameList ‘`>`’
	∣	‘`#TEXT`’ ‘`(`’ Identifier ∣ String ‘`)`’
	∣	‘`#COMMENT`’ ‘`(`’ Identifier ∣ String ‘`)`’
	∣	‘`#PROCESSING-INSTRUCTION`’
		‘`(`’ (Identifier ∣ String) ‘`,`’ (Identifier ∣ String) ‘`)`’
XMLNameList	::=	XMLName
	∣	‘`(`’ XMLName ( ‘`∣`’ XMLName )* ‘`)`’
XMLAttrList	::=	‘`[`’ [ XMLAttribute (‘`,`’ XMLAttribute)* ] ‘`]`’
	∣	‘`(`’ [ XMLAttribute (‘`,`’ XMLAttribute)* ] ‘`)`’
	∣	( XMLAttribute )*
XMLAttribute	::=	‘`_*`’
	∣	VariableName ‘`*`’
	∣	AttributeName ‘`=`’ [AnnotedName ‘`@`’] ( Identifier ∣ String )
	∣	[AnnotedName ‘`@`’] ‘`_`’ ‘`=`’ [AnnotedName ‘`@`’] ( Identifier ∣ String )
XMLChilds	::=	( Term )*
	∣	‘`[`’ Term ( ‘`,`’ Term )* ‘`]`’