Previous Up Next

Chapter 5  Tom

5.1  Notations

The syntax of the language is given in BNF-like notation. Terminal symbols are set in typewriter font (`like this'). Non-terminal symbols are set in italic font (like that). Square brackets [...] denote optional components. Parentheses with a trailing star sign (...)* denotes zero, one or several repetitions of the enclosed components. Parentheses with a trailing plus sign (...)+ denote one or several repetitions of the enclosed components. Parentheses (...) denote grouping.

5.1.1  Lexical conventions

Identifier ::= Letter ( LetterDigit`_'`-' )*
Integer ::= (Digit )+
Double ::= (Digit)+ [`.'] (Digit)* ∣ `.' (Digit)+
String ::= `"' (Letter ∣ (`\' (`n'`t'`b'`r'`f'`\'`''`"') ) )* `"'
Letter ::= `A' ... `Z'`a' ... `z'
Digit ::= `0' ... `9'
Char ::= `'' (LetterDigit) `''


5.1.2  Names

SubjectName ::= Identifier
Type ::= Identifier
SlotName ::= Identifier
HeadSymbol ::= Identifier
  Integer
  Double
  String
  Char
VariableName ::= Identifier
AnnotedName ::= Identifier
LabelName ::= Identifier
FileName ::= Identifier
AttributeName ::= Identifier
XMLName ::= Identifier
Name ::= Identifier


5.2  Tom constructs

A Tom program is a host language program (namely C, Java, or Caml) extended by several new constructs such as %match, %strategy, %rule, %include, %gom, or backquote. Tom is a multi-languages compiler, so, its syntax depends on the host language syntax. But for simplicity, we only present the syntax of its constructs and explain how they can be integrated into the host language.

Using Java as host-language, the following Tom program is correct:

public class HelloWorld {
  %include { string.tom }

  public final static void main(String[] args) {
    String who = "World";
    %match(who) {
      "World" -> { System.out.println("Hello " + who); }
      _       -> { System.out.println("Don't panic"); }
    }
  }
}


5.2.1  Tom program

Basically, a Tom program is list of blocks, where each block is either a Tom construct, or a sequence of characters. The idea is that after transformation, the sequence of characters merged with the compiled Tom constructs should be a valid host language program. In the previous example, %include and %match constructs are replaced by function definitions and Java instructions, making the resulting program a correct Java program.

Syntax of a Tom program:

Tom ::= BlockList
BlockList ::=
  ( MatchConstruct
  StrategyConstruct
  RuleConstruct
  BackQuoteTerm
  IncludeConstruct
  GomConstruct
  TypeTerm
  Operator
  OperatorList
  OperatorArray
  `{' BlockList `}'
  )*

5.2.2  Match construct

The %match construct (MatchConstruct) is one of the main contributions of Tom. This construct can be seen as an extension of the SwitchCase construct in C or Java, except that patterns are no longer restricted to constants (chars or integers). Given an object (the subject) and a list of patterns, our goal is to find the first pattern that matches the subjects (i.e. that have a compatible shape). More formally, a pattern is a term built over variables and constructors. The latter ones describe the shape of the pattern, whereas the former ones are holes that can be instantiated to capture a value. When we consider the term f(a(),g(b())), this has to be viewed as a tree based data-structure where a() is the first child of the root, labeled by f. Similarly, b() is the unique child of g, which is the second child of the root. We say that the pattern f(x,y) matches this term (called subject), because we can give values to x and y such that the pattern and the subject become equal: we just have to assign a() to x and g(b()) to y. Finding this assignment is called matching and instantiating. This is exactly what Tom is supposed to do. A pattern may of course contains subterms. Therefore, f(x,g(b())) or f(a(),g(y)) are valid patterns which match against the subject.

Assuming that s is a Java variable, referencing a term (the tree based object f(a(),g(b())) for example), the following Tom construct is valid:
%match(s) {
  f(a(),g(y)) -> { /* code that uses y */ }
  f(x,g(b())) -> { /* code that uses x */ }
  f(x,y)      -> { /* code that uses x and y */ }
}


A MatchConstruct is composed of two parts: The construct is defined as follow:

MatchConstruct ::= `%match' `(' MatchArguments `)' `{' ( PatternAction )* `}'
MatchArguments ::= [Type] Term ( `,' [Type] Term )*
PatternAction ::= [LabelName`:'] PatternList ( `|' PatternList )* `->' `{' BlockList `}'
PatternList ::= Pattern ( `,' Pattern )*


For expository reasons, we consider that a %match construct is evaluated in the following way: The semantics of a match constructs may remind the switch/case construct, however, a big difference exists. In Tom, a theory like associativity may be attached to a constructor. Thus, given a subject and a pattern, there may exist several ways to match the pattern against the subject. Informally, when considering the subject conc(a(),g(b()),c(),g(d())), and the pattern conc(_*,g(x),_*), there are two possible match for this problem: either x=b, either x=d(). Note that _* is a special hole which can capture any sublist of conc(…).

When taking this new possibility into account, the evaluation of a %match construct is a bit more complex: As mentioned in the BNF-syntax, a label may be attached to a pattern. In that case, in C and Java, it becomes possible to exit from the current PatternAction (using a goto or a break), without exiting from the whole %match construct. The control is transferred to the next pattern which matches the subjects. This feature is useful to exit from a complex associative-matching pattern which, for any reason, is no longer interesting.

Note: the behavior is not determined if a semantic action modifies a host language variable which is an argument of a %match instruction under evaluation.


Note: the `'|'' symbol can be used to introduce disjunction of pattern. This is a shortcut which avoid duplicating a PatternAction. However, this construct is deprecated. We recommend to use the disjunction of symbols construct, presented in the next section.


5.2.3  Tom pattern

As we can imagine, the behavior of a %match construct strongly depends on the patterns which are involved. The formalism which defines the syntax of a pattern is also an essential component of Tom. Unfortunately, its formal definition is complex, simply because there exist several ways to define patterns which have equivalent behaviors. On the other hand, the different shortcuts may help the programmer to simplify the definitions of patterns.

Informally, a pattern is a term that can be either a variable or an anonymous variable (x or _ for example). A pattern can also be composed of constructors (a(), f(a()), g(a(),x), or h(a(),_,x) for example). When a pattern matches a subject, it may be useful to keep a reference to a matched subterm. The annotation mechanism (z@g(y) for example) can be used for this purpose. Thus, considering the pattern f(x,z@g(y)) and the subject f(a(),g(h(b()))), y is instantiated by h(b()), and z is instantiated by g(h(b())). This can be useful in C, to free the memory for example.

When identical actions have to be performed for a set of patterns which share a common structure, the disjunction of symbols may be used: pattern (fg)(a()) is equivalent to the set {f(a()), g(a())}. The disjunction of symbols may also be used in subterm, like in h( (fg)(x) ).

More formally, a Tom term has the following syntax:
Term ::= VariableName
  HeadSymbol [ `(' Term ( `,' Term )*`)' ]
Pattern ::= [ AnnotedName `@' ] PlainPattern
PlainPattern ::= VariableName [ `*' ]
  HeadSymbolList [ ExplicitTermListImplicitPairList ]
  ExplicitTermList
  `_'
  `_*'
  XMLTerm
HeadSymbolList ::= HeadSymbol
  `(' HeadSymbol ( `' HeadSymbol )+ `)'
ExplicitTermList ::= `(' Pattern ( `,' Pattern )* `)'
ImplicitPairList ::= `[' PairPattern ( `,' PairPattern )* `]'
PairPattern ::= `[' SlotName `=' Pattern `]'


Note: since version 2.4, negative patterns, called anti-patterns have been introduced. See section 5.2.4 for a detailed description.




A pattern is a term which could contain variables. When matching a pattern against a subject (a ground term), these variables are instantiated by the matching procedure (generated by Tom). In Tom, the variables do not have to be declared: their type is inferred automatically, depending on the context in which they appear.

As described previously, Tom offers several mechanisms to simplify the definition of a pattern:

5.2.4  Tom anti-pattern

Since version 2.4, the notion of anti-pattern has been introduced. This offer more expressive power by allowing complement constructs: a pattern can now describe what should not be in the matched subject.

Therefore, the grammar presented in section 5.2.3 is now extended with a new symbol: `!'. The new syntax is the following:

PlainPattern ::= (`!')* VariableName
  (`!')* HeadSymbolList [ ExplicitTermListImplicitPairList ]
  ...


The semantics of anti-patterns can be best understood when regarding them as complements. For example, a pattern like car[color=blue()] will match all the blue cars. If we add an `!' symbol, the anti-pattern !car[color=blue()] will match everything that is not a blue car, i.e all objects that are not cars or the cars that have a color different from blue. The grammar allows also car[color=!blue()] - matches all cars that are not blue, or !car[color=!blue()] - either everything that is not a car, or a blue car.

Using the non-linearity combined with anti-patterns allows to express interesting searches also. For example, car[interiorColor=x,exteriorColor=x] will match the cars that have the same interior and exterior color. By using the anti-pattern car[interiorColor=x,exteriorColor=!x], the result is as one would expect: it will match all the cars with different interior - exterior colors.

It is also possible to use the anti-patterns in list constructions. Please refer to the tutorial for more examples.

Note: The use of annotations (@) is forbidden bellow a `!' symbol and is prevented by a compilation error. This is due to the fact that what is under this symbol generally will not match, therefore it cannot be used in the action part of the rules. You will also get an error if you try to put `!' before an `_' or `_*' because a construct like `!_' in a pattern will never match anything, therefore it is useless.


5.2.5  Rule construct

In Tom, we can also define a set of rewrite rules. All the left-hand sides should begin with the same root symbol:

RuleConstruct ::= `%rule' `{' ( Rule )* `}'
Rule ::= RuleBody ( RuleCondition )*
RuleBody ::= Pattern `->' Term
RuleCondition ::= `where' Pattern `:=' Term
  `if' Term `==' Term


The %rule construct is composed of a list of conditional rewrite rules (the left-hand side is a term and the right-hand side is a term). All these rules (enclosed into a %rule { ... } construct) should begin with the same root symbol. The Tom compiler will generate a function (with a number of arguments equals to the arity of this root symbol) whose name corresponds to the name of this unique root symbol. Given a ground term, applying this function returns the instantiated and normalized right-hand side of the first rule (from top to bottom) whose pattern matches the considered subject and whose conditions are satisfied. When no rule can be applied (i.e. no pattern matches the subject, or no condition is satisfied), the given ground term, rooted by the root symbol of the rewrite system is returned.

In Tom, we consider two kinds of conditions:

Note: since the introduction of Gom, the %rule construct is a bit deprecated. We recommend to use hooks since they ensure that the reduction will always be performed, in particular during the application of a strategy.


5.2.6  Backquote construct

Another construct of Tom is the backquote (`).

This construct can be used to build an algebraic term or to retrieve the value of a Tom variable (a variable instantiated by pattern-matching). The syntax of this operator is not fixed since it depends on the underlying language.

However, a backquote term should be of the following form: In general, it is sufficient to add a backquote before the term you want to build to have the wanted behavior. The execution of `f(g(a())) will build the term f(g(a)), assuming that f, g, and a are Tom operators. Suppose now that g is no longer a constructor but a function of the host-language. The construction `f(g(a())) is still valid, but the semantic is the following: the constant a is built, then the function g is called, and the returned is put under the constructor f. Therefore, the result of g must be a correct term, which belongs to the right type (i.e. the domain of f).

To simplify the interaction with the host-language, it is also possible to use “unknown symbols” like f(x.g() or f(1+x). The scope of the backquote construct is determined by the scope of the most external enclosing braces, except in two case: `x and `x* which allows you to use variables instantiated by the pattern part. In that case the scope is limited to the length of the variable name, eventually extended by the `'*''. Sometime, when writing complex expression like if(`x==`y || `x==`z), it can be useful to introduce extra braces (if( `(x==y || x==z) )) to extend the scope of the backquote.

5.2.7  Meta-quote construct

Tom provide a construct `%['`]%' allowing to build formatted strings without the need to encode special characters such as tabulations and carriage return as usually done in Java. For example, to build a string containing the HelloWorld program, one can simply use:

String hello = %[
  public class Hello {
    public static void main(String[] args) {
      System.out.println("Hello\n\tWorld !");
    }
  }
]%


Additionally, it is possible to insert in this string the content of a String variable, or the result of a function call (if it is a String) using the `@' as escape character: the code contained between `@' in this construct will be evaluated and the result inserted in the surrounding formatted string. Then, to add to the HelloWorld example a version string, we can use:

String version = "v12";
String hello2=%[
  public class Hello {
    public static void main(String[] args) {
      System.out.println("Hello\n\tWorld   @version@");
    }
  }
]%;


Even if the contents of the meta-quote construct is a formatted string, it is required that this string contains correctly balanced braces.

Note: the expression between two `@' can contain Tom constructs, like backquote expressions for example.


5.3  Tom signature constructs

5.3.1  Sort definition constructs

To define the mapping between the algebraic constructors and their concrete implementation, Tom provides a signature-mapping mechanism composed of several constructs. In addition to predefined mapping for usual builtin sorts (int, long, double, boolean, string, and char), all other algebraic sorts have to be declared using the %typeterm construct.

To use predefined sorts (in a %match construct or in the definition of a new operator), it is sufficient to use the %include construct (%include { int.tom } for example).

When defining a new type with the %typeterm construct, at least two extra informations have to be provided: Given a Java class Person we can define an algebraic mapping for this class:

 %typeterm TomPerson {
   implement { Person }
   equals(t1,t2) { t1.equals(t2) }
 }


Here, we assume that the method equals implement a comparison function over instances of Person. Note also that we have used TomPerson to make a clear distinction between algebraic sorts (defined in Tom) and implementation sorts (defined in Java, via the use of classes). In practice, we usually use the same name to denote both the algebraic sort and the implementation sort.

The grammar is the following:
IncludeConstruct ::= `%include' `{' FileName `}'
GomConstruct ::= `%gom' `{' GomGrammar `}'
GoalLanguageBlock ::= `{' BlockList `}'
TypeTerm ::= `%typeterm' Type `{'
    KeywordImplement [KeywordEquals] [KeywordVisitorFwd]
    `}'
KeywordImplement ::= `implement' GoalLanguageBlock
KeywordEquals ::= `equals' `(' Name `,' Name `)' GoalLanguageBlock
KeywordVisitorFwd ::= `visitor_fwd' GoalLanguageVisitorFwd


Note: in order to use strategy constructs (%strategy), an extra information have to be defined using the visitor_fwd construct.

This sort information should correspond to a Java class. In order to use strategies, sorts must define a Java class. This class must extend VisitorFwd and must implement the interface VisitableVisitor. The goal of the VisitorFwd is to provide a base class for building strategies. Thus, it has to provide typed visitors for each defined sort, and implement its visit method to call the typed visit_sort functions depending on the dynamic type of the subject.

See Section 7.7 for a detailed example of strategy support for arbitrary Java classes.


5.3.2  Constructor definition constructs

Once algebraic sorts are declared, Tom provides a mechanism to define sorted signatures for constructors using %op, %oplist or %oparray constructs. When defining a new symbol with the %op construct, the user should specify the name of the operator, its codomain, and its domain. The later one is defined by a list of pairs (slot-name, sort).

Let us consider again the class Person, and let us suppose that an instance of Person has two fields (name and age), we can define the following operator:
%op TomPerson person(name:String, age:int)
In this example, the algebraic operator person has two slots (name and age) respectively of sorts String and int, where String and int are pre-defined sorts.

In addition to the signature of an operator, several auxiliary functions have to be defined: Coming back to our example, checking if an object t is rooted by the symbol person can be done by checking that t is an instance of the class Person. Building a person can be done via the Java function new Person(...). Accessing to the slots name and age could be implemented by an access to the variables of the class Person. In practice, the following operator definition should work fine:

%op TomPerson person(name:String, age:int) {
  is_fsym(t)  { t instanceof Person }
  make(t1,t2) { new Person(t1,t2) }
  get_slot(name,t) { t.name } // assuming that 'name' is public
  get_slot(age,t)  { t.age  } // assuming that 'age' is public
}



When defining a new symbol with the %oplist construct, the user has to specify how the symbol is implemented. In addition, the user has to specify how a list can be built and accessed: Similarly, when defining a new symbol with the %oparray construct, the user has to specify how the symbol is implemented, how an array can be built, and accessed: The %oplist or %oparray is complex but not difficult to use. Let us consider the ArrayList Java class, and let us define a Tom mapping over this data-structure. The first thing to do consists in defining the sort for the elements and the sort for the list-structure:

  %typeterm Object {
    implement     { Object        }
    equals(l1,l2) { l1.equals(l2) }
  }
  %typeterm TomList {
    implement     { ArrayList    }
    equals(l1,l2) { l1.equals(l2) }
  }


Once defined the sorts, it becomes possible to define the list-operator TomList conc( Object* ). This operator has a variadic arity: it takes several Object and returns a TomList.

  %oparray TomList conc( Object* ) {
    is_fsym(t)       { t instanceof ArrayList }
    make_empty(n)    { new ArrayList(n)       }
    make_append(e,l) { myAdd(e,(ArrayList)l)   }
    get_element(l,n) { (Object)l.get(n)        }
    get_size(l)      { l.size()                }
  }

  private static ArrayList myAdd(Object e,ArrayList l) {
    l.add(e);
    return l;
  }


An auxiliary function myAdd is used since the make_append construct should return a new list. The get_element should return an element whose sort belongs to the domain (Object) in this example. Although not needed in this example, in general, a cast ((Object)l.get(n)) is needed.

The grammar for the mapping constructs is the following:
Operator ::= `%op' Type Name
    `(' [ SlotName `:' Type ( `,' SlotName `:' Type )* ] `)'
    `{' KeywordIsFsym ( KeywordMakeKeywordGetSlot )* `}'
OperatorList ::= `%oplist' Type Name `(' Type `*' `)'
    `{' KeywordIsFsym ( KeywordMakeEmptyList
    KeywordMakeInsertKeywordGetHead
    KeywordGetTailKeywordIsEmpty )* `}'
OperatorArray ::= `%oparray' Type Name `(' Type `*' `)'
    `{' KeywordIsFsym ( KeywordMakeEmptyArray
    KeywordMakeAppendKeywordElement
    KeywordGetSize )* `}'
KeywordIsFsym ::= `is_fsym' `(' Name `)' GoalLanguageBlock
KeywordGetSlot ::= `get_slot' `(' Name `,' Name `)' GoalLanguageBlock
KeywordMake ::= `make' [ `(' Name ( `,' Name )* `)' ] GoalLanguageBlock
KeywordGetHead ::= `get_head' `(' Name `)' GoalLanguageBlock
KeywordGetTail ::= `get_tail' `(' Name `)' GoalLanguageBlock
KeywordIsEmpty ::= `is_empty' `(' Name `)' GoalLanguageBlock
KeywordMakeEmptyList ::= `make_empty' [ `(' `)' ] GoalLanguageBlock
KeywordMakeInsert ::= `make_insert' `(' Name `,' Name `)' GoalLanguageBlock
KeywordGetElement ::= `get_element' `(' Name `,' Name `)' GoalLanguageBlock
KeywordGetSize ::= `get_size' `(' Name `)' GoalLanguageBlock
KeywordMakeEmptyArray ::= `make_empty' `(' Name `)' GoalLanguageBlock
KeywordMakeAppend ::= `make_append' `(' Name `,' Name `)' GoalLanguageBlock


5.3.3  Predefined sorts and operators

See Section 8.1.

5.4  XML pattern

To deal with XML documents, the XML notation can be used (<A><B attribute="name"/></A> for example).

When manipulating XML documents, we distinguish two main kinds of operations: retrieving information and transforming a document. Tom provides three different XML notations that ought to simplify the definition of patterns: the “standard ” and the “implicit” XML notations are used to define compact (but incomplete) patterns. This notation is well suited to retrieve information. The “explicit” XML notation is used to precisely describe an XML pattern and all the variables that have to be instantiated. This notation is particularly well suited to perform XML transformation since it allows the programmer to precisely describe how variables have to be instantiated.

To make the XML notation understandable, we have to explain how XML documents are handled by Tom. To each XML document corresponds a DOM (Document Object Model) representation. In Tom, we have defined a mapping from DOM sorts to abstract algebraic sorts: TNode and TNodeList, which correspond respectively to Node and NodeList, defined by the Java DOM implementation.

Thus, a Node object becomes a ternary operator Element whose first subterm is the name of the XML node, the second subterm is a list of attributes and the third subterm is a list of subterms (which correspond to XML sub-elements). The second and the third elements are terms of sort TNodeList (because they are implemented by NodeList objects in DOM).

Thus, when considering the <A></A> XML document, the corresponding algebraic term is Element("A",[],[]), where [] denotes the empty list. Similarly, <A><Battribute="name"></B></A> is encoded into Element("A",[],[Element("B",[Attribute("attribute","name")],[])]).

When defining an XML pattern, the user has to introduce extra list-variables to precisely describe the XML pattern and capture the different contexts. Suppose that we are interested in finding a node <B></B> which is a subterm of a node <A></A> (but not necessary the first subterm). The algebraic pattern should be Element("A",[_*],[_*,Element("B",[_*],[_*]),_*]). Using the XML notation, this pattern should be expressed as follows: <A(_*)>(_*,<B(_*)>(_*)</B>,_*)</A>. This notation (called explicit) is precise but error prone. This is why we have introduced the explicit notation, where all context variable can be removed (and () are replaced by []): <A[]><B[]>[]</B></A>. The last notation (called standard XML notation) allows the user to remove the [] and replace the list-separator (,) by spaces. The previous pattern can be written: <A><B></B></A>.

These three different notations allow the user to choose the level of control he wants to have on the XML pattern matching algorithm.

The formal description of the syntax is the following:
XMLTerm ::= `<' XMLNameList XMLAttrList `/>'
  `<' XMLNameList XMLAttrList `>' XMLChilds `</' XMLNameList `>'
  `#TEXT' `(' IdentifierString `)'
  `#COMMENT' `(' IdentifierString `)'
  `#PROCESSING-INSTRUCTION'
    `(' (IdentifierString) `,' (IdentifierString) `)'
XMLNameList ::= XMLName
  `(' XMLName ( `' XMLName )* `)'
XMLAttrList ::= `[' [ XMLAttribute (`,' XMLAttribute)* ] `]'
  `(' [ XMLAttribute (`,' XMLAttribute)* ] `)'
  ( XMLAttribute )*
XMLAttribute ::= `_*'
  VariableName `*'
  AttributeName `=' [AnnotedName `@'] ( IdentifierString )
  [AnnotedName `@'] `_' `=' [AnnotedName `@'] ( IdentifierString )
XMLChilds ::= ( Term )*
  `[' Term ( `,' Term )* `]'



Previous Up Next