Previous Up Next

Chapter 2  Advanced features

2.1  The “Hello World” example

Another very simple program is the following:
public class HelloWorld {
  %include { string.tom }

  public final static void main(String[] args) {
    String who = "World";
    %match(who) {
      "World" -> { System.out.println("Hello " + who); }
      _       -> { System.out.println("Don't panic"); }
    }
  }
}
The %include { string.tom } construct imports the predefined (Java) mapping which defines the Tom String algebraic sort. Thus, the sort String can be used in the %match construct.

This examples shows that pattern matching can be performed against built-in data-types. In this example, the %match construct is equivalent to a switch/case construct. `_' denotes an anonymous variable that corresponds to a default case in this example.

2.1.1  “Hello World” revisited

One particularity of Tom is to support list-matching (also known as associative-matching).

When considering that a string is a list of characters, it becomes natural to describe pattern-matching using an associative operator which corresponds to the concatenation of two characters. Thus, the string “Hello” is equivalent to the list of characters concString('H','e','l','l','o').

Given a string, to check if the string contains the character `e', we can use the following matching construct:
  %match(t) {
    concString(_*,'e',_*) -> { System.out.println("we have found a 'e'"); }
  }
In this example, `_*' corresponds to an anonymous variable which can be instantiated by any list of characters (possibly reduced to the empty list).

In order to capture the context, it is possible to use a named variable:
  %match(t) {
    concString(before*,'e',after*) -> {
      System.out.println("we have found a 'e'" +
                         " after " + `before* +
                         " but before " + `after*);
    }
  }
In this example, we have introduced two new variables (before* and after*), called “variable-star”, which are instantiated by a list of characters. This is `*' that denoted a list variable.

To use their value in a Java expression, Tom provides the ``' (backquote) mechanism. This construct can be use to retrieve the value of a Tom variable (instantiated by pattern-matching). As illustrated by the previous examples, `before* and `after* are replaced by expressions of sort String.

Suppose now that we look for a word whose last letter is a `o'. A possible Tom pattern is:
  %match(t) {
    concString(before*,'o') -> { ... }
  }
Using this mechanism, it also becomes possible to look for a word which contains an `e' somewhere and an `o' in last position:
  %match(t) {
    concString(before*,'e',_*,'o') -> { ... }
  }
Note that some pattern could provide several outcomes:
  %match(t) {
    concString(before*,'l',after*) -> {
      System.out.println("we have found " + `before* +
                         " before 'l' and " + `after* + " after");
    }
  }
It will match twice:
  we have found he before 'l' and lo after
  we have found hel before 'l' and o after
Let us suppose that we look for two consecutive `l' anywhere in the matched expression. This could be expressed by:
  %match(t) {
    concString(_*,'l','l',_*) -> { ... }
  }
Since this syntax is error prone when looking for long/complex substrings, Tom provides an abbreviated notation: `ll':
  %match(t) {
    concString(_*,'ll',_*) -> { ... }
  }
This notation is fully equivalent to the previous one.

2.1.2  “Hello World” re-revisited

As illustrated previously, we can use variables to capture contexts. In fact, this mechanism is more general and we can use a variable anywhere to match something which is not statically know. Thus, it is possible to use variable (without a star) to explicitly say that a character should be there:
  %match(t) {
    concString(x,_*,'ll',_*,y) -> { ... }
  }
This example matches the string t if this string contains two consecutive `l' and at least two other characters. In this case, the first letter of the string is stored in the variable `x' (respectively the last letter in `y').

The variable `x' and `y' are local to the given pattern. Thus, when considering several patterns in a same match construct, there is no interference between patterns. However, in a same pattern, we can use the same variable two times to indicate to we want to match two identical elements. This feature is known as non-linear matching:
  %match(t) {
    concString(x,_*,'ll',_*,y) -> { ... }
    concString(x*,y*,y*,x*)        -> { /* we have found a palindrome */ }
  }
When looking for several instances of a same character, it may be interesting to use a variable to describe this repetition:
  %match(t) {
    concString(_*,'a',_*,'a',_*,'a',_*) -> { /* look for 3 'a' in a string */ }
    concString(_*,x@'a',_*,x,_*,x,_*)   -> { /* look for 3 'a' in a string */ }
  }
Here, the first `a' is bound to `x', and then, all remaining occurrences of `x' are compared to this first instance. `x' is an alias for `'a''.
Note: in the previous section, we have said that (_*,'ll',_*) is equivalent to (_*,'l','l',_*). This is still true, but the reader should also note that (_*,x@'ab',_*) becomes (after expansion) equivalent to (_*,x@'a',x@'b',_*), which can never match any string.


2.2  Associative matching

As illustrated previously, Tom supports a generalized form of string matching, called associative matching, also known as list matching. In Tom, some operators are special and do not have a fixed number of arguments. This intuitively corresponds to the idea of list where elements are stored in a given order. A list of elements may be written (a,b,c) or [a,b,c]. In Tom, it is written f(a(),b(),c()), where `f' is this special operator. In some sense, `f' is the name of the list. This allows to distinguish list of apples, from list of oranges for examples.

The definition of such operators can be done using Gom:
import list1.list.types.*;
public class List1 {
  %gom {
    module List
    abstract syntax
    E = a()
      | b()
      | c()
    L = f( E* )
   }
   ...
}
Once this signature defined, it becomes possible to define patterns and ` expressions. For example, the function that removes identical consecutive elements can be expressed as follows:
  public static L removeDouble(L l) {
    %match(l) {
      f(X1*,x,x,X2*) -> {
        return removeDouble(`f(X1*,x,X2*));
      }
    }
    return l;
  }
This example is interesting since it express a complex operation in a concise way: given a list l, the %match construct looks for two identical consecutive elements (f(X1*,x,x,X2*) is called a non-linear pattern since x appears twice). If there exists such two elements, the term f(X1*,x,X2*) is built (one x has been removed), and the removeDouble function is called recursively. When there is no such two elements, the pattern does not match, this means that the list does not contain any consecutive identical elements. Therefore, the instruction return l is executed and the function returns.

Similarly, a sorting algorithm can be implemented: if a list contains two elements in the wrong order, just swap them. This can be expressed as follows:
  public static L swapSort(L l) {
    %match(l) {
      f(X*,e1,Z*,e2,Y*) -> {
        if(`gt(e1,e2)) {
          return `swapSort(f(X*,e2,Z*,e1,Y*));
        }
      }
    }
    return l;
  }

  private static boolean gt(E e1, E e2) {
    return e1.toString().compareTo(e2.toString()) > 0;
  }
In this example, the order is implemented by the gt function, using the lexicographical ordering provided by Java.

Note: given a list l = f(b(),b(),a(),a()), there exists several ways to match the pattern. In this case, there are 6 possibilities:
1. X*=f(),        e1=b(), Z*=f(),        e2=b(), Y*=f(a(),a())
2. X*=f(),        e1=b(), Z*=f(b()),     e2=a(), Y*=f(a())
3. X*=f(),        e1=b(), Z*=f(b(),a()), e2=a(), Y*=f()
4. X*=f(b()),     e1=b(), Z*=f(),        e2=a(), Y*=f(a)
5. X*=f(b()),     e1=b(), Z*=f(a()),     e2=a(), Y*=f()
6. X*=f(b(),b()), e1=a,   Z*=f(),        e2=a(), Y*=f()
Assuming that a < b, there are only 4 solutions that can be used to apply the rule: 2, 3, 4, and 5 (e1 must be greater than e2).

It becomes important to remind you how a rule is applied in Tom:
  1. a rule whose pattern match is selected (in our case, there is only one rule),
  2. then, for each solution of the matching problem (there are 6 solutions in our case), the right part is executed (i.e. the Java code).

    Let us suppose that Tom computes the solution 1. first (the order in which solutions are computed is not fixed and depends on the implementation). In that case, the test if(`gt(e1,e2)) is evaluated. Since b() is not greater than b(), the return swapSort(...) is not performed.

    As specified in Section 5.2.2, another solution is computed. Let us say 2.. In that case, the test becomes true and the function swapSort is called recursively.

  3. when there is no more solution (i.e. this means that the list is already sorted), the control is transfered to the next pattern. Since there is no more pattern in our example, the return l is executed, which returns the sorted list.
The following code shows how to build a list and call the function defined previously. This results in sorted lists where elements only occur once.
  public final static void main(String[] args) {
    L l = `f(a(),b(),c(),a(),b(),c(),a());
    L res1 = swapSort(l);
    L res2 = removeDouble(res1);
    System.out.println(" l       = " + l);
    System.out.println("sorted l = " + res1);
    System.out.println("single l = " + res2);
  }


2.3  Another example using Gom

On one side Tom offers customized pattern matching facilities to support the development of Xml and string based applications. On the other side, Tom is a rule based engine which allows to solve complex pattern matching problems over algebraic data-types.

In order to describe user defined abstract data-types, Tom provides a signature definition mechanism (%typeterm, %typelist, %op, etc. c.f. section 2.5). However, Tom does not provide any support to implement these abstract data-types. This is why, in addition to Tom, we have developed the system Gom. This tool provides a human readable syntax definition formalism (inspired from SDF) as well as an efficient implementation for the data-type. The implementation generated by Gom is based on the SharedObject library (which is the code of the ATerm library), using maximal sharing in an efficient way.

In the following, we consider a simple abstract data-type which defines three sorts (Date, Person, and PersonList), as well as three constructors (date, person, and concPerson). In this signature, Int and String are predefined builtin sorts:
  %gom {
    module person
    imports int String
    abstract syntax
    Date = date(day:int, month:int, year:int)
    Person = person(firstname:String, lastname:String, birthdate:Date)
    PersonList = concPerson(Person*)
  }
The notation concPerson(Person*) means that concPerson is a variadic associative operator which takes a list of Person in arguments, and returns a PersonList.

By using the %gom{ ... } construct, a Tom signature and an implementation are automatically generated. In order to connect these generated files, it is only necessary to import the corresponding packages:
import addressbookgom.person.types.*;

public class AddressBookGom {
  %gom { ... }

  public final static void main(String[] args) {
    ...
  }
  ...
}
Once importation and definition are performed, it becomes very easy to use Tom: the constructors can be immediately used in a backquote or a matching constructs.

2.3.1  Using the backquote mechanism

In this section, we consider a tiny address-book application: given a date and a list of persons, the problem consists in finding out the persons whose birthday corresponds to the date.

The first sub-problem consists in building a list of persons. For this, we use the backquote notation and the concPerson operator:
  public static PersonList generateBook() {
    return `concPerson(
      person("John","Smith",date(1965,3,27)),
      person("Marie","Muller",date(1986,3,26)),
      person("Paul","Muller",date(2000,1,27))
    );
  }


2.3.2  Using list-matching facilities

The main(String[] args) method generates a list of persons, builds a date object and calls the happyBirthday() method whose goal is to determine persons whose birthday corresponds to the date:
  public final static void main(String[] args) {
    PersonList book = generateBook();
    Date today = `date(2003,3,27);
    happyBirthday(book,today);
  }
The following method is interesting because it uses pattern matching to encode a searching procedure. As illustrated below, the use of associative operator, combined with non-linearity, allows us to solve the problem in a nice and concise way:
  public static void happyBirthday(PersonList book, Date date) {
    %match(book, date) {
      concPerson(_*,person(firstname,_,date(year,month,day)),_*),date(_,month,day) -> {
        System.out.println("Happy birthday " + `firstname);
      }
    }
  }
Let us mention that the println statement is executed for all persons whose birthdate corresponds to the given date. It also interesting to note that all instantiations of firstname, year, month and day are tried before exiting the %match construct.

2.4  Anti-matching

Recently, Tom patterns have been enriched to support the use of complements, further called anti-patterns. In other words, it is now possible to specify what you don't want to match. This is possible using the `!' symbol, according to the grammar defined in the language reference.

If we consider the Gom signature
import list1.list.types.*;
public class List1 {
  %gom {
    module List
    abstract syntax
    E = a()
      | b()
      | c()
      | f(x1:E, x2:E)
      | g(x3:E)
    L = List( E* )
  }
  ...
}


a very simple use of the anti-patterns would be

  ...
  %match(subject) {
    !a() -> {
      System.out.println("The subject is different from 'a'");
    }
  }
  ...


The `!' symbols can be nested, and therefore more complicated examples can be generated:

  ...
  %match(subject) {
    f(a(),!b()) -> {
      // matches an f which has x1=a() and x2!=b()
    }
    f(!a(),!b()) -> {
      // matches an f which has x1!=a() and x2!=b()
    }
    !f(a(),!b()) -> {
      // matches either something different from f(a(),_) or f(a(),b())
    }
    !f(x,x) -> {
      // matches either something different from f, or an f with x1 != x2
    }
    f(x,!x) -> {
      // matches an f which has x1 != x2
    }
    f(x,!g(x)) -> {
      // matches an f which has either x2!=g or x2=g(y) with y != x1
    }
  }
  ...


The anti-patterns can be also quite useful when used with lists. Imagine that you what to search for a list that doesn't contain a specific element. Without the use of anti-patterns, you would be forced to match with a variable instead of the element you don't want, and after that to perform a test in the action part to check the contents of the variable. For the signature defined above, if we are looking for a list that doesn't contain a(), using the anti-patterns we would write:

  ...
  %match(listSubject) {
    !List(_*,a(),_*) -> {
      System.out.println("The list doesn't contain 'a'");
    }
  }
  ...


Please note that this is different from writing

  ...
  %match(listSubject) {
    List(_*,!a(),_*) -> {
      // at least an element different from a()
    }
  }
  ...
which would match a list that has at least one element different from a().

Some more useful anti-patterns can be imagined in the case of lists:

  ...
  %match(listSubject) {
    !List(_*,!a(),_*) -> {
      // matches a list that contains only 'a()'
    }
    !List(X*,X*) -> {
      // matches a non-symmetrical list
    }
    !List(_*,x,_*,x,_*) -> {
      // matches a list that has all the elements distinct
    }
    List(_*,x,_*,!x,_*) -> {
      // matches a list that has at least two elements that are distinct
    }
  }
  ...


2.5  Hand-written mappings

Up to now, we have considered examples where the implementation of data-types is either predefined (Xml case), or generated by a tool (Gom case). One interesting contribution of Tom is to be customizable. By defining a mapping between the signature formalism (%typeterm, %op, etc.) and the concrete implementation, it becomes possible to perform pattern matching against any data-structure.

2.5.1  Defining a simple mapping

In this section, we consider that we have a Java library for representing tree based data structures. On another side, to be able to use Tom, we consider the following abstract data-type:
  %typeterm Nat
  %op Nat zero()
  %op Nat suc(Nat)
By doing so, we have defined a sort (Nat), and two operators (zero and suc). When defining a mapping, the goal consists in explaining to Tom how the considered abstract data-type is implemented, and how a term (over this signature) can be de-constructed.

For expository reasons, the ATerm library is the Java library used to implement terms. However, any other tree/term based library could have been used instead.

In order to define a correct mapping, we have to describe how the algebraic sort (Nat in our example) is implemented (by the ATermAppl class). This is done via the implement construct:
  %typeterm Nat {
    implement { ATermAppl }
  }
The second part of the mapping definition consists in defining how the symbols (zero and suc) are represented in memory. This is done via the is_fsym construct:
  %op Nat zero() {
    is_fsym(t) { t.getName().equals("zero") }
  }

  %op Nat suc(pred:Nat) {
    is_fsym(t)       { t.getName().equals("suc") }
    get_slot(pred,t) { (ATermAppl)t.getArgument(0) }
  }
In addition, get_slot describes how to retrieve a subterm of a term.

Given a term built over the ATerm library, it becomes possible to perform pattern matching as previously explained.

2.5.2  Using backquote constructs

When using the backquote construct (`suc(suc(zero)) for example), Tom has to know how to build a term in memory. For this purpose, we consider an extension of the signature definition formalism: the make construct.
  %op Nat zero() {
    is_fsym(t) { t.getName().equals("zero") }
    make       { SingletonFactory.getInstance().makeAppl(
                   SingletonFactory.getInstance().makeAFun("zero",0,false)) }
  }

  %op Nat suc(pred:Nat) {
    is_fsym(t)       { t.getName().equals("suc") }
    get_slot(pred,t) { (ATermAppl)t.getArgument(0) }
    make(t)          { SingletonFactory.getInstance().makeAppl(
                         SingletonFactory.getInstance().makeAFun("suc",1,false),t) }
  }
The makeAFun function has three arguments: the function name, the number of arguments and a boolean that indicates if the quotes are included in the function name or not.

Given this mapping, Tom can be used as previously: the following function implements the addition of two Peano integers:
  public ATermAppl plus(ATermAppl t1, ATermAppl t2) {
    %match(t2) {
      zero() -> { return t1; }
      suc(y) -> { return `suc(plus(t1,y)); }
    }
    return null;
  }


2.5.3  Advanced examples

Let us suppose that we have a Java class Person with two getters (getFirstname and getLastname). In order to illustrate the signature definition formalism, we try to redefine (without using Gom) the abstract data type for the sort Person.

The first thing to do consists in defining the Tom sort Person:
  %typeterm Person {
    implement { Person }
  }
To avoid any confusion, we use the same name twice: the Tom sort Person is implemented by the Java class Person. When declaring an operator, we defined the behavior as shown in the previous example:
  %op Person person(firstname:String, lastname:String) {
    is_fsym(t) { t instanceof Person }
    get_slot(firstname,t) { t.getFirstname() }
    get_slot(lastname,t) { t.getLastname() }
    make(t0, t1) {  new Person(t0, t1) }
  }
In this example, we illustrate another possibility offered by Tom: being able to know whether a term is rooted by a symbol without explicitly representing this symbol. The is_fsym(t) construct should return true when the term t is rooted by the algebraic operator we are currently defining. In this example, we say that the term t is rooted by the symbol person when the object t is implemented by an instance of the class Person. By doing so, we do not explicitly represent the symbol person, even if it could have been done via the reflective capabilities of Java (by using Person.getClass() for example).

2.5.4  Using list-matching

In this section, we show how to describe a mapping for associative operators.
  %typeterm TomList {
    implement { ArrayList }
    equals(l1,l2)      { l1.equals(l2) }
  }
Assuming that the sort Element is already defined, we can use the %oparray construct to define an associative operator. We also have to explain to Tom how to compute the size of a list, and how to access a given element. This is done via get_size and get_element constructs:
  %oparray TomList conc( Element* ) {
    is_fsym(t)       { t instanceof ArrayList }
    get_element(l,n) { (ATermAppl)l.get(n) }
    get_size(l)      { l.size() }
    make_empty(n)    { myEmpty(n) }
    make_append(e,l) { myAdd(e,(ArrayList)l) }
  }
This construct is similar to %op except that additional information have to be given: how to build an empty list (make_empty), and how to add an element to a given list (make_append). The auxiliary Java functions are defined as follows:
  private static ArrayList myAdd(Object e,ArrayList l) {
    l.add(e);
    return l;
  }

  private static ArrayList myEmpty(int n) {
    ArrayList res = new ArrayList(n);
    return res;
  }
Usually, we use an associative operator to represent (in a abstract way) a list data structure. There are many ways to implement a list, but the two most well-known are the use of array based list, and the use of linked-list. The previously described mapping shows how to map an abstract list to an array based implementation.

Tom offers another similar construct %oplist to map an associative operator to a linked-list based implementation. When using the ATerm library for example, a possible implementation could be:
  %typeterm TomTerm {
    implement { aterm.ATermAppl }
    equals(t1, t2)     { t1==t2 }
  }
  %typeterm TomList {
    implement { ATermList }
    equals(l1,l2) { l1==l2 }
  }

  %oplist TomList conc( TomTerm* ) {
    is_fsym(t) { t instanceof aterm.ATermList }
    make_empty()  { aterm.pure.SingletonFactory.getInstance().makeList() }
    make_insert(e,l) { l.insert(e) }
    get_head(l)   { (aterm.ATermAppl)l.getFirst() }
    get_tail(l)   { l.getNext() }
    is_empty(l)   { l.isEmpty() }
  }



Previous Up Next