2.1 The “Hello World” example
Another very simple program is the following:
public class HelloWorld {
%include { string.tom }
public final static void main(String[] args) {
String who = "World";
%match(who) {
"World" -> { System.out.println("Hello " + who); }
_ -> { System.out.println("Don't panic"); }
}
}
}
The %include { string.tom } construct imports the predefined
(Java) mapping which defines the Tom String algebraic
sort. Thus, the sort String can be used in the %match construct.
This examples shows that pattern matching can be performed against
built-in data-types. In this example, the %match construct is
equivalent to a switch/case construct. `_' denotes an
anonymous variable that corresponds to a default case in this example.
2.1.1 “Hello World” revisited
One particularity of Tom is to support list-matching (also known as
associative-matching).
When considering that a string is a list of characters, it becomes
natural to describe pattern-matching using an associative operator
which corresponds to the concatenation of two characters.
Thus, the string “Hello” is equivalent to the list of
characters concString('H','e','l','l','o').
Given a string, to check if the string contains the character
`e', we can use the following matching construct:
%match(t) {
concString(_*,'e',_*) -> { System.out.println("we have found a 'e'"); }
}
In this example, `_*' corresponds to an anonymous variable which
can be instantiated by any list of characters (possibly reduced
to the empty list).
In order to capture the context, it is possible to use a named
variable:
%match(t) {
concString(before*,'e',after*) -> {
System.out.println("we have found a 'e'" +
" after " + `before* +
" but before " + `after*);
}
}
In this example, we have introduced two new variables (before*
and after*), called “variable-star”, which are instantiated
by a list of characters. This is `*' that denoted a list variable.
To use their value in a Java expression, Tom provides the ``'
(backquote) mechanism. This construct can be use to retrieve the value
of a Tom variable (instantiated by pattern-matching).
As illustrated by the previous examples, `before* and
`after* are replaced by expressions of sort String.
Suppose now that we look for a word whose last letter is a
`o'. A possible Tom pattern is:
%match(t) {
concString(before*,'o') -> { ... }
}
Using this mechanism, it also becomes possible to look for a word which
contains an `e' somewhere and an `o' in last position:
%match(t) {
concString(before*,'e',_*,'o') -> { ... }
}
Note that some pattern could provide several outcomes:
%match(t) {
concString(before*,'l',after*) -> {
System.out.println("we have found " + `before* +
" before 'l' and " + `after* + " after");
}
}
It will match twice:
we have found he before 'l' and lo after
we have found hel before 'l' and o after
Let us suppose that we look for two consecutive `l' anywhere in
the matched expression. This could be expressed by:
%match(t) {
concString(_*,'l','l',_*) -> { ... }
}
Since this syntax is error prone when looking for long/complex
substrings, Tom provides an abbreviated notation: `ll':
%match(t) {
concString(_*,'ll',_*) -> { ... }
}
This notation is fully equivalent to the previous one.
2.1.2 “Hello World” re-revisited
As illustrated previously, we can use variables to capture
contexts. In fact, this mechanism is more general and we can use a
variable anywhere to match something which is not statically know.
Thus, it is possible to use variable (without a star) to explicitly
say that a character should be there:
%match(t) {
concString(x,_*,'ll',_*,y) -> { ... }
}
This example matches the string t if this string contains two
consecutive `l' and at least two other characters. In this case,
the first letter of the string is stored in the variable `x'
(respectively the last letter in `y').
The variable `x' and `y' are local to the given pattern. Thus,
when considering several patterns in a same match construct, there is
no interference between patterns.
However, in a same pattern, we can use the same variable two times to
indicate to we want to match two identical elements. This feature is
known as non-linear matching:
%match(t) {
concString(x,_*,'ll',_*,y) -> { ... }
concString(x*,y*,y*,x*) -> { /* we have found a palindrome */ }
}
When looking for several instances of a same character, it may be
interesting to use a variable to describe this repetition:
%match(t) {
concString(_*,'a',_*,'a',_*,'a',_*) -> { /* look for 3 'a' in a string */ }
concString(_*,x@'a',_*,x,_*,x,_*) -> { /* look for 3 'a' in a string */ }
}
Here, the first `a' is bound to `x', and then, all remaining
occurrences of `x' are compared to this first instance. `x'
is an alias for `'a''.
Note:
in the previous section, we have said that (_*,'ll',_*) is
equivalent to (_*,'l','l',_*). This is still true, but the
reader should also note that (_*,x@'ab',_*) becomes (after
expansion) equivalent to (_*,x@'a',x@'b',_*), which can never
match any string.
2.2 Associative matching
As illustrated previously, Tom supports a generalized form of string
matching, called associative matching, also known as list
matching.
In Tom, some operators are special and do not have a fixed number of
arguments. This intuitively corresponds to the idea of list where elements are
stored in a given order.
A list of elements may be written (a,b,c) or [a,b,c]. In
Tom, it is written f(a(),b(),c()), where `f' is this special
operator. In some sense, `f' is the name of the list. This allows to
distinguish list of apples, from list of oranges for examples.
The definition of such operators can be done using Gom:
import list1.list.types.*;
public class List1 {
%gom {
module List
abstract syntax
E = a()
| b()
| c()
L = f( E* )
}
...
}
Once this signature defined, it becomes possible to define patterns and
` expressions. For example, the function that removes identical
consecutive elements can be expressed as follows:
public static L removeDouble(L l) {
%match(l) {
f(X1*,x,x,X2*) -> {
return removeDouble(`f(X1*,x,X2*));
}
}
return l;
}
This example is interesting since it express a complex operation in a concise
way: given a list l, the %match construct looks for two
identical consecutive elements (f(X1*,x,x,X2*) is called a non-linear
pattern since x appears twice). If there exists such two elements, the
term f(X1*,x,X2*) is built (one x has been removed), and the
removeDouble function is called recursively.
When there is no such two elements, the pattern does not match, this means that
the list does not contain any consecutive identical elements. Therefore, the
instruction return l is executed and the function returns.
Similarly, a sorting algorithm can be implemented: if a list contains two
elements in the wrong order, just swap them. This can be
expressed as follows:
public static L swapSort(L l) {
%match(l) {
f(X*,e1,Z*,e2,Y*) -> {
if(`gt(e1,e2)) {
return `swapSort(f(X*,e2,Z*,e1,Y*));
}
}
}
return l;
}
private static boolean gt(E e1, E e2) {
return e1.toString().compareTo(e2.toString()) > 0;
}
In this example, the order is implemented by the gt function, using the
lexicographical ordering provided by Java.
Note:
given a list
l = f(b(),b(),a(),a()), there exists several ways to match
the pattern. In this case, there are 6 possibilities:
1. X*=f(), e1=b(), Z*=f(), e2=b(), Y*=f(a(),a())
2. X*=f(), e1=b(), Z*=f(b()), e2=a(), Y*=f(a())
3. X*=f(), e1=b(), Z*=f(b(),a()), e2=a(), Y*=f()
4. X*=f(b()), e1=b(), Z*=f(), e2=a(), Y*=f(a)
5. X*=f(b()), e1=b(), Z*=f(a()), e2=a(), Y*=f()
6. X*=f(b(),b()), e1=a, Z*=f(), e2=a(), Y*=f()
Assuming that
a < b, there are only 4 solutions that can be used to
apply the rule: 2, 3, 4, and 5 (
e1 must be greater than
e2).
It becomes important to remind you how a rule is applied in Tom:
-
a rule whose pattern match is selected (in our case, there is only one rule),
- then, for each solution of the matching problem (there are 6 solutions in
our case), the right part is executed (i.e. the Java code).
Let us suppose that Tom computes the solution 1. first (the order
in which solutions are computed is not fixed and depends on the
implementation). In that case, the test if(`gt(e1,e2)) is evaluated.
Since b() is not greater than b(), the return
swapSort(...) is not performed.
As specified in Section 5.2.2, another solution is
computed. Let us say 2.. In that case, the test becomes true and the
function swapSort is called recursively.
- when there is no more solution (i.e. this means that the list is already
sorted), the control is transfered to the next pattern. Since there is no
more pattern in our example, the return l is executed, which returns the
sorted list.
The following code shows how to build a list and call the function defined
previously.
This results in sorted lists where elements only occur once.
public final static void main(String[] args) {
L l = `f(a(),b(),c(),a(),b(),c(),a());
L res1 = swapSort(l);
L res2 = removeDouble(res1);
System.out.println(" l = " + l);
System.out.println("sorted l = " + res1);
System.out.println("single l = " + res2);
}
2.3 Another example using Gom
On one side Tom offers customized pattern matching facilities to
support the development of Xml and string based applications.
On the other side, Tom is a rule based engine which allows to
solve complex pattern matching problems over algebraic data-types.
In order to describe user defined abstract data-types, Tom provides a
signature definition mechanism (%typeterm, %typelist,
%op, etc. c.f. section 2.5).
However, Tom does not provide any support to implement these
abstract data-types.
This is why, in addition to Tom, we have developed the system Gom.
This tool provides a human readable syntax definition formalism (inspired from
SDF) as well as an efficient implementation for the data-type.
The implementation generated by Gom is based on the SharedObject
library (which is the code of the ATerm library), using
maximal sharing in an efficient way.
In the following, we consider a simple abstract data-type which
defines three sorts (Date, Person, and
PersonList), as well as three constructors
(date, person, and concPerson). In this
signature, Int and String are predefined builtin sorts:
%gom {
module person
imports int String
abstract syntax
Date = date(day:int, month:int, year:int)
Person = person(firstname:String, lastname:String, birthdate:Date)
PersonList = concPerson(Person*)
}
The notation concPerson(Person*) means that concPerson
is a variadic associative operator which takes a list of Person
in arguments, and returns a PersonList.
By using the %gom{ ... }
construct, a Tom signature and
an implementation are automatically generated.
In order to connect these generated files, it is only necessary
to import the corresponding packages:
import addressbookgom.person.types.*;
public class AddressBookGom {
%gom { ... }
public final static void main(String[] args) {
...
}
...
}
Once importation and definition are performed, it becomes very easy to use
Tom: the constructors can be immediately used in a backquote or a matching
constructs.
2.3.1 Using the backquote mechanism
In this section, we consider a tiny address-book application: given a
date and a list of persons, the problem consists in finding out the
persons whose birthday corresponds to the date.
The first sub-problem consists in building a list of persons. For this,
we use the backquote notation and the concPerson operator:
public static PersonList generateBook() {
return `concPerson(
person("John","Smith",date(1965,3,27)),
person("Marie","Muller",date(1986,3,26)),
person("Paul","Muller",date(2000,1,27))
);
}
2.3.2 Using list-matching facilities
The main(String[] args) method generates a list of persons, builds a date object and
calls the happyBirthday() method whose goal is to determine
persons whose birthday corresponds to the date:
public final static void main(String[] args) {
PersonList book = generateBook();
Date today = `date(2003,3,27);
happyBirthday(book,today);
}
The following method is interesting because it uses pattern matching
to encode a searching procedure. As illustrated below, the use of
associative operator, combined with non-linearity, allows us to solve
the problem in a nice and concise way:
public static void happyBirthday(PersonList book, Date date) {
%match(book, date) {
concPerson(_*,person(firstname,_,date(year,month,day)),_*),date(_,month,day) -> {
System.out.println("Happy birthday " + `firstname);
}
}
}
Let us mention that the println statement is executed for all
persons whose birthdate corresponds to the given date.
It also interesting to note that all instantiations of
firstname, year, month and day are tried
before exiting the %match construct.
2.4 Anti-matching
Recently, Tom patterns have been enriched to support the use of complements, further called anti-patterns.
In other words, it is now possible to specify what you don't want to match. This is
possible using the `!' symbol, according to the grammar defined in the language reference.
If we consider the Gom signature
import list1.list.types.*;
public class List1 {
%gom {
module List
abstract syntax
E = a()
| b()
| c()
| f(x1:E, x2:E)
| g(x3:E)
L = List( E* )
}
...
}
a very simple use of the anti-patterns would be
...
%match(subject) {
!a() -> {
System.out.println("The subject is different from 'a'");
}
}
...
The `!' symbols can be nested, and therefore more complicated examples can be generated:
...
%match(subject) {
f(a(),!b()) -> {
// matches an f which has x1=a() and x2!=b()
}
f(!a(),!b()) -> {
// matches an f which has x1!=a() and x2!=b()
}
!f(a(),!b()) -> {
// matches either something different from f(a(),_) or f(a(),b())
}
!f(x,x) -> {
// matches either something different from f, or an f with x1 != x2
}
f(x,!x) -> {
// matches an f which has x1 != x2
}
f(x,!g(x)) -> {
// matches an f which has either x2!=g or x2=g(y) with y != x1
}
}
...
The anti-patterns can be also quite useful when used with lists. Imagine that you what to search for a list
that doesn't contain a specific element. Without the use of anti-patterns, you would be forced to match with
a variable instead of the element you don't want, and after that to perform a test in the action
part to check the contents of the variable. For the signature defined above, if we are looking for
a list that doesn't contain a(), using the anti-patterns we would write:
...
%match(listSubject) {
!List(_*,a(),_*) -> {
System.out.println("The list doesn't contain 'a'");
}
}
...
Please note that this is different from writing
...
%match(listSubject) {
List(_*,!a(),_*) -> {
// at least an element different from a()
}
}
...
which would match a list that has at least one element different from a().
Some more useful anti-patterns can be imagined in the case of lists:
...
%match(listSubject) {
!List(_*,!a(),_*) -> {
// matches a list that contains only 'a()'
}
!List(X*,X*) -> {
// matches a non-symmetrical list
}
!List(_*,x,_*,x,_*) -> {
// matches a list that has all the elements distinct
}
List(_*,x,_*,!x,_*) -> {
// matches a list that has at least two elements that are distinct
}
}
...
2.5 Hand-written mappings
Up to now, we have considered examples where the implementation of
data-types is either predefined (Xml case), or generated by a tool
(Gom case).
One interesting contribution of Tom is to be customizable. By
defining a mapping between the signature formalism (%typeterm,
%op, etc.) and the concrete implementation, it
becomes possible to perform pattern matching against any
data-structure.
2.5.1 Defining a simple mapping
In this section, we consider that we have a Java library for
representing tree based data structures.
On another side, to be able to use Tom, we consider the following abstract
data-type:
%typeterm Nat
%op Nat zero()
%op Nat suc(Nat)
By doing so, we have defined a sort (Nat), and two operators
(zero and suc).
When defining a mapping, the goal consists in explaining to Tom how
the considered abstract data-type is implemented, and how a term (over
this signature) can be de-constructed.
For expository reasons, the ATerm library is the Java
library used to implement terms. However, any other tree/term based library
could have been used instead.
In order to define a correct mapping, we have to describe how the
algebraic sort (Nat in our example) is implemented (by the
ATermAppl class).
This is done via the implement construct:
%typeterm Nat {
implement { ATermAppl }
}
The second part of the mapping definition consists in defining how the
symbols (zero and suc) are represented in memory. This
is done via the is_fsym construct:
%op Nat zero() {
is_fsym(t) { t.getName().equals("zero") }
}
%op Nat suc(pred:Nat) {
is_fsym(t) { t.getName().equals("suc") }
get_slot(pred,t) { (ATermAppl)t.getArgument(0) }
}
In addition, get_slot describes how to retrieve a subterm of a
term.
Given a term built over the ATerm library, it becomes possible to
perform pattern matching as previously explained.
2.5.2 Using backquote constructs
When using the backquote construct (`suc(suc(zero)) for
example), Tom has to know how to build a term in memory. For this
purpose, we consider an extension of the signature definition
formalism: the make construct.
%op Nat zero() {
is_fsym(t) { t.getName().equals("zero") }
make { SingletonFactory.getInstance().makeAppl(
SingletonFactory.getInstance().makeAFun("zero",0,false)) }
}
%op Nat suc(pred:Nat) {
is_fsym(t) { t.getName().equals("suc") }
get_slot(pred,t) { (ATermAppl)t.getArgument(0) }
make(t) { SingletonFactory.getInstance().makeAppl(
SingletonFactory.getInstance().makeAFun("suc",1,false),t) }
}
The makeAFun function has three arguments: the function name,
the number of arguments and a boolean that indicates if the quotes
are included in the function name or not.
Given this mapping, Tom can be used as previously: the following
function implements the addition of two Peano integers:
public ATermAppl plus(ATermAppl t1, ATermAppl t2) {
%match(t2) {
zero() -> { return t1; }
suc(y) -> { return `suc(plus(t1,y)); }
}
return null;
}
2.5.3 Advanced examples
Let us suppose that we have a Java class Person with two getters
(getFirstname and getLastname).
In order to illustrate the signature definition formalism, we try to
redefine (without using Gom) the abstract data type for the sort
Person.
The first thing to do consists in defining the Tom sort
Person:
%typeterm Person {
implement { Person }
}
To avoid any confusion, we use the same name twice: the Tom sort
Person is implemented by the Java class Person.
When declaring an operator, we defined the behavior as shown in
the previous example:
%op Person person(firstname:String, lastname:String) {
is_fsym(t) { t instanceof Person }
get_slot(firstname,t) { t.getFirstname() }
get_slot(lastname,t) { t.getLastname() }
make(t0, t1) { new Person(t0, t1) }
}
In this example, we illustrate another possibility offered by Tom:
being able to know whether a term is rooted by a symbol without
explicitly representing this symbol.
The is_fsym(t) construct should return true when the
term t is rooted by the algebraic operator we are currently
defining.
In this example, we say that the term t is rooted by the symbol
person when the object t is implemented by an instance
of the class Person. By doing so, we do not explicitly
represent the symbol person, even if it could have been done
via the reflective capabilities of Java (by using
Person.getClass() for example).
2.5.4 Using list-matching
In this section, we show how to describe a mapping for associative
operators.
%typeterm TomList {
implement { ArrayList }
equals(l1,l2) { l1.equals(l2) }
}
Assuming that the sort Element is already defined, we can use
the %oparray construct to define an associative operator.
We also have to explain to Tom how to compute the size of a list,
and how to access a given element. This is done via get_size
and get_element constructs:
%oparray TomList conc( Element* ) {
is_fsym(t) { t instanceof ArrayList }
get_element(l,n) { (ATermAppl)l.get(n) }
get_size(l) { l.size() }
make_empty(n) { myEmpty(n) }
make_append(e,l) { myAdd(e,(ArrayList)l) }
}
This construct is similar to %op except that additional
information have to be given: how to build an empty list
(make_empty), and how to add an element to a given list
(make_append).
The auxiliary Java functions are defined as follows:
private static ArrayList myAdd(Object e,ArrayList l) {
l.add(e);
return l;
}
private static ArrayList myEmpty(int n) {
ArrayList res = new ArrayList(n);
return res;
}
Usually, we use an associative operator to represent (in a abstract
way) a list data structure. There are many ways to implement a list,
but the two most well-known are the use of array based list, and the use
of linked-list. The previously described mapping shows how to map an
abstract list to an array based implementation.
Tom offers another similar construct %oplist to map an associative
operator to a linked-list based implementation. When using the ATerm library
for example, a possible implementation could be:
%typeterm TomTerm {
implement { aterm.ATermAppl }
equals(t1, t2) { t1==t2 }
}
%typeterm TomList {
implement { ATermList }
equals(l1,l2) { l1==l2 }
}
%oplist TomList conc( TomTerm* ) {
is_fsym(t) { t instanceof aterm.ATermList }
make_empty() { aterm.pure.SingletonFactory.getInstance().makeList() }
make_insert(e,l) { l.insert(e) }
get_head(l) { (aterm.ATermAppl)l.getFirst() }
get_tail(l) { l.getNext() }
is_empty(l) { l.isEmpty() }
}