Previous Up Next

Chapter 6  Language Basics – Level 2

6.1  The “Hello World” example

A very simple program is the following:

public class HelloWorld {
  %include { string.tom }

  public final static void main(String[] args) {
    String who = "World";
    %match(who) {
      "World" -> { System.out.println("Hello " + who); }
      _       -> { System.out.println("Don't panic"); }
    }
  }
}

The %include { string.tom } construct imports the predefined (Java) mapping which defines the Tom String algebraic sort. Thus, the sort String can be used in the %match construct.

This examples shows that pattern matching can be performed against built-in data-types. In this example, the %match construct is equivalent to a switch/case construct: as expected the first print statement is executed, but the second one is also executed since no break statement has been executed.

To use a break statement in a %match construct, we recommend to use the notion of labeled block:

public class HelloWorld {
  %include { string.tom }

  public final static void main(String[] args) {
    String who = "World";
matchblock: {
      %match(who) {
        "World" -> { System.out.println("Hello " + who);
                     break matchblock;
                   }
        _       -> { System.out.println("Don't panic"); }
      }
    }
  }
}

6.2  “Hello World” revisited: introduction to list-matching

One particularity of Tom is to simplify the manipulation of lists.

It is natural to consider a string as a list of characters. Characters can be concatenated to from a string, and string can also be concatenated. In Tom, we have to give a name to this concatenation operator. Let us call it concString.

Thus, the string “Hello” is equivalent to the list of characters concString(’H’,’e’,’l’,’l’,’o’).

Note: concString does not have a fixed arity. concString() corresponds to the empty list of characters, i.e. the string “”.

Given a string, to check if the string contains the character e, we can use the following matching construct:

public class HelloWorld {
  %include { string.tom }
  public final static void main(String[] args) {
    %match("Hello") {
      concString(_*,'e',_*) -> { System.out.println("we have found a 'e'"); }
    }
  }
}

In this example, _* corresponds to an anonymous variable which can be instantiated by any list of characters (possibly reduced to the empty list).

In order to capture the context (i.e. what is before and after the e), it is possible to use named variables:

  %match(t) {
    concString(before*,'e',after*) -> {
      System.out.println("we have found a 'e'" +
                         " after " + `before* +
                         " but before " + `after*);
    }
  }

In this example, we have introduced two new variables (before* and after*), called “variable-star”, which are instantiated by a list of characters, instead of a single character (if the * was not there).

Suppose now that we look for a word whose last letter is a o:

  %match(t) {
    concString(before*,'o') -> { ... }
  }

Using this mechanism, it also becomes possible to look for a word which contains an e somewhere and an o in last position:

  %match(t) {
    concString(before*,'e',_*,'o') -> { ... }
  }

Note that a single pattern could provide several outcomes:

  %match(t) {
    concString(before*,'l',after*) -> {
      System.out.println("we have found " + `before* +
                         " before 'l' and " + `after* + " after");
    }
  }

When applied to “Hello”, there are two possible solutions for before* and after*:

  we have found he before 'l' and lo after
  we have found hel before 'l' and o after

Let us suppose that we look for two consecutive l anywhere in the matched expression. This could be expressed by:

  %match(t) {
    concString(_*,'l','l',_*) -> { ... }
  }

Since this syntax is error prone when looking for long/complex substrings, Tom provides an abbreviated notation: ll:

  %match(t) {
    concString(_*,'ll',_*) -> { ... }
  }

This notation is fully equivalent to the previous one.

6.3  String matching – continued

In the following, we consider the program:

public class StringMatching {
  %include { string.tom }

  public final static void main(String[] args) {
    String s = "abcabc";
    %match(s) {
      concString(_*,x,_*,x,_*)    -> { System.out.println("x = " + `x); }
      concString('a',_*,y,_*,'c') -> { System.out.println("y = " + `y); }
      concString(C1*,'abc',C2*)   -> { System.out.println("C1 = " + `C1*); }
      concString(_*,z@'bc',_*)    -> { System.out.println("z = " + `z); }
      concString(_*,L*,_*,L*,_*)  -> { if(`L*.length() > 0) {
                                        System.out.println("L = " + `L*); } }
  }
}

As illustrated previously, we can use variables to capture contexts. In fact, this mechanism is more general and we can use a variable anywhere to match something which is not statically known. The following pattern looks for two characters which are identical:

  concString(_*,x,_*,x,_*) -> { System.out.println("x = " + `x); }

Since list matching is not unitary, there may be several results. In this case, we obtain:

x = a
x = b
x = c

The second pattern looks for a character in a string which should begin with a a and end with a c:

  concString('a',_*,y,_*,'c') -> { System.out.println("y = " + `y); }

The results are:

y = b
y = c
y = a
y = b

The third pattern look for the substring abc anywhere in the string:

  concString(C1*,'abc',C2*) -> { System.out.println("C1 = " + `C1*); }

When the substring is found, the prefix C1* is:

C1 =
C1 = abc

The last pattern illustrates the search of two identical substrings:

  concString(_*,L*,_*,L*,_*) -> { if(`L*.length() > 0) {
                                  System.out.println("L = " + `L*); } }

The results are:

L = a
L = ab
L = abc
L = b
L = bc
L = c

To look for a palindrome, you can use the following pattern:

  %match(t) {
    concString(x*,y*,y*,x*) -> { /* we have found a palindrome */ }
  }

6.4  List matching – Associative matching

As illustrated previously, Tom supports a generalized form of string matching, called list matching, also known as associative matching with neutral element. In Tom, some operators are special and do not have a fixed number of arguments. This intuitively corresponds to the idea of list where elements are stored in a given order. A list of elements may be written (a,b,c) or [a,b,c]. In Tom, it is written f(a(),b(),c()), where f is this special operator. In some sense, f is the name of the list. This allows to distinguish list of apples, from list of oranges for examples.

The definition of such operators can be done using Gom:

import list1.list.types.*;
public class List1 {
  %gom {
    module List
    abstract syntax
    E = a()
      | b()
      | c()
    L = f( E* )
   }
   ...
}

Once this signature defined, it becomes possible to define patterns and expressions. For example, the function that removes identical consecutive elements can be expressed as follows:

  public static L removeDouble(L l) {
    %match(l) {
      f(X1*,x,x,X2*) -> {
        return removeDouble(`f(X1*,x,X2*));
      }
    }
    return l;
  }

This example is interesting since it expresses a complex operation in a concise way: given a list l, the %match construct looks for two identical consecutive elements (f(X1*,x,x,X2*) is called a non-linear pattern since x appears twice). If there exists such two elements, the term f(X1*,x,X2*) is built (an x has been removed), and the removeDouble function is called recursively. When there is no such two elements, the pattern does not match, this means that the list does not contain any consecutive identical elements. Therefore, the instruction return l is executed and the function returns.

Similarly, a sorting algorithm can be implemented: if a list contains two elements in the wrong order, just swap them. This can be expressed as follows:

  public static L swapSort(L l) {
    %match(l) {
      f(X*,e1,Z*,e2,Y*) -> {
        if(`gt(e1,e2)) {
          return `swapSort(f(X*,e2,Z*,e1,Y*));
        }
      }
    }
    return l;
  }

  private static boolean gt(E e1, E e2) {
    return e1.toString().compareTo(e2.toString()) > 0;
  }

In this example, the order is implemented by the gt function, using the lexicographical ordering provided by Java.

Note: given a list l = f(b(),b(),a(),a()), there exists several ways to match the pattern. In this case, there are 6 possibilities:
1. X*=f(),        e1=b(), Z*=f(),        e2=b(), Y*=f(a(),a())
2. X*=f(),        e1=b(), Z*=f(b()),     e2=a(), Y*=f(a())
3. X*=f(),        e1=b(), Z*=f(b(),a()), e2=a(), Y*=f()
4. X*=f(b()),     e1=b(), Z*=f(),        e2=a(), Y*=f(a)
5. X*=f(b()),     e1=b(), Z*=f(a()),     e2=a(), Y*=f()
6. X*=f(b(),b()), e1=a,   Z*=f(),        e2=a(), Y*=f()
Assuming that a < b, there are only 4 solutions (2, 3, 4, and 5 since e1 must be greater than e2) that can be used to apply the rule

It becomes important to remind you how a rule is applied in Tom:

  1. a rule whose pattern match is selected (in our case, there is only one rule),
  2. then, for each solution of the matching problem (there are 6 solutions in our case), the right part is executed (i.e. the Java code).

    Let us suppose that Tom computes the solution 1. first (the order in which solutions are computed is not fixed and depends on the implementation). In that case, the test if(‘gt(e1,e2)) is evaluated. Since b() is not greater than b(), the return swapSort(...) is not performed.

    As specified in Section 10.2.2, another solution is computed. Let us say 2.. In that case, the test becomes true and the function swapSort is called recursively.

  3. when there is no more solution (i.e. this means that the list is already sorted), the control is transfered to the next pattern. Since there is no more pattern in our example, the return l is executed, which returns the sorted list.

The following code shows how to build a list and call the function defined previously. This results in sorted lists where elements only occur once.

  public final static void main(String[] args) {
    L l = `f(a(),b(),c(),a(),b(),c(),a());
    L res1 = swapSort(l);
    L res2 = removeDouble(res1);
    System.out.println(" l       = " + l);
    System.out.println("sorted l = " + res1);
    System.out.println("single l = " + res2);
  }

Previous Up Next