Language Basics – Level 1

In this first part, we introduce the basic notions provided by Tom: the definition of a data-type, the construction of data (i.e. trees), and the transformation of such data using pattern-matching.

5.1 Defining a data-structure

One of the most simple Tom programs is the following. It defines a data-type to represent Peano integers and builds the integers 0=zero and 1=suc(zero):

import main.peano.types.*;

public class Main {
  %gom {
    module Peano
    abstract syntax
    Nat = zero()
        | suc(pred:Nat)
        | plus(x1:Nat, x2:Nat)
   }

  public final static void main(String[] args) {
    Nat z   = `zero();
    Nat one = `suc(z);
    System.out.println(z);
    System.out.println(one);
  }
}

The %gom{ …} constructs defines a data-structure, also called signature or algebraic data-type. This data-structure declares a sort (Nat) that has three operators (zero, suc, and plus). These operators are called constructors, because they are used to construct the data-structure. zero is a constant (arity 0), whereas the arity of suc and plus is respectively 1 and 2. Note that a name has to be given for each slot (pred for suc, and x1, x2 for plus).

Note for the Java programmer: in our terminology, the notion of sort or type (like Nat) corresponds to the notion of abstract class. A constructor (such as zero, suc, or plus) can be seen as a class that extends the abstract class. The notion of slot corresponds to a field.

A second construct provided by Tom is the ‘ (backquote), which builds an object. The expression ‘zero() builds the object zero of sort Nat. This is equivalent to new zero().

In the example, the variable z is a Java object that can be used in ‘suc(z) to build the object suc(zero()). In other words, an algebraic data-structure definition can be seen as a grammar. The ‘ construct allows to build trees (Abstract Syntax Trees) over this grammar.

The implementation generated by Gom is such that objects can be converted into readable strings, using the toString() method.

In the following we assume that Tom is installed, as explained in chapter 14. To run the example, do a copy and paste into Main.t, then compile the file using the command: tom Main.t. Then, compile the generated Java file: javac Main.java, and run it with java Main, as shown below:

$ tom Main.t
$ javac Main.java
$ java Main
zero()
suc(zero())

Note: to use the code generated by a %gom {...}, an import statement has to be added: import main.peano.types.*;

The path corresponds to the name of the class (Main), followed by the name of the module (Peano), followed by types.*. These package names are in lowercase, even if the class name and the module name contain capitals.

5.2 Retrieving information in a data-structure

In addition to %gom and ‘, Tom provides a third construct: %match. Using the same program, we add a method evaluate(Nat n) and we modify the method run():

import main.peano.types.*;

public class Main {
  %gom {
    module Peano
    abstract syntax
    Nat = zero()
        | suc(pred:Nat)
        | plus(x1:Nat, x2:Nat)
  }

  public final static void main(String[] args) {
    Nat two = `plus(suc(zero()),suc(zero()));
    System.out.println(two);
    two = evaluate(two);
    System.out.println(two);
  }

  public static Nat evaluate(Nat n) {
    %match(n) {
      plus(x, zero()) -> { return `x; }
      plus(x, suc(y)) -> { return `suc(plus(x,y)); }
    }
    return n;
  }
}

The %match construct is similar to switch/case in the sense that given a tree (n), the %match selects the first pattern that matches the tree, and executes the associated action. By matching, we mean: giving values to variables to make the pattern equal to the subject. In our example n has the value plus(suc(zero()),suc(zero())). The first pattern does not match n, because the second subterm of n is not a zero(), but a suc(zero()). In this example, the second pattern matches n and gives the value suc(zero()) to x, and zero() to y.

When a pattern matches a subject, we say that the variables (x and y in our case) are instantiated. They can then be used in the action part (also called right-hand side). Similarly, the second rules can be applied when the second subterm is rooted by a suc. In that case, x and y are instantiated.

The method evaluate defines the addition of two integers represented by Peano successors (i.e. zero and suc). The first case says that the addition of an integer with zero is evaluated into the integer itself. The second case builds a new term (rooted by suc) whose subterm (plus(x,y)) denotes the addition of x and y.

plus(suc(zero()),suc(zero()))
suc(plus(suc(zero()),zero()))

Note: In a match construct, the variables should not be declared by the programmer. They are written without (), and their type is inferred. A variable introduced should be fresh: it should not be declared earlier in a wrapping block, either in Java, either in a %match construct. A Tom variable can be used in the corresponding action, but has to be introduced by a ‘ construct. Note that the scope of a back-quote expression is delimited by well-parenthesized sentences (‘zero(), ‘suc(zero()), ‘suc(x)). The special case ‘x, corresponding to a variable alone, is also correct. To write complex expressions, extra parentheses can be added: ‘(x), ‘(x == y) for example.

5.3 Using rules to simplify expressions

In this section, we will show how to use the notion of rules to simplify expressions. Suppose that we want to simplify boolean expressions. It is frequent to define the relations between expressions using a set of simplification rules like the following:

To encode such a simplification system, we have to implement a function that simplifies an expression until no more reduction can be performed. This can of course be done using functions defined in Java, combined with the %match constructs, but it is more convenient to use an algebraic construct that ensures that a rule is applied whenever a reduction is possible. For this purpose, the %gom construct provides a rule() construct where the left and the right-hand sides are terms. The previous simplification system can be defined as follows:

import gates.logic.types.*;

public class Gates {
  %gom {
    module Logic
      imports int
      abstract syntax
      Bool = Input(n:int)
           | True()
           | False()
           | Not(b:Bool)
           | Or(b1:Bool, b2:Bool)
           | And(b1:Bool, b2:Bool)
           | Nand(b1:Bool, b2:Bool)
           | Xor(b1:Bool, b2:Bool)

      module Logic:rules() {
        Not(a)   -> Nand(a,a)
        Or(a,b)  -> Nand(Not(a),Not(b))
        And(a,b) -> Not(Nand(a,b))
        Xor(a,b) -> Or(And(a,Not(b)),And(Not(a),b))
        Nand(False(),_)     -> True()
        Nand(_,False())     -> True()
        Nand(True(),True()) -> False()
      }
  }

  public final static void main(String[] args) {
    Bool b = `Xor(True(),False());
    System.out.println("b = " + b);
  }
}

When using the module Logic:rules() { ... } constructs, the simplification rules are integrated into the data-structure. This means that the rules are applied any time it is possible to do a reduction. The user does not have any control on them, and thus cannot prevent from applying a rule. Of course, the simplification system should be terminating, otherwise, infinite reductions may occur. When compiling and executing the previous program, we obtain b = True().

5.4 Separating Gom from Tom (*)

When programming large applications, it may be more convenient to introduce several classes that share a common data-structure. In that case, the %gom {...} construct could be avoided and a Gom file (Logic.gom for example) could be used instead:

module Logic
imports int
abstract syntax
Bool = Input(n:int)
     | True()
     | False()
     | Not(b:Bool)
     | Or(b1:Bool, b2:Bool)
     | And(b1:Bool, b2:Bool)
     | Nand(b1:Bool, b2:Bool)
     | Xor(b1:Bool, b2:Bool)

module Logic:rules() {
  Not(a)   -> Nand(a,a)
  Or(a,b)  -> Nand(Not(a),Not(b))
  And(a,b) -> Not(Nand(a,b))
  Xor(a,b) -> Or(And(a,Not(b)),And(Not(a),b))

  Nand(False(),_)     -> True()
  Nand(_,False())     -> True()
  Nand(True(),True()) -> False()
}

This generates several Java classes, among them a particular file called Logic.tom which explains to Tom how the data-structure is implemented. This process is hidden when using the %gom {...} construct:

$ ls logic
_Logic.tom              Logic.tom               LogicAbstractType.java
strategy                types

One of the simplest Tom program that uses the defined data-structure is the following:

import logic.types.*;
public class Main {

  %include{ logic/Logic.tom }

  public final static void main(String[] args) {
    Bool b = `Xor(True(),False());
    System.out.println("b = " + b);
  }
}

Note: since classes are generated in the logic directory, the statement import logic.types.*; is needed. The name comes from the name of the module (Logic).

In the Tom program, %include { logic/Logic.tom } is necessary to include (like a copy and paste) the Logic.tom file generated by gom. The %include command is similar to the #include command provided by the C-preprocessor.

5.5 Programming in Tom

In this section we present how Tom can be used to describe the abstract syntax and to implement an interpreter for a given language. We also give some tips and more information about the generated code.

When using %gom { ... }, the generated data-structure is particular: data are maximally shared. This technique, also known as hash-consing ensures that two identical subterms are implemented by the same objects. Therefore, the memory footprint is minimal and the equality-check can be done in constant time using the == construct (two terms are identical if the pointers are equal). As a consequence, a term is not mutable: once created, it cannot be modified.

Even if the generated API offers getters and setters for each defined slot (Bool getb1(), Bool getb2(), Bool setb1(Bool t) and Bool setb2(Bool t) for the And(b1:Bool,b2:Bool) defined below). These methods do not really modify the term on which they are applied: t.setb1(‘True()) will return a copy of t where the first child is set to True(), but the previous t is not modified. There is no side-effect in a maximally shared setting.

In Tom, it is quite easy to quickly develop applications which manipulates trees. As an example, let us consider a tiny language composed of boolean expressions (True, False, And, Or, and Not), expressions (constant, variable, Plus, Mult, and Mod), and instructions (Skip, Print, ;, If, and While). The abstract syntax of this language can be described as follows:

import pico1.term.types.*;
import java.util.*;

class Pico1 {
  %gom {
    module Term
    imports int String
    abstract syntax
    Bool = True()
         | False()
         | Not(b:Bool)
         | Or(b1:Bool, b2:Bool)
         | And(b1:Bool, b2:Bool)
         | Eq(e1:Expr, e2:Expr)

    Expr = Var(name:String)
         | Cst(val:int)
         | Plus(e1:Expr, e2:Expr)
         | Mult(e1:Expr, e2:Expr)
         | Mod(e1:Expr, e2:Expr)

    Inst = Skip()
         | Assign(name:String, e:Expr)
         | Seq(i1:Inst, i2:Inst)
         | If(cond:Bool, i1:Inst, i2:Inst)
         | While(cond:Bool, i:Inst)
         | Print(e:Expr)
  }
  ...
}

Assume that we want to write an interpreter for this language, we need a notion of environment to store the value assigned to a variable. A simple solution is to use a Map which associate an expression (of sort Expr) to a name of variable (of sort String). Given this environment, the evaluation of an expression can be implemented as follows:

  public static Expr evalExpr(Map env,Expr expr) {
    %match(expr) {
      Var(n)                -> { return (Expr)env.get(`n); }
      Plus(Cst(v1),Cst(v2)) -> { return `Cst(v1 + v2); }
      Mult(Cst(v1),Cst(v2)) -> { return `Cst(v1 * v2); }
      Mod(Cst(v1),Cst(v2))  -> { return `Cst(v1 % v2); }
      // congruence rules
      Plus(e1,e2) -> { return `evalExpr(env,Plus(evalExpr(env,e1),evalExpr(env,e2))); }
      Mult(e1,e2) -> { return `evalExpr(env,Mult(evalExpr(env,e1),evalExpr(env,e2))); }
      Mod(e1,e2)  -> { return `evalExpr(env,Mod(evalExpr(env,e1),evalExpr(env,e2)));  }

      x -> { return `x; }
    }
    throw new RuntimeException("should not be there: " + expr);
  }

Note: the congruence rules are needed to enforce reductions in sub-expressions of Plus, Mult, and Mod.

We will see later how they can be avoided using the notion of strategy.

  public static Bool evalBool(Map env,Bool bool) {
    %match(bool) {
      Not(True())     -> { return `False(); }
      Not(False())    -> { return `True(); }
      Not(b)          -> { return `evalBool(env,Not(evalBool(env,b))); }

      Or(True(),_)    -> { return `True(); }
      Or(_,True())    -> { return `True(); }
      Or(False(),b2)  -> { return `b2; }
      Or(b1,False())  -> { return `b1; }
      And(True(),b2)  -> { return `b2; }
      And(b1,True())  -> { return `b1; }
      And(False(),_)  -> { return `False(); }
      And(_,False())  -> { return `False(); }

      Eq(e1,e2) -> {
        Expr x = `evalExpr(env,e1);
        Expr y = `evalExpr(env,e2);
        return (x==y)?`True():`False();
      }

      x -> { return `x; }
    }
    throw new RuntimeException("should not be there: " + bool);
  }

Once defined the methods evalExpr and evalBool, it becomes easy to define the interpreter:

  public static void eval(Map env, Inst inst) {
    %match(inst) {
      Skip() -> {
        return;
      }

      Assign(name,e) -> {
        env.put(`name,evalExpr(env,`e));
        return;
      }

      Seq(i1,i2) -> {
        eval(env,`i1);
        eval(env,`i2);
        return;
      }

      Print(e) -> {
        System.out.println(evalExpr(env,`e));
        return;
      }

      If(b,i1,i2) -> {
        if(evalBool(env,`b)==`True()) {
          eval(env,`i1);
        } else {
          eval(env,`i2);
        }
        return;
      }

      w@While(b,i) -> {
        Bool cond = evalBool(env,`b);
        if(cond==`True()) {
          eval(env,`i);
          eval(env,`w);
        }
        return;
      }
    }
    throw new RuntimeException("strange term: " + inst);
  }

Note: the last two rules use the conditional test if from Java to implement a conditional rule.

Note also the use of w@... that introduces a variable w (an alias). This variable can be used in the action part to avoid building the term While(b,i).

To play with the Pico language, we just have to initialize the environment (env), create programs (p1 and p2), and evaluate them (eval):

  public final static void main(String[] args) {
    Map env = new HashMap();

    Inst p1 = `Seq(Assign("a",Cst(1)) , Print(Var("a")));
    System.out.println("p1: " + p1);
    eval(env,p1);

    Inst p2 = `Seq(Assign("i",Cst(0)),
                   While(Not(Eq(Var("i"),Cst(10))),
                         Seq(Print(Var("i")),
                             Assign("i",Plus(Var("i"),Cst(1))))));
    System.out.println("p2: " + p2);
    eval(env,p2);
  }

Chapter 5 Language Basics – Level 1

5.1 Defining a data-structure

5.2 Retrieving information in a data-structure

5.3 Using rules to simplify expressions

5.4 Separating Gom from Tom (*)

5.5 Programming in Tom