Chapter 1 Language Basics

1.1 Defining a data-structure

One of the most simple Tom program is the following:

import main.peano.types.*;

public class Main {
  %gom {
    module Peano
    abstract syntax
    Nat = zero()
        | suc(pred:Nat)
        | plus(x1:Nat, x2:Nat)
   }

  public final static void main(String[] args) {
    Nat z   = `zero();
    Nat one = `suc(z);
    System.out.println(z);
    System.out.println(one);
  }
}

The %gom{ ...} constructs defines a data-structure, also called algebraic data-type. This data-structure declares a sort (Nat) that has three operators (zero, suc, and plus). These operators are called constructors, because they are used to construct the data-structure. zero is a constant (arity 0), whereas the arity of suc and plus is respectively 1 and 2. In Gom, a name has to be given for each slot (pred for suc, and x1, x2 for plus).

A second construct provided by Tom is the ` (backquote), which builds an object. The expression `zero() builds the object zero() of sort Nat. Therefore, the variable z is a Java object that can be used to build the object suc(zero()). In other words, an algebraic data-structure definition can be seen as a grammar. The ` construct allows to build trees (Abstract Syntax Trees) over this grammar.

The implementation generated by Gom is such that objects can be converted into readable strings, using the toString() method.

We assume that Tom is installed, as explained in chapter 9. First we compile the file Main.t using the tom command, then we compile the generated Java file Main.java using javac and we run it with java, as shown below:

$ tom Main.t
$ javac Main.java
$ java Main
zero
suc(zero)

1.2 Retrieving information in a data-structure

In addition to %gom and `, Tom provides a third construct: %match. Using the same program, we add a method evaluate(Nat n) and we modify the method run():

  public static Nat evaluate(Nat n) {
    %match(n) {
      plus(x, zero()) -> { return `x; }
      plus(x, suc(y)) -> { return `suc(plus(x,y)); }
    }
    return n;
  }

  public final static void main(String[] args) {
    Nat two = `plus(suc(zero()),suc(zero()));
    System.out.println(two);
    two = evaluate(two);
    System.out.println(two);
  }

The %match construct is similar to switch/case in the sense that given a tree (generally called subject), the %match selects the first pattern that matches the tree, and executes the associated action. By matching, we mean: finding the instances of variables that make the pattern and the subject identical. In our example, the first pattern matches the subject n when n is a tree rooted by plus, and whose second subterm corresponds to the tree zero(). When it is the case, the variable x takes the first subterm as value. We say that a tree is rooted by a symbol when its top-node corresponds to this symbol. When a pattern matches, we say that its variables are instantiated. They can then be used in the action part (also called right-hand side). Similarly, the second rules can be applied when the second subterm is rooted by a suc. In that case, x and y are instantiated.

The method evaluate defines the addition of two integers represented by Peano successors (i.e. zero and suc). The first case says that the addition of an integer with zero is evaluated into the integer itself. The second case builds a new term (rooted by suc) whose subterm (plus(x,y)) denotes the addition of x and y.

When executing the run() method, we obtain:

plus(suc(zero),suc(zero))
suc(plus(suc(zero),zero))

In a match construct, the variables should not be declared by the programmer. They are written without (), and their type is inferred. A variable introduced should be fresh: it should not be declared earlier in a wrapping block, either in Java, either in a %match construct. A Tom variable can be used in the corresponding action, but has to be wrapped in a ` construct. Note that the scope of a back-quote expression is delimited by well-parenthesis sentences (`zero(), `suc(zero()), `suc(x)). The special case `x, corresponding to a variable alone, is also correct. To write complex expressions, extra parenthesis can be added: `(x), `(x == y) for example.

1.3 Combining Tom and Java

As illustrated below, Tom offers a simple way to retrieve information stored in subterms. The previous example describes a single-step when adding two integers. To get a canonical form (a term that can no longer be reduced), we need to encode a loop, to reach a fix-point. We also have to allow reductions in sub-trees (i.e. a plus which is not at a top-position):

  public static Nat evaluate(Nat n) {
    %match(n) {
      plus(x, zero()) -> { return `x; }
      plus(x, suc(y)) -> { return `suc(plus(x,y)); }
      // congruence rules
      suc(x)          -> { return `suc(evaluate(x)); }
      plus(x,y)       -> { return `plus(evaluate(x),evaluate(y)); }
    }
    return n;
  }

  public final static void main(String[] args) {
    Nat t = `plus(suc(zero()),suc(zero()));
    System.out.println(t + " -> " + evaluate(t));
    t = `plus(suc(zero()),plus(suc(zero()),suc(zero())));
    System.out.println(t + " -> " + evaluate(t));
  }

By adding two new congruence rules, a term that cannot be reduced by the first two rule will be recursively inspected by the two last rules. Therefore, if a transformation can be performed (at top position, or in sub-trees), it will be done:

plus(suc(zero),suc(zero)) -> suc(plus(suc(zero),zero))
plus(suc(zero),plus(suc(zero),suc(zero))) -> plus(suc(zero),suc(plus(suc(zero),zero)))

We use Java (while-loop or recursion) to compute a fix-point:

  public static Nat loop(Nat n) {
    Nat tmp = `evaluate(n);
    if(tmp==n) {
      return n;
    }
    return loop(tmp);
  }

  public final static void main(String[] args) {
    Nat t = `plus(suc(zero()),suc(zero()));
    System.out.println(t + " -> " + loop(t));
    t = `plus(suc(zero()),plus(suc(zero()),suc(zero())));
    System.out.println(t + " -> " + loop(t));
  }

And we get the expected results:

plus(suc(zero),suc(zero)) -> suc(suc(zero))
plus(suc(zero),plus(suc(zero),suc(zero))) -> suc(suc(suc(zero)))

1.4 Programming in Tom

When using Gom, the generated data-structure is particular: data are maximally shared. This technique, also known as hash-consing ensures that two identical subterms are implemented by the same objects. Therefore, the memory footprint is minimal and the equality-check can be done is constant time: two terms are identical if the pointers are equal.

As a consequence, a term (implemented using Gom) is not mutable: once created, it cannot be modified. One can note that the generated API offers a setx1(Nat t) and a setx2(Nat t) method. These methods can be used to “modify” the first or the second child or a term rooted by a plus. Note that the term on which you call these methods is never modified: a copy where the first or the second child is modified is returned instead.

By combining Tom, Java, and Gom, it is quite easy to quickly develop applications which manipulates trees. As an example, let us consider a tiny language composed of boolean expressions (true, false, and, or, and not), expressions (constant, variable, plus, mult, and mod), and instructions (skip, print, ;, if, and while). The abstract syntax of this language can be described using Gom:

import pico1.term.types.*;
import java.util.*;

class Pico1 {
  %gom {
    module Term
    imports int String
    abstract syntax
    Bool = True()
         | False()
         | Not(b:Bool)
         | Or(b1:Bool, b2:Bool)
         | And(b1:Bool, b2:Bool)
         | Eq(e1:Expr, e2:Expr)

    Expr = Var(name:String)
         | Cst(val:int)
         | Plus(e1:Expr, e2:Expr)
         | Mult(e1:Expr, e2:Expr)
         | Mod(e1:Expr, e2:Expr)

    Inst = Skip()
         | Assign(name:String, e:Expr)
         | Seq(i1:Inst, i2:Inst)
         | If(cond:Bool, i1:Inst, i2:Inst)
         | While(cond:Bool, i:Inst)
         | Print(e:Expr)
  }
  ...
}

Assume that we want to write an interpreter for this language, we need a notion of environment to store the value assigned to a variable. A simple solution is to use a Map which associate an expression (of sort Expr) to a name of variable (of sort String). Given this environment, the evaluation of an expression can be implemented as follows:

  public static Expr evalExpr(Map env,Expr expr) {
    %match(expr) {
      Var(n)                -> { return (Expr)env.get(`n); }
      Plus(Cst(v1),Cst(v2)) -> { return `Cst(v1 + v2); }
      Mult(Cst(v1),Cst(v2)) -> { return `Cst(v1 * v2); }
      Mod(Cst(v1),Cst(v2))  -> { return `Cst(v1 % v2); }
      // congruence rules
      Plus(e1,e2) -> {
        return `evalExpr(env,Plus(evalExpr(env,e1),evalExpr(env,e2)));
      }
      Mult(e1,e2) -> {
        return `evalExpr(env,Mult(evalExpr(env,e1),evalExpr(env,e2)));
      }
      Mod(e1,e2)  -> {
        return `evalExpr(env,Mod(evalExpr(env,e1),evalExpr(env,e2)));
      }

      x -> { return `x; }
    }
    throw new RuntimeException("strange term: " + expr);
  }

Similarly, the evaluation of boolean expressions can be implemented as follows:

  public static Bool evalBool(Map env,Bool bool) {
    %match(bool) {
      Not(True())     -> { return `False(); }
      Not(False())    -> { return `True(); }
      Not(b)          -> { return `evalBool(env,Not(evalBool(env,b))); }

      Or(True(),b2)   -> { return `True(); }
      Or(b1,True())   -> { return `True(); }
      Or(False(),b2)  -> { return `b2; }
      Or(b1,False())  -> { return `b1; }
      And(True(),b2)  -> { return `b2; }
      And(b1,True())  -> { return `b1; }
      And(False(),b2) -> { return `False(); }
      And(b1,False()) -> { return `False(); }

      Eq(e1,e2) -> {
        Expr x=`evalExpr(env,e1);
        Expr y=`evalExpr(env,e2);
        return (x==y)?`True():`False();
      }

      x -> { return `x; }
    }
    throw new RuntimeException("strange term: " + bool);
  }

Once defined the methods evalExpr and evalBool, it becomes easy to define the interpreter:

  public static void eval(Map env, Inst inst) {
    %match(inst) {
      Skip() -> {
        return;
      }

      Assign(name,e) -> {
        env.put(`name,evalExpr(env,`e));
        return;
      }

      Seq(i1,i2) -> {
        eval(env,`i1);
        eval(env,`i2);
        return;
      }

      Print(e) -> {
        System.out.println(evalExpr(env,`e));
        return;
      }

      If(b,i1,i2) -> {
        if(evalBool(env,`b)==`True()) {
          eval(env,`i1);
        } else {
          eval(env,`i2);
        }
        return;
      }

      w@While(b,i) -> {
        Bool cond = evalBool(env,`b);
        if(cond==`True()) {
          eval(env,`i);
          eval(env,`w);
        }
        return;
      }
    }
    throw new RuntimeException("strange term: " + inst);
  }

Note: the last two rules use the conditional test if from Java to implement a conditional rule. Note also that terms are compared using the == operator, thanks to maximal sharing provided by Gom.

To play with the Pico language, we just have to initialize the environment (env), create programs (p1 and p3), and evaluate them (eval):

  public final static void main(String[] args) {
    Map env = new HashMap();

    Inst p1 = `Seq(Assign("a",Cst(1)) , Print(Var("a")));
    System.out.println("p1: " + p1);
    eval(env,p1);

    Inst p3 = `Seq(Assign("i",Cst(0)),
                   While(Not(Eq(Var("i"),Cst(10))),
                         Seq(Print(Var("i")),
                             Assign("i",Plus(Var("i"),Cst(1))))));
    System.out.println("p3: " + p3);
    eval(env,p3);
  }

Exercise: write a Pico program that computes prime numbers up to 100.