Documentation:Tom

From Tom

Jump to: navigation, search
Doc : Language Reference

Tom  > Gom  > Strategies  > Runtime Library  > Migration guide  > EMF

Contents

Notations

The syntax of the language is given in BNF-like notation. Terminal symbols are set in typewriter font ('like this'). Non-terminal symbols are set in italic font (like that). Square brackets [...] denote optional components. Parentheses with a trailing star sign (...)* denotes zero, one or several repetitions of the enclosed components. Parentheses with a trailing plus sign (...)+ denote one or several repetitions of the enclosed components. Parentheses (...) denote grouping.

Lexical conventions

Identifier ::= Letter ( LetterDigit'_''-' )*
Integer ::= (Digit )+
Double ::= (Digit)+ ['.'] (Digit)* ∣ '.' (Digit)+
String ::= '"' (Letter ∣ ('\' ('n''t''b''r''f''\''’''"') ) )* '"'
Letter ::= 'A' ... 'Z''a' ... 'z'
Digit ::= '0' ... '9'
Char ::= '’' (LetterDigit) '’'

Names

SubjectName ::= Identifier
Type ::= Identifier
SlotName ::= Identifier
HeadSymbol ::= Identifier
Integer
Double
String
Char
VariableName ::= Identifier
AnnotedName ::= Identifier
LabelName ::= Identifier
FileName ::= Identifier
AttributeName ::= Identifier
XMLName ::= Identifier
Name ::= Identifier

Tom constructs

A Tom program is a host language program (namely C, Java, or Caml) extended by several new constructs such as %match, %strategy, %include, %gom, or backquote. Tom is a multi-language compiler, and therefore its syntax depends on the host language syntax. But for simplicity, we only present the syntax of its constructs and explain how they can be integrated into the host language.

Using Java as a host-language, the following Tom program is correct:

public class HelloWorld {
  %include { string.tom }
    public final static void main(String[] args) {
    String who = "World";
    %match(who) {
      "World" -> { System.out.println("Hello " + who); }
      _       -> { System.out.println("Don't panic"); }
    }
  }
}

Tom program

A Tom program is a list of blocks, where each block is either a Tom construct, or a sequence of characters (host language code). When compiling a Tom program, the Tom constructs are transformed into host language code, and the result is a valid host language program. For instance, in the previous example, %include and %match constructs are replaced by function definitions and Java instructions, making the resulting program a correct Java program.

Syntax of a Tom program:

Tom ::= BlockList
BlockList ::=
( MatchConstruct
StrategyConstruct
BackQuoteTerm
IncludeConstruct
GomConstruct
TransformationConstruct
TypeTerm
Operator
OperatorList
OperatorArray
'{' BlockList '}'
)*
  • MatchConstruct is translated into a list of instructions. This construct may appear anywhere a list of instructions is valid in the host language.
  • StrategyConstruct is translated into a class definition. Since it is translated into a class, this construct is valid only for Java. For a more detailed documentation concerning strategies, please refer to this page.
  • BackQuoteTerm is translated into a function call.
  • IncludeConstruct is replaced by the content of the file referenced by the construct. Tom looks for include files in: ./packageName/, $TOM_HOME/share/jtom/ and <path>, where <path> is specified by option: --import <path>. If the file contains some Tom constructs, they are expanded.
  • GomConstruct allows to define a Gom grammar. This construct is replaced by the content of the generated mapping. See Section Gom Construct and Chapter Documentation:Gom for more details.
  • TransformationConstruct allows to write easily models transformations. It is compiled as a complex strategy which is called as every classical ones. For the moment, only Java and EMF are supported. Note that this construct is new and only available as an alpha version.
  • TypeTerm, as well as Operator, OperatorList, and OperatorArray are replaced by some functions definitions.

Match construct

The %match construct (MatchConstruct) is one of the main contributions of Tom. This construct can be seen as an extension of the SwitchCase construct in C or Java, except that patterns are no longer restricted to constants (chars or integers). Given an object (the subject) and a list of patterns, our goal is to find the first pattern that matches the subjects (i.e. that has a compatible shape). As in the SwitchCase construct, in a MatchConstruct the type of each pattern is expected to be either the same or a subtype of that of the subject. More formally, a pattern is a term built over variables and constructors. The latter ones describe the shape of the pattern, whereas the variables are holes that can be instantiated to capture a value. When we consider the term f(a(),g(b())), it has to be viewed as a tree based data-structure with f as a root and a() the first child. Similarly, b() is the unique child of g, which is the second child of the root f . We say that the pattern f(x,y) matches this term (called subject), because we can give values to x and y such that the pattern and the subject become equal: we just have to assign a() to x and g(b()) to y. Finding this assignment is called matching and instantiating. This is exactly what Tom is supposed to do. A pattern may of course contain subterms. Therefore, f(x,g(b())) or f(a(),g(y)) are valid patterns which match against the subject.

Assuming that s is a Java variable, referencing a term (the tree based object f(a(),g(b())) for example), the following Tom construct is valid:

%match(s) {
  f(a(),g(y)) -> { /* action 1: code that uses y */ }
  f(x,g(b())) -> { /* action 2: code that uses x */ }
  f(x,y)      -> { /* action 3: code that uses x and y */ }
}

The %match construct is defined as follows:

MatchConstruct ::= '%match' '(' MatchArguments ')' '{' ( PatternAction )* '}'
'%match' '{' ( ConstraintAction )* '}'
MatchArguments ::= [Type] Term ( ',' [Type] Term )*
PatternAction ::= [LabelName':'] PatternList '->' '{' BlockList '}'
PatternList ::= Pattern( ',' Pattern )* [ ('&&''||') Constraint ]
ConstraintAction ::= Constraint '->' '{' BlockList '}'
Constraint ::= Pattern '<<' [Type] Term
Constraint '&&' Constraint
Constraint '||' Constraint
'(' Constraint ')'
Term Operator Term
Operator ::= '>''>=''<''<=''==''!='

A MatchConstruct is composed of two parts:

  • a list of subjects (the arguments of %match). They can be declared with their respective types, for instance Exp s where Exp corresponds to the exact type or to a subtype of s in case of a downcast.
  • a list of PatternAction: this is a list of pairs (pattern,action), where an action is a set of host language instructions which is executed each time a pattern matches the subjects.

Since version 2.6 of the language, an additional syntax, based on constraints, is proposed. In this case, the MatchConstruct can be simply seen as a list of pairs (constraint,action) corresponding to ConstraintAction from the above syntax. For instance, the example we presented before can now be written:

%match {
  f(a(),g(y)) << s -> { /* action 1 : code that uses y */ }
  f(x,g(b())) << s -> { /* action 2 : code that uses x */ }
  f(x,y) << s      -> { /* action 3 : code that uses x and y */ }
}

The two versions are semantically equivalent. The expression p << s denotes a match constraint between the pattern p and the subject s. The advantage of this new syntax is the modularity that it offers. For instance, multiple constraints can be combined with the boolean connectors '&&' and '||', leading to a greater expressivity, sometimes closed to the one offered by regular expressions.

The two syntaxes are compatible. For instance the following construct is valid:

%match(s) {
  f(a(),g(y)) && ( y << a() || y << b() ) -> { /* action 1 : code that uses y */ }
  f(x,g(b())) -> { /* action 2 : code that uses x */ }
}

Note: The right-hand sides of the constraints may contain variables that can be found in the left-hand side of other constraints. For instance we may write:

%match(s) {
  f(a(),g(y)) && ( g(z) << y || f(z,_) << y ) -> { /* action 1 : code that uses y and z */ }
  f(x,g(b())) && f(y,a()) << someHostFunction(x) -> { /* action 2 : code that uses x and y */ }
}

where someHostFunction can be an arbitrary function from the host language.

Note: Constraints built using an operator are trivially translated into host code. For instance, the constraint x == g(a()) ... is translated into the Java code

if (x == g(a())) {
  ...
}

For expository reasons, we consider that a %match construct is evaluated in the following way:

  • given a list of subjects (they correspond to objects only composed of constructors, therefore without variables, usually called ground terms), the execution control is transferred to the first PatternAction whose patterns match the list of ground terms (in the case constraints are used, the execution control is transferred to the first ConstraintAction whose Constraint is evaluated to true).

Note: since version 2.4, a subject is no longer restricted to a host language variable. A Tom term built upon variables, constructors and host-language functions can be used. Note also that the sort (the type of the subjects) is now optional — it is inferred from the type of the patterns.

  • given such a PatternAction (respectively ConstraintAction), its variables are instantiated and the associated semantic action is executed. The instantiated variables are bound in the underlying host-language, and thus can be used in the action part.
  • if the execution control is transferred outside the %match instruction (by a goto, break or return for example), the matching process is finished. Otherwise, the execution control is transferred to the next PatternAction whose patterns match the list of ground terms (respectively to the next ConstraintAction whose Constraint evaluates to true).
  • when there is no more PatternAction whose patterns match the list of subjects (respectively no more ConstraintAction whose Constraint evaluates to true), the %match instruction is finished, and the execution control is transferred to the next instruction.

The semantics of a match construct may remind the switch/case construct. However, there is a big difference. For instance, in Tom we can have patterns (list-patterns) that can match in several ways a subject. Informally, when considering the subject conc(a(),b(),c()), and the pattern conc(_*,x,_*), there are three possible match for this problem: either x=a(), either x=b(), either x=c(). Note that _* is a special hole which can capture any sublist of conc(…). The list-matching is also known as associative matching. Besides this, some other matching theories are supported in Tom.

When taking this new possibility into account, the evaluation of a %match construct gets a little bit more complex:

  • given a PatternAction whose patterns match the list of ground terms (or a ConstraintAction whose Constraint is evaluated to true), the list of variables is instantiated and the associated semantic action is executed.
  • if the execution control is not transferred outside the %match instruction, in addition to the previous explanations, if the considered matching theory may return several matches, for each match, the free variables are instantiated and the associated semantic action is executed. This means that the same action may be executed several times, but in a different context: i.e. the variables have different instantiations.
  • when all matches have been computed (there is at most one match in the syntactic theory, i.e. when no special theory, as associativity for instance, is associated to the symbols used in the patterns), the execution control is transferred to the next PatternAction whose patterns match the list of ground terms (respectively to the next ConstraintAction whose Constraint evaluates to true).
  • as before, when there is no more PatternAction whose patterns match the list of subject (or respectively no ConstraintAction whose Constraint evaluates to true), the %match instruction is finished, and the execution control is transferred to the next instruction.

As mentioned in the BNF-syntax, a label may be attached to a pattern. In that case, in C and Java, it becomes possible to exit from the current PatternAction (using a goto or a break), without exiting from the whole %match construct. The control is transferred to the next pattern which matches the subjects (respectively to the next constraint). This feature is useful to exit from a complex associative-matching for which, for any reason, we are no longer interested in getting other solutions.

Note: the behavior is not determined if inside an action the subjects of the %match instruction under evaluation are modified.

When using the new syntax based on constraints, there are several restrictions that are imposed by the compiler (an error is generated if they are not respected):

  • no circular references among variables are allowed. For instance, something like x << x will generate an error. This verification works even when the cycles are less evident, like for instance for the following constraint: f(g(x),a()) << y && f(b(),y) << z && g(f(z,c())) << x.
  • when using disjunctions, all the variables that are used in the action have to be found in each member of the disjunction (for ensuring that no non-instantiated variable is used in the action). For instance, using the following ConstraintAction generates an error: x << s1 || y << s2 -> { /* code that uses y */ }

Tom pattern

As we can imagine, the behavior of a %match construct strongly depends on the patterns which are involved. The formalism which defines the syntax of a pattern is also an essential component of Tom. But because there exist several ways to define patterns with equivalent behavior, its formal definition is not so simple. Although, the different shortcuts help the programmer to simplify the definitions of patterns.

Informally, a pattern is a term built with constructors and variables (please note that x denotes a variable, whereas a() is a constructor). A variable can also be anonymous, and it is denoted by _. Let’s look at some examples of patterns: x, a(), f(a()), g(a(),x), or h(a(),_,x). When a pattern matches a subject, it may be useful to keep a reference to a matched subterm. The annotation mechanism (z@g(y) for example) can be used for this purpose. Thus, considering the matching between the pattern f(x,z@g(y)) and the subject f(a(),g(h(b()))), y is instantiated by h(b()), and z is instantiated by g(h(b())). This can be useful in C, to free the memory for example.

When identical actions have to be performed for a set of patterns which share a common structure, the disjunction of symbols may be used: pattern (f∣g)(a()) is equivalent to the set {f(a()), g(a())}. The disjunction of symbols may also be used in subterms, like in h( (f∣g)(x) ).

More formally, a Tom pattern and a Tom term has the following syntax:

Term ::= VariableName['*']
Name '('[Term ( ',' Term )*]')'
Pattern ::= [ AnnotedName '@' ] PlainPattern
PlainPattern ::= ['!']VariableName [ '*' ]
['!'] HeadSymbolList (ExplicitTermListImplicitPairList)
ExplicitTermList
'_'
'_*'
XMLTerm
HeadSymbolList ::= HeadSymbol [ '?' ]
'(' HeadSymbol ( '∣' HeadSymbol )+ ')'
ExplicitTermList ::= '(' [ Pattern ( ',' Pattern )*] ')'
ImplicitPairList ::= '[' [ PairPattern ( ',' PairPattern )*] ']'
PairPattern ::= SlotName '=' Pattern

Concerning the Term syntax, both Tom and host-language variables can be used and Name can be either a function symbol declared in Tom or the name of a method from the host language.

Note: since version 2.4, negative patterns, called anti-patterns have been introduced. See section Anti-Pattern for a detailed description.



A pattern is a term which could contain variables. When matching a pattern against a subject (a ground term), these variables are instantiated by the matching procedure (generated by Tom). In Tom, the variables do not have to be declared: their type is inferred automatically, depending on the context in which they appear.

As described previously, Tom offers several mechanisms to simplify the definition of a pattern:

  • standard notation: a pattern can be defined using a classical prefix term notation. To make a distinction between variables and constants, the latter have to be written with explicit parentheses, like x(). In this case, the corresponding Tom operator (%op x()) should have been declared. When omitting parentheses, like x, this denotes a variable.
  • unnamed variables: the _ notation denotes an anonymous variable. It can be used everywhere a variable name can be used. It is useful when the instance of the variable does not need to be used. Similarly, the _* notation can be used to denote an anonymous list-variable. This last notation can improve the efficiency of list-matching because the instances of anonymous list-variables do not need to be built.
  • annotated variable: the @ operator allows to give a variable name to a subterm. In f(x@g(_)) for example, x is a variable that will be instantiated by the instance of the subterm g(_). The variable x can then be used as any other variable.
  • implicit notation: as explained below, the %op operator forces to give name to arguments. Assuming that the operator f has two arguments, named arg1 and arg2, then we can write the pattern f[arg1=a()] which is equivalent to f(a(),_). This notation is interesting mostly when using constructors with many subterms. Besides that, using this notation can avoid changing the patterns when the signature slightly changes (the order of the arguments, adding a new argument etc).
  • symbol disjunction notation: to factorize the definition of pattern which have common subterms, it is possible to describe a family of patterns using a disjunction of symbols. The pattern (f∣g)(a(),b()) corresponds to the disjunction f(a(),b()) or g(a(),b()). To be allowed in a disjunction (in standard notation), the constructors should have the same signature (arity, domain and codomain). Thus the disjunction of list operators is not allowed since these operators do not have fixed arities and can not be compared.

In practice, it is usually better to use the disjunction notation with the explicit notation Tom offers: ((f∣g)[arg1=a()]). In that case, the signatures of symbols do not have to be identical: only involved slots have to be common (same names and types). Thus, the pattern (f∣g)[arg1=a()] is correct, even if g has more slots than f: it only has to have the slot arg1, with the same sort.

Note: the disjunction of symbol can also be used in Xml notation: <(A∣B)>...</(A∣B)>.

The use of '?' after a list operator enables real associative with neutral elements (AU) matchings. By using conc? for example, we simply specify that the subject may not start with a conc symbol. For instance, the following code would produce the output matched, because x and respectively y can be instantiated with the neutral element of conc (please see the documentation on Gom for further details about specifying symbols’ type and their neutral elements).

L l = `a();
%match(l) {
  conc?(x*,y*) -> { System.out.println("matched"); }
}

Note that without the use of '?', the subject must start with a conc in order to have a match.

Note: since version 2.5 we can use, inside a list, a subpattern that also denotes a list; moreover, this subpattern can be annotated. For instance, patterns like conc(_*,p@conc(a(),_*,b()),_*) are now valid.

Tom anti-pattern

The notion of anti-pattern offers more expressive power by allowing complement constructs: a pattern can describe what should not be in the matched subject.

The notion of complement is introduced by the symbol '!', as illustrated by the following grammar fragment:

PlainPattern ::= ['!'] VariableName
['!'] HeadSymbolList ( ExplicitTermListImplicitPairList )
...

The semantics of anti-patterns can be best understood when regarding them as complements. For example, a pattern like car[color=blue()] will match all the blue cars. If we add an '!' symbol, the anti-pattern !car[color=blue()] will match everything that is not a blue car, i.e all objects that are not cars or the cars that have a color different from blue. The grammar allows also car[color=!blue()] - matches all cars that are not blue, or !car[color=!blue()] - either everything that is not a car, or a blue car.

Using the non-linearity combined with anti-patterns allows to express interesting searches also. For example, car[interiorColor=x,exteriorColor=x] will match the cars that have the same interior and exterior color. By using the anti-pattern car[interiorColor=x,exteriorColor=!x], the result is as one would expect: it will match all the cars with different interior - exterior colors.

It is also possible to use the anti-patterns in list constructions. Please refer to the tutorial for more examples.

Note: The use of annotations (@) is forbidden below a '!' symbol and is prevented by a compilation error. This is due to the fact that what is under this symbol generally will not match, therefore it cannot be used in the action part of the rules. You will also get an error if you try to put '!' before an '_' or '_*' because a construct like '!_' in a pattern will never match anything, therefore it is useless.

Backquote construct

Backquote construct (`) can be used to build an algebraic term or to retrieve the value of a Tom variable (a variable instantiated by pattern-matching).

BackQuoteTerm ::= ['`'] CompositeTerm

The syntax of CompositeTerm is not fixed since it depends on the underlying language.

However, CompositeTerm should be of the following form:

  • Name: to denote a {{tom}} variable
  • Name'*': to denote a {{tom}} list-variable
  • Name( ... ): to build a prefix term
  • ( ... ): to build an expression
  • xml( ... ): to build an Xml term

In general, it is sufficient to add a backquote before the term you want to build to have the desired behavior. The execution of `f(g(a())) will build the term f(g(a)), assuming that f, g, and a are Tom operators. Suppose now that g is no longer a constructor but a function of the host-language. The construction `f(g(a())) is still valid, but the semantics is the following: the constant a is built, then the function g is called, and the returned result is put under the constructor f. Therefore, the result of g must be a correct term, which belongs to the right type (i.e. the domain of f).

To simplify the interaction with the host-language, it is also possible to use “unknown symbols” like f(x.g()) or f(1+x). The scope of the backquote construct is determined by the scope of the most external enclosing braces, except in two case: `x and `x* which allow you to use variables instantiated by the pattern part. In that case the scope is limited to the length of the variable name, eventually extended by the '’*’'. Sometimes, when writing complex expression like if(`x==`y || `x==`z), it can be useful to introduce extra parenthesis (if( `(x==y || x==z) )) in order to extend the scope of the backquote.

Backquote construct can use default values if they have been defined in the mapping (get_default construct). Assuming that f, a and b are Tom operators of sort T, and f has two fields x and y of sort T, where a default value for x is set to a. Two notations are available to build a f by using default value: the explicit one (`f[y=b()]) and the implicit one (`f(_,b())). Both notations will return the term `f(a(),b()).

Note: parsing backquote expressions is quite difficult. In some cases, Tom does not manage to parse correctly the expression:

  • identifier immediatly followed by a closing brace: {`x} for instance. Solution: use extra parenthesis: {`(x)},
  • spaces in backquotes expressions are not preserved: `f(new Apple()) generates a code that looks like make_f(newApple()). Workaround: introduce an intermediate variable to store the object
  • confusion between star and multiplication: `x*y is interpreted as `x* followed by y (instead of x times y). Solution: introduce extra spaces `x * y.

Meta-quote construct

Tom provides the construct '%['']%' that allows to build formatted strings without the need to encode special characters such as tabulations and carriage returns as it is usually done in Java. For example, to build a string containing the HelloWorld program, one can simply write:

String hello = %[
  public class Hello {
    public static void main(String[] args) {
      System.out.println("Hello\n\tWorld !");
    }
  }
]%

Additionally, it is possible to insert in this string the content of a String variable, or the result of a function call (if it is a String) using the '@' as escape character: the code contained between '@' in this construct will be evaluated and the result inserted in the surrounding formatted string. Then, to add to the HelloWorld example a version string, we can use:

String version = "v12";
String hello2=%[
  public class Hello {
    public static void main(String[] args) {
      System.out.println("Hello\n\tWorld   @version@");
    }
  }
]%;

Even if the contents of the meta-quote construct is a formatted string, it is required that this string contains correctly balanced curly braces.

Note: the expression between two '@' can contain Tom constructs, like backquote constructs for example.

Note: use '@@' to insert the character '@'

Transformation construct

Transformation construct is a high level construct dedicated to EMF Ecore models transformations.

Here is the %transformation grammar:

TransformationConstruct ::= '%transformation' TransformationName '(' [TransformationArguments] ')' ' : ' '"'FileName'"' '->' '"'FileName'"' '{' (ElementaryTransformation)* '}'
TransformationName ::= Identifier
FileName ::= Identifier
TransformationArguments ::= SubjectName ':' AlgebraicType ( ',' SubjectName ':' AlgebraicType )*
| AlgebraicType SubjectName ( ',' AlgebraicType SubjectName )*
ElementaryTransformation ::= 'definition' ElementaryTransformationName 'traversal' Strategy '{' (ElementaryTransformationRule)* '}'
ElementaryTransformationName ::= Identifier
ElementaryTransformationRule ::= SimplePattern '->' '{' BlockList '}'

Work in progress

Here is the %tracelink grammar:

ReferenceTerm ::= '%tracelink' '(' VarName ':' TypeName ',' BackQuoteTerm ')'
TypeName ::= Identifier
VarName ::= Identifier

Work in progress

Here is the %resolve grammar:

ResolveConstruct ::= '%resolve' '(' VarName ':' TypeName ',' VarName ':' TypeName ')'

Work in progress

Tom signature constructs (*)

Sort and subsort definition constructs

Note: Subtyping and consequently subsort constructs are not yet available in Tom version 2.8. They are still under development. They can be used with the src version of Tom.

To define the mapping between the algebraic constructors and their concrete implementation, Tom provides a signature-mapping mechanism composed of several constructs. In addition to predefined mapping for usual builtin sorts (int, long, double, boolean, string, and char), all other algebraic sorts have to be declared using the %typeterm construct. Their name have to be different from the predefined builtin sorts names.

To use predefined sorts (in a '%match' construct or in the definition of an algebraic operator), it is sufficient to use the '%include' construct (%include { int.tom } for example).

When defining a new type with the '%typeterm' construct, some information has to be provided:

  • the 'implement' construct describes how the new type is implemented. The host language part written between braces ('{' and '}') is never parsed. It is used by the compiler to declare some functions and variables.
  • the 'is_sort(t)' construct specifies how to check the sort of an object of this type (in Java this is usually done with 'instanceof'). It is only required when using this type in a '%match' construct.
  • the 'equals(t1,t2)' construct corresponds to a predicate (parameterized by two term variables). This predicate should return true if the terms are “equal”. The true value should correspond to the builtin true value of the considered host language. This last optional predicate is used to compare builtin values and to compile non-linear left-hand sides.

Given a Java class Person we can define an algebraic mapping for this class:

%typeterm TomPerson {
  implement { Person }
  is_sort(t) { t instanceof Person }
  equals(t1,t2) { t1.equals(t2) }
}

Here, we assume that the method equals implements a comparison function over instances of Person. Note that we used TomPerson to make a clear distinction between algebraic sorts (defined in Tom) and implementation sorts (defined in Java, via the use of classes). In practice, we usually use the same name to denote both the algebraic sort and the implementation sort.

In order to define algebraic subsorts representing subtype binary relations between the algebraic sorts we use the keyword 'extends'. Given a Java class Woman we can define an algebraic mapping TomWoman as a subsort of TomPerson:

%typeterm TomWoman extends TomPerson {
  implement { Woman }
  is_sort(t) { t instanceof Woman }
  equals(t1,t2) { t1.equals(t2) }
}

Note that multiple inheritance is not allowed in Tom since a given algebraic sort can not be defined twice. The supertypes of a sort are obtained by reflexive and transitive closure over the direct supertype relation.

The grammar is the following:

IncludeConstruct ::= '%include' '{' FileName '}'
GoalLanguageBlock ::= '{' BlockList '}'
TypeTerm ::= '%typeterm' Type ['extends' Type] '{'
KeywordImplement [KeywordIsSort] [KeywordEquals]
'}'
KeywordImplement ::= 'implement' GoalLanguageBlock
KeywordIsSort ::= 'is_sort' GoalLanguageSortCheck
KeywordEquals ::= 'equals' '(' Name ',' Name ')' GoalLanguageBlock
OptionString ::= '--fresh' | '--termgraph' | ... (see Gom tool for the list of available options)

Note: since version 2.6, the variables used in the host-code may be prefixed by a '$' sign. This allows the compiler to inline the definition, making the code often smaller and more efficient. For example:

%typeterm TomPerson {
  implement { Person }
  is_sort(t) { $t instanceof Person }
  equals(t1,t2) { $t1.equals($t2) }
}

Note: '$' should not be used if a backquote construct is used in a mapping

Constructor definition constructs

Once algebraic sorts are declared (using '%typeterm'), Tom provides a mechanism to define signatures for constructors of these sorts using '%op', '%oplist' or '%oparray' constructs. When defining a new symbol with the '%op' construct, the user should specify the name of the operator, its codomain, and its domain. The later one is defined by a list of pairs (slot-name, sort).

Let us consider again the class Person, and let us suppose that an instance of Person has two fields (name and age), we can define the following operator:

%op TomPerson person(name:String, age:int)

In this example, the algebraic operator person has two slots (name and age) respectively of sorts String and int, where String and int are pre-defined sorts.

In addition to the signature of an operator, several auxiliary functions have to be defined:

  • The 'is_fsym(t)' construct is used to check if a term 't' is rooted by the considered symbol. The true value should correspond to the builtin true value of the considered host language (true in Java or Caml, and something different from 0 in C for example).
  • The 'make(t1,...,tn)' construct is parameterized by several variables (i.e. that should correspond to the arity of the symbol). A call to this 'make' function should return a term rooted by the considered symbol, where each subterm correspond to the terms given in arguments to the function. When defining a constant (i.e. an operator without argument, make can be defined without braces: make { ... }).
  • The 'get_slot(slotName,t)' construct has to be defined for all slots of the signature. The implementation of these constructs should be such that the corresponding subterm is returned.
  • The 'get_default(slotName)' construct is optional but may be defined to assign a default value for a given slot.

Coming back to our example, checking if an object 't' is rooted by the symbol person can be done by checking that 't' is an instance of the class Person. Building a person can be done via the Java function new Person(...). Accessing to the slots name and age could be implemented by an access to the variables of the class Person. In practice, the following operator definition should work fine:

%op TomPerson person(name:String, age:int) {
  is_fsym(t)  { $t instanceof Person }
  make(t1,t2) { new Person($t1,$t2) }
  get_slot(name,t) { $t.name } // assuming that 'name' is public
  get_slot(age,t)  { $t.age  } // assuming that 'age' is public
  get_default(name) { "Doe" }
  get_default(age) { 42 }
}

When defining a new symbol with the '%oplist' construct, the user has to specify how the symbol is implemented. In addition, the user has to specify how a list can be built and accessed:

  • the 'make_empty()' construct should return an empty list.
  • the 'make_insert(e,l)' construct corresponds to a function parameterized by a list variable and a term variable. This function should return a new list l’ where the element e has been inserted at the head of the list l (i.e. 'equals(get_head(l’),e)' and 'equals(get_tail(l’),l)' should be true).
  • the 'get_head(l)' function is parameterized by a list variable and should return the first element of the considered list.
  • the 'get_tail(l)' function is parameterized by a list variable and should return the tail of the considered list.
  • the 'is_empty(l)' constructs corresponds to a predicate parameterized by a list variable. This predicate should return true if the considered list contains no element.

Similarly, when defining a new symbol with the '%oparray' construct, the user has to specify how the symbol is implemented, how an array can be built, and accessed:

  • the 'make_empty(n)' construct should return a list such that n successive 'make_append(e,l)' can be made.
  • the 'make_append(e,l)' construct corresponds to a function parameterized by a list variable and a term variable.

Warning: This function should return a list l’ such that the element e is at the end of l.

  • the 'get_element(l,n)' construct is parameterized by a list variable and an integer. This should correspond to a function that return the n-th element of the considered list l.
  • the 'get_size(l)' constructs corresponds to a function that returns the size of the considered list (i.e. the number of elements of the list). The size of an empty list is 0.

The '%oplist' or '%oparray' is complex but not difficult to use. Let us consider the ArrayList Java class, and let us define a Tom mapping over this data-structure. The first thing to do consists in defining the sort for the elements and the sort for the list-structure:

%typeterm Object {
  implement     { Object          }
  equals(l1,l2) { $l1.equals($l2) }
}
%typeterm TomList {
  implement     { ArrayList       }
  equals(l1,l2) { $l1.equals($l2) }
}

Once defined the sorts, it becomes possible to define the list-operator TomList conc( Object* ). This operator has a variadic arity: it takes several Object and returns a TomList.

%oparray TomList conc( Object* ) {
  is_fsym(t)       { $t instanceof ArrayList }
  make_empty(n)    { new ArrayList($n)       }
  make_append(e,l) { myAdd($e,(ArrayList)$l) }
  get_element(l,n) { (Object)$l.get($n)      }
  get_size(l)      { $l.size()               }
}
private static ArrayList myAdd(Object e,ArrayList l) {
  l.add(e);
  return l;
}

An auxiliary function myAdd is used since the make_append construct should return a new list. The get_element should return an element whose sort belongs to the domain (Object) in this example. Although not needed in this example, in general, a cast ((Object)$l.get($n)) is needed.

The grammar for the mapping constructs is the following:

Operator ::= '%op' Type Name'(' [ SlotName ':' Type ( ',' SlotName ':' Type )* ] ')'
'{' ( KeywordIsFsym | KeywordMakeKeywordGetSlotKeywordGetDefault )* '}'
OperatorList ::= '%oplist' Type Name '(' Type '*' ')'
'{' KeywordIsFsym ( KeywordMakeEmptyList
KeywordMakeInsertKeywordGetHead
KeywordGetTailKeywordIsEmpty )* '}'
OperatorArray ::= '%oparray' Type Name '(' Type '*' ')'
'{' KeywordIsFsym ( KeywordMakeEmptyArray
KeywordMakeAppendKeywordElement
KeywordGetSize )* '}'
KeywordIsFsym ::= 'is_fsym' '(' Name ')' GoalLanguageBlock
KeywordGetSlot ::= 'get_slot' '(' Name ',' Name ')' GoalLanguageBlock
KeywordGetDefault ::= 'get_default' '(' Name ')' GoalLanguageBlock
KeywordMake ::= 'make' [ '(' Name ( ',' Name )* ')' ] GoalLanguageBlock
KeywordGetHead ::= 'get_head' '(' Name ')' GoalLanguageBlock
KeywordGetTail ::= 'get_tail' '(' Name ')' GoalLanguageBlock
KeywordIsEmpty ::= 'is_empty' '(' Name ')' GoalLanguageBlock
KeywordMakeEmptyList ::= 'make_empty' [ '(' ')' ] GoalLanguageBlock
KeywordMakeInsert ::= 'make_insert' '(' Name ',' Name ')' GoalLanguageBlock
KeywordGetElement ::= 'get_element' '(' Name ',' Name ')' GoalLanguageBlock
KeywordGetSize ::= 'get_size' '(' Name ')' GoalLanguageBlock
KeywordMakeEmptyArray ::= 'make_empty' '(' Name ')' GoalLanguageBlock
KeywordMakeAppend ::= 'make_append' '(' Name ',' Name ')' GoalLanguageBlock

Note: since version 2.5, is_fsym and make functions are optional (therefore the %op construct could be empty). When not declared, is_fsym default value is false. When nothing is declared, there is a make default value: a call to the function which has the same name as the considered operator. This shortcut is useful to provide type information about a Java function for example.

Predefined sorts and operators

See Section Predefined mappings in the Runtime Library Chapter.

Gom construct

The grammar for the Gom construct is as follows:

GomConstruct ::= '%gom' ['(' OptionString ')'] '{' GomGrammar '}'

It allows to define a Gom signature (for more details about Gom see Chapter Gom). The Gom compiler is called on the GomGrammar, and the construct is replaced by the produced Tom mapping. The OptionString is composed by a list of command line options to pass to the underlying Gom compiler, as described in Section Using Gom.

XML pattern

To deal with XML documents, the XML notation can be used (<A><code><B attribute="name"/></A></code> for example).

When manipulating XML documents, we distinguish two main kinds of operations: retrieving information and transforming a document. Tom provides three different XML notations that ought to simplify the definition of patterns: the “standard ” and the “implicit” XML notations are used to define compact (but incomplete) patterns. This notation is well suited to retrieve information. The “explicit” XML notation is used to precisely describe an XML pattern and all the variables that have to be instantiated. This notation is particularly well suited to perform XML transformation since it allows the programmer to precisely describe how variables have to be instantiated.

To make the XML notation understandable, we have to explain how XML documents are handled by Tom. To each XML document corresponds a DOM (Document Object Model) representation. In Tom, we have defined a mapping from DOM sorts to abstract algebraic sorts: TNode and TNodeList, which correspond respectively to Node and NodeList, defined by the Java DOM implementation.

Thus, a Node object becomes a ternary operator Element whose first subterm is the name of the XML node, the second subterm is a list of attributes and the third subterm is a list of subterms (which correspond to XML sub-elements). The second and the third elements are terms of sort TNodeList (because they are implemented by NodeList objects in DOM).

Thus, when considering the <A></A> XML document, the corresponding algebraic term is Element("A",[],[]), where [] denotes the empty list. Similarly, <A><code><B attribute="name"/></A></code> is encoded into Element("A",[],[Element("B",[Attribute("attribute","name")],[])]).

When defining an XML pattern, the user has to introduce extra list-variables to precisely describe the XML pattern and capture the different contexts. Suppose that we are interested in finding a node <B></B> which is a subterm of a node <A></A> (but not necessary the first subterm). The algebraic pattern should be Element("A",[_*],[_*,Element("B",[_*],[_*]),_*]). Using the XML notation, this pattern should be expressed as follows: <A(_*)>(_*,<code><B(_*)>(_*)</B>,_*)</A></code>. This notation (called explicit) is precise but error prone. This is why we have introduced the explicit notation, where all context variable can be removed (and () are replaced by []): <A[]><code><B[]>[]</B></A></code>. The last notation (called standard XML notation) allows the user to remove the [] and replace the list-separator (,) by spaces. The previous pattern can be written: <A><code><B></B></A></code>.

These three different notations allow the user to choose the level of control he wants to have on the XML pattern matching algorithm.

The formal description of the syntax is the following:

XMLTerm ::= '<' XMLNameList XMLAttrList '/>'
'<' XMLNameList XMLAttrList '>' XMLChilds '</' XMLNameList '>'
'#TEXT' '(' IdentifierString ')'
'#COMMENT' '(' IdentifierString ')'
'#PROCESSING-INSTRUCTION'
'(' (IdentifierString) ',' (IdentifierString) ')'
XMLNameList ::= XMLName
'(' XMLName ( '∣' XMLName )* ')'
XMLAttrList ::= '[' [ XMLAttribute (',' XMLAttribute)* ] ']'
'(' [ XMLAttribute (',' XMLAttribute)* ] ')'
( XMLAttribute )*
XMLAttribute ::= '_*'
VariableName '*'
AttributeName '=' [AnnotedName '@'] ( IdentifierString )
[AnnotedName '@'] '_' '=' [AnnotedName '@'] ( IdentifierString )
XMLChilds ::= ( Term )*
'[' Term ( ',' Term )* ']'
Language Reference

Tom  > Gom  > Strategies  > Runtime Library  > Migration guide  > EMF

Tom Documentation
Guided Tour :: Tutorial :: Language Reference :: Tools
Personal tools
Create a book