5.1 Notations
The syntax of the language is given in BNF-like notation. Terminal
symbols are set in typewriter font (`like this'). Non-terminal
symbols are set in italic font (like that). Square brackets
[...] denote optional components. Parentheses with a trailing
star sign (...)* denotes zero, one or several repetitions of the
enclosed components. Parentheses with a trailing
plus sign (...)+ denote one or several repetitions of the enclosed
components. Parentheses (...) denote grouping.
5.1.1 Lexical conventions
Identifier |
::= |
Letter ( Letter ∣ Digit
∣ `_' ∣ `-' )* |
Integer |
::= |
(Digit )+ |
Double |
::= |
(Digit)+ [`.'] (Digit)* ∣ `.' (Digit)+ |
String |
::= |
`"'
(Letter ∣
(`\'
(`n' ∣ `t' ∣ `b' ∣ `r' ∣
`f' ∣ `\' ∣ `'' ∣ `"') ) )* `"' |
Letter |
::= |
`A' ... `Z' ∣ `a' ... `z' |
Digit |
::= |
`0' ... `9' |
Char |
::= |
`'' (Letter ∣Digit) `'' |
5.1.2 Names
SubjectName |
::= |
Identifier |
Type |
::= |
Identifier |
SlotName |
::= |
Identifier |
HeadSymbol |
::= |
Identifier |
|
∣ |
Integer |
|
∣ |
Double |
|
∣ |
String |
|
∣ |
Char |
VariableName |
::= |
Identifier |
AnnotedName |
::= |
Identifier |
LabelName |
::= |
Identifier |
FileName |
::= |
Identifier |
AttributeName |
::= |
Identifier |
XMLName |
::= |
Identifier |
Name |
::= |
Identifier |
5.2 Tom constructs
A Tom program is a host language program (namely C, Java,
or Caml)
extended by several new constructs such as %match,
%strategy, %rule, %include, %gom, or backquote.
Tom is a multi-languages compiler, so, its syntax depends on
the host language syntax. But for simplicity, we only present
the syntax of its constructs and explain how they can be integrated
into the host language.
Using Java as host-language, the following Tom program is
correct:
public class HelloWorld {
%include { string.tom }
public final static void main(String[] args) {
String who = "World";
%match(who) {
"World" -> { System.out.println("Hello " + who); }
_ -> { System.out.println("Don't panic"); }
}
}
}
5.2.1 Tom program
Basically, a Tom program is list of blocks, where each block is
either a Tom construct, or a sequence of characters.
The idea is that after transformation, the sequence of characters
merged with the compiled Tom constructs should be a valid host
language program.
In the previous example, %include and %match constructs
are replaced by function definitions and Java instructions, making
the resulting program a correct Java program.
Syntax of a Tom program:
Tom |
::= |
BlockList |
BlockList |
::= |
|
( |
MatchConstruct |
|
∣ |
StrategyConstruct |
|
∣ |
RuleConstruct |
|
∣ |
BackQuoteTerm |
|
∣ |
IncludeConstruct |
|
∣ |
GomConstruct |
|
∣ |
TypeTerm |
|
∣ |
Operator |
|
∣ |
OperatorList |
|
∣ |
OperatorArray |
|
∣ |
`{' BlockList `}' |
|
)* |
-
MatchConstruct is translated into a list of
instructions. This construct may appear anywhere a list of
instructions is valid in the host language.
- StrategyConstruct is translated into a class
definition. Since it is translated into a class,
this construct is valid only for Java.
- RuleConstruct is translated into a function
definition. This construct may appear anywhere a function declaration
is valid in the host language.
- BackQuoteTerm is translated into a function call.
- IncludeConstruct is replaced by the content of the
file referenced by the construct. Tom looks for include files
in:
./packageName/
, $TOM_HOME/share/jtom/
and <path>
,
where <path>
is specified by option: --import <path>
.
If the file contains some Tom constructs, they are expanded.
- GomConstruct allows to define a Gom grammar. This construct is
replaced by the content of the generated mapping. See the
Chapter 6 for more details.
- TypeTerm, as well as Operator, OperatorList, and
OperatorArray are replaced by some functions definitions.
5.2.2 Match construct
The %match construct (MatchConstruct) is one of the main
contributions of Tom.
This construct can be seen as an extension of the SwitchCase
construct in C or Java, except that patterns are no longer restricted
to constants (chars or integers).
Given an object (the subject) and a list of patterns, our goal is to find
the first pattern that matches the subjects (i.e. that have a
compatible shape).
More formally, a pattern is a term built over variables and
constructors. The latter ones describe the shape of the
pattern, whereas the former ones are holes that can be
instantiated to capture a value.
When we consider the term f(a(),g(b())), this has to be viewed as a tree
based data-structure where a() is the first child of the root, labeled
by f. Similarly, b() is the unique child of g, which is the second
child of the root.
We say that the pattern f(x,y) matches this term (called subject),
because we can give values to x and y such that the pattern and
the subject become equal: we just have to assign a() to x and g(b())
to y. Finding this assignment is called matching and
instantiating. This is exactly what Tom is supposed to do.
A pattern may of course contains subterms. Therefore, f(x,g(b())) or
f(a(),g(y)) are valid patterns which match against the subject.
Assuming that s is a Java variable,
referencing a term (the tree based object f(a(),g(b())) for
example), the following Tom construct is valid:
%match(s) {
f(a(),g(y)) -> { /* code that uses y */ }
f(x,g(b())) -> { /* code that uses x */ }
f(x,y) -> { /* code that uses x and y */ }
}
A MatchConstruct is composed of two parts:
-
a list of subjects,
- a list of PatternAction:
this is a list of pairs (pattern,action), where an action is a set
of host language instructions which is executed
each time a pattern matches the subjects.
The construct is defined as follow:
MatchConstruct |
::= |
`%match' `('
MatchArguments `)' `{'
( PatternAction )*
`}' |
MatchArguments |
::= |
[Type] Term ( `,' [Type] Term )* |
PatternAction |
::= |
[LabelName`:'] PatternList ( `|' PatternList )*
`->' `{' BlockList `}' |
PatternList |
::= |
Pattern ( `,' Pattern )* |
For expository reasons, we consider that a %match construct is
evaluated in the following way:
The semantics of a match constructs may remind the switch/case
construct, however, a big difference exists.
In Tom, a theory like associativity may be attached to a
constructor. Thus, given a subject and a pattern, there may exist
several ways to match the pattern against the subject.
Informally, when considering the subject conc(a(),g(b()),c(),g(d())), and
the pattern conc(_*,g(x),_*), there are two possible match for
this problem: either x=b, either x=d(). Note that _* is a
special hole which can capture any sublist of conc(…).
When taking this new possibility into account, the evaluation of a
%match construct is a bit more complex:
-
given a PatternAction whose patterns match the list of
ground terms, the list of variables is instantiated and the
associated semantic action is executed.
- if the execution control is not transferred outside the
%match instruction, in addition to the previous
explanations, if the considered matching theory may return several
matches, for each match, the free variables are instantiated and the
associated semantic action is executed.
This means that a same action may be executed several times, but in
a different context: i.e. the variable have different instantiations.
- when all matches have been computed (there is at most one match
in the syntactic theory), the execution control is transferred to
the next PatternAction whose patterns match the list of
ground terms.
- as before, when there is no more PatternAction whose
patterns match the list of subject, the %match
instruction is finished, and the execution control is transferred to
the next instruction.
As mentioned in the BNF-syntax, a label may be attached to a
pattern. In that case, in C and Java, it becomes possible to
exit from the current PatternAction (using a goto or a
break), without exiting from the whole %match
construct. The control is transferred to the next pattern which matches
the subjects. This feature is useful to exit from a complex
associative-matching pattern which, for any reason, is no longer
interesting.
Note:
the behavior is not determined if a semantic action
modifies a host language variable which is an argument of a
%match instruction under evaluation.
Note:
the `'|'' symbol can be used to introduce
disjunction of pattern. This is a shortcut which avoid duplicating a
PatternAction.
However, this construct is deprecated. We recommend to use
the disjunction of symbols construct, presented in the next
section.
5.2.3 Tom pattern
As we can imagine, the behavior of a %match construct
strongly depends on the patterns which are involved.
The formalism which defines the syntax of a pattern is also an
essential component of Tom.
Unfortunately, its formal definition is complex, simply because there
exist several ways to define patterns which have equivalent behaviors.
On the other hand, the different shortcuts may help the programmer to
simplify the definitions of patterns.
Informally, a pattern is a term that can be either a variable or an
anonymous variable (x or _ for example).
A pattern can also be composed of constructors (a(), f(a()),
g(a(),x), or h(a(),_,x) for example).
When a pattern matches a subject, it may be useful to keep a reference
to a matched subterm. The annotation mechanism (z@g(y) for
example) can be used for this purpose. Thus, considering the pattern
f(x,z@g(y)) and the subject f(a(),g(h(b()))), y is instantiated
by h(b()), and z is instantiated by g(h(b())). This can be useful in
C, to free the memory for example.
When identical actions have to be performed for a set of patterns which
share a common structure, the disjunction of symbols may be
used: pattern (f∣ g)(a()) is equivalent to the set {f(a()),
g(a())}. The disjunction of symbols may also be used in subterm, like
in h( (f∣ g)(x) ).
More formally, a Tom term has the following syntax:
Term |
::= |
VariableName |
|
∣ |
HeadSymbol [ `(' Term ( `,' Term )*`)' ] |
Pattern |
::= |
[ AnnotedName `@' ] PlainPattern |
PlainPattern |
::= |
VariableName [ `*' ] |
|
∣ |
HeadSymbolList [ ExplicitTermList ∣ ImplicitPairList ] |
|
∣ |
ExplicitTermList |
|
∣ |
`_' |
|
∣ |
`_*' |
|
∣ |
XMLTerm |
HeadSymbolList |
::= |
HeadSymbol |
|
∣ |
`(' HeadSymbol ( `∣' HeadSymbol )+ `)' |
ExplicitTermList |
::= |
`(' Pattern ( `,' Pattern )* `)' |
ImplicitPairList |
::= |
`[' PairPattern ( `,' PairPattern )* `]' |
PairPattern |
::= |
`[' SlotName `=' Pattern `]' |
Note:
since version 2.4, negative patterns, called
anti-patterns have been introduced.
See section
5.2.4 for a detailed description.
A pattern is a term which could contain variables.
When matching a pattern against a subject (a ground term), these
variables are instantiated by the matching procedure (generated by
Tom).
In Tom, the variables do not have to be declared: their type is
inferred automatically, depending on the context in which they appear.
As described previously, Tom offers several mechanisms to simplify
the definition of a pattern:
5.2.4 Tom anti-pattern
Since version 2.4, the notion of anti-pattern has been introduced. This offer
more expressive power by allowing complement constructs: a pattern can now
describe what should not be in the matched subject.
Therefore, the grammar presented in section 5.2.3 is now extended with
a new symbol: `!'.
The new syntax is the following:
PlainPattern |
::= |
(`!')* VariableName |
|
∣ |
(`!')* HeadSymbolList [ ExplicitTermList ∣ ImplicitPairList ] |
|
∣ |
... |
The semantics of anti-patterns can be best understood when regarding them as complements. For example, a
pattern like car[color=blue()] will match all the blue cars. If we add an `!' symbol,
the anti-pattern !car[color=blue()] will match everything that is not a blue car, i.e
all objects that are not cars or the cars that have a color different from blue. The grammar allows
also car[color=!blue()] - matches all cars that are not blue, or !car[color=!blue()] - either everything that is not a car, or a blue car.
Using the non-linearity combined with anti-patterns allows to express interesting searches also. For
example, car[interiorColor=x,exteriorColor=x] will match the cars that have the same
interior and exterior color. By using the anti-pattern car[interiorColor=x,exteriorColor=!x],
the result is as one would expect: it will match all the cars with different interior - exterior
colors.
It is also possible to use the anti-patterns in list constructions. Please refer to the tutorial for more
examples.
Note:
The use of annotations (@) is forbidden bellow a `!' symbol and is prevented
by a compilation error. This is due to the fact that what is under this symbol generally will
not match, therefore it cannot be used in the action part of the rules. You will also get an
error if you try to put `!' before an `_' or `_*' because a construct like
`!_' in a pattern will never match anything, therefore it is useless.
5.2.5 Rule construct
In Tom, we can also define a set of rewrite rules. All the
left-hand sides should begin with the same root symbol:
RuleConstruct |
::= |
`%rule' `{' ( Rule )* `}' |
Rule |
::= |
RuleBody ( RuleCondition )* |
RuleBody |
::= |
Pattern `->' Term |
RuleCondition |
::= |
`where' Pattern `:=' Term |
|
∣ |
`if' Term `==' Term |
The %rule construct is composed of a list of conditional
rewrite rules (the left-hand side is a term and the right-hand side is
a term). All these rules (enclosed into a %rule { ... }
construct) should begin with the same root symbol. The
Tom compiler will generate a function (with a number of arguments
equals to the arity of this root symbol) whose
name corresponds to the name of this unique root symbol.
Given a ground term, applying this function returns the instantiated
and normalized right-hand side of the first rule (from top to bottom)
whose pattern matches the considered subject and whose conditions are
satisfied. When no rule can be applied (i.e. no pattern matches the
subject, or no condition is satisfied), the given ground term, rooted
by the root symbol of the rewrite system is returned.
In Tom, we consider two kinds of conditions:
-
an equality condition (t1 == t2) is a pair of ground terms that
belong to the same type.
The condition is satisfied when the normal forms of the two terms
(t1 and t2) are equal modulo the equals predicate defined in
the definition of the type associated to t1 and t2.
- a matching condition (p := t) is a pair of terms where p may
contain free variables. The condition is satisfied if the pattern p
can be matched against the normal form of t. In this case, the
free variables of p are instantiated and can be used in other
conditions or the right-hand side of the rule.
Note:
since the introduction of Gom, the %rule construct is a bit deprecated.
We recommend to use hooks since they ensure that the reduction will always be
performed, in particular during the application of a strategy.
5.2.6 Backquote construct
Another construct of Tom is the backquote (`).
This construct can be used to build an algebraic term or to retrieve
the value of a Tom variable (a variable instantiated by pattern-matching).
The syntax of this operator is not fixed since it depends on the underlying language.
However, a backquote term should be of the following form:
-
`Name: to denote a Tom variable
- `Name`*': to denote a Tom list-variable
- `Name( ... ): to build a prefix term
- `( ... ): to build an expression
- `xml( ... ): to build an Xml term
In general, it is sufficient to add a backquote before the term you
want to build to have the wanted behavior.
The execution of `f(g(a())) will build the term f(g(a)),
assuming that f, g, and a are Tom operators.
Suppose now that g is no longer a constructor but a function of the
host-language. The construction `f(g(a())) is still valid, but
the semantic is the following: the constant a is built, then the
function g is called, and the returned is put under the
constructor f. Therefore, the result of g must be a correct term,
which belongs to the right type (i.e. the domain of f).
To simplify the interaction with the host-language, it is also
possible to use “unknown symbols” like f(x.g() or
f(1+x).
The scope of the backquote construct is determined by the scope of the
most external enclosing braces, except in two case: `x and
`x* which allows you to use variables instantiated by the
pattern part. In that case the scope is limited to the length of the
variable name, eventually extended by the `'*''.
Sometime, when writing complex expression like if(`x==`y ||
`x==`z), it can be useful to introduce extra braces (if(
`(x==y || x==z) )) to extend the scope of the backquote.
5.2.7 Meta-quote construct
Tom provide a construct `%[' … `]%' allowing to build
formatted strings without the need to encode special characters such as
tabulations and carriage return as usually done in Java. For example, to
build a string containing the HelloWorld program, one can simply use:
String hello = %[
public class Hello {
public static void main(String[] args) {
System.out.println("Hello\n\tWorld !");
}
}
]%
Additionally, it is possible to insert in this string the content of a
String variable, or the result of a function call (if it is a
String) using the `@' as escape character: the code contained
between `@' in this construct will be evaluated and the result inserted in
the surrounding formatted string. Then, to add to the HelloWorld example
a version string, we can use:
String version = "v12";
String hello2=%[
public class Hello {
public static void main(String[] args) {
System.out.println("Hello\n\tWorld @version@");
}
}
]%;
Even if the contents of the meta-quote construct is a formatted string,
it is required that this string contains correctly balanced braces.
Note:
the expression between two `@' can contain Tom constructs,
like backquote expressions for example.
5.3 Tom signature constructs
5.3.1 Sort definition constructs
To define the mapping between the algebraic constructors and their
concrete implementation, Tom provides a signature-mapping mechanism
composed of several constructs.
In addition to predefined mapping for usual builtin sorts (int,
long, double, boolean, string, and
char), all other algebraic sorts have to be declared using the
%typeterm construct.
To use predefined sorts (in a %match construct or in the
definition of a new operator), it is sufficient to use the
%include construct (%include { int.tom }
for example).
When defining a new type with the %typeterm construct,
at least two extra informations have to be provided:
-
the implement construct describes how the new type is
implemented. The host language part written between braces
('{' and '}') is never parsed. It is used by
the compiler to declare some functions and variables.
- the visitor_fwd represents a Java class.
In order to use strategies, sorts must define a Java class. This class must extend
VisitorFwd and must implement the interface VisitableVisitor.
The goal of the VisitorFwd is to provide a base class for building
strategies. Thus, it has to provide typed visitors for each defined sort, and
implement its visit method to call the typed visit_sort
functions depending on the dynamic type of the subject.
More information about visitor_fwd can be found in 7.7.
- the equals(t1,t2) construct corresponds to a
predicate (parametrized by two term variables).
This predicate should return true if the terms are
“equal”. The true value should correspond to the
builtin true value of the considered host language.
This last optional predicate is used to compare builtin values and
to compile non-linear left-hand sides.
Given a Java class Person we can define an algebraic mapping for this class:
%typeterm TomPerson {
implement { Person }
equals(t1,t2) { t1.equals(t2) }
}
Here, we assume that the method equals implement a comparison
function over instances of Person.
Note also that we have used TomPerson to make a clear
distinction between algebraic sorts (defined in Tom) and
implementation sorts (defined in Java, via the use of classes). In
practice, we usually use the same name to denote both the algebraic
sort and the implementation sort.
The grammar is the following:
IncludeConstruct |
::= |
`%include' `{' FileName `}' |
GomConstruct |
::= |
`%gom' `{' GomGrammar `}' |
GoalLanguageBlock |
::= |
`{' BlockList `}' |
TypeTerm |
::= |
`%typeterm' Type `{' |
|
|
KeywordImplement [KeywordEquals] [KeywordVisitorFwd] |
|
|
`}' |
KeywordImplement |
::= |
`implement' GoalLanguageBlock |
KeywordEquals |
::= |
`equals' `(' Name `,' Name `)' GoalLanguageBlock |
KeywordVisitorFwd |
::= |
`visitor_fwd' GoalLanguageVisitorFwd |
Note:
in order to use strategy constructs (
%strategy), an extra information
have to be defined using the
visitor_fwd construct.
This sort information should correspond to a
Java class.
In order to use strategies, sorts must define a
Java class. This class must extend
VisitorFwd and must implement the interface
VisitableVisitor.
The goal of the
VisitorFwd is to provide a base class for building
strategies. Thus, it has to provide typed visitors for each defined sort, and
implement its
visit method to call the typed
visit_sort
functions depending on the dynamic type of the subject.
See Section
7.7 for a detailed example of strategy support for
arbitrary
Java classes.
5.3.2 Constructor definition constructs
Once algebraic sorts are declared, Tom provides a mechanism to define
sorted signatures for constructors using %op, %oplist or
%oparray constructs.
When defining a new symbol with the %op construct, the user
should specify the name of the operator, its codomain, and its
domain. The later one is defined by a list of pairs (slot-name, sort).
Let us consider again the class Person, and let us suppose that an
instance of Person has two fields (name and
age), we can define the following operator:
%op TomPerson person(name:String, age:int)
In this example, the algebraic operator person has two slots
(name and age) respectively of sorts String and
int, where String and int are pre-defined sorts.
In addition to the signature of an operator, several auxiliary
functions have to be defined:
-
The is_fsym(t) { predicate(t) } construct is
used to check if a term t is rooted by the considered symbol.
The true value should correspond to the builtin true
value of the considered host language (true in Java or
Caml, and something different from 0 in C for
example).
-
The make(t1,...,tn) construct is parametrized by several variables
(i.e. that should correspond to the arity of the symbol).
A call to this make function should return a term rooted by
the considered symbol, where each subterm correspond to the terms
given in arguments to the function. When defining a constant
(i.e. an operator without argument, make can be defined
without brace: make { ... }).
-
The get_slot(slotName,t) construct has to be defined for all
slots of the signature. The implementation of these constructs
should be such that the corresponding subterm is returned.
Coming back to our example, checking if an object t is rooted by the
symbol person can be done by checking that t is an instance
of the class Person.
Building a person can be done via the Java function new
Person(...).
Accessing to the slots name and age could be implemented
by an access to the variables of the class Person.
In practice, the following operator definition should work fine:
%op TomPerson person(name:String, age:int) {
is_fsym(t) { t instanceof Person }
make(t1,t2) { new Person(t1,t2) }
get_slot(name,t) { t.name } // assuming that 'name' is public
get_slot(age,t) { t.age } // assuming that 'age' is public
}
When defining a new symbol with the %oplist construct,
the user has to specify how the symbol is implemented.
In addition, the user has to specify how a list can be built and accessed:
-
the make_empty() construct should return an empty
list.
- the make_insert(e,l) construct corresponds to a function
parametrized by a list variable and a term variable. This function
should return a new list l' where the element e
has been inserted at the head of the list l
(i.e. equals(get_head(l'),e) and
equals(get_tail(l'),l) should be true).
- the get_head(l) function is parametrized by a list
variable and should return the first element of the considered list.
- the get_tail(l) function is parametrized by a list
variable and should return the tail of the considered list.
- the is_empty(l) constructs corresponds to a
predicate parametrized by a list variable.
This predicate should return true if the considered list
contains no element.
Similarly, when defining a new symbol with the %oparray construct,
the user has to specify how the symbol is implemented, how an array
can be built, and accessed:
-
the make_empty(n) construct should return a list of
size n.
- the make_append(e,l) construct corresponds to a
function parametrized by a list variable and a term variable.
Warning:
This function should return a list l' such that
the element e is at the n-th position.
- the get_element(l,n) construct is parametrized by a list
variable and an integer. This should correspond to a function that
return the n-th element of the considered list l.
- the get_size(l) constructs corresponds to a function
that returns the size of the considered list.
By convention, an empty list contains 0 element.
The %oplist or %oparray is complex but not difficult
to use. Let us consider the ArrayList Java class, and let us
define a Tom mapping over this data-structure. The first thing to do
consists in defining the sort for the elements and the sort for the list-structure:
%typeterm Object {
implement { Object }
equals(l1,l2) { l1.equals(l2) }
}
%typeterm TomList {
implement { ArrayList }
equals(l1,l2) { l1.equals(l2) }
}
Once defined the sorts, it becomes possible to define the
list-operator TomList conc( Object* ). This operator has a
variadic arity: it takes several Object and returns a
TomList.
%oparray TomList conc( Object* ) {
is_fsym(t) { t instanceof ArrayList }
make_empty(n) { new ArrayList(n) }
make_append(e,l) { myAdd(e,(ArrayList)l) }
get_element(l,n) { (Object)l.get(n) }
get_size(l) { l.size() }
}
private static ArrayList myAdd(Object e,ArrayList l) {
l.add(e);
return l;
}
An auxiliary function myAdd is used since the
make_append construct should return a new list.
The get_element should return an element whose
sort belongs to the domain (Object) in this example.
Although not needed in this example, in general, a cast
((Object)l.get(n)) is needed.
The grammar for the mapping constructs is the following:
Operator |
::= |
`%op' Type Name |
|
|
`(' [ SlotName `:' Type ( `,'
SlotName `:' Type )* ] `)' |
|
|
`{' KeywordIsFsym ( KeywordMake ∣
KeywordGetSlot )* `}' |
OperatorList |
::= |
`%oplist' Type Name `(' Type `*'
`)' |
|
|
`{' KeywordIsFsym ( KeywordMakeEmptyList |
|
|
∣ KeywordMakeInsert ∣ KeywordGetHead |
|
|
∣ KeywordGetTail ∣ KeywordIsEmpty )* `}' |
OperatorArray |
::= |
`%oparray' Type Name `(' Type `*'
`)' |
|
|
`{' KeywordIsFsym ( KeywordMakeEmptyArray ∣ |
|
|
KeywordMakeAppend ∣ KeywordElement |
|
|
∣ KeywordGetSize )* `}' |
KeywordIsFsym |
::= |
`is_fsym' `(' Name `)' GoalLanguageBlock |
KeywordGetSlot |
::= |
`get_slot' `(' Name `,' Name `)' GoalLanguageBlock |
KeywordMake |
::= |
`make' [ `(' Name ( `,' Name )* `)' ]
GoalLanguageBlock |
KeywordGetHead |
::= |
`get_head' `(' Name `)' GoalLanguageBlock |
KeywordGetTail |
::= |
`get_tail' `(' Name `)' GoalLanguageBlock |
KeywordIsEmpty |
::= |
`is_empty' `(' Name `)' GoalLanguageBlock |
KeywordMakeEmptyList |
::= |
`make_empty' [ `(' `)' ] GoalLanguageBlock |
KeywordMakeInsert |
::= |
`make_insert' `(' Name `,' Name `)' GoalLanguageBlock |
KeywordGetElement |
::= |
`get_element' `(' Name `,' Name `)' GoalLanguageBlock |
KeywordGetSize |
::= |
`get_size' `(' Name `)' GoalLanguageBlock |
KeywordMakeEmptyArray |
::= |
`make_empty' `(' Name `)' GoalLanguageBlock |
KeywordMakeAppend |
::= |
`make_append' `(' Name `,' Name `)' GoalLanguageBlock |
5.3.3 Predefined sorts and operators
See Section 8.1.
5.4 XML pattern
To deal with XML documents, the XML notation can be used
(<A><B attribute="name"/></A> for example).
When manipulating XML documents, we distinguish two main kinds of
operations: retrieving information and transforming a document.
Tom provides three different XML notations that ought to simplify
the definition of patterns:
the “standard ” and the “implicit” XML notations are used to define
compact (but incomplete) patterns. This notation is well suited to
retrieve information. The “explicit” XML notation is
used to precisely describe an XML pattern and all the variables that
have to be instantiated. This notation is particularly well suited to
perform XML transformation since it allows the programmer to precisely
describe how variables have to be instantiated.
To make the XML notation understandable, we have to explain how XML
documents are handled by Tom.
To each XML document corresponds a DOM (Document Object Model)
representation. In Tom, we have defined a mapping from DOM sorts to
abstract algebraic sorts: TNode and TNodeList, which
correspond respectively to Node and NodeList, defined by
the Java DOM implementation.
Thus, a Node object becomes a ternary operator Element whose
first subterm is the name of the XML node, the second subterm is a
list of attributes and the third subterm is a list of subterms (which
correspond to XML sub-elements). The second and the third elements are
terms of sort TNodeList (because they are implemented by
NodeList objects in DOM).
Thus, when considering the <A></A> XML document, the
corresponding algebraic term is Element("A",[],[]), where
[] denotes the empty list.
Similarly, <A><Battribute="name"></B></A> is encoded into
Element("A",[],[Element("B",[Attribute("attribute","name")],[])]).
When defining an XML pattern, the user has to introduce extra
list-variables to precisely describe the XML pattern and capture the
different contexts.
Suppose that we are interested in finding a node <B></B> which is a
subterm of a node <A></A> (but not necessary the first subterm).
The algebraic pattern should be
Element("A",[_*],[_*,Element("B",[_*],[_*]),_*]).
Using the XML notation, this pattern should be expressed as follows:
<A(_*)>(_*,<B(_*)>(_*)</B>,_*)</A>.
This notation (called explicit) is precise but error prone.
This is why we have introduced the explicit notation, where all
context variable can be removed (and () are replaced by
[]): <A[]><B[]>[]</B></A>.
The last notation (called standard XML notation) allows the user to
remove the [] and replace the list-separator (,) by
spaces.
The previous pattern can be written: <A><B></B></A>.
These three different notations allow the user to choose the level of
control he wants to have on the XML pattern matching algorithm.
The formal description of the syntax is the following:
XMLTerm |
::= |
`<' XMLNameList XMLAttrList `/>' |
|
∣ |
`<' XMLNameList XMLAttrList `>'
XMLChilds `</' XMLNameList `>' |
|
∣ |
`#TEXT' `(' Identifier ∣ String `)' |
|
∣ |
`#COMMENT' `(' Identifier ∣ String `)' |
|
∣ |
`#PROCESSING-INSTRUCTION' |
|
|
`('
(Identifier ∣ String) `,' (Identifier ∣ String) `)' |
XMLNameList |
::= |
XMLName |
|
∣ |
`(' XMLName ( `∣' XMLName )* `)' |
XMLAttrList |
::= |
`[' [ XMLAttribute (`,'
XMLAttribute)* ] `]' |
|
∣ |
`(' [ XMLAttribute (`,' XMLAttribute)* ] `)' |
|
∣ |
( XMLAttribute )* |
XMLAttribute |
::= |
`_*' |
|
∣ |
VariableName `*' |
|
∣ |
AttributeName `=' [AnnotedName `@'] ( Identifier ∣ String ) |
|
∣ |
[AnnotedName `@'] `_' `=' [AnnotedName `@'] ( Identifier ∣ String ) |
XMLChilds |
::= |
( Term )* |
|
∣ |
`[' Term ( `,' Term )* `]' |