Even if this section mostly deals with Xml features, every explained
technique can be used with other data-structures as well.
Note:
this section is not very up-to-date. In particular, the explanations about
genericTraversal functions should be replaced by some strategies.
4.1 Manipulating Xml documents
This example is inspired from a practical study. It involves a
simple modeling of the Platform for Privacy Preferences Project
(P3P), developed by the World Wide Web Consortium, which is emerging
as an industry standard providing a simple, automated way for users
to gain more control over the use of personal information on Web
sites they visit.
Given a client and a server, the problem
consists in verifying that the client's policy is compliant with the
server's policy.
Both policies preferences are expressed in APPEL (A P3P Preference
Exchange Language) and written in Xml.
The server's policy is the following file server.xml:
<POLICIES xmlns="http://www.w3.org/2002/01/P3Pv1">
<POLICY name="mypolicy" discuri="http://www.ibm.com/privacy"
opturi="http://www.ibm.com/privacy" xml:lang="en">
<STATEMENT>
<RECIPIENT> <delivery/> </RECIPIENT>
<PURPOSE> <contact/> </PURPOSE>
<DATA-GROUP>
<DATA ref="#dynamic.clickstream"/>
<DATA ref="#dynamic.http"/>
<DATA ref="#dynamic.clientevents"/>
<DATA ref="#user.home-info.postal.country"/>
<DATA ref="#dynamic.cookies"> <CATEGORIES> <content/> </CATEGORIES> </DATA>
</DATA-GROUP>
</STATEMENT>
</POLICY>
</POLICIES>
The client's policy is the following file client.xml:
<RULESET appel="http://www.w3.org/2002/04/APPELv1"
p3p="http://www.w3.org/2000/12/P3Pv1"
crtdby="W3C" crtdon="2001-02-19T16:21:21+01:00">
...
<RULE behavior="limited1" prompt="yes">
<POLICY>
<STATEMENT>
<PURPOSE connective="and-exact"> <current/> </PURPOSE>
<RECIPIENT connective="or">
<other-recipient/>
<public/>
<unrelated/>
</RECIPIENT>
</STATEMENT>
</POLICY>
</RULE>
...
<RULE behavior="request" prompt="yes">
<POLICY>
<STATEMENT>
<DATA-GROUP>
<DATA ref="#dynamic.clientevents"> </DATA>
<DATA ref="#dynamic.clickstream"/>
</DATA-GROUP>
</STATEMENT>
</POLICY>
</RULE>
...
</RULESET>
For expository reasons, we consider only a sub-problem in which we say
that a client's policy is compatible with the server's policy if
all ref attributes from a <DATA></DATA> node also appear on the
server side.
In the considered examples, the policies are compatible because
<DATA-GROUP>
<DATA ref="#dynamic.clientevents"> </DATA>
<DATA ref="#dynamic.clickstream"/>
</DATA-GROUP>
is included in:
<DATA-GROUP>
<DATA ref="#dynamic.clickstream"/>
<DATA ref="#dynamic.http"/>
<DATA ref="#dynamic.clientevents"/>
<DATA ref="#user.home-info.postal.country"/>
<DATA ref="#dynamic.cookies"> <CATEGORIES> <content/> </CATEGORIES> </DATA>
</DATA-GROUP>
The problem consists in implementing such a verification tool in Tom
and Java.
4.1.1 Loading Xml documents
The first part of the program declares imports, and defines the
main and the run methods.
import tom.library.xml.*;
import tom.library.adt.tnode.*;
import tom.library.adt.tnode.types.*;
import aterm.*;
import java.util.*;
public class Evaluator {
%include{ adt/tnode/TNode.tom }
private XmlTools xtools;
public static void main (String args[]) {
xtools = new XmlTools();
TNode server = (TNode)xtools.convertXMLToATerm("server.xml");
TNode client = (TNode)xtools.convertXMLToATerm("client.xml");
boolean compatible = compareDataGroup(getDataGroup(server.getDocElem()),
getDataGroup(client.getDocElem()));
System.out.println("result = " + compatible);
}
...
}
As explained in Part II (Tom language description), to each
Xml document
corresponds a DOM (Document Object Model) representation. In Tom, we
have defined a mapping from DOM sorts to abstract algebraic sorts:
TNode and TNodeList, which correspond respectively to
Node and NodeList, defined by the Java DOM implementation.
This mapping has to be initialized in the following way:
-
the import of tom.library.adt.tnode.* and
tom.library.adt.tnode.types.* defines the sorts TNode and
TNodeList and allows us to build objects over these two
sorts.
- the %include{ TNode.tom } Tom construct is similar
to the #include C preprocessor construct. In this case,
it imports the definition of Tom algebraic constructors needed
to manipulate Xml documents.
In complement to Tom, we provide a runtime library which contains
several methods to manipulate Xml documents. In particular, we
provide the XmlTools class (defined in the
tom.library.xml.* package).
The run method takes two filenames as arguments, reads the
associated files and convert their Xml content into TNode terms
(using the convertXMLToATerm function).
The getDataGroup function is used to retrieve a
<DATA></DATA> node in the considered Xml document. Note that
getDocElem is first applied to Xml documents in order to
consider subterms of the DocumentNode element. Then, the
compareDataGroup function is used to check that the client's
policy is compatible with the server's policy. The content of these
functions is detailed in the next sections.
4.1.2 Retrieving information
Given an Xml subtree, the problem consists in extracting a
<DATA-GROUP></DATA-GROUP> node.
For expository reasons, we first consider that such a <DATA-GROUP></DATA-GROUP>
node can only appear at two different specific places:
one for the client's definition and one for the server's definition.
Thus, in the Tom program we naturally consider two cases:
private static TNode getDataGroup(TNode doc) {
%match(doc) {
<POLICIES>
<POLICY>
<STATEMENT>
datagroup@<DATA-GROUP></DATA-GROUP>
</STATEMENT>
</POLICY>
</POLICIES> -> { return datagroup; }
<RULESET>
<RULE>
<POLICY>
<STATEMENT>
datagroup@<DATA-GROUP></DATA-GROUP>
</STATEMENT>
</POLICY>
</RULE>
</RULESET> -> { return datagroup; }
}
return `xml(<DATA-GROUP/>);
}
The first pattern means that a <DATA-GROUP></DATA-GROUP> node has to be found
under a <STATEMENT></STATEMENT> node, which should be under a
<POLICY></POLICY> node, which should be under a <POLICIES></POLICIES>
node.
Once such a pattern is found, the mechanism allows us to
give a name (datagroup) to this subterm and reuse it in the
action part: return datagroup;.
The Xml notation implicitly extends the given patterns by adding
context variables. Thus, the <DATA-GROUP></DATA-GROUP> pattern means that
a subterm those head symbol is <DATA-GROUP></DATA-GROUP> is searched. But this
subterm can contain attributes and children even if it is not
explicitly defined by the notation. To make the definition of
patterns more precise, the user can use the explicit notation and
define the following pattern: <DATA-GROUP (_*)>(_*)</DATA-GROUP>. A
more detailed explanation can be found in Part II (Tom
language description).
4.1.3 Comparing two Xml subtrees
Given two <DATA-GROUP></DATA-GROUP> nodes, the problem consists in checking
that all <DATA></DATA> nodes from the first tree also appear in the
second one.
This can be easily implemented by the following couple of functions:
private static boolean compareDataGroup(TNode server, TNode client) {
boolean res = true;
%match(client) {
<DATA-GROUP><DATA ref=reftext></DATA></DATA-GROUP>
-> { res = res && appearsIn(reftext,server); }
}
return res;
}
Given a <DATA-GROUP></DATA-GROUP> tree, we look for a <DATA></DATA>
subtree which contains an attribute named ref. When such an
attribute exists, its content (a string) is stored into the Java
reftext variable. The appearsIn function is then used to
check that the attribute also appears on the server side.
private static boolean appearsIn(String refclient, TNode server) {
%match(server) {
<DATA-GROUP><DATA ref=reftext></DATA></DATA-GROUP>
-> {
if(reftext.equals(refclient)) {
return true;
}
}
}
return false;
}
This piece of code is interesting because it introduces what we call
"conditional rules". Given the string refclient we look for a
<DATA></DATA> subtree containing a ref attribute with
exactly the same string. Since it is not possible to use an
instantiated variable in a pattern (something like <DATA
ref=refclient>...</DATA>, we have to introduce a fresh variable
reftext and check in a condition that this variable is equal
to refclient. This is done via a Java condition (the
if clause): the action part (return true;) is executed
only when the condition is satisfied, and clearly it will end the
method. If the condition is not satisfied, then the next
<DATA></DATA> subtree is searched. It is exactly the case of a
switch statement, when the action part is not exited by a
return, break or goto, and the control flow is
transferred to the next matching solution or the next matching
pattern.
If no such subtree exists, this means that the two policies are not
compatible and false is returned.
4.1.4 Retrieving information using traversal functions
The Tom runtime library provides a set of generic functions in
order to perform various kinds of traversal over a term. This is
useful when searching a pattern somewhere in a term, at any
position. As the Xml documents are seen by Tom as terms, the
different traversals on terms are also valid when dealing with
Xml.
In the current example, the getDataGroup functions looks for
a <DATA-GROUP></DATA-GROUP> node. Instead of statically specifying the
path (POLICIES, POLICY, STATEMENT, DATA-GROUP and
RULESET, RULE, POLICY, STATEMENT, DATA-GROUP) we could have
used the generic traversal mechanism provided by the package
tom.library.traversal.*.
private static GenericTraversal traversal = new GenericTraversal();
In the following, we generalize the previous problem by considering
that we can have more than one appearance of the
<DATA-GROUP></DATA-GROUP> node per Xml document.
We further expose the use of the generic traversal mechanism. The
collectDatagroup method creates an inner class
Collect1 with one method, apply, and then it calls the
genericCollect method of the traversal object created
previously with an object of this inner class as a parameter.
protected static void collectDatagroup(final Collection collection, TNode subject) {
Collect1 collect = new Collect1() {
public boolean apply(ATerm t) {
if(t instanceof TNode) {
%match(t) {
<DATA-GROUP> </DATA-GROUP> -> {
collection.add(t);
return false;
}
}
}
return true;
} // end apply
}; // end new
traversal.genericCollect(subject, collect);
}
Let us now analyze the inner class: firstly, it extends the class
Collect1, where 1 stands for the number of arguments
of the apply method, and secondly it implements the method
apply. The method call traversal.genericCollect will
call this method for all subtrees of the subject term. Since
subject may contain subterms of different sorts (integers or
strings for example), the instanceof construct is used as a
first filter to select only the subterms which correspond to Xml
nodes. When such a subterm is rooted by a <DATA-GROUP></DATA-GROUP>, the
subterm is added to the collection (by a side-effect). The
return false; statement indicates that it is no longer
necessary to traverse the considered term (in our example, a
<DATA-GROUP></DATA-GROUP> node cannot be found inside a
<DATA-GROUP></DATA-GROUP> itself). Respectively, the return true;
statement indicates that the apply function should be
recursively applied to the current term in order to continue the
traversal.
Given a collection, the getDataGroup method uses an iterator to
select a <DATA-GROUP></DATA-GROUP> node.
private static TNode getDataGroup(TNode doc) {
HashSet c = new HashSet();
collectDatagroup(c,doc);
Iterator it = c.iterator();
while(it.hasNext()) {
TNode datagroup = (TNode)it.next();
return datagroup;
}
return `xml(<DATA-GROUP/>);
}
4.2 Building and sorting Xml/DOM documents
In this section, we consider a DOM mapping for Xml documents. This
means that Xml documents are still considered as algebraic terms
(built over TNode and TNodeList), but their internal
representation is backed by the W3C DOM library.
In the following, we consider a small data-base represented by an Xml
document person.xml:
<Persons>
<Person Age="30"> <FirstName> Paul </FirstName> </Person>
<Person Age="28"> <FirstName> Mark </FirstName> </Person>
<Person Age="21"> <FirstName> Jurgen </FirstName> </Person>
<Person Age="21"> <FirstName> Julien </FirstName> </Person>
<Person Age="24"> <FirstName> Pierre-Etienne </FirstName> </Person>
</Persons>
The problem consists in sorting this document according to different
criteria (age or first-name for example).
4.2.1 Loading Xml documents
The first part of the program declares imports, and defines the
main method.
import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.File;
public class PersonSort1 {
private static Document dom;
%include{ dom.tom }
public static void main (String args[]) {
try {
dom = DocumentBuilderFactory.newInstance()
.newDocumentBuilder().parse("person.xml");
Element e = dom.getDocumentElement();
dom.replaceChild(sort(e),e);
Transformer transform = TransformerFactory.newInstance().newTransformer();
StreamResult result = new StreamResult(new File("Sorted.xml"));
transform.transform(new DOMSource(dom), result);
} catch (Exception e) {
e.printStackTrace();
}
}
...
}
The mapping has to be initialized in the following ways:
-
the import of org.w3c.dom.* and
javax.xml.* packages in order to use the DOM library.
- the %include{ dom.tom } construct imports the
definition of Tom algebraic constructors needed to manipulate
DOM objects.
- the dom variable has to be instantiated. This variable
corresponds to the notion of Document in the DOM terminology.
4.2.2 Comparing two nodes
In order to implement a sorting algorithm, we need to know how to
compare two elements.
The following function takes two Xml documents in argument and compares
their attribute Age, using the string comparison operator compareTo.
private static int compare(Node t1, Node t2) {
%match(t1, t2) {
<Person Age=a1></Person>, <Person Age=a2></Person> -> {
return `a1.compareTo(`a2);
}
}
return 0;
}
In this example, it is interesting to note that an Xml node is seen as
an associative operator.
This feature, combined with the implicit notation, is such that
<Person Age=a1></Person> will match any Xml node headed by
Person which contains the attribute Age.
This Xml node may contain several sub-nodes, but they will not be
stored in any variable.
4.2.3 Sorting a list of nodes
In this section we use a swap sort algorithm which consists in finding two
elements that are not in the correct order. Once they are found, they
are swapped before calling the algorithm recursively.
When no two such elements exist, this means that the list is sorted.
private static Node sort(Node subject) {
%match(subject) {
<Persons>(X1*,p1,X2*,p2,X3*)</Persons> -> {
if(compare(`p1,`p2) > 0) {
return sort(`xml(dom,<Persons>X1* p2 X2* p1 X3*</Persons>));
}
}
}
return subject;
}
The pattern <Persons>(X1*,p1,X2*,p2,X3*)</Persons> looks for two elements
p1 and p2 and stores the remaining contexts in
X1*, X2*, and X3*.
In order to give a name to contexts (and then retrieve
them to build a new list), the pattern is written in explicit
notation. This notation prevents Tom to add extra
variables.
Otherwise, the expanded form of <Persons>X1* p1 X2* p2 X3*</Persons>
(written in explicit notation) would have been
<Persons>(_*,X1*,_*,p1,_*,X2*,_*,p2,_*,X3*,_*)</Persons>.
Note that the action part is guarded by a condition
(if(compare(p1,p2) > 0)). This means that return
sort(...) is executed only when two bad-ordered elements are found.
When p1 and p2 are correctly ordered, the
matching procedure continues and extracts another couple p1 and
p2.
As mentioned in Part II, when
Tom cannot find two elements p1 and p2 such that the
condition is satisfied (the list is sorted), the return
sort(...) statement is not executed and the control flow is
transferred to the following statement:
return subject;
As mentioned previously, when two elements have to be swapped, a new
list is built. This done via the ``' (backquote) construct. This
construct, used before to retrieve instantiated variables, can also
be use to build a new term.
In order to build an Xml document, the xml(...) function has to
be used. The number of parameters depends on the Xml mapping which is
used.
In our case, when manipulating DOM objects, a new subtree is built is
the context of a main document. This extra-parameter is given in the
first argument of the xml function. Thus,
`xml(dom,<Persons>...<Persons>) builds a new <Person></Person>
node, as a child of the dom document.
When using the TNode.tom mapping, a library based on Gom
is used. In this case, it is no longer necessary (and even not correct) to use
this extra argument.
4.2.4 Sorting by side effect
In the previous section, we considered a sorting algorithm expressed
in a pure functional programming style: when two elements have to be
swapped, a new list is built and returned.
In the following example, we exploit OO-style of the DOM library and
perform a swap in place: the list is updated by swapping two elements (with
side effect):
private static void sort(Node subject) {
%match(subject) {
r @ <_>p1@<_ Age=a1></_> p2@<_ Age=a2></_></_> -> {
if(`a1.compareTo(`a2) > 0) {
`r.replaceChild(`p2.cloneNode(true),`p1);
`r.replaceChild(`p1,`p2);
`sort(r);
return;
}
}
}
}
In this example, we use the notion of anonymous Xml node (<_>...</_>)
which allows to describe more generic algorithms.