Documentation:XML

From Tom

Jump to: navigation, search
Doc : Tutorial

Level 1 - Introduction  > Level 2 - List matching  > Level 3 - Strategies  > Advanced - Mappings  > Writing a (small) Parser/Compiler/Interpreter  > XML  > Playing with EMF

Even if this section mostly deals with Xml features, every explained technique can be used with other data-structures as well.

Contents

Manipulating Xml documents

This example is inspired from a practical study. It involves a simple modeling of the Platform for Privacy Preferences Project (P3P), developed by the World Wide Web Consortium, which is emerging as an industry standard providing a simple, automated way for users to gain more control over the use of personal information on Web sites they visit.

Given a client and a server, the problem consists in verifying that the client’s policy is compliant with the server’s policy. Both policies preferences are expressed in APPEL (A P3P Preference Exchange Language) and written in Xml. The server’s policy is the following file server.xml:

<POLICIES xmlns="http://www.w3.org/2002/01/P3Pv1">
  <POLICY name="mypolicy" discuri="http://www.ibm.com/privacy"
          opturi="http://www.ibm.com/privacy" xml:lang="en">
    <STATEMENT>
      <RECIPIENT> <delivery/> </RECIPIENT>
      <PURPOSE> <contact/> </PURPOSE>
      <DATA-GROUP>
        <DATA ref="#dynamic.clickstream"/>
        <DATA ref="#dynamic.http"/>
        <DATA ref="#dynamic.clientevents"/>
        <DATA ref="#user.home-info.postal.country"/>
        <DATA ref="#dynamic.cookies"> <CATEGORIES> <content/> </CATEGORIES> </DATA>
      </DATA-GROUP>
    </STATEMENT>
  </POLICY>
</POLICIES>

The client’s policy is the following file client.xml:

<RULESET appel="http://www.w3.org/2002/04/APPELv1"
         p3p="http://www.w3.org/2000/12/P3Pv1"
         crtdby="W3C" crtdon="2001-02-19T16:21:21+01:00">
  ...
  <RULE behavior="limited1" prompt="yes">
    <POLICY>
      <STATEMENT>
        <PURPOSE connective="and-exact"> <current/> </PURPOSE>
        <RECIPIENT connective="or">
          <other-recipient/>
          <public/>
          <unrelated/>
        </RECIPIENT>
      </STATEMENT>
    </POLICY>
  </RULE>
  ...
  <RULE behavior="request" prompt="yes">
    <POLICY>
      <STATEMENT>
        <DATA-GROUP>
          <DATA ref="#dynamic.clientevents"> </DATA>
          <DATA ref="#dynamic.clickstream"/>
        </DATA-GROUP>
      </STATEMENT>
    </POLICY>
  </RULE>
  ...
</RULESET>

For expository reasons, we consider only a sub-problem in which we say that a client’s policy is compatible with the server’s policy if all ref attributes from a <DATA></DATA> node also appear on the server side. In the considered examples, the policies are compatible because

<DATA-GROUP>
  <DATA ref="#dynamic.clientevents"> </DATA>
  <DATA ref="#dynamic.clickstream"/>
</DATA-GROUP>

is included in:

<DATA-GROUP>
  <DATA ref="#dynamic.clickstream"/>
  <DATA ref="#dynamic.http"/>
  <DATA ref="#dynamic.clientevents"/>
  <DATA ref="#user.home-info.postal.country"/>
  <DATA ref="#dynamic.cookies"> <CATEGORIES> <content/> </CATEGORIES> </DATA>
</DATA-GROUP>

The problem consists in implementing such a verification tool in Tom and Java.

Loading Xml documents

The first part of the program declares imports, and defines the main and the run methods.

import tom.library.xml.*;
import tom.library.adt.tnode.*;
import tom.library.adt.tnode.types.*;
import aterm.*;
import java.util.*;
  public class Evaluator {
    %include{ adt/tnode/TNode.tom }
    private static XmlTools xtools;
    public static void main (String args[]) {
    xtools = new XmlTools();
    TNode server = xtools.convertXMLToTNode("server.xml");
    TNode client = xtools.convertXMLToTNode("client.xml");
    boolean compatible = compareDataGroup(getDataGroup(server.getDocElem()),
                                          getDataGroup(client.getDocElem()));
    System.out.println("result = " + compatible);
  }
  ...
}

As explained in Part documentation:Tom (Tom language description), to each Xml document corresponds a DOM (Document Object Model) representation. In Tom, we have defined a mapping from DOM sorts to abstract algebraic sorts: TNode and TNodeList, which correspond respectively to Node and NodeList, defined by the Java DOM implementation.

This mapping has to be initialized in the following way:

  • the import of tom.library.adt.tnode.* and tom.library.adt.tnode.types.* defines the sorts TNode and TNodeList and allows us to build objects over these two sorts.
  • the %include{ TNode.tom } Tom construct is similar to the #include C preprocessor construct. In this case, it imports the definition of Tom algebraic constructors needed to manipulate Xml documents.

In complement to Tom, we provide a runtime library which contains several methods to manipulate Xml documents. In particular, we provide the XmlTools class (defined in the tom.library.xml.* package).

The run method takes two filenames as arguments, reads the associated files and convert their Xml content into TNode terms (using the convertXMLToTNode function).

The getDataGroup function is used to retrieve a <DATA></DATA> node in the considered Xml document. Note that getDocElem is first applied to Xml documents in order to consider subterms of the DocumentNode element. Then, the compareDataGroup function is used to check that the client’s policy is compatible with the server’s policy. The content of these functions is detailed in the next sections.

Retrieving information

Given an Xml subtree, the problem consists in extracting a <DATA-GROUP></DATA-GROUP> node. For expository reasons, we first consider that such a <DATA-GROUP></DATA-GROUP> node can only appear at two different specific places: one for the client’s definition and one for the server’s definition.

Thus, in the Tom program we naturally consider two cases:

private static TNode getDataGroup(TNode doc) {
  %match(doc) {
    <POLICIES>
       <POLICY>
         <STATEMENT>
           datagroup@<DATA-GROUP></DATA-GROUP>
         </STATEMENT>
       </POLICY>
    </POLICIES> -> { return `datagroup; }
      <RULESET>
      <RULE>
        <POLICY>
          <STATEMENT>
            datagroup@<DATA-GROUP></DATA-GROUP>
           </STATEMENT>
        </POLICY>
      </RULE>
    </RULESET> -> { return `datagroup; }
  }
  return `xml(<DATA-GROUP/>);
}

The first pattern means that a <DATA-GROUP></DATA-GROUP> node has to be found under a <STATEMENT></STATEMENT> node, which should be under a <POLICY></POLICY> node, which should be under a <POLICIES></POLICIES> node. Once such a pattern is found, the mechanism allows us to give a name (datagroup) to this subterm and reuse it in the action part: return datagroup;.

The Xml notation implicitly extends the given patterns by adding context variables. Thus, the <DATA-GROUP></DATA-GROUP> pattern means that a subterm those head symbol is <DATA-GROUP></DATA-GROUP> is searched. But this subterm can contain attributes and children even if it is not explicitly defined by the notation. To make the definition of patterns more precise, the user can use the explicit notation and define the following pattern: <DATA-GROUP (_*)>(_*)</DATA-GROUP>. A more detailed explanation can be found in the XML Pattern part (Tom language description).

Comparing two Xml subtrees

Given two <DATA-GROUP></DATA-GROUP> nodes, the problem consists in checking that all <DATA></DATA> nodes from the first tree also appear in the second one. This can be easily implemented by the following couple of functions:

private static boolean compareDataGroup(TNode server, TNode client) {
  boolean res = true;
  %match(client) {
    <DATA-GROUP><DATA ref=reftext></DATA></DATA-GROUP>
    -> { res = res && appearsIn(`reftext,server); }
  }
  return res;
}

Given a <DATA-GROUP></DATA-GROUP> tree, we look for a <DATA></DATA> subtree which contains an attribute named ref. When such an attribute exists, its content (a string) is stored into the Java reftext variable. The appearsIn function is then used to check that the attribute also appears on the server side.

private static boolean appearsIn(String refclient, TNode server) {
  %match(server) {
    <DATA-GROUP><DATA ref=reftext></DATA></DATA-GROUP>
    -> {
       if(`reftext.equals(refclient)) {
         return true;
       }
     }
  }
  return false;
}

This piece of code is interesting because it introduces what we call "conditional rules". Given the string refclient we look for a <DATA></DATA> subtree containing a ref attribute with exactly the same string. Since it is not possible to use an instantiated variable in a pattern (something like <DATA ref=refclient>...</DATA>, we have to introduce a fresh variable reftext and check in a condition that this variable is equal to refclient. This is done via a Java condition (the if clause): the action part (return true;) is executed only when the condition is satisfied, and clearly it will end the method. If the condition is not satisfied, then the next <DATA></DATA> subtree is searched. It is exactly the case of a switch statement, when the action part is not exited by a return, break or goto, and the control flow is transferred to the next matching solution or the next matching pattern.

If no such subtree exists, this means that the two policies are not compatible and false is returned.

Retrieving information using strategies

The Tom strategy library provides a set of generic operators in order to perform various kinds of traversal over a term. This is useful when searching a pattern somewhere in a term, at any position. As the Xml documents are seen by Tom as terms, the strategy library is also valid when dealing with Xml.

In the current example, the getDataGroup functions looks for a <DATA-GROUP></DATA-GROUP> node. Instead of statically specifying the path (POLICIES, POLICY, STATEMENT, DATA-GROUP and RULESET, RULE, POLICY, STATEMENT, DATA-GROUP) we could have used the composed strategy TopDownCollect provided by the package tom.library.sl.*. In the following, we generalize the previous problem by considering that we can have more than one appearance of the <DATA-GROUP></DATA-GROUP> node per Xml document.

We further expose the use of the strategy library. The collectDatagroup method creates a strategy TopDownCollect(collectDatagroup(set)) and then it calls the visit method on the Xml document. This strategy collects every occurence of <DATA-GROUP></DATA-GROUP> in the XML document. Using the %strategy statement, we define the basic strategy collectDatagroup parameterized by a Java set:

%include{ java/util/types/Set.tom }
%include{ sl.tom }
 
%strategy collectDatagroup(set:Set) extends `Identity() {
   visit TNode {
     t@<DATA-GROUP> </DATA-GROUP> -> {
       set.add(`t);
       throw new VisitFailure();
     }
   }
}
 
 
protected static void collectDatagroup(final Set bag, TNode doc) {
   try {
     `TopDownCollect(collectDatagroup(bag)).visitLight(doc);
   } catch(VisitFailure e) {}
}

Given a collection, the getDataGroup method uses the Java for statement to iterate on the set and select the first collected <DATA-GROUP></DATA-GROUP> node.

private static TNode getDataGroup(TNode doc) {
  HashSet<TNode> c = new HashSet();
  collectDatagroup(c,doc);
  for(TNode datagroup: c) {
    return datagroup;
  }
  return `xml(<DATA-GROUP/>);
}

Note: Do not forget to add import tom.library.sl.*; to be able to compile and execute your code.

Building and sorting Xml/DOM documents

In this section, we consider a DOM mapping for Xml documents. This means that Xml documents are still considered as algebraic terms (built over TNode and TNodeList), but their internal representation is backed by the W3C DOM library.

In the following, we consider a small data-base represented by an Xml document person.xml:

<Persons>
  <Person Age="30"> <FirstName> Paul   </FirstName> </Person>
  <Person Age="28"> <FirstName> Mark   </FirstName> </Person>
  <Person Age="21"> <FirstName> Jurgen </FirstName> </Person>
  <Person Age="21"> <FirstName> Julien </FirstName> </Person>
  <Person Age="24"> <FirstName> Pierre-Etienne </FirstName> </Person>
</Persons>

The problem consists in sorting this document according to different criteria (age or first-name for example).

Loading Xml documents

The first part of the program declares imports, and defines the main method.

import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.File;
  public class PersonSort1 {
  private static Document dom;
  %include{ dom.tom }
    public static void main (String args[]) {
    try {
      dom = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse("person.xml");
      Element e = dom.getDocumentElement();
      dom.replaceChild(sort(e),e);
      Transformer transform = TransformerFactory.newInstance().newTransformer();
      StreamResult result = new StreamResult(new File("Sorted.xml"));
      transform.transform(new DOMSource(dom), result);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
  ...
}

The mapping has to be initialized in the following ways:

  • the import of org.w3c.dom.* and javax.xml.* packages in order to use the DOM library.
  • the %include{ dom.tom } construct imports the definition of Tom algebraic constructors needed to manipulate DOM objects.
  • the dom variable has to be instantiated. This variable corresponds to the notion of Document in the DOM terminology.

Comparing two nodes

In order to implement a sorting algorithm, we need to know how to compare two elements.

The following function takes two Xml documents in argument and compares their attribute Age, using the string comparison operator compareTo.

private static int compare(Node t1, Node t2) {
  %match(t1, t2) {
    <Person Age=a1></Person>, <Person Age=a2></Person> -> {
      return `a1.compareTo(`a2);
    }
  }
  return 0;
}

In this example, it is interesting to note that an Xml node is seen as an associative operator. This feature, combined with the implicit notation, is such that <Person Age=a1></Person> will match any Xml node headed by Person which contains the attribute Age. This Xml node may contain several sub-nodes, but they will not be stored in any variable.

Sorting a list of nodes

In this section we use a swap sort algorithm which consists in finding two elements that are not in the correct order. Once they are found, they are swapped before calling the algorithm recursively. When no two such elements exist, this means that the list is sorted.

private static Node sort(Node subject) {
  %match(subject) {
   <Persons>(X1*,p1,X2*,p2,X3*)</Persons> -> {
      if(compare(`p1,`p2) > 0) {
        return sort(`xml(dom,<Persons>X1* p2 X2* p1 X3*</Persons>));
      }
    }
  }
  return subject;
}

The pattern <Persons>(X1*,p1,X2*,p2,X3*)</Persons> looks for two elements p1 and p2 and stores the remaining contexts in X1*, X2*, and X3*. In order to give a name to contexts (and then retrieve them to build a new list), the pattern is written in explicit notation. This notation prevents Tom to add extra variables. Otherwise, the expanded form of <Persons>X1* p1 X2* p2 X3*</Persons> (written in explicit notation) would have been <Persons>(_*,X1*,_*,p1,_*,X2*,_*,p2,_*,X3*,_*)</Persons>.

Note that the action part is guarded by a condition (if(compare(p1,p2) > 0)). This means that return sort(...) is executed only when two bad-ordered elements are found. When p1 and p2 are correctly ordered, the matching procedure continues and extracts another couple p1 and p2. As mentioned in the Tom Pattern part, when Tom cannot find two elements p1 and p2 such that the condition is satisfied (the list is sorted), the return sort(...) statement is not executed and the control flow is transferred to the following statement: return subject;

As mentioned previously, when two elements have to be swapped, a new list is built. This done via the (backquote) construct. This construct, used before to retrieve instantiated variables, can also be use to build a new term. In order to build an Xml document, the xml(...) function has to be used. The number of parameters depends on the Xml mapping which is used. In our case, when manipulating DOM objects, a new subtree is built is the context of a main document. This extra-parameter is given in the first argument of the xml function. Thus, ‘xml(dom,<Persons>...<Persons>) builds a new <Person></Person> node, as a child of the dom document. When using the TNode.tom mapping, a library based on Gom is used. In this case, it is no longer necessary (and even not correct) to use this extra argument.

Sorting by side effect

In the previous section, we considered a sorting algorithm expressed in a pure functional programming style: when two elements have to be swapped, a new list is built and returned.

In the following example, we exploit OO-style of the DOM library and perform a swap in place: the list is updated by swapping two elements (with side effect):

private static void sort(Node subject) {
  %match(subject) {
    r @ <_>p1@<_ Age=a1></_> p2@<_ Age=a2></_></_> -> {
      if(`a1.compareTo(`a2) > 0) {
        `r.replaceChild(`p2.cloneNode(true),`p1);
        `r.replaceChild(`p1,`p2);
        `sort(r);
        return;
      }
    }
  }
}

In this example, we use the notion of anonymous Xml node (<_>...</_>) which allows to describe more generic algorithms.

Tutorial

Level 1 - Introduction  > Level 2 - List matching  > Level 3 - Strategies  > Advanced - Mappings  > Writing a (small) Parser/Compiler/Interpreter  > XML  > Playing with EMF

Tom Documentation
Guided Tour :: Tutorial :: Language Reference :: Tools
Personal tools
Create a book