FAQ for the XML package in R/S-Plus

  • My XML document has attributes that have a namespace prefix (e.g. <node mine:foo="abc" />). When I parse this document into S, the namespace prefix on the attribute is dropped. Why, and how can I fix it?
    The first thing to do is use a value of TRUE for the addAttributeNamespaces argument in the call to xmlTreeParse.
    The next thing is to ensure that the namespace (mine, in our example) is defined in the document. In other words, there must be an xmlns:mine="some url" attribute either in the node being processed or in one of its ancestors. If no definition for the namespace is in the document, the libxml parser drops the prefix on the attribute.
    The same applies to namespaces for node names, and not just attributes.
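
    A minimal sketch of the two steps above, assuming the XML package is installed. The document text and the namespace URI http://www.example.org/mine are made up for illustration.

```r
library(XML)

# Made-up document: the "mine" prefix is declared on the root node,
# so libxml can resolve it when it reaches the attribute.
txt <- '<doc xmlns:mine="http://www.example.org/mine"><node mine:foo="abc"/></doc>'

# Ask the parser to keep namespace prefixes on attribute names.
tree <- xmlTreeParse(txt, asText = TRUE, addAttributeNamespaces = TRUE)

# Pull out the child node and look at the names of its attributes.
node <- xmlRoot(tree)[["node"]]
attr_names <- names(xmlAttrs(node))
print(attr_names)
```

    If the xmlns:mine declaration is removed from the root node, the prefix is not expected to survive, which is the second condition described above.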
  • I define a method in the closure, but it never gets called.
    The most likely cause is that you did not add it to the list of functions returned by the closure. Another possibility is that the name of the method is misspelled; the matching is case-sensitive and exact. If the function corresponds to a particular XML element name, check that the value of the useTagName argument is TRUE and that there really is a tag with this name in the document. Again, case matters.
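
    The checklist above can be sketched as follows, assuming the handlers are passed to xmlEventParse. The names makeHandlers, item and getCount are hypothetical and chosen only for this illustration.

```r
library(XML)

# Hypothetical closure that counts <item> elements.
makeHandlers <- function() {
  count <- 0

  # Named after the XML element; only looked up when useTagName = TRUE.
  item <- function(name, attrs, ...) {
    count <<- count + 1
  }

  getCount <- function() count

  # Every handler must appear in the returned list,
  # otherwise the parser never sees it.
  list(item = item, getCount = getCount)
}

h <- makeHandlers()
xmlEventParse("<doc><item/><item/></doc>", handlers = h,
              asText = TRUE, useTagName = TRUE)
print(h$getCount())
```

    Leaving item out of the returned list, spelling it Item, or omitting useTagName = TRUE would each leave the counter at zero.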
  • When I compile the source code, I get lots of warning messages such as
    "RSDTD.c", line 110: warning: argument #2 is incompatible with prototype:
            prototype: pointer to const uchar : "unknown", line 0
            argument : pointer to const char   
          
    This is because the XML libraries work on unsigned characters for Unicode. The R and S facilities do not. I am not yet certain in which direction to adapt things for this package. The warnings are typically harmless.
  • When I compile the chapter for Splus5/S4, I get warning messages about SET_CLASS being redefined.
    This is fine in this situation. The warning is left in place as a reminder that some games are being played, so that if problems arise, these warnings are a place to look. The SET_CLASS macro being redefined is a local version for S3/R-style classes; the one in the Splus5/S4 header files is for S4-style classes.
  • On which platforms does it compile?
    I have used gcc on both Linux (RedHat 6.1) (egcs-2.91.66) and Solaris (gcc-2.7.2.3), and the Sun compilers, cc 4.2 on Solaris 2.6/SunOS 5.6 and cc 5.0 on Solaris 2.7/SunOS 5.7.
  • I can't seem to use conditional DTD segments via the IGNORE/INCLUDE mechanism.
    Libxml doesn't support this. Perhaps we will add code for this.

    Daniel Veillard might add this.

  • When I read a relatively simple tree in Splus5 and print it to the terminal/console, I get an error about nested expressions exceeding the limit of 256.
    The simple fix is to set the value of the expressions option to a value larger than 256.
     options(expressions=1000)
    
    The main cause of this is that S and R are programming languages not specialized for handling trees. (They are functional languages and have no facilities for pointers or references as in C or Java.)
  • I get errors when using parameter entities in DTDs?
    This was true in versions 1.7.3 and 1.8.2 of libxml. Thanks to Daniel Veillard for fixing this quickly when I pointed it out.

    Parameter entities are allowed, but the libxml parsing library is fussy about white-space, etc. The following is ok

    <!ELEMENT legend  (%PlotPrimitives;)* >
    
    but
    <!ELEMENT legend  (%PlotPrimitives; )* >
    
    is not. The extra space preceding the ) causes an error in the parser, something like
    1: XML Parsing Error: ../Histogram.dtd:80: xmlParseElementChildrenContentDecl : ',' '|' or ')' expected 
    2: XML Parsing Error: ../Histogram.dtd:80: xmlParseElementChildrenContentDecl : ',' expected 
    3: XML Parsing Error: ../Histogram.dtd:80: xmlParseElementDecl: expected '>' at the end 
    4: XML Parsing Error: ../Histogram.dtd:80: Extra content at the end of the document 
    
    This can be fixed by adding a call to SKIP_BLANKS at the end of the loop while(CUR != ')') { ... } in the routine xmlParseElementChildrenContentDecl() in parser.c. The problem lies in the transition between the different input buffers introduced by the entity expansion.
  • I am trying to use XPath and getNodeSet(). But I am not matching any nodes.
    If you are certain that the XPath expression should match what you want, then it is probably a matter of namespaces. If the document in which you are trying to find the nodes has a default namespace (at the top-level node or a sub-node involved in your match), then you have to explicitly identify the namespace. Unfortunately, XPath doesn't use the default namespace of the target document, but requires the namespace to be explicitly mentioned in the XPath expression.

    For example, suppose we have a document that looks like

    
    <doc xmlns="http://www.omegahat.org">
      <topic>
        <title>My Title</title>
      </topic>
    </doc>
    
    and we want to use an XPath expression to find the title node. We might think that "/doc/topic/title" would do the trick. But in fact, we need
      /ns:doc/ns:topic/ns:title
    
    And then we need to map ns to the URI "http://www.omegahat.org". We do this in a call to getNodeSet() as
      getNodeSet(doc, "/ns:doc/ns:topic/ns:title", c(ns = "http://www.omegahat.org"))
    

    As a simplification, getNodeSet() will create the map between the prefix and the URI of the default namespace of the XML document if you specify a single string with no name as the value of the namespaces argument, e.g.

      getNodeSet(doc, "/ns:doc/ns:topic/ns:title", "ns")
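
    Putting the pieces together, here is a small self-contained sketch, assuming the XML package is installed. The inline document is a stand-in with a default namespace of "http://www.omegahat.org" and a title nested under doc and topic; the variable names are illustrative only.

```r
library(XML)

# Stand-in document with a default namespace on the top-level node.
txt <- '<doc xmlns="http://www.omegahat.org"><topic><title>My Title</title></topic></doc>'
doc <- xmlParse(txt, asText = TRUE)

# Without a prefix, the path refers to nodes in no namespace and matches nothing.
none <- getNodeSet(doc, "/doc/topic/title")

# Explicit mapping from the prefix "ns" to the namespace URI.
found <- getNodeSet(doc, "/ns:doc/ns:topic/ns:title",
                    c(ns = "http://www.omegahat.org"))

# Shorthand: a single unnamed string is used as the prefix
# for the document's default namespace.
short <- getNodeSet(doc, "/ns:doc/ns:topic/ns:title", "ns")

print(length(none))
print(xmlValue(found[[1]]))
```

    The prefix itself is arbitrary; any name works as long as the same one appears in the XPath expression and in the namespaces argument.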
    

    There are some additional comments here.


  • Duncan Temple Lang <duncan@wald.ucdavis.edu>
    Last modified: Thu May 3 09:04:23 PDT 2007