QtXmlPatterns Module

The QtXmlPatterns module implements PyQt's XQuery support.

Types


Detailed Description

An introduction to PyQt's XQuery support.

To import the module use, for example, the following statement:

from PyQt4 import QtXmlPatterns

XQuery is a pragmatic language that allows XML to be queried and created in fast, concise and safe ways.

Introduction

 <bibliography>
 {
     doc("library.xml")/bib/book[publisher = "Addison-Wesley" and @year > 1991]/
     <book year="{@year}">{title}</book>
 }
 </bibliography>

The query opens the file library.xml and, for each book element that is a child of the top-level element bib, has a year attribute greater than 1991, and has Addison-Wesley as its publisher, constructs a book element and attaches it to the parent element bibliography.
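To make the selection logic concrete, the following sketch reproduces what the query computes using Python's standard xml.etree.ElementTree module instead of the XQuery engine. It is not how QtXmlPatterns evaluates the query. The sample data, SAMPLE_LIBRARY, and the function name build_bibliography are hypothetical, and the sketch assumes each book has a single publisher element.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for library.xml; the real file is not shown on this page.
SAMPLE_LIBRARY = """\
<bib>
  <book year="1994">
    <title>Sample Title A</title>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="1990">
    <title>Sample Title B</title>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="2000">
    <title>Sample Title C</title>
    <publisher>Another Publisher</publisher>
  </book>
</bib>
"""


def build_bibliography(library_xml):
    """Keep Addison-Wesley books with year > 1991, as the query does."""
    bib = ET.fromstring(library_xml)
    bibliography = ET.Element("bibliography")
    # /bib/book: every book that is a direct child of the top element.
    for book in bib.findall("book"):
        right_publisher = book.findtext("publisher") == "Addison-Wesley"
        recent_enough = int(book.get("year", "0")) > 1991
        if right_publisher and recent_enough:
            # Corresponds to <book year="{@year}">{title}</book>.
            new_book = ET.SubElement(bibliography, "book", year=book.get("year"))
            for title in book.findall("title"):
                title_copy = copy.deepcopy(title)
                title_copy.tail = None  # drop indentation carried over from the source
                new_book.append(title_copy)
    return bibliography


result = build_bibliography(SAMPLE_LIBRARY)
print(ET.tostring(result, encoding="unicode"))
```

The loop and the two conditions take the place of the single path expression with its predicate, which is the conciseness the query language offers.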

Why use XQuery?

XQuery is tailor-made for selecting and aggregating information safely and efficiently. Hence, if an application selects and navigates data, XQuery is a good candidate for implementing that quickly and with fewer bugs. With QAbstractXmlNodeModel, these advantages are not constrained to XML files, but can be applied to other data as well.

Maybe XQuery can be summarized as follows:

On top of that, the language is designed to be high-level, so that it is easy to analyze what the user is computing. This makes it easier to optimize both the speed and the memory use of XML operations.

Using QtXmlPatterns

Queries can be evaluated through an ordinary Qt C++ API or through a command line interface.

C++ API

Configuring the Build Process

Applications that use QtXmlPatterns' classes need to be configured to be built against the QtXmlPatterns module. To include the definitions of the module's classes, use the following directive:

      #include <QtXmlPatterns>

To link against the module, add this line to your qmake .pro file:

      QT += xmlpatterns

QtXmlPatterns is part of the Qt Desktop Edition, Qt Open Source Edition and the Qt Console Edition. Note that QtXmlPatterns is disabled when building Qt if exceptions are disabled, or if the compiler does not support member templates (MSVC 6, for example).

See QXmlQuery for how to use the C++ API.

Command line utility

A command line utility called xmlpatterns is installed and available like the other command line utilities such as moc or uic. It takes a single argument that is the filename of the query to execute:

xmlpatterns myQuery.xq

The query will be run and the output written to stdout.

Pass in the -help switch to get a brief description printed to the console, such as how to bind variables using the command line.

The command line utility's interface is stable for scripting, but descriptions and help messages are not designed for the purpose of automatic parsing, and can change in undefined ways in a future release of Qt.
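When scripting the utility from Python, a minimal sketch could look like the following. It uses only the invocation form shown above (a single query file argument) and reads the query result from stdout. The helper names build_command and run_query are hypothetical, and the sketch assumes that xmlpatterns is on the PATH, that its output is UTF-8, and that it signals failure with a non-zero exit status.

```python
import subprocess


def build_command(query_file, executable="xmlpatterns"):
    """Return the argument list for running one query file."""
    return [executable, query_file]


def run_query(query_file, executable="xmlpatterns"):
    """Run the query and return what the tool wrote to stdout, as text."""
    completed = subprocess.run(
        build_command(query_file, executable),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raises CalledProcessError on a non-zero exit status
    )
    # Assumption: the output is UTF-8 encoded.
    return completed.stdout.decode("utf-8")


if __name__ == "__main__":
    # Only shows the command; call run_query("myQuery.xq") to execute it.
    print(build_command("myQuery.xq"))
```

The sketch deliberately leaves the text written to stderr uninterpreted, since the help and diagnostic messages are not meant for automatic parsing.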

XQuery HOWTO

See A Short Path to XQuery for a short introduction to XQuery.

Qt's and XQuery's Data Model

XQuery and Qt have different data models. All data in XQuery takes the form of sequences of items, where an item is either a node or an atomic value. Atomic values are the primitives found in W3C XML Schema, and nodes are ordinary XML nodes, although they can represent other things through QXmlNodeModelIndex and QAbstractXmlNodeModel.

Atomic values, when not being serialized, are represented with QVariant. The mappings are as follows.

From XQuery       To Qt
xs:integer        QVariant.LongLong
xs:string         QVariant.String
xs:double         QVariant.Double
xs:float          QVariant.Double
xs:boolean        QVariant.Bool
xs:decimal        QVariant.Double
xs:hexBinary      QVariant.ByteArray
xs:base64Binary   QVariant.ByteArray
xs:time           Not supported because xs:time has a zone offset, and QTime does not. Use xs:dateTime, or convert the value to xs:string.
xs:date           QVariant.DateTime
xs:dateTime       QVariant.DateTime
xs:gYear          QVariant.DateTime
xs:gYearMonth     QVariant.DateTime
xs:gMonthDay      QVariant.DateTime
xs:gDay           QVariant.DateTime
xs:gMonth         QVariant.DateTime
xs:string*        QVariant.StringList
xs:anyURI         QVariant.Url
xs:untypedAtomic  QVariant.String
xs:ENTITY         QVariant.String
xs:QName          QXmlName. Note that the returned QXmlName can only be used with the QXmlQuery instance that it was created with.


From Qt              To XQuery
QVariant.LongLong    xs:integer
QVariant.Int         xs:integer
QVariant.UInt        xs:nonNegativeInteger
QVariant.ULongLong   xs:unsignedLong
QVariant.String      xs:string
QVariant.Double      xs:double
QVariant.Bool        xs:boolean
QVariant.Double      xs:decimal
QVariant.ByteArray   xs:base64Binary
QVariant.Date        xs:date. The QDate is assumed to be in timezone UTC.
QVariant.Time        QTime cannot properly represent xs:time. Convert QTime to a QDateTime with a valid arbitrary date, and bind the time as a QDateTime instead.
QVariant.DateTime    xs:dateTime
QVariant.StringList  xs:string*
QVariant.Url         xs:string
QVariantList         A sequence of atomic values, whose type is the same as the first item in the QVariantList instance. If the items in the QVariantList are not all of the same type, the behavior is undefined.
Any other type       Not supported. Depending on context, this leads either to undefined behavior or to a nonexistent variable binding.
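Because a list with mixed item types leads to undefined behavior, it can be worth checking a list before it is bound. The following is a minimal sketch of such a check in plain Python. The function name check_homogeneous is hypothetical, and comparing Python-level types is only an approximation of the QVariant types that PyQt will actually produce.

```python
def check_homogeneous(values):
    """Return the common type of the items, or None for an empty list.

    Raises ValueError if any item differs in type from the first item.
    """
    if not values:
        return None
    first_type = type(values[0])
    for index, value in enumerate(values):
        # Exact comparison on purpose: a bool is not accepted among ints.
        if type(value) is not first_type:
            raise ValueError(
                "item %d is %s, expected %s"
                % (index, type(value).__name__, first_type.__name__)
            )
    return first_type


print(check_homogeneous([1, 2, 3]))
```

A list that passes the check still has to consist of a type listed in the table above.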


Integrating with Custom Data

XQuery is a language designed for, and modeled on, XML. However, it does not have to be constrained to XML. By subclassing QAbstractXmlNodeModel, one can write queries on top of any data that can be modeled as XML.

By default, when QtXmlPatterns is asked to open files or to produce content, it does so using an internal representation. For instance, in this query:

 <result>
     <para>The following Acne removers have shipped, ordered by shipping date (oldest first):</para>
     {
         for $i in doc("myOrders.xml")/orders/order[@product = "Acme's Acne Remover"]
         order by xs:date($i/@shippingDate) ascending
         return $i
     }
 </result>

an efficient internal representation is used for the file myOrders.xml. However, by sub-classing QAbstractXmlNodeModel one can write a query on any data, by mapping XML elements and attributes to the custom data model. For instance, one could write a QAbstractXmlNodeModel sub-class that mirrors the file system hierarchy like this:

 <?xml version="1.0" encoding="UTF-8"?>
 <directory name="home">

     <file name="myNote.txt" mimetype="text/plain" size="8" extension="txt" uri="file:///home/frans/myNote.txt">
         <content asBase64Binary="TXkgTm90ZSE=" asStringFromUTF-8="My Note!"/>
     </file>

     <directory name="src">
         ...
     </directory>

     ...

 </directory>

and hence have a convenient way to navigate the file system:

 <html>
     <body>
         {
             $myRoot//file[@mimetype = 'text/xml' or @mimetype = 'application/xml']
             /
             (if(doc-available(@uri))
              then ()
              else <p>Failed to parse file {@uri}.</p>)
         }
     </body>
 </html>

One approach to this has been to convert a data model to XML (text) and then read it in with an XML tool, but that has disadvantages. It is inefficient, the XML representation is separated from the actual data model, and two representations need to be maintained simultaneously in memory.

With QAbstractXmlNodeModel, this conversion is not necessary, nor are two representations kept at the same time, since QXmlNodeModelIndex is a small, efficient, stack-allocated value. Also, since the XQuery engine asks the QAbstractXmlNodeModel for the actual data, the model can create elements, attributes and data on demand, depending on what the query actually requests. For instance, in the file system model above, the model does not have to read in the whole file system or encode the content of a file until it is actually asked for.

In other words, with QAbstractXmlNodeModel it's possible to have one data model, and then use the power of the XQuery language on top.

Some examples of possible data models could be:

The documentation for QAbstractXmlNodeModel has the details for implementing this.

More on Custom Data

Since QtXmlPatterns isn't constrained to XML but can use custom data directly, it turns XQuery into a mapping layer between different custom models, or between custom models and XML. Once QtXmlPatterns can understand the data, simple queries can be used to select from it, or to simply write it out as XML using QXmlQuery.serialize().

Consider a word processor application that needs to be able to import and export different formats. Instead of having to write C++ code that converts between the different formats, one writes a query that goes from one type of XML, such as MathML, to another XML format: the one for the document representation that the DocumentRepresentation class below exposes.

In the case of CSV files, which are text, a QAbstractXmlNodeModel sub-class is used again in order to expose the comma-separated file as XML, such that a query can operate on it.

Security Considerations

Query Injection

XQuery is subject to query injection in the same manner that SQL is. If a query is constructed by concatenating strings where some of the strings are from user input, the query can be altered by carefully crafting malicious strings, unless they are properly escaped.

The best solution against these attacks is typically to never construct queries from user-written strings, but instead input the user's data using variable bindings. This avoids all query injection attacks.
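The difference between the two approaches can be illustrated at the level of the query text alone. In the sketch below, the query strings, the variable name $publisher, and the function name unsafe_query are hypothetical. The safe variant relies on the value being supplied separately with QXmlQuery.bindVariable(), which is only indicated in a comment here so that the example runs without PyQt.

```python
def unsafe_query(publisher):
    """Anti-pattern: user input is pasted directly into the query text."""
    return 'doc("library.xml")/bib/book[publisher = "%s"]/title' % publisher


# Safer variant: the query text is a constant that refers to a variable.
# The user's value would be supplied through a variable binding
# (QXmlQuery.bindVariable()), so it is never parsed as part of the query.
SAFE_QUERY = 'doc("library.xml")/bib/book[publisher = $publisher]/title'

benign = "Addison-Wesley"
malicious = 'x" or "1" = "1'

# With benign input, the concatenated query is what the author intended.
print(unsafe_query(benign))

# With crafted input, the predicate is rewritten so that it matches every book.
print(unsafe_query(malicious))

# The constant query is the same no matter what the user typed.
print(SAFE_QUERY)
```

With the crafted input, the closing quote in the user's string ends the string literal early and the rest is parsed as query syntax, which is exactly what a variable binding prevents.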

See "Avoid the dangers of XPath injection" by Robi Sen, or "Blind XPath Injection" by Amit Klein, for deeper discussions.

Denial of Service Attacks

Like all other systems, QtXmlPatterns has limits. Generally, these are not checked. This is not a problem for regular use, but it does mean that a malicious query that causes code to crash or to exhibit undefined behavior can be constructed relatively easily.

Features and Conformance

Conformance

QtXmlPatterns aims to be a conformant XQuery implementation. In addition to supporting minimal conformance, the serialization and full-axis features are supported. As of this writing, 97% of the tests in W3C's test suite for XQuery pass, and this is expected to improve over time. Areas where conformance is incomplete and where behavior may change in future releases are:

XML 1.0 and XML Namespaces 1.0 are supported, as opposed to the 1.1 versions. When strings are fed into the query using QStrings, the characters must be XML 1.0 characters. Otherwise, the behavior is undefined. This is not checked.
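Since the engine does not perform this check, an application can validate strings itself before feeding them into a query. The sketch below tests characters against the Char production of the XML 1.0 specification. The function names is_xml10_char and first_invalid_index are hypothetical, and the sketch works on Python 3 strings rather than on QString directly.

```python
def is_xml10_char(ch):
    """True if the single character ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD      # excludes surrogates, U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_invalid_index(text):
    """Return the index of the first non-XML-1.0 character, or -1 if none."""
    for index, ch in enumerate(text):
        if not is_xml10_char(ch):
            return index
    return -1


print(first_invalid_index("plain text\n"))
print(first_invalid_index("a\x00b"))
```

A string for which first_invalid_index() does not return -1 should be rejected or cleaned before it is passed to the query.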

Since XPath 2.0 is a subset of XQuery 1.0, XPath 2.0 is supported too.

The specification discusses conformance further: XQuery 1.0: An XML Query Language. W3C's XQuery testing effort, the XML Query Test Suite, may be of interest as well.

Currently fn:collection() does not access any data set, and there is no API for providing data through the collection. As a result, evaluating fn:collection() returns the empty sequence. We hope to provide functionality for this in a future release of Qt.

XML files are opened with support for xml:id. In practice, this means that elements with an attribute named xml:id can be looked up fairly quickly with the fn:id() function. See xml:id Version 1.0 for details.

Note: Only queries encoded in UTF-8 are supported.

Resource Loading

When QtXmlPatterns attempts to load XML resources, such as via XQuery's fn:doc() function, the following schemes are supported:

Scheme Name  Description
file         Local files.
data         The bytes are encoded in the URI itself. For instance, data:application/xml,%3Ce%2F%3E is <e/>.
ftp          Resources retrieved via FTP.
http         Resources retrieved via HTTP.
https        Resources retrieved via HTTPS. This will succeed if no SSL errors are encountered.
qrc          Qt Resource files. Expressing it as an empty scheme, :/..., is not supported.
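The data scheme entry above can be reproduced with Python's standard urllib.parse module. This is a sketch of the percent-encoding only. The helper names to_data_uri and from_data_uri are hypothetical, and only the data:application/xml, prefix form shown in the table is handled.

```python
from urllib.parse import quote, unquote

PREFIX = "data:application/xml,"


def to_data_uri(xml_text):
    """Percent-encode xml_text and wrap it in a data: URI."""
    # safe="" makes quote() escape "/" as well, as in the table's example.
    return PREFIX + quote(xml_text, safe="")


def from_data_uri(uri):
    """Recover the XML text from a data:application/xml, URI."""
    if not uri.startswith(PREFIX):
        raise ValueError("not a data:application/xml URI")
    return unquote(uri[len(PREFIX):])


print(to_data_uri("<e/>"))
print(from_data_uri("data:application/xml,%3Ce%2F%3E"))
```

A URI built this way could then be passed to fn:doc() in a query, for instance to supply a small document without creating a file.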


URIs are first passed to QAbstractUriResolver (see QXmlQuery.setUriResolver()) for possible rewrites.


PyQt 4.4.2 for X11 · Copyright © Riverbank Computing Ltd and Trolltech AS 2008 · Qt 4.4.0