1. Motivation

Due to the wide acceptance of the World Wide Web, a multitude of information has been made available electronically, and the growth rate of online resources is still increasing. The World Wide Web is used for information, communication, entertainment, commerce, research, et cetera. Access to web content ranges from "free for all" to "the sky is the limit" for a selected few, and its quality ranges from valuable to sheer nonsense. While the WWW was initially tailored mainly for human consumption, the demand for machine-understandable data is increasing [lee02semantic]. This is mainly due to the huge amount of online information, which makes feasible information retrieval more and more difficult for humans. The technology of the WWW -- originally mainly document oriented -- is undergoing a metamorphosis into a data-oriented system. This vision is to be implemented through the use of the generic markup language XML. XML is an instance of the semistructured data model.

The data-oriented view [abiteboul00data] of the WWW has created a need for general-purpose database tools such as query and transformation systems. These demands are mainly met by the standards XPath, XQuery [wadler02xquery] and XSLT. Other query systems for XML are an active field of research [bry02gentle], [schl01b], [bry02xpath], [may01logic], [baru98features].

A look at traditional database query systems shows that textual query languages like SQL are mainly used by database experts, administrators or programmers, while visual systems, like those based on QBE [zloof77query], find stronger acceptance among casual users. Visual query languages indeed seem to be more intuitive and easier to apply.

As most computer users have access to the WWW, we expect a broad user base for a visual query system for XML data on the web. Furthermore, users should be able to describe their queries without having to bother about efficiency -- queries should be as descriptive as possible, because this leaves much room for automatic query optimisation.

For the semantic web vision to be widely accepted, a widespread understanding of the semistructured data model is necessary. To this end, users must be supported in recognising hierarchical structures. A visual query system for XML therefore needs a striking visualisation of hierarchical structures. To minimise the number of concepts a user has to learn, a pattern-based approach is helpful. As soon as the overall nature of semistructured data and the concrete structure of an application domain have been understood, the additional effort to understand patterns is very low. A pattern can be understood as an example that uses the same formalism as the database itself. Query paradigms that are not pattern based, e.g. navigational queries, require the introduction of a new formalism to express queries. A novice user hence needs to learn two formalisms for such query systems (the database formalism and the query formalism), but only one formalism when using a pattern-based query approach.

The existing textual query and transformation language Xcerpt [bry02gentle], [bry03pattern], [bry02towards], [bry02the] is used as the back end of the proposed query system. Xcerpt is a textual query language based on graph unification. The language follows a rule-based approach comparable to Prolog. A rule consists of a construct term and one or more query terms connected by "and" and "or". Consequently, the queried or transformed XML data itself can also be interpreted as a term. Term-based rules have been used for a long time in logic programming, where reasoning (i.e. program evaluation) is usually based on unification. Unfortunately, a classical term interpretation of XML data is too rigid for the expressive power of XML data. While term unification relies on the arity and the argument order of the matched terms, XML patterns with varying arity and varying child element order can be interpreted as equal, based on partial equality. Patterns of this kind are known as incomplete patterns. For the unification of incomplete patterns on graph structures, a technique called simulation unification is used [bry02towards].
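The difference between classical term matching and the matching of incomplete patterns can be illustrated with a small sketch. The following Python code is not Xcerpt and does not implement simulation unification; it is a simplified illustration on trees without variables. The names Term, strict_match and partial_match are hypothetical, and the decision to map pattern children to distinct data children is an assumption of this sketch.

```python
from typing import NamedTuple


class Term(NamedTuple):
    """A labelled tree node; a stand-in for an XML element (hypothetical type)."""
    label: str
    children: tuple = ()


def strict_match(pattern: Term, data: Term) -> bool:
    """Classical term matching: same label, same arity, same child order."""
    return (
        pattern.label == data.label
        and len(pattern.children) == len(data.children)
        and all(strict_match(p, d) for p, d in zip(pattern.children, data.children))
    )


def partial_match(pattern: Term, data: Term) -> bool:
    """Incomplete, unordered matching: every pattern child must match some
    distinct data child; additional data children and child order are ignored."""
    if pattern.label != data.label:
        return False
    return _assign(list(pattern.children), list(data.children))


def _assign(pattern_children: list, data_children: list) -> bool:
    """Backtracking search for an assignment of pattern children to data children."""
    if not pattern_children:
        return True
    first, rest = pattern_children[0], pattern_children[1:]
    for i, candidate in enumerate(data_children):
        if partial_match(first, candidate):
            remaining = data_children[:i] + data_children[i + 1:]
            if _assign(rest, remaining):
                return True
    return False


# Data term: a book with a title, two authors and a year.
book = Term("book", (Term("title"), Term("author"), Term("author"), Term("year")))

# Pattern: a book with an author and a title, in a different order than in the data.
pattern = Term("book", (Term("author"), Term("title")))

print(strict_match(pattern, book))   # arity and order differ
print(partial_match(pattern, book))  # incomplete and unordered reading
```

Under the strict reading the pattern fails because the arity and the child order differ from the data, whereas under the incomplete reading it succeeds because every child of the pattern is found among the children of the data term.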

In "A Gentle Introduction into Xcerpt, a Rule-based Query and Transformation Language for XML" [bry02gentle] the syntax of Xcerpt is explained with many examples. An outlook on different possible procedural semantics is also given. The declarative semantics of Xcerpt, as far as it has been defined, is explained in "Towards a Declarative Query and Transformation Language for XML and Semistructured Data: Simulation Unification" [bry02towards]. "Pattern Queries for XML and Semistructured Data (revised version)" [bry03pattern] focuses only on the syntax and semantics of query patterns in Xcerpt and visXcerpt. "The XML Query Language Xcerpt: Design Principles, Examples, and Semantics" [bry02the] gives an overview of the declarative semantics and the language features of Xcerpt. This article also presents features planned for future development. The syntax of Xcerpt will also be explained in detail in this thesis.