REWERSE: EU IST Network of Excellence

RDF Querying

Language Constructs and Evaluation Methods Compared

© 2006 Tim Furche, Benedikt Linse, François Bry, Dimitris Plexousakis, and Georg Gottlob

Abstract (or “What to Expect?”)

Contents

  1. Introduction
    1. Focus of this Article
    2. Structure of this Article
  2. Preliminaries
    1. A Brief Introduction to RDF and RDFS
    2. Running Example: Classification-Based Book Recommender
    3. Sample Queries
  3. RDF Query Language Families
    1. Relational Query Languages: SPARQL, RQL, TRIPLE, Xcerpt
    2. Reactive Rule Query Languages: Algae
    3. Navigational Access Query Languages: Versa
  4. Language Constructs Compared
    1. Selection
    2. Construction
    3. Procedural Abstraction
  5. Query Evaluation
    1. Storage of RDF Data
    2. Schema- and Reasoning-aware RDF Querying
  6. Conclusion

1 Introduction

Query Answering on the Semantic Web

Introduction—Focus of this Article

  • Two-pronged focus:
    1. introduction to RDF query languages for the Semantic Web
    2. in-depth comparison of the introduced languages along prominent language constructs and concepts
  • Example-centered approach built around a sample set of data and queries implemented in all surveyed languages
  • Language selection (arguably a good coverage of proposals, but still subjective):
    • only RDF query languages considered; OWL query languages remain an open issue
    • ‘relational’ or ‘pattern-based’ query languages: SPARQL, RQL, TRIPLE, Xcerpt
    • reactive rule query language Algae
    • ‘navigational access’ query language Versa
  • Complemented by the more extensive, but less focused, survey “Web and Semantic Web Query Languages: A Survey” from the Reasoning Web 2005 summer school

Introduction—Structure of this Article

Answering Three Questions
  1. What are the core paradigms of each query language?
  2. What language constructs do different languages offer to solve tasks such as path traversal, optional selection, or grouping?
  3. How are they realized?

2 Preliminaries

  1. A Brief Introduction to RDF and RDFS
  2. Running Example: Classification-Based Book Recommender
  3. Sample Queries

Preliminaries—A Brief Introduction to RDF and RDFS

  • RDF data are sets of ‘triples’ (aka ‘statements’) of the form (Subject, Property, Object)
  • RDF data are seen as (unranked, node- and edge-labeled) directed graphs
    • whose nodes are the statements' subjects and objects, labeled either
      • by URIs, thus representing Web resources,
      • by literals, such as strings or numbers, thus representing literal resources, or
      • by ‘local’ identifiers, thus representing ‘anonymous’ or ‘blank’ nodes
    • whose arcs correspond to the statements' properties
  • Properties are also called ‘predicates’ (statement analogy)
  • Blank nodes commonly used to aggregate or group statements
    • e.g., in containers or collections
    • or for n-ary relations

Preliminaries—RDF Schema

  • RDF/S allows so-called RDF Schemas (or ontologies) similar to object-oriented class hierarchies or taxonomies
  • Inheritance model of RDF/S exhibits the following peculiarities:
    • same resource may be classified in different, unrelated classes
    • class hierarchy may be cyclic → all classes on cycle equivalent
    • properties are first-class
      • range and domain are associated with a property, rather than specifying which properties a class can carry
  • Inference rules are used to define the semantics (or entailment) of an RDF/S schema
    • e.g., transitivity of the class hierarchy or
    • inferred type of an untyped resource in the domain of a property

Preliminaries—Predefined Properties and Classes in RDF/S

  • Specific properties are predefined in the RDF and RDFS recommendations, e.g.
    • rdf:type (type of resources)
    • rdfs:subClassOf (class-subclass relationships between subjects/objects)
    • rdfs:subPropertyOf (property-subproperty relationships between properties)
  • RDFS uses ‘meta’-classes, e.g.,
    • rdfs:Class (the class of all classes)
    • rdf:Property (the class of all properties)
  • All RDF resources and properties are treated uniformly
    • e.g., classes may be instances of other classes
    • e.g., there is no fixed limit on the number of meta-schema layers
    • e.g., same resource may be property, class, and instance

Preliminaries—RDF Serialization

  • Multitude of serialization formats for RDF
  • “Official” W3C serialization format: RDF/XML
    • XML-based
    • high degree of variability, presumably to make data look more "natural"
    • hard to process with XML tools that are not RDF-aware, due to this variability
  • Recently many alternative formats
    • Text-based: N3, N-Triples, Turtle
    • XML-based: Simplified RDF/XML, XMP, TriX, RXR

Preliminaries—RDF Serialization

Figure: History of RDF serialization formats

Preliminaries—Classification-Based Book Recommender

Sample data represented as a (simplified) RDF graph: some facts about Bellum Civile and an excerpt of a topic ontology.

Preliminaries—Representation in Turtle RDF

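@prefix :     <http://example.org/books#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
:Writing a rdfs:Class ;   rdfs:label "Writing" .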
:Novel   a rdfs:Class ;   rdfs:label "Novel" ;
         rdfs:subClassOf :Writing .
:Essay   a rdfs:Class ;   rdfs:label "Essay" ;
         rdfs:subClassOf :Writing .
:Historical_Essay a rdfs:Class ;
         rdfs:label "Historical Essay"; rdfs:subClassOf :Essay.
:Historical_Novel a rdfs:Class ;
         rdfs:label "Historical Novel" ;
         rdfs:subClassOf :Novel ;       rdfs:subClassOf :Essay .
:author  a rdf:Property ;
         rdfs:domain :Writing ;         rdfs:range foaf:Person .
:translator a rdf:Property ;
         rdfs:domain :Writing ;         rdfs:range foaf:Person .
_:b1     a :Historical_Novel ;
         :title "The First Man in Rome" ;
         :year  "1990"^^xsd:gYear ;
         :author [foaf:name "Colleen McCullough"] .
_:b2     a :Historical_Essay ;
         :title "Bellum Civile" ;
         :author [foaf:name "Julius Caesar"] ;
         :author [foaf:name "Aulus Hirtius"] ;
         :translator [foaf:name "J. M. Carter"] .

Preliminaries—Semantics

  • Multiple inheritance: "Historical Novel" both "Essay" and "Novel"
  • Scalar data typed by XML Schema datatype expressions
    • a specialized mechanism is needed, as literals cannot be the subject of rdf:type statements
  • Formal meaning of RDF data based on entailment rules (inference)
    • obviously still describes only a part of "human" semantics
  • RDF query languages should take entailed data into consideration when answering questions

Preliminaries—Sample Queries 1-3

  • Nine queries based on classification in [67] and [34]
  • Selection queries retrieve parts of the data based on content, structure, or position
    Query 1: “Select all essays together with their authors (i.e., author items and corresponding names)”
  • Extraction queries extract substructures
    • can be considered a special form of selection query whose results have unknown extent
    Query 2: “Select all data items with any relation to the book titled ‘Bellum Civile’.”
  • Reduction queries specify which parts of the data not to include in the answer:
    Query 3: “Select all data items except ontology information and translators from the book recommender system.”

Preliminaries—Sample Query 4: Restructuring

  • Restructuring queries: In Web applications, it is often desirable to restructure data, possibly into different formats or serializations:
    Query 4: “Invert the relation author (from a book to an author) into a relation authored (from an author to a book).”
  • Restructuring is needed in the RDF context, e.g., for reification (i.e., statements about statements)
    • a single statement is replaced by four new statements: one typing it as rdf:Statement and three specifying its subject, predicate, and object separately
    • e.g., Julius Caesar is author of Bellum Civile becomes
      _:1 a rdf:Statement .        _:1 rdf:subject Julius Caesar .
      _:1 rdf:predicate author   . _:1 rdf:object Bellum Civile .
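The reification step above can be sketched over triples encoded as Python tuples of strings. The tuple encoding, the `reify` function, and the statement node label `_:1` are illustrative assumptions, not constructs of any surveyed language.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def reify(triple: Triple, statement_node: str) -> List[Triple]:
    """Replace one statement by the four statements that describe it."""
    subject, predicate, obj = triple
    return [
        (statement_node, "rdf:type", "rdf:Statement"),
        (statement_node, "rdf:subject", subject),
        (statement_node, "rdf:predicate", predicate),
        (statement_node, "rdf:object", obj),
    ]


# "Julius Caesar is author of Bellum Civile", using the informal names above
for statement in reify(("Julius Caesar", "author", "Bellum Civile"), "_:1"):
    print(statement)
```

The original triple is not kept; a query that needs both views must add it back explicitly.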
      

Preliminaries—Sample Query 5-6: Aggregation

  • Aggregation queries are a special form of restructuring that aggregates several data items into one new data item
  • Value aggregation aggregates multiple (usually atomic) values into a single (atomic) value
    Query 5: “Return the last year in which an author with name ‘Julius Caesar’ published something.”
    • the previous query can, e.g., be expressed using max(·) aggregation
  • Structural aggregation aggregates over (structurally) related information, e.g., count(·)
    Query 6: “Return each of the subclasses of ‘Writing’, together with the average number of authors per publication of that subclass.”

Preliminaries—Sample Query 7-9: Combination and Inference

  • Combination queries: combine (or join) information not explicitly connected (e.g., different sources or substructures)
    Query 7: “Combine the information about the book titled ‘The Civil War’ and authored by ‘Julius Caesar’ with the information about the book with identifier bellum_civile.”
  • Inference queries: also combine data, but use it to infer data that was not previously explicit
    Query 9: “Return the co-author relation between two persons that stand in author relationships with the same book.”
    • If the books entitled ‘Bellum Civile’ and ‘The Civil War’ are the same book, and if ‘Julius Caesar’ is an author of ‘Bellum Civile’, then ‘Julius Caesar’ is also an author of ‘The Civil War’.
    • RDF/S entailment based on inference, e.g.,
      Query 8: “Return the transitive closure of the subClassOf relation.”
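Query 8 is a recursive computation; a naive fixpoint sketch in Python makes the recursion explicit. The pair encoding and the short class names (without namespace) are simplifying assumptions; the `rdfs:subClassOf` pairs are taken from the sample data.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Naive fixpoint: add (a, d) for every (a, b), (b, d) until nothing changes."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


# rdfs:subClassOf statements of the book recommender sample data
sub_class_of = {
    ("Novel", "Writing"),
    ("Essay", "Writing"),
    ("Historical_Essay", "Essay"),
    ("Historical_Novel", "Novel"),
    ("Historical_Novel", "Essay"),
}
for sub, sup in sorted(transitive_closure(sub_class_of)):
    print(sub, "subClassOf", sup)
```

The loop terminates because the number of possible pairs is finite; a language without recursion or fixpoints cannot express this for hierarchies of arbitrary depth.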

3 The RDF Query Language Families

Overview
  1. Relational Query Languages
    1. SPARQL
    2. RQL
    3. TRIPLE
    4. Xcerpt
  2. Reactive Rule Query Languages
    1. Algae
  3. Navigational Access Query Languages
    1. Versa

Chronological Overview

Chronological overview of RDF query languages: languages are clustered as in the following sections, and influences between them are graphically depicted.

Figure 2: Chronological Overview of RDF Query Languages

Relational Query Languages:
SPARQL, RQL, TRIPLE, and Xcerpt

  • Basic query constructs similar to relational selection-projection-join (SPJ) queries
  • But: different presentation of complex queries:
    • SPARQL/RQL: conjunctive n-ary queries over RDF triples;
      variables are used to correlate properties of the same resource and related resources
    • TRIPLE/Xcerpt: (structural) patterns outlining the shape of the sought-for data items;
      structure (nesting) is used to correlate properties of the same resource and related resources
  • Languages vary from
    • conservative extensions of SPJ queries to
    • languages with a substantial number of novel constructs aiming at more adequate support for RDF specificities

The SPARQL Family

  • SPARQL (for SPARQL Protocol And RDF Query Language) [84]:
    W3C candidate recommendation since April 2006
  • Influenced by SquishQL [76] and RDQL [91], slightly from SeRQL [22] and RQL [57]
  • Syntactic resemblance to (basic) SQL and the Turtle [22] RDF syntax
  • SPARQL in four points:
    • Graph patterns as conjunctions of RDF triples in WHERE clause
    • Variables used for selection, joins, and complex conditions
    • SELECT defines the answer variables of the query
    • CONSTRUCT alternatively specifies a graph pattern that is instantiated against the variable bindings
PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX books: <http://example.org/books#>
SELECT ?essay ?author ?authorName
FROM   <http://example.org/books>
WHERE  { ?essay rdf:type books:Essay .
         ?essay books:author ?author .
         ?author books:name ?authorName . }
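How a WHERE graph pattern is answered can be sketched as a join of triple patterns, where repeated variables act as implicit join conditions. This is a naive nested-loop sketch, not how any particular SPARQL engine works; the tuple encoding and the `?`-prefix convention for variables are assumptions, and the `rdf:type books:Essay` triple is assumed to be already entailed from `books:Historical_Essay` (no RDFS reasoning happens here).

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple; None if impossible."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # query variable
            if term in result and result[term] != value:
                return None  # implicit join failed: variable bound differently
            result[term] = value
        elif term != value:  # constants must match exactly
            return None
    return result


def evaluate_bgp(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns, joining pattern by pattern."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in graph:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = [
    ("_:b2", "rdf:type", "books:Essay"),  # assumed entailed
    ("_:b2", "books:author", "_:a1"),
    ("_:b2", "books:author", "_:a2"),
    ("_:a1", "books:name", "Julius Caesar"),
    ("_:a2", "books:name", "Aulus Hirtius"),
]
query1 = [
    ("?essay", "rdf:type", "books:Essay"),
    ("?essay", "books:author", "?author"),
    ("?author", "books:name", "?authorName"),
]
for solution in evaluate_bgp(query1, graph):
    print(solution["?essay"], solution["?author"], solution["?authorName"])
```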

SPARQL—Basic Constructs and Syntax

Graph Pattern Syntax
  • Ground patterns are written in the Turtle [22] RDF serialization format
  • Predicate-object lists share subject over several triples (predicates and objects separated by ;)
  • Object lists abbreviate several triples sharing both subject and predicate (objects separated by ,)
  • Variables marked using ? or $ prefix
SELECT-FROM-WHERE Clause
  • SELECT clause specifies list of answer variables
  • alternatively CONSTRUCT specifies graph pattern instantiated against answer variable bindings
  • FROM specifies the URL(s) of the data graph(s) to be queried
  • WHERE specifies the graph pattern

SPARQL—Conditions, FILTER Clause

  • FILTER contains conditional expression over query variables
  • Similar to SQL's or XQuery's WHERE, but
    • Joins are usually implicit (multiple variable occurrence) in triple patterns
    • Literals and b-nodes can be inlined into triple patterns
  • Types of conditional operators:
    • typed boolean comparators (=, <, ...)
    • instance tests for all RDF node kinds
    • regular expressions on Strings or URIs
PREFIX books: <http://example.org/books#>
SELECT ?person
FROM   <http://example.org/books>
WHERE  { ?book books:author ?person .
         ?book books:title ?title .
         FILTER (?title = 'Bellum Civile') }

SPARQL—Optional Triples

  • Optional data := reported in answer if present, but presence not required
    • similar to SQL (left) outer join
  • E.g., “find all books and report in addition their translators, if they have any”

What is the meaning of the following query?

PREFIX   books: <http://example.org/books#>
PREFIX   foaf:  <http://xmlns.com/foaf/0.1/>
SELECT   ?writing ?translator ?translatorName
FROM     <http://example.org/books>
WHERE    { ?writing books:author _:Author . 
           OPTIONAL { ?writing books:translator ?translator } .
           OPTIONAL { ?translator foaf:name ?translatorName } }
  • OPTIONAL is left-associative
  • A solution of A OPTIONAL B is a solution of either A ∧ B or of A ∧ ¬B

“Find all writings that have an author, and also return their translator and the translator's name if they have a translator. If they have no translator, return all pairs of subjects and objects of triples with predicate foaf:name.”
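The quirk can be reproduced with a small left-outer-join sketch. It assumes each optional group is a single triple pattern and evaluates it with the current binding substituted, which for such simple patterns should coincide with SPARQL's left-join semantics; the data is a reduced, hypothetical excerpt of the sample data using `foaf:name` throughout, and `?_author` stands in for the blank node `_:Author`.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple; None if incompatible."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.get(term, value) != value:
                return None
            result[term] = value
        elif term != value:
            return None
    return result


def optional(solutions: List[Binding], pattern: Triple, graph: List[Triple]) -> List[Binding]:
    """Left outer join: extend each solution if possible, else keep it unchanged."""
    result: List[Binding] = []
    for binding in solutions:
        extended = []
        for triple in graph:
            candidate = match(pattern, triple, binding)
            if candidate is not None:
                extended.append(candidate)
        result.extend(extended if extended else [binding])
    return result


graph = [
    ("_:b1", "books:author", "_:a0"),
    ("_:a0", "foaf:name", "Colleen McCullough"),
    ("_:b2", "books:author", "_:a1"),
    ("_:a1", "foaf:name", "Julius Caesar"),
    ("_:b2", "books:translator", "_:t1"),
    ("_:t1", "foaf:name", "J. M. Carter"),
]
required = ("?writing", "books:author", "?_author")
solutions = [b for b in (match(required, t, {}) for t in graph) if b is not None]
solutions = optional(solutions, ("?writing", "books:translator", "?translator"), graph)
solutions = optional(solutions, ("?translator", "foaf:name", "?translatorName"), graph)
for s in solutions:
    print(s["?writing"], s.get("?translator"), s.get("?translatorName"))
```

Because `?translator` is still unbound for `_:b1` when the second OPTIONAL is evaluated, that writing is paired with every `foaf:name` triple, including its own author's name.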

SPARQL—Optional and Negation

  • The semantics of OPTIONAL uses negation
  • Consequently, we can use OPTIONAL to implement the otherwise missing negation in SPARQL
PREFIX   books: <http://example.org/books#>
SELECT   ?writing
FROM     <http://example.org/books>
WHERE    { ?writing books:author _:Author . 
           OPTIONAL { ?writing books:translator ?translator } .
           FILTER (!bound(?translator)) }

“Return all resources with an author that have no translator”, i.e., those for which the optional triple pattern yields no ?translator binding

  • Semantics is (obviously) negation-as-failure
  • Although SPARQL has UNION, OPTIONAL cannot be rewritten into a UNION (as is common in SQL or XQuery), because SPARQL lacks negation as a first-class concept
  • Is there a justification for this ‘implicit’ negation?
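The OPTIONAL/`!bound` idiom amounts to negation as failure; a hand-coded Python sketch of this one query shows the logic directly. The function name and the reduced data are illustrative assumptions; the result is returned as a set, i.e., with duplicates removed as under SELECT DISTINCT.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def writings_without_translator(graph: List[Triple]) -> Set[str]:
    """Emulate ?w books:author _:A . OPTIONAL {?w books:translator ?t} FILTER (!bound(?t))."""
    kept: Set[str] = set()
    for subject, predicate, _ in graph:
        if predicate != "books:author":
            continue
        has_translator = any(
            s == subject and p == "books:translator" for s, p, _o in graph
        )
        # OPTIONAL leaves ?t unbound exactly when no translator triple exists;
        # FILTER (!bound(?t)) keeps only those solutions: negation as failure.
        if not has_translator:
            kept.add(subject)
    return kept  # a set: duplicates disappear as with SELECT DISTINCT


graph = [
    ("_:b1", "books:author", "_:a0"),
    ("_:b2", "books:author", "_:a1"),
    ("_:b2", "books:author", "_:a2"),
    ("_:b2", "books:translator", "_:t1"),
]
print(writings_without_translator(graph))
```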

SPARQL—Graph Construction

  • CONSTRUCT clause specifies graph template (syntactically like graph pattern)
    • instantiated against solutions to the remainder of the query
    • variable occurrences are replaced with bindings for each solution tuple
    • for an optional variable x, triple patterns containing x are omitted if x is not bound in the solution
    • B-nodes are instantiated separately for each solution
  • Range restriction: all variables in CONSTRUCT must also occur in remainder of query
PREFIX    books: <http://example.org/books#>
CONSTRUCT {?x books:co-author ?y}
FROM      <http://example.org/books>
WHERE     { ?book books:author ?x .
            ?book books:author ?y .
            FILTER (?x != ?y) }
  • Neither grouping nor aggregation is supported
    • no arbitrary container or collection construction
    • no construction of RDF representations of n-ary relations
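Template instantiation for the co-author query can be sketched as follows: each solution of the WHERE part yields one instance of the template, and the result is a set of triples. The `ex:` author identifiers are hypothetical stand-ins for the blank-node authors of the sample data.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def construct_co_authors(graph: List[Triple]) -> Set[Triple]:
    """Instantiate the template {?x books:co-author ?y} for every solution."""
    authors_of: Dict[str, List[str]] = {}
    for subject, predicate, obj in graph:
        if predicate == "books:author":
            authors_of.setdefault(subject, []).append(obj)
    result: Set[Triple] = set()  # the constructed graph is a set of triples
    for authors in authors_of.values():
        for x in authors:
            for y in authors:
                if x != y:  # FILTER (?x != ?y)
                    result.add((x, "books:co-author", y))
    return result


graph = [
    ("_:b1", "books:author", "ex:mccullough"),
    ("_:b2", "books:author", "ex:caesar"),
    ("_:b2", "books:author", "ex:hirtius"),
]
for triple in sorted(construct_co_authors(graph)):
    print(triple)
```

Note that the template produces one triple per solution; nothing in it can group solutions, which is why containers or n-ary relation nodes cannot be built this way.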

SPARQL—Extraction Queries

  • Extraction queries only expressible if the result extent is of fixed size
  • Extraction of subgraphs with unknown extent impossible
  • Approximation of Query 2 with only the direct properties:
PREFIX books: <http://example.org/books#>
SELECT ?essay ?property ?propertyValue
FROM   <http://example.org/books>
WHERE  {?essay books:title "Bellum Civile" .
        OPTIONAL { ?essay ?property ?propertyValue } }
  • Notice: no syntactic separation between property and node variables as, e.g., in RQL
  • DESCRIBE: specialized form of extraction query
    • retrieval of resource "descriptions"
    • semantics left unspecified (the description is determined by the query service), but cf. concise bounded descriptions [93]
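Full extraction for Query 2 needs traversal to arbitrary depth, which fixed-size patterns cannot express. A breadth-first sketch in Python, assuming "any relation" means reachability via outgoing arcs (concise bounded descriptions, by contrast, recurse only through blank nodes); the data is a reduced excerpt of the sample data.

```python
from collections import deque
from typing import List, Tuple

Triple = Tuple[str, str, str]


def reachable_subgraph(graph: List[Triple], start: str) -> List[Triple]:
    """All triples reachable from start via outgoing arcs, at any depth."""
    visited = {start}
    queue = deque([start])
    result: List[Triple] = []
    while queue:
        node = queue.popleft()
        for triple in graph:
            subject, _, obj = triple
            if subject == node:
                result.append(triple)
                if obj not in visited:  # guards against cycles in the graph
                    visited.add(obj)
                    queue.append(obj)
    return result


graph = [
    ("_:b1", "books:title", "The First Man in Rome"),
    ("_:b2", "books:title", "Bellum Civile"),
    ("_:b2", "books:author", "_:a1"),
    ("_:a1", "foaf:name", "Julius Caesar"),
    ("_:b2", "books:translator", "_:t1"),
    ("_:t1", "foaf:name", "J. M. Carter"),
]
book = next(s for s, p, o in graph if p == "books:title" and o == "Bellum Civile")
for triple in reachable_subgraph(graph, book):
    print(triple)
```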

SPARQL—Named Graphs

  • Named graph := access to multiple graphs at the same SPARQL "endpoint"
  • Graphs are identified by IRIs and thus can be subject of statements
  • Introduced in TriQL [16]
  • Provides a scoping mechanism for SPARQL variables and negation
  • Triple patterns can be restricted to solutions from a specific named graph
    • in WHERE clause using GRAPH keyword
  • SPARQL query matched against an RDF dataset consisting of
    • a default graph (e.g., from the merge of all graphs specified in FROM clauses)
    • a set of (IRI, graph) pairs (e.g., from FROM NAMED clauses)

Limitations of the SPARQL Family

  • Queries cannot be composed or nested
  • Neither aggregation nor grouping expressible
    • no arbitrary container or collection construction
    • no construction of RDF representations of n-ary relations
  • Negation only in FILTER clauses using OPTIONAL variables
  • Neither recursion nor arbitrary-length traversal operators
    • no transitive-closure type inference
    • no extraction of subgraphs of unknown extent
  • → Relational spirit, mostly a subset of SQL on a single ternary relation
    • + likely easy to use and learn (except for some remaining "quirks")
    • – some limitations seem difficult to justify and comprehend
    • + almost trivial to implement on top of relational DBS
    • – surprisingly weak expressiveness

The RQL Family

  • (Early) RDF query languages: RQL [55] and SeRQL [20], only RQL in the following
  • Focus: combination of data and schema querying
  • Clear separation of three layers in RDF/S:
    • data—schemas—meta-schemas specifying meta-classes such as rdfs:Class
  • Price: slightly non-standard data model, viz. no cycles in subsumption hierarchy
SELECT X, Y FROM {X;books:Essay}books:author.books:authorName{Y}, 
                 {X}books:title{T}
WHERE  T = "Bellum Civile"
USING  NAMESPACE books = &http://example.org/books#
  • SQL-style syntax like SPARQL, but
    • FROM contains triple patterns, no literals
    • WHERE additional conditions (FILTER in SPARQL) including literal restrictions
    • Basic path expressions adorned with variables

RQL—Basic Schema Queries

  • Schema queries: query relations between schema or meta-schema elements
    • e.g., subClassOf(books:Writing) retrieves sub-classes of books:Writing
    • e.g., topclass(books:Historical_Essay) returns top-level of subsumption hierarchy
    • e.g., SELECT X, Y FROM Class{X}, subClassOf(X){Y} for Query 8
  • Domain and range of author property
    • using class variables:
      SELECT $C1, $C2  FROM {$C1}books:author{$C2}
      
    • using a type constraint:
      SELECT C1, C2  FROM Class{C1}, Class{C2}, {;C1}books:author{;C2}  
      
    • without class variables or type constraints:
      SELECT C1, C2  FROM subClassOf(domain(books:author)){C1}, 
                          subClassOf(range(books:author)){C2}
      
    • nearly equivalent except the last only returns proper subclasses

RQL—Data Queries

  • As in SPARQL: Triple (here called graph) patterns specify the shape of the data
  • Additionally: basic path expressions adorned with variables, e.g., Query 1:
    SELECT X, Y, Z FROM {X;books:Essay}books:author{Y}.books:authorName{Z}
    USING  NAMESPACE books = &http://example.org/books#
    
    • {X;books:Essay} limits bindings for X to type books:Essay
  • Conditional expressions in WHERE clauses (even for basic literals)
    SELECT X, Y FROM {X;books:Essay}books:author.books:authorName{Y}, 
                     {X}books:title{T}
    WHERE  T = "Bellum Civile"
    USING  NAMESPACE books = &http://example.org/books#
    
  • Extent queries: for classes, e.g., books:Writing{X}, and for properties, e.g., books:author
    • both also return resources in subclasses and subproperties, respectively
    • direct extent can be queried by prefixing with ^, e.g., ^books:Writing{X}
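The difference between a class extent and its direct extent (`^`) can be sketched in Python: the extent collects instances of the class and all its transitive subclasses, the direct extent only instances typed with the class itself. The function names are hypothetical; the triples are taken from the sample data.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def with_subclasses(graph: List[Triple], cls: str) -> Set[str]:
    """cls together with all its direct and indirect subclasses."""
    classes = {cls}
    changed = True
    while changed:
        changed = False
        for subject, predicate, obj in graph:
            if predicate == "rdfs:subClassOf" and obj in classes and subject not in classes:
                classes.add(subject)
                changed = True
    return classes


def extent(graph: List[Triple], cls: str, direct: bool = False) -> Set[str]:
    """Instances of cls; with direct=True only those typed with cls itself (like ^cls)."""
    classes = {cls} if direct else with_subclasses(graph, cls)
    return {s for s, p, o in graph if p == "rdf:type" and o in classes}


graph = [
    ("books:Novel", "rdfs:subClassOf", "books:Writing"),
    ("books:Essay", "rdfs:subClassOf", "books:Writing"),
    ("books:Historical_Essay", "rdfs:subClassOf", "books:Essay"),
    ("books:Historical_Novel", "rdfs:subClassOf", "books:Novel"),
    ("books:Historical_Novel", "rdfs:subClassOf", "books:Essay"),
    ("_:b1", "rdf:type", "books:Historical_Novel"),
    ("_:b2", "rdf:type", "books:Historical_Essay"),
]
print(sorted(extent(graph, "books:Writing")))
print(sorted(extent(graph, "books:Writing", direct=True)))
```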

RQL—Mixed Schema and Data Queries

  • Data and schema queries can be mixed in all manners
  • Return a “Description” of a resource:
SELECT $C, ( SELECT @P, Y  FROM {Z ; ^$D} ^@P {Y}
             WHERE  Z = X and $D = $C )
FROM   ^$C {X}, {X}books:title{T}  WHERE T = "Bellum Civile"
USING  NAMESPACE books = &http://example.org/books#
  • Returns all classes in which the resource is classified, together with the properties and values associated with it
  • Example for a grouping query expressed by query nesting
  • Property variables: used to query properties, prefixed by @
SELECT @P, $V  FROM {;books:Writing}@P{$V}
USING  NAMESPACE books = &http://example.org/books#

RQL—Sample Queries

  • Query 1 see above, Query 2 not expressible (no extraction of subgraphs of unknown extent)
  • Reduction queries like Query 3
SELECT S, @P, O
FROM   (Resources minus (SELECT T FROM {B}books:translator{T})){S}, 
       (Resources minus (SELECT T FROM {B}books:translator{T})){O},
       {S}@P{O}
  • Aggregation queries like Query 5
max(SELECT Y 
    FROM   {B;books:Writing}books:author.books:authorName{A},
           {B}books:pubYear{Y}
    WHERE  A = "Julius Caesar")
  • Inference queries without recursion, like Query 9, can be expressed in RVL [66]
CREATE NAMESPACE mybooks = &http://example.org/books-rdfs-extension#
VIEW   mybooks:co-author(A1, A2)
FROM   {Z}books:author{A1}, {Z}books:author{A2}  WHERE A1 != A2
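The value aggregation of Query 5 can be sketched as a selection followed by `max`. This sketch uses the `books:year` property of the Turtle sample data (the RQL query above uses `books:pubYear`), and the data is a reduced excerpt. Since the sample data records no year for Bellum Civile, the Julius Caesar query yields no value here.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def last_year(graph: List[Triple], author_name: str) -> Optional[int]:
    """max(·) over publication years of writings with an author of the given name."""
    authors = {s for s, p, o in graph if p == "foaf:name" and o == author_name}
    writings = {s for s, p, o in graph if p == "books:author" and o in authors}
    years = [int(o) for s, p, o in graph if p == "books:year" and s in writings]
    return max(years) if years else None  # empty input: no maximum exists


graph = [
    ("_:b1", "books:year", "1990"),
    ("_:b1", "books:author", "_:a0"),
    ("_:a0", "foaf:name", "Colleen McCullough"),
    ("_:b2", "books:author", "_:a1"),
    ("_:a1", "foaf:name", "Julius Caesar"),
]
print(last_year(graph, "Colleen McCullough"))
print(last_year(graph, "Julius Caesar"))
```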

RQL—Limitations and Critique

  • + far more expressive than most SQL-style RDF query languages such as SPARQL or RDQL
    • only transitive closure of arbitrary relations missing to implement all sample queries
  • + Strong type system and expressive schema queries
  • – Criticized for large number of features and choice of syntactic constructs
    • SeRQL [22]: more accessible derivative of RQL
      • additional syntactic shorthands (optional, object-property and object lists)
      • drops most support for typing to reduce complexity
      • drops support for many advanced query constructs for set operations, quantification, aggregation, etc.
    • eRQL [*]: even more radical simplification
      • Information retrieval-style keyword interface
  • – Original proposal lacked graph construction ability, but rectified by RVL

TRIPLE

  • TRIPLE [51, 92]: first rule-based query language for RDF
  • Syntax and semantics close to F-Logic [58], convenient for querying irregular data
    • Other approaches based on F-Logic: XPathLog [75], Ontobroker [*]
  • TRIPLE designed to address two weaknesses of previous approaches:
    • only predefined constructs to express RDF/S semantics (lack of extensibility)
    • lack of formal semantics
  • Horn logic rules in F-Logic syntax
    • used to implement, e.g., RDF/S entailment
    • inherits much of Logic Programming's formal semantics
  • All sample queries can be (more or less obviously) expressed in TRIPLE

TRIPLE—Comparison to Logic Programming

  • TRIPLE supports resources identified by URIs
  • RDF statements are represented in TRIPLE by slots
    • allows grouping and nesting of statements
    • path expressions inspired by [43] for traversal of several properties
  • Concise support for reified statements (enclosed in angle brackets <·>):
    Julius_Caesar[believes
        →<Junius_Brutus[friend-of → Julius_Caesar]>]
  • Module notion to specify a ‘model’ in which a statement, or an atom, is true
  • Explicit quantification of all variables.
rdf   := 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'.
books := 'http://example.org/books#'.
FORALL B, A, AN  result(B, A, AN) ←
     B[rdf:type  books:Essay; 
       books:author  A[books:authorName -> AN]]@'http://example.org/books'.

TRIPLE—RDF/S Semantics through Rules

  • Sets of TRIPLE rules to implement, e.g., RDF/S entailment
  • [93] gives the following rules to implement RDF/S semantics:
rdf     := 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'. 
rdfs    := 'http://www.w3.org/2000/01/rdf-schema#'. 
FORALL Mdl @rdfschema(Mdl) {
  transitive(rdfs:subPropertyOf).    transitive(rdfs:subClassOf). 
  FORALL O,P,V O[P→V] ← O[P→V]@Mdl. 
  FORALL O,P,V O[P→V] ← 
               EXISTS S (S[rdfs:subPropertyOf→P] AND O[S→V]). 
  FORALL O,P,V O[P→V] ← 
               transitive(P) AND EXISTS W (O[P→W] AND W[P→V]). 
  FORALL O,T   O[rdf:type→T] ← 
               EXISTS S (S[rdfs:subClassOf→T] AND O[rdf:type→S]). }
  • Together with the following rules for domain and range, full RDF/S entailment is supported in TRIPLE:
  FORALL S,T  S[rdf:type→T] ←
              EXISTS P, O (S[P→O] AND P[rdfs:domain→T]).
  FORALL O,T  O[rdf:type→T] ←
              EXISTS P, S (S[P→O] AND P[rdfs:range→T]).
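Evaluating such rule sets bottom-up means applying them until no new triple appears. A naive forward-chaining sketch in Python of the rules above (transitivity, subproperty inheritance, typing via subclass, domain, range); this is not TRIPLE's actual evaluation strategy, and the data is a reduced excerpt of the sample data.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]
TRANSITIVE = {"rdfs:subClassOf", "rdfs:subPropertyOf"}


def rdfs_closure(graph: Iterable[Triple]) -> Set[Triple]:
    """Apply the entailment rules until no new triple is derived (naive fixpoint)."""
    closure = set(graph)
    while True:
        derived: Set[Triple] = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                if p == p2 and p in TRANSITIVE and o == s2:
                    derived.add((s, p, o2))  # transitive(P): O[P→W] AND W[P→V]
                if p2 == "rdfs:subPropertyOf" and p == s2:
                    derived.add((s, o2, o))  # S[subPropertyOf→P] AND O[S→V]
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    derived.add((s, "rdf:type", o2))  # typing via subClassOf
                if p2 == "rdfs:domain" and p == s2:
                    derived.add((s, "rdf:type", o2))  # domain rule
                if p2 == "rdfs:range" and p == s2:
                    derived.add((o, "rdf:type", o2))  # range rule
        if derived <= closure:
            return closure
        closure |= derived


graph = [
    ("books:Essay", "rdfs:subClassOf", "books:Writing"),
    ("books:Historical_Essay", "rdfs:subClassOf", "books:Essay"),
    ("books:translator", "rdfs:domain", "books:Writing"),
    ("books:translator", "rdfs:range", "foaf:Person"),
    ("_:b2", "rdf:type", "books:Historical_Essay"),
    ("_:b2", "books:translator", "_:t1"),
]
entailed = rdfs_closure(graph) - set(graph)
for triple in sorted(entailed):
    print(triple)
```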

TRIPLE—Limitations and Critique

  • + Rules and views give strong reasoning capabilities
  • + High expressiveness through path expressions and rules
  • + Extensible and modular entailment semantics
  • o F-Logic syntax unfamiliar but well suited for irregular data
  • – Scalability of existing implementations very limited
  • – Implementation on top of existing DB technology far from obvious
  • – Support for non-RDF data very limited or awkward

Xcerpt

  • Xcerpt [85, 86] is a versatile, pattern-oriented, rule-based query language for the (Semantic) Web
  • Versatile: access to semi-structured data in different formats (e.g., XML, HTML, RDF, Topic Maps) in the same query program with common language constructs
  • Pattern-oriented: patterns are Xcerpt's core query construct
    • QBE-style patterns (or examples) of data with variables to indicate sought-for data items or joins
    • may contain (regular) path traversal expressions
  • Rule-based: Xcerpt uses rules to implement reasoning as well as abstract and group common query tasks
  • Incomplete information: Xcerpt's patterns and constructs are tailored to queries where only parts of the schema are known or required, e.g., optional construct
  • Aims to couple high expressiveness with principled and intuitive query syntax and semantics
    • all sample queries can be expressed in Xcerpt

Xcerpt—Relational (Triple) View

  • Two views [17] on RDF data in Xcerpt
    • plain triples with explicit join for structure traversal
    • proper graph with structure traversal following shape of pattern or traversal expressions
  • Query 1 on Xcerpt's triple view: conjunction of triple patterns
GOAL
  result [
    all essay [
      id [ var Essay ], 
      all author [ 
        id [ var Author ], all name [ var AuthorName ]
      ] ] ]
FROM
  and(
    RDFS-TRIPLE [ var Essay, rdf:type, books:Essay ],
    RDF-TRIPLE  [ var Essay, books:author, var Author ],
    RDF-TRIPLE  [ var Author, books:authorName, var AuthorName ] )
END
  • RDFS-TRIPLE instead of RDF-TRIPLE: use of view over RDF/S entailment graph
  • Construction: arbitrary RDF or XML data
  • Provenance information may be queried as well (cf. named graphs, quads)

Xcerpt—Graph View

  • Query 1 on Xcerpt's RDF graph view: the structure of the query approximates the structure of the data
GOAL
  result [
    all essay [
      id [ var Essay ], 
      all author [ 
        id [ var Author ], all name [ var AuthorName ]
      ] ] ]
FROM
  RDFS-GRAPH {{ 
    var Essay {{
      rdf:type {{ books:Essay }},
      books:author {{
        var Author {{ 
          books:authorName {{ var AuthorName }} }}
      }} }} }}
END
  • Structural joins are hidden (implicit conjunction)
    • similar to path expressions but extended to tree or graph patterns
  • Triples are represented similarly to striped RDF/XML
    • as Xcerpt does not allow labeled edges
    • extension of Xcerpt with labeled edges is under consideration

Xcerpt—Rules and Versatility

  • Xcerpt, like TRIPLE, uses deductive rules
    • to realize "procedural abstraction" like views in SQL
    • to provide reasoning capabilities, such as RDF/S entailment
    • to provide versatile, "serialization transparent" data access
  • Example: extract all triples from an RXR [4] document
CONSTRUCT
  RDF-TRIPLE[ var Subject, var Predicate:uri{}, var Object ] 
FROM
  and[
    rxr:graph {{
      rxr:triple {
        var S as rxr:subject{{}},
        rxr:predicate{ attributes{ rxr:uri{ var Predicate } } },
        var O as rxr:object{{}}
      }
    }},
    rxr:NODE2URI[ var S, var Subject ], rxr:NODE2URI[ var O, var Object ] ]
END
  • NODE2URI maps different resource representations to URIs
  • RDF access in Xcerpt is subject of ongoing investigation under the lead of Benedikt Linse

Xcerpt—Visual Querying with visXcerpt

  • visXcerpt [11, 12] is Xcerpt's visual companion language
  • Visual rendering of Xcerpt programs using (mostly) CSS rules
  • Visual editing of programs using AJAX-style template-based editor

Xcerpt—Visual Querying with visXcerpt, Example 1

Figure: Screenshot and explanation of visXcerpt, Xcerpt's visual companion language

Xcerpt—Visual Querying with visXcerpt, Example 2

Second part of screenshot and explanation of visXcerpt, visual companion language for Xcerpt

Xcerpt—Limitations and Critique

  • + High expressiveness, very powerful, yet concise query constructs
  • + Strong rule-based foundation
    • convenient versatile data access & reasoning for, e.g., RDF/S entailment similar to TRIPLE
  • + Dual view of RDF allows concise formulation of both structure- and value-dominated queries
  • + Rich set of constructs specialized to queries in heterogeneous environments with incomplete information about semi-structured data
    • OPTIONAL as in SPARQL but with stronger semantics
    • (regular) path expressions and partial terms for incompleteness in depth and breadth
    • ordered and unordered queries
  • o unique syntax aims to be intuitive and consistent → non-standard look-and-feel
  • – challenging implementation due to high expressiveness and the misalignment of patterns and simulation unification with existing DB technology
  • – implementations still just prototypes (limited scalability, limited robustness)

Reactive Rule Query Languages

Algae: Reactive Rules

  • Algae [83] reactive rule query language
  • Developed as part of the W3C Annotea project
    • on enhancing Web pages with semantic annotations expressed in RDF
  • Two core concepts: actions and (proofs as) answers
    • "Actions" are the directives ask, assert, and fwrule for querying, insertion, and ECA rules
    • Answers are (a) bindings for query variables plus (b) triples from the RDF graph that form justification or proof of the answer
  • Syntax based on N-Triples [46] with some extensions from full N3 [14] and
    • syntax for above mentioned actions (expression in round parentheses)
    • syntax for constraints such as inequalities (enclosed in curly braces)
    ask (?essay books:year ?year {?year >= 62 && ?year < 301} .)
    
read <http://example.org/books> ()
ask (    ?essay  rdf:type         <http://example.org/books#Essay> .
         ?essay  books:author     ?author .
         ?author books:authorName ?authorName )
collect( ?essay, ?author, ?authorName )

Algae—Optional

  • Add to the last query: “also return translators, if there are any”
  • ~ declares the translator triple optional
ask (    ?essay rdf:type          <http://example.org/books#Essay> .
         ?essay books:author      ?author .
         ?author books:authorName  "Julius Caesar" .
         ?essay books:title       ?title .
         ~?essay books:translator ?translator .  )
collect( ?title, ?translator )
?title           ?translator     Proof
"Bellum Civile"  "J. M. Carter"  _:1 rdf:type <http://exam...ks-rdfs#Essay> .
                                 _:1 books:author _:2 .
                                 _:2 books:authorName "Julius Caesar" .
                                 _:1 books:title "Bellum Civile" .
                                 _:1 books:translator "J. M. Carter" .
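Answers that carry their own justification can be sketched by recording, for each solution, the triples used to match it. This is a loose Python model of Algae's answers-with-proofs, not its implementation; the `(pattern, is_optional)` encoding and the `ask` name are assumptions, and `books:Essay` abbreviates the full class URI.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]
Answer = Tuple[Binding, List[Triple]]  # (variable bindings, proof triples)


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple; None if incompatible."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.get(term, value) != value:
                return None
            result[term] = value
        elif term != value:
            return None
    return result


def ask(patterns: List[Tuple[Triple, bool]], graph: List[Triple]) -> List[Answer]:
    """Match (pattern, is_optional) pairs; each answer records the triples it used."""
    answers: List[Answer] = [({}, [])]
    for pattern, is_optional in patterns:
        next_answers: List[Answer] = []
        for binding, proof in answers:
            found = False
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    found = True
                    next_answers.append((extended, proof + [triple]))
            if is_optional and not found:
                next_answers.append((binding, proof))  # like ~: keep answer without it
        answers = next_answers
    return answers


graph = [
    ("_:1", "rdf:type", "books:Essay"),
    ("_:1", "books:author", "_:2"),
    ("_:2", "books:authorName", "Julius Caesar"),
    ("_:1", "books:title", "Bellum Civile"),
    ("_:1", "books:translator", "J. M. Carter"),
]
query = [
    (("?essay", "rdf:type", "books:Essay"), False),
    (("?essay", "books:author", "?author"), False),
    (("?author", "books:authorName", "Julius Caesar"), False),
    (("?essay", "books:title", "?title"), False),
    (("?essay", "books:translator", "?translator"), True),
]
for binding, proof in ask(query, graph):
    print(binding["?title"], binding.get("?translator"))  # collect(?title, ?translator)
    for triple in proof:
        print("   ", triple)
```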

Algae—Limitations and Critique

  • + ECA-rules and insertion for production rule-style reasoning
  • + Selection of optional data items
  • + N3-triples based syntax well-established in RDF community
  • – No selection or extraction of arbitrary extent subgraphs
  • – No graph construction, no aggregation, no grouping
  • – Very limited implementations
  • Other RDF reactive rules query languages:
    • iTQL [2] in Kowari Metastore, RUL [2], an update extension of RQL

Navigational Access Query Languages

Versa: Navigational Access

  • Versa [77,78,79] navigational access query language inspired by XPath
  • Part of the Python-based 4Suite XML and RDF toolkit
  • Can be used as pattern and selection language in 4Suite's XSLT
  • Centered around traversal expressions:
    • forward traversal follows RDF properties, e.g., all() - books:author -> *
      • starting from resources, selected by resource functions such as all(), type()
      • ending in filter expressions over resources and literals
    • backward traversal follows RDF properties in inverse direction, e.g., all() <- rdf:type - *
    • general traversal function traverse() traverses in a specified direction, even transitively, e.g.,
      traverse(books:Writing, rdfs:subClassOf, vtrav:inverse, vtrav:transitive)
      
  • Filter traversal expressions, as with XPath predicates, test a condition without changing the context
    • only forward filters are available
    • type(books:Essay) |-books:title-> eq("Bellum Civile") selects essays
    • type(books:Essay) -books:title-> eq("Bellum Civile") selects title literals
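
A rough Python sketch of the traversal semantics described above: forward and backward traversals return the nodes at the far end of an edge, while a filter traversal keeps the starting nodes. The functions and the resource IDs `ex:bc`/`ex:bg` are hypothetical; `type(books:Essay)` is approximated by a backward traversal over `rdf:type` (no subclass reasoning), and this is not how 4Suite implements Versa.

```python
def anything(_term):
    """Plays the role of Versa's * (accept any node)."""
    return True


def forward(graph, start, prop, test=anything):
    """start - prop -> test: objects reached via prop that pass test."""
    start = set(start)
    return {o for (s, p, o) in graph if s in start and p == prop and test(o)}


def backward(graph, end, prop, test=anything):
    """end <- prop - test: subjects with a prop edge into end that pass test."""
    end = set(end)
    return {s for (s, p, o) in graph if o in end and p == prop and test(s)}


def forward_filter(graph, start, prop, test):
    """start |-prop-> test: keep the start nodes themselves (context unchanged)."""
    return {s for s in start if forward(graph, [s], prop, test)}


def eq(value):
    return lambda term: term == value


graph = [
    ("ex:bc", "rdf:type", "books:Essay"),
    ("ex:bc", "books:title", "Bellum Civile"),
    ("ex:bg", "rdf:type", "books:Essay"),
    ("ex:bg", "books:title", "Bellum Gallicum"),
]
essays = backward(graph, ["books:Essay"], "rdf:type")  # stands in for type(books:Essay)
print(forward_filter(graph, essays, "books:title", eq("Bellum Civile")))  # {'ex:bc'}
print(forward(graph, essays, "books:title", eq("Bellum Civile")))         # {'Bellum Civile'}
```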

Versa—Higher-order Functions I: Iteration

  • Higher-order functions form expressive core of Versa
  • iteration
    • over independent expressions: distribute()
      • evaluate second to n-th argument in context of each result of evaluating the first argument
        distribute(type(books:Essay), ".", 
          "distribute(.-books:author->*, '.', '.-books:authorName->*')")
        
      • as in XPath, . denotes the context node
    • over conjuncts: filter()
      • for each result x of the first argument, all remaining arguments are evaluated as boolean expressions with x as context; x is retained if all of them evaluate to true
      filter(books:Essay <- rdf:type - *, 
        ". - books:title -> eq('Bellum Gallicum')", 
           ". - books:translator -> books:translatorName -> eq('J. M. Carter')")
      

Versa—Higher-order Functions II: Aggregation and Sets

  • Grouping and aggregation
    • e.g., Query 5: the last year in which an author named "Julius Caesar" published
    max(filter(all(), 
         ". - books:author -> books:authorName -> eq('Julius Caesar')" ) 
       - books:year -> *)
    
    • e.g., Query 6: each subclass of Writing together with the average number of authors per publication
    distribute(traverse(books:Writing, rdfs:subClassOf, 
                       vtrav:inverse, vtrav:transitive), 
              ".", 
              "avg(length((. <- rdf:type - *) - books:author -> *))" )
    
  • Set and list functions (e.g., union, difference, sorting, map, filter)
    • e.g., to extract all ontology information:
    difference(all(), 
       union(type(rdfs:Class), 
             union(type(rdf:Property),
                   all() <- books:translator - *) ) )
    

Versa—Limitations and Critique

  • + High expressiveness due to arbitrary traversal expressions and higher-order functions
  • + Convenient and highly expressive traversal expressions
  • – No procedural abstraction mechanism (an embedding in a host language may compensate for this)
  • – Syntax unfamiliar compared to both XPath and SQL-style languages
  • Other navigational access query languages: mostly limited and immature proposals
    • RDF Path [80], RPath [74], RxPath [94], RDFT [38], [62]
  • Proposals using XML query languages on canonicalized XML representations of RDF
    • [86, 87], TreeHugger [95], and RDF Twig [97]

4 Language Constructs Compared

What you should have learned so far: Languages
What is coming up: Constructs

Overview

  1. Selection
    1. Triple Patterns vs. Path Expressions
    2. Closure Subgraph Extraction
    3. Schema-aware Selection
    4. Optional Selection and Disjunctions
    5. RDF Specificities
  2. Construction
    1. Graph Construction vs. Selection-only
    2. Graph Construction
    3. Construction of XML Results
  3. Procedural Abstraction

Selection

  • Selection := ability to characterize
    • subsets of the queried data that match the user's query intent
  • Basic functionality of any query language
  • Relational data:
    • known schema, thus selection via attribute values and relations to other data items
  • Semi-structured data:
    • centered around position of the sought-for data items in the structure (~ “structural relations”)
  • → richer selection constructs.

Triple Patterns vs. Path Expressions

Triple patterns
  • Basic form of selection construct
  • Roughly similar to relational selection-(projection-)join query
  • Consists of a conjunction of one or more query triples, which look like data triples
    • but are extended with query constructs such as variables
Ground triple patterns (~ relational selection) in SPARQL
  ?essay books:title "Bellum Civile"
General triple patterns (~ selection-join query) in SPARQL
  ?essay books:author ?author. 
  ?author foaf:name "Julius Caesar"
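
To make the relational reading concrete, here is a small Python sketch in which each triple pattern acts as a selection over the set of triples and a conjunction of patterns is a natural join on shared variables. The data and resource names (`ex:bc`, `ex:caesar`, `ex:w2`, `ex:cicero`) are hypothetical, and a real engine would reorder and index the joins rather than loop naively.

```python
def bind(pattern, triple):
    """Bindings making pattern equal to triple, or None (a relational selection)."""
    binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def join(left, right):
    """Natural join of two lists of bindings on their shared variables."""
    return [{**l, **r} for l in left for r in right
            if all(l[v] == r[v] for v in l.keys() & r.keys())]


def evaluate(graph, patterns):
    """Conjunction of triple patterns = join of the per-pattern selections."""
    result = [{}]
    for pattern in patterns:
        matches = [b for t in graph if (b := bind(pattern, t)) is not None]
        result = join(result, matches)
    return result


graph = [
    ("ex:bc", "books:title", "Bellum Civile"),
    ("ex:bc", "books:author", "ex:caesar"),
    ("ex:caesar", "foaf:name", "Julius Caesar"),
    ("ex:w2", "books:author", "ex:cicero"),
    ("ex:cicero", "foaf:name", "Cicero"),
]
print(evaluate(graph, [("?essay", "books:title", "Bellum Civile")]))
# [{'?essay': 'ex:bc'}]
print(evaluate(graph, [("?essay", "books:author", "?author"),
                       ("?author", "foaf:name", "Julius Caesar")]))
# [{'?essay': 'ex:bc', '?author': 'ex:caesar'}]
```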

Path Expressions

  • Observation: frequent structural joins occur in both XML and RDF data
    • "traverse" and "correlate" nodes in the RDF graph to select the actually used data items
  • Succinct specification of such traversals is a core issue when accessing tree- and graph-shaped data
    • omitting (existentially quantified) intermediary variables
    • dot-notation in the context of relational (GEM [100]), and object-oriented data (OQL [29])
    • path expressions in the context of object-oriented and semi-structured data (XPath [33]).
  • All of the following classes of path expressions exhibit polynomial time complexity
Path-like omission of intermediary b-nodes in SPARQL
  ?essay books:author [ foaf:name "Julius Caesar" ].
Path expressions in RQL
  {Essay}books:author.foaf:name{A}.

Classes of Path Expressions

Basic path expressions
  • Purely abbreviations for triple patterns as seen in SPARQL or RQL
  • Only fixed length traversals, no added expressiveness over triple patterns only
  • Languages: GEM [100], OQL [29], SPARQL [84], RQL [57]
Unrestricted closure path expressions
  • Arbitrary-length unrestricted traversal, i.e., over any nodes and edges
  • Expressible with linear recursive views
  • Infrequent in RDF, which lacks a dominating hierarchical relation such as the one in XML
  • Languages: XPath [33], Xcerpt desc [88]

Classes of Path Expressions (cont.)

Generalized or regular path expressions
  • Full regular expressions over path traversals (repetition, alternatives)
  • E.g., a*.((b|c).e)+ traverses all paths of
    • arbitrarily many a properties followed by
    • at least one repetition of either a b or a c
    • in each case followed by an e
  • Polynomial time complexity w.r.t. data and query size
  • Expressible with recursive views, but the resulting views are excruciatingly complex
  • Languages: Versa [79], Lorel, Xcerpt qualified desc
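
Regular path expressions stay polynomial because only the set of reachable end nodes is needed, not the paths themselves. A minimal Python sketch, assuming a toy graph with the hypothetical edge labels a, b, c and e, builds a*.((b|c).e)+ from combinators that map node sets to node sets.

```python
def step(graph, label):
    """Traverse one edge labelled `label` from each node in the input set."""
    def run(nodes):
        return frozenset(o for (s, p, o) in graph if s in nodes and p == label)
    return run


def seq(*paths):
    """Concatenation r1.r2...: feed the end nodes of one into the next."""
    def run(nodes):
        for path in paths:
            nodes = path(nodes)
        return nodes
    return run


def alt(*paths):
    """Alternative r1|r2: union of the end nodes."""
    def run(nodes):
        return frozenset().union(*(path(nodes) for path in paths))
    return run


def star(path):
    """Repetition r*: fixpoint in which each node is expanded at most once."""
    def run(nodes):
        reached = set(nodes)
        frontier = set(nodes)
        while frontier:
            frontier = set(path(frozenset(frontier))) - reached
            reached |= frontier
        return frozenset(reached)
    return run


def plus(path):
    """r+ = r.r*"""
    return seq(path, star(path))


graph = [
    ("n0", "a", "n1"), ("n1", "b", "n2"), ("n2", "e", "n3"),
    ("n3", "c", "n4"), ("n4", "e", "n5"), ("n0", "b", "n6"),
]
a, b, c, e = (step(graph, x) for x in "abce")
query = seq(star(a), plus(seq(alt(b, c), e)))  # a*.((b|c).e)+
print(sorted(query(frozenset({"n0"}))))  # ['n3', 'n5']
```

Node n6 is reached by a b edge but has no outgoing e edge, so it is not part of the answer.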

Classes of Path Expressions (cont.)

Summary
  • Path expressions convenient and succinct for traversals
  • Surprisingly few RDF languages consider closure path traversal
  • Essential for efficient and convenient expression of many queries
    • "Incomplete" information on connection paths
    • Variable connection paths

Closure Subgraph Extraction

  • Schema of RDF data often only vaguely known
  • extent of interesting portions of data often not known statically
    • e.g., extract all information on one given book
  • Solvable in languages providing closure path expressions
    • regular path expressions often needed
    • e.g., to extract all related ontology information (connected using only certain relations)

Constructs for Closure Subgraph Extraction

  • Needed in languages with only triple patterns and basic path expressions
  • Built-in closure for certain predefined relations
    • e.g., RQL, see next slide
  • Built-in closure subgraph extraction construct
    • concise bounded descriptions (CBDs) of RDF resources
      • immediate properties and all properties reachable with only blank resources in between
    • SPARQL DESCRIBE
      • relevant and representative information about resources
      • semantics not specified by the language
      • CBDs are one possible semantics
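
A Python sketch of computing a concise bounded description, following the informal definition above: all triples about the resource, plus, recursively, the triples about blank nodes reached as objects. The reification part of the full CBD definition is deliberately omitted, and the sample triples (including `books:publisher` and `ex:pub`) are hypothetical.

```python
def is_blank(term):
    return term.startswith("_:")


def cbd(graph, resource):
    """Concise bounded description of `resource` (reification part omitted).

    Collects all triples with `resource` as subject and, recursively, all
    triples whose subject is a blank node reached as an object on the way.
    """
    description, seen, todo = [], {resource}, [resource]
    while todo:
        node = todo.pop()
        for s, p, o in graph:
            if s == node:
                description.append((s, p, o))
                if is_blank(o) and o not in seen:
                    seen.add(o)
                    todo.append(o)
    return description


graph = [
    ("ex:bc", "books:title", "Bellum Civile"),
    ("ex:bc", "books:author", "_:a"),
    ("_:a", "books:authorName", "Julius Caesar"),
    ("ex:bc", "books:publisher", "ex:pub"),
    ("ex:pub", "ex:name", "Some Publisher"),
]
for triple in cbd(graph, "ex:bc"):
    print(triple)
# ex:pub's own triple is not included, because ex:pub is not a blank node
```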

Schema-aware Selection

  • RQL: matching against RDF entailment graph
    • but requires an acyclic subsumption hierarchy
    • only subset of axiomatic triples to ensure finite answers
    • entailment is coded into the query engine
  • Triple, Xcerpt: use rules for RDFS entailment
    • configurable set of entailment rules
    • Triple allows also external "models"
  • SPARQL/RDQL/SeRQL: some implementations provide built-in entailment
    • no provisions in the query language
    • no distinction between "materialized" and "inferred" triples
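
As an illustration of rule-based RDFS entailment (the approach attributed to Triple and Xcerpt above), the following Python sketch forward-chains two of the standard RDFS rules, rdfs9 and rdfs11, until no new triples appear. The naive fixpoint loop and the class `ex:Work` are assumptions for illustration; keeping the original triple set separate also shows how "inferred" triples can be told apart from "materialized" ones.

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def rdfs_closure(graph):
    """Naive forward chaining of two RDFS entailment rules to a fixpoint.

    rdfs11: (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    rdfs9:  (x type A),       (A subClassOf B)  =>  (x type B)
    """
    closure = set(graph)
    while True:
        sub = {(s, o) for (s, p, o) in closure if p == SUBCLASS}
        new = {(a, SUBCLASS, c) for (a, b) in sub for (b2, c) in sub if b == b2}
        new |= {(x, TYPE, b) for (x, p, a) in closure if p == TYPE
                for (a2, b) in sub if a == a2}
        if new <= closure:  # nothing new: fixpoint reached
            return closure
        closure |= new


data = {
    ("books:Essay", SUBCLASS, "books:Writing"),
    ("books:Writing", SUBCLASS, "ex:Work"),  # hypothetical superclass
    ("ex:bc", TYPE, "books:Essay"),
}
inferred = rdfs_closure(data) - data  # keeps "materialized" and "inferred" apart
print(sorted(inferred))
```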

Optional Selection and Disjunctions

  • So far: selection only via purely conjunctive queries
  • This section: disjunction or equivalent union constructs
    • “to find colleagues of a researcher from an RDF graph containing bibliography and conference information”, one might choose to select co-authors, as well as co-editors, and members in the same program committee.
  • Disjunction more commonplace on RDF than on relational data
    • all properties are by default optional
    • many queries retrieve in addition to core properties optional data items to be reported

Examples of Optional Queries

SPARQL Query with Optional
  SELECT   ?writing ?translator
  WHERE    { ?writing a books:Essay . 
           OPTIONAL { ?writing books:translator ?translator } }
Equivalent SPARQL Query with Union
  SELECT   ?writing ?translator
  WHERE    { { ?writing a books:Essay . 
               ?writing books:translator ?translator } 
             UNION
             { ?writing a books:Essay } }

Slight difference in semantics: essays with at least one translator are additionally reported once with an unbound ?translator.
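
The difference can be reproduced with bindings represented as Python dictionaries: OPTIONAL behaves like a left outer join, whereas the UNION formulation concatenates the join with the plain essay bindings. The binding lists below are hypothetical stand-ins for the matches of each pattern.

```python
def compatible(a, b):
    """Two bindings are compatible if they agree on their shared variables."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left, right):
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


def left_join(left, right):
    """OPTIONAL: keep every left binding, extended by compatible right ones."""
    result = []
    for l in left:
        extensions = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(extensions or [l])
    return result


essays = [{"?writing": "ex:bc"}, {"?writing": "ex:bg"}]               # ?writing a books:Essay
translations = [{"?writing": "ex:bc", "?translator": "J. M. Carter"}]  # books:translator

print(left_join(essays, translations))
# [{'?writing': 'ex:bc', '?translator': 'J. M. Carter'}, {'?writing': 'ex:bg'}]
print(join(essays, translations) + essays)
# [{'?writing': 'ex:bc', '?translator': 'J. M. Carter'}, {'?writing': 'ex:bc'}, {'?writing': 'ex:bg'}]
```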

Semantics for Optionals

A optional B optional C

  • How to treat multiple optional parts?
    • if optionals are independent → order irrelevant
    • if optionals are dependent → order has effect:
      match for B might prevent match for C and vice versa
  • Three different semantics for interdependent optionals:
    • Independent treatment of optionals
    • Maximized optionals
    • All-or-nothing optionals

Independent Treatment of Optionals

  • Impose order on optional clauses to resolve interdependencies
  • SPARQL uses the lexical order of the optional clauses
Interdependent Optionals in SPARQL

The following query selects essays together with translators and, if that translator is also an author, also the author name.

  SELECT   ?writing ?person ?name
  WHERE    { ?writing a books:Essay . 
           OPTIONAL { ?writing books:translator ?person } 
           OPTIONAL { ?writing books:author ?person . 
                      ?person foaf:name ?name } }

Switching the order of the optionals changes the semantics, and the second optional becomes superfluous: the query then selects all essays together with their authors and author names (if there are any).
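
The order dependence can be seen by chaining two left outer joins in both orders. In this Python sketch each right-hand list stands for the complete matches of one OPTIONAL group pattern, evaluated on its own; the resource names are hypothetical and the data is reduced to a single essay whose translator and author differ.

```python
def compatible(a, b):
    """Two bindings are compatible if they agree on their shared variables."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def left_join(left, right):
    """OPTIONAL as a left outer join on shared variables."""
    result = []
    for l in left:
        extensions = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(extensions or [l])
    return result


essays = [{"?writing": "ex:bc"}]
# matches of { ?writing books:translator ?person }
translator = [{"?writing": "ex:bc", "?person": "ex:carter"}]
# matches of { ?writing books:author ?person . ?person foaf:name ?name }
author = [{"?writing": "ex:bc", "?person": "ex:caesar", "?name": "Julius Caesar"}]

print(left_join(left_join(essays, translator), author))
# [{'?writing': 'ex:bc', '?person': 'ex:carter'}]
print(left_join(left_join(essays, author), translator))
# [{'?writing': 'ex:bc', '?person': 'ex:caesar', '?name': 'Julius Caesar'}]
```

Whichever optional binds ?person first blocks the other one, since the translator and the author are different resources.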

Independent Treatment of Optionals (cont.)

  • Lexical order of interdependent optionals equivalent to nested optionals
  SELECT   ?writing ?person ?name
  WHERE    { ?writing a books:Essay . 
           OPTIONAL { ?writing books:translator ?person 
              OPTIONAL { ?writing books:author ?person . 
                         ?person foaf:name ?name } 
           } }
  • Only applies to interdependent optionals not to independent ones

Union Semantics for Optionals

  • UNION can be used to express OPTIONAL
  SELECT   ?writing ?translator
  WHERE    { { ?writing a books:Essay . 
               ?writing books:translator ?translator } 
             UNION
             { ?writing a books:Essay } }
  • Semantics often differ from those of a "real" optional:
    • includes additional null-value bindings for ?translator even if a translator exists
  • Can be used in SPARQL, SQL, RDQL, and similar languages

Maximized optionals

  • Considers any order of optionals
    • e.g., first bind translators, then check whether they are also authors
    • or first bind authors and author names, then check whether the authors are also translators
  • Returns only results where maximal subset of optional variables are bound
  • More involved than the other semantics
  • No increase in complexity as interdependent optionals are already NP-complete
  • Equivalent to rewriting optionals to disjunctions with negated clauses
    • A optional B optional C is equivalent to
    • (A ∧ ¬B ∧ ¬C) ∨ (A ∧ ¬B ∧ C) ∨ (A ∧ B ∧ ¬C) ∨ (A ∧ B ∧ C)
  • Introduced in and for Xcerpt
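
One possible reading of maximized optionals, sketched in Python: every subset of optional parts is tried for each base binding, and only answers not strictly contained in another answer for the same base binding are kept. This brute-force enumeration is exponential in the number of optionals and is an illustration under our own assumptions, not Xcerpt's evaluation algorithm; the data is hypothetical.

```python
from itertools import combinations


def compatible(a, b):
    """Two bindings are compatible if they agree on their shared variables."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def maximized(base, optionals):
    """Try every subset of the optional parts; per base binding keep only
    the answers that are not strictly contained in another answer."""
    results = []
    for b in base:
        candidates = []
        for k in range(len(optionals) + 1):
            for subset in combinations(optionals, k):
                partial = [b]
                for opt in subset:
                    partial = [{**p, **r} for p in partial for r in opt
                               if compatible(p, r)]
                candidates.extend(partial)
        items = [set(c.items()) for c in candidates]
        for c, ci in zip(candidates, items):
            if not any(ci < other for other in items) and c not in results:
                results.append(c)
    return results


essays = [{"?writing": "ex:bc"}]
translator = [{"?writing": "ex:bc", "?person": "ex:carter"}]
author = [{"?writing": "ex:bc", "?person": "ex:caesar", "?name": "Julius Caesar"}]

for answer in maximized(essays, [translator, author]):
    print(answer)
```

Unlike the lexical-order semantics, both the translator answer and the author answer are returned, because neither binding set is contained in the other.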

All-or-nothing Optionals

  • Rare form of optional
  • Either all optional parts or none are matched
  • Expressible using a single optional clause over conjunctions and disjunctions
  • Can be achieved, e.g., in SPARQL and Xcerpt

RDF Specificities

  • So far: general issues for semi-structured query languages
  • Now: Adequacy of RDF query languages by looking at its specificities
Blank Nodes
  • For matching: (skolemized) blank nodes just like any other resource
  • For results: special consideration, see below
Reification
  • Just syntactical and representational conventions, no specific semantics
  • Still syntax in query language convenient
  • SeRQL and Triple, e.g., provide such syntax

Selection of Collection and Containers

  • Can be handled as merely vocabulary and representational conventions
  • Evaluation engine may optimize for these specific access patterns

Sequence container ⟨A, B, C⟩ is reduced to:

 _:1 rdf:type rdf:Seq
 _:1 rdf:_1   A
 _:1 rdf:_2   B
 _:1 rdf:_3   C

Similarly, collections are reduced to binary relations of rdf:first and rdf:rest:

 _:1 rdf:first A
 _:1 rdf:rest  _:2
 _:2 rdf:first B
 _:2 rdf:rest  _:3
 _:3 rdf:first C
 _:3 rdf:rest rdf:nil

Selection of Collection and Containers (cont.)

  • Querying these representations is challenging in many RDF QLs
    • e.g., just “to select all members of a container or collection”
  • In case of a collection: specific construct or regular path expressions
    • Most RDF query languages cannot express this
    • Using the regular path expression rdf:rest*.rdf:first
  • In case of a container: specific construct or regular expression over URIs
  SELECT   ?contained_resource
  WHERE    { ?C ?P ?contained_resource . 
            FILTER(regex(str(?P),
              "^http://www.w3.org/1999/02/22-rdf-syntax-ns#_[0-9]+$")) }
  • RQL—specialized constructs: R in C to test membership of resource R in container C
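
For comparison with the query-language constructs above, here is a Python sketch that extracts the members of a container by matching rdf:_n property names (ordered numerically by n) and the members of a collection by following rdf:rest links and taking each rdf:first, i.e. the path rdf:rest*.rdf:first. It assumes full RDF namespace IRIs and well-formed lists; the sample data mirrors the ⟨A, B, C⟩ example.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MEMBERSHIP = re.compile(re.escape(RDF) + r"_([0-9]+)$")


def container_members(graph, container):
    """Members of an rdf:Seq/rdf:Bag/rdf:Alt via rdf:_n properties, ordered by n."""
    found = []
    for s, p, o in graph:
        m = MEMBERSHIP.match(p)
        if s == container and m:
            found.append((int(m.group(1)), o))
    return [o for _, o in sorted(found)]


def collection_members(graph, head):
    """Members of a well-formed rdf:List: the path rdf:rest*.rdf:first."""
    first = {s: o for s, p, o in graph if p == RDF + "first"}
    rest = {s: o for s, p, o in graph if p == RDF + "rest"}
    members, node, seen = [], head, set()
    while node != RDF + "nil" and node not in seen:  # `seen` guards against cycles
        seen.add(node)
        members.append(first[node])
        node = rest[node]
    return members


graph = [
    ("_:1", RDF + "type", RDF + "Seq"),
    ("_:1", RDF + "_2", "B"), ("_:1", RDF + "_1", "A"), ("_:1", RDF + "_3", "C"),
    ("_:l1", RDF + "first", "A"), ("_:l1", RDF + "rest", "_:l2"),
    ("_:l2", RDF + "first", "B"), ("_:l2", RDF + "rest", "_:l3"),
    ("_:l3", RDF + "first", "C"), ("_:l3", RDF + "rest", RDF + "nil"),
]
print(container_members(graph, "_:1"))    # ['A', 'B', 'C']
print(collection_members(graph, "_:l1"))  # ['A', 'B', 'C']
```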

Construction

Overview
  1. Graph Construction vs. Selection-only
  2. Graph Construction
  3. Construction of XML Results

Graph Construction vs. Selection-only

SELECT R, count(SELECT @P FROM {R}@P)
FROM   {R}books:author{A}
WHERE  A = "Julius Caesar"

Graph Construction

The basic form of graph construction in SPARQL is

CONSTRUCT { ?R ?P ?O }
WHERE     { ?R books:author "Julius Caesar". ?R ?P ?O }
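
Semantically, CONSTRUCT instantiates its template once per solution of the WHERE clause and merges the results into one graph. A Python sketch of that step, with the WHERE clause evaluated by hand over hypothetical data; it skips template triples containing unbound variables and does not handle blank nodes in templates.

```python
def is_var(term):
    return term.startswith("?")


def construct(template, bindings):
    """Instantiate each template triple once per binding; the result is a
    set, so duplicates collapse. Triples with unbound variables are skipped."""
    graph = set()
    for binding in bindings:
        for triple in template:
            if all(not is_var(t) or t in binding for t in triple):
                graph.add(tuple(binding[t] if is_var(t) else t for t in triple))
    return graph


data = [
    ("ex:bc", "books:author", "Julius Caesar"),
    ("ex:bc", "books:title", "Bellum Civile"),
    ("ex:other", "books:author", "Cicero"),
]
# WHERE { ?R books:author "Julius Caesar". ?R ?P ?O }, evaluated by hand here
bindings = [{"?R": r, "?P": p, "?O": o}
            for (r, a, name) in data if a == "books:author" and name == "Julius Caesar"
            for (r2, p, o) in data if r2 == r]
print(sorted(construct([("?R", "?P", "?O")], bindings)))
# [('ex:bc', 'books:author', 'Julius Caesar'), ('ex:bc', 'books:title', 'Bellum Civile')]
```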

Collections and Containers

Minimal Result Graphs

Conditional Construction

Unscoped optional construction

Conditional Construction (cont.)

Scoped optional construction
Full conditional construction

Construction of XML Results

Procedural Abstraction

5 Query Evaluation

Methods for RDF query evaluation differ in several aspects:

Focus of this article: non-distributed evaluation of conjunctive queries on triple/quadruple stores on disks.

Storage of RDF Data

RDF Storage in Berkeley Databases

According to the directory of the Free Software Foundation (footnote 14), the Berkeley Database is

Usage of Berkeley DB in RDF storage

Storage of RDF with the aid of Relational Database engines

RDF storage in Jena1

Storage of RDF with the aid of Relational Database engines (continued)

Lessons learnt in Jena2

Storage of RDF data in 3store

Statements table:

  model (int64) | subject (int64) | predicate (int64) | object (int64) | literal (boolean) | inferred (boolean)

Model, URI and literal tables (one table each):

  Models:   hash (int64) | model (text)
  URIs:     hash (int64) | uri (text)
  Literals: hash (int64) | literal (text)
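
The layout above stores only fixed-width 64-bit hashes in the statements table and keeps the text of models, URIs and literals in separate lookup tables. A Python sketch of this encoding, assuming (our choice, not necessarily 3store's) that the hash is the first 8 bytes of an MD5 digest read as a signed integer; the class name `HashedStore` is hypothetical and hash collisions are ignored.

```python
import hashlib


def hash64(text):
    """Signed 64-bit id: first 8 bytes of an MD5 digest (an assumed scheme)."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class HashedStore:
    """In-memory stand-in for the statements/model/URI/literal tables."""

    def __init__(self):
        self.models, self.uris, self.literals = {}, {}, {}
        # rows: (model, subject, predicate, object, literal, inferred)
        self.statements = []

    def add(self, model, s, p, o, is_literal=False, inferred=False):
        self.models[hash64(model)] = model
        self.uris[hash64(s)] = s
        self.uris[hash64(p)] = p
        (self.literals if is_literal else self.uris)[hash64(o)] = o
        self.statements.append(
            (hash64(model), hash64(s), hash64(p), hash64(o), is_literal, inferred))

    def decode(self, row):
        """Join a statements row back to its text values."""
        m, s, p, o, is_literal, _inferred = row
        objects = self.literals if is_literal else self.uris
        return self.models[m], self.uris[s], self.uris[p], objects[o]


store = HashedStore()
store.add("ex:model", "ex:bc", "books:title", "Bellum Civile", is_literal=True)
row = store.statements[0]
print(row[:4])            # four int64 ids
print(store.decode(row))  # ('ex:model', 'ex:bc', 'books:title', 'Bellum Civile')
```

Fixed-width integer columns keep the statements table compact and make joins cheap; the text is only looked up when results are returned.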

Storage of RDF data in 3store (continued)

RDF Storage in Sesame

Postgres back-end
MySQL back-end

RDF Storage in RDFSuite

Path Based Storage of RDF Data

Path Based Storage of RDF Data (continued)

Performance Comparison with Jena2 (by Matono et al.)

RDF Storage in Object Databases

  • Motivation: in RDBs, RDF graphs are decomposed into triples for storage and must be reconstructed for querying
  • Solution: storage of RDF graphs as objects in an OODB
  • Storage without further reorganization
  • Translation of RQL to OQL
  • FastObjects as underlying OODB
  • All edges/vertices are realized as objects; the graph is encoded by references
  • A performance comparison with Sesame on top of Postgres, conducted by Bönström et al., showed better performance for hybrid and schema queries
Figure: object-oriented model of an RDF graph

Index Structures for RDF

Suffix arrays

Index Structures for RDF (Suffix Arrays 1)

Suffix arrays applied to RDF graphs

Index Structures for RDF (Suffix Arrays 2)

Discussion of Suffix Arrays

Index Structures for RDF (Quadruples and Substring Searches)

Two major challenges identified by Harth et al.:
Ideas and method of the approach
(1) The lexicon index

Index Structures for Quadruples and Substring Searches

(2) Quad Indexes
No.   Access pattern
1     (?:?:?:?)
2     (s:?:?:?)
3     (s:p:?:?)
4     (s:p:o:?)
5     (s:p:o:c)
...   ...

Index   Access patterns answered by the index
spoc    (?:?:?:?), (s:?:?:?), (s:p:?:?), (s:p:o:?), (s:p:o:c)
poc     (?:p:?:?), (?:p:o:?), (?:p:o:c)
ocs     (?:?:o:?), (?:?:o:c), (s:?:o:c)
csp     (?:?:?:c), (s:?:?:c), (s:p:?:c)
cp      (?:p:?:c)
os      (s:?:o:?)
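
With the six index orders spoc, poc, ocs, csp, cp and os, every one of the 16 access patterns has an index whose key order begins with exactly its bound positions, so it can be answered by a single prefix (range) lookup. The following Python sketch picks such an index; the first-match strategy and the function name `choose_index` are our own assumptions.

```python
INDEXES = ["spoc", "poc", "ocs", "csp", "cp", "os"]


def choose_index(pattern):
    """pattern is (s, p, o, c) with None for unbound positions.

    Return an index whose key order begins with exactly the bound positions,
    so the pattern can be answered by one prefix (range) lookup."""
    bound = {pos for pos, term in zip("spoc", pattern) if term is not None}
    for index in INDEXES:
        if set(index[:len(bound)]) == bound:
            return index
    raise ValueError(f"no index covers {pattern}")


print(choose_index((None, "books:title", None, "ex:ctx")))  # cp
print(choose_index(("ex:bc", None, "Bellum Civile", None)))  # os
```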

Schema- and Reasoning-aware RDF Querying

Different Perceptions:
Implementation of RDF/S Semantics
Labeling Schemes for RDF/S

Schema- and Reasoning-aware RDF Querying: Labeling Schemes

Bitvector, Prefix and Interval Labeling Schemes

Schema- and Reasoning-aware RDF Querying: Rete Algorithm

The Rete algorithm

Schema- and Reasoning-aware RDF Querying: Rete Algorithm (continued)

The Rete algorithm

6 Conclusion

Acknowledgments

This research has been funded by the European Commission and by the Swiss Federal Office for Education and Science within the 6th Framework Programme project REWERSE number 506779 (cf. http://rewerse.net).

Bibliography

All references can be found in the article

Tim Furche, Benedikt Linse, François Bry, Dimitris Plexousakis, and Georg Gottlob:
"RDF Querying: Language Constructs and Evaluation Methods Compared".
In: Reasoning Web, Second International Summer School 2006, P. Barahona et al., (Eds.) Springer-Verlag, LNCS 4126, pp. 1–52, 2006.
© Springer-Verlag Berlin Heidelberg 2006

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 License.