Abstract (or “What to Expect?”)

Introduction to query languages for the Semantic Web
In-depth comparison of the introduced languages
Language selection (arguably a good coverage of proposals, but still subjective):
- only RDF query languages considered, OWL languages still open issue
- "relational" or "pattern-based" query languages, SPARQL, RQL, TRIPLE, Xcerpt
- reactive rule query language Algae
- "navigational access" query language Versa

2 Preliminaries

A Brief Introduction to RDF and RDFS
Running Example: Classification-Based Book Recommender
Sample Queries

Preliminaries—A Brief Introduction to RDF and RDFS

RDF data are sets of ‘triples’ (aka ‘statements’) of the form (Subject, Property, Object)
RDF data are seen as (unranked, node- and edge-labeled) directed graphs
- nodes of which are statement's subjects and objects and are either labeled
  - by URIs and thus representing Web resources
  - by literals, such as strings or numbers, thus representing literal resources
  - by ‘local’ identifiers thus representing ‘anonymous’ or ‘blank’ nodes.
- arcs of which correspond to statement's properties
Properties are also called ‘predicates’ (statement analogy)
Blank nodes commonly used to aggregate or group statements
- e.g., in containers or collections
- or for n-ary relations
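
The triple view and the graph view described above can be sketched in a few lines of Python. This is only an illustrative model, not an RDF library: the tuple encoding, the `_:` prefix for blank nodes, the one-element tuples for literals, and the helper name `outgoing` are all assumptions made for this example.

```python
from collections import defaultdict

# Each statement is a (subject, property, object) tuple.
# Assumed encoding: URIs as plain strings, blank nodes as "_:"-prefixed
# strings, literals wrapped in a one-element tuple to tell them apart.
triples = {
    ("_:b2", "books:title", ("Bellum Civile",)),
    ("_:b2", "books:author", "_:a1"),
    ("_:a1", "foaf:name", ("Julius Caesar",)),
}

def outgoing(triples):
    """Graph view: map each subject node to its labeled outgoing arcs."""
    arcs = defaultdict(set)
    for s, p, o in triples:
        arcs[s].add((p, o))
    return arcs

graph = outgoing(triples)
# The blank node _:a1 groups the statements made about the author.
author_arcs = graph["_:a1"]
```

The same set of tuples serves both views: iterating over it gives the statements, while `outgoing` gives the node- and edge-labeled graph.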

Preliminaries—RDF Schema

RDF/S allows so-called RDF Schemas (or ontologies) similar to object-oriented class hierarchies or taxonomies
Inheritance model of RDF/S exhibits the following peculiarities:
- same resource may be classified in different, unrelated classes
- class hierarchy may be cyclic → all classes on cycle equivalent
- properties are first-class
  - associates range and domain to property, rather than which properties a class can carry
Inference rules are used to define the semantics (or entailment) of an RDF/S schema
- e.g., transitivity of the class hierarchy or
- inferred type of an untyped resource in the domain of a property
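
The two example inference rules can be illustrated with a naive fixpoint computation. This sketch covers only subclass transitivity and domain-based typing, not the full RDF/S rule set; the function name `rdfs_closure` and the sample triples are assumptions for the example.

```python
def rdfs_closure(triples):
    """Apply two RDFS-style rules until no new triples appear (sketch only)."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == "rdfs:subClassOf":
                # Rule 1: transitivity of the class hierarchy.
                new |= {(s, p, o2) for s2, p2, o2 in facts
                        if p2 == p and s2 == o}
            # Rule 2: the subject of a statement is typed with the
            # domain declared for the statement's property.
            new |= {(s, "rdf:type", d) for p2, q, d in facts
                    if q == "rdfs:domain" and p2 == p}
        if new <= facts:          # fixpoint reached
            return facts
        facts |= new

closure = rdfs_closure({
    (":Historical_Essay", "rdfs:subClassOf", ":Essay"),
    (":Essay", "rdfs:subClassOf", ":Writing"),
    (":author", "rdfs:domain", ":Writing"),
    ("_:b2", ":author", "_:a1"),      # _:b2 carries no explicit type
})
```

After the computation, `closure` contains the inferred subclass statement between `:Historical_Essay` and `:Writing` as well as the inferred type `:Writing` for the untyped resource `_:b2`.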

Preliminaries—Predefined Properties and Classes in RDF/S

Specific properties are predefined in the RDF and RDFS recommendations, e.g.
- rdf:type (type of resources)
- rdfs:subClassOf (class-subclass relationships between subjects/objects)
- rdfs:subPropertyOf (property-subproperty relationships between properties)
RDFS uses ‘meta’-classes, e.g.,
- rdfs:Class (the class of all classes)
- rdf:Property (the class of all properties)
All RDF resources and properties are uniformly treated
- e.g., classes may be instances of other classes
- e.g., there is no fixed limit on the number of meta-schema layers
- e.g., same resource may be property, class, and instance

Preliminaries—RDF Serialization

Multitude of serialization formats for RDF
“Official” W3C serialization format: RDF/XML
- XML-based
- high degree of variability, presumably to make data look more "natural"
- hard to process by non RDF-aware XML tools due to variability
Recently many alternative formats
- Text-based: N3, N-Triples, Turtle
- XML-based: Simplified RDF/XML, XMP, TriX, RXR

Preliminaries—RDF Serialization

Figure: History of RDF serialization formats

Preliminaries—Classification-Based Book Recommender

Sample data represented as a (simplified) RDF graph: some facts about Bellum Civile and an excerpt of a topic ontology.

Preliminaries—Representation in Turtle RDF

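@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
# default namespace assumed to match the books: prefix of the later queries
@prefix :     <http://example.org/books#> .
:Writing a rdfs:Class ;   rdfs:label "Writing" .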
:Novel   a rdfs:Class ;   rdfs:label "Novel" ;
         rdfs:subClassOf :Writing .
:Essay   a rdfs:Class ;   rdfs:label "Essay" ;
         rdfs:subClassOf :Writing .
:Historical_Essay a rdfs:Class ;
         rdfs:label "Historical Essay"; rdfs:subClassOf :Essay.
:Historical_Novel a rdfs:Class ;
         rdfs:label "Historical Novel" ;
         rdfs:subClassOf :Novel ;       rdfs:subClassOf :Essay .
:author  a rdf:Property ;
         rdfs:domain :Writing ;         rdfs:range foaf:Person .
:translator a rdf:Property ;
         rdfs:domain :Writing ;         rdfs:range foaf:Person .
_:b1     a :Historical_Novel ;
         :title "The First Man in Rome" ;
         :year  "1990"^^xsd:gYear ;
         :author [foaf:name "Colleen McCullough"] .
_:b2     a :Historical_Essay ;
         :title "Bellum Civile" ;
         :author [foaf:name "Julius Caesar"] ;
         :author [foaf:name "Aulus Hirtius"] ;
         :translator [foaf:name "J. M. Carter"] .

Preliminaries—Semantics

Multiple inheritance: "Historical Novel" is both an "Essay" and a "Novel"
Scalar data typed by XML Schema datatype expressions
- Specialized mechanism as normal RDF typing is inapplicable to literals
Formal meaning of RDF data based on entailment rules (inference)
- obviously still describes only a part of "human" semantics
RDF query languages should take entailed data into consideration when answering questions

Preliminaries—Sample Queries 1-3

Nine queries based on classification in [67] and [34]
Selection queries retrieve parts of the data based on content, structure, or position
Query 1: “Select all essays together with their authors (i.e., author items and corresponding names)”
Extraction queries extract substructures
- can be considered a special form of selection queries returning results of unknown extent
Query 2: “Select all data items with any relation to the book titled ‘Bellum Civile’.”
Reduction queries: specifying what parts of the data not to include in the answer:
Query 3: “Select all data items except ontology information and translators from the book recommender system.”

Preliminaries—Sample Query 4: Restructuring

Restructuring queries: In Web applications, it is often desirable to restructure data, possibly into different formats or serializations:
Query 4: “Invert the relation author (from a book to an author) into a relation authored (from an author to a book).”
Restructuring needed in RDF context for reification (i.e., statements about statements)
- single statement replaced by four new statements specifying subject, predicate, object separately
- e.g., Julius Caesar is author of Bellum Civile becomes
```
_:1 a rdf:Statement .        _:1 rdf:subject Julius Caesar .
_:1 rdf:predicate author   . _:1 rdf:object Bellum Civile .
```
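
The reification step can be sketched as a small Python helper that turns one statement into the four statements listed above. The function name `reify` and the counter-based blank node labels are assumptions for this example.

```python
from itertools import count

_ids = count(1)  # source of fresh blank node labels (assumed naming scheme)

def reify(subject, predicate, obj):
    """Replace one statement by four statements describing it."""
    node = f"_:{next(_ids)}"
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    ]

statements = reify("Julius Caesar", "author", "Bellum Civile")
```

Because the reified statement is now a node of its own, further statements (for instance about who asserted it) can use that node as their subject.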

Preliminaries—Sample Query 5-6: Aggregation

Aggregation queries: a special form of restructuring that aggregates several data items into one new data item
Value aggregation aggregates multiple (usually atomic) values into a single (atomic) value
Query 5: “Return the last year in which an author with name ‘Julius Caesar’ published something.”
- previous example can, e.g., be expressed using max(·) aggregation
Structural aggregation aggregates over (structurally) related information, e.g., count(·)
Query 6: “Return each of the subclasses of ‘Writing’, together with the average number of authors per publication of that subclass.”

Preliminaries—Sample Query 7-9: Combination and Inference

Combination queries: combine (or join) information not explicitly connected (e.g., different sources or substructures)
Query 7: “Combine the information about the book titled ‘The Civil War’ and authored by ‘Julius Caesar’ with the information about the book with identifier bellum_civile.”
Inference queries: also combine data, but use the combination to infer data that was not previously explicit
Query 9: “Return the co-author relation between two persons that stand in author relationships with the same book.”
- If the books entitled ‘Bellum Civile’ and ‘The Civil War’ are the same book, and if ‘Julius Caesar’ is an author of ‘Bellum Civile’, then ‘Julius Caesar’ is also an author of ‘The Civil War’.
- RDF/S entailment based on inference, e.g.,
  Query 8: “Return the transitive closure of the subClassOf relation.”

3 The RDF Query Language Families

Overview

Relational Query Languages
1. SPARQL
2. RQL
3. TRIPLE
4. Xcerpt
Reactive Rule Query Languages
1. Algae
Navigational Access Query Languages
1. Versa

Chronological Overview

Chronological overview of RDF query languages: languages are clustered as in the following sections, and influences between them are depicted graphically.

Figure 2: Chronological Overview of RDF Query Languages

bold: covered in this survey
italic: non-RDF (mostly XML) query languages with proposals/extensions for querying RDF
MetaLog's unique approach to RDF querying, based on a natural language interface, defies classification in this framework
N3QL could not be classified due to incomplete description

Relational Query Languages:
SPARQL, RQL, TRIPLE, and Xcerpt

Basic query constructs similar to relational selection-projection-join (SPJ) queries
But: different presentation of complex queries:
- SPARQL/RQL: conjunctive n-ary queries over RDF triples;
  variables are used to correlate properties of the same resource and related resources
- TRIPLE/Xcerpt: (structural) patterns outlining the shape of the sought-for data items;
  structure (nesting) is used to correlate properties of the same resource and related resources
Languages vary from
- conservative extensions of SPJ queries to
- languages with a substantial number of novel constructs aiming at more adequate support for RDF specificities

The SPARQL Family

SPARQL (for SPARQL Protocol And RDF Query Language) [84]:
W3C candidate recommendation since April 2006
Influenced by SquishQL [76] and RDQL [91], slightly from SeRQL [22] and RQL [57]
Syntactic resemblance to (basic) SQL and the Turtle [22] RDF syntax
SPARQL in four points:
- Graph patterns as conjunctions of RDF triples in WHERE clause
- Variables used for selection, joins, and complex conditions
- SELECT defines the answer variables of the query
- CONSTRUCT allows alternatively to specify a graph pattern that is instantiated against variable bindings

PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX books: <http://example.org/books#>
SELECT ?essay ?author ?authorName
FROM   <http://example.org/books>
WHERE  { ?essay rdf:type books:Essay .
         ?essay books:author ?author .
         ?author books:name ?authorName . }

SPARQL—Basic Constructs and Syntax

Graph Pattern Syntax

Ground patterns are written in the Turtle [22] RDF serialization format
Predicate-object lists share the subject over several triples (predicate-object pairs separated by ;)
Object lists abbreviate several triples sharing both subject and predicate (objects separated by ,)
Variables marked using ? or $ prefix

SELECT-FROM-WHERE Clause

SELECT clause specifies list of answer variables
alternatively CONSTRUCT specifies graph pattern instantiated against answer variable bindings
FROM specifies the URL(s) of the data graph(s) to be queried
WHERE specifies the graph pattern
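
How a conjunction of triple patterns produces variable bindings can be sketched with a tiny matcher in Python. This is a simplified model, not SPARQL's formal semantics: it ignores RDF/S entailment, FROM, and typed literals, and the names `match`, `where`, and `select` as well as the node identifiers are assumptions for the example.

```python
def match(pattern, triples, binding=None):
    """Yield every binding under which all query triples in `pattern`
    occur in `triples`. Terms starting with '?' are variables."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in triples:
        new = dict(binding)
        for q, d in zip(first, triple):
            if q.startswith("?"):
                # A repeated variable must keep its value: implicit join.
                if new.setdefault(q, d) != d:
                    break
            elif q != d:
                break
        else:
            yield from match(rest, triples, new)

triples = [
    ("b2", "rdf:type", "books:Essay"),
    ("b2", "books:author", "a1"),
    ("b2", "books:author", "a2"),
    ("a1", "books:name", "Julius Caesar"),
    ("a2", "books:name", "Aulus Hirtius"),
    ("b1", "rdf:type", "books:Novel"),
    ("b1", "books:author", "a3"),
    ("a3", "books:name", "Colleen McCullough"),
]
where = [                                    # WHERE: the graph pattern
    ("?essay", "rdf:type", "books:Essay"),
    ("?essay", "books:author", "?author"),
    ("?author", "books:name", "?authorName"),
]
select = ["?essay", "?author", "?authorName"]   # SELECT: answer variables
rows = sorted(tuple(b[v] for v in select) for b in match(where, triples))
```

The shared variables `?essay` and `?author` act as join conditions, so only the two authors of the essay `b2` appear in `rows`.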

SPARQL—Conditions, `FILTER` Clause

FILTER contains conditional expression over query variables
Similar to SQL's or XQuery's WHERE, but
- Joins are usually implicit (multiple variable occurrence) in triple patterns
- Literals and b-nodes can be inlined into triple patterns
Types of conditional operators:
- typed boolean comparators (=, <, ...)
- instance tests for all RDF node kinds
- regular expressions on Strings or URIs

PREFIX books: <http://example.org/books#>
SELECT ?person
FROM   <http://example.org/books>
WHERE  { ?book books:author ?person .
         ?book books:title ?title .
         FILTER (?title = 'Bellum Civile') }

SPARQL—Optional Triples

Optional data := reported in answer if present, but presence not required
- similar to SQL (left) outer join
E.g., “find all books and report in addition their translators, if they have any”

What is the meaning of the following query?

PREFIX   books: <http://example.org/books#>
PREFIX   foaf:  <http://xmlns.com/foaf/0.1/>
SELECT   ?writing ?translator ?translatorName
FROM     <http://example.org/books>
WHERE    { ?writing books:author _:Author . 
           OPTIONAL { ?writing books:translator ?translator } .
           OPTIONAL { ?translator foaf:name ?translatorName } }

Optional is left-associative
Solution for A ∧ OPTIONAL B is a solution of either A ∧ B or of A ∧ ¬B

“Find me all writings that have an author and also return their translator and its name if they have a translator. If they have no translator, return all pairs of subjects and objects in a triple with predicate foaf:name.”
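
The surprising second half of that reading can be reproduced with a small Python model of left-associative OPTIONAL. It is a sketch under simplifying assumptions: the matcher `match`, the helper `optional`, and the node identifiers are made up for the example, and the blank node `_:Author` of the query is modeled as an ordinary variable.

```python
def match(pattern, triples, binding=None):
    """Yield all extensions of `binding` that satisfy every query triple."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in triples:
        new = dict(binding)
        for q, d in zip(first, triple):
            if q.startswith("?"):
                if new.setdefault(q, d) != d:
                    break
            elif q != d:
                break
        else:
            yield from match(rest, triples, new)

def optional(solutions, pattern, triples):
    """Extend each solution by `pattern` if possible, else keep it as is."""
    for sol in solutions:
        extended = list(match(pattern, triples, sol))
        yield from (extended if extended else [sol])

triples = [
    ("bellum", "books:author", "caesar"),
    ("bellum", "books:translator", "carter"),
    ("carter", "foaf:name", "J. M. Carter"),
    ("rome", "books:author", "mccullough"),      # no translator
    ("mccullough", "foaf:name", "Colleen McCullough"),
]
sols = match([("?writing", "books:author", "?author")], triples)
sols = optional(sols, [("?writing", "books:translator", "?translator")], triples)
sols = optional(sols, [("?translator", "foaf:name", "?translatorName")], triples)
rows = sorted((s["?writing"], s["?translator"], s["?translatorName"])
              for s in sols)
```

For `rome`, `?translator` is still unbound when the second OPTIONAL is evaluated, so it matches every `foaf:name` triple and the writing is reported with two unrelated "translators".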

SPARQL—Optional and Negation

Semantics of OPTIONAL uses negation
Consequently, we can use OPTIONAL to implement the otherwise missing negation in SPARQL

PREFIX   books: <http://example.org/books#>
SELECT   ?writing
FROM     <http://example.org/books>
WHERE    { ?writing books:author _:Author . 
           OPTIONAL { ?writing books:translator ?translator } .
           FILTER (!bound(?translator)) }

“Return all resources with an author that have no translator”, i.e., exactly those solutions for which the optional triple yields no ?translator binding

Semantics is (obviously) negation-as-failure
Despite SPARQL's UNION, OPTIONAL cannot be rewritten to a UNION, as is common in SQL or XQuery, due to the lack of negation as a first-class concept
Is there a justification for this ‘implicit’ negation?
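
The OPTIONAL plus `!bound` idiom amounts to a left outer join followed by a test for the missing value. A minimal Python sketch, assuming made-up node identifiers and using `None` to stand for an unbound variable:

```python
triples = [
    ("bellum", "books:author", "caesar"),
    ("bellum", "books:translator", "carter"),
    ("rome", "books:author", "mccullough"),
]

# ?writing books:author _:Author
authored = sorted({s for s, p, o in triples if p == "books:author"})

# OPTIONAL { ?writing books:translator ?translator }
solutions = []
for writing in authored:
    translators = [o for s, p, o in triples
                   if s == writing and p == "books:translator"]
    # Keep the solution even without a translator; None marks "unbound".
    for translator in translators or [None]:
        solutions.append({"?writing": writing, "?translator": translator})

# FILTER (!bound(?translator)): negation as failure
untranslated = [sol["?writing"] for sol in solutions
                if sol["?translator"] is None]
```

The filter keeps a writing precisely because the attempt to find a translator failed, which is why the semantics is negation-as-failure.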

SPARQL—Graph Construction

CONSTRUCT clause specifies graph template (syntactically like graph pattern)
- instantiated against solutions to the remainder of the query
- variable occurrences are replaced with bindings for each solution tuple
- for optional variable x triple patterns with x are omitted if x is not bound in the solution
- B-nodes are instantiated separately for each solution
Range restriction: all variables in CONSTRUCT must also occur in remainder of query

CONSTRUCT {?x books:co-author ?y}
FROM      <http://example.org/books>
WHERE     { ?book books:author ?x .
            ?book books:author ?y .
            FILTER (?x != ?y) }

Neither grouping nor aggregation is supported
- no arbitrary container or collection construction
- no construction of RDF representations of n-ary relations
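
The instantiation of a CONSTRUCT template against a solution sequence can be sketched as follows. The function name `construct`, the `_:g` labels for fresh blank nodes, and the property `books:listedIn` are hypothetical and serve only to show the three rules above (substitution, omission for unbound variables, per-solution blank nodes).

```python
from itertools import count

def construct(template, solutions):
    """Instantiate every template triple once per solution (sketch)."""
    fresh = count(1)
    graph = set()
    for sol in solutions:
        bnodes = {}                      # blank nodes are fresh per solution

        def term(t):
            if t.startswith("?"):
                return sol.get(t)        # None if the variable is unbound
            if t.startswith("_:"):
                if t not in bnodes:
                    bnodes[t] = f"_:g{next(fresh)}"
                return bnodes[t]
            return t

        for triple in template:
            instance = tuple(term(t) for t in triple)
            if None not in instance:     # omit triples with unbound variables
                graph.add(instance)
    return graph

template = [
    ("?x", "books:co-author", "?y"),
    ("?x", "books:listedIn", "_:list"),  # hypothetical property
]
solutions = [
    {"?x": "caesar", "?y": "hirtius"},
    {"?x": "hirtius", "?y": "caesar"},
    {"?x": "mccullough"},                # ?y unbound (e.g., from OPTIONAL)
]
graph = construct(template, solutions)
```

The third solution contributes no co-author triple, and each solution receives its own blank node for `_:list`.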

SPARQL—Extraction Queries

Extraction queries only expressible if the result extent is of fixed size
Extraction of subgraphs with unknown extent impossible
Approximation of Query 2 with only the direct properties:

PREFIX books: <http://example.org/books#>
SELECT ?essay ?property ?propertyValue
FROM   <http://example.org/books>
WHERE  {?essay books:title "Bellum Civile" .
        OPTIONAL { ?essay ?property ?propertyValue } }

Notice: no syntactic separation between property and node variables as, e.g., in RQL
DESCRIBE: specialized form of extraction query
- retrieval of resource "descriptions"
- semantics is undefined, but cf. concise bounded descriptions [93]

SPARQL—Named Graphs

Named graph := access to multiple graphs at the same SPARQL "endpoint"
Graphs are identified by IRIs and thus can be subject of statements
Introduced in TriQL [16]
Provides a scoping mechanism for SPARQL variables and negation
Triple patterns can be restricted to solutions from a specific named graph
- in WHERE clause using GRAPH keyword
SPARQL query matched against an RDF dataset consisting of
- a default graph (e.g., from the merge of all graphs specified in FROM clauses)
- a set of (IRI, graph) pairs (e.g., from FROM NAMED clauses)

Limitations of the SPARQL Family

Queries cannot be composed or nested
Neither aggregation nor grouping expressible
- no arbitrary container or collection construction
- no construction of RDF representations of n-ary relations
Negation only in FILTER clauses using OPTIONAL variables
Neither recursion nor arbitrary-length traversal operators
- no transitive-closure type inference
- no extraction of subgraphs of unknown extent
→ Relational spirit, mostly a subset of SQL on a single ternary relation
- + likely easy to use and learn (except for some remaining "quirks")
- – some limitations seem difficult to justify and comprehend
- + almost trivial to implement on top of relational DBS
- – surprisingly weak expressiveness

The RQL Family

(Early) RDF query languages: RQL [55] and SeRQL [20], only RQL in the following
Focus: combination of data and schema querying
Clear separation of three layers in RDF/S:
- data—schemas—meta-schemas specifying meta-classes such as rdfs:Class
Price: slightly non-standard data model, viz. no cycles in subsumption hierarchy

SELECT X, Y FROM {X;books:Essay}books:author.books:authorName{Y}, 
                 {X}books:title{T}
WHERE  T = "Bellum Civile"
USING  NAMESPACE books = &http://example.org/books#

SQL-style syntax like SPARQL, but
- FROM contains triple patterns, no literals
- WHERE additional conditions (FILTER in SPARQL) including literal restrictions
- Basic path expressions adorned with variables

RQL—Basic Schema Queries

Schema queries: query relations between schema or meta-schema elements
- e.g., subClassOf(books:Writing) retrieves sub-classes of books:Writing
- e.g., topclass(books:Historical_Essay) returns top-level of subsumption hierarchy
- e.g., SELECT X, Y FROM Class{X}, subClassOf(X){Y} for Query 8

Domain and range of author property

using class variables:

SELECT $C1, $C2  FROM {$C1}books:author{$C2}

using a type constraint:

SELECT C1, C2  FROM Class{C1}, Class{C2}, {;C1}books:author{;C2}

without class variables or type constraints:

SELECT C1, C2  FROM subClassOf(domain(books:author)){C1}, 
                    subClassOf(range(books:author)){C2}

nearly equivalent except the last only returns proper subclasses

RQL—Data Queries

As in SPARQL: Triple (here called graph) patterns specify the shape of the data
Additionally: basic path expressions adorned with variables, e.g., Query 1:
```
SELECT X, Y, Z FROM {X;books:Essay}books:author{Y}.books:authorName{Z}
USING  NAMESPACE books = &http://example.org/books#
```
- {X;books:Essay} limits bindings for X to type books:Essay

Conditional expressions in WHERE clauses (even for basic literals)

SELECT X, Y FROM {X;books:Essay}books:author.books:authorName{Y}, 
                 {X}books:title{T}
WHERE  T = "Bellum Civile"
USING  NAMESPACE books = &http://example.org/books#

Extent queries: for classes books:Writing{X} and properties books:author
- both also return resources in sub-classes resp. -properties
- direct extent can be queried by prefixing with ^, e.g., ^books:Writing{X}
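
The difference between the extent and the direct extent of a class can be sketched in Python over the class hierarchy of the running example. The dictionary encoding and the function names `extent`, `direct_extent`, and `subclasses` are assumptions for this illustration and do not reflect how an RQL engine is implemented.

```python
# Direct rdfs:subClassOf edges from the running example.
subclass_of = {
    "Novel": {"Writing"},
    "Essay": {"Writing"},
    "Historical_Essay": {"Essay"},
    "Historical_Novel": {"Novel", "Essay"},
}
# Direct classification of the two books.
types = {"b1": "Historical_Novel", "b2": "Historical_Essay"}

def direct_extent(cls):
    """Resources classified directly under cls (RQL: ^cls{X})."""
    return {r for r, c in types.items() if c == cls}

def subclasses(cls):
    """cls together with all of its transitive subclasses."""
    found, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for sub, supers in subclass_of.items():
            if current in supers and sub not in found:
                found.add(sub)
                todo.append(sub)
    return found

def extent(cls):
    """Resources classified under cls or any subclass (RQL: cls{X})."""
    return {r for c in subclasses(cls) for r in direct_extent(c)}
```

With this data, `extent("Writing")` contains both books, whereas `direct_extent("Writing")` is empty because no book is typed as `Writing` directly.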

RQL—Mixed Schema and Data Queries

Data and schema queries can be mixed in all manners
Return a “Description” of a resource:

SELECT $C, ( SELECT @P, Y  FROM {Z ; ^$D} ^@P {Y}
             WHERE  Z = X and $D = $C )
FROM   ^$C {X}, {X}books:title{T}  WHERE T = "Bellum Civile"
USING  NAMESPACE books = &http://example.org/books#

All classes under which the resource is classified, together with the properties and ranges associated with it
Example for a grouping query expressed by query nesting
Property variables: used to query properties, prefixed by @

SELECT @P, $V  FROM {;books:Writing}@P{$V}
USING  NAMESPACE books = &http://example.org/books#

RQL—Sample Queries

Query 1 see above, Query 2 not expressible (no extraction of subgraphs of unknown extent)
Reduction queries like Query 3

SELECT S, @P, O
FROM   (Resources minus (SELECT T FROM {B}books:translator{T})){S}, 
       (Resources minus (SELECT T FROM {B}books:translator{T})){O},
       {S}@P{O}

Aggregation queries like Query 5

max(SELECT Y 
    FROM   {B;books:Writing}books:author.books:authorName{A},
           {B}books:pubYear{Y}
    WHERE  A = "Julius Caesar")

Inference queries without recursion like Query 9 in RVL [66]

CREATE NAMESPACE mybooks = &http://example.org/books-rdfs-extension#
VIEW   mybooks:co-author(A1, A2)
FROM   {Z}books:author{A1}, {Z}books:author{A2}  WHERE A1 != A2

RQL—Limitations and Critique

+ far more expressive than most SQL-style RDF query languages such as SPARQL or RDQL
- only transitive closure of arbitrary relations missing to implement all sample queries
+ Strong type system and expressive schema queries
– Criticized for large number of features and choice of syntactic constructs
- SeRQL [22]: more accessible derivative of RQL
  - additional syntactic shorthands (optional, object-property and object lists)
  - drops most support for typing to reduce complexity
  - drops support for many advanced query constructs for set operations, quantification, aggregation, etc.
- eRQL [*]: even more radical simplification
  - Information retrieval-style keyword interface
– Original proposal lacked graph construction ability, but rectified by RVL

TRIPLE

TRIPLE [51, 92] first rule-based query language for RDF
Syntax and semantics close to F-Logic [58], convenient for querying irregular data
- Other approaches based on F-Logic: XPathLog [75], Ontobroker [*]
TRIPLE designed to address two weaknesses of previous approaches:
- just predefined constructs to express RDF/S's semantics (lack of extensibility)
- lack of formal semantics
Horn logic rules in F-Logic syntax
- used to implement, e.g., RDF/S entailment
- inherits much of Logic Programming's formal semantics
All sample queries can be (more or less obviously) expressed in TRIPLE

TRIPLE—Comparison to Logic Programming

TRIPLE supports resources identified by URIs
RDF statements are represented in TRIPLE by slots
- allows grouping and nesting of statements
- path expressions inspired from [43] for traversal of several properties

Concise support for reified statements (enclosed in angle brackets <·>):

Julius_Caesar[believes
    →<Junius_Brutus[friend-of → Julius_Caesar]>]

Module notion to specify a ‘model’ in which a statement, or an atom, is true
Explicit quantification of all variables.

rdf   := 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'.
books := 'http://example.org/books#'.
FORALL B, A, AN  result(B, A, AN) ← 
     B[rdf:type → books:Essay; 
       books:author → A[books:authorName → AN]]@'http://example.org/books'.

TRIPLE—RDF/S Semantics through Rules

Sets of TRIPLE rules to implement, e.g., RDF/S entailment
[93] gives the following rules to implement RDF/S semantics:

rdf     := 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'. 
rdfs    := 'http://www.w3.org/2000/01/rdf-schema#'. 
FORALL Mdl @rdfschema(Mdl) {
  transitive(subPropertyOf).    transitive(subClassOf). 
  FORALL O,P,V O[P→V] ← O[P→V]@Mdl. 
  FORALL O,P,V O[P→V] ← 
               EXISTS S S[rdfs:subPropertyOf→P] AND O[S→V]. 
  FORALL O,P,V O[P→V] ← 
               transitive(P) AND EXISTS W (O[P→W] AND W[P→V]). 
  FORALL O,T   O[rdf:type→T] ← 
               EXISTS S (S[rdfs:subClassOf→T] AND O[rdf:type→S]). }

Together with the following rules the full entailment is supported in TRIPLE:

  FORALL S,T  S[type→T] ←
              EXISTS P, O (S[P→O] AND P[rdfs:domain→T]).
  FORALL O,T  O[type→T] ←
              EXISTS P, S (S[P→O] AND P[rdfs:range→T]).

TRIPLE—Limitations and Critique

+ Rules and views give strong reasoning capabilities
+ High expressiveness through path expressions and rules
+ Extensible and modular entailment semantics
o F-Logic syntax unfamiliar but well suited for irregular data
– Scalability of existing implementations very limited
– Implementation on top of existing DB technology far from obvious
– Support for non-RDF data very limited or awkward

Xcerpt

Xcerpt [85, 86] is a versatile, pattern-oriented, rule-based query language for the (Semantic) Web
Versatile: access to semi-structured data in different formats (e.g., XML, HTML, RDF, Topic Maps) in the same query program with common language constructs
Pattern-oriented: Xcerpt's core query construct
- QBE-style patterns (or examples) of data with variables to indicate sought-for data items or joins
- may contain (regular) path traversal expressions
Rule-based: Xcerpt uses rules to implement reasoning as well as abstract and group common query tasks
Incomplete information: Xcerpt's patterns and constructs are tailored to queries where only parts of the schema are known or required, e.g., optional construct
Aims to couple high expressiveness with principled and intuitive query syntax and semantics
- all sample queries can be expressed in Xcerpt

Xcerpt—Relational (Triple) View

Two views [17] on RDF data in Xcerpt
- plain triples with explicit join for structure traversal
- proper graph with structure traversal following shape of pattern or traversal expressions
Query 1 on Xcerpt's triple view: conjunction of triple patterns

GOAL
  result [
    all essay [
      id [ var Essay ], 
      all author [ 
        id [ var Author ], all name [ var AuthorName ]
      ] ] ]
FROM
  and(
    RDFS-TRIPLE [ var Essay, rdf:type, books:Essay ],
    RDF-TRIPLE  [ var Essay, books:author, var Author ],
    RDF-TRIPLE  [ var Author, books:authorName, var AuthorName ] )
END

RDFS-TRIPLE instead of RDF-TRIPLE: use of view over RDF/S entailment graph
Construction: arbitrary RDF or XML data
Provenance information may be queried as well (cf. named graphs, quads)

Xcerpt—Graph View

Query 1 on Xcerpt's RDF graph view: structure of query approx. structure of data

GOAL
  result [
    all essay [
      id [ var Essay ], 
      all author [ 
        id [ var Author ], all name [ var AuthorName ]
      ] ] ]
FROM
  RDFS-GRAPH {{ 
    var Essay {{
      rdf:type {{ books:Essay }},
      books:author {{
        var Author {{ 
          books:name {{ var AuthorName }} }}
      }} }} }}
END

Structural joins are hidden (implicit conjunction)
- similar to path expressions but extended to tree or graph patterns
Triples are represented similarly to striped RDF/XML
- as Xcerpt does not allow labeled edges
- extension of Xcerpt with labeled edges is under consideration

Xcerpt—Rules and Versatility

Xcerpt, like TRIPLE, uses deductive rules
- to realize "procedural abstraction" like views in SQL
- to provide reasoning capabilities, such as RDF/S entailment
- to provide versatile, "serialization transparent" data access
Example: extract all triples from an RXR [4] document

CONSTRUCT
  RDF-TRIPLE[ var Subject, var Predicate:uri{}, var Object ] 
FROM
  and[
    rxr:graph {{
      rxr:triple {
        var S as rxr:subject{{}},
        rxr:predicate{ attributes{ rxr:uri{ var Predicate } } },
        var O as rxr:object{{}}
      }
    }},
    rxr:NODE2URI[ var S, var Subject ], rxr:NODE2URI[ var O, var Object ] ]
END

NODE2URI maps different resource representations to URIs
RDF access in Xcerpt is subject of ongoing investigation under the lead of Benedikt Linse

Xcerpt—Visual Querying with visXcerpt

visXcerpt [11, 12] is Xcerpt's visual companion language
Visual rendering of Xcerpt programs using (mostly) CSS rules
Visual editing of programs using AJAX-style template-based editor

Xcerpt—Visual Querying with visXcerpt, Example 1

Screenshot and explanation of visXcerpt, visual companion language for Xcerpt

Xcerpt—Visual Querying with visXcerpt, Example 2

Second part of screenshot and explanation of visXcerpt, visual companion language for Xcerpt

Xcerpt—Limitations and Critique

+ High expressiveness, very powerful, yet concise query constructs
+ Strong rule-based foundation
- convenient versatile data access & reasoning for, e.g., RDF/S entailment similar to TRIPLE
+ Dual view of RDF: concise formulation of structure- and value-dominated queries
+ Rich set of constructs specialized to queries in heterogeneous environments with incomplete information about semi-structured data
- OPTIONAL as in SPARQL but with stronger semantics
- (regular) path expressions and partial terms for incompleteness in depth and breadth
- ordered and unordered queries
o unique syntax aims to be intuitive and consistent → non-standard look-and-feel
– challenging implementation due to high expressiveness and misalignment of patterns and simulation unification with existing DB technology
– implementations still just prototypes (limited scalability, limited robustness)

Reactive Rule Query Languages

Algae: Reactive Rules

Algae [83] reactive rule query language
Developed as part of the W3C Annotea project
- on enhancing Web pages with semantic annotations expressed in RDF
Two core concepts: actions and (proofs as) answers
- "Actions" are the directives ask, assert, and fwrule for querying, insertion, and ECA rules
- Answers are (a) bindings for query variables plus (b) triples from the RDF graph that form justification or proof of the answer
Syntax based on N-triples [46] with some extensions from full N3 [14] and
- syntax for above mentioned actions (expression in round parentheses)
- syntax for non-equality constraints (enclosed in curly braces)
```
ask (?essay books:year ?year {?year >= 62 && ?year < 301} .)
```

read <http://example.org/books> ()
ask (    ?essay  rdf:type         <http://example.org/books#Essay> .
         ?essay  books:author     ?author .
         ?author books:authorName ?authorName )
collect( ?essay, ?author, ?authorName )

Algae—Optional

Add to last query “also return translators, if there are any”
~ used to declare the ‘translator’ triple optional

ask (    ?essay rdf:type          <http://example.org/books#Essay> .
         ?essay books:author      ?author .
         ?author books:authorName  "Julius Caesar" .
         ?essay books:title       ?title .
         ~?essay books:translator ?translator .  )
collect( ?title, ?translator )

`?title`	`?translator`	Proof
"Bellum Civile"	"J. M. Carter"	_:1 rdf:type <http://exam...ks-rdfs#Essay>. _:1 books:author _:2. _:2 books:authorName "Julius Caesar". _:1 books:title "Bellum Civile". _:1 books:translator "J. M. Carter".

Algae—Limitations and Critique

+ ECA-rules and insertion for production rule-style reasoning
+ Selection of optional data items
+ N3-triples based syntax well-established in RDF community
– No selection or extraction of arbitrary extent subgraphs
– No graph construction, no aggregation, no grouping
– Very limited implementations

Other RDF reactive rules query languages:
- iTQL [2] in Kowari Metastore, RUL [2], an update extension of RQL

Navigational Access Query Languages

Versa: Navigational Access

Versa [77,78,79] navigational access query language inspired by XPath
Part of the Python-based 4Suite XML and RDF toolkit
Can be used as pattern and selection language in 4Suite's XSLT
Centered around traversal expressions:
- forward traversal follows RDF properties, e.g., all() - books:author -> *
  - starting from resources, selected by resource functions such as all(), type()
  - ending in filter expressions over resources and literals
- backward traversal follows RDF properties in inverse direction all() <- rdf:type - *
- general traversal function traverse() traverses in a specified direction, even transitively, e.g.,
```
traverse(books:Writing, rdfs:subClassOf, vtrav:inverse, vtrav:transitive)
```
Filter traversal expressions, as in XPath, for testing without changing the context
- only forward filter
- type(books:Essay) |-books:title-> eq("Bellum Civile") selects essays
- type(books:Essay) -books:title-> eq("Bellum Civile") selects title literals
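
The three kinds of traversal can be approximated in Python over a set of triples. This is a sketch of the idea only; the function names and keyword arguments are assumptions, and details such as whether Versa's traverse includes the start node are not modeled after the actual implementation.

```python
def forward(triples, starts, prop):
    """Follow `prop` from the start nodes to their objects."""
    return {o for s, p, o in triples if p == prop and s in starts}

def backward(triples, starts, prop):
    """Follow `prop` in inverse direction, from objects to subjects."""
    return {s for s, p, o in triples if p == prop and o in starts}

def traverse(triples, starts, prop, inverse=False, transitive=False):
    """One step, or all nodes reachable over one or more steps."""
    step = backward if inverse else forward
    frontier = step(triples, set(starts), prop)
    if not transitive:
        return frontier
    reached = set()
    while frontier:
        reached |= frontier
        frontier = step(triples, frontier, prop) - reached
    return reached

triples = {
    ("books:Novel", "rdfs:subClassOf", "books:Writing"),
    ("books:Essay", "rdfs:subClassOf", "books:Writing"),
    ("books:Historical_Essay", "rdfs:subClassOf", "books:Essay"),
    ("books:Historical_Novel", "rdfs:subClassOf", "books:Novel"),
    ("books:Historical_Novel", "rdfs:subClassOf", "books:Essay"),
    ("b2", "rdf:type", "books:Historical_Essay"),
}
all_subclasses = traverse(triples, {"books:Writing"}, "rdfs:subClassOf",
                          inverse=True, transitive=True)
```

Tracking the nodes already reached keeps the transitive traversal terminating even on cyclic class hierarchies, which RDF/S permits.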

Versa—Higher-order Functions I: Iteration

Higher-order functions form expressive core of Versa
iteration
- over independent expressions: distribute()
  - evaluate second to n-th argument in context of each result of evaluating the first argument
```
distribute(type(books:Essay), ".", 
  "distribute(.-books:author->*, '.', '.-books:authorName->*')")
```
  - as in XPath . indicates the context node
- over conjuncts: filter()
  - for each result x of the first argument all remaining arguments are evaluated as boolean expressions with x as context and x is retained if all expressions evaluate to true
```
filter(books:Essay <- rdf:type - *, 
  ". - books:title -> eq('Bellum Gallicum')", 
  ". - books:translator -> books:translatorName -> eq('J. M. Carter')")
```
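
The two iteration functions correspond closely to familiar higher-order functions. In the following Python sketch, callables take the place of Versa's string expressions and the lambda argument plays the role of the context node "."; the names `distribute` and `keep`, the helper functions, and the second essay `b3` are assumptions for the example.

```python
triples = [
    ("b2", "rdf:type", "books:Essay"),
    ("b2", "books:title", "Bellum Civile"),
    ("b2", "books:author", "a1"),
    ("b2", "books:author", "a2"),
    ("a1", "books:authorName", "Julius Caesar"),
    ("a2", "books:authorName", "Aulus Hirtius"),
    ("b3", "rdf:type", "books:Essay"),          # hypothetical second essay
    ("b3", "books:title", "Some Other Essay"),
]

def values(node, prop):
    """Forward traversal from one node over one property."""
    return sorted(o for s, p, o in triples if s == node and p == prop)

def of_type(cls):
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)

def distribute(items, *exprs):
    """Evaluate every expression in the context of each item."""
    return [[expr(item) for expr in exprs] for item in items]

def keep(items, *conds):
    """Retain the items for which all conditions hold (Versa: filter())."""
    return [item for item in items if all(cond(item) for cond in conds)]

essays = of_type("books:Essay")
report = distribute(
    essays,
    lambda e: e,
    lambda e: distribute(values(e, "books:author"),
                         lambda a: a,
                         lambda a: values(a, "books:authorName")),
)
selected = keep(essays, lambda e: "Bellum Civile" in values(e, "books:title"))
```

Nesting one `distribute` inside another yields a grouped result: each essay is paired with the list of its authors and their names.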

Versa—Higher-order Functions II: Aggregation and Sets

Grouping and aggregation

e.g., Query 5, last year an author with name "Julius Caesar" published

max(filter(all(), 
     ". - books:author -> books:authorName -> eq('Julius Caesar')" ) 
   - books:year -> *)

e.g., Query 6, each subclass of Writing together with the average number of authors per publication

distribute(traverse(books:Writing, rdfs:subClassOf, 
                   vtrav:inverse,vtrav:transitive), 
          ".", 
          "avg(length((. <- rdf:type - *) - books:author -> *))" )

Set functions (sorting, map, filter)

e.g., Query 3, all data except ontology information and translators:

difference(all(), 
   union(type(rdfs:Class), 
         union(type(rdf:Property),
               all() <- books:translator - *) ) )

Versa—Limitations and Critique

+ High expressiveness due to arbitrary traversal expressions and higher-order functions
+ Convenient and highly expressive traversal expressions
– Lack of a procedural abstraction mechanism (an embedding in a host language may compensate for that)
– Very unfamiliar syntax both in comparison to XPath and to SQL-type languages

Navigational access query languages: mostly proposals of limited and immature nature
- RDF Path [80], RPath [74], RxPath [94], RDFT [38], [62]
Proposals using XML query languages on canonized XML representations of RDF
- [86, 87], TreeHugger [95], and RDF Twig [97]

4 Language Constructs Compared

What you should have learned until now: Languages

Each language in detail with all its constructs (vertical perspective)
Impression of their diversity
Rough classification of approaches

What is coming up: Constructs

Each construct in detail over all languages (horizontal perspective)
Constructs essential and pertinent to RDF querying
Better understanding of core issues in RDF language design and choice

Selection

Selection := ability to characterize
- subsets of the queried data that match the user's query intent
Basic functionality of any query language
Relational data:
- known schema, thus selection via attribute values and relations to other data items
Semi-structured data:
- centered around position of the sought-for data items in the structure (~ “structural relations”)
→ richer selection constructs.

Triple Patterns vs. Path expressions

Triple patterns

Basic form of selection construct
Roughly similar to relational selection-(projection-)join query
Consists of a conjunction of one or more query triples: like data triples
- but extended with query constructs such as variables

Ground triple patterns (~ relational selection) in SPARQL

  ?essay books:title "Bellum Civile"

General triple patterns (~ selection-join query) in SPARQL

  ?essay books:author ?author. 
  ?author foaf:name "Julius Caesar"

Path Expressions

Observation: frequent structural joins occur in both XML and RDF data
- "traverse" and "correlate" nodes in the RDF graph to select the actually used data items
Succinct specification is a core issue for access to tree and graph data
- omitting (existentially quantified) intermediary variables
- dot-notation in the context of relational (GEM [100]), and object-oriented data (OQL [29])
- path expressions in the context of object-oriented and semi-structured data (XPath [33]).
All the following classes of path expressions also exhibit polynomial time complexity

Path-like omission of intermediary b-nodes in SPARQL

  ?essay books:author [ foaf:name "Julius Caesar" ].

Path expressions in RQL

  {Essay}books:author.foaf:name{A}.

Classes of Path Expressions

Basic path expressions

Pure abbreviations for triple patterns as seen in SPARQL or RQL
Only fixed length traversals, no added expressiveness over triple patterns only
Languages: GEM [100], OQL [29], SPARQL [84], RQL [57]

Unrestricted closure path expressions

Arbitrary-length unrestricted traversal, i.e., over any nodes and edges
Expressible with linear recursive views
Infrequent in RDF: lack of dominating hierarchical relation as in XML
Languages: XPath [33], Xcerpt desc [88]

Classes of Path Expressions (cont.)

Generalized or regular path expressions

Full regular expressions over path traversals (repetition, alternatives)
E.g., a*.((b|c).e)+ traverses all paths of
- arbitrarily many a properties followed by
- at least one repetition of either a b or a c
- in each case followed by an e
Polynomial time complexity w.r.t. data and query size
Expressible with recursive views, but the result is excruciatingly complex
Languages: Versa [79], Lorel, Xcerpt qualified desc
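
The regular path expression above can be tried out on a small graph. The sketch below is a deliberately naive Python illustration: it enumerates all paths of a hypothetical acyclic toy graph and matches their label sequences with a string regex. The single-letter property names, the graph, and the function names are assumptions; the enumeration is not the polynomial-time evaluation referred to above.

```python
import re

# Hypothetical acyclic toy graph; property names are single letters so that
# the label sequence of a path can be written as a plain string.
EDGES = [
    (1, "a", 2), (2, "a", 3), (3, "b", 4), (4, "e", 5),
    (5, "c", 6), (6, "e", 7), (3, "d", 8),
]

def paths_from(node, edges=EDGES):
    """Yield (label_string, end_node) for every non-empty path starting at node."""
    for (s, p, o) in edges:
        if s == node:
            yield p, o
            for labels, end in paths_from(o, edges):
                yield p + labels, end

def match_regular_path(start, path_regex, edges=EDGES):
    """Return the end nodes of all paths from start whose labels match path_regex."""
    pattern = re.compile(path_regex)
    return {end for labels, end in paths_from(start, edges)
            if pattern.fullmatch(labels)}

# a*.((b|c).e)+ written as a Python regex over label strings
REGEX = r"a*((b|c)e)+"
```

With this data, `match_regular_path(1, REGEX)` reaches the nodes at the end of the label sequences `aabe` and `aabece`, while the path ending in `d` is rejected.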

Classes of Path Expressions (cont.)

Diagram representation of classes of path expressions organized around the type of path traversal expressions and the use of variables in the expressions.

Classes of Path Expressions (cont.)

Summary

Path expressions convenient and succinct for traversals
Surprisingly few RDF languages consider closure path traversal
Essential for efficient and convenient expression of many queries
- "Incomplete" information on connection paths
- Variable connection paths

Closure Subgraph Extraction

Schema of RDF data often only vaguely known
→ extent of interesting portions of data often not known statically
- e.g., extract all information on one given book
Solvable in languages providing closure path expressions
- regular path expressions often needed
- e.g., to extract all related ontology information (connected using only certain relations)

Constructs for Closure Subgraph Extraction

Needed in languages with only triple patterns and basic path expressions
Built-in closure for certain predefined relations
- e.g., RQL, see next slide
Built-in closure subgraph extraction construct
- concise bounded descriptions (CBDs) of RDF resources
  - immediate properties and all properties reachable with only blank resources in between
- SPARQL DESCRIBE
  - relevant and representative information about resources
  - semantics not specified in the language
  - CBDs one possible semantics
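
The core of a concise bounded description, as characterized above, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the toy graph is hypothetical, blank nodes are recognized by a `_:` prefix, and the handling of reified statements in the full CBD definition is omitted.

```python
# Hypothetical toy graph; blank nodes are strings starting with "_:".
TRIPLES = [
    ("ex:bellum_civile", "books:title", '"Bellum Civile"'),
    ("ex:bellum_civile", "books:author", "_:a1"),
    ("_:a1", "foaf:name", '"Julius Caesar"'),
    ("_:a1", "foaf:knows", "ex:cicero"),
    ("ex:cicero", "foaf:name", '"Cicero"'),
]

def is_blank(node):
    return node.startswith("_:")

def concise_bounded_description(resource, triples=TRIPLES):
    """Collect the statements of resource, expanding recursively through blank-node objects."""
    description, todo, seen = [], [resource], set()
    while todo:
        subject = todo.pop()
        if subject in seen:
            continue
        seen.add(subject)
        for (s, p, o) in triples:
            if s == subject:
                description.append((s, p, o))
                if is_blank(o):
                    todo.append(o)  # only blank nodes are followed further
    return description
```

The statement about `ex:cicero` is not part of the description of `ex:bellum_civile`, because the expansion stops at named resources.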

Schema-aware Selection

RQL: matching against RDF entailment graph
- but requires an acyclic subsumption hierarchy
- only subset of axiomatic triples to ensure finite answers
- entailment is coded into the query engine
Triple, Xcerpt: use rules for RDFS entailment
- configurable set of entailment rules
- Triple allows also external "models"
SPARQL/RDQL/SeRQL: some implementations choose built-in entailment
- no provisions in the query language
- no distinction between "materialized" and "inferred" triples
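
Rule-based RDFS entailment, and the distinction between asserted and inferred triples, can be illustrated with a small fixpoint computation. The Python sketch below covers only two rules (subclass transitivity and type inheritance); the data and names are assumptions, and it does not reproduce the rule sets of Triple or Xcerpt.

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"

# Hypothetical asserted ("materialized") triples.
ASSERTED = {
    ("books:Essay", SUBCLASS, "books:Writing"),
    ("books:HistoricalEssay", SUBCLASS, "books:Essay"),
    ("ex:bellum_civile", TYPE, "books:HistoricalEssay"),
}

def rdfs_closure(asserted):
    """Apply two RDFS rules until a fixpoint; return asserted plus inferred triples."""
    closure = set(asserted)
    changed = True
    while changed:
        changed = False
        new = set()
        for (a, p1, b) in closure:
            for (c, p2, d) in closure:
                if b != c or p2 != SUBCLASS:
                    continue
                if p1 == SUBCLASS:    # (a sub b), (b sub d) => (a sub d)
                    new.add((a, SUBCLASS, d))
                elif p1 == TYPE:      # (a type b), (b sub d) => (a type d)
                    new.add((a, TYPE, d))
        if not new <= closure:
            closure |= new
            changed = True
    return closure

CLOSURE = rdfs_closure(ASSERTED)
INFERRED = CLOSURE - ASSERTED  # kept apart from the asserted triples
```

Keeping `INFERRED` separate is one way to preserve the distinction that, per the list above, a query language without such provisions cannot express.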

Optional Selection and Disjunctions

So far: selection purely conjunctive queries
This section: disjunction or equivalent union constructs
- “to find colleagues of a researcher from an RDF graph containing bibliography and conference information”, one might choose to select co-authors, as well as co-editors, and members in the same program committee.
Disjunction more commonplace on RDF than on relational data
- all properties are by default optional
- many queries retrieve in addition to core properties optional data items to be reported

Examples of Optional Queries

SPARQL Query with Optional

  SELECT   ?writing ?translator
  WHERE    { ?writing a books:Essay . 
           OPTIONAL { ?writing books:translator ?translator } }

Equivalent SPARQL Query with Union

  SELECT   ?writing ?translator
  WHERE    { { ?writing a books:Essay . 
               ?writing books:translator ?translator } 
             UNION
             { ?writing a books:Essay } }

Slight difference in semantics: essays with at least one translator are additionally reported once with an unbound ?translator
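
The difference between the two formulations can be made concrete with a small Python sketch. The data and function names are hypothetical; `None` stands for an unbound ?translator.

```python
# Hypothetical toy data: two essays, only one of which has a translator.
ESSAYS = ["ex:bellum_civile", "ex:bellum_gallicum"]
TRANSLATORS = {"ex:bellum_civile": ["ex:carter"]}  # essay -> translators

def optional_query():
    """Left outer join: an unbound translator only if no translator exists."""
    rows = []
    for essay in ESSAYS:
        found = TRANSLATORS.get(essay, [])
        rows += [(essay, t) for t in found] if found else [(essay, None)]
    return rows

def union_query():
    """Union of 'essay with translator' and 'essay alone'."""
    with_translator = [(e, t) for e in ESSAYS for t in TRANSLATORS.get(e, [])]
    essay_only = [(e, None) for e in ESSAYS]
    return with_translator + essay_only
```

In this sketch the union variant yields one more row than the optional variant: the translated essay appears a second time without a translator.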

Semantics for Optionals

A ∧ optional B ∧ optional C

How to treat multiple optional parts?
- if optionals are independent → order irrelevant
- if optionals are dependent → order has effect:
  match for B might prevent match for C and vice versa
Three different semantics for interdependent optionals:
- Independent treatment of optionals
- Maximized optionals
- All-or-nothing optionals

Independent Treatment of Optionals

Impose order on optional clauses to resolve interdependencies
SPARQL uses the lexical order of the optional clauses

Interdependent Optionals in SPARQL

The following query selects essays together with translators and, if that translator is also an author, also the author name.

  SELECT   ?writing ?person ?name
  WHERE    { ?writing a books:Essay . 
           OPTIONAL { ?writing books:translator ?person } 
           OPTIONAL { ?writing books:author ?person . 
                      ?person foaf:name ?name } }

Switching the order of the optionals changes the semantics → the second optional becomes superfluous
The reordered query selects all essays together with authors and author names (if there are any).

Independent Treatment of Optionals (cont.)

Lexical order of interdependent optionals equivalent to nested optionals

  SELECT   ?writing ?person ?name
  WHERE    { ?writing a books:Essay . 
           OPTIONAL { ?writing books:translator ?person 
              OPTIONAL { ?writing books:author ?person . 
                         ?person foaf:name ?name } 
           } }

Only applies to interdependent optionals, not to independent ones

Union Semantics for Optionals

UNION can be used to express OPTIONAL

  SELECT   ?writing ?translator
  WHERE    { { ?writing a books:Essay . 
               ?writing books:translator ?translator } 
             UNION
             { ?writing a books:Essay } }

Semantics often differ from those of a "real" optional:
- includes additional null-value bindings for ?translator even if a translator exists
Can be used in SPARQL, SQL, RDQL, and similar languages

Maximized optionals

Considers any order of optionals
- e.g., first binds translators, then checks whether they are also authors
- or first binds authors and author names and then checks whether the authors are also translators
Returns only results where a maximal subset of optional variables is bound
More involved than the other semantics
No increase in complexity as interdependent optionals are already NP-complete
Equivalent to rewriting of optionals to disjunctions with negated clauses
- A ∧ optional B ∧ optional C equivalent to
- (A ∧ ¬ B ∧ ¬ C) ∨ (A ∧ ¬ B ∧ C) ∨ (A ∧ B ∧ ¬ C) ∨ (A ∧ B ∧ C)
Introduced in and for Xcerpt

All-or-nothing Optionals

Rare case of optional
Either all optional parts or none are matched
Expressible using a single optional clause over conjunctions and disjunctions
Can be achieved, e.g., in SPARQL and Xcerpt

RDF Specificities

So far: general issues for semi-structured query languages
Now: adequacy of RDF query languages with respect to the specificities of RDF

Blank Nodes

For matching: (skolemized) blank nodes just like any other resource
For results: special consideration, see below

Reification

Just syntactical and representational conventions, no specific semantics
Still, dedicated syntax in the query language is convenient
SeRQL and Triple, e.g., provide such syntax

Selection of Collection and Containers

Can be handled as merely vocabulary and representational conventions
Evaluation engine may optimize for these specific access patterns

Sequence container ⟨A, B, C⟩ is reduced to:

 _:1 rdf:type rdf:Seq
 _:1 rdf:_1   A
 _:1 rdf:_2   B
 _:1 rdf:_3   C

Similarly, collections are reduced to binary relations of rdf:first and rdf:rest:

 _:1 rdf:first A
 _:1 rdf:rest  _:2
 _:2 rdf:first B
 _:2 rdf:rest  _:3
 _:3 rdf:first C
 _:3 rdf:rest rdf:nil

Selection of Collection and Containers (cont.)

Querying these representations is challenging in many RDF QLs
- e.g., just “to select all members of a container or collection”
In case of a collection: specific construct or regular path expressions
- Most RDF query languages cannot express this
- Using the regular path expression rdf:rest*.rdf:first
In case of a container: specific construct or regular expression over URIs

  SELECT   ?contained_resource
  WHERE    { ?C ?P ?contained_resource . 
            FILTER(regex(str(?P),
              "^http://www.w3.org/1999/02/22-rdf-syntax-ns#_\\d+$")) }

RQL—specialized constructs: R in C to test membership of resource R in container C
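
Both member-selection strategies can be sketched in Python over the triple representations shown earlier. This is an illustration only: the blank node name `_:s` for the container and the helper names are assumptions, and the collection walk presumes a well-formed, acyclic list.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A collection (list cells _:1 to _:3) and a container (_:s) as toy triples.
TRIPLES = [
    ("_:1", RDF + "first", "A"), ("_:1", RDF + "rest", "_:2"),
    ("_:2", RDF + "first", "B"), ("_:2", RDF + "rest", "_:3"),
    ("_:3", RDF + "first", "C"), ("_:3", RDF + "rest", RDF + "nil"),
    ("_:s", RDF + "type", RDF + "Seq"),
    ("_:s", RDF + "_1", "A"), ("_:s", RDF + "_2", "B"), ("_:s", RDF + "_3", "C"),
]

def objects(subject, prop):
    return [o for (s, p, o) in TRIPLES if s == subject and p == prop]

def collection_members(head):
    """Follow rdf:rest links from head and collect the rdf:first value of each cell."""
    members, cell = [], head
    while cell != RDF + "nil":
        members += objects(cell, RDF + "first")
        rest = objects(cell, RDF + "rest")
        if not rest:
            break
        cell = rest[0]
    return members

# Membership properties rdf:_1, rdf:_2, ... recognized by a regex over the URI.
MEMBERSHIP = re.compile(re.escape(RDF) + r"_(\d+)")

def container_members(container):
    """Select the objects of rdf:_n properties, ordered by the index n."""
    found = []
    for (s, p, o) in TRIPLES:
        m = MEMBERSHIP.fullmatch(p)
        if s == container and m:
            found.append((int(m.group(1)), o))
    return [o for _, o in sorted(found)]
```

The collection case needs an unbounded traversal, whereas the container case needs pattern matching on property URIs, which mirrors the two kinds of constructs named above.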

Construction

So far: selection constructs
Now: construction, i.e., setting the shape of the result

Overview

Graph Construction vs. Selection-only
Graph Construction
Construction of XML Results

Graph Construction vs. Selection-only

Many RDF query languages are surprisingly not closed
- i.e., cannot construct new RDF graphs, e.g., RDQL and Versa
- SPARQL: only limited construction, no grouping
Blank nodes (just "internal" identifiers) as part of result
- no identification from outside, at best merely existential information
Aggregation and grouping support lackluster
- e.g., SPARQL and RDQL lack any such support
- Xcerpt and RQL are among the few languages with such support, e.g., in RQL:

SELECT R, count(SELECT @P FROM {R} @P)
FROM  {R}books:author{A}
WHERE  A = "Julius Caesar"

Graph Construction

Addresses closure for RDF query languages
Many RDF query languages focus on selection only, e.g., Versa and RDQL
SPARQL provides construction but neither grouping nor aggregation

The basic form of graph construction in SPARQL is

CONSTRUCT { ?R ?P ?O }
WHERE     { ?R books:author "Julius Caesar". ?R ?P ?O }

SPARQL's construct patterns are just triple patterns, instantiated once per result tuple
- Blank nodes in construct patterns are instantiated for each result tuple separately → no sharing
- → no blank node grouping, e.g., for collections and containers
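
Why per-tuple instantiation prevents grouping under a shared blank node can be seen in a small Python sketch. It mimics the instantiation of a construct pattern under the assumptions stated in the comments; the template, the property `ex:member`, and the function name are hypothetical.

```python
from itertools import count

def instantiate(template, solutions):
    """Instantiate a list of template triples once per solution (a dict of bindings)."""
    fresh = count(1)
    graph = []
    for solution in solutions:
        blank_map = {}  # template blank node -> fresh blank node, reset per solution
        for triple in template:
            out = []
            for term in triple:
                if term.startswith("?"):
                    term = solution[term]
                elif term.startswith("_:"):
                    if term not in blank_map:
                        blank_map[term] = f"_:b{next(fresh)}"
                    term = blank_map[term]
                out.append(term)
            graph.append(tuple(out))
    return graph

# Attempt to group all results under one blank node "_:list".
TEMPLATE = [("_:list", "ex:member", "?R")]
SOLUTIONS = [{"?R": "ex:bellum_civile"}, {"?R": "ex:bellum_gallicum"}]
GRAPH = instantiate(TEMPLATE, SOLUTIONS)
```

Each solution receives its own blank node, so the two members end up attached to different nodes rather than to one common container.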

Collections and Containers

Construction of containers and collections requires
- construction of new blank nodes (identity invention) and
- grouping member resources by container node
Impossible in SPARQL
RQL provides specialized constructs for containers and collections
- but not for general identity invention
- e.g., n-ary relations
Xcerpt provides identity invention and general grouping

Minimal Result Graphs

Blank nodes may be bound to result variables
- but: blank nodes are local identifiers only
- but: blank nodes represent at best existential information
Assume assignment set
- {(R → ⟨http://w3.org/⟩, P → ⟨ns:director⟩, O → "Tim Berners-Lee"),
  (R → ⟨http://w3.org/⟩, P → ⟨ns:director⟩, O → ⟨_:1⟩)}
Generate one result instance for each assignment tuple?
- but: result instances with blank nodes may be implied by other instances
- e.g.: the second result instance from the above assignment set would be superfluous
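
A conservative pruning step for the situation above can be sketched in Python. The sketch only removes a triple whose blank-node object occurs nowhere else and that is implied by a triple with the same subject and property and a non-blank object; it is an assumption-laden simplification, not a general algorithm for minimal (lean) graphs.

```python
def is_blank(term):
    return term.startswith("_:")

def prune_redundant(triples):
    """Drop triples (s, p, _:b) where _:b occurs only once in the graph and
    another triple (s, p, o) with a non-blank object o exists."""
    def occurrences(blank):
        return sum(t.count(blank) for t in triples)

    kept = []
    for (s, p, o) in triples:
        redundant = (
            is_blank(o)
            and occurrences(o) == 1
            and any(s2 == s and p2 == p and not is_blank(o2)
                    for (s2, p2, o2) in triples)
        )
        if not redundant:
            kept.append((s, p, o))
    return kept

# Result instances generated from the assignment set above.
RESULT = [
    ("http://w3.org/", "ns:director", '"Tim Berners-Lee"'),
    ("http://w3.org/", "ns:director", "_:1"),
]
```

Applied to `RESULT`, the triple with the blank node is dropped, since it only states that some director exists.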

Conditional Construction

Shape of constructed graph dependent on variable assignments
But beyond mere instantiation, e.g., inclusion of a subgraph if a variable is bound or has a specific value
Most forms can be expressed as UNION over full queries (Algae, Xcerpt, Triple)
Otherwise specific conditional construction is needed

Unscoped optional construction

Construction parts containing optional variables are only instantiated if bindings exist for all of their optional variables
Does not allow for static parts to depend on variables
Used, e.g., in SPARQL

Conditional Construction

Scoped optional construction

Allows the scoping of optional variables
Parts of the construction can depend on variables that are not contained in the part
Can be used also to express full conditional construction but becomes rather awkward
Used, e.g., in Xcerpt

Full conditional construction

if ... then ... or case constructs
Parts of the construction depend on arbitrary boolean expressions
One might, e.g., want to add the triple ?P rdf:type my:Teen for persons with ?Age between 12 and 18 and the triple ?P rdf:type my:Adult for older persons.
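
The teen/adult example amounts to a case distinction during construction. A minimal Python sketch, with hypothetical bindings and function name:

```python
def classify(bindings):
    """Construct a type triple per person, depending on a boolean condition on ?Age."""
    triples = []
    for b in bindings:
        person, age = b["?P"], b["?Age"]
        if 12 <= age <= 18:
            triples.append((person, "rdf:type", "my:Teen"))
        elif age > 18:
            triples.append((person, "rdf:type", "my:Adult"))
        # younger persons contribute no triple
    return triples

BINDINGS = [
    {"?P": "ex:anna", "?Age": 15},
    {"?P": "ex:bob", "?Age": 42},
    {"?P": "ex:carl", "?Age": 7},
]
```

Neither constructed triple contains ?Age, which is what distinguishes this form from mere instantiation of a pattern.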

Construction of XML Results

Need for bridge between RDF queries and XML processing evident
(e.g., W3C RDF use cases)
Versatile query languages:
- allow access to and construction of both RDF and XML in the same query
Most RDF languages do not consider this issue
SPARQL provides a static schema for representing answers in XML

Procedural Abstraction

Common feature of both programming and expressive query languages
RDF: Efficient rule layer to implement large scale reasoning essential
Separating querying and (rule) reasoning often infeasible
Several languages offer rules and views for reasoning-aware querying
- deductive rules in Triple and Xcerpt
- reactive rules in Algae
RDFS-awareness can be provided by entailment rules just like other reasoning tasks
Data integration and mediation of heterogeneous data using rules and views

5 Query Evaluation

Methods for RDF query evaluation differ in several aspects:

In memory versus disk query evaluation
Distributed collaborative versus local query evaluation
Triples versus quadruples (s,p,o,c) of subject, predicate, object and so-called context information
Decomposed triple/quadruple storage versus document storage versus in-memory graph storage.
Single statement queries (e.g. (?X, foaf:knows, ?Y)), versus conjunctive queries

Focus of this article: non-distributed evaluation of conjunctive queries on triple/quadruple stores on disks.

Storage of RDF Data

Berkeley database
Relational database engines
RDF storage in object oriented and object relational databases

RDF Storage in Berkeley Databases

According to the directory of the Free Software Foundation¹⁴, the Berkeley Database is

embedded database system
access methods: B+tree, Extended Linear Hashing, records of fixed and variable length, persistent queues
transactional support, recovery, backups, separate access to locking, logging, shared memory caching subsystems
Used by MySQL, subversion, OpenLDAP, KDevelop, etc.

Usage of Berkeley DB in RDF storage

Jena1: Redundant storage in 3 hash tables, using subject, predicate and object as hash keys.
Redland RDF Application Framework
rdfDB
RDFStore
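
The redundant three-table layout attributed to Jena1 above can be illustrated with in-memory dictionaries standing in for the Berkeley DB hash tables. The class and method names are assumptions; this is a sketch of the idea, not Jena code.

```python
from collections import defaultdict

class TripleHashStore:
    """Each triple is stored three times, keyed by subject, predicate, and object."""

    def __init__(self):
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, s, p, o):
        triple = (s, p, o)
        self.by_subject[s].add(triple)
        self.by_predicate[p].add(triple)
        self.by_object[o].add(triple)

    def match(self, s=None, p=None, o=None):
        """Single statement query; None acts as a wildcard."""
        if s is not None:
            candidates = self.by_subject.get(s, set())
        elif p is not None:
            candidates = self.by_predicate.get(p, set())
        elif o is not None:
            candidates = self.by_object.get(o, set())
        else:
            candidates = set().union(*self.by_subject.values())
        # Filter the candidates by the remaining bound positions.
        return {t for t in candidates
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)}
```

Any single bound position gives direct hash access to the candidate triples, at the price of storing every statement three times.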

Storage of RDF with the Aid of Relational Database Engines

Most common implementation
The simplistic approach:

subject	predicate	object
http://example.com#subj1	http://example.com#pred1	"an example literal"
...	...	...

RDF storage in Jena1

The Jena1 approach: Resource and Literals tables.

Resource	ID
http://example.com#subj1	id1
...	...

Literal	ID
"an example literal"	id1
...	...

- Advantage: very efficient in space
- Disadvantage: joins are required to retrieve statements
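
The join cost of the normalized layout shows up as soon as a statement is read back. The following sketch uses Python's built-in sqlite3 module; the table and column names are assumptions for illustration and do not reproduce the actual Jena1 schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resources  (id INTEGER PRIMARY KEY, uri TEXT UNIQUE);
    CREATE TABLE literals   (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
    CREATE TABLE statements (subject INTEGER, predicate INTEGER,
                             object_resource INTEGER, object_literal INTEGER);
""")
# Each URI and literal is stored once; statements hold only the short ids.
conn.execute("INSERT INTO resources VALUES (1, 'http://example.com#subj1')")
conn.execute("INSERT INTO resources VALUES (2, 'http://example.com#pred1')")
conn.execute("INSERT INTO literals  VALUES (1, 'an example literal')")
conn.execute("INSERT INTO statements VALUES (1, 2, NULL, 1)")

# Retrieving the readable statement requires three joins.
ROWS = conn.execute("""
    SELECT s.uri, p.uri, l.value
    FROM statements st
    JOIN resources s ON s.id = st.subject
    JOIN resources p ON p.id = st.predicate
    JOIN literals  l ON l.id = st.object_literal
""").fetchall()
```

In the simplistic single-table approach the same statement would be a plain row lookup, but every URI would be repeated in full in every statement.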

Storage of RDF with the Aid of Relational Database Engines

Lessons learnt in Jena2

Literals and resources are directly stored in the statement table, unless they exceed a configurable maximum size
Multiple tables for different graphs. Assumption: data from different graphs is rarely accessed together.

Property tables

Subject	foaf:homepage	foaf:nick	...
http://example.com#Miller	http://miller.com/index.html	"Milli"	...
...	...	...	...

Reified statement tables.
- In Jena1: stored as ordinary statements with two additional properties.
- In Jena2: stored within property tables.

Storage of RDF data in 3store

C-library developed at the University of Southampton
MySQL database back-end; intended for very large databases (up to 30 million triples)
Statements-, Literals-, and Resources-tables as in Jena1

Statements table:

model (int64)	subject (int64)	predicate (int64)	object (int64)	literal (boolean)	inferred (boolean)

Model, URI, and Literals tables:

hash (int64)	model (text)

hash (int64)	uri (text)

hash (int64)	literal (text)

Storage of RDF data in 3store (continued)

Keys for literals and resources are computed by a hash function to obtain uniform length, which improves indexing and joins
Prototype generated an additional (hash, uri) namespace table for efficient RDF/XML output
Low probability of hash collisions
Division of the hash space to prevent collisions between homonymous literals and URIs
Additional tables for languages and data types of literals
Support for SPARQL querying
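
The idea of fixed-length hash keys with a divided hash space can be sketched as follows. The choice of MD5, the 63+1 bit split, and the function name are assumptions made for illustration; the slides do not state which hash function or partitioning 3store actually uses.

```python
import hashlib

def hash64(text, is_literal):
    """64-bit key: 63 bits taken from an MD5 digest, plus a top bit that
    separates literals (1) from URIs (0)."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big") >> 1  # keep 63 bits
    return value | (1 << 63) if is_literal else value
```

With such a split, a literal and a URI that are spelled identically can never receive the same key, while keys stay at a uniform 64 bits for indexing and joins.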

RDF Storage in Sesame

Storage back-end for SeRQL. ⇒ Necessity for efficient schema querying and inference
Two different back-ends: Postgres and MySQL
SAIL: a Storage and Inference Layer serving as abstraction mechanism

Postgres back-end

RDFS property and class subsumption hierarchies modelled by sub-table relations among tables
dynamic schema: One table for each property and class.
expensive insertion of rdfs:subClassOf-predicates.
The authors of Sesame expect the dynamic schema to perform better when querying RDF stores.

MySQL back-end

static database schema
special tables for rdf:type, rdfs:subClassOf, rdfs:subPropertyOf

RDF Storage in RDFSuite

Tools for querying, validating and storing RDF data
Supports RQL
Based on Postgres
Schema information kept in separate tables subProperty, subClass, Class, Type
Namespaces stored in separate table for space reasons

Path Based Storage of RDF Data

"Decomposed" triple stores efficient for single statement queries, but inefficient for path queries .
(v₀,v₁), (v₁,v₂), ..., (vₖ₋₁,vₖ)
Evaluation of path queries requires joins over the statement table.
Tables for class inheritance (CI), property inheritance (PI), type (T), domain and range (DR), and remaining statements (G).
Interval numbering schemes for CI and PI
Memoization of all possible paths in G within a relational table
#title<#sculpts for two triples with predicates title and sculpts.
Reverse paths, because of wildcards at the beginning of path queries.
Association of resources with paths in a separate table

Path Based Storage of RDF Data (continued)

Performance Comparison with Jena2 (by Matono et al.)

for queries of length 1 and 2, Jena2 prevailed.
for queries of length greater than 3, the path-based approach was faster
exponential growth of space

RDF Storage in Object Databases

Idea: RDF graphs are decomposed to triples for storage in RDBs and must be reconstructed for querying
Solution: storage of RDF graphs as objects in an OODB
Storage without further reorganization
translation of RQL to OQL
Fastobjects as underlying OODB
All edges/vertices are realized as objects, graph is encoded by references
Performance comparison with Sesame on top of Postgres, conducted by Bönström et al., shows better performance with hybrid and schema queries

Index Structures for RDF

Approaches considered thus far: standard DBMS and libraries
Now: index structures specifically aimed at storing RDF:
- Suffix arrays (Matono et al.)
- Index structures for RDF statements with context information (Harth et al.)

Suffix arrays

Pattern P of length p
Search text M of length m
All suffixes of M are sorted lexicographically
Suffix arrays are stored as the string M and a sequence of indexing points p₁, ..., pₘ
pᵢ, 1 ≤ i ≤ m, is the position in M of the i-th suffix (in lexicographical order)
All instances of P in M can be found in O(p · log m)
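
The definitions above translate into a short Python sketch. The construction by plain sorting is a simplification chosen for brevity (it is not an efficient construction algorithm), positions are 0-based rather than 1-based, and the function names are assumptions.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, ordered lexicographically by suffix."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_all(text, suffix_array, pattern):
    """Find all occurrences of pattern with two binary searches over the suffix array."""
    p = len(pattern)
    lo, hi = 0, len(suffix_array)
    while lo < hi:  # first suffix whose length-p prefix is >= pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + p] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(suffix_array)
    while lo < hi:  # first suffix whose length-p prefix is > pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + p] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[start:lo])

TEXT = "banana"
SUFFIX_ARRAY = build_suffix_array(TEXT)
```

Each binary search performs O(log m) comparisons of at most p characters, which corresponds to the O(p · log m) bound stated above.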

Index Structures for RDF (Suffix Arrays 1)

Suffix arrays applied to RDF graphs

Assumption: The RDF graph is acyclic
The alphabet Σ is the set of all resources and literals
Representation of the paths by the concatenation of their labels
A path expression: alternation of labels for vertices and arcs
Extraction of all paths from root- to leaf-nodes
root-node: no incoming edges; leaf-node: no outgoing edges
Extraction and sorting of all suffixes for all paths
Suffix array for DAG: (pa₁,po₁), ..., (paₗ,poₗ)
paᵢ: path number of the i-th suffix (in lexicographical order)
poᵢ: position of the i-th suffix within paᵢ

Index Structures for RDF (Suffix Arrays 2)

Discussion of Suffix Arrays

Performance Comparisons carried out by Matono et al. show a performance increase by a factor of 2 to 9 depending on the type of the query.
It remains unclear whether one can search paths consisting only of predicate or node labels.

Index Structures for RDF (Quadruples and Substring Searches)

Two major challenges spotted by Harth et al.:

Importance of trust and provenance information in an open, distributed environment ⇒ quadruples instead of triples.
Necessity of performing substring searches

Ideas and method of the approach

Trade index space for retrieval time
two indexes: (1) lexicon for substring search, (2) quad indexes
Nodes are represented by object identifiers to reduce index size

(1) The lexicon index

oidnode: OID → String
nodeoid: String → OID
keyword index: Words → OIDs

Index Structures for Quadruples and Substring Searches

(2) Quad Indexes

No	Access Pattern
1	(?:?:?:?)
2	(s:?:?:?)
3	(s:p:?:?)
4	(s:p:o:?)
5	(s:p:o:c)
...	...

spoc	poc	ocs	csp	cp	os
(?:?:?:?)	(?:p:?:?)	(?:?:o:?)	(?:?:?:c)	(?:p:?:c)	(s:?:o:?)
(s:?:?:?)	(?:p:o:?)	(?:?:o:c)	(s:?:?:c)
(s:p:?:?)	(?:p:o:c)	(s:?:o:c)	(s:p:?:c)
(s:p:o:?)
(s:p:o:c)
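
The table can be read as a prefix rule: an index serves an access pattern if the bound positions are exactly a prefix of the index key. A minimal Python sketch of that lookup, with hypothetical function and parameter names:

```python
# The six quad indexes from the table above, in table order.
INDEXES = ["spoc", "poc", "ocs", "csp", "cp", "os"]

def choose_index(bound):
    """bound: the bound positions of a pattern, a subset of {'s', 'p', 'o', 'c'}.
    Return the first index whose key prefix consists of exactly these positions."""
    bound = set(bound)
    for index in INDEXES:
        if set(index[:len(bound)]) == bound:
            return index
    raise ValueError("no index covers this access pattern")
```

For example, `choose_index({"p", "c"})` selects the `cp` index, and the fully unbound pattern falls back to `spoc`. All 16 combinations of bound and unbound positions are covered by the six indexes.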

Schema- and Reasoning-aware RDF Querying

Different Perceptions:

most languages: do not distinguish between schema information and ordinary triples
SPARQL: computation of derived facts by underlying graph model

Implementation of RDF/S Semantics

labeling schemes
precomputation of derived facts (forward chaining)
backward chaining

Labeling Schemes for RDF/S

bitvector schemes
prefix schemes
interval schemes
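
As an illustration of one of these schemes, the sketch below assigns intervals to classes by a depth-first numbering, so that a subsumption test reduces to interval containment. It assumes a tree-shaped class hierarchy; the toy hierarchy and the function names are hypothetical, and handling of multiple superclasses is not covered.

```python
# Hypothetical class tree: parent -> list of direct subclasses.
CHILDREN = {
    "books:Writing": ["books:Novel", "books:Essay"],
    "books:Essay": ["books:HistoricalEssay"],
}

def label_intervals(root, children=CHILDREN):
    """Assign each class a (start, end) interval via depth-first numbering."""
    labels, counter = {}, [0]

    def visit(node):
        counter[0] += 1
        start = counter[0]
        for child in children.get(node, []):
            visit(child)
        counter[0] += 1
        labels[node] = (start, counter[0])

    visit(root)
    return labels

def is_subclass(labels, sub, sup):
    """sub is a proper subclass of sup iff its interval lies strictly inside that of sup."""
    (s1, e1), (s2, e2) = labels[sub], labels[sup]
    return s2 < s1 and e1 < e2

LABELS = label_intervals("books:Writing")
```

Once the labels are computed, a transitive subclass check needs two integer comparisons instead of a traversal or a recursive join.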

Schema- and Reasoning-aware RDF Querying: Labeling Schemes

Bitvector, Prefix and Interval Labeling Schemes

Schema and Reasoning-aware RDF Querying: Rete Algorithm

RDF Querying

Language Constructs and Evaluation Methods Compared

Abstract (or “What to Expect?”)

Contents

1 Introduction

Query Answering on the Semantic Web

Introduction—Focus of this Article

Introduction—Structure of this Article

Answering Three Questions

2 Preliminaries

Preliminaries—A Brief Introduction to RDF and RDFS

Preliminaries—RDF Schema

Preliminaries—Predefined Properties and Classes in RDF/S

Preliminaries—RDF Serialization

Preliminaries—RDF Serialization

Preliminaries—Classification-Based Book Recommender

Preliminaries—Representation in Turtle RDF

Preliminaries—Semantics

Preliminaries—Sample Queries 1-3

Preliminaries—Sample Query 4: Restructuring

Preliminaries—Sample Query 5-6: Aggregation

Preliminaries—Sample Query 7-9: Combination and Inference

3 The RDF Query Language Families

Overview

Chronological Overview

Relational Query Languages: SPARQL, RQL, TRIPLE, and Xcerpt

The SPARQL Family

SPARQL—Basic Constructs and Syntax

Graph Pattern Syntax

SELECT-FROM-WHERE Clause

SPARQL—Conditions, FILTER Clause

SPARQL—Optional Triples

SPARQL—Optional and Negation

SPARQL—Graph Construction

SPARQL—Extraction Queries

SPARQL—Named Graphs

Limitations of the SPARQL Family

The RQL Family

RQL—Basic Schema Queries

RQL—Data Queries

RQL—Mixed Schema and Data Queries

RQL—Sample Queries

RQL—Limitations and Critique

TRIPLE

TRIPLE—Comparison to Logic Programming

TRIPLE—RDF/S Semantics through Rules

TRIPLE—Limitations and Critique

Xcerpt

Xcerpt—Relational (Triple) View

Xcerpt—Graph View

Xcerpt—Rules and Versatility

Xcerpt—Visual Querying with visXcerpt

Xcerpt—Visual Querying with visXcerpt, Example 1

Xcerpt—Visual Querying with visXcerpt, Example 2

Xcerpt—Limitations and Critique

Reactive Rule Query Languages

Algae: Reactive Rules

Algae—Optional

Algae—Limitations and Critique

Navigational Access Query Languages

Versa: Navigational Access

Versa—Higher-order Functions I: Iteration

Versa—Higher-order Functions II: Aggregation and Sets

Versa—Limitations and Critique

4 Language Constructs Compared

What you should have learned until now: Languages

What is coming up: Constructs

Overview

Selection

Triple Patterns vs. Path expressions

Triple patterns

Ground triple patterns (~ relational selection) in SPARQL

General triple patterns (~ selection-join query) in SPARQL

Path Expressions

Path-like omission of intermediary b-nodes in SPARQL

Path expressions in RQL

Classes of Path Expressions

Basic path expressions

Unrestricted closure path expressions

Classes of Path Expressions (cont.)
