SPARQL CDTs: Representing and Querying Lists and Maps as RDF Literals

Unofficial Draft

More details about this document
Latest published version:
https://w3id.org/awslabs/neptune/SPARQL-CDTs/spec/latest.html
Latest editor's draft:
https://w3id.org/awslabs/neptune/SPARQL-CDTs/spec/editors_draft.html
History:
Commit history
Test suite:
https://github.com/awslabs/SPARQL-CDTs/tree/main/tests
Editors:
Olaf Hartig (Amazon)
Gregory Todd Williams (Amazon)
Feedback:
GitHub awslabs/SPARQL-CDTs (pull requests, new issue, open issues)

Abstract

This specification defines an approach to represent generic forms of composite values (lists and maps, in particular) as literals in RDF, and corresponding extensions of the SPARQL language. These extensions include an aggregation function to produce such composite values, functions to operate on such composite values in expressions, and a new operator to transform such composite values into their individual components.

Status of This Document

This document is a draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organization.

1. Introduction

1.1 Background and Motivation

This section is non-normative.

The Resource Description Framework (RDF) is a general-purpose framework for representing and exchanging data [RDF11-PRIMER], and the SPARQL language is a declarative language for querying and updating RDF data [SPARQL11-OVERVIEW]. Yet, a feature regarding which RDF and SPARQL lag behind other popular data representation forms such as SQL databases and Property Graphs, and their corresponding query languages, is a built-in support for generic types of composite values such as lists, maps, and sets. Instead, RDF introduces so-called containers and collections, which allow users to model composite values through dedicated vocabulary on top of the core data model. This approach has several drawbacks when compared to the alternative of representing composite values as compact, self-contained objects. Namely, it can easily become verbose and bloat up the storage footprint; moreover, extracting information from such containers and collections in SPARQL queries is cumbersome and can even be tricky (e.g., enumerating the elements of an RDF collection in their given order requires a complex query if the size of the collection is not assumed to be known), and manipulating containers and collections in SPARQL is complex (e.g., inserting a value into an RDF collection at a particular position is all but trivial).

1.2 Structure of this Document (TODO)

This section is non-normative.

In addition to sections marked as non-normative, all examples and notes in this specification are non-normative. Everything else in this specification is normative.

TODO

1.3 Document Conventions (TODO)

This section is non-normative.

Examples in this document assume that the following prefixes have been declared to represent the IRIs shown with them here:

:<http://www.example.org/>
ex:<http://example.org/>
rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
foaf:<http://xmlns.com/foaf/0.1/>
cdt:<http://w3id.org/awslabs/neptune/SPARQL-CDTs/>

1.4 Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MAY, MUST, and MUST NOT in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Informal Description of the Approach

This section is non-normative.

2.1 Representation of Composite Values as RDF Literals

The basis of the approach is to capture composite values (lists and maps) as RDF literals. The components of such a composite value may be RDF terms, including literals representing scalar values (numbers, strings, etc.) as well as literals representing other composite values. The lexical form (i.e., the string representation) of such a literal that represents a composite value contains the components of the composite value serialized in a format that is based on the RDF Turtle format [TURTLE]. For instance, the literals in the object positions of the following three RDF triples represent lists. Notice that, in the lexical form of these literals, the elements of the lists are separated by commas and enclosed in square brackets.

ex:s ex:p1 "[ 0.504, 0.344, 0.002, 0.716 ]"^^cdt:List .
ex:s ex:p2 "[ <http://example.org/alice>, <http://example.org/bob> ]"^^cdt:List .
ex:s ex:p3 "[ '1999-08-16'^^<http://www.w3.org/2001/XMLSchema#date>, 42 ]"^^cdt:List .

In addition to lists, the approach considers maps, which are sets of key-value pairs where the keys must be distinct. Within the lexical form, the key-values pairs of such a map are separated by commas and a colon is used as separator between each key and its value, as demonstrated by the literal in the object position of the following triple.

ex:s ex:p1 "{ 'name' : 'Tuva', 1: <http://example.org/>, 2: null }"^^cdt:Map .

The previous example also illustrates that, in addition to RDF terms, the approach supports null as a special value to be used within maps, as well as in lists. Moreover, the kinds of elements within each list or map may be mixed arbitrarily (defining strongly-typed variations of the proposed composite datatypes is part of our ongoing work).

As the examples illustrate, list and map elements that are numeric literals (of particular types) can be written using the Turtle-specific shorthand notation (e.g., 0.504 is a shortened version of '0.504'^^<http://www.w3.org/2001/XMLSchema#decimal>), and the same holds for boolean-typed literals. Yet, IRIs cannot be shortened in the ways as are possible in Turtle; instead, within the lexical forms of list and map literals, they must be written as absolute IRIs (which is to avoid loosing the definition of the prefix label or the base IRI when the list and map literals are taken out of context). However, as another form of shorthand notation, our approach allows users to directly nest the lexical forms of lists and maps into one another (but still interpreting such contained lists or maps as literals with the corresponding data type). For instance, the literal in the object position of the following triple represents a map in which both of the key-value pairs have a literal as their value, with one of these literals representing a list and the other one a map.

ex:s ex:p1 """{
                'names' : ['Tuva', 'Ada', 'Eva'],
                'address': {
                    'street': 'Harbor St.',
                    'number': 43,
                }
              }"""^^cdt:Map .

The attentive reader may observe that the lexical form of the literal in the previous example looks like a JSON document, which is not a coincidence. In contrast, our approach is designed such that, by and large, JSON is a subset of what can be used in the lexical forms of the list and map literals (see Section 4.4 Relationship to JSON for further details).

For the definition of the datatypes used for such list and map literals, refer to Sections 3. The cdt:List Datatype and 4. The cdt:Map Datatype, respectively.

2.2 SPARQL Functions on Composite Datatype Literals

Given such composite datatype literals, this document defines extensions to the SPARQL language that introduce functionality related to the types of composite values that these literals capture. The first of these extensions is a collection of various functions for such literals that can be used in expressions within BIND clauses, FILTER clauses, and SELECT clauses of SPARQL queries.

For instance, the following SPARQL query (prefix declarations omitted) uses two such functions in a BIND clause; one of these functions (cdt:concat) concatenates two or more lists, returning the resulting list as a literal again, and the other function (cdt:size) returns the cardinality of the resulting list.

SELECT * WHERE {
  ex:s ex:p1 ?l1 .
  ex:s ex:p2 ?l2 .
  ex:s ex:p3 ?l3 .
  BIND( cdt:size( cdt:concat(?l1,?l2,?l3) ) AS ?combinedLength )
}

When executing this query over the first example data above (the example with the three lists), the value produced for the ?combinedLength variable would be 8.

Further functions for lists that are defined in this document (in addition to cdt:concat and cdt:size, as already used in the previous example) are the following:

In addition to functions for lists, the document defines functions for maps. As an example, consider the following SPARQL query (prefix declarations omitted).

SELECT ?newAddressSubMap WHERE {
  ex:s ex:p1 ?map .
  BIND( cdt:get( ?map, "address" ) AS ?addressSubMap )
  BIND( cdt:put( ?addressSubMap, "number", 42 ) AS ?newAddressSubMap )
}

This query focuses on the earlier example with the JSON-like map (Example 3). By using the cdt:get function, the query obtains the 'address' object of that map and, then, creates and returns a version of this object in which the value of the 'number' field is replaced (using cdt:put). Hence, the result of the query, when executed over the data of Example 3, is a literal such as the following one.

"{ 'street': 'Harbor St.', 'number': 42 }"^^cdt:Map .

In addition to cdt:get and cdt:put, further functions for maps in this document are:

While all the functions mentioned above operate on the composite datatype literals introduced in this document, the document also defines functions to construct such literals. In particular, the cdt:List function can be used to construct a literal that represents a list consisting of the elements that are passed as arguments to the function. For instance, the following query constructs and returns such a literal with the three integers 1, 2, and 3 as elements of the represented list.

SELECT ?list WHERE {
  BIND( 1 AS ?arg )
  BIND( cdt:List(?arg, ?arg+1, ?arg+2) AS ?list )
}

Similarly, the cdt:Map function can be used to construct a literal representing a map of the key-value pairs that are passed as arguments to that function.

For both of these constructor functions, it is possible that some of the given arguments evaluate to an error. For instance, the second and the third argument in the previous query would evaluate to an error if the variable ?arg was bound to an IRI (because the + operator is not defined for IRIs). For any such argument that evaluates to an error, the corresponding element in the created list or map literal becomes the null value, unless it is an argument that specifies the key of a key-value pair for a map literal; in such a case, this key-value pair is ignored because keys cannot be null.

2.3 FOLD Aggregate

Another way to create list and map literals from within SPARQL queries (in addition to the aforementioned two constructor functions, cdt:List and cdt:Map) is to use the FOLD function introduced in this document. As a so-called set function, which are typically used for aggregation, FOLD produces composite values for groups of solution mappings. The following query illustrates how this function can be used to create lists of persons that have the same surname.

SELECT ?sname (FOLD(?person) AS ?list) WHERE {
  ?person rdf:type foaf:Person .
  ?person foaf:surname ?sname .
}
GROUP BY ?sname

Notice that the solution mappings within each of the groups in the previous query are not sorted. As a consequence, the order of the elements in the lists created by the query (i.e., the person IRIs in these lists) is undefined and may even be different each time the query is executed. For use cases in which a particular order is required, the FOLD function can be used with an ORDER BY clause. As an example, consider the following query, which extends the previous one with the requirement that the persons within each list are ordered based on their first names.

SELECT ?sname (FOLD(?person ORDER BY ?fname) AS ?list) WHERE {
  ?person rdf:type foaf:Person .
  ?person foaf:surname ?sname .
  ?person foaf:firstName ?fname .
}
GROUP BY ?sname

The ORDER BY feature of FOLD works exactly as the ORDER BY clause that can be used in SPARQL queries. Hence, it is possible to specify multiple ordering comparators, each with the optional keywords ASC and DESC (where ASC is the default).

In addition to creating list literals, the FOLD function can also be used to create map literals. To this end, two arguments must be passed to the function, where the first argument provides the keys of the key-value pairs to be added to the map, and the second argument provides the corresponding values. The following query illustrates this functionality; it creates a single map literal in which the key-value pairs are surnames paired up with the number of persons in the queried graph that have the respective surname.

SELECT (FOLD(?sname, ?cnt) AS ?map) WHERE {
  {
    SELECT ?sname (COUNT(*) AS ?cnt) WHERE {
      ?person rdf:type foaf:Person .
      ?person foaf:surname ?sname .
    }
    GROUP BY ?sname
  }
}

When using the two-argument version of FOLD, it may be the case that multiple solution mappings within a group to which FOLD is applied result in key-value pairs with the same key. If that is the case, only one of these key-value pairs is added to the map (which one is undefined unless ORDER BY is used within FOLD, in which case it is the last one in the specified order).

2.4 UNFOLD Operator

While FOLD compiles values from multiple solution mappings into a composite value, to support the reverse of this process as well this document adds a new operator to SPARQL. This operator, called UNFOLD, breaks composite values into their individual components and assigns these components separately to a new query variable. The following query illustrates how this operator can be used to extract all elements from each list literal of the triples that match a given triple pattern.

SELECT ?p ?element WHERE {
  ex:s ?p ?list .
  UNFOLD( ?list AS ?element )
}

When executing this query over the first example data above, the query result consists of eight solutions as enumerated in the following tabular representation of the result. Notice that the first four solutions in this table are produced from the four elements of the list in the first triple of the example data; the next two solutions are from the two elements of the list in the second triple; and the last two solutions from the two elements of the list in the third triple.

?p ?element
ex:p1 0.504
ex:p1 0.344
ex:p1 0.002
ex:p1 0.716
ex:p2 http://example.org/alice
ex:p2 http://example.org/bob
ex:p3 '1999-08-16'^^<http://www.w3.org/2001/XMLSchema#date>
ex:p3 42

If a list literal that is given to the UNFOLD operator contains null values, then the variable introduced in the UNFOLD operator remains unbound in the solution mappings created for these null elements. Yet, it is also possible to add a second variable in the UNFOLD operator, which then gets assigned an integer that represents the position of the list element assigned to the first variable. The following query illustrates this feature for a list that contains a null value.

SELECT ?element ?pos WHERE {
  BIND( "[42, null, <http://example.org/>]"^^cdt:List AS ?list ).
  UNFOLD( ?list AS ?element, ?pos )
}

The result of this query is illustrated in the following table. The empty cell in the table indicates that the solution mapping represented by the corresponding table row does not contain a binding for the corresponding variable (?element).

?element ?pos
42 1
2
http://example.org/ 3

The UNFOLD operator can be used for map literals as well. In this case, the one-variable version extracts the keys of the key-value pairs, whereas the two-variables version extracts both the keys (into the first variable) and the their corresponding values (into the second variable). As an example of the latter option, consider the following query.

SELECT ?p ?key ?value WHERE {
  ex:s ?p ?map .
  UNFOLD( ?map AS ?key, ?value )
}

When executing this query over the data of the first map-related example above, the query result consists of the three solutions listed in the following table (where the variable ?value is unbound in the last solution because the value of the corresponding key-value pair in the map is null).

?p ?key ?value
ex:p1 "name" "Tuva"
ex:p1 1 http://example.org/
ex:p1 2

2.5 Comparing Composite Datatype Literals

Another important contribution of this document is to extend the definition of the SPARQL comparison operators (=, !=, <, >, <=, and >=) for list and map literals. The general idea captured by this extension is that two composite values of the same type (i.e., either two lists or two maps) are compared based on a pairwise comparison of the components of the compared composite values.

In particular, two list literals are considered equal (=) if they contain the same number of elements and, for each position within the lists, the elements at that position are equal (=). For instance, "[42]"^^cdt:List and "[42,43]"^^cdt:List are not equal, and neither are "[42]"^^cdt:List and "['42']"^^cdt:List. In contrast, "[42,43]"^^cdt:List and "[ 42 , 43 ]"^^cdt:List are equal, and so are "['1'^^<http://www.w3.org/2001/XMLSchema#integer>]"^^cdt:List and "['001'^^<http://www.w3.org/2001/XMLSchema#integer>]"^^cdt:List.

Similarly, two map literals are considered equal (=) if they contain the same number of key-value pairs, they have the exact same keys, and for each such key, the values associated with that key in both maps compare equal (=). The algorithms that define these two comparison processes in detail can be found in Sections 6.1 list-equal and 6.2 map-equal, respectively.

The definition of whether a list literal is smaller (<) than another list literal is also captured in terms of an algorithm that iterates over the two lists (see Section 6.3 list-less-than). In this case, the iteration stops once it arrives at a position where the respective elements at this position in the two lists do not compare equal (=). At this point, if one of these two elements is smaller (<) than the other one, then the list with this element is considered smaller. If, however, comparing the two elements results in an error or one of them is null and the other one not, then the comparison of the two lists also raises an error. In case the iteration arrives at the end of one of the two lists without stopping, then this list (with fewer elements) is considered smaller.

Finally, to define whether a map literal is smaller than another map literal we provide another algorithm (see Section 6.4 map-less-than). This algorithm iterates over the key-value pairs of both maps, assuming these pairs are ordered based on their keys. Once the iteration arrives at two such pairs that do not have both the same key and equal values, the iteration stops. At this point, the algorithm first compares the keys of these two pairs. If the two keys are not the same, then the map with the key that would be ordered before the other one is considered smaller. If the keys are the same, the decision is made by comparing the corresponding values (in the same way as done by the algorithm for the list literals).

2.6 Ordering Composite Datatype Literals

The last main contribution of this document is to define a relative order of list and map literals, to be used in the context of the ORDER BY clause of SPARQL (and, therefore, also for the aforementioned ORDER BY feature of FOLD). This definition is based on algorithms—one for list literals and one for map literals—that are similar to the respective comparison algorithms as mentioned in the previous section. In particular, these algorithms that define the relative order also iterate over the components of a given pair of composite values; i.e., the elements of two lists (resp. the key-value pairs of two maps) are considered in a pairwise manner. The major difference between these algorithms and the comparison algorithms is that the latter apply the smaller-than operator (<) at each iteration step whereas the former consider the relative ordering of the list elements (resp. key-value pairs) considered at each iteration step.

The effect of this difference can be observed for composite values that contain IRIs or blank nodes. For instance, while the expression

"[<http://example.org/a>]"^^cdt:List  <  "[<http://example.org/b>]"^^cdt:List

yields an error because < is not defined for IRIs, the relative order of the two list literals in this expression is defined (namely based on the relative order of the IRIs they contain).

Similarly, the expression

"[<http://example.org/a>]"^^cdt:List  <  "[42]"^^cdt:List

yields an error because < is not defined between IRIs and (numeric) literals, whereas the relative order of the two list literals in this expression is defined (namely based on the fact that literals are ordered higher than IRIs).

Yet, there are also pairs of list literals and pairs of map literals for which the relative order is undefined. For instance, this is the case for "['hello'@en]"^^cdt:List and "['hello']"^^cdt:List, because the relative order between language-tagged literals and string literals without language tag is undefined in SPARQL.

The complete algorithms that define the relative order for list literals and for map literals can be found in Section 10.1 Relative Order of cdt:List Literals and in Section 10.2 Relative Order of cdt:Map Literals, respectively.

3. The cdt:List Datatype

The cdt:List datatype is the RDF datatype that consists of the value space, the lexical space, and the lexical-to-value mapping defined in this section (see Subsections 3.1 Value Space, 3.2 Lexical Space, and 3.3 Lexical-To-Value Mapping, respectively), and that is denoted by the following datatype IRI:

http://w3id.org/awslabs/neptune/SPARQL-CDTs/List

Every literal that has this IRI as its datatype IRI is called a cdt:List literal. If the lexical form of such a literal is in the lexical space of the cdt:List datatype, as defined in Section 3.2 Lexical Space, then the literal is called a well-formed cdt:List literal. In contrast, a cdt:List literal whose lexical form is not in the lexical space of the cdt:List datatype is called an ill-formed cdt:List literal.

3.1 Value Space

The value space of the cdt:List datatype consists of all finite sequences of list elements, where the notion of a list element is defined as follows:

Any such sequence is called term list, and the empty sequence of list elements is the empty term list.

null is a special symbol that can be used as a list element (and as a map value) and that is not an RDF term.

Note

Every RDF term MAY occur multiple times in a term list, and so may null.

3.2 Lexical Space

The lexical space of the cdt:List datatype consists of all strings that are recognized by the List production of the following grammar (the EBNF notation used here is defined in XML 1.0 [EBNF-NOTATION]).

The grammar uses productions from the grammar of the Turtle serialization format for RDF data [TURTLE]. In particular, the terminal and nonterminal symbols for which no production rule is provided below (i.e., IRIREF, BLANK_NODE_LABEL, NumericLiteral, BooleanLiteral, String, and LANGTAG) are defined in the Turtle grammar.

The RDFLiteral production in the given grammar is special as the Turtle grammar also contains a very similar production for the same symbol (see RDFLiteral in [TURTLE]). The difference is that the version of the rule in the Turtle grammar permits the datatype IRI of a literal to be written as a prefixed name, whereas the version of the rule as defined below does not permit datatype IRIs written as prefixed names (but only as absolute IRIs and as relative IRIs).

An additional restriction that is not explicitly captured in the given grammar is that any occurrence of the IRIREF production MUST—after handling of escape sequences—also match the (more restrictive) absolute-IRI production of [RFC3987]. Hence, IRI references in the lexical form of cdt:List literals and cdt:Map literals MUST be absolute; they cannot be written as prefixed names (see the previous paragraph about the RDFLiteral production) nor as relative IRI references.

[1] List  ::=  '[' ( NonEmptyListContent )? ']'
[2] NonEmptyListContent  ::=  ListElement ( ',' ListElement )*
[3] ListElement  ::=  IRIREF | BLANK_NODE_LABEL | RDFLiteral | NumericLiteral | BooleanLiteral | NULL | List | Map
[4] Map  ::=  '{' ( NonEmptyMapContent )? '}'
[5] NonEmptyMapContent  ::=  MapEntry ( ',' MapEntry )*
[6] MapEntry  ::=  MapKey ':' MapValue
[7] MapKey  ::=  IRIREF | RDFLiteral | NumericLiteral | BooleanLiteral
[8] MapValue  ::=  IRIREF | BLANK_NODE_LABEL | RDFLiteral | NumericLiteral | BooleanLiteral | NULL | List | Map
[9] NULL  ::=  'null'
[128s] RDFLiteral  ::=  String ( LANGTAG | '^^' IRIREF )?
Note

The complete grammar can be found in B.1 Composite Datatypes EBNF Grammar.

3.3 Lexical-To-Value Mapping

The lexical-to-value mapping of the cdt:List datatype is defined for every string that is recognized by the List production of the grammar defined in Section 3.2 Lexical Space.

For the definition of this mapping, we assume an injective function bnl2bn from the set of all strings recognized by the BLANK_NODE_LABEL production of the grammar to the set of all blank nodes. Hence, bnl2bn maps every string recognized by the BLANK_NODE_LABEL production to a distinct blank node.

Then, given a string S that is recognized by the List production of the grammar, the term list that S is mapped to is defined as follows.

If the string S does not contain a substring that is recognized by the NonEmptyListContent production of the grammar, then S is mapped to the empty term list.

Otherwise, let S' be the greatest substring of S that is recognized by the NonEmptyListContent production and let (E1, ..., En) be the sequence of all substrings of S' that are recognized by the ListElement production, in the order in which they appear in S'. Then, S is mapped to the term list tm that consists of n list elements such that, for all i in {1, ..., n}, the i-th element in tm is the following list element, depending on which of the cases of the ListElement production recognizes the substring Ei:

Since producing the term list for S as defined above relies on the capability to construct RDF terms according to the table in Section 7.2 of [TURTLE], it requires a parser that maintains state of the bnodeLabels mapping used for parsing Turtle, as defined in Section 7.1 of [TURTLE].

Note

By the definition of the lexical-to-value mapping of the cdt:List datatype, there can be no term list that contains a cdt:List literal whose literal value is that same term list.

3.4 Relationship to RDF Collections

This section is non-normative.

The RDF Collection vocabulary, as introduced in Section 5.2 of [RDF11-SCHEMA], provides vocabulary terms to describe lists of resources explicitly by using RDF triples. For instance, the triples in the following snippet of Turtle (prefix declarations omitted) describe a list consisting of two resources that are denoted by the IRIs http://example.org/alice and http://example.org/bob, respectively.

ex:book1 ex:authors _:b1 .
_:b1 rdf:first <http://example.org/alice> .
_:b1 rdf:rest _:b2 .
_:b2 rdf:first <http://example.org/bob> .
_:b2 rdf:rest rdf:nil .

Some serialization formats for RDF, including Turtle [TURTLE], provide some form of shorthand notation based on which such lists can be written in a more concise way. For instance, by using the Turtle syntax for collections, the snippet of Turtle as given in the previous example can be shortened as follows.

ex:book1 ex:authors (<http://example.org/alice> <http://example.org/bob>) .

Note that even if the latter snippet of Turtle looks more compact, it still captures the same set of five RDF triples as the snippet of Turtle in Example 14.

Such lists can be converted into cdt:List literals and vice versa. The remainder of this section introduces generic approaches to describe such conversions using the SPARQL language. These approaches are generic in the sense that they do not make any assumptions about the given lists that are to be converted (such as assuming knowledge of the list size or of the absence of duplicate entries).

To convert an RDF collection into a cdt:List literal by using a SPARQL query, this query needs to traverse the corresponding rdf:rest triples using a property path expression, collect the object of each rdf:first triple along the path, and finally combine these objects as list elements of a term list for a cdt:List literal by using the FOLD aggregate defined in Section 11. FOLD. Additionally, to preserve the order of the list elements, grouping needs to be used for each position in the given list and the number of path traversal steps to this position need to be counted. The following query illustrates how this approach can be used in the context of the example data above.

SELECT ( FOLD(?elmt ORDER BY ?index) AS ?cdtList ) WHERE {
  {
    SELECT ?position (COUNT(?step) AS ?index) WHERE {
      ex:book1 ex:authors ?rdfList .
      ?rdfList rdf:rest* ?step .
      ?step rdf:rest* ?position .
    }
    GROUP BY ?position
  }

  ?position rdf:first ?elmt .
}

If the given list is known to be free of duplicates, the query can be slightly simplified:

SELECT ( FOLD(?elmt ORDER BY ?index) AS ?cdtList ) WHERE {
  {
    SELECT ?elmt (COUNT(?step) AS ?index) WHERE {
      ex:book1 ex:authors ?rdfList .
      ?rdfList rdf:rest* ?step .
      ?step rdf:rest* ?position .
      ?position rdf:first ?elmt .
    }
    GROUP BY ?elmt
  }
}

If the order of the list elements does not need to be preserved, an even simpler version of the query without grouping and counting can be used (which is also duplicate preserving in case the given list contains duplicate entries):

SELECT ( FOLD(?elmt) AS ?cdtList ) WHERE {
  ex:book1 ex:authors ?rdfList .
  ?rdfList rdf:rest/rdf:first ?elmt .
}

To convert a cdt:List literal into an RDF collection a CONSTRUCT query needs to be used that produces the relevant RDF triples based on the RDF collection vocabulary. The WHERE clause of such a query needs to use the UNFOLD operator (see Section 12. UNFOLD) to obtain each list element together with its position within the term list represented by the given cdt:List literal. The positions can then be associated with blank nodes to be used for the various resources based on which the structure of the created RDF collection is built (the following query creates a cdt:Map literal as an intermediary structure for this purpose). Special care needs to be taken to correctly close the RDF collection with the term rdf:nil, for which the position of each element needs to be compared with the size of the list (as can be obtained with the cdt:size function defined in Section 7.2 cdt:size). Finally, to properly handle empty lists as well, the whole query pattern for the non-empty lists needs to be placed within an OPTIONAL block and a second OPTIONAL block needs to be added to cover the cases of empty lists. The following query illustrates this approach for all cdt:List literals in the object positions of triples with the IRI ex:authorList as predicate.

CONSTRUCT {
  ?s ex:authors ?firstStep .
  ?thisStep rdf:first ?elmt .
  ?thisStep rdf:rest ?rest .
}
WHERE {
  OPTIONAL { # for the cases of non-empty lists
    {
      SELECT ?s ?list (FOLD(?pos, BNODE()) AS ?listHeads) WHERE {
        ?s ex:authorList ?list .
        FILTER( DATATYPE(?list) = cdt:List )
        UNFOLD( ?list AS ?elmt, ?pos )
      }
      GROUP BY ?s ?list
    }

    UNFOLD( ?list AS ?elmt, ?pos )
    BIND( cdt:get(?listHeads, ?pos) AS ?thisStep )
    BIND( cdt:size(?list) AS ?size )
    BIND( IF( ?pos=?size,
              rdf:nil,
              cdt:get(?listHeads, ?pos+1)
          ) AS ?rest )
    BIND( cdt:get(?listHeads, 1) AS ?firstStep )
  }
  OPTIONAL { # for the cases of empty lists
    ?s ex:authorList ?list .
    FILTER( DATATYPE(?list) = cdt:List )
    FILTER( cdt:size(?list) = 0 )
    BIND( rdf:nil AS ?firstStep )
  }
}

It needs to be emphasized that, for cdt:List literals that represent term lists with null values, the given query creates no rdf:first triples for the corresponding positions in the resulting RDF collection (but the rdf:rest triples for these positions are created by the query). The reason for this is that the UNFOLD operator leaves the variable ?elmt unbound for the positions where a term list has a null value.

Notice also that the query contains two FILTER clauses (one in each OPTIONAL block) that ensure the datatype of the literals bound to the variable ?list is indeed cdt:List. The purpose of these two FILTER clauses is to avoid accidentially creating RDF collections from potentially matching cdt:Map literals (as the UNFOLD operator and the cdt:size function can be applied to such literals as well). For cases in which it is guaranteed, however, that ?list would not be bound to any cdt:Map literal, the query can be slightly simplified by removing these two FILTER clauses.

Another possible simplification (orthogonal to aforementioned one about removing the two FILTER clauses) can be applied if the cdt:List literals that are to be converted are guaranteed to represent only nonempty term lists. In this case, the whole pattern within the first OPTIONAL block does not need to be wrapped within an OPTIONAL block and the second OPTIONAL block can be removed completely.

4. The cdt:Map Datatype

The cdt:Map datatype is the RDF datatype that consists of the value space, the lexical space, and the lexical-to-value mapping defined in this section (see Subsections 4.1 Value Space, 4.2 Lexical Space, and 4.3 Lexical-To-Value Mapping, respectively), and that is denoted by the following datatype IRI:

http://w3id.org/awslabs/neptune/SPARQL-CDTs/Map

Every literal that has this IRI as its datatype IRI is called a cdt:Map literal. If the lexical form of such a literal is in the lexical space of the cdt:Map datatype, as defined in Section 4.2 Lexical Space, then the literal is called a well-formed cdt:Map literal. In contrast, a cdt:Map literal whose lexical form is not in the lexical space of the cdt:Map datatype is called an ill-formed cdt:Map literal.

4.1 Value Space

The value space of the cdt:Map datatype consists of all functions that map from a finite set of map keys to the set of map values, where the notion of a map key is defined as follows:

and the notion of a map value is defined as follows: Any such function is called term map, and the empty function for the set of map values (i.e., the function that maps from the empty set to the set of map values) is the empty term map. Moreover, every pair (k,v) consisting of a map key k and a map value v is called a map entry where k is called the key of the map entry and v is called the value of the map entry.

Given a term map tm and a map entry (k,v), we say that tm contains this map entry if k is in the domain of tm and tm maps k to v; i.e., tm(k)=v.

Note

The empty term map does not contain any map entries.

Note

A term map MAY contain multiple map entries that all have the same value, but they MUST have different keys. It is possible, however, that multiple map entries of a term map MAY have literals with the same literal value as their respective keys, as long as all these literals have a different lexical form. For instance, a term map MAY contain two map entries, (k,v) and (k',v'), such that k is the literal "0042"^^xsd:integer and k' is the literal "42"^^xsd:integer (using a Turtle representation of these literals for the sake of readability). Clearly, these are two different literals and, thus, two different map keys, but their literal value is the same (the integer 42).

Distinguishing map keys that are literals based on their lexical form rather than their value is a deliberate design decision. If the distinction was based on the value instead, then the distinction could be made only for literals with recognized datatype IRIs. This would not only raise the question of what to do with map keys that are literals with an unrecognized datatype IRI, but it would also make the notion of map keys (and what makes them different from one another) dependent on the set of datatype IRIs recognized by the RDF processing system. Such a dependency may then result in different and potentially incompatible behavior in terms of how cdt:Map literals are handled by different RDF processing systems that support different sets of recognized datatype IRIs. Distinguishing map keys that are literals based on their lexical form avoids such incompatibilities.

4.2 Lexical Space

The lexical space of the cdt:Map datatype consists of all strings that are recognized by the Map production of the grammar defined in Section 3.2 Lexical Space and in which all substrings that matched the MapKey production are different from one another.

Issue 1

The constraint about the substrings is not restrictive enough because the MapKey production recognizes both the generic way of serializing literals in Turtle and the Turtle shorthand notations for some datatypes. For instance, consider the following string:

"{ 42 : 'value1',  '42'^^<http://www.w3.org/2001/XMLSchema#integer> : 'value2' }"

This string satisfies the definition given above (including the constraint about the substrings) and, thus, would be a valid lexical form for cdt:Map literals. Yet, the keys of the two map entries represented in this string are the exact same literal, which is illegal according to the definition of term maps.

4.3 Lexical-To-Value Mapping

The lexical-to-value mapping of the cdt:Map datatype is defined as follows for every string S that is recognized by the Map production of the grammar defined in Section 3.2 Lexical Space. The definition assumes the same bnl2bn function as assumed for the lexical-to-value mapping of the cdt:List datatype.

If the string S does not contain a substring that is recognized by the NonEmptyMapContent production of the grammar, then S is mapped to the empty term map.

Otherwise, let S' be the greatest substring of S that is recognized by the NonEmptyMapContent production, let (E1, ..., En) be the sequence of all substrings of S' that are recognized by the MapEntry production, and for every i in {1, ..., n}, let Ki and Vi be the substrings of Ei that are recognized by the MapKey and the MapValue productions, respectively. Then, S is mapped to the term map tm that contains n map entries, one for each substring Ei, where i in {1, ..., n}. The particular map entry (ki,vi) that tm contains for Ei depends on which of the cases of the MapKey and the MapValue productions recognize Ki and Vi. In particular, depending on Ki, ki is defined as follows.

Moreover, depending on Vi, vi is defined as follows.

Since producing the term map for S as defined above relies on the capability to construct RDF terms according to the table in Section 7.2 of [TURTLE], it requires a parser that maintains state of the bnodeLabels mapping used for parsing Turtle, as defined in Section 7.1 of [TURTLE].

Note

The note about unescaping of IRIs as given in Section 3.3 Lexical-To-Value Mapping applies here too.

4.4 Relationship to JSON

This section is non-normative.

By the definition of the lexical space of the cdt:Map datatype (see Section 4.2 Lexical Space), strings that match the object production of the grammar of the JavaScript Object Notation (JSON) [RFC8259] can generally be used as a lexical form of a cdt:Map literal (with a few exceptions related to escaping of characters in strings, see the corresponding note in the following list). When doing so, the lexical-to-value mapping of the cdt:Map datatype (see Section 4.3 Lexical-To-Value Mapping) interprets the various elements of the JSON format as follows.

5. Importing and Exporting of Composite Values

5.1 Motivation

This section is non-normative.

In the normal handling of an RDF syntax, blank node identifiers are scoped to the current document, and any identifiers used in that document are merely a syntactic convenience for referencing blank nodes freshly allocated during parsing. As an example, consider the following snippet of Turtle syntax [TURTLE].

_:b1 ex:p4 "[ <http://example.org/alice>, <http://example.org/bob>, _:eve ]"^^cdt:List .

Parsing this data as Turtle will result in a single triple whose subject is a blank node (now dissociated from the b1 identifier), and whose object is a cdt:List literal that represents a three-element term list in which the third element is a blank node. However, using a Turtle parser not aware of the cdt:List datatype will result in an object literal containing the substring "_:eve"; that is, the literal will keep the exact lexical form as given in the Turtle snippet above (Example 20). Allowing multiple such composite values, from multiple sources, to be inserted into the same RDF Dataset (e.g., via two SPARQL LOAD updates) presents a challenge: when a system interprets the literals obtained from its Turtle parser, the term lists that it creates by applying the cdt:List lexical-to-value mapping will contain the same blank nodes (because, during the mapping process for each cdt:List literal, the bnl2bn function is applied to the same blank node identifier, _:eve). This is problematic, as it introduces a form of unintentional skolemization, removing the scoping of serialized blank nodes and allowing them to be universally referenced so long as they are wrapped in an enclosing composite type.

The following sections describe requirements that conforming SPARQL systems must comply with to support proper handling of blank nodes when importing or exporting RDF data that contains cdt:List or cdt:Map literals.

5.2 Importing Requirements

Conforming implementations MUST process cdt:List literals and cdt:Map literals during loading, replacing, in their lexical form, any substring bnl matching the BLANK_NODE_LABEL production of the grammar with a string bnl' such that

Here, "loading" is every individual, discrete process by which composite values enter a SPARQL system or RDF Dataset. This includes (but is not limited to):

As an example, consider the Turtle document in Example 21. Loading this document MUST result in a system-internal representation of a single triple whose object is a cdt:List literal containing a blank node that, in the lexical form of the literal, is captured by a blank node identifier that may have been replaced during import (e.g., Example 22).

Example 21: Loading of a cdt:List literal with a blank node
ex:s ex:p1 "[ _:b0 ]"^^cdt:List .
Example 22: Loading of a cdt:List literal with a blank node: Result
<http://example.org/s> <http://example.org/p1> "[ _:b001 ]"^^<http://w3id.org/awslabs/neptune/SPARQL-CDTs/List> .

Consider the INSERT DATA operation in Example 23, which adds a triple with a blank-node-containing cdt:List literal. Evaluating two separate SPARQL Update requests that both contain this INSERT DATA operation (i.e., using the same blank node identifier within the lexical form of the given cdt:List literal in both requests) MUST result in a system-internal representation of two triples whose objects are cdt:List literals containing different blank nodes that, in the lexical form of these literals, are captured by different blank node identifiers (e.g., Example 24).

Example 23: Repeated loading of cdt:List literals with a blank node
INSERT DATA { <s> <p2> "[ _:b0 ]"^^cdt:List } ;
Example 24: Repeated loading of cdt:List literals with a blank node: Result
<http://example.org/s> <http://example.org/p2> "[ _:b002 ]"^^<http://w3id.org/awslabs/neptune/SPARQL-CDTs/List> .
<http://example.org/s> <http://example.org/p2> "[ _:b003 ]"^^<http://w3id.org/awslabs/neptune/SPARQL-CDTs/List> .
Note

While a SPARQL Update request may contain multiple INSERT DATA operations, it is illegal to use the same blank node identifiers across these operations. This constraint extends to blank node identifiers in (the lexical form of) cdt:List and cdt:Map literals within such INSERT DATA operations. For instance, the following SPARQL Update request with two INSERT DATA operations is illegal.

Example 25: Two INSERT DATA operations within a single SPARQL Update request, both with the same blank node identifier in a cdt:List literal, which makes the request syntactically incorrect
INSERT DATA { <s> <p2> "[ _:b0 ]"^^cdt:List } ;
INSERT DATA { <s> <p2> "[ _:b0 ]"^^cdt:List } ;

Loading the Turtle in Example 26 with a triple with _:b0 appearing in both the subject position and inside a cdt:List literal object MUST result in a system-internal representation with a single triple whose subject is the same blank node as contained in the single-element cdt:List literal object (e.g., Example 27).

Example 26: Producing consistent blank node identifiers shared between a cdt:List literal and the RDF data that contains this literal
_:b0 ex:p3 "[ _:b0 ]"^^cdt:List .
Example 27: Producing consistent blank node identifiers shared between a cdt:List literal and the RDF data that contains this literal: Result
_:b004 <http://example.org/p3> "[ _:b004 ]"^^<http://w3id.org/awslabs/neptune/SPARQL-CDTs/List> .

Loading the Turtle in Example 28 with a triple with a cdt:Map literal containing two map entries, both having the same blank node as their value, MUST result in a system-internal representation with a single triple whose cdt:Map literal object contains two map entries having the same blank node value, and whose identifier MUST NOT be "b0" (e.g., Example 29).

Example 28: Consistent rewriting of blank node identifiers that are used multiple times in a cdt:Map literal
ex:s ex:p3 "{ 'this': _:b0, 'again': _:b0 }"^^cdt:Map .
Example 29: Consistent rewriting of blank node identifiers that are used multiple times in a cdt:Map literal: Result
<http://example.org/s> <http://example.org/p3> "{'this': _:b005,'again': _:b005}"^^<http://w3id.org/awslabs/neptune/SPARQL-CDTs/Map> .

5.3 Exporting Requirements

TODO

6. Extensions of Existing SPARQL Operators

This section defines extensions of some of the existing operators of SPARQL. In particular, the = operator and the != operator are extended both for cdt:List literals and for cdt:Map literals. Implementations of SPARQL that recognize the datatypes defined above MUST implement these operators with the extensions defined in this section.

To define the extensions formally, the operator mapping table for binary operators in Section 17.3 of [SPARQL11-QUERY] is extended with the following twelve rows.

Operator Type(A) Type(B) Function Result type
A = B cdt:List cdt:List list-equal(A, B) xsd:boolean
A = B cdt:Map cdt:Map map-equal(A, B) xsd:boolean
A != B cdt:List cdt:List fn:not(list-equal(A, B)) xsd:boolean
A != B cdt:Map cdt:Map fn:not(map-equal(A, B)) xsd:boolean
A < B cdt:List cdt:List list-less-than(A, B) xsd:boolean
A < B cdt:Map cdt:Map map-less-than(A, B) xsd:boolean
A > B cdt:List cdt:List list-less-than(B, A) xsd:boolean
A > B cdt:Map cdt:Map map-less-than(B, A) xsd:boolean
A <= B cdt:List cdt:List logical-or(list-less-than(A, B), list-equal(A, B)) xsd:boolean
A <= B cdt:Map cdt:Map logical-or(map-less-than(A, B), map-equal(A, B)) xsd:boolean
A >= B cdt:List cdt:List logical-or(list-less-than(B, A), list-equal(A, B)) xsd:boolean
A >= B cdt:Map cdt:Map logical-or(map-less-than(B, A), map-equal(A, B)) xsd:boolean

The four new functions used in these table rows are defined in Subsections 6.1 list-equal, 6.2 map-equal, 6.3 list-less-than, and 6.4 map-less-than below.

Note

It should be noted that extending the operator mapping table with these twelve rows changes the behavior of the corresponding operators (=, !=, <, >, <=, and >=) only for cases in which these operators would otherwise (i.e., without the extension) yield a type error. Therefore, this extension is conformant with the SPARQL specification, as explicitly stated in Section 17.3.1 of [SPARQL11-QUERY].

6.1 list-equal

xsd:boolean list-equal ( cdt:List term1, cdt:List term2 )

This function cannot be used directly in expressions. Instead, the purpose of this function is to define the semantics of the = operator when applied to two cdt:List literals.

If both term1 and term2 are well-formed cdt:List literals, then the result of this function is determined by the following algorithm.

  1. Let list1 be the term list obtained by applying the lexical-to-value mapping of the cdt:List datatype to the lexical form of term1 (i.e., list1 is the value of term1).
  2. Let list2 be the term list obtained by applying the lexical-to-value mapping of the cdt:List datatype to the lexical form of term2 (i.e., list2 is the value of term2).
  3. If both list1 and list2 are the empty term list, then return true.
  4. If the length of list1 is different from the length of list2, then return false.
  5. For every integer i from 1 to n, where n is the length of list1:
    1. Let elmt1 be the i-th element in list1.
    2. Let elmt2 be the i-th element in list2.
    3. If elmt1 is null and elmt2 is not null, return false.
    4. If elmt2 is null and elmt1 is not null, return false.
    5. If neither elmt1 nor elmt2 is null, then:
      1. If elmt1 is a blank node and elmt2 is a blank node, terminate with an error.
      2. If evaluating the SPARQL expression elmt1 = elmt2 results in an error, terminate with an error.
      3. If evaluating the SPARQL expression elmt1 = elmt2 results in false, return false.
  6. Return true.

If term1 or term2 is an ill-formed cdt:List literal, then this function produces an error.

Note
Whenever the given algorithm, during its pairwise comparison of the elements of the two given lists, comes across a pair in which both elements are null, the algorithm advances to the next pair of elements. In this sense, the algorithm considers null values to be indistinguishable from one another. This behavior is necessary to guarantee that an expression such as the following evaluates to true (rather than to false or to an error).
"[null]"^^cdt:List = "[null]"^^cdt:List
Guaranteeing that this expression evaluates to true is necessary for the following reason. Notice that the two arguments in this expression are identical (i.e., they are the same literal, with the same lexical form and the same datatype IRI). Therefore, by the definition of the = operator in [SPARQL11-QUERY], the evaluation of the expression is true if cdt:List is not in the set of recognized datatype IRIs (i.e., if the RDF processing system does not support cdt:List). Then, the version of the = operator for systems that support cdt:List (i.e., the version of the operator as defined by the list-equal function in this section) must also return true, because "no additional operator may yield a result that replaces any result other than a type error" (see Section 17.3.1 of [SPARQL11-QUERY]).

6.2 map-equal

xsd:boolean map-equal ( cdt:Map term1, cdt:Map term2 )

This function cannot be used directly in expressions. Instead, the purpose of this function is to define the semantics of the = operator when applied to two cdt:Map literals.

If both term1 and term2 are well-formed cdt:Map literals, then the result of this function is determined by the following algorithm.

  1. Let map1 be the term map obtained by applying the lexical-to-value mapping of the cdt:Map datatype to the lexical form of term1 (i.e., map1 is the value of term1).
  2. Let map2 be the term map obtained by applying the lexical-to-value mapping of the cdt:Map datatype to the lexical form of term2 (i.e., map2 is the value of term2).
  3. If both map1 and map2 are the empty term map, then return true.
  4. If the number of map entries that map1 contains is different from the number of map entries that map2 contains, then return false.
  5. Let error be a boolean-typed flag that is initialized with the value false.
  6. For every map entry (k, v1) contained in map1:
    1. If k is not in the domain of map2, return false.
    2. Let v2 be the map value map2(k).
    3. If v1 is null and v2 is not null, return false.
    4. If v2 is null and v1 is not null, return false.
    5. If neither v1 nor v2 is null, then:
      1. If v1 is a blank node and v2 is a blank node, set error to true and move on to the next map entry in map1 (i.e., skip over the remaining instructions within this for loop).
      2. If evaluating the SPARQL expression v2 = v1 results in an error, set error to true and move on to the next map entry in map1 (i.e., skip over the remaining instructions within this for loop).
      3. If evaluating the SPARQL expression v2 = v1 results in false, return false.
  7. If error is true, terminate with an error.
  8. Return true.

If term1 or term2 is an ill-formed cdt:Map literal, then this function produces an error.

6.3 list-less-than

xsd:boolean list-less-than ( cdt:List term1, cdt:List term2 )

This function cannot be used directly in expressions. Instead, the purpose of this function is to define the semantics of the < operator when applied to two cdt:List literals.

If both term1 and term2 are well-formed cdt:List literals, then the result of this function is determined by the following algorithm.

  1. Let list1 be the term list obtained by applying the lexical-to-value mapping of the cdt:List datatype to the lexical form of term1 (i.e., list1 is the value of term1).
  2. Let list2 be the term list obtained by applying the lexical-to-value mapping of the cdt:List datatype to the lexical form of term2 (i.e., list2 is the value of term2).
  3. If both list1 and list2 are the empty term list, then return false.
  4. For every integer i from 1 to n, where n is the minimum of the length of list1 and the length of list2:
    1. Let elmt1 be the i-th element in list1.
    2. Let elmt2 be the i-th element in list2.
    3. If elmt1 is null and elmt2 is null, return false.
    4. If elmt1 is null or elmt2 is null, terminate with an error.
    5. If elmt1 is a blank node and elmt2 is a blank node, terminate with an error.
    6. If evaluating the SPARQL expression elmt1 < elmt2 results in true or in false (i.e., not in an error), return this result.
    7. If evaluating the SPARQL expression elmt1 = elmt2 results in an error, terminate with an error.
    8. If evaluating the SPARQL expression elmt1 = elmt2 results in false, return false.
  5. If the length of list1 is smaller than the length of list2, return true.
  6. Return false.

If term1 or term2 is an ill-formed cdt:List literal, then this function produces an error.

6.4 map-less-than

xsd:boolean map-less-than ( cdt:Map term1, cdt:Map term2 )

This function cannot be used directly in expressions. Instead, the purpose of this function is to define the semantics of the < operator when applied to two cdt:Map literals.

If both term1 and term2 are well-formed cdt:Map literals, then the result of this function is determined by the following algorithm.

  1. Let map1 be the term map obtained by applying the lexical-to-value mapping of the cdt:Map datatype to the lexical form of term1 (i.e., map1 is the value of term1).
  2. Let map2 be the term map obtained by applying the lexical-to-value mapping of the cdt:Map datatype to the lexical form of term2 (i.e., map2 is the value of term2).
  3. If both map1 and map2 are the empty term map, then return false.
  4. Let entryset1 be the set of all map entries that are contained in map1.
  5. Let entryset2 be the set of all map entries that are contained in map2.
  6. Let entrylist1 be a list of all map entries in entryset1, ordered based on their respective map keys by using the following order:
    • Map entries with map keys that are IRIs are listed first, ordered based on comparing the string representation of these IRIs using the < operator (i.e., the IRIs are used as the lexical form of literals with the datatype xsd:string and the order is then determined by comparing these literals using the < operator)
    • The remaining map entries have map keys that are literals. They are primarily ordered based on the string representation of their datatype IRIs (i.e., using these datatype IRIs as the lexical form of literals with the datatype xsd:string and, then, determining the order by comparing these string literals using the < operator). Literals of the same datatype are ordered based on their lexical form (i.e., replacing their datatype IRI by xsd:string and, then, using the < operator for the resulting string literals). Finally, literals with datatype rdf:langString that have the same lexical form are sorted based on the string representation of their language tags.
    • Note
      The process of ordering the map entries is equivalent to performing the following query (in which term1 is term1).
      SELECT ?k ?v WHERE {
        UNFOLD( term1 AS ?k, ?v )
      }
      ORDER BY  ISLITERAL(?k)  STR(DATATYPE(?k))  STR(?k)  LANG(?k)
  7. Let entrylist2 be a list of all map entries in entryset2, ordered in the same way as entrylist1.
  8. For every integer i from 1 to n, where n is the minimum of the respective length of entrylist1 and of entrylist2:
    1. Let (k1, v1) be the i-th map entry in entrylist1.
    2. Let (k2, v2) be the i-th map entry in entrylist2.
    3. If evaluating the SPARQL expression SAMETERM(k1, k2) results in false, then:
      1. Return true if k1 is ordered before k2 according to the ordering of map keys as defined above for creating entrylist1.
      2. Return false.
    4. If v1 is null and v2 is null, return false.
    5. If v1 is null or v2 is null, terminate with an error.
    6. If evaluating the SPARQL expression v1 < v2 results in true or in false (i.e., not an error), return this result.
    7. If evaluating the SPARQL expression v1 = v2 results in an error, terminate with an error.
    8. If evaluating the SPARQL expression v1 = v2 results in false, return false.
  9. Return true if the length of entrylist1 is smaller than the length of entrylist2.
  10. Return false.

If term1 or term2 is an ill-formed cdt:Map literal, then this function produces an error.

7. Functions on Lists and Maps

This section defines SPARQL extension functions which operate over both cdt:List literals and cdt:Map literals. These functions can be used in SPARQL expressions for testing values in FILTER clauses and for assigning values in, e.g., BIND clauses and SELECT expressions.

These extension functions are evaluated as defined in Section 17.2.1 of [SPARQL11-QUERY].

7.1 cdt:get

RDF term cdt:get ( cdt:List term, xsd:integer idx )

If term is a well-formed cdt:List literal, the value of idx is greater than 0 (zero) and smaller or equal than the number of elements of the term list represented by term, and the element at the idx-th position of the term list represented by term is an RDF term (i.e., not null), then the result of this function is that RDF term at the idx-th position of the term list represented by term. In all other cases an error is raised.

RDF term cdt:get ( cdt:Map term, RDF term key )

If term is a well-formed cdt:Map literal, and there exists a map entry (k,v) in the term map represented by term where k is the same term as key and v is an RDF term (i.e., not null), then the result of this function is v. In all other cases an error is raised.

Note
By its definition as given above, cdt:get for cdt:List literals raises an error not only if the given cdt:List literal is not well-formed but also in the following cases:
  • if term is a well-formed cdt:List literal that represents the empty term list,
  • if the value of idx is out of the bounds of the term list represented by term, and
  • if the element at the idx-th position of the term list represented by term is null.
Similarly, cdt:get for cdt:Map literals raises an error not only if the given cdt:Map literal is not well-formed but also in the following cases:
  • if term is a well-formed cdt:Map literal that represents the empty term map,
  • if key is not a map key in the term map represented by term, and
  • if the value associated with key in the term map represented by term is null.
Note

The list version of the function uses a 1-based numbering scheme to address the elements in term list (i.e., the first element of a term list is addressed by an idx value of 1), in contrast to a 0-based numbering scheme which is common for addressing elements of arrays in many programming languages. The reason for using 1-based numbering for this function is to remain consistent with existing build-in functions of SPARQL, which are also 1-based (e.g., substr for string literals). Moreover, the 1-based numbering scheme is also used by functions of other W3C languages such as XPath (see, e.g., the XPath array:get function which is similar to cdt:get).

7.2 cdt:size

xsd:integer cdt:size ( cdt:List term )

If term is a well-formed cdt:List literal, then the result of this function is the number of elements that the term list represented by term contains.

An error is raised if term is not a well-formed cdt:List literal.

xsd:integer cdt:size ( cdt:Map term )

If term is a well-formed cdt:Map literal, then the result of this function is the number of map entries in the term map represented by term.

An error is raised if term is not a well-formed cdt:Map literal.

8. Functions on Lists

This section defines SPARQL extension functions specifically for cdt:List literals. These functions can be used in SPARQL expressions for testing values in FILTER clauses and for assigning values in, e.g., BIND clauses and SELECT expressions.

Except for cdt:List, these extension functions are evaluated as defined in Section 17.2.1 of [SPARQL11-QUERY]. In contrast, cdt:List is a so-called functional form which has specific evaluation rules (as defined in Section 8.1 cdt:List).

8.1 cdt:List

cdt:List cdt:List ( SPARQL expression expr1, ..., SPARQL expression exprn )

The result of this function is a cdt:List literal that represents a term list tm that contains n elements where, for every i in {1, ..., n}, the element at the i-th position of tm is constructed as follows. Evaluate the i-th expression, expri, as defined in [SPARQL11-QUERY]. If evaluating expri raises an error, then the element at the i-th position of tm is null. Otherwise, the element at the i-th position of tm is the RDF term resulting from the evaluation of expri.

If the function is called without arguments, then the result is a cdt:List literal that represents the empty term list.

8.2 cdt:concat

cdt:List cdt:concat ( cdt:List term1, ..., cdt:List termn )

If all arguments (i.e., term1, ..., termn) are well-formed cdt:List literals, then the result of this function is a cdt:List literal that represents the term list that is the concatenation of the term lists represented by term1 to termn (in the order in which these arguments are given). If the function is called without arguments, then the result is a cdt:List literal that represents the empty term list. If the function is called with a single argument and that argument is a well-formed cdt:List literal, then this literal is the result of this function.

An error is raised if any of the arguments is not a well-formed cdt:List literal.

Issue 2

We should define the notion of "represents" as used in this definition!

8.3 cdt:contains

xsd:boolean cdt:contains ( cdt:List term1, RDF term term2 )

If term1 is a well-formed cdt:List literal, then the result of this function is determined by the following algorithm.

  1. If term2 is a blank node, return false.
  2. If the term list represented by term1 contains an RDF term term' such that evaluating the SPARQL expression term' = term2 results in true, return true.
  3. Return false.

An error is raised if term1 is not a well-formed cdt:List literal.

8.4 cdt:head

RDF term cdt:head ( cdt:List term )

If term is a well-formed cdt:List literal, the term list represented by term contains at least one element, and the element at the first position of this term list is an RDF term (i.e., not null), then the result of this function is that RDF term. In all other cases an error is raised.

Note
By its definition as given above, cdt:head raises an error not only if the given cdt:List literal is not well-formed but also in the following cases:
  • if term is a well-formed cdt:List literal that represents the empty term list, and
  • if the element at the first position of the term list represented by term is null.

8.5 cdt:reverse

cdt:List cdt:reverse ( cdt:List term )

If term is a well-formed cdt:List literal, then the result of this function is a cdt:List literal that represents the term list that contains all elements of the term list represented by term, in the reverse order (e.g., the first element of the term list of term becomes the last element of the term list represented by the resulting cdt:List literal). This definition implies that

An error is raised if term is not a well-formed cdt:List literal.

8.6 cdt:subseq

cdt:List cdt:subseq ( cdt:List term, xsd:integer idx )

cdt:List cdt:subseq ( cdt:List term, xsd:integer idx, xsd:integer length )

If term is a well-formed cdt:List literal, then the result of this function is determined as follows. Let tm be the term list represented by term and let n be the number of elements contained in tm.

In all other cases an error is raised, including the case that term is not a well-formed cdt:List literal.

Note

As the cdt:get function, this function uses a 1-based numbering scheme to address the elements in term maps (i.e., the first element of a term map is addressed by an idx value of 1) in order to be consistent with existing build-in functions of SPARQL (e.g., substr) and with functions in other W3C languages such as XPath (e.g., array:subarray).

Note

This function corresponds to the XPath array:subarray function and, for the sake of consistency, is defined using the same rules and error conditions. As a consequence, the following special case of arguments is permitted for this function. For term maps with a size n>0, it is permitted to call the function with an idx value equal to n+1 as long as the value of length is 0 or length is omitted. When doing so, the result is a cdt:List literal that represents the empty term list.

8.7 cdt:tail

cdt:List cdt:tail ( cdt:List term )

If term is a well-formed cdt:List literal and the term list represented by term contains at least one element, then the result of this function is a cdt:List literal that represents the term list that contains all but the first element of the term list represented by term, where the remaining elements are kept in the same order. This definition implies that, if the term list of term consists of a single element, the resulting cdt:List literal must represent the empty term list.

An error is raised if term is not a well-formed cdt:List literal or if it is a well-formed cdt:List literal that represents the empty term list.

9. Functions on Maps

This section defines SPARQL extension functions specifically for cdt:Map literals. These functions can be used in SPARQL expressions for testing values in FILTER clauses and for assigning values in, e.g., BIND clauses and SELECT expressions.

Except for cdt:Map and cdt:put, these extension functions are evaluated as defined in Section 17.2.1 of [SPARQL11-QUERY]. In contrast, cdt:Map and cdt:put are so-called functional forms which have specific evaluation rules (as defined in Section 9.1 cdt:Map and Section 9.5 cdt:put, respectively).

9.1 cdt:Map

cdt:Map cdt:Map ( SPARQL expression expr1, ..., SPARQL expression exprn )

If n is an even number, then the result of this function is a cdt:Map literal that represents a term map tm that contains up to n/2 map entries, produced by the following algorithm.

  1. Initialize tm as the empty term map.
  2. For every integer i from 1 to n/2:
    1. Evaluate the (2i-1)-th expression, expr2i-1, as defined in [SPARQL11-QUERY].
    2. If evaluating expr2i-1 did not raise an error and the RDF term resulting from this evaluation is a map key, then:
      1. Let k be the RDF term resulting from the evaluation of expr2i-1.
      2. Evaluate the 2i-th expression, expr2i, as defined in [SPARQL11-QUERY].
      3. Let v be the RDF term resulting from the evaluation of expr2i if this evaluation did not raise and error; otherwise, let v be null.
      4. If tm contains a map entry (k',v') such that k=k', then replace this map entry (k',v') by the map entry (k,v). Otherwise, add the map entry (k,v) to tm.

If n is an odd number, an error is raised.

Note

By the given definition, every argument to this function that is meant to produce the map key of a map entry for the constructed term map but that evaluates to an error or to an RDF term that is not a map key is ignored, and so is the directly following argument (i.e., the expression that is meant to produce the corresponding map value).

Note

If the cdt:Map function is called without arguments (i.e., n=0), then the result is a cdt:Map literal that represents the empty term map.

9.2 cdt:containsKey

xsd:boolean cdt:containsKey ( cdt:Map term1, RDF term term2 )

If term1 is a well-formed cdt:Map literal, then the result of this function is determined by the following algorithm.

  1. If the term map represented by term1 contains a map entry (k,v) such that k is the same term as term2, return true.
  2. Return false.

An error is raised if term1 is not a well-formed cdt:Map literal.

9.3 cdt:keys

cdt:List cdt:keys ( cdt:Map term )

If term is a well-formed cdt:Map literal, then the result of this function is a cdt:List literal that represents a term list that consists of all map keys of the map entries contained in the term map represented by term. The order of the elements in this term list is undefined.

If term is an ill-formed cdt:Map literal, then this function produces an error.

Note

If the term map represented by term is the empty term map, then the resulting cdt:List literal represents the empty term list.

9.4 cdt:merge

cdt:Map cdt:merge ( cdt:Map term1, cdt:Map term2 )

If term1 is a well-formed cdt:Map literal and term2 is a well-formed cdt:Map literal, then the result of this function is a cdt:Map literal that represents a term map containing all map entries contained in the term map represented by term1, together with every map entry (k,v) that is contained in the term map represented by term2 and that has a map key k that is not in the domain of the term map of term1.

If term1 is an ill-formed cdt:Map literal or term2 is an ill-formed cdt:Map literal, then this function produces an error.

Note

By the definition of the cdt:merge function, and as illustrated in the example above, if the term maps of the two given cdt:Map literals contain map entries with the same key, then the respective map entry from the first of the two literals is used for the resulting merged term map. The rationale for defining cdt:merge in this way is to be consistent with the default way in which the XPath map:merge function handles duplicate keys.

Note

If the term map represented by any of the two given cdt:Map literals is the empty term map, then the cdt:Map literal returned by this function represents exactly the same term map as the other of the two given cdt:Map literals (which may, of course, be the empty term map as well).

9.5 cdt:put

cdt:Map cdt:put ( SPARQL expression expr1, SPARQL expression expr2 )

cdt:Map cdt:put ( SPARQL expression expr1, SPARQL expression expr2, SPARQL expression expr3 )

If evaluating the expression expr1 results in a well-formed cdt:Map literal, hereafter denoted by lit, and evaluating the expression expr2 results in a map key, hereafter denoted by key, then the result of this function is a cdt:Map literal that represents a term map tm produced by the following algorithm.

  1. Initialize tm as the empty term map.
  2. Let tm' be the term map obtained by applying the lexical-to-value mapping of the cdt:Map datatype to the lexical form of lit (i.e., tm' is the value of lit).
  3. For every map entry (k,v) that is contained in tm':
    1. If k is not the same term as key, add the map entry (k,v) to tm.
  4. If the expression expr3 is given and evaluating this expression does not result in an error, then add the map entry (keyterm) to tm, where term is the RDF terms resulting from the evaluation of expr3.
  5. If the expression expr3 is not given or evaluating this expression results in an error, then add the map entry (keynull) to tm.

In all other cases an error is raised.

Note

If the term map tm' represented by lit contains a map entry (k,v) such that k is the same term as key and v is the same term as the result of evaluating expr3 (resp. v is null and either expr3 is not given or evaluating expr3 results in an error), then the cdt:Map literal returned by this function represents exactly the same term map as lit; i.e., tm'.

9.6 cdt:remove

cdt:Map cdt:remove ( cdt:Map term1, RDF term term2 )

If term1 is a well-formed cdt:Map literal and term2 is a map key, then the result of this function is a cdt:Map literal that represents a term map containing all map entries contained in the term map represented by term1, except for any map entry whose map key is the same term as term2.

If term1 is a well-formed cdt:Map literal and term2 is not a map key, then the result of this function is term1.

If term1 is an ill-formed cdt:Map literal, then this function produces an error.

10. Extension of ORDER BY

This section extends the definition of the ORDER BY clause of SPARQL to explicitly define the relative order of two RDF terms that both are cdt:List literals or cdt:Map literals.

According to the definition of ORDER BY ordering in Section 15.1 of [SPARQL11-QUERY], cdt:List literals and cdt:Map literals are ordered higher than any blank node and any IRI (because they are RDF literals).

10.1 Relative Order of cdt:List Literals

Given two cdt:List literals lit1 and lit2, if they both are well-formed, the relative order of these two literals is determined by the algorithm listed below. In brief, the algorithm operates as follows: It iterates over the term lists of both literals and compares their elements pairwise. For every pair for which both elements are null, the algorithm advances to the next pair. Once the algorithm comes across a pair with only one null value, the corresponding cdt:List literal that contains this null value is ordered lower than the other cdt:List literal. If none of the two elements in the current pair is null, and one of the two elements is ordered higher than the other element, then the cdt:List literal from which this higher-ordered element originates is defined as ordered higher than the other cdt:List literal. It may also be possible that the relative order of the two elements in such a pair is undefined. In such a case the relative order of the two cdt:List literals is undefined as well, unless the two elements are equal, in which case the algorithm also advances to the next pair. If the algorithm reaches the end of at least one of the two term lists and the other term list contains further elements, then the cdt:List literal with the larger term list is ordered higher. If no decision can be made based on any of the aforementioned conditions, then the relative order of the two literals is determined based on their lexical forms.

  1. Let list1 be the term list obtained by applying the lexical-to-value mapping of the cdt:List datatype to the lexical form of lit1 (i.e., list1 is the value of lit1).
  2. Let list2 be the term list obtained by applying the lexical-to-value mapping of the cdt:List datatype to the lexical form of lit2 (i.e., list2 is the value of lit2).
  3. For every integer i from 1 to n, where n is the minimum of the length of list1 and the length of list2:
    1. Let elmt1 be the i-th element in list1.
    2. Let elmt2 be the i-th element in list2.
    3. If elmt1 is null and elmt2 is not null, then lit2 is ordered higher than lit1.
    4. If elmt2 is null and elmt1 is not null, then lit1 is ordered higher than lit2.
    5. If neither elmt1 nor elmt2 is null (i.e., both are RDF terms), then:
      1. If the relative order of elmt1 and elmt2 is defined such that elmt1 is ordered higher than elmt2 (according to the definition in Section 15.1 of [SPARQL11-QUERY], and also considering the extension of this definition as provided in this section), then lit1 is ordered higher than lit2.
      2. If the relative order of elmt1 and elmt2 is defined such that elmt2 is ordered higher than elmt1, then lit2 is ordered higher than lit1.
      3. If evaluating the SPARQL expression elmt1 = elmt2 results in false or in an error, then the relative order of lit1 and lit2 is undefined.
  4. If the length of list1 is greater than the length of list2, then lit1 is ordered higher than lit2.
  5. If the length of list2 is greater than the length of list1, then lit2 is ordered higher than lit1.
  6. Let str1 be the literal that has the same lexical form as lit1 and the datatype IRI xsd:string.
  7. Let str2 be the literal that has the same lexical form as lit2 and the datatype IRI xsd:string.
  8. If str1 is ordered higher than str2, then lit1 is ordered higher than lit2.
  9. If str2 is ordered higher than str1, then lit2 is ordered higher than lit1.
  10. The relative order of lit1 and lit2 is undefined.

If at least one of the two literals, lit1 or lit2, is ill-formed, then their relative order is undefined.

10.2 Relative Order of cdt:Map Literals

Given two cdt:Map literals lit1 and lit2, if they both are well-formed, the relative order of these two literals is determined by the following algorithm.

  1. Let map1 be the term map obtained by applying the lexical-to-value mapping of the cdt:Map datatype to the lexical form of lit1 (i.e., map1 is the value of lit1).
  2. Let map2 be the term map obtained by applying the lexical-to-value mapping of the cdt:Map datatype to the lexical form of lit2 (i.e., map2 is the value of lit2).
  3. Let entrylist1 be a list of all map entries that are contained in map1, ordered based on their respective map keys by using the same order as defined in Step 6 of the algorithm in Section 6.4 map-less-than.
  4. Let entrylist2 be a list of all map entries that are contained in map2, ordered in the same way as entrylist1.
  5. For every integer i from 1 to n, where n is the minimum of the respective length of entrylist1 and of entrylist2:
    1. Let (k1, v1) be the i-th map entry in entrylist1.
    2. Let (k2, v2) be the i-th map entry in entrylist2.
    3. If evaluating the SPARQL expression SAMETERM(k1, k2) results in false, then:
      1. If k1 is ordered before k2 according to the ordering of map keys as used above for creating entrylist1 (and as defined in Step 6 of the algorithm in Section 6.4 map-less-than), then lit1 is ordered before lit2.
      2. lit2 is ordered before lit1.
    4. If v1 is null and v2 is not null, then lit2 is ordered higher than lit1.
    5. If v2 is null and v1 is not null, then lit1 is ordered higher than lit2.
    6. If neither v1 nor v2 is null (i.e., both are RDF terms), then:
      1. If the order of v1 and v2 is defined such that v1 is ordered higher than v2, then lit1 is ordered higher than lit2.
      2. If the order of v1 and v2 is defined such that v2 is ordered higher than v1, then lit2 is ordered higher than lit1.
      3. If evaluating the SPARQL expression v1 = v2 results in false or in an error, then the relative order of lit1 and lit2 is undefined.
  6. If the length of entrylist1 is greater than the length of entrylist2, then lit1 is ordered higher than lit2.
  7. If the length of entrylist2 is greater than the length of entrylist1, then lit2 is ordered higher than lit1.
  8. Let str1 be the literal that has the same lexical form as lit1 and the datatype IRI xsd:string.
  9. Let str2 be the literal that has the same lexical form as lit2 and the datatype IRI xsd:string.
  10. If str1 is ordered higher than str2, then lit1 is ordered higher than lit2.
  11. If str2 is ordered higher than str1, then lit2 is ordered higher than lit1.
  12. The relative order of lit1 and lit2 is undefined.

If at least one of the two literals, lit1 or lit2, is ill-formed, then their relative order is undefined.

11. FOLD

This section defines a new SPARQL Set Function, FOLD, used to construct composite values. As illustrated in Section 2.3 FOLD Aggregate, this function can be used to collect individual values into cdt:List literals and cdt:Map literals.

11.1 Grammar

Adding the FOLD aggregate to SPARQL requires the following extension of the Aggregate production of the SPARQL grammar. The part in which the extended production differs from the corresponding production in the original grammar is marked in bold font. The given production rule uses several symbols for which no production rule is provided below (e.g., Expression, OrderCondition); the production rules for these symbols are defined as given in the original SPARQL grammar.

[127+] Aggregate  ::=    'COUNT' '(' 'DISTINCT'? ( '*' | Expression ) ')'
| 'SUM' '(' 'DISTINCT'? Expression ')'
| 'MIN' '(' 'DISTINCT'? Expression ')'
| 'MAX' '(' 'DISTINCT'? Expression ')'
| 'AVG' '(' 'DISTINCT'? Expression ')'
| 'SAMPLE' '(' 'DISTINCT'? Expression ')'
| 'GROUP_CONCAT' '(' 'DISTINCT'? Expression ( ';' 'SEPARATOR' '=' String )? ')'
| 'FOLD' '(' 'DISTINCT'? Expression ( ',' Expression )? ( 'ORDER' 'BY' OrderCondition+ )? ')'

11.2 Translation to the Algebra

Based on the SPARQL grammar, the SPARQL specification defines the process of converting graph patterns and solution modifiers in a SPARQL query string into a SPARQL algebra expression (see Section 18.2 of [SPARQL11-QUERY]). This process must be adapted to consider FOLD clauses as added by the extended grammar introduced above. In particular, the step of translating grouping and aggregation during this process needs to be extended. The original algorithm that defines this translation is given in Section 18.2.4.1 of [SPARQL11-QUERY]. The extended version of the algorithm is given as follows, where the parts that are modified or added to cover the FOLD operator are marked in bold font.

Let A := the empty sequence
Let Q := the query level being evaluated
Let P := the algebra translation of the GroupGraphPattern of the query level
Let E := [], a list of pairs of the form (variable, expression)

If Q contains GROUP BY exprlist
   Let G := Group(exprlist, P)
Else If Q contains an aggregate in SELECT, HAVING, ORDER BY
   Let G := Group((1), P)
Else
   skip the rest of the aggregate step
   End

Global i := 1   # Initially 1 for each query processed

For each (X AS Var) in SELECT, each HAVING(X), and each ORDER BY X in Q
  For each unaggregated variable V in X
      Replace V with Sample(V)
      End
  For each aggregate R(args ORDER BY orderconditions; scalarvals) now in X
      # note ORDER BY clause may be omitted
      # note scalarvals may be omitted, then it's equivalent to the empty set
      If aggregate contains ORDER BY orderconditions
          G := OrderGroups(G, orderconditions)
          End
      If R = Fold
          If |args| = 1
              setfunc = Fold1
          Else
              setfunc = Fold2
              End
      Else
          setfunc = R
          End
      Ai := Aggregation(args, setfunc, scalarvals, G)
      Replace R(...) with aggi in Q
      i := i + 1
      End
  End

For each variable V appearing outside of an aggregate
   Ai := Aggregation(V, Sample, {}, G)
   E := E append (V, aggi)
   i := i + 1
   End

A := Ai, ..., Ai-1
P := AggregateJoin(A)

Notice that the extended algorithm uses the new algebra symbol OrderGroups. Hence, to use this algorithm, the list of symbols in the SPARQL algebra (as given in the second table of Section 18.2 [SPARQL11-QUERY]) needs to be extended by adding this symbol. The evaluation semantics of algebra expressions that contain this symbol is defined in the following.

11.3 Algebra

This section defines an operator for evaluating the new algebra symbol (as Section 18.5 of [SPARQL11-QUERY] does for the standard SPARQL algebra symbols). This operator has the same name as the algebra symbol and shall be used in the following section to extend the evaluation semantics of algebra expressions that may contain the new algebra symbol.

Definition: OrderGroups

Let { key1Ψ1, ..., keymΨm } be a partial function from keys to solution sequences. OrderGroups({ key1Ψ1, ..., keymΨm }, condition) is also a partial function from keys to solution sequences that is defined as follows:

OrderGroups({ key1Ψ1, ..., keymΨm }, condition) = { key1OrderBy(Ψ1, condition), ..., keymOrderBy(Ψm, condition) }

11.3.1 Aggregate Algebra

Two new set functions are introduced to support the FOLD aggregate:

11.3.1.1 Fold1

Fold1 is a SPARQL set function which combines the elements in the aggregate group into a cdt:List literal.

Definition: Fold1

cdt:List Fold1(sequence S)

L = Flatten(S)

n = |L|

Fold1(S) is a cdt:List literal that represents a term list tm that contains n elements where, for every i in {1, ..., n}, the element at the i-th position of tm is determined as follows: If the element at the i-th position of L is an error, then the element at the i-th position of tm is null. Otherwise, the element at the i-th position of tm is the element at the i-th position of L.

Note

If S is the empty sequence, then n=0 and, thus, the result of Fold1(S) is a cdt:List literal that represents the empty term list.

Note

By its definition, Fold1 behaves in the same way as the cdt:List function if that function is invoked by passing each element of L as a separate argument.

11.3.1.2 Fold2

Fold2 is a SPARQL set function which combines the elements in the aggregate group into a cdt:Map literal.

Definition: Fold2

cdt:Map Fold2(sequence S)

L = Flatten(S)

n = |L|
Note that n is an even number, which is due to i) the grammar of the FOLD aggregate, ii) the corresponding translation to the algebra, and iii) the definition of the Aggregation operator.

Fold2(S) is a cdt:Map literal that represents a term map tm that is constructed by the following algorithm.

  1. Initialize tm as the empty term map.
  2. For every integer i from 1 to n/2:
    1. Let k be the element at the (2i-1)-th position of L.
    2. Let v be the element at the 2i-th position of L.
    3. If k is neither an error nor a blank node, then:
      1. If tm contains a map entry (k',v') such that k=k', then replace this map entry (k',v') by the map entry (k,v). Otherwise, add the map entry (k,v) to tm.
Note

If S is the empty sequence, then n=0 and, thus, the result of Fold2(S) is a cdt:Map literal that represents the empty term map.

Note

By its definition, Fold2 behaves in the same way as the cdt:Map function if that function is invoked by passing each element of L as a separate argument.

11.4 Evaluation Semantics

The SPARQL specification defines a function eval(D(G), algebra expression) as the evaluation of an algebra expression with respect to a dataset D having active graph G (see Section 18.6 of [SPARQL11-QUERY]). The definition of this function is recursive. To cover algebra expressions that may contain the new algebra symbol (OrderGroups) the definition of this function needs to be extended with the following additional case.

Definition: Evaluation of OrderGroups

eval(D(G), OrderGroups(L, orderconditions)) = OrderGroups(eval(D(G), L), orderconditions)

12. UNFOLD

This section defines an extension of the SPARQL language that adds a new operator called UNFOLD. As illustrated in Section 2.4 UNFOLD Operator, this operator can be used to transform composite values (given in the form of cdt:List literals or cdt:Map literals) into their individual components and assigns these components separately to new query variables.

In particular, recall from Section 2.4 UNFOLD Operator that there are two variations of this operator: one that assigns a single new query variable (i.e., UNFOLD(expr AS ?v1)) and another one that assigns two new query variables (i.e., UNFOLD(expr AS ?v1, ?v2)). When applied to cdt:List literals, the one-variable version binds the new variable to each of the RDF terms in the corresponding term list (or leaves it unbound for null values) and the two-variables version binds the first new variable to each RDF term in the term list (or leaves it unbound for null values) and the second variable to the corresponding position of the term/null in the list. When applied to cdt:Map literals, the one-variable version binds the variable to each of the keys in the term map and the two-variables version binds the first variable to each key in the term map and the second variable to the corresponding value (or leaves it unbound for null values).

Note

Note that there is an asymmetry between the two-variables version of UNFOLD for cdt:List literals and the two-variables version for cdt:Map literals. In the case of maps, the values are assigned to the second variable whereas, for lists, the values (the list elements) are assigned to the first variable.

While it would be possible to eliminate this asymmetry by changing the two-variables version of UNFOLD for cdt:List literals such that the list elements are assigned to the second variable (and the positions to the first variable), doing so would introduce another asymmetry: This asymmetry would then be between the so-changed two-variables version and the one-variable version of UNFOLD for cdt:List literals, because the latter binds the first (and, in this case, only) variable to the values. A follow-up change to eliminate this asymmetry as well may be to define the one-variable version such that it assigns the (single) variable to the positions and not to the list elements. However, this definition would have a very limited practical value and, thus, the initially mentioned asymmetry is considered as the best possible compromise.

Adding this new UNFOLD operator to the language requires an extension to the grammar of SPARQL, an extension of the SPARQL algebra, and corresponding extensions both of the process of converting SPARQL graph patterns into SPARQL algebra expressions and of the evaluation semantics. The following sections define these extensions.

12.1 Grammar

Adding the UNFOLD operator to SPARQL requires the following extension of the GraphPatternNotTriples production of the SPARQL grammar. The part in which the extended production differs from the corresponding production in the original grammar is marked in bold font. The second production defined below (i.e., Unfold) needs to be added and has no counterpart in the original grammar. The given production rules use several symbols for which no production rule is provided below (e.g., GroupOrUnionGraphPattern, Expression); the production rules for these symbols are defined as given in the original SPARQL grammar.

[56+] GraphPatternNotTriples  ::=  GroupOrUnionGraphPattern | OptionalGraphPattern | MinusGraphPattern | GraphGraphPattern | ServiceGraphPattern | Filter | GroupOrUnionGraphPattern | Bind | InlineData | Unfold
[174] Unfold  ::=  'UNFOLD' '(' Expression 'AS' Var ( ',' Var )? ')'

Two additional constraints MUST be satisfied when using the UNFOLD operator:

  1. For every expression of the form UNFOLD(expr AS v, v'), v and v' MUST be two different variables.
  2. Second, the variable(s) assigned in an UNFOLD operator must not be already in use; more formally:
    • For every expression of the form UNFOLD(expr AS v), the variable v MUST NOT be in-scope from the preceding elements in the group graph pattern in which the expression is used.
    • For every expression of the form UNFOLD(expr AS v, v'), the variable v MUST NOT be in-scope from the preceding elements in the group graph pattern in which the expression is used, and the same holds for the variable v'.

The notion of in-scope variables as used in the second constraint is defined in Section 18.2.1 of [SPARQL11-QUERY]. When using the extended syntax with the UNFOLD operator, this notion must be extended as well. To capture this extension formally, the table in Section 18.2.1 of [SPARQL11-QUERY] is augmented with the following two additional rows.

Syntax Form In-scope variables
UNFOLD(expr AS v) v is in-scope
UNFOLD(expr AS v, v') v and v' are in-scope

12.2 Translation to the Algebra

Based on the SPARQL grammar, the SPARQL specification defines the process of converting graph patterns and solution modifiers in a SPARQL query string into a SPARQL algebra expression (see Section 18.2 of [SPARQL11-QUERY]). This process must be adapted to consider UNFOLD clauses as added by the extended grammar introduced above. In particular, the step of translating group graph patterns (GroupGraphPattern) during this process needs to be extended. The original algorithm that defines this translation is given in Section 18.2.2.6 of [SPARQL11-QUERY]. The extended version of the algorithm is given as follows, where the parts that are added to cover the UNFOLD operator are marked in bold font.

Let G := the empty pattern, a basic graph pattern which is the empty set.

For each element E in the GroupGraphPattern

    If E is of the form OPTIONAL{P}
        Let A := Translate(P)
        If A is of the form Filter(F, A2)
            G := LeftJoin(G, A2, F)
        Else
            G := LeftJoin(G, A, true)
            End
        End

    If E is of the form MINUS{P}
        G := Minus(G, Translate(P))
        End

    If E is of the form BIND(expr AS var)
        G := Extend(G, var, expr)
        End
If E is of the form UNFOLD(expr AS var)
    G := Unfold1(G, var, expr)
    End

If E is of the form UNFOLD(expr AS var1, var2)
    G := Unfold2(G, var1, var2, expr)
    End
    If E is any other form
        Let A := Translate(E)
        G := Join(G, A)
        End

   End

The result is G.

Notice that the extended algorithm uses two new algebra symbols: Unfold1 and Unfold2. Hence, to use this algorithm, the list of symbols in the SPARQL algebra (as given in the second table of Section 18.2 [SPARQL11-QUERY]) needs to be extended by adding these two symbols. The evaluation semantics of algebra expressions that contain any of these two symbols is defined in the following.

12.3 Algebra

This section defines operators for evaluating the two new algebra symbols (as Section 18.5 of [SPARQL11-QUERY] does for the standard SPARQL algebra symbols). These operators have the same name as the algebra symbols and shall be used in the following section to extend the evaluation semantics of algebra expressions that may contain the two new algebra symbols.

The operator for evaluating Unfold1 is defined as follows.

Definition: Unfold1

Let μ be a solution mapping, var be a variable, and expr be an expression.
Then, Unfold1(μ, var, expr) is a multiset of solution mappings that is defined as follows:

  • If var ∉ dom(μ) and expr(μ) is a well-formed cdt:List literal, then
    • Unfold1(μ, var, expr) = { μ | L contains null } ∪ Ω', where
          L is the term list represented by the cdt:List literal expr(μ) and
          Ω' = { μ ∪ (var, term) | term is an RDF term in L },
    • card[Unfold1(μ, var, expr)](μ') = 0 for every μ' ≠ μ and μ' ∉ Ω',
    • card[Unfold1(μ, var, expr)](μ') is the number of null values in L if μ' = μ, and
    • card[Unfold1(μ, var, expr)](μ') is the number of times the RDF term μ'(var) is contained in L if μ' ∈ Ω'.
  • If var ∉ dom(μ) and expr(μ) is a well-formed cdt:Map literal, then
    • Unfold1(μ, var, expr) = { μ ∪ (var, key) | key is the key of a map entry in M }, where
          M is the term map represented by the cdt:Map literal expr(μ),
    • card[Unfold1(μ, var, expr)](μ') = 1 for every μ' ∈ Unfold1(μ, var, expr), and
    • card[Unfold1(μ, var, expr)](μ') = 0 for every μ' ∉ Unfold1(μ, var, expr).
  • If var ∉ dom(μ) and expr(μ) is an error or an RDF term that is neither a well-formed cdt:List literal nor a well-formed cdt:Map literal, then
    • Unfold1(μ, var, expr) = {μ},
    • card[Unfold1(μ, var, expr)](μ') = 1 if μ' = μ, and
    • card[Unfold1(μ, var, expr)](μ') = 0 for every μ' ≠ μ.
  • If var ∈ dom(μ), then Unfold1(μ, var, expr) is undefined.

Let Ω a multiset of solution mappings. We define:
Unfold1(Ω, var, expr) = ⋃μΩ Unfold1(μ, var, expr)

Note

By the given definition , for cases in which var ∉ dom(μ) and expr(μ) is a well-formed cdt:List literal that represents the empty term list, Unfold1(μ, var, expr) is the empty multiset of solution mappings. Similarly, if var ∉ dom(μ) and expr(μ) is a well-formed cdt:Map literal that represents the empty term map, then Unfold1(μ, var, expr) is also the empty multiset.

The operator for evaluating Unfold2 is defined as follows.

Definition: Unfold2

Let μ be a solution mapping, var1 and var2 be two different variables, and expr be an expression.
Then, Unfold2(μ, var1, var2, expr) is a multiset of solution mappings that is defined as follows:

  • If var1 ∉ dom(μ), var2 ∉ dom(μ), and expr(μ) is a well-formed cdt:List literal such that the term list L represented by this literals is a nonempty, then
    • Unfold2(μ, var1, var2, expr) = { μμi ∪ (var2, i) | 1 ≤ in }, where
          μi is the empty solution mapping if the element at the i-th position of L is null,
          otherwise, μi = (var1, term) where term is element at the i-th position of L, and
          n is the number of elements in L,
    • card[Unfold1(μ, var, expr)](μ') = 1 for every μ' ∈ Unfold2(μ, var1, var2, expr), and
    • card[Unfold1(μ, var, expr)](μ') = 0 for every μ' ∉ Unfold2(μ, var1, var2, expr).
  • If var1 ∉ dom(μ), var2 ∉ dom(μ), and expr(μ) is a well-formed cdt:List literal that represents the empty term list, then Unfold2(μ, var1, var2, expr) is the empty multiset.
  • If var1 ∉ dom(μ), var2 ∉ dom(μ), and expr(μ) is a well-formed cdt:Map literal, then
    • Unfold2(μ, var1, var2, expr) = { μ ∪ (var1, k) ∪ μv | (k,v) is a map entry in M }, where
          μv is the empty solution mapping if v is null, otherwise, μv = (var2, v), and
          M is the term map represented by the cdt:Map literal expr(μ),
    • card[Unfold2(μ, var1, var2, expr)](μ') = 1 for every μ' ∈ Unfold2(μ, var1, var2, expr),
    • card[Unfold2(μ, var1, var2, expr)](μ') = 0 for every μ' ∉ Unfold2(μ, var1, var2, expr).
  • If var1 ∉ dom(μ), var2 ∉ dom(μ), and expr(μ) is an error or an RDF term that is neither a well-formed cdt:List literal nor a well-formed cdt:Map literal, then
    • Unfold2(μ, var1, var2, expr) = {μ},
    • card[Unfold2(μ, var1, var2, expr)](μ') = 1 if μ' = μ, and
    • card[Unfold2(μ, var1, var2, expr)](μ') = 0 for every μ' ≠ μ.
  • If var1 ∈ dom(μ) or var2 ∈ dom(μ), then Unfold2(μ, var1, var2, expr) is undefined.

Let Ω a multiset of solution mappings. We define:
Unfold2(Ω, var1, var2, expr) = ⋃μΩ Unfold2(μ, var1, var2, expr)

Note

While the given definition captures the case of cdt:List literals with the empty term list explicitly (see the second main bullet point in the definition), the case of cdt:Map literals with the empty term map are captured implicitly as part of the third main bullet point. In this case, the result of Unfold2(μ, var1, var2, expr) is also the empty multiset of solution mappings.

12.4 Evaluation Semantics

The SPARQL specification defines a function eval(D(G), algebra expression) as the evaluation of an algebra expression with respect to a dataset D having active graph G (see Section 18.6 of [SPARQL11-QUERY]). The definition of this function is recursive. To cover algebra expressions that may contain the two new algebra symbols (Unfold1 and Unfold2) the definition of this function needs to be extended with the following two additional cases.

Definition: Evaluation of Unfold1

eval(D(G), Unfold1(P, var, expr)) = Unfold1(eval(D(G), P), var, expr)

Definition: Evaluation of Unfold2

eval(D(G), Unfold2(P, var1, var2, expr)) = Unfold2(eval(D(G), P), var1, var2, expr)

A. IANA Considerations

This section is non-normative.

TODO

B. Complete EBNF Grammars

B.1 Composite Datatypes EBNF Grammar

This section is non-normative.

The following is a complete grammar for composite datatype literals (i.e., literals that conform either to the cdt:List datatype or to the cdt:Map datatype). The EBNF used here is defined in XML 1.0 [EBNF-NOTATION].

B.2 SPARQL-CDT EBNF Grammar

This section is non-normative.

The following is a complete grammar for the SPARQL language with the extensions for composite datatype literals as defined in this document. The EBNF used here is defined in XML 1.0 [EBNF-NOTATION].

C. References

C.1 Normative references

[EBNF-NOTATION]
EBNF NOTATION. Tim Bray; Jean Paoli; C. M. Sperberg-McQueen; Eve Maler; François Yergeau. W3C. 26 November 2008. W3C Recommendation. URL: http://www.w3.org/TR/REC-xml/#sec-notation
[RDF11-CONCEPTS]
RDF 1.1 Concepts and Abstract Syntax. Richard Cyganiak; David Wood; Markus Lanthaler. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/rdf11-concepts/
[RFC2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119
[RFC3987]
Internationalized Resource Identifiers (IRIs). M. Duerst; M. Suignard. IETF. January 2005. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc3987
[RFC8174]
Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. B. Leiba. IETF. May 2017. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc8174
[SPARQL11-FEDERATED-QUERY]
SPARQL 1.1 Federated Query. Eric Prud'hommeaux; Carlos Buil Aranda. W3C. 21 March 2013. W3C Recommendation. URL: https://www.w3.org/TR/sparql11-federated-query/
[SPARQL11-HTTP-RDF-UPDATE]
SPARQL 1.1 Graph Store HTTP Protocol. Chimezie Ogbuji. W3C. 21 March 2013. W3C Recommendation. URL: https://www.w3.org/TR/sparql11-http-rdf-update/
[SPARQL11-QUERY]
SPARQL 1.1 Query Language. Steven Harris; Andy Seaborne. W3C. 21 March 2013. W3C Recommendation. URL: https://www.w3.org/TR/sparql11-query/
[SPARQL11-SERVICE-DESCRIPTION]
SPARQL 1.1 Service Description. Gregory Williams. W3C. 21 March 2013. W3C Recommendation. URL: https://www.w3.org/TR/sparql11-service-description/
[SPARQL11-UPDATE]
SPARQL 1.1 Update. Paula Gearon; Alexandre Passant; Axel Polleres. W3C. 21 March 2013. W3C Recommendation. URL: https://www.w3.org/TR/sparql11-update/
[TURTLE]
RDF 1.1 Turtle. Eric Prud'hommeaux; Gavin Carothers. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/turtle/
[XML]
Extensible Markup Language (XML) 1.0 (Fifth Edition). Tim Bray; Jean Paoli; Michael Sperberg-McQueen; Eve Maler; François Yergeau et al. W3C. 26 November 2008. W3C Recommendation. URL: https://www.w3.org/TR/xml/

C.2 Informative references

[RDF11-PRIMER]
RDF 1.1 Primer. Guus Schreiber; Yves Raimond. W3C. 24 June 2014. W3C Working Group Note. URL: https://www.w3.org/TR/rdf11-primer/
[RDF11-SCHEMA]
RDF Schema 1.1. Dan Brickley; Ramanathan Guha. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/rdf-schema/
[RFC8259]
The JavaScript Object Notation (JSON) Data Interchange Format. T. Bray, Ed.. IETF. December 2017. Internet Standard. URL: https://www.rfc-editor.org/rfc/rfc8259
[SPARQL11-OVERVIEW]
SPARQL 1.1 Overview. The W3C SPARQL Working Group. W3C. 21 March 2013. W3C Recommendation. URL: https://www.w3.org/TR/sparql11-overview/
[XPATH-FUNCTIONS]
XQuery 1.0 and XPath 2.0 Functions and Operators (Second Edition). Ashok Malhotra; Jim Melton; Norman Walsh; Michael Kay. W3C. 14 December 2010. W3C Recommendation. URL: https://www.w3.org/TR/xpath-functions/