Object Query Language
The Object Query Language (OQL) is similar in spirit to Hibernate's HQL. The goals of the query language are to:
- provide a slightly higher-level, "object" based, query syntax by specifying objects and properties of objects in the query
- provide object support by returning object results
- support polymorphic queries by returning appropriate subclasses
- provide triple-store independence by supporting both iTQL and SPARQL underneath
- be easy to learn by looking similar to SQL
Query Structure
The basic structure of a query is as follows:
select <projection-list> from <typed-var-list> [where <condition>] [<order>] [<limit>] [<offset>];
Example:
select a.id, a.author from Article a where a.title = 'Hello Dolly';
The typed-var-list specifies the "root" variables for the query, together with their types (classes). The projections and the condition use these variables to build their expressions.
The projection-list states what to return; the items can be simple variables, expressions based on variables, or even whole sub-queries.
The condition is a Boolean expression to select the desired results. Only four operators (alias assignment, equals, not-equals, and cast) are pre-defined; all others can be defined through functions.
Simple Examples
Select all articles with a given title:
select a from Article a where a.title = 'Hello Dolly';
Select a count of all articles with a given title or author:
select count(a) from Article a where a.title = 'Hello Dolly' or a.author.name = 'Ernest Lehman';
Select all role names together with a list of users in each role:
select role.name, (select u from User u where u = role.user) from Role role;
Query Details
Projections
As stated above, the list of projections determines what exactly to return. This is a comma-separated list of projection items, where each item is a projection expression optionally followed by a variable:
select a.b foo, d bar, c, (select ...), ...
The variable, if present, can be used in order clauses, in subqueries, and to access the item by name in the returned result, i.e. it provides a named handle which can be used elsewhere to refer to that projection expression. As a special case, if no variable is present and the expression itself is just a variable (the "c" in the above example) then a projection variable with the same name is generated for you - this slightly simplifies queries like the following:
select a from Article a order by a;
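The naming rule above can be expressed as a small helper. This is a hypothetical sketch, not code from the OQL implementation: the `projection_names` function and the representation of a projection item as an (expression, explicit variable) pair are assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple

# Same shape as the ID token: a letter followed by letters, digits or '_'.
_VAR = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def projection_names(items: List[Tuple[str, Optional[str]]]) -> List[Optional[str]]:
    """Return the projection variable for each (expression, variable) item.

    Hypothetical helper: an explicit variable always wins; if the expression
    is itself just a variable, a projection variable of the same name is
    generated; otherwise the item stays unnamed.
    """
    names: List[Optional[str]] = []
    for expr, var in items:
        if var is not None:
            names.append(var)
        elif _VAR.fullmatch(expr):
            names.append(expr)
        else:
            names.append(None)
    return names
```

For `select a.b foo, d bar, c, (select ...)` this yields `['foo', 'bar', 'c', None]`; the generated name is what lets `order by a` refer to the projection `a` in the query above.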
Each projection expression can be a selector ("a.b.c"), a function call (at the time of this writing only count() is supported), or a subquery.
Subqueries
Subqueries are delimited by parentheses and contain full OQL queries, as in the example from above:
select role.name, (select u from User u where u = role.user) from Role role;
The subqueries are exactly like regular queries (except for the variable bindings described below), and so for example they can in turn contain subqueries.
The only special thing about subqueries is the presence of additional variable bindings: all variables declared in any "outer" queries (whether from the from clause, projections, or aliases) are accessible from within the subquery (see the "role" variable in the above example). Because variables have global scope, subqueries cannot redefine variables defined in an outer scope.
A subquery is evaluated once for each row returned, with the outer variables fixed for the duration of that evaluation.
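The per-row evaluation can be pictured as a nested loop. The sketch below models only the semantics, not how queries are actually executed: the in-memory `USERS` and `ROLES` data are hypothetical stand-ins for the triple store, and `role.user` is assumed to be a collection (so '=' acts as 'contains').

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory data standing in for the triple store.
USERS: List[str] = ["alice", "bob", "carol"]
ROLES: List[Dict] = [
    {"name": "admin", "user": ["alice"]},
    {"name": "editor", "user": ["alice", "bob"]},
]

def check_no_redefinition(outer_vars: Set[str], inner_vars: Set[str]) -> None:
    """Variables have global scope, so a subquery may not redeclare an outer one."""
    clash = outer_vars & inner_vars
    if clash:
        raise ValueError(f"subquery redefines outer variable(s): {sorted(clash)}")

def users_in_role(role: Dict) -> List[str]:
    # (select u from User u where u = role.user)
    # role.user is a collection here, so '=' acts as 'contains'.
    return [u for u in USERS if u in role["user"]]

def roles_with_users() -> List[Tuple[str, List[str]]]:
    # select role.name, (select u from User u where u = role.user) from Role role;
    check_no_redefinition({"role"}, {"u"})
    rows = []
    for role in ROLES:
        # 'role' stays fixed while the subquery is evaluated for this row.
        rows.append((role["name"], users_in_role(role)))
    return rows
```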
Conditions
The where condition is essentially a Boolean expression, limiting the returned results to only those for which the condition is true. Besides the Boolean operators 'and' and 'or', OQL provides only the two binary operators '=' and '!=' and the special operators ':=' (alias) and 'cast()'; the last two are described below. All other operators are implemented by functions, allowing the query language to be extended without syntax changes.
The '=' and '!=' work pretty much as expected. Both sides of the operator must have compatible types, as discussed in Typing - if not, the comparison will inevitably evaluate to false. When comparing an object to another object or a URI the object's ID (URI) is effectively compared (i.e. no comparison of an object's fields is implied). When one side of the operator is a collection then '=' becomes 'contains' and '!=' becomes 'does not contain'; if both sides are collections then '=' becomes 'have at least one common element' and '!=' becomes 'have no common elements'.
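The collection rules for '=' and '!=' amount to a set-intersection test. A minimal sketch, assuming objects are represented by their ID strings (matching the ID-only comparison described above) and that Python lists, tuples, and sets stand in for collection-valued fields; `oql_equals` and `oql_not_equals` are hypothetical names.

```python
from typing import Any, FrozenSet

COLLECTION_TYPES = (list, tuple, set, frozenset)

def as_set(value: Any) -> FrozenSet[Any]:
    """Treat lists, tuples and sets as collections and anything else as one value."""
    if isinstance(value, COLLECTION_TYPES):
        return frozenset(value)
    return frozenset([value])

def oql_equals(left: Any, right: Any) -> bool:
    # value = value: equality; value = collection: 'contains';
    # collection = collection: 'have at least one common element'.
    return bool(as_set(left) & as_set(right))

def oql_not_equals(left: Any, right: Any) -> bool:
    # value != collection: 'does not contain';
    # collection != collection: 'have no common elements'.
    return not (as_set(left) & as_set(right))
```

In this reading '!=' is always the exact negation of '='; for example `oql_not_equals(['a', 'b'], 'a')` is false because the collection contains 'a'.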
The following functions are currently defined:
- lt(): less-than
- gt(): greater-than
- le(): less-than-or-equal
- ge(): greater-than-or-equal
The arguments of all four may be literals, URIs, or objects (in which case the object's ID is used for comparison).
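The four comparison functions share one rule for their arguments. A minimal sketch, assuming URIs are ordered by their string form and objects by their ID URI; the actual ordering is whatever the underlying store applies, and `Uri`, `Entity`, and `call_comparison` are hypothetical names.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass(frozen=True)
class Uri:
    value: str

@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for a mapped object; only its ID is compared."""
    id: Uri

COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "lt": operator.lt,
    "gt": operator.gt,
    "le": operator.le,
    "ge": operator.ge,
}

def comparison_key(arg: Any) -> Any:
    # Objects are replaced by their ID; URIs are compared as strings
    # (an assumption); literals are compared as they are.
    if isinstance(arg, Entity):
        arg = arg.id
    if isinstance(arg, Uri):
        return arg.value
    return arg

def call_comparison(fname: str, left: Any, right: Any) -> bool:
    return COMPARATORS[fname](comparison_key(left), comparison_key(right))
```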
The where clause is optional; if it is left out, no further constraints are generated.
Order
There may be multiple order items; the results are ordered by the first order item (in lexicographical order), with ties broken by the second order item, and so on. Each order item must refer to a projection variable (whether explicit or generated); this implies that you cannot order by something that is not returned as part of the query result. The ordering direction of each item can be controlled with the asc and desc keywords for ascending and descending order, respectively; the default ordering is ascending.
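Multi-item ordering with a per-item direction can be sketched with Python's stable sort: sorting by the last order item first and by the first item last gives the combined order. The `order_rows` helper and the row-as-dictionary representation are assumptions for illustration; the check that each item names a projection variable mirrors the restriction above.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

def order_rows(rows: List[Row], order: List[Tuple[str, str]]) -> List[Row]:
    """Sort rows by order items given as (projection variable, 'asc' | 'desc')."""
    if rows:
        unknown = [var for var, _ in order if var not in rows[0]]
        if unknown:
            raise ValueError(f"order items must be projection variables: {unknown}")
    result = list(rows)
    # Python's sort is stable (also with reverse=True), so sorting by the
    # least significant item first and the most significant item last orders
    # by the first item, then the second, and so on.
    for var, direction in reversed(order):
        result.sort(key=lambda row: row[var], reverse=(direction == "desc"))
    return result
```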
Aliases
Aliases bind a variable to an instance of an expression, and are defined in the where condition using the ':=' operator, as in the following example:
... and name := a.b.name and ...
The variable created this way has global scope, i.e. it is visible throughout the whole where condition and in the projection expressions.
Aliases are not macros, though, and do more than just reduce the number of characters in complex queries. As the next example demonstrates, they are necessary in order to express certain constraints. Assume the following classes:
class Person {
Name[] name;
}
class Name {
String first;
String last;
}
That is, a person may have multiple names. Now, in order to select a person with the name 'Joe Sutter', we might attempt the following query:
select p from Person p where p.name.first = 'Joe' and p.name.last = 'Sutter';
However, this query will select any person who has one first name of 'Joe' and one last name of 'Sutter'. So a person with the names 'Joe Hatter' and 'Bob Sutter' would also match. The reason for this is that there is nothing that binds the two 'p.name' expressions, i.e. they are free to match different names. In order to achieve a binding we must use an alias:
select p from Person p where n := p.name and n.first = 'Joe' and n.last = 'Sutter';
This will now give us the desired results.
Even though aliases appear in the where condition they do not participate in the determination of the truth value of the condition. This means they effectively return a value of "true" if part of a conjunction and a value of "false" if part of a disjunction.
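The difference between the two 'Joe Sutter' queries can be reproduced in plain Python over the Person and Name classes above. This sketch only models the matching semantics (independent matches per 'p.name' versus one bound name); the function names and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Name:
    first: str
    last: str

@dataclass
class Person:
    name: List[Name]

def match_without_alias(p: Person) -> bool:
    # p.name.first = 'Joe' and p.name.last = 'Sutter':
    # each 'p.name' ranges over the names independently.
    return (any(n.first == "Joe" for n in p.name)
            and any(n.last == "Sutter" for n in p.name))

def match_with_alias(p: Person) -> bool:
    # n := p.name and n.first = 'Joe' and n.last = 'Sutter':
    # both constraints apply to the same bound name n.
    return any(n.first == "Joe" and n.last == "Sutter" for n in p.name)

mixed = Person([Name("Joe", "Hatter"), Name("Bob", "Sutter")])
exact = Person([Name("Joe", "Sutter")])
```

`mixed` satisfies `match_without_alias` but not `match_with_alias`, while `exact` satisfies both.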
Typing
An OQL query is statically typed: while parsing the query, the parser assigns each variable, expression, constant, etc. a type, which may be a URI, an untyped (plain) literal, a typed literal, an object, or a special "unknown" type. These types are used as follows:
- in a selector, at each step the current type is used to translate field names into predicate URIs. E.g. in "a.b.c" the type of "a" is used to look up field "b", and the resulting type is used to look up field "c". If a field is a collection then the resulting type is the collection's component type. An error occurs if the type at any point except at the end is not an object type (i.e. only object types can be dereferenced).
- in projection expressions the resulting type is used to determine what kind of item to return. If the type is an object then an instance of that type is returned; if the type is a URI then a URI is returned; if the type is a literal then a literal is returned; and if the type is the "unknown" type then a URI or a literal is returned, depending on the value returned from the RDF store.
- while evaluating the where condition, both sides of the '=' and '!=' operators, as well as the arguments to certain binary comparison functions (such as 'lt' and 'gt'), must be "compatible", else a warning is generated. Compatibility is as follows:
  - any type is compatible with itself; in the case of typed literals the datatypes must match; in the case of objects any object matches any other object (though this will probably change at some point to require assignment compatibility)
  - the "unknown" type is compatible with everything
  - a parameter is compatible with either a URI or a literal
  - a URI is compatible with any object
- filters are applied wherever a matching type is encountered
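The compatibility rules can be written as a single predicate. A minimal sketch, assuming a simple `OqlType` record with the kind names 'uri', 'plain', 'typed', 'object', 'unknown' and 'param'; these names are hypothetical, not taken from the implementation. It follows the current rule that any object matches any other object and does not model the warning issued for incompatible operands.

```python
from dataclasses import dataclass
from typing import Optional

LITERAL_KINDS = {"plain", "typed"}

@dataclass(frozen=True)
class OqlType:
    kind: str                       # "uri", "plain", "typed", "object", "unknown" or "param"
    datatype: Optional[str] = None  # only for typed literals
    cls: Optional[str] = None       # only for objects

def compatible(a: OqlType, b: OqlType) -> bool:
    if a.kind == "unknown" or b.kind == "unknown":
        return True  # "unknown" is compatible with everything
    if a.kind == b.kind:
        # Same kind: typed literals need equal datatypes; any object matches any object.
        return a.kind != "typed" or a.datatype == b.datatype
    kinds = {a.kind, b.kind}
    if "param" in kinds:
        other = (kinds - {"param"}).pop()
        return other == "uri" or other in LITERAL_KINDS
    return kinds == {"uri", "object"}  # a URI is compatible with any object
```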
Types are assigned as follows:
- variables in the from clause are assigned the associated object type
- variables from aliases are assigned the type of the alias expression
- variables in projections are assigned the type of the associated expression
- predicate expression variables are assigned the type URI
- the result of dereferencing a variable in a selector with a field name ("a.b") is assigned the type of that field, which may be another object type, a URI, or a literal.
- the result of dereferencing a variable in a selector with a predicate ("a.<foo:bar>") is assigned the "unknown" type.
- the result of dereferencing a variable in a selector with a predicate expression ("a.{p -> ...}") is assigned the "unknown" type.
- the result of calling a function in a projection expression or in a selector (i.e. non-Boolean functions) is assigned whatever type that function defines for its return value.
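Field-to-predicate translation in a selector can be sketched as a walk over mapping metadata. The `SCHEMA` table, the `<ex:...>` predicate URIs, and the `Name[]` notation for collection fields are assumptions for illustration (using the Person and Name classes from the Aliases section); only the rules themselves (look up each field in the current type, continue with a collection's component type, and refuse to dereference a non-object type) come from the description above.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping metadata: class -> field -> (predicate URI, field type).
# A trailing "[]" marks a collection field; anything not in SCHEMA is not an object type.
SCHEMA: Dict[str, Dict[str, Tuple[str, str]]] = {
    "Person": {"name": ("<ex:name>", "Name[]")},
    "Name": {"first": ("<ex:first>", "literal"), "last": ("<ex:last>", "literal")},
}

def resolve_selector(root_type: str, fields: List[str]) -> Tuple[List[str], str]:
    """Translate the fields of a selector like 'p.name.first' into predicate URIs.

    Returns the predicate URIs and the type of the whole selector.
    """
    current = root_type
    predicates: List[str] = []
    for field in fields:
        if current not in SCHEMA:
            # only object types can be dereferenced
            raise TypeError(f"cannot dereference {field!r} on non-object type {current!r}")
        if field not in SCHEMA[current]:
            raise ValueError(f"class {current!r} has no field {field!r}")
        predicate, field_type = SCHEMA[current][field]
        predicates.append(predicate)
        # for a collection field, continue with the component type
        current = field_type[:-2] if field_type.endswith("[]") else field_type
    return predicates, current
```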
ID Fields
Dereferencing the ID field of an object is treated specially, though in a natural manner: it does not change the current object (i.e. it does not generate any new constraints in the resulting query), but instead just changes the current type from object to URI. This means that the following expressions in the where condition are equivalent, i.e. produce the exact same query (assuming that 'a' is of type A and A's ID field is called 'id'):
- a = b
- a.id = b
However, in a projection expression 'a' and 'a.id' return different kinds of items: an instance of A in the former case, and a URI in the latter, because of the change in type. Also, in a selector 'a.id.b' is not the same as 'a.b'; in fact it is an error (because a URI cannot be dereferenced).
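The special treatment of the ID field can be sketched as a translation of selectors into triple patterns. The `FIELDS` table, the `<ex:...>` predicates, the `?var_N` variable naming, and the SPARQL-like pattern strings are all hypothetical and do not reflect the exact queries OQL generates; the point is that `.id` adds no pattern and only switches the current type to URI, so `a.id.b` is rejected.

```python
from typing import Dict, List, Tuple

ID_FIELD = "id"  # assumed name of the ID field, as in the text above

# Hypothetical metadata: class -> field -> (predicate URI, field type).
FIELDS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "A": {"b": ("<ex:b>", "B")},
    "B": {"c": ("<ex:c>", "literal")},
}

def to_patterns(var: str, cls: str, path: List[str]) -> Tuple[List[str], str, str]:
    """Return (triple patterns, final query variable, final type) for var.path."""
    patterns: List[str] = []
    subject, current = f"?{var}", cls
    for i, field in enumerate(path):
        if current not in FIELDS:
            raise TypeError(f"cannot dereference {field!r}: {current!r} is not an object type")
        if field == ID_FIELD:
            current = "uri"  # same node, no new pattern; only the type changes
            continue
        predicate, field_type = FIELDS[current][field]
        obj = f"?{var}_{i}"
        patterns.append(f"{subject} {predicate} {obj} .")
        subject, current = obj, field_type
    return patterns, subject, current
```

Both `a` and `a.id` resolve to the query variable `?a` with no patterns, which is why `a = b` and `a.id = b` produce the same query; only the type (A versus URI) differs.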
Casts
The cast() operator can be used to change the type of an expression, or to assign a type where the type would otherwise be "unknown". Currently it can only be used to assign an object type (i.e. it cannot be used to assign a URI or literal type). cast() assumes the user knows what he or she is doing and does not attempt to perform any checks. Also, cast() is removed entirely early in query parsing and never shows up in the final query.
cast() is particularly useful wherever you want to continue dereferencing but the current type is unknown, for instance after a predicate, as in
cast(a.<foo:bar>, D).e
Another useful case is inside predicate expressions if the predicate is actually of an object type:
{ p -> cast(p, P).name = 'Joe'}
Advanced Features
Because RDF (and the supported query languages) do not in general treat predicates specially, there are a couple of features in OQL that are not present in, say, HQL or SQL. The first of these is predicate expressions.
Predicate Expressions
Predicate expressions provide a way to select a list of predicates based on an expression, instead of just (indirectly) specifying the predicate via the field name. Example:
select p from Permissions p where p.{perm -> perm != <topaz:prop>} = 'user1';
This would select all Permissions objects for which at least one predicate other than <topaz:prop> has the value 'user1' (i.e. for which at least one permission has been granted to the given user).
Predicate expressions can be arbitrary Boolean expressions just like the condition (and can include further nested predicate expressions). The '<id> ->' declares the variable <id> to represent the current predicate (the scope of this variable is global and it must be globally unique). The expression part may be left out ({perm ->}) in which case all predicates match.
Note that the variable represents a predicate (i.e. a URI), not a field-name.
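Over a plain list of triples, the predicate-expression query above can be sketched as follows. The sample triples, the `<perm:...>` subjects and predicates, and the use of `<rdf:type>` to record class membership are assumptions for illustration; the predicate expression is modelled as a Python function of the predicate URI.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical store contents.
TRIPLES: List[Triple] = [
    ("<perm:1>", "<rdf:type>", "<topaz:Permissions>"),
    ("<perm:1>", "<topaz:prop>", "user1"),
    ("<perm:2>", "<rdf:type>", "<topaz:Permissions>"),
    ("<perm:2>", "<perm:read>", "user1"),
]

def select_by_predicate_expr(type_uri: str,
                             pred_expr: Callable[[str], bool],
                             value: str) -> List[str]:
    """select p from <type> p where p.{perm -> pred_expr(perm)} = value"""
    subjects = {s for s, p, o in TRIPLES if p == "<rdf:type>" and o == type_uri}
    return sorted(
        s for s in subjects
        # at least one statement whose predicate satisfies the expression
        if any(s2 == s and pred_expr(p) and o == value for s2, p, o in TRIPLES)
    )
```

With `lambda perm: perm != '<topaz:prop>'` only `<perm:2>` matches; an empty expression such as `{perm ->}` corresponds to `lambda perm: True` and matches both objects.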
Wildcard Projector
The '.*' syntax in projections is used to signify "all predicates". Example:
select p.* from Permissions p where p.id = <resource>;
This would return the URIs of all predicates which have a non-null value (more precisely: for which at least one statement with the predicate exists). I.e. where "select p ..." would return an instance of Permissions, "select p.* ..." will return a list of URIs (the predicates).
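The "at least one statement" rule can be sketched over a plain list of (subject, predicate, object) triples; `all_predicates` and the sample data are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def all_predicates(triples: List[Triple], subject: str) -> List[str]:
    """Distinct predicates with at least one statement for subject, in first-seen order."""
    seen: List[str] = []
    for s, p, _ in triples:
        if s == subject and p not in seen:
            seen.append(p)
    return seen

# Hypothetical data: two statements for <perm:read>, one for <perm:write>.
sample: List[Triple] = [
    ("<res:1>", "<perm:read>", "user1"),
    ("<res:1>", "<perm:read>", "user2"),
    ("<res:1>", "<perm:write>", "user1"),
    ("<res:2>", "<perm:delete>", "user3"),
]
```

Each predicate is reported once no matter how many statements use it, and a predicate with no statements for the subject does not appear at all.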
Specification
Base Grammar
query = select ";"
select = "select" projectionList "from" classList [ "where" condition ] [ order ] [ limit ] [ offset ]
projectionList = projExpr [ var ] ( "," projExpr [ var ] )*
projExpr = func(projArg) | fieldExpr | subquery
projArg = func(projArg) | fieldExpr | constant
fieldExpr = var ( "." ( field | predicate ) )* [ "." "*" ]
subquery = "(" select ")"
classList = class var ( "," class var )*
condition = disjunction
disjunction = conjunction ( "or" conjunction )*
conjunction = expression ( "and" expression )*
expression = factor
| "(" disjunction ")"
factor = var ":=" ( selector | constant ) // alias
| selector ( "=" | "!=" ) ( selector | constant )
| constant ( "=" | "!=" ) selector
| func(condArg) // boolean functions
condArg = selector | constant
selector = ( cast | var ) ( "." ( field | predicate | predicateExpr ) )*
| func(condArg)
predicateExpr = "{" var "->" [ disjunction ] "}"
cast = "cast" "(" selector "," class ")"
order = "order" "by" orderItem ("," orderItem )*
orderItem = var [ "asc" | "desc" ]
limit = "limit" NUM
offset = "offset" NUM
var = ID
field = ID
predicate = URIREF
class = ID ( "." ID )*
constant = QSTRING [ "^^" URIREF | "@" ID ]
| URIREF
| parameter
parameter = ":" ID
Pseudo Macros
func(arg) (function):
func = fname "(" [ arg ( "," arg )* ] ")"
fname = [ ID ":" ] ID
Tokens
NUM = [ "-" ] ( "0".."9" )+
ID = ( "a".."z" | "A".."Z" ) ( "a".."z" | "A".."Z" | "0".."9" | "_" )*
QSTRING = "'" ( "\" "'" | ~"'" )* "'"
URIREF = "<" ( ~">" )+ ">"
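For experimentation, the token definitions translate directly into regular expressions. This is a hedged transcription, not the parser's actual lexer: it reads `"\" "'"` in QSTRING as a backslash-escaped quote, and `is_token` is a hypothetical helper.

```python
import re

# Regular expressions transcribed from the token definitions above.
TOKENS = {
    "NUM": r"-?[0-9]+",
    "ID": r"[A-Za-z][A-Za-z0-9_]*",
    "QSTRING": r"'(?:\\'|[^'])*'",  # backslash-escaped quote or any non-quote
    "URIREF": r"<[^>]+>",
}

def is_token(kind: str, text: str) -> bool:
    """True if the whole of text is a single token of the given kind."""
    return re.fullmatch(TOKENS[kind], text) is not None
```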
Notes
- cast() is a separate thing instead of just another function for a few reasons. They have to do with the fact that I want the query to be "statically" typed, i.e. that I can figure out the field -> predicate translation statically. In this light:
  - cast(), by not allowing an arbitrary expression for the class argument, currently allows the implementation to do something akin to a simple string replacement as part of AST processing (the cast is completely gone by the time the query reaches the stores).
  - Allowing foo(bar).title requires either function definitions to be queried for their return type (possibly based on the arguments) or multiple queries (first resolve the result of the function call and get its type, then do the field to predicate translation and continue the query).
- there's no way to specify a model anywhere (yet)
- 'order by' only allows variables, not arbitrary expressions, and furthermore those variables must be ones defined in the select clause (projections). To support arbitrary expressions in iTQL we would have to add further variables to the select clause, which means that either the result returned to the user has more columns than expected, or the result may contain duplicates (if we remove the column again internally), or we have to do explicit duplicate removal ourselves. I'm open to suggestions though.
- I was hoping to avoid the 'foo'^^<xsd:blah> and 'foo'@en syntax, but I don't see how; the same goes for the URI (<...>) vs. literal ('...') distinction. The reason is what happens when a constant is used as a function argument: unless the function statically defines the type of the argument we don't know what's needed, and requiring all functions to statically define their types won't fly (unless maybe we do some sort of generics-like syntax for function type declarations). An example is the case-insensitive compare function: it works for any pair of arguments with the same type, so it doesn't know or restrict the type of the arguments. If we do know the types of the function arguments and return value, then I think we could use just, say, '...'@en for all constants, since we can figure out the type of the thing they're being compared to, except where the predicate escape mechanism is being used with unknown predicates (var.<foo:bar>).
