 | 
| | |
Reference: The Java OQL to SQL translator Status Introduction Overview Syntax Type and validity checking SQL Generation OQL FAQ Summary
Status The Castor OQL implementation is currently in phase 3 of development.
NOTE: This documentation is not yet finished Introduction This document describes an OQL to SQL translator to be added to the Castor JDO Java object Persistence API. The translator will accept OQL queries passed as strings, and generate a parse tree of the OQL. It will then traverse the tree creating the appropriate SQL. The user will then be able to bind parameters to parameterized queries. Type checking will be performed on the bound parameters. When the user executes the query, the system will submit the query to the SQL database, and then postprocess the SQL resultset to create the appropriate result as a Java Object or literal. The current mapping and persistence packages will be used for metadata and RDBMS communication. Four of the (now defunct) SourceXchange milestones for this project call for java source code. These milestones will be referred to here as phase 1, 2, 3, and 4. There are many possible OQL features that can be supported, but weren't discussed in the proposal or RFP. Many of these are probably unwanted. These additional features are specified as phase 5, which is out of the scope of this SourceXChange project. Overview The parser will construct a parse tree as output from an OQL query string given as input. The OQL syntax is a subset of the syntax described in the ODMG 3.0 specification section 4.12, with some additional constructs. Following is a description of the supported OQL syntax, and its SQL equivalent. Certain features of OQL may not be directly translatable to SQL, but may still be supported, by post processing the query. For example, the first() and last() collection functions supported in OQL are not directly translatable to standard SQL, but a resultset can be post-processed to return the appropriate values. Features requiring post-processing of SQL resultsets will be documented as such below. Currently the OQLQuery checks for correct syntax at the same time as it does type checking and other types of error checking. The new code will involve a multiple pass strategy, with the following passes: - Parse the String query checking for syntax errors, and return a parse tree.
- Traverse the parse tree checking for correct types, valid member and method identifiers, and use of features which are unsupported. This pass may also generate some data necessary for creating the SQL.
- Traverse the tree one final time, creating the equivalent SQL statement to the OQL Query originally passed.
Syntax This section describes the first pass which will be done by the parser. The parser will create a StringTokenizer like this:
StringTokenizer tokenizer
= new StringTokenizer(oql,
"\n\r\t,.()[]+-*/<>=:|$", true); This will create a StringTokenizer with the delimiter characters listed in the second argument, and it will return delimeters as well as tokens. The parser will also create a Vector to be used as a token buffer. As tokens are returned from the StringTokenizer they will be added to the Vector. Older tokens will be removed from the Vector when it reaches a certain size. The Vector will also be modified when the StringTokenizer returns multi character operators as seperate tokens, for example the -> method invocation operator. The parser will consume tokens from the StringTokenizer, generating a ParseTree. Each ParseTree node will have a nodeType corresponding to its symbol in the OQL syntax. After each node is created it will look at the next token and act acordingly, either modifying its properties (i.e. for DISTINCT property of selectExpr), creating a new child node or returning an error. If the error travels up to the root node of the ParseTree, there is a syntax error in the OQL submitted. At the end of this pass, the ParseTree will contain an apropriate representation of the query, which will be analyzed, and used to create SQL. Below is the modified EBNF which will be the Castor OQL syntax.
query ::= selectExpr
| expr
selectExpr ::= select [distinct]
projectionAttributes
fromClause
[whereClause]
[groupClause]
[orderClause]
projectionAttributes ::= projectionList
| *
projectionList ::= projection {, projection }
projection ::= field
| expr [as identifier]
fromClause ::= from iteratorDef {, iteratorDef}
iteratorDef ::= identifier [ [as ] identifier ]
| identifier in identifier
whereClause ::= where expr
groupClause ::= group by fieldList {havingClause}
havingClause ::= having expr
orderClause ::= order by sortCriteria
sortCriteria ::= sortCriterion {, sortCriterion }
sortCriterion ::= expr [ (asc | desc) ]
expr ::= castExpr
castExpr ::= orExpr
| ( type ) castExpr
orExpr ::= andExpr {or andExpr}
andExpr ::= quantifierExpr {and quantifierExpr}
quantifierExpr ::= equalityExpr
| for all inClause : equalityExpr
| exists inClause : equalityExpr
inClause ::= identifier in expr
equalityExpr ::= relationalExpr
{(= | !=)
[compositePredicate] relationalexpr }
| relationalExpr {like relationalExpr}
relationalExpr ::= additiveExpr
{(< | <=
| > | >= )
[ compositePredicate ] additiveExpr }
| additiveExpr between
additiveExpr and additiveExpr
compositePredicate ::= some | any | all
additiveExpr ::= multiplicativeExpr
{+ multiplicativeExpr}
| multiplicativeExpr
{- multiplicativeExpr}
| multiplicativeExpr
{union multiplicativeExpr}
| multiplicativeExpr
{except multiplicativeExpr}
| multiplicativeExpr
{|| multiplicativeExpr}
multiplicativeExpr ::= inExpr {* inExpr}
| inExpr {/ inExpr}
| inExpr {mod inExpr}
| inExpr {intersect inExpr}
inExpr ::= unaryExpr {in unaryExpr}
unaryExpr ::= + unaryExpr
| - unaryExpr
| abs unaryExpr
| not unaryExpr
| postfixExpr
postfixExpr ::= primaryExpr{[ index ]}
| primaryExpr
{(. | ->)identifier[arglist]}
index ::= expr {, expr}
| expr : expr
argList ::= ([ valueList ])
primaryExpr ::= conversionExpr
| collectionExpr
| aggregateExpr
| undefinedExpr
| collectionConstruction
| identifier[ arglist ]
| queryParam
| literal
| ( query )
conversionExpr ::= listtoset( query )
| element( query )
| distinct( query )
| flatten( query )
collectionExpr ::= first( query )
| last( query )
| unique( query )
| exists( query )
aggregateExpr ::= sum( query )
| min( query )
| max( query )
| avg( query )
| count(( query | * ))
undefinedExpr ::= is_undefined( query )
| is_defined( query )
fieldList ::= field {, field}
field ::= identifier: expr
collectionConstruction ::= array([valueList])
| set([valueList])
| bag([valueList])
| list([valueList])
| list(listRange)
valueList ::= expr {, expr}
listRange ::= expr..expr
queryParam ::= $[(type)]longLiteral
type ::= [unsigned] short
| [unsigned] long
| long long
| float
| double
| char
| string
| boolean
| octet
| enum [identifier.]identifier
| date
| time
| interval
| timestamp
| set <type>
| bag <type>
| list <type>
| array <type>
| dictionary <type, type>
| identifier
identifier ::= letter{letter| digit| _}
literal ::= booleanLiteral
| longLiteral
| doubleLiteral
| charLiteral
| stringLiteral
| dateLiteral
| timeLiteral
| timestampLiteral
| nil
| undefined
booleanLiteral ::= true
| false
longLiteral ::= digit{digit}
doubleLiteral ::= digit{digit}.digit{digit}
[(E | e)[+|-]digit{digit}]
charLiteral ::= 'character'
stringLiteral ::= "{character}"
dateLiteral ::= date
'longliteral-longliteral-longliteral'
timeLiteral ::= time
'longliteral:longLiteral:floatLiteral'
timestampLiteral ::= timestamp
'longLiteral-longLiteral-longLiteral
longliteral:longLiteral:floatLiteral'
floatLiteral ::= digit{digit}.digit{digit}
character ::= letter
| digit
| special-character
letter ::= A|B|...|Z|
a|b|...|z
digit ::= 0|1|...|9
special-character ::= ?|_|*|%|\
The following symbols were removed from the standard OQL Syntax for the following reasons: | - | andthen: Cannot be implemented in a single SQL query. | - | orelse: Same as above. | - | import: This is advanced functionality which may be added later. This phase will use the castor mapping mechanism to define the namespace. | - | Defined Queries: This is another feature which can be added later. It is unclear where the queries would be stored, and what their scope would be seeing as how this project is an OQL to SQL translator, and not an ODBMS. | - | iteratorDef was changed so that all instances of expr were replaced by identifier. This means that the from clause can only contain extent names (class names), rather than any expression. This is the most common case and others could create complicated SQL sub-queries or post-processing requirements. | - | objectConstruction and structConstruction were removed. What is the scope of the constructed object or struct, and how is a struct defined in Java? |
The following symbols were added or modified. | - | between added to relationalExpr. | - | Optional type specification added to queryParam. |
The rest of the standard OQL syntax remains unchanged. Certain syntactically correct queries may not be supported in Castor. For example, top level expressions which do not contain a selectExpr anywhere in the query may not be supported. This will be discussed further in the next section. Type and validity checking The first pass over the ParseTree will do type checking, and create some structures used in the SQL generation pass. It will also check whether the identifiers used are valid, and whether the query uses unsupported features. The following table describes each type of node in the ParseTree, and how it will be processed in the first pass. expr | | - | A query whose top level element is an expr, rather than a selectExpr will not be supported within the scope of this project. These queries can either be stated as a selectExpr, like aggregateExpr's, or they would require post-processing of the SQL results, like element()first() and last(). |
| Phase 5 | projectionAttributes | | - | select * will return a Collection of Arrays of Objects. |
| Phase 5 | projectionList | | - | Selecting multiple fields will return a Collection of Arrays of Objects. | - | When there are multiple fields selected, a list of field names and aliases will be kept for checking validity of expr's in the whereClause, groupClause, and orderClause. |
| Phase 5 | projection | | - | Alias identifier will be stored. | - | expr in projection may only be identifier, without an arglist. |
| Phase 1 | projection | | - | expr in projection may only be identifier (with optional argList), aggregateExpr, undefinedExpr, and postfixExpr (for selecting fields and accessors). | - | The subquery in aggregateExpr and undefinedExpr can be identifier (with optional arglist), or postfixExpr for applying these functions to fields and accessors. | - | If an identifier before the . or -> contains an arglist, it will be considered a SQL function, and passed through to the RDBMS. | - | If the postfixExpr contains one of the above operators, the mapping mechanism will be used to determine if the path expression is valid and to generate a list of required join tables, using the manyKey and manyTable from the JDOFieldDescriptor. |
| Phase 2 | fromClause | | - | The class of the extent being selected from will be stored, and ClassDescriptor objects will be instantiated. |
| Phase 1 | whereClause | | - | expr in whereClause may only contain orExpr, andExpr, equalityExpr (without compositePredicate), relationalExpr, additiveExpr (without set operators union and except), multiplicativeExpr (without set operator intersect), unaryExpr, postFixExpr (must be only primaryExpr, no array or property reference or method calls). | - | primaryExpr may only contain identifier (without an argList), literal and queryParam. Identifier will be checked against object name and alias in projectionList. | - | For equalityExpr, relationalExpr, aditiveExpr, multiplicativeExpr, the left side and right side expr's must evaluate to comparable types. | - | For unaryExpr, simple type checking for numerical or character based types will be performed. | - | If the operands for any of the relational, equality, additive, multiplicative, or unary operators is a query parameter, an expected type will be determined. If the parameter included a specified type which is incompatible with the system determined type, an error will be generated. |
| Phase 1 | whereClause | | - | Support for built in OQL functions will be added to the whereClause: is_defined, is_undefined. | - | inExpr will be supported in whereClause. | - | inExpr will only allow collectionConstruction for the right side argument to in. No subQueries will be allowed. |
| Phase 2 | whereClause | | - | identifiers will be able to contain an optional arglist. If the arglist is before a . or -> the identifier will be considered a SQL function and will be passed through to the DBMS. Otherwise, the identifier will be for an accessor method, or a property name. | - | Accessor methods and property references will cause a check through the ClassDescriptor and FieldDescriptors for the object type, and the required join tables. |
| Phase 3 | whereClause | | - | compositePredicate will be suported in equalityExpr. | - | exists(query) will be supported. | - | quantifierExpr will support for all and exists. | - | Subqueries will be supported on the right side of the in operator |
| Phase 4 | groupClause, havingClause | | - | Will identify appropriate fields in SQL schema for each expr. | - | aggregateExpr will be supported. | - | Only expr's which translate to SQL columns whcih are already being selected will be supported. |
| Phase 4 | orderClause | | - | May only contain expr's which translate into SQL columns which are already being selected. |
| Phase 3 | SQL Generation After the first pass, the ParseTree is free of errors, and ready for the SQL generation step. The existing implementation of the OQLParser uses the persistence API for SQL generation. This API lacks the necessary features to generate SQL from any OQL. The SQLEngine class which implements Persistence is used to create a JDBCQueryExpression. The SQL is derived from the finder, which is a JDBCQueryExpression produced by the SQLEngine. The problem is that the SQLEngine only supports single objects. It cannot generate SQL for path expressions like this:
select p.address from Person p This query requires a SQL statement like this:
select address.* from person, address
where person.address_id = address.address_id The buildFinder method should not be used to generate a queryExpression. The SQLEngine should be used to get a ClassDescriptor, and to create a new QueryExpression. The OQLParser should use the methods in the QueryExpression to generate the SQL. The JDBCQueryExpression which is an implementation of QueryExpression is also lacking in necessary features. This class should continue to be used, but the following features will need to be added: - addColumn(String)
- For adding something to select without specifying the tablename, for use with functions (i.e. select count(*))
- addTable(String)
- For when the table has to be added manually.
- addCondition(String)
- Add a condition created outside the class, for nested expressions, and other expressions that are not of the form table.column op table.column.
- setDistinct(boolean)
- Used for select distinct.
- addOrderColumn(String tableName, String columnName, boolean desc)
- Used for order by
- addGroupExpr(String)
- Used for group by
- addHavingExpr(String)
- Used for having.
The following table lists each type of tree node, and how it will be processed in the SQL generation pass. selectExpr | | - | distinct in the selectExpr will result in a call to setDistinct(true) in the queryExpr. |
| Phase 2 | projection | | - | The queryExpr will be populated with the columns and tables necessary to retrieve the object. This will use code similar to SQLEngine.addLoadSql(...). |
| Phase 1 | projection | | - | aggregateExpr and SQL functions will be passed to addColumn. | - | undefinedExpr will be translated to is null and is not null | - | postfixExpr (for selecting fields and accessors) will result in a different group of select expressions and "from tables" being generated. |
| Phase 2 | whereClause | | - | Entire expr in where clause will be translated, and then added to the QueryExpr, using a single call to addCondition(String), and multiple calls to addTable(String). |
| Phase 1 | whereClause | | - | is_defined() will translate into is not null and is_undefined() will translate into is null. | - | inExpr will translate directly, with the collectionConstruction removed. |
| Phase 2 | whereClause | | - | compositePredicate and exists(query) translate directly to SQL. | - | For quantifierExpr, exists will translate into an exists() SQL subquery. for all will translate into the contrapositive(?) exists query, for example:
for all x in teachers:
x.name = 'Nis' translates to:
not exists (select * from teachers
where name != 'Nis') |
| Phase 4 | OQL FAQ Please see the OQL section of the JDO FAQ. Summary The Parser and ParseTree classes will be improved through the phases of this project. The top level of OQL to SQL translation will look very simple, like this:
OQLParser parser = new OQLParser(query);
ParseTree pt = parser.getParseTree();
pt.checkPass();
//the SQL generation pass
_expr = pt.getQueryExpr();
These methods will have some additional parameters passed for storing and retrieving data relevant to the query. Following is a table containing a list of what will be introduced in each coding phase of the project. Phase 1 | | - | New parser structure which generates symbol tree | - | Parser still supports only limited OQL | - | selected field aliases | - | whereClause supports or, and, equality, additive, multiplicative, and unary Operators. | - | support for specifying parameter types | - | check specified parameter type against system determined type | - | specifying ordered parameters. | - | JDBCQueryExpression must support addCondition(String condition) |
| Phase 2 | | - | Distinct keyword will be supported in selectExpr | - | aggregateExpr and undefinedExpr supported in projection (Select statement) | - | isDefined, isUndefined in whereClause | - | inExpr in whereClause | - | type checking/conversion in bind() | - | fields and accessors in the projection. Mapping mechanism may need some additional features. | - | SQL functions in the projection | - | order by |
| Phase 3 | | - | fields and accessors in whereClause | - | SQL functions in the where clause |
| Phase 4 | | - | sub queries | - | exists() | - | compositePredicate | - | quantifierExpr's: for all and exists | - | group by | - | having |
| Phase 5: | | - | expr as top level symbol | - | queries selecting multiple fields or as comma separated list or * |
| |