16.5. Virtuoso Server Extension Interface (VSEI) (C Interface)

16.5.1. Virtuoso Server Extension Interface (VSEI)

The Virtuoso Server Extension Interface allows Virtuoso functionality to be extended by including new functions written in other languages such as C. These new functions are SQL callable.

These functions share the same C prototype and use Virtuoso internal APIs, described in the sections below, to work with SQL data, signal SQL errors and execute SQL.

A SQL-callable C function is called a Virtuoso Server Extension (VSE). These are external functions integrated into Virtuoso by linking their object code with a Virtuoso server supplied in library format rather than as an executable. VSEs were formerly known as BIFs, which stood for Built-In Functions. Such functions must be exported using the bif_define() or bif_define_typed() C functions when initializing the extended Virtuoso server.

These functions will thereafter be invoked on server threads. The functions should be re-entrant and comply with some simple memory management conventions outlined below.

These functions may execute arbitrary C code and call arbitrary APIs, to the extent these are compatible with the host operating system's threading model.

Virtuoso VSEs can be debugged within the normal C debugger by either starting Virtuoso under the debugger in foreground mode or by attaching the debugger to a running process.

Stack consumption should not be excessive: threads normally have 100K of stack on 32 bit platforms. The stack size may however be increased by settings in the virtuoso.ini file.


16.5.2. SQL Run Time Objects

The Virtuoso Server Extension API introduces the following data types:

box: This is a run-time-typed block of memory which represents any SQL data type, e.g. number, string, array etc. Boxes have a type and length that are retrievable at run time and can be allocated, freed and otherwise manipulated by functions appropriate to each type of box. Boxes may form trees through use of heterogeneous arrays but should not form graphs.
query_t: This is a compiled query, corresponding to a SQL statement or procedure compilation. The query_t is made from a SQL string and can thereafter be executed multiple times. This is a read-only object, not affected by execution on any number of threads, analogously to machine code not being affected by being executed.
query_instance_t: This is a structure representing a query execution state. These are created when executing a query_t. This is analogous to a stack frame of a C function. It holds all relevant query state, such as cursor positions, intermediate results, column values etc. This is passed to all VSEs so they can have access to environment information such as current transaction, current client etc. The query instance references the query_t. As a rule, the query instance is specific to a thread. A query instance can be relatively long lived in the case of a cursor, which may live across multiple client-server message exchanges.
state_slot_t: This is a part of query_t that specifies or describes a query time variable, column, parameter, intermediate result etc. This is analogous to an offset in a stack frame: it indexes a position inside a query instance. Given the state slot and the instance, it is possible to read or to set a value in the query state. Arguments of VSEs are passed as an array of state slots. These, combined with the running query instance, give the argument values and can be used to set output parameters.
local_cursor_t: When executing a select statement, the local cursor structure is returned for accessing the result set rows. This is always a forward-only cursor. This can be advanced, column values may be accessed and the cursor may be closed.
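
The relationship between a compiled query, a query instance and a state slot can be illustrated with a small toy model. This is only a sketch of the idea described above, not Virtuoso's actual data structures: every name beginning with toy_ is hypothetical, and values are reduced to plain longs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the real Virtuoso types. */
enum { TOY_MAX_SLOTS = 8 };

typedef struct toy_slot_s {
  const char *name;   /* for debugging only */
  size_t index;       /* position inside every instance of the query */
} toy_slot_t;

typedef struct toy_query_s {
  size_t n_slots;     /* fixed when the query is compiled */
} toy_query_t;

typedef struct toy_instance_s {
  const toy_query_t *query;     /* shared, read-only compiled query */
  long values[TOY_MAX_SLOTS];   /* state private to one execution */
} toy_instance_t;

/* Read the value that a slot designates in a given instance. */
long
toy_slot_get (const toy_instance_t *inst, const toy_slot_t *slot)
{
  assert (slot->index < inst->query->n_slots);
  return inst->values[slot->index];
}

/* Set the value that a slot designates in a given instance. */
void
toy_slot_set (toy_instance_t *inst, const toy_slot_t *slot, long value)
{
  assert (slot->index < inst->query->n_slots);
  inst->values[slot->index] = value;
}
```

Two instances of the same toy query can hold different values for the same slot, which mirrors how one query_t can be executed on several threads at once without being modified.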

16.5.3. Memory Management Rules

All state slots in a query have distinct values. With the exception of a reference parameter, no value is referenced twice. All state slot values can therefore be recursively freed independently of each other.

If a VSE returns data, this data must always be new, i.e. allocated inside the VSE. It may not be the same pointer as any of the arguments, nor include any of the arguments as a substructure. All return values and arguments must be legitimate boxes and may not share structure.
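
The "always return new data" rule can be sketched with ordinary C strings standing in for boxes. The helper names toy_copy and toy_first_nonempty are hypothetical, and malloc stands in for the box allocator; a real VSE would use the box functions described later in this chapter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a fresh copy of a string; the caller owns the result. */
char *
toy_copy (const char *s)
{
  size_t n = strlen (s) + 1;
  char *copy = malloc (n);
  assert (copy != NULL);
  memcpy (copy, s, n);
  return copy;
}

/* Stand-in for a VSE that returns the first non-empty argument.
   It is tempting to return the chosen argument itself, but the result
   must be a new allocation so that the arguments and the result can be
   freed independently of each other. */
char *
toy_first_nonempty (const char *a, const char *b)
{
  const char *chosen = (a[0] != '\0') ? a : b;
  return toy_copy (chosen);
}
```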


16.5.4. Server Main Function

The server main function for a customized Virtuoso server has the following format:

static void (*old_ddl_hook) (client_connection_t *cli) = NULL;

static void
ddl_hook (client_connection_t *cli)
{
  if (old_ddl_hook)
    old_ddl_hook (cli);

  /* DDL code (depending on the server being fully initialized
    (ex: create table) ) goes here */
}

static void
init_func (void)
{
  old_ddl_hook = set_ddl_init_hook (ddl_hook);
  /* initialization code (prerequisite for server initialization:
     bif_define, unrelated init code) goes here */
}

int
main (int argc, char *argv[])
{
  VirtuosoServerSetInitHook (init_func);
  return VirtuosoServerMain (argc, argv);
}

There are three phases to custom code initialization:

The init_func() function is called before any server initialization functions are called. This is typically the place for defining new VSEs, allocating synchronization objects (since the server does not have any threads yet), and performing custom initialization not related to Virtuoso. If the extension needs a ddl_hook() callback, set_ddl_init_hook() should also be called here to register it.

Note:

set_ddl_init_hook() returns the previously registered hook. Saving it in old_ddl_hook and calling it from ddl_hook() allows several ddl_hooks to be chained.

The ddl_hook() function is called during normal startup just before the roll forward, but after the server's internal structures have been initialized. This is typically the place to execute SQL statements that initialize the extension. The client_connection_t * argument passed to the function is the client connection that should be used for SQL execution.

The main() function can call VirtuosoServerSetInitHook() if there is any Virtuoso-related initialization to be performed, and should then call the VirtuosoServerMain() function to start the Virtuoso server. The VirtuosoServerMain() function will return control after the server has been shut down.
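
The hook chaining used by old_ddl_hook can be tried out in isolation. The following is a toy model only: toy_set_hook is a hypothetical stand-in for set_ddl_init_hook(), the hooks append a letter to a log string instead of running DDL, and toy_run plays the role of the server start-up sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*toy_hook_t) (char *log);

static toy_hook_t toy_current_hook = NULL;

/* Install a new hook and return the one it replaces (NULL if none). */
toy_hook_t
toy_set_hook (toy_hook_t new_hook)
{
  toy_hook_t old = toy_current_hook;
  toy_current_hook = new_hook;
  return old;
}

static toy_hook_t old_hook_a = NULL;
static toy_hook_t old_hook_b = NULL;

/* Each hook first calls the hook it displaced, then does its own work. */
void
hook_a (char *log)
{
  if (old_hook_a)
    old_hook_a (log);
  strcat (log, "A");
}

void
hook_b (char *log)
{
  if (old_hook_b)
    old_hook_b (log);
  strcat (log, "B");
}

/* Two extensions register in turn, then the "server" fires the single
   hook it knows about. Intended to be called once per process. */
void
toy_run (char *log)
{
  assert (log != NULL);
  old_hook_a = toy_set_hook (hook_a);
  old_hook_b = toy_set_hook (hook_b);
  if (toy_current_hook)
    toy_current_hook (log);
}
```

Because each hook calls its predecessor before doing its own work, the hooks run in registration order even though only the most recently registered one is visible to the caller.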


16.5.5. Compiling & Linking

The files should be compiled for the multi-threaded environment appropriate to the operating system and should be linked accordingly e.g. -lm, -ldl.

The Virtuoso distribution contains the following libraries/object files:


16.5.6. Functions by Category

16.5.6.1. General Box Functions

The box, usually declared with the caddr_t data type, is the basic representation of any SQL data in Virtuoso. All boxes have a run time data type, with a name beginning with DV_. All boxes have a 3 byte run time length, which allows for up to 16 MB of contiguous array size in SQL data.

The further interpretation of the content of the box is determined by the type tag. The length is always an exact byte length, although the actual length is rounded up to the next suitably aligned value. The length and tag of a box must never be changed while the box is allocated but the content is freely writable. The tag and length reside immediately under the pointer of the box, so that a box, with the appropriate type cast will pass as a C array or string.
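
The "header under the pointer" layout can be modelled in a few lines of standard C. This is a sketch under stated assumptions, not Virtuoso's allocator: the toy_ names, the 8 byte header (padding plus 3 length bytes plus 1 tag byte), the byte order of the length and the tag value are all choices made for this example only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_HEADER 8        /* padding + 3 length bytes + 1 tag byte */
#define TOY_TAG_STRING 1    /* arbitrary tag value for this sketch */

typedef char *toy_box_t;

/* Allocate a toy box; the returned pointer addresses the data, and the
   length and tag sit in the bytes immediately before it. */
toy_box_t
toy_alloc_box (uint32_t bytes, unsigned char tag)
{
  unsigned char *raw;
  assert (bytes < (1u << 24));   /* the length must fit in 3 bytes */
  raw = malloc (TOY_HEADER + (size_t) bytes);
  assert (raw != NULL);
  raw[TOY_HEADER - 4] = (unsigned char) (bytes & 0xff);
  raw[TOY_HEADER - 3] = (unsigned char) ((bytes >> 8) & 0xff);
  raw[TOY_HEADER - 2] = (unsigned char) ((bytes >> 16) & 0xff);
  raw[TOY_HEADER - 1] = tag;
  return (toy_box_t) (raw + TOY_HEADER);
}

uint32_t
toy_box_length (toy_box_t box)
{
  const unsigned char *p = (const unsigned char *) box;
  return (uint32_t) p[-4] | ((uint32_t) p[-3] << 8) | ((uint32_t) p[-2] << 16);
}

unsigned char
toy_box_tag (toy_box_t box)
{
  return ((const unsigned char *) box)[-1];
}

void
toy_free_box (toy_box_t box)
{
  free ((unsigned char *) box - TOY_HEADER);
}

/* Build a string box: it can be used directly as a C string. */
toy_box_t
toy_box_string (const char *s)
{
  uint32_t len = (uint32_t) strlen (s) + 1;
  toy_box_t box = toy_alloc_box (len, TOY_TAG_STRING);
  memcpy (box, s, len);
  return box;
}
```

The point of the layout is that the box pointer can be handed to any function expecting a plain C string or array, while the length and tag remain retrievable from the pointer alone.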

Numbers are generally represented as boxes. There is an exception for small integers, which are always distinguishable from pointers. Thus integers in the range from -10000 to 10000 are not allocated as boxes holding the value but are passed directly. This is hidden, however, and the programmer need not be concerned about it except occasionally when debugging.
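
The small integer convention can be sketched as follows. This toy model uses the range quoted above and assumes that no heap address, viewed as a signed integer, falls inside that range, which holds on mainstream platforms. The toy_ names are hypothetical; the real test is the IS_BOX_POINTER macro, whose exact threshold is not restated here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_SMALL_MAX 10000

typedef void *toy_value_t;

/* Nonzero if the value is a real pointer rather than a bare integer. */
int
toy_is_pointer (toy_value_t v)
{
  intptr_t n = (intptr_t) v;
  return n < -TOY_SMALL_MAX || n > TOY_SMALL_MAX;
}

/* Small numbers travel inside the pointer itself; others are allocated. */
toy_value_t
toy_box_num (long n)
{
  long *cell;
  if (n >= -TOY_SMALL_MAX && n <= TOY_SMALL_MAX)
    return (toy_value_t) (intptr_t) n;
  cell = malloc (sizeof (long));
  assert (cell != NULL);
  *cell = n;
  return cell;
}

long
toy_unbox (toy_value_t v)
{
  if (!toy_is_pointer (v))
    return (long) (intptr_t) v;
  return *(long *) v;
}

/* Freeing a bare integer is a no-op. */
void
toy_free_num (toy_value_t v)
{
  if (toy_is_pointer (v))
    free (v);
}
```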

The byte order in boxes depends on the platform.

The most important types are:


16.5.6.2. Box Functions

box_t dk_alloc_box (uint32 bytes , int tag );

dk_alloc_box() allocates a box of the given size and type. The initial contents are undefined.

int dk_free_box (box_t box );

dk_free_box() frees a box allocated by dk_alloc_box(). The argument may not be any other pointer.

int dk_free_tree (box_t box );

dk_free_tree() is like dk_free_box() but will free recursively, following through DV_ARRAY_OF_POINTER boxes.

uint32 box_length (box_t box2 );
#define box_tag(box) \
	(*((dtp_t *) &(((unsigned char *)(box))[-1])))

These return the length and the tag of a box.

long unbox (box_t n );
box_t box_num (long n );
box_t box_dv_short_string (char *string );
box_t box_double (double d );
box_t box_float (float f );
#define unbox_num(n) unbox(n)
#define unbox_float(f) (*((float *)f))
#define unbox_double(f) (*((double *)f))
#define unbox_string(s) ((char *)s)

The above functions and macros convert between C data types and boxes. box_dv_short_string() takes a char * to any null terminated string and allocates a string box of appropriate size. This itself looks like a null terminated string but has the box header with the run time length and type under the pointer.

box_t box_copy (box_t box );

box_copy() returns an identical size box with the same type and contents.

box_t box_copy_tree (box_t box );

box_copy_tree() performs a recursive copy, traversing DV_ARRAY_OF_POINTER references.

int box_equal (box_t b1 , box_t b2 );

Given two arbitrary boxes, returns true if they are recursively equal.


Box Examples

Below is the code for box_copy and box_copy_tree:

box_t
box_copy (box_t box)
{
  dtp_t tag;
  uint32 len;
  box_t copy;

  if (!IS_BOX_POINTER (box))
    return box;

  tag = box_tag (box);
  if (box_copier[tag])
    return (box_copier[tag] (box));
  len = box_length (box);
  copy = dk_alloc_box (len, tag);
  memcpy (copy, box, (uint32) len);
  return copy;
}

box_t
box_copy_tree (box_t box)
{
  box_t *copy;
  dtp_t tag;

  if (!IS_BOX_POINTER (box))
    return box;

  tag = box_tag (box);
  copy = (box_t *) box_copy (box);
  if (tag == DV_ARRAY_OF_POINTER || tag == DV_LIST_OF_POINTER)
    {
      uint32 inx, len = BOX_ELEMENTS (box);
      for (inx = 0; inx < len; inx++)
	copy[inx] = box_copy_tree (((box_t *) box)[inx]);
    }

  return (box_t) copy;
}

Note:

The IS_BOX_POINTER check at the start will detect the unboxed, 'bare' small integers which are actually not allocated and can be returned by value. Only then can box_tag be used to find the type.

The DV_TYPE_OF macro should be used instead of box_tag when the type is unknown to avoid de-referencing a small integer.

Also note BOX_ELEMENTS, which is box_length () / sizeof (caddr_t). This is practical for iterating over arrays.

See Also

The VSEI Functions.


16.5.7. VSEI Definition

typedef caddr_t (*bif_t) (caddr_t *qst, caddr_t *error_return, state_slot_t ** arguments);
void bif_define (char *name, bif_t bif);
void bif_define_typed (char * name, bif_t bif, bif_type_t *bt);

These functions associate a function pointer to a VSE name. The typed variant allows associating a value type used when inferring SQL meta-data if the result is returned to a client. The type can be one of the following externs:

If a VSE accesses indexes, either through its own internal code or by executing Virtuoso/PL statements, there is a potential for deadlocks. To prevent deadlocks, the Virtuoso/PL compiler must be informed of potential index usage inside the VSE so that it can generate special deadlock-safe code for calls to it. In such cases the bif_set_uses_index() function should be called after bif_define() or bif_define_typed().

The potential for deadlocking is always present if the VSE executes Virtuoso/PL code or uses XPath/XSLT functions. Other functions of Virtuoso's C interface are deadlock-safe since they perform no database access.
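
The idea of associating a function pointer with a SQL-callable name can be shown with a self-contained toy registry. Nothing here is the real implementation: toy_bif_define and toy_bif_find are hypothetical stand-ins, the toy_ types only imitate the shape of the bif_t prototype given above, and the lookup is a simple linear search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef char *toy_caddr_t;
typedef struct toy_state_slot_s toy_state_slot_t;   /* opaque in this sketch */

/* Same shape as the bif_t prototype. */
typedef toy_caddr_t (*toy_bif_t) (toy_caddr_t *qst, toy_caddr_t *error_return,
    toy_state_slot_t **arguments);

enum { TOY_MAX_BIFS = 16 };

static struct { const char *name; toy_bif_t bif; } toy_bifs[TOY_MAX_BIFS];
static size_t toy_n_bifs = 0;

/* Register a function under a name. */
void
toy_bif_define (const char *name, toy_bif_t bif)
{
  assert (toy_n_bifs < TOY_MAX_BIFS);
  toy_bifs[toy_n_bifs].name = name;
  toy_bifs[toy_n_bifs].bif = bif;
  toy_n_bifs++;
}

/* Look a function up by name; NULL if it was never defined. */
toy_bif_t
toy_bif_find (const char *name)
{
  size_t i;
  for (i = 0; i < toy_n_bifs; i++)
    if (strcmp (toy_bifs[i].name, name) == 0)
      return toy_bifs[i].bif;
  return NULL;
}

/* A trivial function body: ignores its arguments and returns a newly
   allocated string, in keeping with the memory management rules. */
toy_caddr_t
toy_hello (toy_caddr_t *qst, toy_caddr_t *error_return,
    toy_state_slot_t **arguments)
{
  toy_caddr_t result = malloc (6);
  (void) qst;
  (void) error_return;
  (void) arguments;
  assert (result != NULL);
  memcpy (result, "hello", 6);
  return result;
}
```

In a real extension the registration call would be made from the init hook, before the server creates any threads, so a registry of this kind needs no locking during set-up.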


16.5.8. SQL Exceptions

caddr_t srv_make_error (char *code, char *msg);
void sqlr_error (char *code, char *msg,...);
void sqlr_resignal (caddr_t err);

An error object is a three element array of type DV_ARRAY_OF_POINTER, consisting of the number 3, the SQL state and the message. When an error is signalled inside a VSE, control passes by longjmp to an outer context, typically that of the calling stored procedure or top level query. The condition is handled there or passed to the next level up, ultimately to the ODBC, JDBC or Web client. Executing a SQL statement inside a VSE always returns and never exits the VSE by longjmp. Thus the VSE gets a first look at all SQL errors caused by statements it executes.

sqlr_error is the normal function for signaling a SQL state. It takes a 5 character SQL state, a printf format string and optional arguments, a la printf.

sqlr_resignal is used to throw a condition to the next level handler. This is typically done when executing a query which returns an error and the error is sent up to the caller of the VSE.

srv_make_error makes the error structure. The expression

sqlr_resignal (srv_make_error ("12345", "message"));

is equivalent to

sqlr_error ("12345", "message");

Note

srv_make_error does not take the printf-type arguments.

By convention a NULL pointer indicates no error. sqlr_resignal (NULL) is an error.

The macros:

#define ERR_STATE(err)  (((caddr_t*) err)[1])
#define ERR_MESSAGE(err)  (((caddr_t*) err)[2])

can be used to read an error returned by a statement.
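
The longjmp-based control flow described in this section can be demonstrated with the standard setjmp/longjmp facility. This is a toy model only: toy_sqlr_error, toy_vse and toy_call are hypothetical names, the error is a plain struct rather than a box array, and a single global handler replaces the nested handler contexts a real server would keep.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef struct toy_error_s {
  char state[6];        /* 5 character SQL state plus terminator */
  char message[128];
} toy_error_t;

static jmp_buf toy_handler;       /* the "outer context" */
static toy_error_t toy_pending;   /* the error being signalled */

/* Format the message printf-style, then jump to the handler.
   This function never returns to its caller. */
void
toy_sqlr_error (const char *state, const char *fmt, ...)
{
  va_list ap;
  assert (strlen (state) == 5);
  memcpy (toy_pending.state, state, 6);
  va_start (ap, fmt);
  vsnprintf (toy_pending.message, sizeof (toy_pending.message), fmt, ap);
  va_end (ap);
  longjmp (toy_handler, 1);
}

/* Stand-in for a VSE body that rejects negative input. */
long
toy_vse (long n)
{
  if (n < 0)
    toy_sqlr_error ("22003", "negative argument %ld", n);
  return n * 2;
}

/* Stand-in for the calling context: returns 0 on success, or 1 with
   *err filled in if the VSE signalled an error. */
int
toy_call (long n, long *result, toy_error_t *err)
{
  if (setjmp (toy_handler) != 0)
    {
      *err = toy_pending;
      return 1;
    }
  *result = toy_vse (n);
  return 0;
}
```

Note that code after the signalling call in toy_vse never runs once an error is raised, so any resources a VSE holds at that point should be released before signalling.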


16.5.9. Executing SQL

query_t * sql_compile (char *string2, client_connection_t * cli,
				caddr_t * err, int store_procs);
void qr_free (query_t * qr);

client_connection_t * qi_client (caddr_t * qi);

These functions allow executing SQL from VSEs. First the SQL statement needs to be compiled with sql_compile. The statement may take value parameters and may be a DDL or DML statement, including select, update, procedure call, table creation etc.

The query_t returned can be used multiple times on any number of simultaneous threads. If an application repeatedly performs the same queries, the text can be compiled once and reused ad infinitum.

qr_free will free a query returned by sql_compile.

Example

{
  caddr_t err = NULL;
  query_t * qr = sql_compile ("select * from SYS_USERS", qi_client (qst), &err, 0);
  if (err)
    sqlr_resignal (err);  /* compilation failed: pass the error up to the caller */
  ...
  qr_free (qr);
}

caddr_t qr_rec_exec (query_t * qr, client_connection_t * cli,
    local_cursor_t ** lc_ret, query_instance_t * caller, stmt_options_t * opts,
    long n_pars, ...);

Once a query is compiled it can be executed and fetched. This function executes a query in the context of a VSE. The execution is on behalf of the same user and in the same transaction as the VSE. This is only possible in the context of a VSE, not at top level in the main program, for example.

The first argument is the compiled query to execute. The second is the client connection, obtained by qi_client from the qst argument of the VSE. The lc_ret, if non-NULL, will be set to a newly allocated local_cursor_t * that allows fetching rows from the result set. This only applies to a select statement. The caller is the qst argument of the VSE. The opts can be NULL. The n_pars is the count of query parameters, 0 if no parameters are passed.

The return value is an error, suitable for sqlr_resignal. A NULL value means success.

Check carefully whether the query accesses, or may potentially access, any tables or indexes. If it may, the VSE must be declared deadlock-unsafe by calling bif_set_uses_index() after bif_define() or bif_define_typed(). If qr_rec_exec accesses any tables or views and the call to the VSE from Virtuoso/PL code has been compiled as deadlock-safe, the whole server may halt.

If parameters are passed, a group of 3 actual parameters follows for each ? in the query being executed. In each such group the first is the name of the parameter, of the form ":n", where n is the position of the parameter, starting at 0, so ":0" corresponds to the 1st ? and ":11" to the 12th. The second in the group of 3 is the value, usually a box pointer. The third is the type, one of QRP_INT, QRP_STRING or QRP_RAW.

QRP_INT means that the value will be converted to a box as by box_num. QRP_STRING means that the value will be converted to a string as by box_dv_short_string. In either case the value is allocated and freed as part of the execution. QRP_RAW means that an arbitrary box is passed as is. If so, this box will be freed in the process and MUST NOT BE REFERENCED AGAIN in the VSE. If the statement is a select, lc_ret should be specified and should be the address of a local_cursor_t * variable, where the cursor can be returned.
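
The "three actual arguments per parameter" calling convention can be illustrated with a small variadic function. This sketch does not execute any SQL: toy_describe_params is a hypothetical helper that merely walks the (name, value, type) groups and renders them as text, and TOY_QRP_INT and TOY_QRP_STRING are stand-ins for the real QRP_ constants. It assumes every value is passed as a void *, with integers cast through intptr_t.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_QRP_INT 0
#define TOY_QRP_STRING 1

/* Read n_pars groups of (char *name, void *value, int type) and write a
   textual description such as ":0=7;" for each group into out. */
void
toy_describe_params (char *out, size_t out_size, long n_pars, ...)
{
  va_list ap;
  long i;
  size_t used = 0;

  assert (out != NULL && out_size > 0);
  out[0] = '\0';
  va_start (ap, n_pars);
  for (i = 0; i < n_pars; i++)
    {
      char *name = va_arg (ap, char *);
      void *value = va_arg (ap, void *);
      int type = va_arg (ap, int);
      int n;

      if (type == TOY_QRP_INT)
        n = snprintf (out + used, out_size - used, "%s=%ld;",
            name, (long) (intptr_t) value);
      else
        n = snprintf (out + used, out_size - used, "%s='%s';",
            name, (char *) value);
      if (n < 0 || (size_t) n >= out_size - used)
        break;                  /* output buffer exhausted */
      used += (size_t) n;
    }
  va_end (ap);
}
```

A call with two parameters then carries six variadic arguments, for example toy_describe_params (buf, sizeof (buf), 2L, ":0", (void *) (intptr_t) 7, TOY_QRP_INT, ":1", (void *) "dba", TOY_QRP_STRING). Since the compiler cannot check variadic arguments, a mismatch between n_pars and the number of groups supplied is undefined behavior, and the same care is needed when calling qr_rec_exec.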

long lc_next (local_cursor_t * lc);
caddr_t lc_nth_col (local_cursor_t * lc, int n);
void lc_free (local_cursor_t * lc);

These functions allow reading through a result set. The local_cursor_t * must have come from qr_rec_exec.

lc_next will move the cursor one row forward. The first call after the exec places the cursor on the first row. A 0 return value indicates that the cursor is at the end. If 0 is returned at the first call, the result set had zero rows. The data member lc_error may be set and should be checked after calls to this function. See examples. The value will be suitable for sqlr_resignal if copied (box_copy_tree).

The lc_nth_col returns the value of the nth column of the current row. The index is 0 based. The value is an arbitrary box pointer and is READ ONLY, to be copied (box_copy_tree) if the application needs to keep it around. The value will stay readable until the next lc_next or lc_free. Use DV_TYPE_OF et al to determine the type of the value.

lc_free frees the cursor and any resources associated to it. This has no effect on the transaction.

The bif_my_select function returns an array with one element for each row of the SYS_KEYS table. The rows are themselves arrays containing the column values.


16.5.10. Adding New Languages And Encodings Into Virtuoso

There are too many languages to support them all by default, so Virtuoso is user extensible in this respect. The built-in 'x-any' language supports most languages to a degree, but it is not the optimum solution for some specific languages, or if you want to normalize words to make text search more effective. To make Virtuoso extensible, language-specific functions are organized into language handlers, and handlers are organized in hierarchical trees. Every handler contains pointers to such functions as "count words in given string", "call given callback once for every word in the string" etc.

XML documents and SQL procedures may identify languages by their names, for example by the value of the xml:lang attribute, by the content_language argument of built-in functions, or by the __lang option. Every language handler defines up to two names for the language it supports, one matching ISO 639 regulations (e.g. 'en'), and one matching RFC 1766 (e.g. 'en-UK'). To find the handler for a specified language name, Virtuoso searches an internal hash table. If the name is unknown, the 'x-any' handler is returned as a default.

Custom language handlers should contain a pointer to a more generic handler. A handler may have NULLs stored instead of pointers to required functions; these NULLs are replaced automatically with pointers to the generic handler's functions when the custom handler is activated.
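
The NULL-filling behavior described above can be sketched with a struct of function pointers. This is a toy model and not the real handler structure: the toy_ names, the two function slots and the toy_activate function are hypothetical, and the word counter splits on spaces only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_handler_s toy_handler_t;

struct toy_handler_s {
  const char *name;
  const toy_handler_t *generic;          /* more generic handler, NULL at the root */
  int (*count_words) (const char *text);
  int (*is_noise) (const char *word);
};

/* Generic behavior: words are runs of non-space characters. */
int
toy_any_count_words (const char *text)
{
  int count = 0, in_word = 0;
  for (; *text != '\0'; text++)
    {
      if (*text == ' ')
        in_word = 0;
      else if (!in_word)
        {
          in_word = 1;
          count++;
        }
    }
  return count;
}

/* Generic behavior: one-character words are noise. */
int
toy_any_is_noise (const char *word)
{
  return strlen (word) <= 1;
}

/* Custom behavior: also treat "the" as noise. */
int
toy_custom_is_noise (const char *word)
{
  return strlen (word) <= 1 || strcmp (word, "the") == 0;
}

/* Replace every NULL function slot with the generic handler's function. */
void
toy_activate (toy_handler_t *h)
{
  assert (h != NULL);
  if (h->generic == NULL)
    return;
  if (h->count_words == NULL)
    h->count_words = h->generic->count_words;
  if (h->is_noise == NULL)
    h->is_noise = h->generic->is_noise;
}
```

A custom handler written this way only has to supply the functions whose behavior differs from the generic one.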

See Also:

lh_get_handler

lh_load_handler

There are two trees of language handlers in the current version of Virtuoso. The "main" tree starts from the 'x-any' root and contains handlers of languages used in documents. The other tree starts from the 'x-ftq-x-any' root and contains handlers of free text query ('ftq') languages. The difference is in the handling of wildcard characters: the query string 'hello, wo*ld' consists of two "words", 'hello' and 'wo*ld', and 'x-ftq-x-any' will properly locate them, but the 'x-any' handler will report three words -- 'hello', 'wo' and 'ld', because it knows nothing about the special meaning of '*' in query strings. That is why every handler may contain a pointer to a handler of its own query language.

In addition to plain language handlers, it is possible to add handlers of "encoded languages". They are useful if you have a large number of documents in some particular encoding and the speed of free text indexing is critical for your applications. While the usual handlers deal with Unicode data, which requires data to be decoded before processing, the functions of an "encoded language" handler may accept buffers of encoded text, eliminating decoding.
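
The wildcard difference can be reproduced with a very small word counter. This is a hypothetical illustration and not the code of either built-in handler: toy_count_words treats ASCII letters and digits as word characters and, when the wildcards flag is set, '*' as well.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count words in text. If wildcards is nonzero, '*' is treated as part
   of a word, as a query-language handler would treat it. */
int
toy_count_words (const char *text, int wildcards)
{
  int count = 0, in_word = 0;

  assert (text != NULL);
  for (; *text != '\0'; text++)
    {
      unsigned char c = (unsigned char) *text;
      int word_char = isalnum (c) || (wildcards && c == '*');

      if (!word_char)
        in_word = 0;
      else if (!in_word)
        {
          in_word = 1;
          count++;
        }
    }
  return count;
}
```

With the string 'hello, wo*ld', the call without wildcard support counts three words and the call with wildcard support counts two, matching the behavior described for 'x-any' and 'x-ftq-x-any'.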

See Also:

elh_get_handler

elh_load_handler

The OpenLink Virtuoso Server distribution contains the source of a sample language handler for the 'en-UK' language. The difference between the 'x-any' handler and this one is in the handling of abbreviations and numbers. The 'en-UK' handler reads the phrase '$3.54 per sq.inch.' as the sequence of words '3.54', 'per' and 'sq.inch', instead of the sequence '54', 'per', 'sq' and 'inch' that 'x-any' reads. The generic 'x-any' handler has no specific rules for dealing with the "decimal point" because many scripts use a "decimal comma"; thus '3.54' is processed as the pair of words '3' and '54', and '3' is ignored in many cases as a noise word due to its 1-character length.

In addition to the language extension interface, Virtuoso provides an eh_load_handler function to add new encodings, but it should be used solely for multi-character encodings which cannot be supported through the usual Virtuoso International Character Support. If an encoding was created by the CHARSET_DEFINE function, Virtuoso can build special lookup tables for very fast text translation from Unicode to the encoding, so you are not likely to gain performance by writing your own C code. Moreover, some applications will know nothing about an encoding added through C code, because they check only the SYS_CHARSETS system table.