StelsXML JDBC Driver v3.0 Documentation

 

 

Contents 


Getting Started

Schema File

Driver Properties

Data Types

Supported SQL Syntax

Connection Example

Handling XML Namespaces

XPath engines

Driver Modes

User-defined SQL functions

Performance and other hints

   

 

Getting Started


To process your XML documents via StelsXML JDBC driver, you should do the following easy steps:

 

Schema File


The schema file is used to define the specifications of tables the driver can later access using SQL queries. Each table is defined with the help of XPath expressions to various elements of an XML document (i.e. tag elements, attributes, etc).

Suppose we have an simple XML document that stores data about the company's employees:

<?xml version="1.0" encoding="UTF-8"?>

<employees>

     <document_name>Employees doc</document_name>

     <employee id="1">

         <first_name>Bill</first_name>

         <last_name>Adams</last_name>

         <age>25</age>

         <hire_date>12-06-1995</hire_date>

         <title>Java programmer</title>

     </employee>

     <employee id="2">

         <first_name>Mary</first_name>

         <last_name>Jones</last_name>

         <age>32</age>

         <hire_date>22-09-2001</hire_date>

         <title>Sales manager</title>

     </employee>

</employees>

 

And we want to generate the "employees" table based on it. In this case the table specification may look as follows:

<?xml version="1.0" encoding="UTF-8"?>

<schema>

 

  <table name="employees" file ="employees.xml" path="/employees/employee" constraint = "PRIMARY KEY(id), UNIQUE (lastname), CHECK age > 18">

  <!-- The attribute 'name' specifies the name of the table-->

  <!-- The attribute 'file' specifies the file path to the XML document. This path may be absolute (e.g.: 'c:/xmlfiles/employees.xml') or relative to the schema file path (e.g.: 'employees.xml' or 'subdir/employees.xml'). Also, you can use the following URLs:

a) HTTP URL: http://www.example.com/rss_feed.xml

b) FTP URL: ftp://user:passw@www.sample.com:21/test/test.xml

c) CLASSPATH URL: classpath://resources/test.xml

d) SERVER PAGE URL: http://www.somesite.com/php/script.php?table=test.xml 

e) ZIP URL: zip://c:/archive.zip/test.xml 

f) SMB/CIFS URL: smb://your_server/your_share/your_folder/test.xml

g) SFTP URL: sftp://login:password@somesite.com:22/test/test.xml

-->

  <!-- The attribute 'path' defines the base path which table entries are based on, i.e. table columns will be based on 'employee' elements.-->

  <!-- the 'constraint' attribute specifies a table constraint like a primary key, unique value, etc -->

 

    <column name="documentname" type="VARCHAR" path="/employees/document_name"/>

   <!-- The attribute 'name' specifies the name of the column-->

   <!-- The attribute 'type' specifies the data type of the column-->

   <!-- The attribute 'path' defines XPath to the XML element the column is based on. In a given case the XPath expression '/employees/document_name' is an absolute path from the root of the XML document-->

 

    <column name="firstname" type="VARCHAR" size="15" path="first_name"/>

   <!-- The value 'first_name' of the 'path' attribute defines the XPath expression which is relative to the base path '/employees/employee/' defined in the 'table' element.  I.e. the full path to this column is '/employees/employee/first_name'-->

    <!-- The 'size' attribute specifies the maximum number of characters for the VARCHAR type or total number of digits that can be stored for the BIGDECIMAL type -->

    <!-- The 'decimalCount' attribute specifies the maximum number of digits that can be stored to the right of the decimal separator for the BIGDECIMAL type -->

 

    <column name="lastname" type="VARCHAR" size="25" path="last_name"/>

 

    <column name="title" type="VARCHAR" size="15 path="title"/>

 

    <column name="id" type="INTEGER" path="@id"/>

   <!-- This 'path' attribute defines the path to the attribute of the XML element-->

 

    <column name="age" type="INTEGER" path="age"/>

 

    <column name="hiredate" type="DATETIME" path="hire_date"/>

  </table> 

 

</schema>

 

Now, after the table has been specified, we can access it using an SQL query (see Connection Example for more details).

 

Additional notes on the schema file:

 

Driver Properties


The driver supports a number of parameters that change default behavior of the driver.

These properties are:

 

charset is used to set the character encoding for output. To specify the input encoding you need to add the encoding declaration to the XML file: <?xml version="1.0" encoding="some_charset"?> (Default charset is "UTF-8")

 

dateFormat is used to specify a format for date/time values. A value of this property is a sequence of date/time formats separated by the '|' character, e.g: "dd.MM.yy | dd.MM | dd". (Default is "yyyy-MM-dd HH:mm:ss.SSS | yyyy-MM-dd HH:mm:ss | yyyy-MM-dd |  HH:mm:ss.SSS |  HH:mm:ss"). For more details about date/time format used in the driver please see chapter "time format syntax" in the java.util.SimpleDateFormat class documentation. 

 

decimalFormatInput,  decimalFormatOutput are used to specify input and output formats for double and float values in a XML file. For instance, you can use these formats to specify currency values, e.g.: "###,###.##$". For more information about patterns used in "decimalFormat" please see the documentation for java.text.DecimalFormat class.

 

dbInMemory, dbPath are used to set a driver mode. For more details see driver modes.

 

ignoreCase. The property sets the case sensitivity for text columns. By default the driver are not case sensitive, i.e. the default value for this property is true.

 

namespaceAware is used to enable namespace support when parsing XML documents. Default is "true". It is not supported for XOM model.

 

namespaces is used to define XML namespaces used in the XML document (see Handling XML Namespaces for more details )

 

 

Advanced Properties:

 

emptyStringAsNull. If 'emptyStringAsNull' is set to 'true', strings containing only whitespaces (i.e. strings like: "", "   ", " \r\n", etc) are treated as NULL values. (By default is true).

 

preSQL is used to specify a path to a SQL script that will be executed after creating a driver connection, e.g. c:/sql_script.txt. It may be useful for creating table indexes and other preparations.

 

trimBlanks. If 'trimBlanks' is set to 'true', the driver trims leading and trailing whitespaces for string values (By default is false).

 

To set the driver properties, you can use one of the following ways:

1)  using Properties class:

java.util.Properties props = new java.util.Properties();
 
props.put("dateFormat", "MM.dd.yyyy");
props.put("namespaceAware", "false");
 
Connection conn = DriverManager.getConnection("jdbc:jstels:xml:schema.xml", props);
 
2) appending the properties to the URL:
Connection conn = 
DriverManager.getConnection(
"jdbc:jstels:xml:schema.xml?dateFormat=MM.dd.yyyy&namespaceAware=false");
 
3) adding the properties as attributes to the <table> tag in the schema file 
(only for local table properties such as "charset", "dateFormat", "namespaces", etc):

<?xml version="1.0" encoding="UTF-8"?>

<schema>

<table name="products" file ="products.xml" path="/products/product" 
charset="ISO-8859-2" dateFormat="dd-MM-yyyy">
	<column name="prodid" type="int" path="prodid"/>
	<column name="description" type="string" path="description"/>
</table>
</schema>

 

 

Data Types


The driver supports the following data types: INTEGER, BIGINT, FLOAT, DOUBLE, VARCHAR, DATETIME, BIGDECIMAL and BOOLEAN. The table below demonstrates the mapping scheme between SlelsXML data types and JDBC data types:

StelsXML data type

 JDBC returned type (java.sql.Types.*)

Java class used in StelsXML

AUTOINCREMENT

java.sql.Types.BIGINT

java.lang.Long

Integer, INT

java.sql.Types.INTEGER

java.lang.Integer

Bigint, LONG

java.sql.Types.BIGINT

java.lang.Long

FLOAT

java.sql.Types.FLOAT

java.lang.Float

Double

java.sql.Types.DOUBLE

java.lang.Double

BIGDECIMAL, DECIMAL, NUMERIC (recommended for storing currency values)

java.sql.Types.NUMERIC

java.math.BigDecimal

STRING, VARCHAR

java.sql.Types.VARCHAR

java.lang.String

DATETIME

java.sql.Types.TIMESTAMP

java.util.Date

BOOLEAN

java.sql.Types.BOOLEAN

java.lang.Boolean

 

Notes:

 

Supported SQL Syntax


StelsXML version 2.x uses H2 database as an SQL engine and supports the most part of ANSI/ISO SQL grammar like SELECT, INSERT, UPDATE and DELETE statements.

 

An SQL query must meet the following conditions:

Examples of SELECT queries:

// ---- SELECT queries ---

SELECT * FROM employees WHERE title = 'Java programmer' ORDER BY last_name

SELECT name FROM salesreps WHERE (rep_office IN (22, 11, 12)) OR (manager IS NULL AND hire_date >= parsedatetime('01-05-2002','dd-MM-yyyy') OR (sales > quota AND NOT sales > 600000.0)

SELECT city, target, sales FROM "offices.xml" WHERE region = 'Eastern' AND sales > target ORDER BY city

// ---- SELECT queries with join ----

SELECT * FROM prices ps JOIN regions regs ON ps.regionid = regs.id JOIN products prod ON prod.prodid = ps.prodid

SELECT * FROM prices ps, products prod WHERE prod.prodid = ps.prodid

// ---- INSERT, UPDATE and DELETE commands ----

INSERT INTO employees (firstname, lastname, title, id, hiredate) VALUES('John', 'Doe', 'Web admin', 3, parsedatetime('07:02:2007','dd:MM:yyyy'))

DELETE FROM employees WHERE lastname LIKE 'Henry%' 

UPDATE "c:/xmlfiles/customers.xml" SET credit_limit = 50000.00 WHERE company = 'Acme Mfg.'

// ---- CREATE INDEX command ----

CREATE INDEX i_1 ON new_table (int_col)

 

 

Also, the driver supports a number of additional commands for processing XML files:

Examples:

SAVE TABLE employees // saves the table "employees" by using a file path and columns XPaths defined in the corresponding specification "employees"

SAVE TABLE employees AS "c:/copy/employees_copy.xml" // saves "employees" to another file "employees_copy.xml" by using column XPaths of the specification "employees"

SAVE TABLE employees AS "employees_html_table.html" USING SPECIFICATION employees_html_table // saves "employees" to another file "employees_html_table.html" by using column XPaths of the specification "employees_html_table"

SAVE (SELECT * FROM employees WHERE title LIKE '%programmer') AS "employees_programmers.xml" USING SPECIFICATION employees // saves the SQL query result to the file "employees_programmers.xml" by using column XPaths of the specification "employees"

CREATE TABLE "html_table.xml" (td1 string, td2 string, td3 int) // creates the table "html_table.xml" with three columns which XPaths are defined in the corresponding specification "html_table.xml".

CREATE TABLE "employees.xml" USING SPECIFICATION // creates the table "employees.xml" with all columns defined in the corresponding specification "employees.xml"

 

See also:

 
Connection Example

This complete example shows how the driver is used. You can download this example with the driver package here.

The driver connection code:

import java.sql.*;

 

public class DriverTest {

 

  public static void main(String[] args) {

    try {

      // load the driver into memory

      Class.forName("jstels.jdbc.xml.XMLDriver2");

 

      // create a connection. The first command line parameter is assumed to

      // be the file path to the schema file

      Connection conn = DriverManager.getConnection("jdbc:jstels:xml:" + args[0]);

 

      // create a Statement object to execute the query with

      Statement stmt = conn.createStatement();

 

      // execute a query

      ResultSet rs = stmt.executeQuery("SELECT * FROM employees");

 

      // read data and put it to the console

      for (int j = 1; j <= rs.getMetaData().getColumnCount(); j++) {

        System.out.print(rs.getMetaData().getColumnName(j) + "\t");

      }

      System.out.println();

 

      while (rs.next()) {

        for (int j = 1; j <= rs.getMetaData().getColumnCount(); j++) {

          System.out.print(rs.getObject(j) + "\t");

        }

        System.out.println();

      }

 

      // close the objects

      rs.close();

      stmt.close();

      conn.close();

    }

    catch (Exception e) {

      e.printStackTrace();

    }

  }

}

 

The specification for the table "employees" used in the SQL query above:

<?xml version="1.0" encoding="UTF-8"?>

<schema>

  <table name="employees" file ="employees.xml" path="/employees/employee" dateFormat="dd-MM-yyyy">

    <!-- In this example the file "employees.xml" is located in the same directory where the schema is-->

    <!-- The attribute "dateFormat" sets a local date format used in the specified XML file-->

    <column name="documentname" type="string" path="/employees/document_name"/>

    <column name="firstname" type="string" path="first_name"/>

    <column name="lastname" type="string" path="last_name"/>

    <column name="title" type="string" path="title"/>

    <column name="id" type="integer" path="@id"/>

    <column name="age" type="integer" path="age"/>

    <column name="hiredate" type="datetime" path="hire_date"/>

  </table> 

</schema>

 

The XML file "employees.xml":

<?xml version="1.0" encoding="UTF-8"?>

<employees>

     <document_name>Employees doc</document_name>

     <employee id="1">

         <first_name>Bill</first_name>

         <last_name>Adams</last_name>

         <age>25</age>

         <hire_date>12-06-1995</hire_date>

         <title>Java programmer</title>

     </employee>

     <employee id="2">

         <first_name>Mary</first_name>

         <last_name>Jones</last_name>

         <age>32</age>

         <hire_date>22-09-2001</hire_date>

         <title>Sales manager</title>

     </employee>

     <employee id="3">

         <first_name>Dan</first_name>

         <last_name>Roberts</last_name>

         <age>30</age>

         <hire_date>01-11-1999</hire_date>

         <title>Project manager</title>

     </employee>

     <employee id="4">

         <first_name>Paul</first_name>

         <last_name>Crauz</last_name>

         <age>24</age>

         <hire_date>02-05-2002</hire_date>

         <title>C++ programmer</title>

     </employee>

</employees>

 

The result of the test:

DOCUMENTNAME  FIRSTNAME LASTNAME TITLE           ID AGE HIREDATE          

Employees doc Bill      Adams    Java programmer 1  25  1995-06-12 00:00:00.0         

Employees doc Mary      Jones    Sales manager   2  32  2001-09-22 00:00:00.0         

Employees doc Dan       Roberts  Project manager 3  30  1999-11-01 00:00:00.0         

Employees doc Paul      Crauz    C++ programmer  4  24  2002-05-02 00:00:00.0         

 

 

Handling XML Namespaces


By default, XML parsing is namespace aware in the driver. You can change this with the namespaceAware property.

In order to process XML documents that use namespaces, you must specify each namespace using the driver property namespaces. The value format for this property is the following:

prefix1:namespace1|prefix2:namespace2| ...

where namespace is a URI identifier of the namespace used in a XML document and prefix is a prefix identifier for use in XPath expressions in the schema file. The prefix may be arbitrary.

 

Example:

The XML file "employees.xml" that uses the namespace "http://www.example.com/employees" and prefixed XML elements:

<?xml version="1.0" encoding="UTF-8"?>

<empl:employees xmlns:empl="http://www.example.com/employees">
     <empl:employee id="1">

         <empl:first_name>Bill</empl:first_name>

         <empl:last_name>Adams</empl:last_name>

         <empl:age>25</empl:age>

         <empl:hire_date>12-06-1995</empl:hire_date>

         <empl:title>Java programmer</empl:title>

     </empl:employee>

     <empl:employee id="2">

         <empl:first_name>Mary</empl:first_name>

         <empl:last_name>Jones</empl:last_name>

         <empl:age>32</empl:age>

         <empl:hire_date>22-09-2001</empl:hire_date>

         <empl:title>Sales manager</empl:title>

     </empl:employee>

</empl:employees>

 

The table specification for the "employees.xml":

<?xml version="1.0" encoding="UTF-8"?>

<schema>

  <table name="employees" file="employees.xml" path="/empl:employees/empl:employee" dateFormat="dd-MM-yyyy" namespaces="empl:http://www.example.com/employees">

    <!-- Notice that in a given case all XPath expressions must use the prefix 'empl:' -->

    <column name="firstname" type="string" path="empl:first_name"/>

    <column name="lastname" type="string" path="empl:last_name"/>

    <column name="title" type="string" path="empl:title"/>

    <column name="id" type="integer" path="@id"/>

    <column name="age" type="integer" path="empl:age"/>

    <column name="hiredate" type="datetime" path="empl:hire_date"/>

  </table> 

</schema>

 

 

XPath engines


The driver supports two different XPath engines for reading data from XML.

The first is based on the SAX (Simple API for XML) model to read XML files (further referred to as "SAX"). This engine is used by default. The second is based on XOM (XML Object Model) to read XML files (further referred to as "XOM"). The main advantage of SAX is that it uses the minimum amount of operating memory to evaluate XPath expressions. The drawback is that it does not fully support the XPath syntax. For example, it does not support preceding axes. XOM supports the XPath syntax practically fully, but requires a specific amount of operating memory to process XML. The memory that is used depends on the size of the XML file and therefore Java Virtual Machine must have enough free memory allotted for processing large XML files (use -Xms and -Xmx JVM options).
 

To specify precisely which engine will be applied, use the parameter "readAPI" in the "table" element of the driver schema. It has accordinly two values: "SAX" and "XOM".
Hint: use XOM for complex XPath expressions. For example, if you need to use preceding axes. For simple XPath expressions, use SAX.
 

Let’s look at the following XML structure:

<?xml version="1.0" encoding="UTF-8"?>

<catalogue>

   <author id="1">

      <name>Isaac Asimov</name>

      <books>

           <book>

               <name>The Stars, Like Dust</name>

               <genre>Science fiction</genre>

               <price>44.95</price>

           </book>

           <book>

               <name>Pebble in the Sky</name>

               <genre>Science fiction</genre>

               <price>24.95</price>

           </book>

        </books>

   </author>

   <author id="2">

      <name>Stanislaw Lem</name>

      <books>

           <book>

               <name>Solaris</name>

               <genre>Science fiction</genre>

               <price>39.95</price>

             </book>

           <book>

               <name>Return from the Stars</name>

               <genre>Science fiction</genre>

               <price>34.95</price>

            </book>

      </books>

   </author>

</catalogue>

 

As you can see, the “author” element and the “book” element are connected by relation 1:n. We can use the following table descriptions to extract data for authors and books:

<?xml version="1.0" encoding="UTF-8"?>

<schema>

  <table name="books" file ="books.xml" path="/catalogue/author/books/book" readAPI="XOM">

    <column name="author_id" path="../../@id" type="INT" />

    <column name="name" path="name" type="VARCHAR" size="50" />

    <column name="genre" path="genre" type="VARCHAR" size="30" />

    <column name="price" path="price" type="BIGDECIMAL" />

  </table> 

  <table name="authors" file ="books.xml" path="/catalogue/author" readAPI="SAX">

    <column name="id" path="@id" type="INT" />

    <column name="name" path="name" type="VARCHAR" size="50" />

  </table> 

</schema>

 

"SELECT * FROM books" gives the following result:

AUTHOR_ID  NAME                  GENRE            PRICE          

1          The Stars, Like Dust  Science fiction  44.95
1          Pebble in the Sky     Science fiction  24.95
2          Solaris               Science fiction  39.95
2          Return from the Stars Science fiction  34.95

 

"SELECT * FROM authors" gives:

AUTHOR_ID  NAME          

1          Isaac Asimov
2          Stanislaw Lem

 

The join of two tables "SELECT a.name as author_name, b.name as book_name, genre, price FROM authors a INNER JOIN books b WHERE a.id = b.author_id" gives:

AUTHOR_NAME   BOOK_NAME             GENRE            PRICE         

Isaac Asimov  The Stars, Like Dust  Science fiction  44.95

Isaac Asimov  Pebble in the Sky     Science fiction  24.95

Stanislaw Lem Solaris               Science fiction  39.95

Stanislaw Lem Return from the Stars Science fiction  34.95

 

As you can see, we used the "id" attribute as a foreign key to connect the tables "author" and "books". When doing this we had to use the preceding axe in the books table. To do this, the "readAPI" attribute was set to "XOM". The authors table is very simple and it is sufficient to use the SAX.

 

 

Driver Modes


The driver works in the following way: it loads data from files to an intermediate database (also referred to as “synchrobase”) that is used for running SQL queries and synchronizing changes between this database and external XML files.
 


The driver can work in the following three modes:

1) With a temporary synchrobase in RAM. The synchrobase is created in RAM and is removed from it after the connection is closed. It is a default mode.

2) With a temporary synchrobase on the hard drive. The synchrobase is created on the HDD every time a connection is opened and deleted after it’s closed. To use this mode, set the driver property dbInMemory to false.

Connection conn = DriverManager.getConnection("jdbc:jstels:xml:c:/xml/schema.xml?dbInMemory=false&tempPath=c:/tempfiles");


3) With a
persistent synchrobase on the hard drive. The synchrobase is created just once and is re-used afterwards. To use this mode set the property dbPath to the file path where is synchrobase will be stored.

Connection conn = DriverManager.getConnection("jdbc:jstels:xml:c:/xml/schema.xml?dbPath=c:/synchrobases/syncro_db_name&tempPath=c:/tempfiles ");


In the first mode, all synchrobase data are stored in the system RAM, which ensures maximum performance. Keep in mind that Java Virtual Machine must have enough free memory allotted for storing large tables (use -Xms and -Xmx JVM options). You can also use the DROP TABLE <table name> FROM CACHE command to force tables to be removed from the cache.
 

In the second case, a temporary synchrobase is created on the hard drive. This mode is used for processing large files, since it uses a minimum of RAM. The synchrobase is deleted after the connection is closed

In the third mode, the synchrobase is created once and is reused afterwards. This is the optimal mode. However, it has a limitation: files must not be modified by an external program, they can be modified only by the driver. Otherwise, it will desynchronize the synchrobase and external files. If a file had been loaded into the synchrobase and was then modified by an external program, use the DROP TABLE <table name> FROM CACHE command to delete the obsolete table from the synchrobase.

There are also some properties for configuring these modes:
 

tempPath - directory where temporary files will be created (by default it is a OS temporary directory, specified by JVM environment variable "java.io.tmpdir"). It is recommended to set your own value.

 

Example:

Properties props = new java.util.Properties();

props.setProperty("dbInMemory", "false");         // switch to the second mode (a temporary synchrobase on the hard drive)
props.setProperty("tempPath", "c:/temp");     
Connection conn = DriverManager.getConnection("jdbc:jstels:xml:c:/xml/schema.xml", props);

 

 

User-defined SQL functions

You can use your own SQL functions in the driver. To use this feature, you should do the following: 

 

1) Create a static method that will act as an SQL function. Mind that:

For example:

package my_pack;

public class MyFuncs{

// user-defined SQL function that formats the given argument into a date/time string with specified format

public static String format_date( java.util.Date d, String format ) {
   
// process the null values

    if (d == null || format == null)
      return null;
    java.text.SimpleDateFormat sdf = new java.text.SimpleDateFormat(format);
   
// return a function result with java.lang.String type

    return sdf.format(d);
}
}

 

2) Register the function by executing CREATE ALIAS ... FOR command.

For example:

CREATE ALIAS format_date FOR "my_pack.MyFuncs.format_date"

 

Also, you can use the driver property function:<my_func>.

For example:

Properties props = new java.util.Properties();
props.put("function:formate_date","my_pack.MyFuncs.format_date");
Connection conn = DriverManager.getConnection("jdbc:jstels:xml:c:/xmlfiles/schema.xml", props);
// or append this property to the URL
Connection conn2 = DriverManager.getConnection("jdbc:jstels:xml:c:/xmlfiles/schema.xml"
 + "?function:formate_date=my_pack.MyFuncs.format_date");

 

3) Call the function in an SQL query

For example:

Statement st = conn.createStatement();

st.execute("select format_date(date_column, 'yyyy-MM-dd') from test);

 

 

Performance and other hints

 

 

[HOME]   [TOP]