Posts Tagged ‘columnstore’

MonetDB and OpenJena

April 6, 2012

MonetDB has been updated recently with a Dec 2011-SP2 release. Having previously tried to integrate it with OpenJena and failed because of the use of multiple inner joins, I was happy to find that the update fixed those problems and allows all the integration/unit-tests to pass.

This means, of course, that I'm now going to have to create a patch for Jena (see Jira issue [1]). When that's done, you can follow the instructions below to test it out, which literally means running the unit tests. I have been using Ubuntu 11.10 amd64 for this, so the notes below reflect that:

1) Download latest MonetDB and JDBC driver

2) Install as per instructions (default username:monetdb with password:monetdb)

3) In your home dir create a my-farm directory

4) Create an "env.sh" file to house your local settings for PATH etc

export JAVA_HOME=/usr/lib/jvm/java-6-sun
# Point this to wherever you have SDB installed (JENA_HOME must already be set)
export SDBROOT=${JENA_HOME}/SDB
export PATH=$SDBROOT/bin:$PATH
# Point this to wherever you have downloaded the MonetDB JDBC driver
export SDB_JDBC=~/Downloads/monetdb/jdbcclient.jar
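
Since env.sh builds SDBROOT from JENA_HOME and points SDB_JDBC at a downloaded jar, a typo in either only shows up later as a confusing classpath error. A minimal sketch of a sanity check follows; check_sdb_env is a hypothetical helper of my own, not something that ships with SDB or MonetDB.

```shell
#!/bin/sh
# Hypothetical helper (not part of SDB): report problems with the settings
# that env.sh exports. Returns 0 when everything looks usable, 1 otherwise.
check_sdb_env() {
    rc=0
    # Each of these must be set and non-empty.
    for name in JENA_HOME JAVA_HOME SDBROOT SDB_JDBC; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            rc=1
        fi
    done
    # The JDBC setting must point at a readable jar file.
    if [ -n "${SDB_JDBC:-}" ] && [ ! -r "$SDB_JDBC" ]; then
        echo "not readable: $SDB_JDBC"
        rc=1
    fi
    return $rc
}
```

Typical use would be `. ./env.sh && check_sdb_env` before going any further.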

5) Create a "monet_h.ttl" assembly file to define a layout2/hash repository

@prefix sdb:     <http://jena.hpl.hp.com/2007/sdb#> .
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .

# MonetDB

<#store> rdf:type sdb:Store ;
sdb:layout     "layout2/hash" ;
sdb:connection <#conn> ;
.

<#conn> rdf:type sdb:SDBConnection ;
sdb:sdbType       "MonetDB" ;    # Needed for JDBC URL
sdb:sdbHost       "localhost" ;
sdb:sdbName       "TEST2H" ;
sdb:driver        "nl.cwi.monetdb.jdbc.MonetDriver" ;
sdb:sdbUser        "monetdb" ;
sdb:sdbPassword        "monetdb" ;
sdb:jdbcURL    "jdbc:monetdb://localhost:50000/TEST2H";
.
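
The database name appears three times in that file (sdbName, the JDBC URL, and later in the monetdb commands), so it is easy for them to drift apart. As a sketch only, the file could be generated from a single name; write_store_ttl is a hypothetical helper, and the defaults (localhost, port 50000, layout2/hash, monetdb/monetdb) are simply the values used above.

```shell
#!/bin/sh
# Hypothetical helper: print an SDB store description for a MonetDB database.
# Usage: write_store_ttl DB_NAME [LAYOUT] [HOST] [PORT]
write_store_ttl() {
    db=$1
    layout=${2:-layout2/hash}
    host=${3:-localhost}
    port=${4:-50000}
    # Unquoted heredoc so the shell variables are substituted into the Turtle.
    cat <<EOF
@prefix sdb:  <http://jena.hpl.hp.com/2007/sdb#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<#store> rdf:type sdb:Store ;
    sdb:layout     "$layout" ;
    sdb:connection <#conn> ;
    .

<#conn> rdf:type sdb:SDBConnection ;
    sdb:sdbType     "MonetDB" ;
    sdb:sdbHost     "$host" ;
    sdb:sdbName     "$db" ;
    sdb:driver      "nl.cwi.monetdb.jdbc.MonetDriver" ;
    sdb:sdbUser     "monetdb" ;
    sdb:sdbPassword "monetdb" ;
    sdb:jdbcURL     "jdbc:monetdb://$host:$port/$db" ;
    .
EOF
}
```

For example, `write_store_ttl TEST2H > monet_h.ttl` should reproduce the file above, minus the two unused prefixes.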

6) Create a script, "make_db.sh", to drop, create and initialise the repo. This needs to be run each time you run the sdbtest suite. It makes use of env.sh and monet_h.ttl.

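# Run from the Jena checkout, where env.sh and monet_h.ttl live
cd "$JENA_HOME"
# Drop any previous copy of the database (destroy asks for confirmation)
monetdb stop TEST2H
monetdb destroy TEST2H
# Recreate it and take it out of maintenance mode
monetdb create TEST2H
monetdb release TEST2H
# Pick up SDBROOT, PATH and SDB_JDBC
. ./env.sh
# Create the SDB tables in the empty database
bin/sdbconfig --sdb monet_h.ttl --create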
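
On the very first run there is no TEST2H to stop or destroy, and destroy also stops to ask for confirmation. A slightly more forgiving variant of the drop-and-recreate part could look like the sketch below. recreate_db is a hypothetical wrapper of my own; it assumes the monetdb tool is on the PATH and that its destroy command accepts -f to skip the prompt.

```shell
#!/bin/sh
# Hypothetical helper: drop and recreate a MonetDB database by name.
recreate_db() {
    db=$1
    # stop and destroy fail harmlessly when the database does not exist yet,
    # e.g. on the very first run, so their exit status is ignored.
    monetdb stop "$db" || true
    # -f skips the interactive confirmation question.
    monetdb destroy -f "$db" || true
    # create leaves the database in maintenance mode; release opens it up.
    monetdb create "$db" && monetdb release "$db"
}
```

In make_db.sh the four monetdb lines would then collapse to `recreate_db TEST2H`.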

7) Run the make_db.sh script

8) Check things went ok with

i) mclient -u monetdb -d TEST2H

ii) \D

You should see a dump of the schema. There should be, among other things, a prefixes table.
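
If you would rather have make_db.sh check this for you, the same dump can be taken non-interactively and searched. This is a sketch under assumptions: schema_has_table is a hypothetical helper, it relies on mclient's -D (dump) option, and it assumes the dump contains a CREATE TABLE statement naming the table. mclient will still prompt for the password unless you have stored the credentials in a .monetdb file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the SQL dump of DB mentions a CREATE TABLE
# for TABLE (case-insensitive match).
# Usage: schema_has_table DB_NAME TABLE_NAME
schema_has_table() {
    db=$1
    table=$2
    # -D is the command-line counterpart of \D: dump the database as SQL.
    mclient -u monetdb -d "$db" -D | grep -qi "create table.*$table"
}
```

For example: `schema_has_table TEST2H prefixes || echo "schema looks incomplete"`.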

9) Now for the unit tests:

Create a monetdb-hash.ttl file that Jena can use to connect with

@prefix sdb:     <http://jena.hpl.hp.com/2007/sdb#> .
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] rdf:type sdb:Store ;
sdb:layout     "layout2" ;
sdb:connection _:c ;
.

_:c rdf:type sdb:SDBConnection ;
sdb:sdbType       "MonetDB" ;    # Needed for JDBC URL
sdb:sdbHost       "localhost" ;
sdb:sdbName       "TEST2H" ;
sdb:driver        "nl.cwi.monetdb.jdbc.MonetDriver" ;
sdb:sdbUser        "monetdb" ;
sdb:sdbPassword        "monetdb" ;
sdb:jdbcURL    "jdbc:monetdb://localhost:50000/TEST2H?debug=true&logfile=monet.debug.log";
.

10) If in Eclipse, with the SDB source, create a run configuration for sdbtest.

#Main class : sdb.sdbtest
#Arguments: --sdb monetdb-hash.ttl ./testing/manifest-sdb.ttl
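
Outside Eclipse, the same main class and arguments can be run from a shell in the SDB source directory. The sketch below is an assumption about your layout rather than the official way to run the suite: sdb_classpath and run_sdbtest are hypothetical helpers, and they assume the SDB jars (including the built SDB classes) sit in $SDBROOT/lib and that env.sh has been sourced.

```shell
#!/bin/sh
# Hypothetical helper: build a classpath from the env.sh settings.
# Assumes the SDB jars sit in $SDBROOT/lib; adjust to your build layout.
sdb_classpath() {
    # The quotes keep the shell from expanding the *, so the JVM sees the wildcard.
    echo "$SDBROOT/lib/*:$SDB_JDBC"
}

# Run the SDB test suite against the store described in the given file.
# Usage: run_sdbtest STORE_FILE
run_sdbtest() {
    java -cp "$(sdb_classpath)" sdb.sdbtest --sdb "$1" ./testing/manifest-sdb.ttl
}
```

For example: `. ./env.sh && run_sdbtest monetdb-hash.ttl`.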

11) Run the test suite – all tests should pass.

12) Next: load some RDF and test performance!

[1] https://issues.apache.org/jira/browse/JENA-134

Column stores, Hadoop, Semantic web

November 3, 2011

I've been trying to do some work on Jena to get some column store support in there. This is all predicated on having a JDBC driver to talk to the column store. Some have one, some don't, and the ones that do seem to have minimal JDBC implementations: either temp table support isn't there, or things like batch support are lacking. Still, I'm pursuing this, because the normalized schema (a simple star-ish schema) used by Jena (and Sesame, iirc) seems to marry well with some of the optimisation claims the column stores make (retrieval, compressed storage, materialized views). For near read-only semantic knowledge bases, this might give a significant performance boost over row-based RDBMSs as semantic backends. Hadoop might come in useful here too at the load stage, if RDF needs to be ETL'd to some kind of loadable format, or to materialize SPARQL query results to external storage that the column store can access. "Might" being the operative word, I think, but there are interesting possibilities.

Categories: technology