Java Semantic & Linked Open Data webapps – Part 5.1
How to Architect ?
Well – what before how – this is firstly about requirements, and then about treatment
Linked Open Data app
Create a semantic repository for a read only dataset with a sparql endpoint for the linked open data web. Create a web application with Ajax and html (no server side code) that makes use of this data and demonstrates linkage to other datasets. Integrate free text search and query capability. Generate a data driven UI from ontology if possible.
So – a fairly tall order : in summary
- define ontology
- extract entites from digital text and transform to rdf defined by ontology
- create an RDF dataset and host in a repository.
- provide a sparql endpoint
- create a URI namespace and resolution capability. ensure persistence and decoupling of possible
- provide content negotiation for human and machine addressing
- create a UI with client side code only
- create a text index for keyword search and possibly faceted search, and integrate into the UI alongside query driven interfaces
- link to other datasets – geonames, dbpedia, any others meaningful – demonstrate promise and capability of linkage
- build an ontology driven UI so that a human can navigate data, with appropriate display based on type, and appropriate form to drive exploration
Here’s what we end up
- UserAgent – a browser navigates to Lewis TDI homepage – http://uoccou.endofinternet.net:8080/resources/sparql – and
- purl.org redirects to dynamic dns which resolves to hosted application – on EC2, or during development to some other server. This means we have permanent URIs with flexible hosting locations, at the expense of some network round trips – YMMV.
- dyndns calls EC2 where a 303 filter intersects to resolve to either a sparql (6) call for html, json or rdf. pluggable logic for different URIs and/or accept headers means this can be a select, describe, or construct.
- Joseki as a sparql endpoint provides RDF query processing with extensions for freetext search, aggregates, federation, inferencing
- TDB provides single semantic repository instance (java, persistent, memory mapped) addressable by joseki. For failover or horizontal scaling with multiple sparql endpoints SDB should probably be used. For vertical scaling at TDB – get a bigger machine ! Consider other repository options where physical partitioning, failover/resilience or concurrent webapp instance access required (ie if youre building a webapp connected to a repository by code rather than a web page that makes use of a sparql endpoint).
Next article will provide similar description or architecture used for the Java web application with code that is directly connected to a repository rather than one that talks to a sparql endpoint.