Analytics and the Semantic web

November 3, 2011 Comments off

There was a post recently on SmartDataCollective [1] that I thought somewhat missed the point. IMO, the semantic web on its own is not enough to make the web of things a reality: on the one hand there’s a huge extant body of non-semantic text on the web, and on the other the effort-reward factor for CMS and content authors is still too high – tagging with semantic structured vocabularies is still too hard. So, I am with Google – text analytics and collective intelligence have a lot to offer the web of things, and if they’re used in combination to mine and produce a semantic lens over all that content then things really start to kick off. Watch this space…


DERI (LATC) launch

June 18, 2011 Comments off

Some of the DERI people (and others) involved in LATC have launched to counter the lack of rdfs in – the Microsoft/Google/Yahoo attempt to kickstart some RDFa publishing so their search engines can try and improve result relevancy. Some of the items in are quite simple, but that’s probably a good thing : a large term set or number of properties is going to look daunting to anyone interested or someone starting out for the first time – and indeed this is the reason cited that it is not RDF (it’s microdata). And while I agree with Michael Bergman that it is more than likely another step towards structured/linked/common/open data, adopters urgently need a combination of

  1. Tools (or better still no tools, just an unobtrusive natural way to author microdata or rdfa) and
  2. a Reason to do it – payback
  3. Support in search UIs to specify vocabulary items

I’d like a WordPress plugin for instance, but then I’d need to host an instance myself or find a hoster that allows plugins because doesn’t allow it. I’d also like to think that if I placed some RDFa in my blog that it would get a higher ranking in search results (it should) but this blog is pretty specialised anyway and it’s not commercially oriented so I’m happy enough with keyword based results anyway.

So, I’m not going to be doing it too soon, and that’s the problem really. Or is it ? This post isn’t data really, but it does have links and it does talk about concepts, people, technology problems. If I could mark them up with tags and attributes that define what I am talking about then it would mean that I could tell those search engines and crawlers what I am talking about rather than hoping they can work it out from the title, the links I have chosen, the feedback comments and so on. Then people looking for these particular topics could find or stumble upon this post more easily. So, while there is some data here, arguably I don’t see it that way, and even if I think there might be a good Reason to do it, it’s too hard without the Tools.

So I wonder finally, if I was to mark up one of these people mentioned in this post with name, address, affiliation, organisation and so on, would the search engine UIs allow me to use this vocabulary directly – I want to find articles about DERI say, would the search drop down prompt me with itemtype="http://schema.org/EducationalOrganization" – so that I’d then only get results that have been marked up with this microdata type and not with things that are about the Deri vineyard in Wales, punto deri, courtney deri and so on ?

Sindice kinda does this, couldn’t the Goog do it too ??? Or indicate which results are microdata’d, or allow a keyword predicate (like site: say), or allow the results to be filtered (like Search Tools in the left column). The point for me is that is only half or less than half the story – the search engines need to Support the initiative by making it available at query time, and to allow their results to be manipulated in terms of microdata/rdfs too. Then I might be more tempted to mark up my posts in microdata, rdfs, microformat or whatever, and I might create some extensions to the schemas and contribute a bit more, and my post might get more traffic in the long tail, that traffic would be more valuable, my ad revenue might go up (if I had ads for myself !), and the ECB might drop their interest rates. Well, maybe not, but they’re not listening to anything else, perhaps some structured data might persuade them. It is the future after all.
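
To make the wish concrete, here is a minimal sketch of what filtering results by microdata type at query time could look like. It is not how any search engine actually works: the three pages, their URLs and the `search` function are all made up for illustration, and the parser only looks at `itemscope`/`itemtype` attributes (in microdata, EducationalOrganization is expressed as a type via `itemtype`). Only the Python standard library is used.

```python
from html.parser import HTMLParser

EDU = "http://schema.org/EducationalOrganization"

# Hypothetical pages standing in for a crawled index.
PAGES = {
    "http://example.org/deri-institute": """
        <div itemscope itemtype="http://schema.org/EducationalOrganization">
          <span itemprop="name">DERI</span>
        </div>""",
    "http://example.org/deri-vineyard": """
        <div itemscope itemtype="http://schema.org/Winery">
          <span itemprop="name">Deri vineyard</span>
        </div>""",
    "http://example.org/plain": "<p>punto deri, no markup at all</p>",
}


class ItemTypes(HTMLParser):
    """Collect every itemtype URL declared on an itemscope element."""

    def __init__(self):
        super().__init__()
        self.types = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # itemscope is a boolean attribute; itemtype may list several URLs.
        if "itemscope" in attrs and attrs.get("itemtype"):
            self.types.update(attrs["itemtype"].split())


def search(keyword, itemtype=None):
    """Keyword match on the page text, optionally restricted to a microdata type."""
    hits = []
    for url, html in PAGES.items():
        if keyword.lower() not in html.lower():
            continue
        if itemtype is not None:
            parser = ItemTypes()
            parser.feed(html)
            if itemtype not in parser.types:
                continue
        hits.append(url)
    return hits


print(search("deri"))        # plain keyword search: all three pages match
print(search("deri", EDU))   # type-restricted search: only the institute page
```

The keyword-only call returns the vineyard and the unmarked page as well, which is exactly the noise described above; the type-restricted call only works for pages that carry the markup, which is why support at query time and a reason to author the markup go together.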

Semantic Food & Beverages

February 25, 2011 2 comments

Google do semantic recipe search apparently [1-3] and a university in New York does a Semantic Sommelier [4]. Interesting – yes, but I want more info !

What ontology and what vocabulary is being used – tap[5], pips[9] ? food[7], hrecipe[6] – or other microformats [8] ? How about Umbel or Yago ? Any chance of a link for the explanation of the sommelier app – would love to know more about it and see it, rather than just some PR ? Do they link to other datasets, so that if I pick a wine or a recipe I can find things that go with the flavours and aromas, see photos, maybe learn the history, culture, location and science/tech of the recipe ? Hell, commercialisation here I come, perhaps I want to know what stores in my area have the ingredients, or stock the wine, with other related produce and offers ? And if a celeb chef happens to endorse it, then maybe I’ll go and buy their set of cookware for Christmas. (Or I’ll go/link to Amazon and get it cheaper, if they use the same vocabulary…)

If I search for Sausages and you have a recipe for “Bangers and Mash” do I get a recipe snippet or result fraction ? If I search for Mash and you have a blog post about creating a web page from lots of parts of other pages, does it show up ? How about if I search for a French Classic recipe – do I not find results for pages that describe the same thing but in English, German or Japanese ?
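
The questions above come down to matching on concepts rather than strings. A rough sketch of the difference, with everything invented for illustration: the `ex:` identifiers, the labels and the two documents are hand-made, where a real system would take them from a shared vocabulary.

```python
# Hypothetical concept data: each concept carries labels in several languages.
CONCEPTS = {
    "ex:Sausage": {"sausage", "sausages", "banger", "bangers", "saucisse", "wurst"},
    "ex:MashedPotato": {"mash", "mashed potato", "kartoffelbrei"},
}

# Hypothetical documents, annotated with a type and the concepts they are about.
DOCUMENTS = [
    {"title": "Bangers and Mash", "type": "ex:Recipe",
     "about": {"ex:Sausage", "ex:MashedPotato"}},
    {"title": "Building a web mashup from page fragments", "type": "ex:BlogPost",
     "about": set()},
]


def keyword_find(term):
    """Plain substring match on titles, as a keyword engine might do."""
    return [d["title"] for d in DOCUMENTS if term.lower() in d["title"].lower()]


def concept_find(term, doc_type=None):
    """Resolve the term to concepts via their labels, then match annotations."""
    wanted = {cid for cid, labels in CONCEPTS.items() if term.lower() in labels}
    return [d["title"] for d in DOCUMENTS
            if wanted & d["about"] and doc_type in (None, d["type"])]


print(keyword_find("Sausages"))            # keyword search misses the recipe
print(concept_find("Sausages"))            # concept search finds it
print(keyword_find("mash"))                # keyword search also returns the mashup post
print(concept_find("mash", "ex:Recipe"))   # concept search keeps only the recipe
print(concept_find("Wurst"))               # a German label reaches the same recipe
```

The design choice is simply that labels hang off a concept identifier, so a query in any language or synonym resolves to the same annotation before documents are matched.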

This semantic web thing needs to get out there, so you can taste it.


Linked Data, OData, GData, DataRSS comparison matrix

February 17, 2011 4 comments

(Update : It’s a year since my next-in-line brother died. This post involved a conversation I had with him. I’ve finally, after 5 years, updated it to include an .ODT version of the table below. He’ll probably kick my ass the next time I see him….)

I’m new to OData, having just talked with one of my brothers about it. He’s using it in a large company, but I’m not sure if it’s an internal tool or for customers. However, having forgotten or never investigated it before because of the lack of Microsoft fanfare, I was struggling to see what the difference between it and Linked Data with RDF is. Much googling and reading [47,48,50] left me with lots of questions, points of view, some pros and cons, and discovery about GData and DataRSS. (I’ve really been sipping the W3C Linked Data Kool Aid too long 🙂 ).

So, I want to create a matrix of criteria that anyone can quickly look at and get salient information about them. (Perhaps I could publish that matrix as linked data sometime…). You’ll understand by now that I haven’t used OData so I’m going on what I read until I install SharePoint somewhere, or whatever else it takes to get a producer running to play with. And with that, I’ve also made the mental jump to Drupal publishing RDFa – can you embed OData in a web page ? Would you want to if you had a CMS where the data was also content ?

I hope to end up with information about LoD, OData, GData, DataRSS (and fix this table’s formatting). Help ! RDF_OData_GData_DataRSS

Criteria | RDF…/LinkingOpenData | OData | GData

Logical Model
  RDF…/LinkingOpenData: Graph/EAV. Technology grounding (esp. OWL) in Description Logic [12, 13]. “Open World Assumption” [27]
  OData: Graph/EAV. AtomPub and EDM grounding in entity relationship modelling [11]. “Closed World Assumption” [28] view (?) but with “OpenTypes” and “Dynamic Properties”
  GData: Unclear/mixed – whatever Google logical model is behind services, but transcoded and exposed as AtomPub/JSON. Data relations and graphs not controllable by API – e.g. cannot define a link between data elements that doesn’t already exist. GData is primarily a client API.

  RDF…/LinkingOpenData: Not mandated, but probably backed by a triple store and serialised over HTTP to RDF/XML, JSON, TTL, N3 or other format. RDBMS backing or proxying possible.
  OData: Not mandated, but probably backed by existing RDBMS persistence [4 – “Abstract Data Model”], or more precisely a non-triple store. (I have no evidence to support this, but the gist of docs and examples suggests it as a typical use case) and serialised over HTTP with Atom/JSON according to Entity Data Model (EDM) [6] and Conceptual Schema Definition Language (CSDL) [11]
  GData: Google applications and services publishing data in AtomPub/JSON format, with Google Data Namespace [58] elements.

Intent
  RDF…/LinkingOpenData: Data syndication and web level linking : “The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data
  OData: Data publishing syndication : “There is a vast amount of data available today and data is now being collected and stored at a rate never seen before. Much, if not most, of this data however is locked into specific applications or formats and difficult to access or to integrate into new
  GData: Google cloud data publishing [55] : “The Google Data Protocol provides a secure means for external developers to write new applications that let end users access and update the data stored by many Google products. External developers can use the Google Data Protocol directly, or they can use any of the supported programming languages provided by the client libraries.”

  RDF…/LinkingOpenData: HTTP, content negotiation, RDF, REST-GET. SPARQL 1.1 for update
  OData: HTTP, content negotiation, AtomPub/JSON, REST-GET/PUT/POST/DELETE [9]

Openness/Extensibility
  RDF…/LinkingOpenData: Any and all, create your own ontology/namespace/URIs with RDFS/OWL/SKOS/…, large open source tooling & community, multiple serialisation RDF/XML,
  OData: Any and all (with a “legacy” Microsoft base), while reuse Microsoft classes and types, namespaces (EDM) [6] with Atom/JSON serialisation. Large Microsoft tooling and integration with others following. [7,8]
  GData: Google applications and services only.

URI minting,
  RDF…/LinkingOpenData: Create your own URIs and namespaces following guidelines (“slash vs hash”) [15,16]. Subject, predicate and object URIs must be dereferenceable, content negotiation expected. Separation of concept URI and location URI
  OData: Unclear whether concept URI and Location URI are distinguished in specification – values can certainly be Location URIs, and IDs can be URIs, but attribute properties aren’t dereferenceable to Location URIs. Well specified URI conventions [21]
  GData: Atom namespace. <link rel=”self” …/> denotes URI of item. ETags also used for versioned updates. Google Data namespace for content “Kinds” [59], no dereferencing.

matching, equivalence
  RDF…/LinkingOpenData: External entities can inherently be directly linked by reference, and equivalence is possible with owl:sameAs, rdfs:seeAlso (and other equivalence assertions)
  OData: properties link entity elements within a single OData materialisation – external linkage not possible. Dereferenceable attribute properties not possible but proposed [10].
  GData: URIs not dereferenceable, linkage outside of Google not possible.

Data Model
  RDF…/LinkingOpenData: create ontology model of data, concepts and relations. Import and extend external ontologies. Terminology (T-Box) and Asserted (A-Box, vocabulary) both possible with OWL presentation.
  OData: EDM defines creation of entities, types, sets, associations and navigation properties for data and relations. Primitive types a la XSD types [35]. Seems more akin to capabilities of schema definition than ontology modelling with OWL. Unclear at this stage whether new entities can be created, and then reused or existing ones imported and extended.

Reasoning
  RDF…/LinkingOpenData: Inference or reasoning out of DL terminology and assertion separation possible. May be handled at repository level (e.g. Sesame) or at query time (e.g. Jena)
  OData: service may be able to infer from derived typing [41]

  RDF…/LinkingOpenData: Declare namespaces as required when importing public or “well known” ontologies/vocabularies, creating SPARQL queries, short hand URIs, create new as required for your own custom classes, instances.
  OData: supported in EDM but unclear if possible to create and use namespace, or if it can be backed with a custom class/property definition (ontology). $metadata seems to separate logically and physically type and service metadata from instance data – i.e. OData doesn’t “eat its own dog food”.
  GData: AtomPub and Google Data namespace only.

Content negotiation
  RDF…/LinkingOpenData: Client and server negotiate content to best determination. [17,18]
  OData: Client specifies or server fails, or default to Atom representation [19]. Only XML serialisation for service metadata [40]. New mime-types introduced.
  GData: Use alt query param (accept-header not used) [57]

Query capability
  RDF…/LinkingOpenData: Dereferenceability central principle to linked data, whether in document, local endpoint or federated. SPARQL [14] query language allows suitably equipped endpoints to service structured query requests and return serialised RDF, JSON, CSV, HTML, …
  OData: dereferenceable URIs with special $metadata path element allow type metadata to be retrieved [10]. Running a structured query against an OData service with something like SPARQL isn’t possible.
  GData: Query by author, category, fields.

  RDF…/LinkingOpenData: Dereferenceable URIs, well-known/common/upper-level ontologies/vocabularies. VoID [22,52] can be used to provide extensive metadata for a linked
  OData: Service documents [32] describe types, datasets.

Programmatic Mapping to/from RDF
  RDF…/LinkingOpenData: RDF-XML, ttl, n3 well known formats. See also content negotiation.
  OData: AtomPub, JSON outputs. OpenVirtuoso mapping, custom code.

Security, privacy, provenance.
  RDF…/LinkingOpenData: No additional specifications above that supplied in web/HTTP architecture. CORS becoming popular as access filter method for cross-site syndication capability at client level. Server side access control. Standards for Provenance and privacy planned and under development [24]. W3C XG provenance group [25]
  OData: No additional specifications above that mandated in HTTP/Atom/JSON [23, 31]. CORS use possible for cross site syndication. Dallas/Azure Datamarket for “trusted commercial and premium public domain data” [26].
  GData: HTTP wire protocols, but in addition authentication (OpenID) and authorization (OAuth) are required. “ClientLogin” and AuthSub are deprecated [60]. No provenance handling.

license, sponsorship, governance
  RDF…/LinkingOpenData: W3C Supported community project [1], after a proposal by TBL [2]. Built up Architecture of World Wide Web [3]
  OData: Microsoft owned sponsored under “Open Specification Promise” [4], but brought to W3C incubator [5]

support, community
  RDF…/LinkingOpenData: W3C docs, community wikis, forums, blogs, developer groups and libraries. [38]
  OData: site with developer docs, and links to articles and videos, mailing list, MSDN

producing, consuming
  RDF…/LinkingOpenData: Many and varied, open. [1,36,37] et al.
  OData: consumers [8], “datamarket” [26], PowerPivot for Excel[49

Other
  RDF…/LinkingOpenData: SPARQL update v1.1 [42], Semantic Web Services, [43-46,51-54]
  OData: Batch request [20], protocol versioning [33], Service Operations [30]
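
As a small illustration of the content negotiation entries in the matrix, the sketch below builds (but does not send) one request in each style: an Accept header for Linked Data, the `$format` query option for OData, and the `alt` query parameter for GData. The example.org URLs and the three helper function names are my own assumptions for the sketch; only standard-library calls are used, and I haven’t run any of this against a real OData or GData producer.

```python
from urllib.parse import urlencode
from urllib.request import Request


def linked_data_request(resource_uri):
    """Linked Data style: the client states preferred RDF serialisations
    in the Accept header and the server picks the best match."""
    return Request(
        resource_uri,
        headers={"Accept": "text/turtle, application/rdf+xml;q=0.8"},
    )


def odata_request(service_uri, fmt="json"):
    """OData style: the client names the representation with the $format
    query option (Atom is the default when nothing is specified)."""
    return Request(service_uri + "?" + urlencode({"$format": fmt}, safe="$"))


def gdata_request(feed_uri, alt="json"):
    """GData style: the representation is chosen with the alt query
    parameter rather than the Accept header."""
    return Request(feed_uri + "?" + urlencode({"alt": alt}))


# Hypothetical endpoints, for illustration only.
ld = linked_data_request("http://example.org/id/deri")
od = odata_request("http://example.org/service/Products")
gd = gdata_request("http://example.org/feeds/posts")

print(ld.full_url, ld.get_header("Accept"))
print(od.full_url)
print(gd.full_url)
```

In the Linked Data case the resource URI stays the same whatever the representation, while in the other two the choice of format ends up in the URL itself.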


Google Fund Social Network Semantic Research at DERI

December 16, 2010 Comments off

An announcement today [1] says that Google will fund a team led by Dr Alexandre Passant [2] on mobile social network applications. I wonder how much it will be like my own SkyTwenty platform [3] – which is far from being a social networking or blogging system; it is a mobile location service, with anonymity, access control and data shrouding. The DERI work is apparently going to build on some Google tech – “PubSubHubbub”[4] – and DERI’s own hub based mobile blogging service – SMOB[5] – v2 of which was released in Jan 2010 and integrates with Sindice. Distributed storage, Git like ? What about anonymity, provenance, trust, security, and scalability ?