Thursday, May 03, 2007

An OAI-PMH Profile/Ontology for RDF objects in Jena

Jena is an open source RDF repository with pluggable back-end storage modules and a modular front-end RDF query interface. Although the semantic web community has long advocated RDF as both the storage and the exchange format for the next generation of web applications, there are issues to be resolved before RDF could be used for projects such as the People's Network or the Curriculum Online LOM repository. Specifically, there are two issues which I believe to be generic, but which Jena helps us see in concrete terms. Firstly, to actually search the underlying structure, both full-text and spatial queries need to be propagated down to the statement store (before any inferencing takes place, to select candidate statements). Secondly, the contents of an RDF repository need to be accessible in a structured way (OAI-PMH).

The two issues don't need to be solved at the same time. If we can get a functional Jena-based metadata repository for cultural heritage or learning objects started with a working OAI interface, then we can simply inject those records into a relational database for retrieval, using the base URL as a "Document Key" and retrieving the appropriate documents from the Jena repository. There is also research work underway at the OS into accessing spatial indexes via the Jena query interface. To my mind, extending Jena with Lucene indexes, and sidestepping the built-in full-text capabilities now present in all serious relational database systems, is an error beyond what Knuth was talking about when he said "Premature optimisation is the root of all evil".
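As a rough illustration of the "Document Key" idea, here is a minimal sketch in Python. It assumes the harvested records have already been flattened to a URI plus some searchable text. The table name, column names and example URIs are all made up for the example, and a plain LIKE query stands in for whatever full-text index the real database would provide.

```python
import sqlite3

# Hypothetical harvested records: (document key, searchable text).
# The document key is the URI of the object in the RDF repository.
HARVESTED = [
    ("http://example.org/object/1", "Roman coin found near York"),
    ("http://example.org/object/2", "Victorian map of Sheffield"),
]

def build_index(records):
    """Load harvested records into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record (doc_key TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO record (doc_key, body) VALUES (?, ?)", records)
    conn.commit()
    return conn

def search(conn, term):
    """Return the document keys whose text contains the term.

    LIKE is only a placeholder for the database's own full-text search.
    """
    rows = conn.execute(
        "SELECT doc_key FROM record WHERE body LIKE ? ORDER BY doc_key",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]

conn = build_index(HARVESTED)
print(search(conn, "map"))
```

The keys that come back are the URIs to look up in the RDF repository, so the relational side never needs to hold the full graph.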

So, all that said: OAI and RDF repositories. It seems like a worthwhile thing to get working, and not too much trouble. Since an object in an RDF graph can be a member of an arbitrary number of classes, it seems to me that we should define a namespace and an ontology for objects to be shared through OAI-PMH that can be used for all RDF repositories. Any repository implementing the RDF QL and supporting the OAI-PMH ontology should be capable of acting as an OAI-PMH data provider. The properties of this class will need to contain (or allow derivation of) the properties of the OAI header record.

In theory, the OAI-PMH engine for RDF repositories should work over any RDF repository supporting the standard query mechanisms. It might be that this approach can extend beyond OAI-PMH and into the realms of OAI-ORE.
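To make that a little more concrete, here is a minimal sketch of how such an engine might pick out harvestable objects. It is not Jena code: plain Python tuples stand in for the statement store, and the namespace http://example.org/oai-pmh#, the class name Record and the property name dateLastModified are all invented for the example. The from/until filtering mirrors the selective harvesting arguments of OAI-PMH.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OAI = "http://example.org/oai-pmh#"  # hypothetical namespace, not a published vocabulary

# A tiny stand-in for a statement store: (subject, predicate, object) tuples.
TRIPLES = [
    ("http://example.org/object/1", RDF_TYPE, OAI + "Record"),
    ("http://example.org/object/1", RDF_TYPE, "http://example.org/schema#Coin"),
    ("http://example.org/object/1", OAI + "dateLastModified", "2007-04-01T10:00:00Z"),
    ("http://example.org/object/2", RDF_TYPE, OAI + "Record"),
    ("http://example.org/object/2", OAI + "dateLastModified", "2007-05-02T09:30:00Z"),
    # Object 3 is not a member of the OAI class, so it is never exposed.
    ("http://example.org/object/3", RDF_TYPE, "http://example.org/schema#Map"),
]

def list_identifiers(triples, from_=None, until=None):
    """Return URIs of OAI class members whose datestamp lies in [from_, until]."""
    members = {s for s, p, o in triples if p == RDF_TYPE and o == OAI + "Record"}
    stamps = {s: o for s, p, o in triples if p == OAI + "dateLastModified"}
    selected = []
    for uri in sorted(members):
        stamp = stamps.get(uri)
        if stamp is None:
            continue  # without a datestamp no OAI header can be derived
        # UTC timestamps of the same granularity compare correctly as strings.
        if from_ is not None and stamp < from_:
            continue
        if until is not None and stamp > until:
            continue
        selected.append(uri)
    return selected

print(list_identifiers(TRIPLES))
print(list_identifiers(TRIPLES, from_="2007-05-01T00:00:00Z"))
```

In a real repository the two comprehensions would become a single query against the store, which is exactly why the class needs to be defined once and shared.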

An OAI-PMH class for RDF should be incredibly simple, and therefore its benefit is in getting it defined at a high enough level to promote reuse. First impressions are that it should contain the following properties:

Date Added
Date Last Modified
Date Deleted (some flags to control deletion tracking will be needed)

The record identifier will be the URI of the base object. The record format is determined by the classes of the object itself.
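As a sketch of how those three dates and the URI could be turned into an OAI-PMH header, something like the following might do. The function name and its parameters are my own invention; the header, identifier and datestamp elements and the status="deleted" attribute come from the OAI-PMH specification. The OAI-PMH XML namespace and any setSpec elements are left out to keep the example short.

```python
import xml.etree.ElementTree as ET

def oai_header(uri, date_added, date_modified=None, date_deleted=None):
    """Build an OAI-PMH <header> element from the proposed class properties.

    The datestamp is the most recent event: deletion if the object has been
    deleted, otherwise last modification, otherwise the date it was added.
    """
    header = ET.Element("header")
    if date_deleted is not None:
        header.set("status", "deleted")
    ET.SubElement(header, "identifier").text = uri
    datestamp = date_deleted or date_modified or date_added
    ET.SubElement(header, "datestamp").text = datestamp
    return header

live = oai_header(
    "http://example.org/object/1",
    date_added="2007-04-01T10:00:00Z",
    date_modified="2007-04-15T08:00:00Z",
)
deleted = oai_header(
    "http://example.org/object/2",
    date_added="2007-04-01T10:00:00Z",
    date_deleted="2007-05-03T12:00:00Z",
)
print(ET.tostring(live, encoding="unicode"))
print(ET.tostring(deleted, encoding="unicode"))
```

Whether a deleted object keeps its statements in the store, or only a tombstone carrying the deletion date, is the kind of thing the deletion tracking flags would have to settle.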

The next step would be to get this class formalised in some kind of ontology markup language, then try to build the repo under Jena. More to follow.

1 comment:

Dom said...

Nice post... did you start writing the aforementioned vocabulary?