Setting up our URIs and the Talis Platform

Time to set up our URIs and upload our data to one of our Talis Platform stores.

In a previous post we discussed which URIs to use. We settled on http://data.lib.sussex.ac.uk/archive/ – we felt this should be stable, and allow for integration with other Special Collection records in the future (while not conflicting with other Library data).

We now needed those URIs to do something; at the moment they all just returned a 404 message (albeit a 404 message with a Rickroll link).

As is so often the case in this project, this is where Pete Johnston came in. He had already set up the required code on his test server, and similar things had been put in place for the LOCAH project.

In total, all that is required is a few PHP/HTML files and a .htaccess file to handle rewrites (i.e. taking a URI and calling the script in question with the right-hand bit of the URI as a parameter). The main script is an index.php file which on our server lives at www/data/archive/doc (which corresponds to http://data.lib.sussex.ac.uk/archive/doc/).
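
In case it helps to picture it, here is a minimal sketch of the kind of thing index.php does with the rewritten request. This is not the actual LOCAH-derived code, and the uri_suffix parameter name is just an illustration:

<?php
// Sketch only: the real script (built on paget/moriarty) does much more.
// A request for /archive/doc/archivalresource/gb181SxMOA1 is rewritten to
// something like /archive/doc/index.php?uri_suffix=archivalresource/gb181SxMOA1
$localPart = isset($_GET['uri_suffix']) ? $_GET['uri_suffix'] : '';

// Rebuild the identifier URI that this document describes
$thingUri = 'http://data.lib.sussex.ac.uk/archive/id/' . $localPart;

// From here the script would ask the Platform store for the triples about
// $thingUri and serialise them as HTML, RDF/XML, JSON or Turtle as requested.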

Along with these files go a few dependencies, the PHP libraries paget, moriarty and ARC.

However, this code needs to get its data from somewhere, and for that we need to put our data into our shiny new Talis Platform store…

Talis Platform

The second part of this work was to upload our data to the Talis Platform. Talis had kindly created two stores for us, massobservation and massobservation-dev1, as part of their Connected Commons scheme.

Pete ran a set of scripts he had developed to upload our data to the dev1 store. We’re currently installing these on our own server so we can do this ourselves, and we’ll report more on them soon.
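
We'll cover Pete's scripts properly in a later post, but for the curious, the Platform's HTTP API is pleasantly simple: as I understand it, you POST your RDF/XML to the store's metabox with your store credentials. Something along these lines (a sketch, not Pete's actual code; the file name and credentials are placeholders) is all it takes:

<?php
// Sketch: POST a file of RDF/XML into the dev store's metabox.
$store = 'massobservation-dev1';
$rdf   = file_get_contents('moa.rdf');   // RDF/XML produced from our EAD

$ch = curl_init("http://api.talis.com/stores/$store/meta");
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $rdf);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/rdf+xml'));
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_DIGEST);
curl_setopt($ch, CURLOPT_USERPWD, 'username:password');    // placeholder credentials
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);            // a 2xx status means the triples went in
curl_close($ch);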

So that was that: without much fuss, we now had our data in a publicly available, SPARQL-queryable RDF store. There probably should have been champagne.
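
If you want to convince yourself the data really is there, each store exposes a public SPARQL endpoint you can query directly. A quick sanity-check, sketched in PHP (store name as above, query just grabs the first few triples):

<?php
// Sketch: ask the store's public SPARQL endpoint for a handful of triples.
$store = 'massobservation-dev1';
$query = 'SELECT * WHERE { ?s ?p ?o } LIMIT 10';
$url = "http://api.talis.com/stores/$store/services/sparql?query=" . urlencode($query);
echo file_get_contents($url);   // SPARQL results XML listing ten triples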

Back to our server

So with our data now in an RDF store, the libraries installed on the server, the files copied into place, and the config edited to point at our store, it was time to point a browser at one of our URIs and start debugging the first error message (which, once resolved, would lead to the next error message, and so forth). But… for the first time in my life, it just worked. This never happens. It left me confused; I had set aside hours in my diary for endless frustration and here it was, working. I felt cheated. But, once over the shock, I (and you too) could visit examples such as this: http://data.lib.sussex.ac.uk/archive/doc/archivalresource/gb181SxMOA1 (RDF/XML, JSON, Turtle)
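
If the setup follows the usual Linked Data pattern (and I believe paget handles this for us), you can also ask that document URI for a specific serialisation via the Accept header. A rough example of fetching the Turtle version:

<?php
// Sketch: request the Turtle serialisation of a document URI via content negotiation.
$context = stream_context_create(array('http' => array(
    'header' => "Accept: text/turtle\r\n",
)));
$doc = 'http://data.lib.sussex.ac.uk/archive/doc/archivalresource/gb181SxMOA1';
echo file_get_contents($doc, false, $context);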

Look! It’s our data… as Linked Data… live on the internet!

data.lib.sussex.ac.uk

I would like to see data.lib.sussex.ac.uk become more than just the Mass Observation Archive, and with that in mind I created a front end for the top level URL: http://data.lib.sussex.ac.uk/

This uses WordPress as the CMS (life's too short to code the HTML/CSS files by hand).

For those interested, the .htaccess mod_rewrite looks like this:

<IfModule mod_rewrite.c>
RewriteEngine On
# 303-redirect identifier URIs (archive/id/...) to their document URIs (archive/doc/...)
RewriteRule ^archive/id/(.*)$ /archive/doc/$1 [R=303,L]
# Standard WordPress front-controller rules
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

The rule for our URIs is at the top (simply 303-redirecting archive/id/* to archive/doc/*); if this rule is matched then processing ends and the rest of the rules are ignored (the [L] flag), otherwise the standard WordPress rules are processed as normal.

Next steps… (for this strand of work)

Install scripts on our server so that we can:

  • take a file of EAD data from Calm and transform it into a file of RDF/XML
  • convert this to a set of N-Triples files, which are easier to upload to the Platform store: each statement/triple (or, if you prefer, fieldname and field value from a record) is complete and able to stand alone, so the data can be uploaded in stages without complications (see the sketch below)
  • upload the files to the store
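
For that middle step, here is a rough sketch of how the conversion might look using the ARC library we already have on the server (method names are ARC2's as I understand them, and the file names are placeholders):

<?php
// Sketch: turn a file of RDF/XML into N-Triples using ARC2.
include_once('arc/ARC2.php');

$parser = ARC2::getRDFXMLParser();
$parser->parse('moa.rdf');                 // the RDF/XML produced from the EAD
$triples = $parser->getTriples();

$serializer = ARC2::getNTriplesSerializer();
file_put_contents('moa.nt', $serializer->getSerializedTriples($triples));
// moa.nt can then be split into chunks and uploaded to the store in stages.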
