Converting EAD data to RDF Linked Data

In my last blog post I discussed how to setup our server to handle the URIs being created within our Linked Data, and said the next step was for us to turn our EAD/XML data from Calm in to RDF/XML Linked Data.

This is a big step, until now our process looked something like this: Export EAD data -> send it to someone else -> Magic -> Linked Data!

Pete Johnston provided us with details of the magic part. In essence much of the complexity is hidden in an XSLT script (XSLT is a language to process XML in to different schemas, such as here, or in to HTML and other formats). He’s blogged about some of the decisions and concepts that have gone in to it. However, here, we can treat it like a black box. It’s still magic, but we know how to use it.

Converting EAD to XSLT using XSLT and Saxon

We use the Saxon HE XSLT (Java) version to the do transformation. It’s simple to download and setup. The basic core step is very simple: run Saxon, passing it the name of the EAD/XML file and the XSLT file. An example command line looks like this:

java -jar 'saxon9he.jar' -s:ead/ -xsl:xslt/ead2rdf.xsl -o:rdf/ root=http://data.lib.sussex.ac.uk/archive/

And there you have it, your EAD data is now RDF!

Before the data is loaded in to the Talis Platform store, there’s a couple more things we do.

Triples and Turtle

The first is the conversion of the RDF/XML in to the alternative RDF format N-Triples (and also Turtle) using the Raptor RDF parser.

RDF can be written and presented in a number of ways. Probably the most common method is using XML, partly due to the XML language being so ubiquitous, however it is very verbose and can be difficult to read by us humans.

Not only is N-Triples considered easier to read. but each line contains a fully complete and self-contained Triple (a Triple contains a subject, predicate and object, mostly expressed as URIs). While it isn’t too much of an issue here, this allows us to split up the data in to smaller chunks/files which can be POSTED to the Talis Platform.

Talis Platform

The Talis Platform is a well established Triple Store (think of a SQL database but with three part triples rather than records and tables). While you can run your own Triple Store using software such as ARC2, the Talis Platform provides a stable, robust and quick solution.

You interact with the Platform with standard HTTP Requests; GET, POST, DELETE etc. However for simplicity an interactive command prompt front end has been developed in Python called Pynappl. This allows you to simply specify the store you wish to work with, authenticate, and then use commands such as ‘store filename.rdf’ to upload data.

A simple script can upload our data to the Platform, uploading each n-triple file created above.

The final step is to try our the Sparql interface at:

http://api.talis.com/stores/massobservation/services/sparql

Here’s one to try:

SELECT * WHERE {
?a ?b <http://data.lib.sussex.ac.uk/archive/id/concept/moa/religion>
}

Summary

To take our EAD from Calm and turn it in to Linked Data we used a XSLT script written by Pete Johnston, used Saxon to transform the EAD/XML in to RDF/XML using the XSLT script. Then we converted the RDF/XML to RDF/N-Triples using Raptor. And finally we used Pynappl to upload this to the Talis Platform.

The XSLT scripts mentioned here can be found at:

http://data.lib.sussex.ac.uk/files/massobservation/xslt/

The RDF Linked Data is available for download, in addition to the SPARQL interface above:

http://data.lib.sussex.ac.uk/files/massobservation/rdf/

My Thanks to Pete Johnston of Eduserv for providing the process (with documentation) described above.

This page has been translated into Spanish by Maria Ramos from http://www.webhostinghub.com/support/edu

Tags: , ,

Comments are closed.