{"id":159,"date":"2011-07-25T11:56:15","date_gmt":"2011-07-25T11:56:15","guid":{"rendered":"http:\/\/blogs.sussex.ac.uk\/salda\/?p=159"},"modified":"2013-07-29T09:02:54","modified_gmt":"2013-07-29T09:02:54","slug":"converting-ead-data-to-rdf-linked-data","status":"publish","type":"post","link":"https:\/\/blogs.sussex.ac.uk\/salda\/2011\/07\/25\/converting-ead-data-to-rdf-linked-data\/","title":{"rendered":"Converting EAD data to RDF Linked Data"},"content":{"rendered":"<p>In my last blog post I discussed how to set up our server to handle the URIs being created within our Linked Data, and said the next step was for us to turn our EAD\/XML data from Calm into RDF\/XML Linked Data.<\/p>\n<p>This is a big step. Until now, our process looked something like this: Export EAD data -&gt; send it to someone else -&gt; Magic -&gt; Linked Data!<\/p>\n<p>Pete Johnston provided us with details of the magic part. In essence, much of the complexity is hidden in an XSLT script (XSLT is a language for transforming XML into other XML schemas, as here, or into HTML and other formats). He&#8217;s <a href=\"http:\/\/blogs.sussex.ac.uk\/salda\/2011\/05\/16\/the-data-transformation\/\">blogged<\/a> about some of the decisions and concepts that have gone into it. Here, however, we can treat it as a black box. It&#8217;s still magic, but we know how to use it.<\/p>\n<p><strong>Converting EAD to RDF using XSLT and Saxon<\/strong><\/p>\n<p>We use the Java version of <a href=\"http:\/\/saxon.sourceforge.net\/\">Saxon<\/a> HE, an XSLT processor, to do the transformation. It&#8217;s simple to download and set up. The core step is very simple: run Saxon, passing it the name of the EAD\/XML file and the XSLT file. 
An example command line looks like this:<\/p>\n<pre>java -jar 'saxon9he.jar' -s:ead\/ -xsl:xslt\/ead2rdf.xsl -o:rdf\/ root=http:\/\/data.lib.sussex.ac.uk\/archive\/<\/pre>\n<p>Here the source (-s) and output (-o) options point at directories, so Saxon transforms every file in ead\/ and writes the results to rdf\/, while root is a parameter passed to the XSLT script.<\/p>\n<p>And there you have it: your EAD data is now RDF!<\/p>\n<p>Before the data is loaded into the Talis Platform store, there are a couple more things we do.<\/p>\n<p><strong>Triples and Turtle<\/strong><\/p>\n<p><strong><a href=\"http:\/\/blogs.sussex.ac.uk\/salda\/files\/2011\/07\/turtle-tripel.jpg\"><img loading=\"lazy\" class=\"alignright size-full wp-image-160\" title=\"turtle &amp; tripel\" src=\"http:\/\/blogs.sussex.ac.uk\/salda\/files\/2011\/07\/turtle-tripel.jpg\" alt=\"\" width=\"300\" height=\"270\" \/><\/a><\/strong><\/p>\n<p>The first is the conversion of the RDF\/XML into the alternative RDF format <a href=\"http:\/\/en.wikipedia.org\/wiki\/N-Triples\">N-Triples<\/a> (and also <a href=\"http:\/\/en.wikipedia.org\/wiki\/Turtle_(syntax)\">Turtle<\/a>) using the Raptor RDF parser.<\/p>\n<p>RDF can be written and presented in a number of ways. Probably the most common method is using XML, partly due to the <a href=\"http:\/\/en.wikipedia.org\/wiki\/RAS_syndrome\">XML language<\/a> being so ubiquitous. However, it is very verbose and can be difficult for us humans to read.<\/p>\n<p>Not only is N-Triples considered easier to read, but each line contains a complete, self-contained Triple (a Triple consists of a subject, predicate and object, mostly expressed as URIs). While it isn&#8217;t too much of an issue here, this allows us to split up the data into smaller chunks\/files which can be POSTed to the Talis Platform.<\/p>\n<p><strong>Talis Platform<\/strong><\/p>\n<p>The Talis Platform is a well-established Triple Store (think of a SQL database but with three-part triples rather than records and tables). 
While you can run your own Triple Store using software such as ARC2, the Talis Platform provides a stable, robust and quick solution.<\/p>\n<p>You interact with the Platform using standard HTTP requests: GET, POST, DELETE, etc. For simplicity, however, an interactive command-prompt front end called <a href=\"http:\/\/code.google.com\/p\/pynappl\/wiki\/tshell\">Pynappl<\/a> has been developed in Python. This allows you to simply specify the store you wish to work with, authenticate, and then use commands such as &#8216;store filename.rdf&#8217; to upload data.<\/p>\n<p>A simple script can upload our data to the Platform, uploading each N-Triples file created above.<\/p>\n<p>The final step is to try out the <strong>SPARQL<\/strong> interface at:<\/p>\n<p><a href=\"http:\/\/api.talis.com\/stores\/massobservation\/services\/sparql\">http:\/\/api.talis.com\/stores\/massobservation\/services\/sparql<\/a><\/p>\n<p>Here&#8217;s a query to try:<\/p>\n<pre>SELECT * WHERE {\r\n?a ?b &lt;http:\/\/data.lib.sussex.ac.uk\/archive\/id\/concept\/moa\/religion&gt;\r\n}<\/pre>\n<h3>Summary<\/h3>\n<p>To take our EAD from Calm and turn it into Linked Data, we used Saxon to transform the EAD\/XML into RDF\/XML with an XSLT script written by Pete Johnston. Then we converted the RDF\/XML to RDF\/N-Triples using Raptor. 
Finally, we used Pynappl to upload this to the Talis Platform.<\/p>\n<p>The XSLT scripts mentioned here can be found at:<\/p>\n<p><a href=\"http:\/\/data.lib.sussex.ac.uk\/files\/massobservation\/xslt\/\">http:\/\/data.lib.sussex.ac.uk\/files\/massobservation\/xslt\/<\/a><\/p>\n<p>As well as being queryable through the SPARQL interface above, the RDF Linked Data is available for download:<\/p>\n<p><a href=\"http:\/\/data.lib.sussex.ac.uk\/files\/massobservation\/rdf\/\">http:\/\/data.lib.sussex.ac.uk\/files\/massobservation\/rdf\/<\/a><\/p>\n<p><em>My thanks to Pete Johnston of Eduserv for providing the process (with documentation) described above.<\/em><\/p>\n<p>This page has been translated into <a href=\"http:\/\/www.webhostinghub.com\/support\/es\/misc\/convirtiendo-la-data-ead\" target=\"_blank\">Spanish<\/a> by Maria Ramos from <a href=\"http:\/\/www.webhostinghub.com\/support\/edu\" target=\"_blank\">http:\/\/www.webhostinghub.com\/support\/edu<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In my last blog post I discussed how to set up our server to handle the URIs being created within our Linked Data, and said the next step was for us to turn our EAD\/XML data from Calm into RDF\/XML Linked Data. 
This is a big step. Until now, our process looked something like this: [&hellip;]<\/p>\n","protected":false},"author":15,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[126],"tags":[108,101,132],"_links":{"self":[{"href":"https:\/\/blogs.sussex.ac.uk\/salda\/wp-json\/wp\/v2\/posts\/159"}],"collection":[{"href":"https:\/\/blogs.sussex.ac.uk\/salda\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.sussex.ac.uk\/salda\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.sussex.ac.uk\/salda\/wp-json\/wp\/v2\/users\/15"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.sussex.ac.uk\/salda\/wp-json\/wp\/v2\/comments?post=159"}],"version-history":[{"count":19,"href":"https:\/\/blogs.sussex.ac.uk\/salda\/wp-json\/wp\/v2\/posts\/159\/revisions"}],"predecessor-version":[{"id":175,"href":"https:\/\/blogs.sussex.ac.uk\/salda\/wp-json\/wp\/v2\/posts\/159\/revisions\/175"}],"wp:attachment":[{"href":"https:\/\/blogs.sussex.ac.uk\/salda\/wp-json\/wp\/v2\/media?parent=159"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.sussex.ac.uk\/salda\/wp-json\/wp\/v2\/categories?post=159"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.sussex.ac.uk\/salda\/wp-json\/wp\/v2\/tags?post=159"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}