Posts Tagged ‘URI’

Linking to LOCAH

Wednesday, October 12th, 2011

As readers of this blog will know, we followed closely in the footsteps of the LOCAH project and we are now linked to the Archives Hub dataset. Which is nice.

See http://data.lib.sussex.ac.uk/archive/doc/concept/moa/advertising for an example.

Our other external links are:

– DBpedia (for some places, people & organisations) e.g. http://data.lib.sussex.ac.uk/archive/id/organization/moa/communistpartyofgreatbritain

– Geonames (for some places) e.g. http://data.lib.sussex.ac.uk/archive/id/place/moa/blackpool

– LCSH (for some concepts) e.g. http://data.lib.sussex.ac.uk/archive/id/concept/moa/conscientiousobjectors

– VIAF (for some people) e.g. http://data.lib.sussex.ac.uk/archive/id/person/nra/churchillsirwinstonleonardspencer1874-1965knightprimeministerandhistorian
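One common way to express links like these is as RDF statements connecting a local URI to an external one. The sketch below writes such a link as an N-Triples line. It is a minimal illustration only, assuming `owl:sameAs` as the linking predicate (the post doesn't say which predicate the data actually uses; for LCSH concepts something like `skos:exactMatch` may fit better), and the DBpedia target URI is an illustrative guess rather than the project's real link.

```python
# Minimal sketch: write an external link as an N-Triples statement.
# ASSUMPTION: links use owl:sameAs; the post does not say which predicate is used.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(local_uri: str, external_uri: str) -> str:
    """Return one N-Triples statement linking a local URI to an external one."""
    return f"<{local_uri}> <{OWL_SAME_AS}> <{external_uri}> ."


# The local URI is taken from the post; the DBpedia target is an illustrative guess.
links = [
    (
        "http://data.lib.sussex.ac.uk/archive/id/organization/moa/communistpartyofgreatbritain",
        "http://dbpedia.org/resource/Communist_Party_of_Great_Britain",
    ),
]

for local, external in links:
    print(same_as_triple(local, external))
```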

URIs. A decision

Tuesday, May 10th, 2011

The project team at Sussex (Jane Harvell, Fiona Courage, Chris Keene and myself) met for an hour yesterday to decide about the URI stem for our data.

We took 20 minutes. Did we make a hasty decision? No. Did we make a considered, long-term, looking-to-the-future sort of decision? Yes. The combined expertise round the table was very useful: Jane is very library and looks to the digital future, Fiona is very involved with the Keep and our identity when we are there, and Chris knows about servers and how that bit works. We considered the comments from Rob Styles and the advice from Pete Johnston at Eduserv, and we decided on:

data.lib.sussex.ac.uk/archive

We wanted something that could work with other archive collections (if we decide to make them into linked data) so a Mass Observation or Massobs stem was too exclusive. We also wanted to avoid creating lots and lots of URIs for the same thing in the future so a generic stem seemed the way to go.
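The example URIs at the top of this page suggest a pattern under the chosen stem: `/archive/{id|doc}/{type}/{source}/{slug}`. Here is a rough sketch of minting URIs that way. The slug rule (lower-case, keep only letters, digits and hyphens) is an assumption inferred from the examples rather than a documented rule, and the function names are hypothetical.

```python
import re

# The stem agreed in the post; the path layout mirrors the example URIs.
BASE = "http://data.lib.sussex.ac.uk/archive"


def slugify(label: str) -> str:
    """Lower-case a label and keep only a-z, 0-9 and hyphens.

    ASSUMPTION: this rule is inferred from the example URIs, not documented.
    """
    return re.sub(r"[^a-z0-9-]", "", label.lower())


def mint_uri(kind: str, source: str, label: str, doc: bool = False) -> str:
    """Build BASE/{id|doc}/{kind}/{source}/{slug}."""
    prefix = "doc" if doc else "id"
    return f"{BASE}/{prefix}/{kind}/{source}/{slugify(label)}"


print(mint_uri("place", "moa", "Blackpool"))
print(mint_uri("concept", "moa", "Advertising", doc=True))
```

Given a hypothetical input such as ‘Churchill, Sir Winston Leonard Spencer, 1874-1965, Knight, Prime Minister and Historian’, `slugify` reproduces the Churchill slug used above. One side effect of this rule: accented characters are dropped rather than transliterated, so a name like ‘Brontë’ would lose a letter.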

Musings about URIs

Wednesday, April 20th, 2011

Choosing a base for our URIs. Easy, right? The task was recently allocated to me. Should take all of 5 minutes, and then I can sit back and sip my coffee at a job well done. Simples.

Annoyingly, not quite yet.

First thing: the URIs will resolve to an actual web server. We’ve got loads of servers, hostnames and aliases (CNAMEs), but which to use? We need a server and hostname that will be stable and permanent. With services changing rapidly, servers being consolidated and a move towards hosted (‘cloud’) services, what’s best to use?

Two potential base URI options:

  • data.lib.sussex.ac.uk
  • www.sussex.ac.uk/library/

The former was my immediate first choice: it fits in with the common naming practice ‘data.organisation.tld’ (admittedly with ‘lib’ in the middle; I don’t think we are ready to roll out an institutional data service just yet).

The latter was a consideration as it built on an already known and trusted URI on an institutionally embedded service: our University website (and its supporting infrastructure). Both URL and service are going to be around for the foreseeable future. What’s more, they don’t require the Library to maintain any additional infrastructure. However, this option doesn’t fit the common convention and might clash with other Library URLs. And there’s a risk: if the University moved to a new Content Management System, it might break our URIs, especially if the CMS required full control of the ‘www.sussex.ac.uk’ namespace. Plus, it just doesn’t look cool.

So our current thinking is http://data.lib.sussex.ac.uk/. We can run it off a server here in the Library, which runs Apache and a number of other undemanding web services (wikis etc.). This does require the Library to maintain it, which, to be blunt, might be an issue if I leave. But there is nothing stopping us from working with our IT Services and moving data.lib.sussex.ac.uk to a centrally run (or even third-party hosted) server in the future.

Second issue…

Do we need to create a ‘Mass Observation’ name space under http://data.lib.sussex.ac.uk/, e.g. http://data.lib.sussex.ac.uk/moa/ ?

In a nutshell, keeping it as http://data.lib.sussex.ac.uk/ keeps the URIs simple and allows us to merge other datasets into the same ‘pool’ (I don’t think pool is part of the Linked Data vocabulary, but never mind).

However the risk is that should we wish to create more Linked Data sets in the future, whether for the Library Catalogue, the Institutional Repository or other Special Collections, how can we be sure the various identifiers, names and reference numbers will not clash between the different datasets? Will a Library Catalogue and Archive metadata be strange bedfellows?

I’ve been discussing this with Pete Johnston from Eduserv who has provided a lot of advice and things to consider. An example which came up in our discussions was:

http://data.lib.sussex.ac.uk/id/person/nra/churchillsirwinstonleonardspencer1874-1965knightprimeministerandhistorian

Would it not be desirable to have the above as a URI which could provide a description (of Winston Churchill) and links to both the MOA, other archives and mentions in the Library Catalogue, all from one URI ID?

My ignorance in this area is high, but my understanding is that these URIs will probably serve up information held elsewhere (i.e. in what will hopefully soon be our Talis Platform Store) and present it. Will having one name space confuse things, as the URI will need to fetch data from potentially various stores and sources to present to the requester (human, computer or otherwise)?
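To make the worry concrete: with one name space, a request for a single URI may have to pull statements from several stores. The sketch below gathers every statement in which a URI appears as subject or object, noting which store each came from. The store names, the book URI and the triples are all hypothetical, URIs and literals are simplified to plain strings, and real stores would be queried (for example over SPARQL) rather than held in Python lists.

```python
# Minimal sketch: collect everything said about one URI across several stores.
# ASSUMPTION: store names, the book URI and all triples are hypothetical.
Triple = tuple[str, str, str]

CHURCHILL = ("http://data.lib.sussex.ac.uk/id/person/nra/"
             "churchillsirwinstonleonardspencer1874-1965knightprimeministerandhistorian")
BOOK = "http://data.lib.sussex.ac.uk/id/book/example"  # hypothetical

stores: dict[str, list[Triple]] = {
    "archive": [(CHURCHILL, "http://xmlns.com/foaf/0.1/name", "Winston Churchill")],
    "catalogue": [
        (BOOK, "http://purl.org/dc/terms/subject", CHURCHILL),
        (BOOK, "http://purl.org/dc/terms/title", "An example title"),
    ],
}


def describe(uri: str, stores: dict[str, list[Triple]]) -> list[tuple[str, Triple]]:
    """Return (store name, triple) pairs where `uri` is the subject or object."""
    found = []
    for name, triples in stores.items():
        for subject, predicate, obj in triples:
            if uri in (subject, obj):
                found.append((name, (subject, predicate, obj)))
    return found


for store_name, triple in describe(CHURCHILL, stores):
    print(store_name, triple)
```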

Perhaps another option is to keep to one namespace, but separate it out into collections further ‘down’, e.g.

http://data.lib.sussex.ac.uk/id/document/moa/1234

http://data.lib.sussex.ac.uk/id/document/anotherarchive/1234

Is that an option, or will it break conventions?
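The collection-segment option can be checked mechanically: mint URIs for the same reference number in two collections and see whether they collide. This is a minimal sketch; ‘moa’, ‘anotherarchive’ and ‘1234’ are the placeholders from the examples above, and the function names are hypothetical.

```python
# Minimal sketch: do reference numbers from different collections collide?
# ASSUMPTION: records are (collection, reference) pairs; real data would come
# from each collection's catalogue.
from collections import defaultdict

BASE = "http://data.lib.sussex.ac.uk"


def document_uri(collection: str, ref: str, with_collection: bool) -> str:
    """Mint a document URI, with or without a per-collection path segment."""
    if with_collection:
        return f"{BASE}/id/document/{collection}/{ref}"
    return f"{BASE}/id/document/{ref}"


def find_clashes(records, with_collection: bool) -> dict[str, list[str]]:
    """Map each URI minted for more than one collection to those collections."""
    owners: dict[str, list[str]] = defaultdict(list)
    for collection, ref in records:
        owners[document_uri(collection, ref, with_collection)].append(collection)
    return {uri: cols for uri, cols in owners.items() if len(cols) > 1}


records = [("moa", "1234"), ("anotherarchive", "1234")]
print(find_clashes(records, with_collection=False))
# {'http://data.lib.sussex.ac.uk/id/document/1234': ['moa', 'anotherarchive']}
print(find_clashes(records, with_collection=True))
# {}
```

The trade-off is visible in the output: the flat layout gives shorter URIs but lets two archives claim the same identifier, while the extra segment keeps them apart at the cost of one more level in every URI.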

I should say at this point that we are using Designing URI Sets for the UK Public Sector as a guide for creating URIs and are trying to stick to their guidelines as much as possible.
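As I understand it, that guidance separates identifier URIs (under `/id/`), which name the thing itself, from document URIs (under `/doc/`) that describe it, with the server answering a request for an identifier with a 303 See Other redirect to the document. A minimal sketch of that redirect as a plain Python WSGI app follows. The simple ‘/id/’ to ‘/doc/’ swap is an assumption, and in practice the same thing might equally be done with Apache rewrite rules on the Library server.

```python
# Minimal sketch of the id -> doc redirect pattern.
# ASSUMPTION: a plain WSGI app and a simple "/id/" -> "/doc/" path swap.
from typing import Optional


def resolve(path: str) -> tuple[str, Optional[str]]:
    """Return (status, redirect target) for a request path."""
    if "/id/" in path:
        return "303 See Other", path.replace("/id/", "/doc/", 1)
    return "200 OK", None


def app(environ, start_response):
    """WSGI app: redirect identifier URIs, serve a stub for every other path."""
    status, location = resolve(environ.get("PATH_INFO", "/"))
    if location is not None:
        start_response(status, [("Location", location)])
        return [b""]
    start_response(status, [("Content-Type", "text/plain; charset=utf-8")])
    return [b"A description of the thing would be served here.\n"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the app on port 8000; a request for `/archive/id/place/moa/blackpool` should then come back as a 303 pointing at `/archive/doc/place/moa/blackpool`.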

However, I am torn between the grand, right(?), more technical nirvana of one name space and the less risky approach of keeping the MOA data in its own name space silo (you can’t have an open data blog post without the s word).

The problem ultimately is that I am still so very new to this I find it hard to think about what the issues may be, or what the right and wrong approaches are.

So, I welcome your expertise, thoughts and insights. What would you recommend? What are we not thinking about that we should? What problems are we making for ourselves down the line? What is the right approach? And (building on that last question) what is the right approach considering our somewhat limited resources and time?

So, ladies and gentlemen, your thoughts please? Please.