Choosing a base for our URIs. Easy, right? The task was recently allocated to me. It should take all of 5 minutes, and then I can sit back and sip my coffee at a job well done. Simples.
Annoyingly, not quite yet.
First thing: the URIs will resolve to an actual web server. We’ve got loads of servers, hostnames and aliases (CNAMEs), but which to use? We need a server and hostname that will be stable and permanent. In this rapidly changing world of shifting services, consolidation of servers and a move towards hosted services (that cloud stuff), what’s best to use?
Two potential base URI options:
- data.lib.sussex.ac.uk
- www.sussex.ac.uk/library/
The former was my immediate first choice: it fits in with the common naming practice ‘data.organisation.tld’ (admittedly with ‘lib’ in the middle, as I don’t think we are ready to roll out an institutional data service just yet).
The latter was a consideration as it builds on an already known and trusted URI on an institutionally embedded service: our University website (and corresponding infrastructure). Both URL and service are going to be around for the foreseeable future. What’s more, they don’t require the Library to maintain any additional infrastructure. However, it doesn’t fit in with the common convention, and it might clash with other Library URLs. And there’s a risk: if the University moved to a new Content Management System it might break our URIs, especially if the CMS required full control of the ‘www.sussex.ac.uk’ namespace. Plus, it just doesn’t look cool.
So the current thinking is to go with http://data.lib.sussex.ac.uk/ as the base. We can run it off a server here in the Library, which runs Apache and a number of other undemanding web services (wikis etc). This does require the Library to maintain it, which, to be blunt, might be an issue if I leave. But there is nothing stopping us working with our IT Services and moving data.lib.sussex.ac.uk to a centrally run (or even third-party hosted) server in the future.
Second issue…
Do we need to create a ‘Mass Observation’ namespace under http://data.lib.sussex.ac.uk/ e.g. http://data.lib.sussex.ac.uk/moa/ ?
In a nutshell, keeping it as http://data.lib.sussex.ac.uk/ keeps the URI simple, and allows us to merge other datasets into the same ‘pool’ (I don’t think pool is part of the Linked Data vocabulary but never mind).
However, the risk is that should we wish to create more Linked Data sets in the future, whether for the Library Catalogue, the Institutional Repository or other Special Collections, how can we be sure the various identifiers, names and reference numbers will not clash between the different datasets? Will Library Catalogue and Archive metadata be strange bedfellows?
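To make the clash risk concrete, here is a rough Python sketch. It assumes, purely for illustration, that each dataset mints URIs by appending its own local reference number straight onto the base; the dataset names and identifiers are made up, and flat_uri and find_clashes are hypothetical helpers, not anything we have built.

```python
from collections import defaultdict

BASE = "http://data.lib.sussex.ac.uk/"  # the base URI under consideration


def flat_uri(local_id):
    """Mint a URI directly under the base, with no per-collection segment."""
    return BASE + local_id


def find_clashes(datasets):
    """Return every flat URI that more than one dataset would claim."""
    claimed = defaultdict(list)
    for name, local_ids in datasets.items():
        for local_id in local_ids:
            claimed[flat_uri(local_id)].append(name)
    return {uri: names for uri, names in claimed.items() if len(names) > 1}


# Hypothetical reference numbers, invented for this example only.
datasets = {
    "moa": ["1234", "1235"],
    "catalogue": ["1234", "9001"],
}

clashes = find_clashes(datasets)
for uri, names in clashes.items():
    print(uri, "is claimed by", ", ".join(names))
```

With these made-up identifiers, reference 1234 exists in both datasets, so both would lay claim to the same URI in a single flat namespace.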
I’ve been discussing this with Pete Johnston from Eduserv who has provided a lot of advice and things to consider. An example which came up in our discussions was:
Would it not be desirable to have the above as a URI which could provide a description (of Winston Churchill) and links to both the MOA, other archives and mentions in the Library Catalogue, all from one URI ID?
My ignorance in this area is high, but my understanding is that these URIs will probably serve up information from elsewhere (i.e. our hopefully soon to be Talis Platform Store) and present it. Will having one namespace confuse things, given that a single URI may need to fetch data from various stores and sources to present to the requester (human, computer or otherwise)?
Perhaps another option is to keep to one namespace, but separate it out into collections further ‘down’ e.g.
http://data.lib.sussex.ac.uk/id/document/moa/1234
http://data.lib.sussex.ac.uk/id/document/anotherarchive/1234
Is that an option, or will it break conventions?
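As a rough sketch of how that pattern might work in practice, the Python below builds and unpicks URIs of the form /id/{concept}/{collection}/{reference}. The segment names (concept, collection, reference) and the mint_uri and parse_uri helpers are my own labels for illustration; they are an assumption about how we might slice things up, not something taken from any guideline.

```python
from urllib.parse import quote, urlparse

BASE = "http://data.lib.sussex.ac.uk"  # the base URI under consideration


def mint_uri(concept, collection, reference):
    """Build an identifier URI scoped by concept type and collection."""
    # Percent-encode each segment so a stray slash or space cannot alter the path.
    parts = [quote(part, safe="") for part in (concept, collection, reference)]
    return BASE + "/id/" + "/".join(parts)


def parse_uri(uri):
    """Split an identifier URI back into its three labelled segments."""
    segments = urlparse(uri).path.strip("/").split("/")
    if len(segments) != 4 or segments[0] != "id":
        raise ValueError("not an identifier URI of the expected shape: " + uri)
    _, concept, collection, reference = segments
    return {"concept": concept, "collection": collection, "reference": reference}


moa_doc = mint_uri("document", "moa", "1234")
other_doc = mint_uri("document", "anotherarchive", "1234")

print(moa_doc)
print(other_doc)
print(parse_uri(moa_doc))
```

The same reference number now produces two distinct URIs because the collection segment keeps them apart, while everything still lives under the one hostname.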
I should say at this point that we are using Designing URI Sets for the UK Public Sector as a guide for creating URIs and are trying to stick to their guidelines as much as possible.
However, I am torn between the grand, right(?), more technical nirvana of one namespace and the less risky approach of keeping the MOA data in its own namespace silo (you can’t have an open data blog post without the s word).
The problem ultimately is that I am still so very new to this I find it hard to think about what the issues may be, or what the right and wrong approaches are.
So, I welcome your expertise, thoughts and insights. What would you recommend? What are we not thinking about that we should? What problems are we making for ourselves down the line? What is the right approach? And (building on that last question) what is the right approach considering our somewhat limited resources and time?
So ladies and gentlemen, your thoughts please? Please.