Following in our footsteps

Wednesday, July 6th, 2011

Question: If others wanted to take a similar approach to your project, what advice would you give them?

Our advice at the start would be:

1. Get your data ready. We are working on our catalogue data to make it more structured, so that we can export it to other formats and make it more portable. Regardless of whether it becomes Linked Data in the future, we are getting ourselves ready. This is also probably the most time-consuming aspect. From personal experience, once you start looking at your catalogue data, you’ll find lots of things that you want to change, or that are missing or don’t make sense, so the work starts to grow…

2. Are you in a position to licence your data? We chose the catalogue data of the Mass Observation Archive as we were confident of its provenance, so we could make it fully open and available under ODC-PDDL. This will hopefully allow the greatest flexibility for people wanting to use the data, and it fits with the ethos of the project and the JISC Discovery strand.

3. Find out about other similar projects! We at SALDA realise the value of these blog posts to anyone wanting to do a similar project. We followed in the footsteps of the LOCAH project and were able to use their stylesheet and experience in transforming archival data into Linked Data. We are working with Pete Johnston from Eduserv, whose knowledge and experience are invaluable. You can see his contribution to the blog here.

4. Find examples of Linked Data in use, in human-readable format, so that you can show stakeholders, colleagues and friends what it is that you are on about. I use the BBC wildlife pages and how they link to Animal Diversity Web.

Licencing part 2

Monday, April 11th, 2011

Thanks to Alexandra and Owen for their thoughts, and please see Chris Keene’s comment also. This issue was never going to be straightforward, and any discussion about licencing makes people edgy. Perhaps not the licence itself, but the idea that someone could use someone else’s work without asking or crediting them. As Alexandra says, this is possibly a separate issue, but it seems that by using CC-BY licences people are hedging their bets: you can use it but you have to say where you got it from, which is perfectly reasonable. There is also uncertainty about whether they have the right to licence the data in the first place.

I’ve discussed this with Fiona and she makes the point that in an academic context, we tell people every day how to reference the materials they are using and so attribution of data is in our very core. Making our catalogue records available as Linked Open Data with no insistence on attribution is contrary to what we do every day. However, naïve as it may sound, though we don’t insist that people attribute in this case, that’s not to say that they won’t.

We know we are diving in there and taking a risk. We don’t know how the data could be used in the future or what impact that will have. But someone has got to take that risk. We are confident that we are able to licence the data for use in the first place, and we want to take the most open road. We don’t do it lightly.

It is perhaps the conflict between archivists and developers. As archivists we are naturally cautious and, as I said earlier, make attribution a key part of our work. Developers/technicians are much more used to putting things out there as open source (I’m assuming). Would any developers like to comment?

Licencing our data

Tuesday, April 5th, 2011

We have decided to use an Open Data Commons Public Domain Dedication and Licence (PDDL) to licence our data once it is open and on the Talis Platform.

Key points of PDDL

  1. Recommended by JISC for collections of factual data
  2. Goal is to eliminate restrictions on the use of data so it can be used for any purpose, including commercial use and in combination with other data
  3. There is no requirement to attribute the source of the data
  4. The Licence makes the work (in our case the catalogue records of the Mass Observation Archive) permanently available to the public for any use of any kind.
    Point 4 is the scary bit but also the main point of getting the data out there, and we are lucky to be sure of its ownership and copyright.

Why we chose PDDL

PDDL is the standard for collections of non-personal factual data, which is what the catalogue records of the Mass Observation Archive are. The assumption is that we own the rights to this data, as the original creators were employees of the University of Sussex, so we are free to licence it.

JISC cautions against putting variants in licences for special requirements, for example no use of images, as “The introduction of variant terms into a ‘Creative Commons-like licence’ from a single institution may require those potential beneficiaries to pay for legal advice in order to understand the implications of the variation. The value of seeing and understanding a single licence across the web is lost, as every minor variation encountered increases the likelihood that the different licences will conflict when combined in some third party use case” (JISC rights and licencing). I don’t believe a variant is necessary, as it is a collection of factual data.