This is to add to Pete’s post on the transformation of our data. The SALDA project is really searching for a framework or a set of tools to enable us to transform our other archive collections into Linked Data. What we have discovered so far is that there is a model that we can apply to the data, based on the LOCAH model, but there is some local tweaking that needs to be done due to the structure of our data.
Prior to 2009, our catalogue data was in HTML lists on our website or printed lists in our reading room. We imported the catalogues into our CALM database in summer 2009, and most information went into the title field. This meant that when we then exported the data to EAD there were no separate fields for date or description. I then revealed to Pete, with my head in my hands, that we don’t use access points, which are what the LOCAH process was based around. Pete was optimistic in his outlook, saying that there were good points about our data to focus on:
- it was consistent in that it was all from one data provider
- it was consistent in the format of the date and where it appeared in the data (albeit not in the date field)
We then decided to think about other ways into the data. I provided Pete with 28 names from the data, in authorised form following the National Register of Archives rules. I was able to confirm that these were definitely those people, so when it says “Churchill” in the data, it is:
Churchill, Sir Winston Leonard Spencer (1874-1965) Knight, prime minister and historian
not Churchill Insurance, Churchill College, etc.
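The idea of tying a name as it appears in the data to a single authorised form can be sketched as a simple lookup. This is only an illustration, not the process Pete actually used: the table `AUTHORISED_NAMES`, the function `find_authorised_names`, and the sample title strings are all hypothetical, and a real list would hold all 28 names.

```python
import re

# Hypothetical lookup table: the name as it appears in the catalogue text,
# mapped to its authorised form. Only one entry is shown here.
AUTHORISED_NAMES = {
    "Churchill": (
        "Churchill, Sir Winston Leonard Spencer (1874-1965) "
        "Knight, prime minister and historian"
    ),
}


def find_authorised_names(title):
    """Return the authorised forms of any known names found in a title string."""
    found = []
    for surface, authorised in AUTHORISED_NAMES.items():
        # Match whole words only, so "Churchill" does not match "Churchillian".
        if re.search(r"\b" + re.escape(surface) + r"\b", title):
            found.append(authorised)
    return found


# Made-up title strings, standing in for the contents of the title field.
print(find_authorised_names("Reactions to Churchill's broadcast"))
print(find_authorised_names("Churchillian rhetoric"))
```

A lookup like this is only safe because each name was confirmed by hand against the collection; on data from more than one provider, the same surname could easily refer to different people or organisations.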
I also provided 100 or so keywords that appeared in the data. These covered subjects from air raids to sex, and included places, organisations (Labour Party, Communist Party), events (the Coronation in 1953) and wider concepts like class, family, education and death.
Future proofing our data
Realising the limitations of our data as it stands in our archival management system has made the team at Sussex really look at how we catalogue things. We need to future proof our data so that we can export it, transform it or map it across to other systems more easily. We are compiling cataloguing guidelines to ensure that all our collection level records are ISAD(G) compatible and that certain fields are always populated in our component records. This is not a small change and it will take a long time to modify 67,000 existing records. This has been an unexpected by-product of the SALDA project and one that we can’t ignore.