Technology in the Archives: Some Principles

Nash Sibanda

22 February 2019

Archival research is, as anyone likely to be reading this blog will know, a finicky business.

It requires a certain discipline, and a certain flexibility. It requires an ability to dip and dive, and swim amongst the records. To know what is of use, and what is not. To know when you are attempting to convince the data to lead places it has no intention to go. The researcher must be both navigator and follower; often by turns though sometimes in concurrence. One must be organised, and must bring order to data that often refuses to be tamed. One must marshal all faculties at one’s disposal yet stop short of being overwhelmed. A researcher is at times a juggler, and at others a tasseographer. A researcher knows the heart of their work is not in bowling strikes, but in setting up the pins.

Last year, I completed my PhD. I had spent the previous three and a half years examining the coming of sound to Britain in the 1920s and 1930s. I’d been fortunate to work alongside excellent academics, who showed me the ropes and made it easy for me to rig the sails of my own project. My dissertation’s structure was also its thesis – a funnel, from the wide to the narrow. I wanted to see how sound came to Britain by taking the broad national picture and narrowing down through a series of case studies to a single cinema. To individual screenings and an individual audience.

As such, my archival sources were varied. All historical research involves varied archival sources. This is not an attempt to claim superior scope or ambition. I don’t believe any of my methods were so unique as to revolutionise the field. Rather, I believe my work was a useful way to explore a range of ways of interacting with archives using technology. You may ask why I presented them as a set of principles, especially coming from a super-duper early career researcher such as myself. The answer is that the submission guidelines didn’t specifically say that I couldn’t.

1. Archives are mines. Take as much as you can.

Archival material, for the historian, is almost always visual, if not purely textual. Archival research is the process of filtering through to find words written in times when people knew first-hand what has since been lost. To know what was thought, often we can only try to learn what was written. The words and images are immutable, imprinted on sacred, primary objects. The care with which archivists store and provide them is vital.

Yet the medium by which we the researchers must spend our time with them is fungible.
I have worked with researchers who take copious, meticulous notes in the archives. They transcribe pertinent quotes. They summarise and paraphrase and chronicle their journeys through the records with great care and detail. They pore over the material before them, picking away and collecting in their notebooks any precious gems that may come loose. They spend hours and hours and hours in there.

I have opted instead to exchange the pick-axe for dynamite. I want to take as much from the archives as I can, in as few trips as possible. Others may find the archives a stimulating or serene environment. I would much rather languish at home in my slippers. My tools are a digital camera – always my mobile phone – and a photography pass, usually acquired for a small fee from the front desk. Some archives are content to give the intrepid photographer free rein. Others are stingier, imposing quotas on how much one can photograph in any given visit. I photograph everything that might look useful, or relevant. Or not especially useful or relevant, but interesting enough for me to want to have a second look later. I make a small note for each photograph I take – more on these in the second principle – and move onwards. I take hundreds and hundreds of photographs. Across subsequent archive trips, these number in the thousands.

In this way, one leaves the archives without the rich notes and ideas that might come from a slower and more deliberate communion with the material. This work must still be done. Rather than leaving the mine with a handful of diamonds, you leave with a truck full of rocks. You have yet to extract the gems, but you’ve seen their faint glimmer. At least you can do so now at your own pace, at home or in your office. You have a facsimile of the original material, rather than having to rely on notes made at the time of your visit. For the researcher unsure what will end up actually being useful, nothing provides peace of mind like having one’s own, personal archive.

2. Format will save your life. Ensure stable digital footing.

Before you embark on any endeavour involving technology, you must inform your technology of what it is you hope to achieve. A diary is a book of blank pages, yet you happily – gratefully – pay extra for the dates written in the corners. You could do it yourself every day, quite easily, but it feels better that it’s there for you. Data and information love format. The digitally inclined researcher must also wield it as a crucial tool.

Fundamentals are key. If you use a standalone camera, rather than your camera-phone, you must ensure that its time and date settings are as accurate as you can make them. This will ensure that you know on which trips any photographs were taken. It will save you when automatic naming conventions break down (did I take IMG_2033.jpg from my phone before or after IMG_8949.jpg from my camera?). File names are changeable; file sizes are unpredictable; and the file types are all the same. Time is constant. Pay no mind to what the physicists may say about such a statement; this is none of their business. By all means make use of whatever cameras and scanners may be available to you in the archives. Entrust the business of formatting to no-one but yourself.
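The point about timestamps can be sketched in Python. This is only an illustration under stated assumptions, not the workflow described here: it assumes a file's modification time still reflects when the photograph was taken (copying files between devices can reset it, in which case the camera's embedded capture time would be the better source), and it treats one calendar day as one archive trip. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def capture_time(path):
    """Return a file's modification time, assumed here to be the capture time."""
    return datetime.fromtimestamp(Path(path).stat().st_mtime)


def group_by_trip(photos):
    """Group (filename, datetime) pairs by calendar day, oldest first.

    Treating one calendar day as one archive trip is an assumption
    made purely for illustration.
    """
    trips = defaultdict(list)
    for name, taken in sorted(photos, key=lambda pair: pair[1]):
        trips[taken.date()].append(name)
    return dict(trips)
```

Called on pairs built from `capture_time`, this interleaves files from a phone and a standalone camera in the order they were actually taken, whatever their file names say.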

Choose the home for your archive notes, and understand their function. These are not the notes you will make when you build arguments, test theoretical frameworks, or challenge knowledge. These notes are your new catalogue. You have gone to the archives to build your own, digital archive. Decide how best to organise these materials to suit your own purposes. I’ll provide an example:

I spent a lot of time looking at The Bioscope, a trade journal from the silent and early sound period of British cinema. I wanted to trace the coming of sound through its pages. As expected, sound was a concern for vast sections of the journal for many, many issues. At times, entire issues seemed potentially useful. The journal had yet to be professionally digitised.[i] Complete runs only exist in a small handful of institutions around the country, including the British Library. My method was to flick through issues, scan for anything that seemed relevant, and photograph those pages.

Always photograph entire pages. Never photograph individual segments. You want page numbers, you want issue dates. More so, you want the opportunity to find the nearby contextualising material that you didn’t notice in the archive.

If I saw something useful, I made a note in an Excel spreadsheet. My sheet had the following columns:

  • DATE: The date of the issue for the item.
  • PAGE: The page in the issue for the item.
  • ARTICLE TITLE/DESCRIPTION: The title of the item, along with a short description in [square brackets] as needed.
  • CATEGORIES: From a pre-defined master list, a comma-separated list of categories that apply to the item.
  • NOTES: Any relevant meta-information (such as whether the issue is a special edition, or a supplement; or changes in the magazine’s layout).
  • ID: A unique numerical ID for each item.

Everything in its place, and everything accounted for. In this way, I created a sheet of over three thousand Bioscope entries.
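For readers who prefer code to Excel, the same catalogue structure might look something like the sketch below. The field names mirror the columns listed above; the date format, the class name and both helper functions are assumptions made for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CatalogueEntry:
    """One catalogue row; fields follow the spreadsheet columns."""
    date: str        # issue date (ISO format YYYY-MM-DD is an assumption)
    page: int        # page in the issue
    title: str       # article title, with a short [description] as needed
    categories: str  # comma-separated, drawn from a master list
    notes: str       # meta-information about the issue
    id: int          # unique numerical ID


def unknown_categories(entry, master_list):
    """Return categories on the entry that are missing from the master list."""
    tagged = [c.strip() for c in entry.categories.split(",") if c.strip()]
    return [c for c in tagged if c not in master_list]


def to_csv_text(entries):
    """Serialise entries as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    names = [f.name for f in fields(CatalogueEntry)]
    writer = csv.DictWriter(buffer, fieldnames=names)
    writer.writeheader()
    writer.writerows(asdict(e) for e in entries)
    return buffer.getvalue()
```

Checking categories against the master list at entry time is the sort of small formatting discipline that keeps later filtering reliable.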

3. Keywords are everything. Make everything a keyword.

Why is something useful? What makes something worth keeping? That’s a question only the researcher can answer about their own work. If you pull a document from a folder and lay it flat, skim its title and feel that familiar small electric thrill of discovery, you will likely know why. A record may provide proof of a small theory or upset earlier assumptions; either way, it tells part of a story. Be sure to track these parts, and shepherd them well.

A research project is rarely one thing. It is a collection of things. It is not a single reed; it is a woven basket. Since the weaving takes years, it is important to keep track of the reeds.

In The Bioscope I came across a series of entries related to projectionists, operators and the Electrical Trades Union (ETU). Some of these articles were broad and wide-ranging; some were intensely local. Regardless, I added “ETU” in the Description field of each. If I ever came across an article related to a key city I was tracking, I would add that city’s name. Same for companies, individuals, themes, whatever. Any reed worth keeping track of.

These become keywords. Excel lets you filter entries based on the presence (or absence) of specified words or phrases. A filter for “ETU” returns all 68 items related to that theme, in chronological order. An added filter for “Liverpool” narrows focus further to seven entries, between October and December 1930. They track the introduction of new local policy, the threat of industrial action and an actual operators’ strike. This fell outside of the final remit for my dissertation; until the writing of this article I had not even noticed this story. There are several hundred other items between each of these entries. It is easily missed. Yet the keywords allow a question to be asked of the data, and the data to easily answer. I have no doubt that a worthwhile piece of research can be written about the regional ETU. I have no doubt that I already have enough material to make a strong start. Keywords are everything.
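The same kind of stacked filter can be sketched outside Excel. The snippet below is a minimal illustration, assuming each entry is a dictionary with "date" (an ISO-format string) and "description" keys; those key names and the function names are hypothetical.

```python
import re


def has_keyword(text, keyword):
    """True if the keyword appears in the text as a whole word, ignoring case."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def filter_entries(entries, *keywords):
    """Return entries whose description contains every keyword, in date order."""
    matches = [
        entry for entry in entries
        if all(has_keyword(entry["description"], k) for k in keywords)
    ]
    return sorted(matches, key=lambda entry: entry["date"])
```

Calling `filter_entries(entries, "ETU")` and then `filter_entries(entries, "ETU", "Liverpool")` mirrors the two-step narrowing described above. Matching whole words rather than raw substrings is a deliberate choice here: a case-insensitive substring search for "ETU" would also match the word "return".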

4. This is the information age. Omit needless work.

A research project is a unique undertaking, but many projects use common resources. Hundreds of writers can use the same inkwell. Data is a raw material and can be refined in myriad ways. It is also infinitely reproducible. Use these qualities to your advantage.

As part of my research project, I wanted to digitise the programming records of the Tudor cinema in Leicester during the transition years. A wonderful set of programming records exists for the cinema. They show the daily takings, programming, and even weather reports for each day. It’s a huge amount of information. The numerical data was easy enough to deal with. I created another Excel spreadsheet, and entered the numbers in a sheet entitled “Attendance and Takings”. In a trance-like fashion, I made entries for each of the 3,765 screenings between 1925 and 1932. It was slow work, but the march of progress was clearly visible and thus satisfying. Leveraging the power of the computer to analyse these figures – to make sense of the pounds, shillings and pence – was more satisfying still.
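Pre-decimal currency is awkward to sum or average directly, so one common approach is to convert every amount to pence first (12 pence to the shilling, 20 shillings to the pound). The sketch below shows that conversion in Python; it is an illustration of the arithmetic rather than the spreadsheet formulas actually used, and the function names are hypothetical.

```python
PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20
PENCE_PER_POUND = PENCE_PER_SHILLING * SHILLINGS_PER_POUND  # 240


def to_pence(pounds=0, shillings=0, pence=0):
    """Convert a pounds/shillings/pence amount to a single number of pence."""
    return pounds * PENCE_PER_POUND + shillings * PENCE_PER_SHILLING + pence


def from_pence(total):
    """Split a number of pence back into (pounds, shillings, pence)."""
    pounds, remainder = divmod(total, PENCE_PER_POUND)
    shillings, pence = divmod(remainder, PENCE_PER_SHILLING)
    return pounds, shillings, pence


def format_lsd(total):
    """Render a number of pence in the conventional pounds, shillings, pence style."""
    pounds, shillings, pence = from_pence(total)
    return f"£{pounds} {shillings}s. {pence}d."
```

Storing takings as whole pence keeps totals and averages exact, and the amounts can be converted back for display only when needed.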

Digitising the qualitative material was a different task. Week by week, the cinema’s programmes had been entered in the ledger. The information was patchy, and inconsistently noted. Handwriting quality ranged from the precise to the sloppy. Several entries bordered on illegible. The most pervasive issue was the frequent mismatch of titles between the ledger and the trade press. Detective work was constantly necessary. Fortunately, I had collected digital images of the relevant trade journal pages I’d need in my personal digital archive. Every week at the Tudor saw two different programmes. The first ran from Monday to Wednesday, another from Thursday to Saturday. Some notable films played the whole week. For the 3,765 screenings in the database, this meant over 800 different programmes. I only completed 460, which served the immediate needs of my project. Each programme required entries into any number of forty different fields.

This provided a lot of information, sortable and ready for analysis, yet it felt incomplete. I had film titles, but no genres, no personnel or studio information. Filling in this information would have been prohibitively cumbersome. Thankfully, the internet is full of otherwise cumbersome information. As I compiled the spreadsheet, I found and added each film to a list on the Internet Movie Database (IMDB). It is a testament to human ambition that the database listed almost every single film. This included features, supporting material and even many of the news and magazine reels. There were cast and crew lists, studio and company information, even filming locations and technical details. The database is full of good stuff like this. Some information on the database is inaccurate, and some is incomplete. If I had better information than is currently available on IMDB, I thought it polite to update the entry.

The watchlist made reference easy. IMDB goes further, however. You can export a list as a CSV (comma-separated values) file and import it into yet another Excel spreadsheet. In this way you can redeploy much of this useful information offline. The work had already been done; it was now a case of taking it and using it.
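An exported CSV can also be read programmatically. The sketch below indexes the rows by title using Python's standard csv module. The column name "Title" is an assumption, so check the header row of the real export before relying on it; the function name is likewise hypothetical.

```python
import csv


def load_film_details(lines, title_column="Title"):
    """Index exported rows by lower-cased title.

    `lines` is any iterable of CSV text lines, such as an open file.
    The default column name is an assumption about the export's header.
    """
    details = {}
    for row in csv.DictReader(lines):
        key = row[title_column].strip().lower()
        details[key] = row
    return details
```

Opening the export with `open(path, newline="", encoding="utf-8")` and passing the file object to `load_film_details` gives a dictionary that can be looked up by film title. Lower-casing the key softens, but does not solve, the title mismatches between sources.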

5. Spreadsheets aren’t the project. Relationships are the project.

It is easy to run the risk of creating spreadsheet upon spreadsheet until one is buried in data with no way out. Yes, one should go into the archives and collect as much useful data as possible. However, it is important to have some sense of what these data have to say to each other, and about each other. You need a relational database.

Excel can do this, but it is complicated. Enough that I felt I wouldn’t be able to do everything I wanted without needing to learn a complex new language. I found the program excelled (as promised) in the structuring and formatting of data. I needed another tool that would take these spreadsheets and make them communicate. Thankfully, I found Tableau. Tableau is a data visualisation program. It allows the user to import different datasets in a variety of formats, and assign relationships between them. For example:

I can tell Tableau to link the date field from the “Attendance and Takings” sheet to the date field of the “Programmes” sheet. Now, the money taken by the box office is directly linked to specific films, and all the other information contained in both sheets. If I link the film title fields in the “Programmes” sheet to the titles in the IMDB sheet, I have further enriched the data. I can ask Tableau, “Who were the supporting female actors in the supporting feature playing on Wednesday August 15th, 1929?” Tableau will tell me there weren’t really any. MGM’s silent western The Rock of Friendship (aka Wyoming, 1928) had a female lead played by Dorothy Sebastian, but the cast was otherwise male. I could then ask how much money was made during the matinee performance on that day. Tableau would tell me £2, 9s./3d. All this information is in the archival records.
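The linking described above is, underneath, a relational join. As a rough illustration of the idea, and explicitly not of how Tableau works internally, the sketch below uses SQLite from Python's standard library instead. The table and column names are hypothetical, and the single row in each table is taken from the example in the text, with the takings of £2 9s. 3d. stored as 591 pence.

```python
import sqlite3

# In-memory database with three hypothetical tables, one per spreadsheet.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE takings    (date TEXT, performance TEXT, pence INTEGER);
    CREATE TABLE programmes (date TEXT, slot TEXT, title TEXT);
    CREATE TABLE films      (title TEXT, year INTEGER, female_lead TEXT);
""")
conn.execute("INSERT INTO takings VALUES ('1929-08-15', 'matinee', 591)")
conn.execute(
    "INSERT INTO programmes VALUES "
    "('1929-08-15', 'supporting feature', 'The Rock of Friendship')"
)
conn.execute(
    "INSERT INTO films VALUES ('The Rock of Friendship', 1928, 'Dorothy Sebastian')"
)

# Link takings to programmes by date, and programmes to films by title.
query = """
    SELECT p.title, f.female_lead, t.pence
    FROM programmes AS p
    JOIN takings AS t ON t.date = p.date
    JOIN films   AS f ON f.title = p.title
    WHERE p.date = ? AND p.slot = ? AND t.performance = ?
"""
rows = conn.execute(
    query, ("1929-08-15", "supporting feature", "matinee")
).fetchall()
```

Each `JOIN ... ON` clause plays the role of one relationship drawn between two sheets: once the shared field is declared, a single question can draw on all three sources at once.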

The bulk of this kind of research project is the collection and organisation of data. Analysis and writing up were culminating steps of a much longer process. I don’t mean to suggest that this process lacks analytical rigour. Rather, the effort to organise information makes analysing it faster, and more insightful. Have a look at the Entity Relationship Diagram for my Tudor project. It looks like a complicated mess. Every field is a wealth of data. Some I needed (in the case of the “Attendance” and “Programmes” sheets) to enter manually. Others I extracted from online resources. It is time-consuming to collate data, and flaws in organisation can cause huge delays if spotted too late. As principle two above suggests, format is paramount.

When the database and its relationships are complete, it can better answer those probing questions. I built an entire chapter of my thesis on the answers to questions posed to two cinema databases I created. Arranging the data into a personal digital archive was a time-consuming effort. Analysing the data and writing the chapters was a pleasant breeze. Further investigation will be equally pleasant and breezy. The digital archive is already there, waiting to answer more questions.

Final Thoughts

There are surely more, and better, principles for archival research. There are likely more useful and straightforward ways to begin using technology with primary sources. Methodology is important, and informs much of the direction of a project. What I did for mine will not work identically for yours. I hope that the specifics above are used to illustrate the general. We might read the leaves differently, but we both understand the importance of brewing a good cup of tea.

Find more information about my Tudor research here. Thank you for reading!

[i] In a cruel cosmic joke, much of The Bioscope has now been digitised and is fully searchable. It can be found at the British Newspaper Archive (BNA) website. I’d link directly, but I’m still upset that I took all those photos. My Excel spreadsheet is much more useful for my purposes than the BNA’s search function.

Nash Sibanda is a cinema historian. He received his PhD in Cinema History from De Montfort University in 2018. His research has focused on the coming of sound to British Cinemas. He currently lives and works in Japan, teaching English and writing when he can (though not as much as he should). His most recent publication is an article titled “The Silent Film Shortage”, found in the December 2018 issue of Music, Sound and the Moving Image.

