‘Is God a person?’ The questions and choices that arise when transcribing wartime letters

Published on 19 April 2023
In recent decades, more and more historical sources have been digitised. Digitising a text source involves not only scanning it, but also making its text machine-readable. These days, it is no longer necessary to retype a text manually in order to do this; instead, there are various technologies that use artificial intelligence to convert the text in a photo into machine-readable text. Optical Character Recognition (OCR) technology is used to transcribe printed and typed text (in books and newspapers, for example) on a large scale. In recent years, computers have also been trained to recognise handwritten text. This technology is known as Handwritten Text Recognition (HTR).
Example of an HTR transcript in Transkribus

In October 2022, as part of the project ‘First-hand Accounts of War’, we launched a crowdsourcing pilot in which a group of volunteers is working with us to transcribe and annotate wartime letters. The aim of the pilot is to explore the potential of using volunteers to help digitise wartime letters.

NIOD has considerable experience with crowdsourcing: in previous projects, such as ‘Paper Witnesses’ and ‘Adopt a diary’, we worked with large groups of volunteers to retype paper documents from the NIOD archive. In ‘First-hand Accounts of War’, we deliberately chose to work with a small group of volunteers. This allowed us to provide effective guidance and, conversely, it gave the volunteers the opportunity to contribute their ideas about how to approach the project. The volunteers are helping to transcribe the text in the letters and annotate the metadata.

Often defined as ‘data about data’, metadata consists of keywords that provide information about the linked material and summarise the content of a document. Providing metadata makes it easier for researchers and interested parties to browse documents. In our project, volunteers are creating metadata by adding easily retrievable labels to locations (e.g., ‘Herengracht’), people (e.g., ‘Eli Fresco’), geopolitical entities (e.g., ‘Germany’), organisations (e.g., ‘N.S.B.’) and dates (e.g., ‘17 September 1944’). We drew up some guidelines for annotating the documents, based on insights from the literature on metadata. One key principle that recurs in much of the metadata literature is that the staff who extract data from sources should not interpret these data themselves. Inconsistencies should be preserved in the process. Volunteers do not need to make decisions about varying spellings of names, for example; these are simply copied from the original into the metadata, including old Dutch spellings (e.g., ‘Duitschland’) and spelling errors.
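To make the ‘copy, do not interpret’ principle concrete, here is a minimal sketch in Python of how such a label could be stored. It is an illustration only and not the tooling used in the project: the names `Label`, `Annotation` and `annotate` are hypothetical, and the sample sentence is invented for the example rather than taken from a letter.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """The five label types mentioned in the text (hypothetical encoding)."""
    LOCATION = "location"
    PERSON = "person"
    GEOPOLITICAL_ENTITY = "geopolitical entity"
    ORGANISATION = "organisation"
    DATE = "date"


@dataclass(frozen=True)
class Annotation:
    label: Label
    text: str   # copied verbatim from the transcript, never normalised
    start: int  # character offset where the labelled span begins
    end: int    # character offset just past the labelled span


def annotate(transcript: str, surface: str, label: Label) -> Annotation:
    """Label the first occurrence of `surface` exactly as it was written."""
    start = transcript.find(surface)
    if start == -1:
        # Refuse to label a spelling that does not occur in the source.
        raise ValueError(f"{surface!r} does not occur in the transcript")
    end = start + len(surface)
    return Annotation(label, transcript[start:end], start, end)


# Invented sample sentence, not a quotation from the collection.
transcript = "On 17 September 1944 Eli Fresco wrote about Duitschland."

annotations = [
    annotate(transcript, "17 September 1944", Label.DATE),
    annotate(transcript, "Eli Fresco", Label.PERSON),
    annotate(transcript, "Duitschland", Label.GEOPOLITICAL_ENTITY),
]

for a in annotations:
    print(a.label.value, "->", a.text)
```

Because the label stores the writer's own spelling together with its position in the transcript, any later normalisation (for instance, linking ‘Duitschland’ to ‘Germany’) would remain a separate, explicit step rather than something decided silently at the moment of annotation.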

Our pilot has shown, however, that it is difficult to follow this strict line in practice when annotating information in a collection of ego documents, such as the wartime letters collection. The ways in which letter-writers describe situations, places, moments in time and other people in their letters, and the words they use for this, are deeply personal and differ from those used, say, in newspapers or political documents. A letter-writer might write about someone by referring to their name, but they might also write about ‘my father’. A place might be mentioned literally, but the writer might also write about ‘home’ or being at ‘grandma and grandpa’s’.

These data or entities in the text cannot in themselves be traced back to a specific place or person. There is also a great deal of variation within the letters: not only between letters from different writers, but also in the way in which a single writer refers to a particular person, date or location over time. We found that the guidelines we had drawn up beforehand could not always be applied. For example, one of the volunteers asked us whether he should label ‘God’ as a ‘person’. Our guideline states that a person should be identifiable, something that is not entirely clear in the case of God. The volunteer in question had to make a decision about this and thus interpret the source. Metadata is not existing information that simply needs to be found; rather, metadata is created and shaped at the moment it is collected.

In email exchanges with the volunteers, the project staff drew on their own expertise to discuss the technical and material aspects of the project, as well as substantive issues and aspects relating to archival science. There was an exchange with one volunteer about whether a letter from the fourteen-year-old Jan Tax had been written under adult supervision or not. Jan Tax was evacuated from Amsterdam to Gaast during the ‘Hunger Winter’ of 1944-1945. The question of whether he had been supervised while writing the letter is a relevant one: does this letter offer the perspective of a child in wartime, or that of an adult, merely written down by a fourteen-year-old? As this volunteer had also worked on letters from adults in Jan Tax’s immediate surroundings, he was able to compare them and make well-founded suggestions. Thanks to this exchange and the volunteer’s specific knowledge of the case, we were able to assume that Jan Tax had written and formulated these letters himself.

Letter from Jan Tax to his parents, dated 13 January 1945

The concepts of ‘crowdsourcing’ and ‘volunteer’ actually do a disservice to the volunteers who are working on the generation of transcripts in the ‘First-hand Accounts of War’ project. In practice, the volunteers in this pilot have become citizen scientists: people who undertake systematic work on source collections without a research question. They are developing expertise on many aspects and details of one or several specific source collections. In doing so, they offer a unique perspective on the sources that complements those of the archivist and the historian.

This blog was written by Muriël Bouman.

