British Library

Translating for a Digital Archive

The Qatar Digital Library

Since 2012, the British Library has been working with the Qatar Foundation and Qatar National Library to create and maintain the Qatar Digital Library. Launched in 2014, this free, bilingual portal hosts a growing archive of previously un-digitised material primarily from the BL’s collections. Focusing on content relevant to the history and culture of the Persian Gulf, items include India Office Records, maps, visual arts, sound and video, and personal papers. The portal also features selected Arabic scientific manuscripts. Alongside these items, the QDL also offers expert articles to help contextualise the collections.

As part of the BL’s translation team, I work to produce and edit the Arabic language content for the QDL. While the collection items themselves are displayed solely in their original language, all of the portal’s supporting and descriptive content is translated, as are the expert articles, meaning that the catalogue can be searched and used just as easily in Arabic as in English.

The bilinguality of the portal has been a key part of increasing the visibility and accessibility of the collections. Users of the QDL are just as likely to access the site in Arabic as they are in English, if not even more so: the most frequently visited individual page on the site is the Arabic homepage and users more often land on one of the Arabic pages than the English ones. Moreover, the terms users enter to search the collections are just as often written in Arabic as they are in English. Consequently, we have a responsibility to maintain the same high standards and make sure that all of the QDL’s features function equally well in both languages.

Our Toolbox: Translation management software

Like many large-scale translation projects, ours involves multiple translators, and several rounds of proofing and quality checks to ensure accuracy and consistency. To manage this, we use a piece of software called memoQ that includes two essential tools: a translation memory (TM) and a term base (TB). The TM functions as a bilingual database of previously translated segments of text; it works by storing pairs of original source-language content alongside its approved translation. When a new text is imported, memoQ breaks it into smaller segments on the basis of punctuation and line breaks, and automatically conducts a search for exact and partial matches. These are then presented to the translator for approval and/or review.

Caption: A segment in memoQ with an exact match (100%) in the TM

Caption: A segment in memoQ with a partial match (85%) in the TM

While a human expert still has the final say on whether to accept any suggestion from the TM, frequently only a minor edit is needed to make the old translation suitable for the new context. This serves the double purpose of saving time and maintaining consistency across the catalogue as a whole. Translation memories tend to prove their worth the larger they are and the more repetition there is in the content. Having grown over the years since the start of the project, our TM now routinely recognises a third of content in a new file, and often much more.
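
As a rough illustration of how a translation memory lookup behaves, here is a minimal Python sketch. It is not how memoQ works internally: its scoring is not described here, so the similarity measure from Python's standard difflib module is used as a stand-in, and the segments and placeholder translations are invented for the example.

```python
from difflib import SequenceMatcher

# Hypothetical memory: English source segment -> placeholder for its approved Arabic translation.
memory = {
    "The file contains correspondence.": "<approved Arabic translation 1>",
    "The volume includes a table of contents.": "<approved Arabic translation 2>",
}

def best_match(segment, memory):
    """Return (score, source, target) for the stored source segment closest to `segment`."""
    best = None
    for source, target in memory.items():
        # difflib's ratio is only a stand-in for a real fuzzy-match score.
        score = round(SequenceMatcher(None, segment, source).ratio() * 100)
        if best is None or score > best[0]:
            best = (score, source, target)
    return best

# New text, already broken into segments.
segments = [
    "The file contains correspondence.",
    "The volume includes a list of contents.",
]

for seg in segments:
    score, source, target = best_match(seg, memory)
    kind = "exact" if score == 100 else "partial"
    print(f"{score}% ({kind} match) for {seg!r}: suggest {target}")
```

The first segment comes back as an exact match, while the second is offered as a partial match against the "table of contents" entry, which a translator would then edit rather than translate from scratch.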

While the TM grows organically over time by compiling and storing translation segments, the term base is maintained manually. It works as a glossary for key terms, allowing us to suggest preferred equivalents for individual words or phrases, and/or to blacklist translations that should be avoided. As the TB is visible to all parties at all stages of translation and proofing, it helps to ensure the consistency of these terms in Arabic.

Caption: A segment in memoQ with terms recognised by the TB highlighted in blue

Caption: Terms recognised by the TB, with approved translations in blue and forbidden ones in black
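
The term base can be thought of as a small glossary with an approved rendering and a list of forbidden ones for each term. The Python sketch below shows one way such a check might work. The data structure, the function name and the placeholder strings standing in for Arabic terms are all assumptions made for illustration, not memoQ's own format.

```python
from dataclasses import dataclass, field

@dataclass
class TermEntry:
    source: str                                     # English term
    approved: str                                   # preferred Arabic rendering (placeholder here)
    forbidden: list = field(default_factory=list)   # renderings to be avoided

def check_translation(source_text, target_text, term_base):
    """Return warnings for glossary terms in source_text that are mishandled in target_text."""
    warnings = []
    for entry in term_base:
        if entry.source.lower() not in source_text.lower():
            continue  # term does not occur in this segment
        for bad in entry.forbidden:
            if bad in target_text:
                warnings.append(f"forbidden rendering {bad!r} used for {entry.source!r}")
        if entry.approved not in target_text:
            warnings.append(f"approved rendering {entry.approved!r} missing for {entry.source!r}")
    return warnings

# Hypothetical entry; APPROVED_AR and VARIANT_AR stand in for real Arabic strings.
term_base = [TermEntry("Political Resident", "APPROVED_AR", ["VARIANT_AR"])]

print(check_translation("Letter from the Political Resident", "... VARIANT_AR ...", term_base))
print(check_translation("Letter from the Political Resident", "... APPROVED_AR ...", term_base))
```

The first call reports two problems (a forbidden variant is present and the approved term is missing); the second returns an empty list.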

Authorities: making the most of memoQ

The TB has proved especially useful when it comes to translating authority files. An authority record serves to identify and describe a person, corporate body, family, place name, or subject term that is featured in a catalogue description. Each term is authorised and unique. As every record and every expert article on the QDL is linked to at least one authority file, they form an index through which users can search for all the content related to a specific term.

Caption: Authorities displayed as filters on the QDL

Caption: Authorities displayed at the end of a record on the QDL

To be effective, authorities must be reproduced in exactly the same way for every record. For the English side of the portal, they are extracted from the same central database each time, with no opportunity for them to mutate or change before arriving on the portal – but not so with the Arabic!

For every record, the linked authorities are included as part of the English text to be translated, no matter how many times they may have been translated in the past. This repetition of the process creates an opportunity for discrepancies to creep in. If, for instance, there are several new records, all linked to the same new authority, that are sent to several different translators, it is not only possible but quite likely that each translator will produce a valid but slightly different version of the term in Arabic. If the same records then also go to different proofreaders, there is a good chance that the discrepancies will slip through unnoticed, rendering the Arabic authority much less useful than the English equivalent, as any one variant will not be linked to all the related content.

After spending much time and energy on trying (and sometimes failing) to catch these discrepancies at the end of the proofing process, we now make sure to pre-translate any new authority and add it to the TB, along with a unique identifying number (arkID), before sending the related files for translation. This means that when the term appears for translation, it is displayed in the TB along with its arkID, adding an extra means of checking whether this is the approved and appropriate translation for this specific context. Once confirmed and thereby added to the TM, it registers as a 101% match, meaning that there is an exact match not only in the text, but also in the metadata.

Caption: Authority term with arkID displayed in TB, registering as 101% match in memoQ
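
The difference between a 100% and a 101% match can be sketched in a few lines of Python. This is only a simplified model of the behaviour described above, in which the extra point reflects a match on the metadata as well as the text. The function, the record layout and the arkID values are invented for illustration.

```python
# Hypothetical memory entry; the arkID is made up and the target is a placeholder.
memory = [
    {
        "source": "Political Agency, Bahrain",
        "target": "<approved Arabic term>",
        "ark_id": "ark:/00000/example001",
    },
]

def match_score(segment, ark_id, memory):
    """Return 101 if text and arkID both match, 100 if only the text matches, else 0."""
    best = 0
    for entry in memory:
        if entry["source"] != segment:
            continue
        if ark_id is not None and entry.get("ark_id") == ark_id:
            return 101  # same text and same identifier
        best = 100      # same text, but a different or missing identifier
    return best

print(match_score("Political Agency, Bahrain", "ark:/00000/example001", memory))  # 101
print(match_score("Political Agency, Bahrain", "ark:/00000/example999", memory))  # 100
print(match_score("Political Agency, Kuwait", "ark:/00000/example001", memory))   # 0
```

A 100% result with a mismatched identifier is the case worth a second look, since the same English wording may belong to a different authority.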

Cataloguing for Translation

Working in-house at the BL alongside the cataloguers allows the translation team to understand and appreciate their processes and standards, and has also allowed us to show them the impact of their decisions and choices on translation. Over time, we have developed guidelines to help them create the English records with translation in mind. For example, where possible, the cataloguers now use stock phrases for repeated content, leading to a much higher hit rate in the TM, and they understand that their use of punctuation can make a big difference to the likelihood of a match appearing.

Caption: Stock phrase with multiple TM hits in memoQ

Caption: List of correspondents written using punctuation marks to help break the text into smaller translation segments in memoQ
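
To show why punctuation matters, the following Python sketch splits a catalogue sentence into segments and counts how many are already known. The segmentation rule (break after a full stop, colon, semicolon, question mark or exclamation mark, and at line breaks) is an assumption chosen for the example and real segmentation rules may differ. The sample sentences and the set of known segments are also invented.

```python
import re

def split_segments(text):
    """Split text after sentence-level punctuation (. ! ? ; :) and at line breaks."""
    parts = re.split(r"(?<=[.!?;:])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]

def hit_rate(text, known_segments):
    """Return (number of segments already known, total number of segments)."""
    segments = split_segments(text)
    hits = sum(1 for s in segments if s in known_segments)
    return hits, len(segments)

# Hypothetical segments that already have approved translations.
known = {
    "Correspondents include:",
    "the Political Resident, Bushire;",
    "the Political Agent, Bahrain;",
}

punctuated = ("Correspondents include: the Political Resident, Bushire; "
              "the Political Agent, Bahrain; the Government of India.")
run_on = ("Correspondents include the Political Resident, Bushire, "
          "the Political Agent, Bahrain and the Government of India.")

print(hit_rate(punctuated, known))  # (3, 4)
print(hit_rate(run_on, known))      # (0, 1)
```

The same information written as one unbroken sentence yields a single long segment with no exact match, whereas the punctuated list leaves only one short segment to translate.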

Small changes like this help to streamline the translation process, so we can focus on maintaining the QDL’s high standards across the Arabic side of the portal and make sure the content is just as accessible in either language.

Translation in Digitisation

In my work as a freelancer, I have found more often than not that clients arrive at translation as something of an afterthought. It is frustratingly common to find that they have budgeted neither the time nor the funds required for the work – the deadline tends to be yesterday, and the fee mere pennies. Pleasingly, this is not the case working on this project, where translation has been built into the process from the beginning and is understood to take time, thought, research, and expertise. Moreover, the decision to have an on-site team, working in the same office as the cataloguers, affords a rare opportunity to consult the specialists about their writing when queries inevitably arise, and to reciprocate by sharing our linguistic, cultural, and technical knowledge. We could of course always do more in our efforts to create bi- and multilingual resources for ever wider audiences, and with more and more institutions planning and investing in digitisation, there are deeper and broader questions about how, for whom, and in which languages we do so. Bilinguality has been a vital part of the QDL’s success in opening up the collections to new users and ought to be part of the ongoing discussions in digitisation.

Copyright:

Banner: Brief Principles of the Arabic Language ‎[F-1-14] (14/184), Qatar National Library, 10680, in Qatar Digital Library. Author: Filippo Guadagnoli. ©Qatar National Library. Usage Terms: Creative Commons Attribution Licence

memoQ Images: ©memoQ.

QDL Images: ©Qatar National Library. Terms: Creative Commons Attribution Licence

The Polonsky Foundation England and France Project: Manuscripts from the British Library and the Bibliothèque Nationale de France, 700–1200

In November 2018, the British Library and the Bibliothèque Nationale de France launched two new websites that offer access to digitised copies of medieval manuscripts. The two libraries worked together to digitise 800 illuminated manuscripts from the period 700–1200, sharing them online for the first time.

The project focused on manuscripts produced on either side of the English Channel over half a millennium of close cultural and political interaction. These 800 manuscripts were selected to build on existing digitised manuscript collections, based on their artistic merit, research value and wider public interest. The project manuscripts comprise a wide range of texts, including liturgical, biblical and theological works, and legal and scientific treatises that reflect the interest of monks, abbots and clerics, who were responsible for much of book production in the period before 1200.

The project drew upon the expertise of curators, cataloguers, conservators and imaging specialists from both institutions, who have learned from one another through a programme of knowledge exchange and reciprocal visits. Each manuscript was checked by a conservator before it was filmed, and any necessary preservation work was performed, to ensure that all manuscripts could be digitised safely. All the manuscripts have been newly catalogued to include up-to-date bibliography, the identification of texts and descriptions of the artwork. These descriptions can be viewed on Explore our Archives and Manuscripts for British Library manuscripts; and on Archives et manuscrits for Bibliothèque Nationale de France manuscripts.

British Library, Cotton MS Caligula A VII/1, f. 6v

Two websites

In November, the libraries launched two innovative websites that complement each other. Using International Image Interoperability Framework (IIIF) technology, the Bibliothèque Nationale de France hosts a site, France et Angleterre: manuscrits médiévaux entre 700 et 1200, that allows side-by-side comparison of 400 manuscripts from each collection. This new website will enable users to search the manuscripts in English, French and Italian, and to annotate and download images.

The second website, hosted by the British Library, is a bilingual online resource, Medieval England and France, 700–1200, that presents a curated view of the project manuscripts in English and French. The site features over 140 highlights from some of the most important of these manuscripts. It includes 30 articles on a wide range of themes, including medieval science, manuscript illumination and the development of vernacular languages, as well as discussions of prominent figures from the period, such as Thomas Becket, Hrabanus Maurus and Anselm of Canterbury. The site also features a series of videos, narrated by Patricia Lovett MBE, detailing the stages of making a medieval manuscript; two interviews with Professors Julia Crick (King’s College London) and Nicholas Vincent (University of East Anglia) about manuscript production during the period; and an animation inspired by a medieval bestiary (British Library, Harley MS 4751).

Highlights now available online include the lavishly illuminated Winchester Benedictional, created around the year 1000, as well as the 12th-century collection of St Thomas Becket’s letters, including the earliest depiction of Becket’s martyrdom. There are exquisite Anglo-Saxon manuscripts from the centuries before the Norman Conquest of 1066 that include Psalters, saints’ lives and Gospel-books, and spectacular manuscripts in the Romanesque style, including the giant two-volume Chartres Bible (12th century). The magnificent Canterbury Psalter (12th century), with a tri-lingual translation of the Psalms in Latin, French and English, was made in Canterbury. The book is sometimes known as the Anglo-Catalan Psalter because some of its illustrations were left unfinished and were completed several centuries later in Catalonia.

This exciting project was made possible by a generous grant from The Polonsky Foundation. Dr Leonard Polonsky commented that:

“This project brings together riches of the Bibliothèque Nationale de France and the British Library and makes them available to researchers and the broader public in innovative and attractive ways. Our Foundation is privileged to support this collaboration, which continues the cultural exchange and profound mutual influence that characterises the history of these two nations over many centuries.”

The Polonsky Foundation is a UK-registered charity that supports cultural heritage, scholarship in the humanities and social sciences, and innovation in higher education and the arts.

Follow new discoveries and featured content on Medieval Manuscripts blog and Twitter.

Cover photo: British Library, Cotton MS Caligula A VII/1, f. 6v

British Library Qatar Foundation Partnership Programme

In October 2014, the British Library Qatar Foundation Partnership launched the Qatar Digital Library (QDL), an online bilingual portal that provides free access to material from the British Library’s collections.

The portal displays content related to the history and culture of the Gulf and its surroundings, as well as the Library’s Arabic Scientific Manuscripts. Among the collections that we are working on are: the India Office Records on Gulf History (Agencies and Residencies), personal papers, maps, photographs, and manuscripts. The portal is fully bilingual, supporting study in both Arabic and English. At the moment, there are almost one and a half million images of British Library material on the portal, comprising over 14,000 records and over 136 manuscripts, with more content being uploaded every week. In addition, the Digital Library hosts articles from our experts, developed by the British Library team to help contextualise the collections. There are currently over 140 published articles, with more to come.

Digitising and publishing the documents on the QDL requires the work of a wide range of specialists. We are an interdisciplinary team, made up of more than forty professionals, including computer scientists, photographers, conservators, curators, archivists, administrators, translators, and specialist historians. Together we are working to give users of the portal a comparable experience to seeing the original documents in person.

The most obvious and important benefit of digitisation is the increased visibility and access to the collections. Users no longer have to be physically present in the Library’s reading rooms in London, but can now view these records from any corner of the globe, on a number of different devices. Since the portal has been active, users have been accessing the site from all around the world, with the top five countries being the United States, Saudi Arabia, Qatar, Oman, and the United Kingdom.

Alongside the digital images, each file is published with a short descriptive catalogue record, created by our team of experts. Cataloguing of this kind allows the Library to better understand and document the nature of the collections themselves, improving its own records and highlighting the importance of the material.

When providing free open access to information online, issues surrounding copyright and data protection must be considered. The programme has a dedicated Rights Clearance team and works with the Library’s Information Compliance Officer to ensure that we are compliant with current legislation and British Library policy. Having first determined whether the catalogued material is still within copyright, our Rights Clearance team conducts copyright ownership research into the collection items selected for digitisation, tracing and contacting rights holders where possible (such as individuals, companies, publishers, estates and other relevant bodies) and working to ensure that the correct usage terms are displayed on the portal.

A digitisation project such as this brings further challenges. Scoping the material can be difficult: its condition, size, the style of handwriting, and the languages in which it is written may all make a given file difficult to read. These issues can in turn have knock-on effects on the time needed for conservation, cataloguing, and digitisation. Assessing the time needed for an item to make its way from the BL’s secure storage onto the portal is no easy task, and requires clear coordination across all teams. To facilitate this, a workflow with three separate streams has been developed, and is now managed through Microsoft SharePoint. Each team also maintains thorough documentation and guidelines to help ensure the consistency of its work.

We are highly aware of the importance of communicating our work to make sure it reaches new audiences. Among our outreach activities, we promote the portal online through social media and in person through talks and tours of the programme. Many of our specialists also offer presentations at academic and archival conferences, participate in seminars, and write articles and blogs for wider publication. The response of users of the portal is overwhelmingly positive: many researchers and students are using this resource, not only in the UK, but also in the United States and across the Gulf region, and the increased access to this material is allowing for studies of a broader and more comprehensive nature than was previously possible.

Thanks to this project, important historical material from the BL’s collections, some of which had not previously been fully catalogued or studied in depth, is now being disseminated and made available to the general public. The Partnership has just agreed a further three years for this project, until the end of 2021, during which time we plan to make even more material available. We hope our efforts will prove useful to all who access the portal.

For more information, please visit the Qatar Digital Library and our page on the British Library website.

This article was originally published in ARC Magazine, a publication of the Archive & Records Association of the UK & Ireland, no. 349, September 2018.

Image: Kitāb na‘t al-ḥayawān كتاب نعت الحيوان [‎208v] (427/534), British Library: Oriental Manuscripts, Or 2784

Unlocking Our Sound Heritage under the General Data Protection Regulation (GDPR)

In July 2017, Unlocking Our Sound Heritage (UOSH), a major Heritage Lottery-funded, five-year partnership led by the British Library, was launched. The project, which forms part of the British Library’s Save Our Sounds programme, aims to preserve and provide access to as much as possible of the nation’s rare and unique sound recordings. It will be delivered by working closely with ten organisational partners across the UK who will digitise their own collections and selected content contributors. Over the five years, we hope to make 100,000 of the project’s half a million digitised recordings (which include oral history, wildlife sounds, popular, world and traditional music, radio, and language and dialect) available through a freely accessible, purpose-built media player and website hosted by the British Library, alongside innovative exhibitions and engagement activities in support of this.

One of a number of legal challenges to these aims is the General Data Protection Regulation (GDPR), which came into effect on 25 May 2018. In the UK this led to the implementation of the Data Protection Act 2018, the biggest change in privacy law since 1998. These developments reflect not only the technological advances of the last twenty years but also changing expectations around the protection of privacy.

The GDPR applies to all personal data which can identify a living person. This is referred to as personal and special category data and it can be anything from name and address, all the way through to political and religious beliefs. In order to comply, institutions must adhere to the six principles of the GDPR, which include lawfulness, fairness and transparency in their processing of personal data. They must also identify whether there is a legal basis for their processing of personal data and special category data. Processing in this context is an operation or set of operations performed on personal data, or sets of personal data, which may include collection, recording, dissemination or retrieval.

For personal data, three of the six lawful bases are available to this project: consent, legitimate interests and performance of a task in the public interest. Each processing activity needs only one, and there is no option to change the lawful basis further down the line. Some organisations, such as national libraries, museums, galleries and universities, can rely on the performance of a task in the public interest as their lawful basis for processing personal data. The British Library is governed by the British Library Act 1972 and therefore meets this requirement, so for activities which are defined in the Act or the British Library’s Public Task Statement we rely on this exclusively. However, for some of the UOSH partner hubs this is not available because they do not need to process personal data either in the exercise of ‘official authority’, or to perform a specific task in the public interest that is set out in law. An alternative legal basis for these institutions is therefore legitimate interests and, as ever, documenting why this basis has been selected is key.

Another legal basis available to the project is consent; however, it is the most problematic, since understanding of the term often blurs the lines between intellectual property, the ethical practice of informed consent in oral history, and data protection. To un-blur them, we must differentiate permission to record an individual, and the rights needed to make a recording publicly available, from the consent of a data subject to the processing of their personal data. The main concern is that consent would be required not only from the person speaking but also from the people they are speaking about. In a practical sense it would be impossible to achieve this and, if consent were withdrawn, it could not be replaced with another legal basis.

Special category data requires a different lawful basis from those mentioned above. This is because the types of personal data it covers, such as political beliefs, religious beliefs, race, ethnicity, or sexuality, are considered more sensitive than ‘regular’ personal data, such as name, address or date of birth. Article 9 of the GDPR defines special category data and prohibits its processing unless one of the listed conditions applies; for both the British Library and the UOSH partner hubs on this project, that condition is ‘Archiving in the Public Interest’.

For all of our processing of personal and special category data we will rely on the exemptions in Article 89 of the GDPR, which, confusingly, is also referred to as ‘Archiving in the Public Interest’. This allows for processing if appropriate safeguards are in place and exempts us from various data subject rights such as erasure or restriction of processing. However, for it to apply, the safeguards must be designed to prevent substantial damage or distress to a living individual.

Understanding and defining what we as an institution mean by substantial damage or distress is an essential focus of our work on the UOSH project. Legal definitions of these terms are difficult to find, which leaves them open to interpretation. We can broadly say that damage is financial, physical or reputational in nature and can look to existing law such as defamation, contract and tort for guidance. However, distress is far more subjective. Based on previous case law, we know it can mean embarrassment, anxiety, disappointment, loss of expectation, upset and stress, but that it must go beyond annoyance or irritation, strong dislike or a feeling that the processing is morally wrong. One option is the construction of a two-stage process: at stage one we consider what the individual with the complaint says, and at stage two we examine what the ‘ordinary’ person might say. How we as an institution determine what this means, how it works in practice, and how we ensure it represents this ‘average’ view is still in progress.

Making objective and consistent decisions about which recordings are more or less likely to cause a living individual distress if placed online is, and will continue to be, a challenge. Those of us making these judgements must be aware of our inherent biases and ensure a wide range of opinions and guidance is sought. The process will always be subject to change in terms of our interpretations, the interpretations of others and following the outcome of case law and regulatory guidance. As ever, good documentation of these decisions is key.

The GDPR brought about essential changes to privacy law in the EU and, through the UK’s new Data Protection Act (2018), it will continue to impact a project such as this after the UK leaves the EU in March 2019. Like any digitisation project seeking to place large amounts of content online, ours must give considerable attention to how to comply with data protection law. However, unlike many online access initiatives, a high proportion of the content we wish to make available contains the personal information of identifiable living individuals, the assessment of which requires hours of listening time. As we embark on this relatively uncharted territory, we have the opportunity to develop new and innovative processes and assessment methods in this area of audio heritage, data protection and online access. We are excited about the work we are doing and hope that by the end of the project we will have a number of tried and tested methods which will help future endeavours in this area.

With thanks to: James Courthold (Information Compliance Manager, British Library) and Sue Davies (Project Manager, Unlocking Our Sound Heritage, British Library).

For more information on UOSH please visit https://www.bl.uk/projects/unlocking-our-sound-heritage and follow us on Twitter @BLSoundHeritage.