Translating for a Digital Archive

The Qatar Digital Library

Since 2012, the British Library (BL) has been working with the Qatar Foundation and Qatar National Library to create and maintain the Qatar Digital Library (QDL). Launched in 2014, this free, bilingual portal hosts a growing archive of previously undigitised material, primarily from the BL’s collections. The archive focuses on content relevant to the history and culture of the Persian Gulf, and its items include India Office Records, maps, visual arts, sound and video recordings, and personal papers. The portal also features selected Arabic scientific manuscripts. Alongside these items, the QDL offers expert articles that help to contextualise the collections.

As part of the BL’s translation team, I work to produce and edit the Arabic language content for the QDL. While the collection items themselves are displayed solely in their original language, all of the portal’s supporting and descriptive content is translated, as are the expert articles, meaning that the catalogue can be searched and used just as easily in Arabic as in English.

The portal’s bilinguality has been key to increasing the visibility and accessibility of the collections. Users of the QDL are at least as likely to access the site in Arabic as in English: the most frequently visited individual page on the site is the Arabic homepage, and users land on an Arabic page more often than on an English one. Moreover, users enter search terms in Arabic just as often as in English. Consequently, we have a responsibility to maintain the same high standards in both languages and to make sure that all of the QDL’s features function equally well in each.

Our Toolbox: Translation management software

Like many large-scale translation projects, ours involves multiple translators and several rounds of proofreading and quality checks to ensure accuracy and consistency. To manage this, we use translation management software called memoQ, which includes two essential tools: a translation memory (TM) and a term base (TB). The TM functions as a bilingual database of previously translated segments of text: it stores each segment of source-language content alongside its approved translation. When a new text is imported, memoQ breaks it into smaller segments on the basis of punctuation and line breaks, and automatically searches the TM for exact and partial matches. These are then presented to the translator for approval and/or review.

Caption: A segment in memoQ with an exact match (100%) in the TM

Caption: A segment in memoQ with a partial match (85%) in the TM
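The idea behind segmenting a text and looking up exact and partial matches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the segmentation rule (break after . ! ? ; and at line breaks), the 70% threshold and the sample TM entries are all hypothetical, and the similarity score comes from Python's difflib, not from memoQ's own (configurable) matching.

```python
import re
from difflib import SequenceMatcher

# Hypothetical TM: English source segment -> approved Arabic translation.
# Illustrative entries only, not taken from the QDL's actual TM.
TM = {
    "The file contains correspondence relating to pearling.":
        "يحتوي الملف على مراسلات تتعلق بصيد اللؤلؤ.",
    "The letters are arranged in chronological order.":
        "الرسائل مرتبة ترتيبًا زمنيًا.",
}

# Assumed segmentation rule: break after . ! ? ; followed by whitespace,
# and at every line break. memoQ's real rules are richer and configurable.
SEGMENT_BREAK = re.compile(r"(?<=[.!?;])\s+|\n+")


def segment(text):
    """Split a text into translation segments, dropping empty pieces."""
    return [part.strip() for part in SEGMENT_BREAK.split(text) if part.strip()]


def best_match(seg, tm, threshold=70):
    """Return (score, source, translation) for the closest TM entry at or
    above the threshold, or None. Only identical text scores 100."""
    best = None
    for source, target in tm.items():
        if seg == source:
            return (100, source, target)
        # Partial-match score from difflib, capped at 99 so it never
        # looks like an exact match.
        score = min(99, round(SequenceMatcher(None, seg, source).ratio() * 100))
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


new_text = (
    "The file contains correspondence relating to pearling.\n"
    "The letters are arranged in alphabetical order."
)
for seg in segment(new_text):
    print(best_match(seg, TM))
```

Here the first segment is an exact match, while the second only partially matches the "chronological order" entry, so its stored translation would need a human edit before approval.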

A human expert still has the final say on whether to accept any suggestion from the TM, but often only a minor edit is needed to make the old translation suitable for the new context. This serves the double purpose of saving time and maintaining consistency across the catalogue as a whole. A translation memory tends to prove its worth the larger it grows and the more repetitive the content is. Having grown steadily since the start of the project, our TM now routinely recognises a third of the content in a new file, and often much more.

While the TM grows organically over time by compiling and storing translation segments, the term base is maintained manually. It works as a glossary for key terms, allowing us to suggest preferred equivalents for individual words or phrases, and/or to blacklist translations that should be avoided. As the TB is visible to all parties at all stages of translation and proofing, it helps to ensure the consistency of these terms in Arabic.

Caption: A segment in memoQ with terms recognised by the TB highlighted in blue

Caption: Terms recognised by the TB, with approved translations in blue and forbidden ones in black
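A term base check of this kind, with preferred and forbidden equivalents, might look roughly like the following Python sketch. The entries are hypothetical examples chosen for illustration, not copied from the QDL term base, and plain substring matching is a simplification: memoQ's term recognition also handles inflected and partial forms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermEntry:
    preferred: str          # approved Arabic equivalent
    forbidden: tuple = ()   # renderings that should be avoided


# Hypothetical entries for illustration only.
TERM_BASE = {
    "Political Resident": TermEntry("المقيم السياسي", ("الوكيل السياسي",)),
    "pearling": TermEntry("صيد اللؤلؤ"),
}


def terms_in(source, tb):
    """Return the TB entries whose English term appears in the source segment."""
    lowered = source.lower()
    return {term: entry for term, entry in tb.items() if term.lower() in lowered}


def check_terms(source, target, tb):
    """List term problems in a translated segment: forbidden renderings
    that were used, and preferred renderings that are missing."""
    problems = []
    for term, entry in terms_in(source, tb).items():
        for bad in entry.forbidden:
            if bad in target:
                problems.append(f"{term}: avoid '{bad}'")
        if entry.preferred not in target:
            problems.append(f"{term}: expected '{entry.preferred}'")
    return problems


source = "Letter from the Political Resident."
# Flags both the forbidden rendering and the missing preferred one.
print(check_terms(source, "رسالة من الوكيل السياسي.", TERM_BASE))
print(check_terms(source, "رسالة من المقيم السياسي.", TERM_BASE))  # prints []
```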

Authorities: making the most of memoQ

The TB has proved especially useful for translating authority files. An authority record identifies and describes a person, corporate body, family, place, or subject term that features in a catalogue description. Each term is authorised and unique. Because every record and every expert article on the QDL is linked to at least one authority file, the authorities form an index through which users can search for all the content related to a specific term.

Caption: Authorities displayed as filters on the QDL

Caption: Authorities displayed at the end of a record on the QDL

To be effective, authorities must be reproduced in exactly the same way for every record. For the English side of the portal, they are extracted from the same central database each time, with no opportunity for them to mutate or change before arriving on the portal – but not so with the Arabic!

For every record, the linked authorities are included in the English text sent for translation, no matter how many times they have been translated before. Each repetition creates an opportunity for discrepancies to creep in. Suppose, for instance, that several new records, all linked to the same new authority, are sent to several different translators: it is not only possible but quite likely that each translator will produce a valid but slightly different Arabic version of the term. If the same records then go to different proofreaders, the discrepancies may well slip through unnoticed. The Arabic authority then becomes much less useful than its English equivalent, because no single variant is linked to all the related content.
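Catching such discrepancies after the fact amounts to grouping the Arabic renderings by English authority and flagging any authority with more than one variant. A minimal Python sketch, assuming the translated authorities could be exported as (record, English heading, Arabic heading) triples; the record IDs and data below are hypothetical. Note that it only detects variants, and cannot say which one is correct.

```python
from collections import defaultdict


def inconsistent_authorities(rows):
    """Group translated authorities by English heading.

    rows: iterable of (record_id, english_heading, arabic_heading).
    Returns {english_heading: {arabic_variant: [record_ids]}} for headings
    that were given more than one distinct Arabic rendering.
    """
    variants = defaultdict(lambda: defaultdict(list))
    for record_id, english, arabic in rows:
        variants[english][arabic.strip()].append(record_id)
    return {eng: dict(by_arabic) for eng, by_arabic in variants.items()
            if len(by_arabic) > 1}


# Hypothetical records: the same authority rendered in two valid but
# different ways by different translators.
rows = [
    ("rec-1", "Kuwait", "الكويت"),
    ("rec-2", "Kuwait", "دولة الكويت"),
    ("rec-3", "Kuwait", "الكويت"),
    ("rec-4", "Bahrain", "البحرين"),
]
print(inconsistent_authorities(rows))
```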

After spending much time and energy on trying (and sometimes failing) to catch these discrepancies at the end of the proofing process, we now make sure to pre-translate any new authority and add it to the TB, along with a unique identifying number (arkID), before sending the related files for translation. This means that when the term appears for translation, it is displayed in the TB along with its arkID, adding an extra means of checking whether this is the approved and appropriate translation for this specific context. Once confirmed and thereby added to the TM, it registers as a 101% match, meaning that there is an exact match not only in the text, but also in the metadata.

Caption: Authority term with arkID displayed in TB, registering as 101% match in memoQ
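The difference between a 100% and a 101% match can be modelled as follows: identical text alone gives 100%, and identical text plus identical metadata (here, the arkID) gives 101%. This Python sketch is a simplified model of the rule described above, not memoQ's own context-matching logic; the `TMEntry` structure and the arkID values are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class TMEntry:
    source: str
    target: str
    ark_id: Optional[str] = None  # metadata stored with the segment


def match_score(segment, segment_ark_id, entry):
    """Simplified match levels: 101 = same text and same arkID,
    100 = same text only, otherwise a fuzzy score capped at 99."""
    if segment == entry.source:
        if segment_ark_id is not None and segment_ark_id == entry.ark_id:
            return 101
        return 100
    return min(99, round(SequenceMatcher(None, segment, entry.source).ratio() * 100))


# Hypothetical entry; the arkID value is made up for illustration.
entry = TMEntry("Kuwait", "الكويت", ark_id="ark-0001")
print(match_score("Kuwait", "ark-0001", entry))  # 101
print(match_score("Kuwait", "ark-0002", entry))  # 100
```

In this sketch it is the arkID, not the English text, that lifts a match to 101%, so two different authorities that happened to share an English heading would still be told apart.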

Cataloguing for Translation

Working in-house at the BL alongside the cataloguers allows the translation team to understand and appreciate their processes and standards, and has also allowed us to show them the impact of their decisions and choices on translation. Over time, we have developed guidelines to help them create the English records with translation in mind. For example, where possible, the cataloguers now use stock phrases for repeated content, leading to a much higher hit rate in the TM, and they understand that their use of punctuation can make a big difference to the likelihood of a match appearing.

Caption: Stock phrase with multiple TM hits in memoQ

Caption: List of correspondents written using punctuation marks to help break the text into smaller translation segments in memoQ

Small changes like this help to streamline the translation process, so we can focus on maintaining the QDL’s high standards across the Arabic side of the portal and make sure the content is just as accessible in either language.

Translation in Digitisation

In my work as a freelancer, I have found more often than not that clients arrive at translation as something of an afterthought. It is frustratingly common to find that they have budgeted neither the time nor the funds required for the work: the deadline tends to be yesterday, and the fee mere pennies. Pleasingly, this is not the case on this project, where translation has been built into the process from the beginning and is understood to take time, thought, research, and expertise. Moreover, the decision to have an on-site team, working in the same office as the cataloguers, affords a rare opportunity to consult the specialists about their writing when queries inevitably arise, and to reciprocate by sharing our linguistic, cultural, and technical knowledge. We could, of course, always do more to create bi- and multilingual resources for ever wider audiences, and with more and more institutions planning and investing in digitisation, there are deeper and broader questions about how, for whom, and in which languages we do so. Bilinguality has been a vital part of the QDL’s success in opening up the collections to new users and ought to be part of the ongoing discussions about digitisation.

Copyright:

Banner: Brief Principles of the Arabic Language [F-1-14] (14/184), Qatar National Library, 10680, in Qatar Digital Library. Author: Filippo Guadagnoli. ©Qatar National Library. Usage Terms: Creative Commons Attribution Licence

memoQ Images: ©memoQ.

QDL Images: ©Qatar National Library. Terms: Creative Commons Attribution Licence