Critically assessing AI tools and cultural data in digital humanities

By Jan Roekens | 19-05-2020

An increasing share of our rich cultural heritage is available in digital formats. Humanities scholars have therefore added AI and other computational techniques to their research methods. In her PhD thesis, Myriam Traub explores sources of bias in the data and tools used by humanities scholars.

As more and more of our cultural heritage becomes available in digital formats, humanities scholars are increasingly adding artificial intelligence (AI) and other computational techniques to their research methods. However, the question is just how valuable the insights gained from these tools are. It proves to be surprisingly difficult to assess whether such insights constitute a meaningful and interesting trend or merely reflect an error or bias in the tools and data used. In her PhD thesis, CWI researcher Myriam Traub explores ways to better understand such limitations.

Quality issues

Part of the difficulty in grasping these limitations stems from well-known quality issues in data, such as errors in optical character recognition (OCR). Scholars can easily spot these errors, and the community widely recognizes them as a problem. But even for such obvious problems, little is known about how these errors affect the AI methods used further ‘downstream’ in research. It remains unclear how the outcome of culturally oriented research projects is affected when research methods are given erroneous or biased data as input.
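As an illustration of how OCR quality can be quantified before any downstream analysis, the sketch below computes a character error rate (CER): the edit distance between a manually corrected reference text and the OCR output, divided by the length of the reference. This is a common, generic measure and not necessarily the one used in the thesis; the function names and the sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_output) / len(reference)


# Hypothetical example: the OCR engine misread "w" as "vv".
reference = "newspaper"
ocr_output = "nevvspaper"
print(character_error_rate(reference, ocr_output))
```

A measure like this describes the data only. It does not by itself say how much a given error rate distorts the results of a downstream method, which is the open question raised above.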

Lesser-known sources of bias

There are also sources of bias of which only a few users are aware. One example is algorithmic bias in full-text search. Although this has been studied for more than a decade, there is still little awareness of it among users of search tools in non-commercial digital libraries. For these lesser-known sources of tool bias, it is essential to measure the amount of bias. Only then can researchers assess its impact on the research conducted with these tools.
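To make "measuring the amount of bias" more concrete, here is a minimal sketch of one approach from the information retrieval literature: count how often each document appears in the top results across a set of queries (its retrievability score), then summarise the inequality of those scores with a Gini coefficient. A value near 0 means all documents are about equally easy to find; a value near 1 means a few documents dominate the results. The document IDs, rankings and cutoff below are made up, and the sketch is not presented as the exact procedure used in the thesis.

```python
from collections import Counter
from typing import Dict, Iterable, List


def retrievability_scores(
    ranked_results: Iterable[List[str]],
    all_doc_ids: Iterable[str],
    cutoff: int = 10,
) -> Dict[str, int]:
    """Count, per document, the number of queries for which it is
    ranked within the top `cutoff` results."""
    counts: Counter = Counter()
    for ranking in ranked_results:
        # set() ensures a document counts at most once per query.
        counts.update(set(ranking[:cutoff]))
    # Documents that were never retrieved get a score of 0.
    return {doc_id: counts.get(doc_id, 0) for doc_id in all_doc_ids}


def gini(values: Iterable[float]) -> float:
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Hypothetical collection of four documents and three query result lists.
docs = ["d1", "d2", "d3", "d4"]
rankings = [["d1", "d2"], ["d1", "d3"], ["d1"]]

scores = retrievability_scores(rankings, docs, cutoff=2)
print(scores)
print(gini(scores.values()))
```

The cutoff models how far down a result list a user is assumed to look, so the measured bias depends on that choice as well as on the query set.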

Examining digital method use

Myriam Traub explored sources of bias in the data and tools used by humanities scholars. She addresses a number of these in her PhD thesis, which she defends today at Utrecht University. For her research, Traub interviewed humanities scholars about their use of digital methods and the role of these methods in the overall research process. She studied retrievability bias in the search engine of the Dutch historical newspaper archive, the impact of partially fixing OCR errors through human computation, and the potential of crowdsourcing for difficult tasks that are traditionally seen as limited to domain experts.

Better techniques

In particular, Traub shows that digital humanities should not only strive for better-performing tools and higher-quality data, but also pursue better techniques for measuring the limitations of tools and data. Traub also notes that better techniques are needed for conveying the results of these computational measures to humanities scholars interested in the historical artefacts or events expressed in the data.

Multidisciplinary collaboration

Traub calls for more intensive, multidisciplinary collaboration between humanities scholars, data custodians and tool developers to better understand each other’s assumptions, approaches and requirements. This could help build not only the technical research infrastructure humanities scholars need, but also the human infrastructure in which scholars are trained in the skills necessary to routinely and critically assess the fitness of the digital data and tools available in the technical infrastructure.

Traub performed her research at CWI within the research project SealincMedia, which is part of the national COMMIT/ program. Research partners were, amongst others, the Dutch National Library and Rijksmuseum.

Source: CWI, 11 May 2020

Author: Jan Roekens, Editor-in-Chief
