Investigating Artificial Intelligence: What lies beyond the algorithm

Artificial Intelligence has become an integral part of our daily lives — but the real cost is often borne by others. At the Dataharvest conference, journalists presented their investigations and shared tips on how to dig deeper into this technology.

Nathan Schepers is not a journalist. A few months ago, when the non-profit organisation AlgorithmWatch invited journalists to investigate Artificial Intelligence (AI), Schepers came into contact with investigative journalism for the first time.

Gabriel Geiger, by contrast, is a journalist who investigates algorithms for Lighthouse Reports, an international non-profit collaborative newsroom. His workshop at the Dataharvest 2025 – European Investigative Journalism Conference opened with these words: “Obviously, there’s a lot of discussion about AI currently in the news, and it’s this trendy topic, and everyone’s talking about it all the time at different conferences, but a lot of those talks are really focused on using AI in your newsroom, or the limits thereof of using AI in your newsroom. That’s not what we’re gonna be talking about today.”

At the conference in Mechelen, Belgium, Geiger, Schepers, and others focused on how to investigate AI itself and examine its real-world impact. 

Talking to workers 

The best way to start investigating AI isn’t by opening a computer science textbook — it’s by understanding that the system works like a production line.

On this production line, the first essential ingredient is data. That data “feeds” the model, enabling it to generate any kind of output. In this process, so-called data annotators play a crucial role. These are people whose sole job is to annotate and categorise the data that will be used to train the AI — essentially telling the model what the information is, how to recognise it, how to analyse it, and how to respond to it.
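
To picture what an annotator’s output looks like, here is a minimal, hypothetical sketch: the texts, labels, and variable names below are invented for illustration, not drawn from any real annotation platform.

```python
# A minimal, hypothetical sketch of how annotators' labels become
# training data; the texts and categories are invented for illustration.

annotated_examples = [
    # An annotator reads each text and assigns a category by hand.
    {"text": "The package never arrived.",     "label": "complaint"},
    {"text": "Thanks, the issue is resolved!", "label": "praise"},
    {"text": "How do I reset my password?",    "label": "question"},
]

# The labelled pairs are what a model actually learns from: the texts
# are the inputs, the human-chosen labels are the target outputs.
texts = [ex["text"] for ex in annotated_examples]
labels = [ex["label"] for ex in annotated_examples]

print(f"{len(texts)} human-labelled examples ready for training")
```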

When journalist Michael Bird began researching the topic he would later cover with Schepers, he learned of a worker at Outlier – a company that employs remote data annotators – raising complaints about working conditions. Around the same time, a Facebook post revealed that workers from the so-called Global South were trying to obtain European user IDs on the digital black market to boost their daily earnings. 

To investigate the working conditions of data annotators, Schepers and Bird spoke with groups of workers who had gathered on online platforms and social media. The workers themselves described their mechanical and isolating work, the lack of transparency, the low pay, and the exploitation they experience from the large companies whose data they annotate. 

Data centres and drinking water 

The production process itself can be part of the problem. To understand how AI is created, we need to distinguish between two levels: the physical (or material) and the technical. 

The physical level includes large structures known as data centres — buildings that house computers, telecommunications systems, and storage infrastructure. Companies like Microsoft and Meta use these centres to store, process, and distribute vast amounts of data. 

As Naiara Bellio, Head of Journalism at AlgorithmWatch, explained, key questions to guide an investigation include how data centres operate, who owns them, who builds and maintains them, and how they affect both workers and the local environment. Many of these centres are built in low-humidity areas to help maintain stable temperatures. Investigations like this one – by Manuel G. Pascual for El País in 2023 – have shed light on the enormous water consumption of data centres. A year later, journalist Karen Hao of The Atlantic revealed that data centres continuously cool their systems using air and evaporated drinking water — a necessity, as computers must remain at specific temperatures to avoid overheating. 

In contrast, the technical level involves how AI systems actually generate results. According to Article 3 of the European AI Act, an AI system is “a machine-based system that is designed to operate with varying levels of autonomy and that may exhibit adaptiveness after deployment, and that … infers, from the input it receives, how to generate outputs”. Bellio stressed that not all algorithms are forms of artificial intelligence. There are distinct types, such as deep learning, computer vision, generative systems, and natural language processing — the kind used in chatbots, for example. It’s important for journalists to recognise these differences and preserve them in their reporting. By clearly identifying which type of AI they’re referring to, journalists can better communicate their findings to institutions, companies, and readers.
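
One way to picture Bellio’s distinction is to compare a fixed, hand-written rule with a system that infers its behaviour from data. The sketch below is hypothetical: the incomes, labels, and one-parameter “model” are invented purely to contrast the two.

```python
# Hypothetical contrast between a plain algorithm and a learned system.
# All figures and the toy "training" procedure are invented.

# 1) A fixed, hand-written rule: an algorithm, but not AI. The threshold
#    was chosen by a human and never changes.
def flag_rule_based(income: float) -> bool:
    return income < 12_000

# 2) A (toy) learned model: its cut-off is inferred from labelled examples,
#    the "infers, from the input it receives" part of the Act's definition.
def fit_threshold(examples: list[tuple[float, bool]]) -> float:
    """Learn a cut-off income from (income, was_flagged) pairs."""
    flagged = [inc for inc, f in examples if f]
    cleared = [inc for inc, f in examples if not f]
    # Midpoint between the two groups' average incomes.
    return (sum(flagged) / len(flagged) + sum(cleared) / len(cleared)) / 2

history = [(8_000, True), (9_500, True), (30_000, False), (42_000, False)]
learned_cutoff = fit_threshold(history)

def flag_learned(income: float) -> bool:
    return income < learned_cutoff

print(flag_rule_based(11_000), flag_learned(11_000))  # True True
```

Only the second snippet fits the Act’s definition, because its behaviour is inferred from data rather than fully specified in advance.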

Why crack the algorithm when you can work around it? 

In his 2024 investigation Sweden’s Suspicion Machine, Geiger examined an algorithm used by the Swedish welfare service to create risk profiles for social benefit fraud. Using statistical methods, the algorithm processed personal data — such as gender, age, marital status, and even place of residence — to assess the likelihood that someone might commit fraud. Individuals flagged as high-risk were referred to the authorities, who then launched investigations. In essence, the algorithm was used to “facilitate” decisions about who should be investigated. 
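
The general shape of such a system can be sketched in a few lines. Everything below is hypothetical: the features, weights, and threshold are invented, and the real Swedish model’s internals were never made public. The sketch only shows how a statistical score over personal attributes can decide who gets referred for investigation.

```python
import math

# Hypothetical sketch of statistical risk profiling. The features,
# weights, bias, and threshold are invented; the real system's
# internals were never disclosed.

WEIGHTS = {"is_female": 0.40, "age_under_30": 0.25,
           "is_single": 0.30, "urban_district": 0.15}
BIAS = -0.5
THRESHOLD = 0.5  # scores above this trigger a referral to investigators

def risk_score(person: dict[str, int]) -> float:
    """Logistic-regression-style score in [0, 1] from binary attributes."""
    z = BIAS + sum(w * person.get(k, 0) for k, w in WEIGHTS.items())
    return 1 / (1 + math.exp(-z))

applicant = {"is_female": 1, "age_under_30": 1, "is_single": 1,
             "urban_district": 0}
score = risk_score(applicant)
if score > THRESHOLD:
    print(f"Referred for investigation (score={score:.2f})")
else:
    print(f"Not referred (score={score:.2f})")
```

Note that every input is a personal attribute rather than evidence of wrongdoing, which is exactly why such models invite scrutiny for bias.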

Access to the algorithm was a key issue in Geiger’s investigation. When the company that developed it refused to provide the code, the methodology, statistical data, and documentation he was able to obtain still gave him enough insight into the algorithm’s logic.

Gabriel Geiger and Soizic Pénicaud during the session “How to investigate AI: advanced level”

In 2023, while Geiger was researching a similar system used by the municipality of Rotterdam, a chance file transfer unexpectedly revealed part of the training data that had been used to develop the system. It was the only time during the Suspicion Machine series of investigations that he was able to access such data.

In general, however, obtaining access to the original data that feeds AI models is very difficult in Europe, as much of it is protected under the GDPR. In the Swedish case, then, instead of trying to obtain the full source code — and hoping for a lucky break like the one in Rotterdam — Geiger asked for documentation on how the data was used and for the technical specifications of the system’s design.

“Did we understand how the system works? No,” Geiger told the audience. But they had enough information to tell a story that revealed possible bias or errors in the welfare model. 

It always comes back to people 

When reporting focuses on algorithms, it’s easy for the story to become overly technical or bogged down in the mechanics of AI. While the algorithm may appear to be the problem, Bellio emphasised that people must remain at the heart of every investigation. “The algorithm cannot be held accountable today,” she said. “So if we talk about artificial intelligence doing things, as in this case, we don’t have like a subject to blame and to hold accountable for what is going on.” Humans design the systems, build the models, and make the final decisions. And ultimately, they are also the ones affected by those systems. 

For Geiger, the key is to highlight the real-world impact these systems have on people’s lives. In the case of the welfare algorithm, even verifying the system’s output – that is, checking whether a flagged individual actually committed fraud – can carry serious consequences for the person being investigated.

Soizic Pénicaud, a researcher at the Observatoire des algorithmes publics, also focuses on the human experience. At the same time, she understands the weight the term Artificial Intelligence carries in a headline. While a story might centre on people, it’s often the AI buzzword that draws readers in. “Especially when you deal with welfare and social protection, to speak crudely, no one cares about the core, but everyone cares about AI. And so the question is, how do you use AI as the hook to get people to care about the humans?”

Translation: Anatoli Stavroulopoulou