The Backroom of Hidden Labor Behind Machine Vision with Nicolas Gourault
Who authors the vision of AI, and whose labour disappears behind it? This episode of WebRTC BackRooms traces Nicolas Gourault’s investigations into datasets, annotation work, and multi-channel image pipelines — exposing a distributed system of invisible labour and economic asymmetries embedded in machine vision.

This Conversation is an excerpt from the episode The Backroom of Hidden Labor Behind Machine Vision with Nicolas Gourault from the WebRTC BackRooms Podcast by Me AndOther Me. You can listen to the full episode through our podcast channel or by watching the video version on the Me AndOther Me Substack page.

Me AndOther Me Who really shapes the vision of an AI? What does it mean when self-driving cars quietly depict and categorise the world around us without our consent or awareness? And how might we reclaim some agency over these operational images, which are quietly redrawing the maps of our cityscapes?

Hi Nicolas, thanks for being here with us! Let’s start with your films. Unknown Label and Their Eyes take us into the hidden world of people who train artificial vision systems — the annotators who label images so machines can learn to see. It’s a part of AI that most people rarely think about. We’re curious how you first stepped into this world. What drew you to the lives and labour of annotation workers in the Global South? And how did your research process begin?

Nicolas Gourault So it started, I guess… I was already working around ideas of mass culture. I had done previous projects on football supporter culture and the idea of the crowd. At the same time, I was interested in how images are produced — how the apparatus of vision is actually constructed.

I began thinking about how the devices we use today to record the world for machines are trained. It came from curiosity. Then I listened to a talk by Matteo Pasquinelli, who is very influential in this field, and that’s where I discovered the connection between computer vision systems and a mass of invisible workers. For me, there was a link between the idea of the invisible crowd — the crowd of workers — and the apparatus of vision. That connection stayed with me.

"For me, there was a link between the idea of the invisible crowd — the crowd of workers — and the apparatus of vision. That connection stayed with me."

Nicolas Gourault

As I started researching, I came across the work of Florian Schmidt, a design researcher, who had written about how, around 2018, there was a sharp increase in demand for human labour to train AI systems — especially for self-driving cars developed by German manufacturers like BMW and Audi. He described how this demand intersected with what was happening in Venezuela at the same time, where a severe economic crisis pushed many people into precarious conditions. Since many already had access to laptops, internet connections, and the necessary skills, they began working on training self-driving systems. In a way, they became part of the labour force behind these German companies.

This became a key starting point for me: understanding how a technological need — training autonomous vehicles to read the world — intersected with a specific socio-economic condition. It revealed a connection that also pointed toward asymmetric power relations between the Global North and the Global South.

From there, I kept pulling threads, and I realised this topic connected many of my initial interests. It offered a way to explore a complex phenomenon through a very material perspective — by looking closely at how people actually work, how they annotate images, and how they engage with these systems. I became interested in focusing on the human side of this infrastructure. Once I understood that there was a labour force behind it, I felt it was something I could access and engage with directly. It gave me a way into a complex system while maintaining a human, almost intimate perspective.

At the same time, I was trying to grasp a system that is extremely difficult to visualise because it is distributed globally. It operates through networks and tools that stretch across vast spaces. On a cinematic level, I felt a strong need to render this invisible phenomenon visible — something that remains hidden precisely because of its scale.

This also connects to the circulation of images. We are used to thinking about images through social media — images constantly circulating across the world. But there is another layer of images that circulate more quietly, under the radar. These are working images — images used within processes, often for machines. I became interested in this idea, especially in relation to what we might later discuss through Farocki: operational images — images that become invisible because they are embedded within systems and processes.

"We are used to thinking about images through social media — images constantly circulating across the world. But there is another layer of images that circulate more quietly, under the radar. These are working images — images used within processes, often for machines."

Nicolas Gourault

The images that workers annotate are interesting in this sense. They are often similar to something like Google Street View — street scenes where objects are outlined and labeled. These images sit somewhere in between. On one hand, they are part of a technical process meant to train AI. On the other, they are still seen and handled by humans.

What struck me early on was that, despite the assumption of automation, humans are present throughout the process. These images are not only processed by machines; they are continuously seen, checked, and interpreted by people. So these images exist in a kind of hybrid state — between technical images used by machines and more familiar images that circulate among humans as part of communication and meaning-making.

To wrap it up, I would say that images became a way to grasp this global system. They connect different parts of the world, different forms of labour, and different technical infrastructures. They link the Global South, where much of the labour happens, to the Global North, where these systems are deployed. In that sense, images function as a linking device — something that allows us to understand and talk about the system as a whole. And for me, that was also essential in shaping the film.

M&OM Yes, and it really comes through as part of your style. As you mentioned, it feels very Farockian — the way you place images side by side in your films, creating correlations that aren’t immediately obvious or taken for granted.

Through your montage, you bring together images that seem unrelated at first, yet on a global scale are deeply connected and affect one another. I wanted to ask you for a bit more insight into that. Could you share some of your conversations with the annotators? How does the labeling process actually unfold from their perspective? What do they think about the work? We see parts of this in your films, but it would be great to go through it once more in your own words.

NG Yes, what I can say is that the process of interviewing was quite complex. I was trying to find a way to build a connection with the workers I was speaking to — how to get closer to them — while also acknowledging the distance that exists. Because in a way, I’m also part of the other side of these images. Coming from a city in the Global North, I’m embedded in the same system, one that carries a strong asymmetry of power. So I was trying to create a connection while still maintaining that awareness of distance, which is a strange balance.

This meant I had to develop a kind of protocol for the interviews. One important approach was to collaborate with people based in the regions I was researching. I worked with journalists and researchers in Venezuela and Kenya, and during interviews it would always be the three of us: myself, the research assistant — who would also translate when needed — and the person being interviewed. This helped create a sense of proximity and trust.

In terms of content, I focused on the process of work. I would usually ask something like, “Can you show what you’re working on at the moment?” Many would share their screen, so we could talk while they were actively annotating images. This was important for me because it allowed access to very specific details. These details are often overlooked — you don’t think they matter, and when you retell the story later, they disappear. But when you see the screen, all these small elements are there. And once you start paying attention to them, they become significant.

That’s also why, in the film, I highlight the categories the workers use. They work with predefined labels, often in PDF documents. One category, for example, is “covered human,” which refers to people lying on the street under sheets or fabric. When I asked workers about it, they often didn’t mention it. It’s something they learn at the beginning and then forget because it becomes routine. But when we looked at the document together during the screen recording, I could pause and ask, “What is this?”

That interruption of the flow made it possible to focus on a detail that would otherwise remain invisible — precisely because it’s so mundane. It’s part of the banality of the work. Rather than focusing on personal narratives alone, the protocol was really centered on the work itself: how they interact with the interface, how they use the tools, how they perform these repetitive tasks. And it was also important for me on a personal level. The work they do — annotating images — resembles, in some ways, my own practice as an artist or graphic designer. I use similar tools; I do rotoscoping and related processes. So this became a way to connect through the act of working itself. Given the distance between us, that felt like a more grounded and legitimate form of connection.

M&OM I have two thoughts that come to mind — things we didn’t discuss before. The first is about the nature of this work. It seems to carry a significant level of responsibility, because these workers are effectively teaching machines that will later operate in the real world. Even if that responsibility isn’t always visible in the moment, it has real consequences.

At the same time, I wonder about the limits of this process. If I had to do this work myself in a place I don’t know, I’m quite sure I wouldn’t be able to categorise it properly. Even between countries — say, labeling a city in France instead of Austria — there are different rules, behaviors, and cultural nuances. And that’s still relatively close. So I’m curious whether you ever had the impression that this kind of work resists globalisation — that it can’t fully be outsourced in the way it currently is.

The second question relates to categorisation itself. These systems rely on breaking the world down into fixed categories, but that process seems to flatten all the small differences, the ambiguities, the human and environmental variations that shape reality. How do you think about that? How do we deal with the gap between these rigid classifications and the complexity of the world they’re trying to describe?

"These systems rely on breaking the world down into fixed categories, but that process seems to flatten all the small differences, the ambiguities, the human and environmental variations that shape reality."

Me AndOther Me

NG Yes, that was a major question for me throughout the project. It’s also something that appears in the film — the question of subjective bias. There was a lot of discussion around this, especially in relation to human labour in AI training. It often came up in contexts like content moderation or classification: to what extent does the subjectivity or cultural background of the person doing the labeling influence the outcome? That influence is definitely present. But what I also discovered during the research is how strongly the system is designed to minimise or suppress that subjectivity. For example, when I mentioned the PDFs with categories, these come with detailed guidelines, training sessions, and review systems. Workers are taught how to label, which categories to use, and how to apply them. They’re not left to interpret things freely, and they can’t create their own categories — they have to work within a predefined structure.

So then the question shifts: who designed these categories in the first place, and based on what assumptions? This became a central issue for me. And it’s something I tried to approach in the film — not directly through interviews, but through montage. That’s where the idea of the contrechamp comes in — the reverse shot, the counterpart to the workers. You begin to see their surroundings, their environments, and that opens up a question: can these categories actually be generalised? Do they still make sense outside the context they were designed for?

"the question shifts: who designed these categories in the first place, and based on what assumptions? This became a central issue for me."

Nicolas Gourault

Because most of the images the workers were annotating came from specific places — often the US, sometimes China. In one of the projects I focused on, the dataset came from Las Vegas, which is itself a very particular environment. So the question becomes: how do these categories translate to places that are very different from that context? Do they still fit? Do they still make sense?

I kept returning to this, but I realised I couldn’t fully address it through direct questioning. Instead, I tried to approach it through the structure of the film — through montage — by creating that moment where the camera turns around and reveals another perspective.

M&OM What you just described connects closely to the question of authorship. It seems distributed across an anonymous workforce, the machine’s vision, and the corporations behind it. A self-driving car recognising a stop sign may appear effortless, but your work reveals the many human decisions and layers of labour behind that moment of recognition. In this distributed scenario of agency and authorship, who should be held responsible for the actions of a self-driving car?

I’m asking this in relation to the event you highlight in your film — the accident. The narrative begins with a collision in which an autonomous car strikes a pedestrian. Could you walk us through that moment? How did the situation unfold, and what does it reveal about authorship?

NG Yes, this connects to a previous film I made called VO, which also investigates the development of self-driving cars. The starting point is a fatal accident that happened in 2018 — the first deadly collision between a self-driving car and a pedestrian in the US. The vehicle was operated by Uber at the time.

These systems — robotaxis and autonomous vehicles more broadly — are extremely complex. They involve many components, many layers of training, and many people. So they immediately raise questions of accountability and liability, especially when they can cause harm. That question became central for me, and I continued exploring it further in Unknown Label, by looking more closely at the training process itself. For instance, how do you decide whether something is a pedestrian or just an object on the street? These kinds of distinctions are already embedded at the level of annotation.

In VO, the situation was particularly complex. The car was driving autonomously when it hit the pedestrian, but there was also a vehicle operator (VO) inside. Their role was to monitor the system and take control if necessary. The more I looked into it, the more I realised there were significant issues with the working conditions. These operators were doing shifts that were too long to sustain attention. At a certain point, people naturally lose focus — it’s just how the human brain works. So there was the responsibility of the operator, but also the responsibility of the company — Uber — and the conditions under which the system was deployed.

There was another layer as well, which I didn’t fully explore in the film but came up in the research: urban planning. The pedestrian was crossing the road at night, outside of a designated crossing, and many people initially framed it as her fault. But it turned out that there had previously been a crosswalk at that exact location, which had been removed. So even urban design decisions became part of the chain of responsibility.

It’s a highly layered situation. But what’s striking — and quite troubling — is how responsibility was ultimately assigned. When it came to legal consequences, the only person prosecuted was the vehicle operator, the person behind the wheel. She was the one held accountable and eventually sentenced. In many ways, she became the single point of responsibility within a much larger system. So you have this complex network of technological, corporate, and infrastructural factors, but legally it was reduced to one individual — arguably the person with the least power and the least overview of the system as a whole. There’s something deeply ironic and cynical in that.

M&OM I think this also connects to the broader question of invisibility around this kind of work. The test driver ends up being held accountable because the processes behind the system remain largely unseen. Legally and socially, there is very little awareness of what actually happens behind these technologies.

"There’s a kind of total invisibility — the labour itself is rarely seen, as well as the people doing the labeling and the datasets operating in the background."

Me AndOther Me

"there is a clear overlap with forms of neocolonial organisation, especially in relation to digital labour."

Nicolas Gourault

So I’m curious: why do you think this work remains so hidden? Why is it kept out of sight? And what shifts once we begin to recognise and acknowledge the social workforce behind these datasets?

NG Yes, I’ve been thinking about that question quite a lot. On one level, there’s a very mundane explanation. Infrastructure tends to remain invisible. It’s designed that way so everything appears to work seamlessly. Most people don’t really want to think about how something is built as long as it functions. They just want to use it. So there’s a kind of everyday indifference toward the processes behind what we rely on.

But during my research, it became clear that there’s more to it — especially in the case of microwork platforms. These systems are structured in a very specific way. The platforms are essentially double-sided. On one side, there is the public-facing platform — the one that attracts clients, promotes services, and presents itself openly. On the other side, there is a hidden interface used by workers. It’s the same company, but the two sides are separated, sometimes even under different names.

This separation creates a distance between workers and the rest of the system. Workers don’t know who they are ultimately working for. They can’t directly address or challenge the clients, because those relationships are obscured. So beyond the mundane invisibility of infrastructure, there is also a deliberate effort to obscure how the system operates. This fragmentation prevents forms of resistance. That’s also why, in the film, I was interested in showing the informal ways workers communicate with each other. They share information about jobs, payment conditions, and sometimes ways to navigate or bypass the system.

For me, that was an important part — to show that the system is designed in a way that weakens the position of workers, while at the same time, workers still find ways to negotiate, adapt, and push back, even if those strategies remain fragile and temporary.

As for what changes once we begin to recognise this labour, I think it starts with simply understanding what goes into these systems. Things that seem straightforward — like a car driving itself — are actually built on many layers of human decisions and interventions. It becomes harder to take these technologies for granted once you see that. There’s also a broader discourse around automation that presents it as objective or more reliable because it removes human involvement. But in reality, human labour is deeply embedded in these systems, and will remain so for quite some time.

So making this visible matters. And then, maybe, there is also the possibility of change. I don’t know how much a project like mine can influence that, but there are examples. For instance, when MIT Technology Review published an investigation into these labour conditions, companies like Google were pushed to respond and reconsider their suppliers.

At least on the surface, this led to changes toward providers that claimed to offer better working conditions. But even there, things remain ambiguous. One worker I spoke to suggested that these changes were also driven by cost — simply moving to a cheaper supplier rather than genuinely improving conditions. So it’s a complicated situation. But I still think that making this work visible can contribute, even slightly, to shifting the balance — maybe not fully, but at least making it a bit less unequal.

M&OM There’s one last question we’ve been thinking about: if we zoom out from the image itself and consider artificial intelligence in terms of the materials that sustain it — servers, sensors, batteries — we start to see that everything relies on minerals extracted under highly unequal conditions. This suggests a parallel between historical colonial trade routes and today’s digital infrastructures, both in terms of material extraction and data extraction, and how these are controlled.

We know you’ve reflected on this, so we’re curious to hear your perspective. How do you see these connections? And if we were to map what artificial intelligence is today, what would that map look like?

NG Yes, it’s a very difficult and complex thing to grasp. For me, this project was really a first step — an attempt to offer an initial layer of understanding. There are, of course, much more extensive works on this. Kate Crawford and Vladan Joler, for example, have produced an incredibly detailed cartography of these systems. Their work shows just how many layers are involved. It’s almost an endless task, because the scale is so vast.

From what I’ve seen, there is a clear overlap with forms of neocolonial organisation, especially in relation to digital labour. Language already plays a role here. English, as a legacy of colonial history, becomes an entry point into these labour networks. In places like the Philippines or Kenya, where English is widely spoken, workers can directly access platforms tied to markets in the US or the UK. In Kenya, for instance, the historical connection to England is still very present, and it shapes how these labour infrastructures operate.

At the same time, I’ve been looking more closely at the material side of these systems. I recently completed a work called 200,000 Hours a Day, where I try to trace the fabrication of an NVIDIA GPU. These chips are at the core of most contemporary AI systems. Even when you focus on something as seemingly simple as a GPU, the supply chain is already extremely dispersed. Materials and components come from many different places.

But what I found striking is that, within this global distribution, there are also very specific bottlenecks — points where production becomes highly concentrated. China’s role in rare earth minerals is a well-known example, accounting for a large portion of global production. Another example I came across is high-purity quartz, which is used to produce silicon for processors. A significant portion of this material comes from a single location: Spruce Pine in North Carolina.

That was quite surprising. You tend to think of these systems as fully globalised and impossible to pin down, yet they also rely on very localised sites. And because of that, the system becomes vulnerable in unexpected ways. A local disruption — something as simple as extreme weather in one region — can affect the entire global supply chain. So what you have is this strange condition: a system that is globally distributed but also deeply dependent on specific, local points. That’s something I’m still trying to understand — how to map this kind of structure, where global networks and local dependencies are so tightly entangled.

M&OM Thank you so much, Nicolas, for sharing insights from your films and for taking the time to speak with us.

NG Thank you for the questions.


BIOS

Nicolas Gourault is a Paris-based artist and filmmaker with a background in visual arts and visual studies. He worked with Forensic Architecture before graduating from Le Fresnoy, Studio national des arts contemporains. His work draws on this double training, navigating between online open-source investigation and the critical use of new media as documentary tools. His films and video installations explore the power relationships embedded in technologies and try to build counter-narratives through situated testimony and experimental image-making.

Me AndOther Me is a new-media-driven artistic and architectural research studio exploring the future of our spatial experiences and communication through practical applications of social mixed reality, with a focus on online culture, counter-platforms, and the spatial web. The studio is directed by Innsbruck-based architects, educators and researchers Cenk Güzeliş and Anna Pompermaier. They are interested in how social media and the internet have evolved to accommodate online communities in networked virtual spaces that have become alternative places to practice social and cultural activities, and in how these virtual spaces affect the architecture of our social lives and social selves.

PODCAST CREDITS

Direction & Production: Me AndOther Me

Virtual Camera: Cenk Güzeliş, Luca Lazzari, Viktoria Märkl, Lilly Krüger, Ruben Ungerathen, Adrian Weiss, Linus Memmel

Technical Setup: Me AndOther Me, Luca Lazzari

Sound Design: Mehmet Cakir

Audio Mix: Kristaps Andris Austers

Volumetric Streaming: Me AndOther Me, Marek Simonik (Record3D)

Text: Me AndOther Me

Thanks to the ORF III Cultural Advisory Board. Produced with the support of the Federal Ministry for Housing, Arts, Culture, Media and Sport as part of the funding program Pixel, Bytes + Film.

Published
08 May 2026