Large Language Models are increasingly used across various sectors, including recruitment, where they assist in evaluating candidate profiles and opti- mizing hiring processes. While Large Language Models offer significant advan- tages in terms of cost and time efficiency, concerns regarding algorithmic bias have emerged, particularly in relation to gender and ethnic discrimination. This study employs a Factorial Survey Experiment to assess biases in six widely used Large Language Models—Le Chat, ChatGPT, Gemini, DeepSeek, Grok, and MetaAI. By systematically varying candidate attributes such as sex, ethnicity, education, and age, we examine whether hiring recommendations are influenced by taste-based or statistical discrimination. Our findings indicate that among the six models tested, we find none to exhibit gender bias, while ChatGPT, Grok and DeepSeek show signs of ethnic discrimination, though at varying degrees. These results underscore the need for greater transparency and stronger anti-bias measures in Large Lan- guage Models development and training. We advocate for enhanced oversight in AI-driven salary discrimination tools to mitigate discrimination risks and ensure fair and equitable recruitment practices. Our study highlights the broader impli- cations of biased Artificial Intelligence models, emphasizing the potential risks of productivity loss and workforce homogeneity if biases remain unaddressed

Artificial Intelligence and Discrimination: A Vignette Experiment of Labour Market Discrimination in Large Language Models

Giovanni Busetta;Maria Gabriella Campolo
;
Giovanni Maria Ficarra
2025-01-01

Abstract

Large Language Models are increasingly used across various sectors, including recruitment, where they assist in evaluating candidate profiles and opti- mizing hiring processes. While Large Language Models offer significant advan- tages in terms of cost and time efficiency, concerns regarding algorithmic bias have emerged, particularly in relation to gender and ethnic discrimination. This study employs a Factorial Survey Experiment to assess biases in six widely used Large Language Models—Le Chat, ChatGPT, Gemini, DeepSeek, Grok, and MetaAI. By systematically varying candidate attributes such as sex, ethnicity, education, and age, we examine whether hiring recommendations are influenced by taste-based or statistical discrimination. Our findings indicate that among the six models tested, we find none to exhibit gender bias, while ChatGPT, Grok and DeepSeek show signs of ethnic discrimination, though at varying degrees. These results underscore the need for greater transparency and stronger anti-bias measures in Large Lan- guage Models development and training. We advocate for enhanced oversight in AI-driven salary discrimination tools to mitigate discrimination risks and ensure fair and equitable recruitment practices. Our study highlights the broader impli- cations of biased Artificial Intelligence models, emphasizing the potential risks of productivity loss and workforce homogeneity if biases remain unaddressed
2025
978-3-031-96302-5
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11570/3334490
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact