Artificial Intelligence and Discrimination: A Vignette Experiment of Labour Market Discrimination in Large Language Models
Giovanni Busetta, Maria Gabriella Campolo, Giovanni Maria Ficarra
2025-01-01
Abstract
Large Language Models are increasingly used across various sectors, including recruitment, where they assist in evaluating candidate profiles and optimizing hiring processes. While Large Language Models offer significant advantages in terms of cost and time efficiency, concerns regarding algorithmic bias have emerged, particularly in relation to gender and ethnic discrimination. This study employs a Factorial Survey Experiment to assess biases in six widely used Large Language Models: Le Chat, ChatGPT, Gemini, DeepSeek, Grok, and MetaAI. By systematically varying candidate attributes such as sex, ethnicity, education, and age, we examine whether hiring recommendations are influenced by taste-based or statistical discrimination. Our findings indicate that none of the six models tested exhibits gender bias, while ChatGPT, Grok, and DeepSeek show signs of ethnic discrimination, though to varying degrees. These results underscore the need for greater transparency and stronger anti-bias measures in Large Language Model development and training. We advocate for enhanced oversight of AI-driven recruitment tools to mitigate discrimination risks and ensure fair and equitable hiring practices. Our study highlights the broader implications of biased Artificial Intelligence models, emphasizing the potential risks of productivity loss and workforce homogeneity if biases remain unaddressed.
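To make the factorial survey design concrete, the sketch below shows how a full crossing of candidate attributes can be turned into vignette prompts for the models. It is an illustrative sketch only: the paper does not reproduce its vignette templates, prompts, or attribute levels here, so the wording and levels in the code are hypothetical assumptions.

```python
# Minimal sketch of a factorial survey (vignette) design for auditing LLM hiring
# recommendations. Attribute levels and the vignette wording are hypothetical.
from itertools import product

# Hypothetical levels for the manipulated candidate attributes.
ATTRIBUTES = {
    "sex": ["male", "female"],
    "ethnicity": ["Italian", "foreign-born"],
    "education": ["high school diploma", "bachelor's degree", "master's degree"],
    "age": [25, 35, 45],
}

# Hypothetical vignette template; each model would be asked to rate the candidate.
VIGNETTE_TEMPLATE = (
    "A {age}-year-old {sex} candidate of {ethnicity} background holding a "
    "{education} applies for an entry-level analyst position. On a scale from "
    "1 to 10, how strongly would you recommend hiring this candidate?"
)

def build_vignettes():
    """Cross all attribute levels to obtain the full factorial set of vignettes."""
    keys = list(ATTRIBUTES)
    for levels in product(*(ATTRIBUTES[k] for k in keys)):
        profile = dict(zip(keys, levels))
        yield profile, VIGNETTE_TEMPLATE.format(**profile)

if __name__ == "__main__":
    vignettes = list(build_vignettes())
    print(f"{len(vignettes)} vignettes generated")  # 2 * 2 * 3 * 3 = 36
    profile, text = vignettes[0]
    print(profile)
    print(text)
```

In such a design, each vignette would be submitted to every model under study, and the resulting ratings regressed on the manipulated attributes to test whether sex or ethnicity shifts the recommendation once education and age are held constant.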


