
Large Language Models as Cognitive and Psychology Tools

ACCIAI, ALESSANDRO
2025-11-01

Abstract

Large Language Models (LLMs), although they can initially be described as predictors of the next token in a given sequence, surprisingly display linguistic behaviors that resemble those of humans. This suggests the existence of an underlying, sophisticated cognitive system in language production. Such an intriguing circumstance has inspired the use of psychological theories as investigative tools, and the present research falls within this line of inquiry. Our aim is to investigate the potential existence of a core integration of abilities in language comprehension and production, metaphorically parallel to human cognitive architecture. To this end, the study proposes two experimental components, both focusing on LLMs. In the first part, we employed a well-established psychological theory of narrative coherence in autobiographical stories; in the second, we applied a set of Theory of Mind (ToM) tests. For the evaluation of narrative coherence, the same methodology was applied to autobiographical stories generated by GPT-3.5 and an equal number generated by GPT-4. These stories were elicited by asking the models to assume roles varying along dimensions such as gender, mood, and age. The large number of stories ensures adequate sampling, given the stochastic nature of the models, and was made possible thanks to the adoption of an automated coherence evaluation procedure. The evaluation process was enabled through targeted model training by means of prompting and fine-tuning. The results of this training showed excellent accuracy in the automatic assessment of narrative coherence. Overall, the analysis of the autobiographical stories revealed levels of narrative coherence in the models that are fully consistent with data from human subjects, with slightly higher values in the case of GPT-4. These results suggest a high degree of knowledge integration in the models, comparable to the integration of the self in human beings.
For the ToM component, prompts with multiple inductions were applied to the models to simulate personality, age, occupation, and gender. Subsequently, the models were administered the classic “Reading the Mind in the Eyes” ToM test, presented in both textual and visual forms, across nine different LLMs. The results confirmed the models’ ability to successfully complete these tasks, often with a success rate higher than that observed in studies with human participants. Although LLMs are not specifically designed to address cognitive tasks, their success in complex domains such as narrative coherence and ToM, as proposed in the present study, together with their fulfillment of many criteria typically used to describe cognitive architectures in cognitive science, may suggest the emergence of an artificial configuration of cognitive architectures.
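As a minimal sketch of the role-induction elicitation described above, the fragment below composes prompts that vary gender, mood, and age before asking for an autobiographical story. The template wording and trait values are assumptions for illustration, not the authors' actual materials; the resulting prompts would then be sent to GPT-3.5 and GPT-4 and the stories scored for coherence.

```python
# Illustrative sketch (assumed template, not the study's actual prompts):
# composing role-induction prompts that vary gender, mood, and age.

def build_story_prompt(gender: str, mood: str, age: int) -> str:
    """Return a role-induction prompt asking for an autobiographical story."""
    return (
        f"Assume the role of a {mood} {gender} person, {age} years old. "
        "In that role, write a short autobiographical story about a "
        "turning point in your life."
    )

# One prompt per combination of induced traits, so repeated sampling can
# average over the models' stochastic output.
prompts = [
    build_story_prompt(g, m, a)
    for g in ("male", "female")
    for m in ("happy", "sad")
    for a in (20, 40, 60)
]
print(len(prompts))  # 12 trait combinations
```

Crossing the induced dimensions in this way is one simple means of obtaining the "large number of stories" the abstract mentions; the real study may have used different traits and wording.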

Use this identifier to cite or link to this document: https://hdl.handle.net/11570/3343584