
Accuracy of ChatGPT in differentiating ocular surface neoplasms using text-based summaries: a comparative analysis of pterygium, pseudopterygium, and CIN

Mancini M.; Meduri A.
2026-01-01

Abstract

Purpose: Differentiating conjunctival intraepithelial neoplasia (CIN) from pterygium and pseudopterygium remains clinically challenging, especially in early or atypical presentations. Artificial intelligence (AI), particularly large language models such as ChatGPT, may serve as a decision-support tool by synthesizing complex diagnostic input. This study evaluated the diagnostic reliability of ChatGPT-4 in distinguishing between these ocular surface lesions and its ability to recommend appropriate treatment strategies based on multimodal clinical data.

Methods: Sixty anonymized case profiles were compiled from patients with histopathologically confirmed CIN (n = 20), pterygium (n = 20), or pseudopterygium (n = 20) seen between January 2024 and March 2025. For each case, clinical data, including demographic details, patient history, slit-lamp findings, anterior segment optical coherence tomography (AS-OCT), and in vivo confocal microscopy (IVCM) results, were synthesized into a standardized clinical vignette. Each vignette was submitted to ChatGPT (version 4.0) with the prompt: “Based on this case, what is the most likely diagnosis and what treatment would you recommend?” The model’s outputs were compared with the final diagnoses and treatment plans of a panel of three ocular surface disease specialists, each with more than 5 years of experience. Notably, ChatGPT analyzed only structured multimodal textual summaries derived from clinical examination, AS-OCT, and IVCM descriptions; it did not process raw clinical images.

Results: ChatGPT correctly identified CIN in 85% of cases, pterygium in 75%, and pseudopterygium in 70%. Misclassifications occurred primarily between CIN and pseudopterygium, particularly in cases with inflammatory or traumatic histories. ChatGPT’s treatment recommendations agreed with expert judgment in 80% of cases, with the highest agreement observed in CIN cases (90%). The model tended to over-treat benign lesions but rarely missed neoplastic diagnoses.

Conclusion: ChatGPT-4 showed promising accuracy in differentiating CIN, pterygium, and pseudopterygium from detailed multimodal clinical data. While not a substitute for clinical expertise, it may serve as a useful triage or decision-support tool, particularly in settings with limited access to subspecialists. Further integration with image-based AI systems could enhance diagnostic performance.
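The per-class accuracy figures in the abstract (85% CIN, 75% pterygium, 70% pseudopterygium over 20 cases each) correspond to a straightforward per-class hit-rate computation. The sketch below is an illustrative assumption, not the authors' actual analysis code; the label names and the toy prediction vector are hypothetical, chosen only to reproduce the reported rates.

```python
# Hypothetical sketch of the per-class accuracy computation implied by the
# abstract. Label names and the toy prediction vector are assumptions for
# illustration; they are not the study's real data.
from collections import Counter

def per_class_accuracy(true_labels, predicted_labels):
    """Return the fraction of correctly classified cases for each true class."""
    correct = Counter()
    total = Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}

# Toy example: 20 cases per class, with hit counts matching the reported rates
# (CIN 17/20 = 85%, pterygium 15/20 = 75%, pseudopterygium 14/20 = 70%).
truth = ["CIN"] * 20 + ["pterygium"] * 20 + ["pseudopterygium"] * 20
preds = (["CIN"] * 17 + ["pseudopterygium"] * 3 +
         ["pterygium"] * 15 + ["pseudopterygium"] * 5 +
         ["pseudopterygium"] * 14 + ["CIN"] * 6)

print(per_class_accuracy(truth, preds))
# {'CIN': 0.85, 'pterygium': 0.75, 'pseudopterygium': 0.7}
```

The toy misclassification pattern also mirrors the abstract's observation that errors clustered between CIN and pseudopterygium.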


Use this identifier to cite or link to this document: https://hdl.handle.net/11570/3349869

Citations
  • PMC: 1
  • Scopus: 0
  • Web of Science: 0