Does ChatGPT Resemble Humans in Judgements of Grammatical Acceptability?
We present the first large-scale investigation of ChatGPT’s grammatical intuition, building on previous research (Sprouse et al., 2013) that examined grammaticality judgements for 148 pairwise linguistic phenomena (e.g., 1a. “he was the judge” vs. 1b. “he was judge”). These phenomena were sampled from the journal Linguistic Inquiry, where linguists had classified them as grammatical, ungrammatical, or marginally grammatical. Sprouse and colleagues surveyed lay participants for their grammaticality judgements of these sentences. In this study, our primary focus was to compare ChatGPT’s judgements of these sentences with those of both lay participants and linguistic experts. Overall, our findings demonstrate convergence rates ranging from 73% to 95% (depending on the experiment and statistical test) between ChatGPT and human linguistic experts, with an overall point estimate of 89%. In other words, ChatGPT's distinction between grammatical and ungrammatical sentences agrees with the experts' classification approximately 89% of the time. However, the behaviour patterns of ChatGPT and lay participants varied depending on the specific judgement task. We attribute these results to the psychometric nature of the judgement tasks and to differences in how grammatical knowledge is represented in humans and LLMs.