GPT-4-Trinis: assessing GPT-4’s communicative competence in the English-speaking majority world

AI and Society:1-17 (forthcoming)

Abstract

Biases and misunderstanding stemming from pre-training in Generative Pre-Trained Transformers are more likely for users of underrepresented English varieties, since the training dataset favors dominant Englishes (e.g., American English). We investigate (potential) bias in GPT-4 when it interacts with Trinidadian English Creole (TEC), a non-hegemonic English variety that partially overlaps with standardized English (SE) but still contains distinctive characteristics. (1) Comparable responses: we asked GPT-4 18 questions in TEC and SE and compared the content and detail of the responses. (2) Accurate translation: we assessed how accurate and authentic 29 TEC and 34 SE translations were. (3) Language knowledge and attitudes: we asked what language the prompts were written in and categorized the responses and examined any language attitudes that were exhibited. Content and detail in prompts were comparable. The model was proficient at translating TEC pronouns and many grammatical categories. It was weaker at processing spelling and vocabulary items. In addition, it produced several inauthentic features. Only 39% of TEC-generated sentences were fully grammatical. While GPT-4 was perfect at identifying SE, it was 21% accurate at identifying TEC, which it sometimes classified as English with “errors” and “corrected”. GPT-4’s scope of use is limited for non-hegemonic English users. It is problematic that some of its analyses perpetuate bias against underrepresented Englishes. Increased research on lesser-documented Englishes is necessary and we anticipate that this problem affects dialects of other languages. We intend to partner with Trinidadian stakeholders to train GPT-4 in the future.

Similar books and articles

Plagiarism in the age of massive Generative Pre-trained Transformers (GPT-3). Nassim Dehouche - 2021 - Ethics in Science and Environmental Politics 21:17-23.
Re-evaluating GPT-4’s bar exam performance. Eric Martínez - forthcoming - Artificial Intelligence and Law:1-24.

Analytics

Added to PP
2024-05-03
