Eggs “over easy” and reflections on the authenticity of ChatGPT’s language

Eggs cooked over easy

I was in Vietnam last fall on assignment from the US State Department (Public Diplomacy Division) to serve as an “English Language Specialist,” giving workshops to foreign language teachers (mostly of English, some of Korean and Mandarin) on integrating technology into language learning. The main topic: generative AI like ChatGPT. There was a lot of interest, as well as plenty of concern (students cheating, teachers losing their jobs). AI offers a variety of opportunities for language teachers and learners, as well as bringing with it lots of issues, both in education and for society as a whole. I just finished a forthcoming column on that topic for Language Learning & Technology.

One of the topics we talked about in the workshops was the quality of the language output from AI systems. While the language generated is grammatically correct, fluent, and coherent (although sometimes inaccurate), I pointed out to the teachers that AI systems have a quite different orientation to language than do humans, with a likely result that the social and cultural aspects of AI output might be questionable. I decided to try out an experiment with ChatGPT based on an incident I witnessed at breakfast at the hotel where I was staying in Hanoi. I entered the prompt below:

Me

An American tourist is staying at a hotel in Vietnam. At breakfast, he goes to the grill station and requests, in English, fried eggs “over easy” (i.e. with runny yolks). The cook has trouble understanding what he wants. The tourist repeats his request, but the cook still does not understand. The tourist says, “Ok, I’ll come back later. Thank you.” The cook does not respond. The tourist starts to walk away, then turns back to the cook and says, “I SAID thank you.” Why did the tourist make that last statement?

ChatGPT

The tourist likely made the last statement, “I said thank you,” as a polite way of expressing gratitude and acknowledging the cook, even though there was no verbal response from the cook. In some cultures, including American culture, expressing gratitude is considered polite and is a social norm. The tourist may have wanted to convey that he appreciated the cook’s efforts or wanted to ensure that his gratitude was acknowledged, even if the cook did not respond verbally. It’s a common social convention to say “thank you” after making a request or receiving assistance.

Of course, ChatGPT was wrong and took literally what was said instead of understanding the real meaning of “I SAID thank you.” The American tourist expected (unreasonably) an “Ok” or “You’re welcome” or some kind of verbal response and was repeating his “thank you” from irritation over the lack of response (and over not getting the eggs he wanted). From the perspective of language pragmatics, the Vietnamese cook failed to complete an “adjacency pair,” a conversational turn-taking pattern that stipulates a standard reply to an utterance. Adjacency pairs are formulaic sequences without real semantic meaning; they serve instead as a form of socialization, connecting speakers together. The American English greeting “How are you?” is not asking for a health report, but just offering a greeting, with an expected reply of “Fine, thanks.” Unfortunately, the cook didn’t know enough English to play his role as expected. Below is an abstract for a talk I am giving (virtually) in Portugal at an upcoming conference on “Digital Learning Environments and Authenticity in English Language Teaching.” My presentation deals with the question of social and cultural authenticity in AI language production:

The ability of ChatGPT and other AI systems to generate language that resembles closely human-produced speech has led to claims that AI chatbots can “facilitate an authentic, interactional language learning environment” (Chiu et al., 2023), that AI use is “essential for promoting cultural sensitivity, intercultural competency, and global awareness” (Anis, 2023, p. 64), and that AI-based VR supplies “the benefits of in-country immersion programs without the hassle” (Divekar, 2022, p. 2354). The suggestion in these studies is that AI output is linguistically and culturally “authentic” and could substitute in language learning settings for human interlocutors or could even provide similar benefits to a study abroad experience.

Such a view ignores the process used by AI systems to reproduce language and the limitations of that process for the linguistic features and cultural content of the resulting output. AI systems break down language (their training data) into mathematical symbols and use machine learning to find patterns and regularities to form a “large language model” that enables next word prediction in a text string, which is then used to very effectively construct sentences, paragraphs, even complete discourses. Humans, in contrast, are socialized into their language abilities, learning gradually how to use language appropriately within an ever-expanding circle of social contexts. Through interactions with others, we acquire the social and cultural norms of language use, including the contextually appropriate use of nonverbal communication, i.e., facial expressions, body language, and paralanguage. The statistical model of language in AI lacks the sociocultural grounding humans have through sensorimotor interactions and from simply living in the real world.

Studies of AI’s capabilities to engage in pragmatically effective language use have shown significant limitations (Lee & Wang, 2022; Su & Goslar, 2023). While AI systems can gain pragmalinguistic knowledge and learn appropriate formulaic sequences (politeness conventions, for example) through the verbal exchanges in their training data, they have proven to be much less effective in sociopragmatic engagement, that is, in generating contextually acceptable speech reflecting an interlocutor’s state of mind, intentions, and emotional status. AI systems are likely to improve through user interactions added to their language models, through enlarging their datasets, and through multimodal incorporation (adding video and image training). However, those measures still will not supply the lived experience humans go through in negotiating common ground linguistically and culturally in social interactions, nor, therefore, the ability to deal with nuanced pragmatic scenarios. AI-generated language, while valuable as a resource in language learning, will remain artificial and inauthentic in ways that keep it from serving as an acceptable substitute for actual learner engagement in the L2 with peers and expert speakers.
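To make the idea of next-word prediction in the abstract concrete, here is a minimal sketch of my own. It is a toy word-pair counter, not how ChatGPT is actually built (real systems use neural networks trained on vastly more text), and the mini-corpus and the function name `predict_next` are invented for the illustration. It shows how a purely statistical model can supply the expected second half of an adjacency pair without any notion of why a speaker might be irritated when it is missing.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for training data (invented for this illustration).
corpus = (
    "thank you . you are welcome . "
    "thank you very much . you are welcome . "
    "how are you ? fine , thanks ."
).split()

# Count how often each word follows each other word (a bigram table).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1


def predict_next(word):
    """Return (most frequent next word, its relative frequency), or None if unseen."""
    counts = following.get(word)
    if not counts:
        return None
    best, freq = counts.most_common(1)[0]
    return best, freq / sum(counts.values())


print(predict_next("thank"))  # the word that most often followed "thank"
print(predict_next("are"))    # the word that most often followed "are"
print(predict_next("eggs"))   # a word the model has never seen
```

The table knows that “you” tends to follow “thank” only because it has seen that sequence; nothing in it represents gratitude, politeness, or annoyance.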

Language change

Jean Patrick Niambé, Ivorian rapper (photo: Arlette Bashizi for The New York Times)

In workshops I gave this fall in Vietnam on the use of AI in language learning, one of the topics we discussed was the change in English language use brought about through the AI frenzy of 2023. I used as an example a Doonesbury cartoon that highlighted the shift from the career benefits of being a “creative” to the now more lucrative role of becoming a “promptive,” i.e., doing prompt engineering. Traditional careers as lawyers and artists are presented as going away (replaced respectively by ChatGPT and Midjourney), leading to the need to find a “side hustle.” “AI” itself was named the most notable word of 2023 by Collins dictionary. Many other AI terms have entered everyday English this year, including “machine learning” and “large language model.” “Prompt” is now inevitably tied to AI use.

Language change happens in myriad ways, one being through technological and societal change, another through demographic shifts. The latter is illustrated in a recent article in the New York Times on changes to French brought about by new uses of the language among the growing number of French speakers in West and Central Africa:

More than 60 percent of those who speak French daily now live in Africa, and 80 percent of children studying in French are in Africa. There are as many French speakers in Kinshasa, the capital of the Democratic Republic of Congo, as in Paris. Through social media platforms like TikTok and YouTube, they are literally spreading the word, reshaping the French language from African countries, like Ivory Coast, that were once colonized by France.

The article chronicles how young people in Africa have adapted French creatively in entertainment and the arts. One rapper comments, “We’ve tried to rap in pure French, but nobody was listening to us,” so the language used in rapping is adjusted to the social realities of local life. An example of an African neologism that has gained wide use is the verb “enjailler,” meaning “to have fun,” a word originally used in Côte d’Ivoire in the context of jumping off and on buses in Abidjan. Traditional words have been given new meanings:

In Abidjan this year, people began to call a boyfriend “mon pain” — French for “my bread.” Improvisations soon proliferated: “pain choco” is a cute boyfriend. A sugary bread, a sweet one. A bread just out of the oven is a hot partner.

Interestingly, this development comes at a time when the influence of France in Africa has declined. Some countries have evicted French ambassadors and troops, and French has lost its official status in several countries (Mali, Burkina Faso). This demonstrates that language change has a dynamic of its own, often unpredictable and not controllable by political policies or government dictates.

The Beatles, AI, and authenticity

I’m in Vietnam currently, giving workshops on using AI tools in teaching English. Yesterday, we looked at what ChatGPT, Bard, and Bing might suggest as “best practices in using AI in language learning and teaching.” One of the suggestions was using AI chats as “authentic” language practice. That has set me to wondering what that word means in the context of AI text generation. The topic has been raised this month by the release of a new song by the Beatles, a musical group that disbanded over 50 years ago, with only 2 of the 4 members still living. A recent article in the New York Times discussed the issues related to that release:

Does it really make sense to use a song originally written by [John] Lennon alone, with no known intention of ever bringing it to his former bandmates, as the basis for a “Beatles” song? Is Lennon’s vocal, plucked and scrubbed by artificial intelligence and taking on a faintly unnatural air, something he would have embraced or been repulsed by? “Is this something we shouldn’t do?” McCartney asks in a voice-over, but neither he nor anyone else ever articulates exactly what the problem might be. “We’ve all played on it,” McCartney says. “So it is a genuine Beatle recording.” On one hand, who is more qualified than McCartney to issue this edict of authenticity? On the other: Why did he feel the need?

The author makes the point that this is quite different from what we all have been worrying about with AI, namely brand new “fakes”. In this case it is an example of using tech advances to, in the author’s opinion, make money from recycling old material:

The worry is that, for the companies that shape so much of our cultural life, A.I. will function first and foremost as a way to keep pushing out recycled goods rather than investing in innovations and experiments from people who don’t yet have a well-known back catalog to capitalize on. I hope I am wrong. Maybe “Now and Then” is just a blip, a one-off — less a harbinger of things to come than the marking of a limit. But I suspect that, in this late project, the always-innovative Beatles are once again ahead of their time.

The question of authenticity is one that is at the core of the communicative approach to language learning, with the idea that learners should not be working with made-up, simplified language materials, but be provided with real-world materials that native speakers themselves might be accessing. For the materials to be comprehensible, learners are supplied with “scaffolding” (annotations, notes, glossaries, etc.). Online materials have been a boon in that respect, in contrast to most materials in textbooks. Now, AI is making the question of authenticity and attribution much trickier. AI-generated materials are not products of native speakers, so should we treat them, as we do manufactured texts, as lacking in authenticity? Certainly, the cultural perspective is missing, and providing that perspective is one of the principal benefits of using “authentic” materials. Stay tuned, as AI and attitudes towards its output are evolving rapidly.

ChatGPT and the human-machine relationship

There has been an eruption of interest recently in generative AI, due to the public release of ChatGPT from OpenAI, a tool which, given a brief prompt, can generate in seconds texts of all kinds that are coherent, substantive, and eerily human-like. The availability of such a tool has led educators, especially in fields relying on essay writing, to wring their hands over students simply turning in assignments written by ChatGPT. Some have embraced GPTZero, a tool designed to determine whether a text is written by an AI system (in my testing, it was hit and miss in its accuracy). Some school systems have banned the use of ChatGPT.

I believe that is the wrong approach. We need instead to help students use AI tools appropriately, adjust writing assignments accordingly, and lead students to understand the limits of what such tools can do (there are many). ChatGPT will soon be joined by similar tools, and their abilities are sure to grow rapidly. That means they will see wide use in all domains of human activity. In their real lives after graduation, students will be expected to use such tools; let’s prepare them for that future. I argued last year for that position in a column in Language Learning & Technology (“Partnering with AI”).

In a forthcoming LLT column (“Expanded spaces for language learning,” available in February), I look at another aspect of the presence of such tools in our lives, namely what it means in terms of the human-machine relationship and in understanding the nature (and limits) of human agency. A spatial orientation to human speech, which emphasizes the primacy of context (physical, virtual, emotional, etc.), has gained currency in applied linguistics in recent years. Rather than viewing language as something set apart from spatio-temporal contexts (as was the case in structuralism or Chomskyan linguistics), scholars such as Pennycook, Blommaert, and Canagarajah show how the spatial context is central to meaning-making. This perspective is bolstered by theories in psychology and neuroscience that cognition (and therefore speech) is not exclusive to the brain, but rather is embodied, embedded, enacted, or extended (4E cognition theory). That places greater meaning-making emphasis on physicality (gestures, body language) as well as on the environment and potential semiotic objects in it (such as AI tools!). I argue that an approach helpful in understanding the dynamics at play is sociomaterialism (also labeled “new materialism”). This is an approach used widely in the social sciences and, more recently, in studies in applied linguistics. It offers a different perspective on the relationship of humans to the material world. Reflecting theories in the biological sciences, sociomaterialism posits a more complex and intertwined relationship between an organism and its surroundings; for us bipeds, that translates into a distributed agency shared by humans and non-humans (including machines).

Here is an excerpt from the conclusion:

A spatial orientation to language use and language learning illuminates the complex intertwining of people and artifacts physically present with those digitally available. The wide use of videoconferencing in education, for example, complicates concepts of local and remote as well as online versus offline. Neat divisions are not tenable. Mobile devices as well represent the intersection of the local and the remote, of the personal and the social; they are equipped to support localized use, while making available all the resources of a global network. From a sociomaterial viewpoint, the phone and user form an entanglement of shared agency; smartphones supply “extensions of human cognition, senses, and memory” (Moreno & Traxler, 2016, p. 78). The sensors, proximity alerts, and camera feeds function as stimuli, extending cognition while acting as an intermediary between ourselves and the environment. For many users, smartphones have become part of their Umwelt, an indispensable “digital appendage” (Godwin-Jones, 2017, p. 4) with which they reach out to and interact with the outside world.

A sociomaterial perspective and 4E cognition theory problematize distinctions of mind versus body, as they also qualify the nature of human agency. The increasing role that AI plays in our lives (and in education) adds a further dimension to the complex human-material dynamic. AI systems built on large language models produce language that mimics closely human-created texts in style and content. A radical development in writing-related technologies is the AI-enabled incorporation of auto-completion of phrases into text editors and online writing venues, as well as suggestions for alternative wording. Auto-completion features in tools such as Google Docs or Grammarly raise questions of originality and credit. That is all the more the case with tools such as ChatGPT which are capable of generating texts on virtually any topic and in a variety of languages. O’Gieblyn in God, human, animal, machine: Technology, metaphor, and the search for meaning (2021) argues that due to the powerful advances in language technologies, we need new definitions of intelligence and consciousness, an argument bolstered by 4E cognition theory. In consideration of the language capabilities of AI tools today, particularly the text generation capabilities of services such as ChatGPT, we also need new understandings of authenticity and authorship.

O’Gieblyn points out that AI is able to replicate many functional processes of human cognition such as pattern recognition and predicting. That derives from the fact that language generation in such systems is based on statistical analysis of syntactic structures in immense collections of human-generated texts. That probabilistic approach to chaining together phrases, sentences, and paragraphs is capable of producing mostly cohesive and logically consistent texts. Yet these systems can also betray a surprising lack of knowledge about how objects and humans relate to one another. This results in statements that are occasionally incoherent from a social perspective. This is due to the fact that AI systems have no first-hand knowledge of real life. Unlike human brains, AI has no referential or relating experiences to draw on. Since the bots have no real understanding of human social relationships, they assume universal cultural contexts apply to all situations, not making appropriate distinctions based on context. This can lead to unfortunate and unacceptable language production including the use of pejorative or racist language.

The deep machine learning processes behind LLM-based chatbots do not allow for fine tuning or tweaking the algorithms. Today we have better insight into human neural networks through neuroimaging than we do into the black box of artificial neural networks used in AI. That fact should make us cautious in using AI-based language technologies in an unreflective manner. At the same time, advanced AI tools offer considerable potential benefits for language learning, and their informed, judicious use, alongside additional semantic resources that are contextually appropriate, seems to lie ahead for both learners and teachers.

Big data and language learning

The big news in artificial intelligence (AI) this past year was the arrival of GPT-3, a substantially improved version of the “Generative Pre-trained Transformer” from OpenAI, an advanced AI system built on a web of artificial neural networks, deep machine learning, and a massive collection of data on human language. The system has been described as a giant step towards the realization of AGI, “artificial general intelligence,” the ability of a system to use language in virtually any domain of human activity. I wrote about this development in the latest issue of Language Learning & Technology, a special journal issue on big data and language learning. I discuss the breakthrough represented by AGI:

Normally, an AI system will be able to deal effectively only within a narrowly defined domain, for which the system has been trained, so as to expect specific language patterns typically used in that context. Google Duplex, for example, does a remarkable job in conversing over the phone with human operators in making dinner reservations or reserving a ride on Uber. GPT-3, in contrast, has been shown to interact through language in a wide variety of genres and content areas: creative writing, journalism, essays, poetry, text-based gaming, and even writing software code. The Guardian newspaper ran an article written by the program, while the New York Times asked it to write about love. A blogger used GPT-3 to write multiple blog posts, subsequently receiving numerous subscribers and notice on tech websites. The fact that many readers were not able to tell that the GPT-3 generated texts were written by an AI system raises questions of trust and authenticity, mirroring the concerns raised about audio and video “deepfakes”, based on training an artificial neural network on many hours of real audio or video footage of the targeted individual.

The system represents a remarkable achievement in its ability to write in natural sounding language (idiomaticity, flow, cohesion). That ability is based on the collection and analysis of huge volumes of speech data collected by crawling the internet, including all of Wikipedia. GPT-3 translates that data into a very large (175 billion!) set of connections or “parameters,” i.e. mathematical representations of patterns. These parameters provide a model of language, based not on rules, but on actual language usage. That allows the system to predict speech sequencing, based on regularly occurring constructions of words and phrases, thereby enabling the machine production of natural-sounding language utterances. One can imagine how powerful GPT-3 could be if integrated into a smart personal assistant such as Siri. We are already seeing interesting uses of chatbots and intelligent assistants in language learning. A company called LearnFromAnyone is building on top of GPT-3 a kind of automated tutor, which can take on the identity of famous scientists or writers.
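As a rough illustration of what it means for a model to rest on usage rather than rules, here is a small sketch of my own. It is emphatically not GPT-3: the sample text is invented, the names `num_parameters` and `generate` are hypothetical, and the stored word-pair counts are only a loose analogy to the parameters of a neural network. Each count records one observed pattern, and text is produced by repeatedly choosing the most frequent continuation.

```python
from collections import Counter, defaultdict

# Invented mini-corpus; a real system is trained on vastly more text.
text = "the cook fried the eggs . the tourist thanked the cook . the cook smiled ."
tokens = text.split()

# Each stored word-pair count plays the role of one "parameter" in this toy model.
counts = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    counts[prev][nxt] += 1

# Number of distinct (word, next word) patterns the model has recorded.
num_parameters = sum(len(c) for c in counts.values())


def generate(start, max_words=6):
    """Greedily extend `start`, always choosing the most frequent next word."""
    words = [start]
    for _ in range(max_words - 1):
        options = counts.get(words[-1])
        if not options:
            break  # the model has never seen anything follow this word
        words.append(options.most_common(1)[0][0])
        if words[-1] == ".":
            break  # stop at the end of a sentence
    return " ".join(words)


print(num_parameters)
print(generate("tourist"))
```

No grammar rule appears anywhere in the code; whatever fluency the output has comes entirely from the recorded patterns, and nothing checks whether the resulting sentence makes sense in the real world.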

While GPT-3 and other advanced AI systems represent a significant technical achievement, there are, as I discuss in the article, plenty of reasons to be cautious and thoughtful in their use, as is the case generally with big data in both social and educational contexts. While the language generated by GPT-3 mimics what a human might write in terms of language use, compositional structure, and idea development, the texts don’t always make sense in terms of lived human experience, i.e. demonstrating an understanding of social norms and cultural practices. Human beings have the advantage in communicative effectiveness of having lived in the real world and having developed the pragmatic abilities to generate language that is contingent on human interactions and appropriate to the context. We also can use crucial non-verbal cues, unavailable to a machine: gesture, gaze, posture, intonation, etc.

I argue in the article that a human factor is a crucial mediating factor in implementations of AI systems built on top of big data, particularly in education. Learning analytics (collection of data about student academic performance) tends to treat students as data, not as human beings with complicated lives (especially these days). I discuss these and other ethical and practical issues with data collection and use in the context of D’Ignazio and Klein’s Data feminism (2020). The book explores many examples of inequities in data science, as well as providing useful suggestions for overcoming disparities in data collection (favoring standard language use, for example) and for recognizing and compensating for algorithmic bias.