Asian languages in US movies: now becoming more authentic

Screenshot from interactive Web page on Asian movies in NY Times

The New York Times recently announced a new series that examines the contributions of Asian Americans and Pacific Islanders (AAPIs) to popular culture. The first piece in the series, “Found in Translation: Asian Languages Onscreen,” focuses on the use of Asian languages in American movies and TV. The article itself is quite interesting, but particularly effective is its online presentation, which features an innovative visual design by Alice Fang. The article points out that US movie viewers have traditionally been averse to reading subtitles, but that has changed recently with respect to Hollywood productions featuring Asian characters. The recent hits Shogun and Everything Everywhere All at Once feature dialog in Japanese and Chinese, respectively, mixed in with English.

The Web version of the article shows in interactive format how, in a scene from the recent HBO mini-series The Sympathizer, in a story line about the making of a movie, a kind of hidden dialog goes on through the subtitles, illuminating what is happening on screen. The series is set in Vietnam, but in that scene an actress playing a Vietnamese peasant speaks Cantonese. According to the article:

A crew member sheepishly explains to the director that they didn’t bother casting a Vietnamese speaker since “there’s no line in the script.” Eventually, another actress who does speak Vietnamese is brought in, and they reshoot the scene. Instead of the line the director suggests — “Don’t shoot me, I’m only a peasant” — the actress shouts one fed to her by the Captain, the on-set cultural consultant who is actually a Communist spy: “Our hands will close around the throat of American imperialism!” The swap goes over the director’s head, but for the viewer, who can read the subtitles (and for anyone who speaks Vietnamese), the layers of language become a narrative tool for political satire.

The article points out that this trend is likely to continue, as it seems to play well with audiences, adding a welcome note of realism to movies about Asian characters that in the past were highly Americanized. For audience members who actually speak the language used in the film, the meaning is deepened, giving those viewers a special inside track on what’s going on. The article also notes that Internet users, through exposure to multilingual videos on YouTube, TikTok, and similar platforms, have grown more accustomed to and accepting of subtitles. In movies, adding authentic language enhances the storyline: “Multiplicity of language is most interesting when it’s used to progress these stories — to ratchet up tension, to encase or reveal secrets, to create emotional resonance, to reflect or deflect identity.” That’s the case, for example, in the rich code-switching in Everything Everywhere All at Once, which often features characters mixing Chinese and English in the same sentence. Subtitles not only enrich movies; they are also a wonderful (and entertaining) way to learn or maintain a foreign language. English-language TV programs have long been shown in Scandinavia with subtitles rather than dubbing, a practice considered a major factor in why citizens of those countries tend to speak better English than those in countries such as Germany, which has traditionally dubbed foreign video into German.

Also interesting in terms of language in movies: a recent trailer has stirred a good deal of comment about the accents used. The upcoming movie is the sequel Gladiator II, which features Denzel Washington. The issue for many viewers of the trailer is that Washington does not adhere to the cinematic tradition of actors in movies about ancient Rome using British English, but instead speaks in his regular New York-accented American English. It’s a rather strange criticism, in that ancient Romans certainly did not speak English of any kind. As many have pointed out, the Roman Empire was multicultural, and Latin was therefore likely spoken with many different accents. So criticizing the movie for mixed English accents is way off base.

Irish in Dublin

Percentage of Irish who speak Irish daily

I am currently in Dublin. My last visit to Ireland was nearly a decade ago; at that time I was in Limerick and wrote about the status of the Irish language. Unfortunately, the language situation has not improved since then: in the last census, only around 2% of the Irish reported speaking Irish as their main home language. On the other hand, the overall number of speakers of the language has increased, although many have only a basic level of proficiency. That is due to the fact that Irish is a core subject in the primary school curriculum (in the Republic of Ireland, though not in Northern Ireland). In Dublin, street signs and the names of shops and restaurants are in both English and Irish, often with Irish given first. In fact, Dublin is one of the bright spots for Irish, with more than 50 Irish immersion schools (Gaelscoileanna). The 2021 edition of UNESCO’s Atlas of the World’s Languages lists Irish as “definitely endangered,” but the language nonetheless seems to have a prominent presence in Ireland. As with Welsh in Wales, Irish is an important identity marker and source of pride, a way for Ireland to distinguish itself from England.

We were hoping to hear some traditional Irish music here and today went to the most famous pub in Dublin, the Temple Bar Pub, which advertises 24/7 music. We wanted to avoid the crowds, so we went at 11 a.m., and sure enough there was live music: a small band playing John Denver classics and other pop/rock songs – no Irish tunes. Tonight, the big news here in music is a concert happening near our hotel: Taylor Swift. Tickets are going for $1400 each – we will pass and try again to find a pub with Irish music.

If a language is not on the Internet, does it exist?

The quick answer to the question in the title is yes, but maybe not for long. This week there was an interesting podcast on US public radio (KERA’s Think), AI could drive most languages to extinction, featuring a conversation with Matteo Wong, a staff writer for The Atlantic, who published a piece on the same topic this spring, The AI Revolution Is Crushing Thousands of Languages. Both the podcast and the article deal with the troublesome issue of the poor representation in AI of low-resource languages, i.e., languages which do not have a large written record and which may be under-represented online. Those languages, in contrast to high-resource languages like English, Chinese, Spanish, French, or Japanese (and other European and Western languages), make up little of the training data for generative AI systems like ChatGPT. As a result, AI systems have little knowledge of those languages and are likely to perform poorly in areas such as translation, providing accurate information, or even generating coherent text.

Wong gives the example of a linguist from Benin asking ChatGPT to respond in Fon, a language of the Atlantic-Congo family spoken by millions in Benin and neighboring countries. The AI’s response: it was unable to comply, as Fon was “a fictional language.” An additional problem for low-resource languages is that the texts that do appear online may not be genuinely produced by speakers of the language but may instead be machine-translated, and therefore potentially of questionable quality.

This means that AI, an increasingly important source of information in today’s world, will be unavailable to those who do not know English or another high-resource language. Wong cites David Adelani, a DeepMind research fellow at University College London, pointing out that “even when AI models are able to process low-resource languages, the programs require more memory and computational power to do so, and thus become significantly more expensive to run—meaning worse results at higher costs”. That means there is little incentive for AI companies like OpenAI, Meta, or Google to develop capabilities in languages like Fon.

The information about low-resource languages is not just linguistically deficient, but culturally problematic as well:

AI models might also be void of cultural nuance and context, no matter how grammatically adept they become. Such programs long translated “good morning” to a variation of “someone has died” in Yoruba, Adelani said, because the same Yoruba phrase can convey either meaning. Text translated from English has been used to generate training data for Indonesian, Vietnamese, and other languages spoken by hundreds of millions of people in Southeast Asia. As Holy Lovenia, a researcher at AI Singapore, the country’s program for AI research, told me, the resulting models know much more about hamburgers and Big Ben than local cuisines and landmarks.

The lack of support for most of the world’s 7,000 languages is evident from the fact that Google’s Gemini supports 35 languages and ChatGPT 50. As Wong notes, this is not just a practical problem for speakers of low-resource languages; the lack of support also sends the message that those speakers’ languages are not valued. There is, of course, also the danger that languages lacking AI support will become less widely spoken, as they are perceived as not offering the personal and professional benefits of high-resource languages. Losing languages means losing the human values associated with them: knowledge of the natural world tied to Indigenous languages, unique cultural values, traditional stories.

Wong points out that there are efforts to remedy this situation, such as Meta’s No Language Left Behind project. That initiative is developing open-source models for “high-quality translations directly between 200 languages—including low-resource languages like Asturian, Luganda, Urdu and more”. The Aya project is a global initiative led by the non-profit Cohere For AI, involving researchers in 119 countries, seeking to develop a multilingual AI for 101 languages as an open resource. That system features human-curated annotations from fluent speakers in many languages. Masakhane is a grassroots organization whose mission is to strengthen and spur natural language processing research “in African languages, for Africans, by Africans”.

Let us hope that such initiatives can help bring AI to more languages and cultures. However, the power of the big AI systems is such that only if they commit to adding more diverse language data to their training corpus will AI truly become more fully multilingual and multicultural.

Politicians visiting China: Practice your chopstick skills

Janet Yellen eating in China

Despite the frosty relations between the United States and China these days, occasionally government officials from the two countries meet. Recently, U.S. Treasury Secretary Janet Yellen and Secretary of State Antony Blinken visited China for talks with Chinese officials. A recent story in the New York Times did not report on the substance of the talks (the newspaper did that elsewhere) but instead on something perhaps equally important for visiting diplomats, namely what and how they ate while in China:

Where, what and how American dignitaries eat when they visit China is a serious matter. Choices of restaurants and dishes are rife with opportunities for geopolitical symbolism, as well as controversy and mockery. Chopstick skills — or a lack thereof — can be a sign of cultural competence or illiteracy.

American officials are closely observed, and their meals are widely reported on in Chinese social media. If they order what Chinese people normally eat, they are praised, particularly if the food is “authentic,” i.e., not modified for Western palates. Ms. Yellen has turned out to be something of a sensation in the food department. She eats in a variety of Chinese-style restaurants (Cantonese, Sichuan, Yunnan), orders the right dishes, and, most importantly, is very good with chopsticks. In fact, her chopstick skills were on display in a video shared widely in China on Weibo. She also won points for eating not in a private room but out in the open with other diners. Her popularity even led one restaurant where she ate to create a set menu based on what she had ordered there: it’s called the “God of Wealth menu” (after all, she’s in charge of the US Treasury).

My experience in China is that the Chinese are (justifiably) very proud of their cuisine and find it important for foreigners to try authentic Chinese food, and to like it. They are also very appreciative if foreigners make an effort to learn Chinese, and quite surprised when a Westerner can speak even a little Chinese. I wrote a while back about my positive experience ordering beer in Chinese and getting the pronunciation correct. US diplomats would be even more warmly received in China if they not only were able to use chopsticks but also could speak a little Mandarin.

Eggs “over easy” and reflections on the authenticity of ChatGPT’s language

Eggs cooked over easy

I was in Vietnam last fall on assignment from the US State Department (Public Diplomacy Division) to serve as an “English Language Specialist,” giving workshops to foreign language teachers (mostly of English, some of Korean and Mandarin) on integrating technology into language learning. The main topic was generative AI like ChatGPT. There was a lot of interest, as well as plenty of concern (students cheating, teachers losing their jobs). AI offers a variety of opportunities for language teachers and learners, while also bringing with it lots of issues, both in education and for society as a whole. I just finished an upcoming column on that topic for Language Learning & Technology.

One of the topics we discussed in the workshops was the quality of the language output of AI systems. While the language generated is grammatically correct, fluent, and coherent (although sometimes inaccurate), I pointed out to the teachers that AI systems have a quite different orientation to language than humans do, with the likely result that the social and cultural aspects of AI output may be questionable. I decided to try an experiment with ChatGPT based on an incident I had witnessed at breakfast at my hotel in Hanoi. I entered the prompt below:

Me

An American tourist is staying at a hotel in Vietnam. At breakfast, he goes to the grill station and requests, in English, fried eggs “over easy” (i.e. with runny yolks). The cook has trouble understanding what he wants. The tourist repeats his request, but the cook still does not understand. The tourist says, “Ok, I’ll come back later. Thank you.” The cook does not respond. The tourist starts to walk away, then turns back to the cook and says, “I SAID thank you.” Why did the tourist make that last statement?

ChatGPT

The tourist likely made the last statement, “I said thank you,” as a polite way of expressing gratitude and acknowledging the cook, even though there was no verbal response from the cook. In some cultures, including American culture, expressing gratitude is considered polite and is a social norm. The tourist may have wanted to convey that he appreciated the cook’s efforts or wanted to ensure that his gratitude was acknowledged, even if the cook did not respond verbally. It’s a common social convention to say “thank you” after making a request or receiving assistance.

Of course, ChatGPT got it wrong, taking literally what was said instead of understanding the real meaning of “I SAID thank you.” The American tourist expected (unreasonably) an “Ok” or “You’re welcome” or some other verbal response and was repeating his “thank you” out of irritation over the lack of response (and over not getting the eggs he wanted). From the perspective of language pragmatics, the Vietnamese cook failed to complete an “adjacency pair,” a conversational turn-taking pattern that stipulates a standard reply to an utterance. Unfortunately, the cook didn’t know enough English to play his role as expected. These are formulaic sequences, without much semantic content; they function rather as a form of socialization, connecting speakers together. The American English greeting “How are you?” is not a request for a health report, just a greeting, with the expected reply of “Fine, thanks.” Below is an abstract for a talk I am giving (virtually) in Portugal at an upcoming conference on “Digital Learning Environments and Authenticity in English Language Teaching.” My presentation deals with the question of social and cultural authenticity in AI language production:

The ability of ChatGPT and other AI systems to generate language that closely resembles human-produced speech has led to claims that AI chatbots can “facilitate an authentic, interactional language learning environment” (Chiu et al., 2023), that AI use is “essential for promoting cultural sensitivity, intercultural competency, and global awareness” (Anis, 2023, p. 64), and that AI-based VR supplies “the benefits of in-country immersion programs without the hassle” (Divekar, 2022, p. 2354). The suggestion in these studies is that AI output is linguistically and culturally “authentic” and could substitute in language learning settings for human interlocutors, or could even provide benefits similar to a study abroad experience.

Such a view ignores the process used by AI systems to reproduce language and the limitations of that process for the linguistic features and cultural content of the resulting output. AI systems break down language (their training data) into mathematical symbols and use machine learning to find patterns and regularities to form a “large language model” that enables next word prediction in a text string, which is then used to very effectively construct sentences, paragraphs, even complete discourses. Humans, in contrast, are socialized into their language abilities, learning gradually how to use language appropriately within an ever-expanding circle of social contexts. Through interactions with others, we acquire the social and cultural norms of language use, including the contextually appropriate use of nonverbal communication, i.e., facial expressions, body language, and paralanguage. The statistical model of language in AI lacks the sociocultural grounding humans have through sensorimotor interactions and from simply living in the real world.

Studies of AI’s capabilities to engage in pragmatically effective language use have shown significant limitations (Lee & Wang, 2022; Su & Goslar, 2023). While AI systems can gain pragmalinguistic knowledge and learn appropriate formulaic sequences (politeness conventions, for example) through the verbal exchanges in their training data, they have proven to be much less effective in sociopragmatic engagement, that is, in generating contextually acceptable speech reflecting an interlocutor’s state of mind, intentions, and emotional status. AI systems are likely to improve through user interactions added to their language models, through enlarging their datasets, and through multimodal incorporation (adding video and image training). However, those measures still will not supply the lived experience humans go through in negotiating common ground linguistically and culturally in social interactions and therefore the ability to deal with nuanced pragmatic scenarios. AI generated language–while valuable as a resource in language learning–will remain artificial and inauthentic in ways that cannot serve as an acceptable substitute for actual learner engagement in the L2 with peers and expert speakers.
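The next-word prediction described in the abstract can be illustrated with a toy bigram counter. This is a deliberately minimal sketch with a made-up two-sentence corpus: real large language models use neural networks trained on vast datasets, not simple frequency counts, but the underlying idea of predicting a continuation from observed patterns is the same — and it also shows why a language absent from the training data leaves the model with nothing to predict from.

```python
from collections import Counter, defaultdict

# A tiny stand-in for an LLM's training data (hypothetical example).
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Record which word follows each word, and how often.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation observed after `word`."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

# "the" is followed by "cat" twice, but "mat" and "fish" only once each,
# so the model predicts "cat". A word never seen in training yields nothing.
print(predict_next("the"))  # → cat
```

Chaining such predictions word by word is, in vastly simplified form, how a model constructs sentences and longer discourses; what the toy model lacks, like its full-scale counterparts, is any grounding of those statistics in lived social experience.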

What’s in a name? Maybe getting elected

Chinese ballot in San Francisco

Interesting piece recently in the New York Times on politicians in San Francisco using Chinese names to help them get elected. The city itself, in fact, has an interesting Chinese name: 舊金山 (Jiù Jīn Shān), which translates as “Old Gold Mountain,” referencing the 19th-century gold rush in California. San Francisco has one of the oldest and best-known “Chinatowns” in the US, with many residents speaking Cantonese or Mandarin as their first language. Due to the large number of Chinese-speaking residents, San Francisco has mandated since 1999 that the names of political candidates in local elections appear in both English and Chinese. That is no problem for candidates with a Chinese heritage; they likely already have Chinese names. Non-Chinese candidates can simply transliterate their English (or Spanish) names. Foreign names are usually transliterated into Chinese by selecting characters that phonetically approximate the sound of the name, as explained in this post from “Fluent in Mandarin.” So “Anderson” becomes 安德森 (Āndésēn). It’s not an automatic process, as there are often different characters that can be used for a given sound.

As pointed out on that site, “The problem with ‘transliterating’ foreign names into Chinese is that they can often sound very unnatural, and not like a native Chinese name.” It’s also the case that the actual meaning of the name (through the characters chosen for phonetic similarity) may be quite strange for Chinese speakers. The Chinese transliteration of “Anderson” (安德森) means Install-Virtue-Forest. For foreigners in China, it’s sometimes better to come up with some kind of nickname, as these are frequently used informally in China. In China, I go by the name 老牛 (Lǎo niú), meaning “Old Bull”; that references my age but was chosen particularly because my oldest son (who’s married to a Chinese woman) is “Young Bull” in China.
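The syllable-by-character selection described above can be sketched as a simple lookup. The table below is hypothetical and hand-picked for this one name; the real point is that each sound admits many plausible characters, so the choice is a human judgment call rather than an automatic mapping:

```python
# Hand-picked characters approximating English syllables (illustrative only;
# each entry is one choice among many possible characters for that sound).
syllable_chars = {
    "an":  "安",  # ān — "peace"; also "to install" (hence the "Install" gloss)
    "de":  "德",  # dé — "virtue"
    "sen": "森",  # sēn — "forest"
}

def transliterate(syllables):
    """Join one chosen character per English-approximating syllable."""
    return "".join(syllable_chars[s] for s in syllables)

print(transliterate(["an", "de", "sen"]))  # → 安德森 (Āndésēn)
```

Swapping in different characters with the same sounds would yield an equally valid phonetic match with an entirely different written meaning, which is exactly the loophole the San Francisco candidates below exploited.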

In San Francisco, it used to be that non-Chinese politicians could come up with their own Chinese names. As it turned out, many did not simply use transliterations of their names, but instead came up with Chinese names that made them sound good. That has now changed: a Chinese name can only be used if the candidate was born with it (i.e., comes from a Chinese background) or has publicly used it for at least two years. Otherwise, the name will be transliterated into Chinese phonetically by the election authorities. This has resulted in some candidates having to change the Chinese versions of their names. According to the Times article,

Michael Isaku Begert, who is running to keep his local judgeship, cannot use 米高義, which means in part “high” and “justice,” a name that suggests he was destined to sit on the bench. And Daniel Lurie, who is challenging Mayor London Breed, must scrap the name he had been campaigning with for months: 羅瑞德, which means “auspicious” and “virtue.” Mr. Lurie’s new name, 丹尼爾·羅偉, pronounced Daan-nei-ji Lo-wai, is a transliterated version that uses characters closer to the sound of his name in English but are meaningless when strung together.

According to a piece in the San Francisco Standard, this rule may actually disadvantage ethnic Chinese Americans: “American-born Chinese candidates may be unintended victims of the law, if they don’t have a birth certificate in Chinese that proves they were born with Chinese names.” They may have to supply alternative proof, which meant, according to the article, that candidates “were scrambling to seek old Chinese newspaper articles, Chinese school homework or Chinese family books to find the appropriate records.”

San Francisco is not the only US locale printing ballots in more than one language. According to the Times article:

Certain towns in Alaska must translate ballots into Yup’ik, an Indigenous Alaskan language, while some counties in Arizona must do so in Navajo and Apache. Hundreds of jurisdictions around the nation must translate their ballots into Spanish, while 19 must print them in Chinese, 12 in Vietnamese and four in Korean.

It would be interesting to know if politicians in those areas who have English names also come up with names in those other languages, and whether in the process they try to play language games to make themselves sound good.

Language change

Jean Patrick Niambé, Ivorian rapper, Arlette Bashizi for The New York Times

In workshops I gave this fall in Vietnam on the use of AI in language learning, one of the topics we discussed was the change in English language use brought about by the AI frenzy of 2023. I used as an example a Doonesbury cartoon that highlighted the shift from the career benefits of being a “creative” to the now more lucrative role of being a “promptive,” i.e., doing prompt engineering. Traditional careers such as lawyer and artist are presented as going away (replaced respectively by ChatGPT and Midjourney), leading to the need to find a “side hustle.” “AI” itself was named the most notable word of 2023 by Collins Dictionary. Many other AI terms have entered everyday English this year, including “machine learning” and “large language model.” “Prompt” is now inevitably tied to AI use.

Language change happens in myriad ways, one driver being technological and societal change, another demographic shifts. The latter is illustrated in a recent article in the New York Times on changes to French through new uses of the language by the growing number of French speakers in West and Central Africa:

More than 60 percent of those who speak French daily now live in Africa, and 80 percent of children studying in French are in Africa. There are as many French speakers in Kinshasa, the capital of the Democratic Republic of Congo, as in Paris. Through social media platforms like TikTok and YouTube, they are literally spreading the word, reshaping the French language from African countries, like Ivory Coast, that were once colonized by France.

The article chronicles how young people in Africa have adapted French creatively in entertainment and the arts. One rapper comments, “We’ve tried to rap in pure French, but nobody was listening to us,” so the language used in rapping is adjusted to the social realities of local life. An example of an African neologism that has gained wide use is the verb “enjailler,” meaning “to have fun,” a word originally used in Côte d’Ivoire in the context of jumping on and off buses in Abidjan. Traditional words have been given new meanings:

In Abidjan this year, people began to call a boyfriend “mon pain” — French for “my bread.” Improvisations soon proliferated: “pain choco” is a cute boyfriend. A sugary bread, a sweet one. A bread just out of the oven is a hot partner.

Interestingly, this development comes at a time when the influence of France in Africa has declined. Some countries have evicted French ambassadors and troops and French has lost its official status in several countries (Mali, Burkina Faso). This demonstrates that language change has a dynamic of its own, often unpredictable and not controllable by political policies or government dictates.

Is this English: “Get down from the car!”?

For anyone familiar with the ethnic and linguistic make-up of Florida, it will come as no surprise that in the Miami area (and elsewhere in the state), everyday language use is characterized by a mix of English and Spanish. “Spanglish” is a term used to describe the influence of English on spoken Spanish. Now a linguist at Florida International University (Phillip Carter) has published findings indicating that a new variety of English, heavily influenced by Spanish, is emerging in the Miami area. An article on the research in Scientific American provides some examples:

“We got down from the car and went inside.”
“I made the line to pay for groceries.”
“He made a party to celebrate his son’s birthday.”

For most speakers of North American English, those sentences sound strange. In fact, as Carter points out, these are “literal lexical calques,” i.e., direct, word-for-word translations.

“Get down from the car” instead of “get out of the car” is based on the Spanish phrase “bajar del carro,” which translates, for speakers outside of Miami, as “get out of the car.” But “bajar” means “to get down,” so it makes sense that many Miamians think of “exiting” a car in terms of “getting down” and not “getting out.” Locals often say “married with,” as in “Alex got married with José,” based on the Spanish “casarse con” – literally translated as “married with.” They’ll also say “make a party,” a literal translation of the Spanish “hacer una fiesta.”

Carter provides an additional example based on phonetic transfer: “‘Thanks God,’ a type of loan translation from ‘gracias a Dios,’ is common in Miami. In this case, speakers analogize the ‘s’ sound at the end of ‘gracias’ and apply it to the English form.”

A YouTube video provides further examples from Carter’s research:

Hamilton auf Deutsch

The German cast of Hamilton

Translating always means approximating the meaning of the original text. That is especially the case for literary texts, where it’s not just a matter of finding equivalent meanings but also of conveying the feel and style of the original. When the text is poetry, the process becomes even more complicated, as decisions are needed on issues like rhyme, alliteration, meter, and rhythm. Then there is the cultural component. Literary texts are embedded in socio-historical contexts, which may be familiar to the intended readers of the original. Translators need to decide whether to simply convey the cultural context as in the original or to add explanation so that it is understandable to readers of the translation. Then there is the matter of humor and word play. Music complicates things further, as the translated language must fit into the time constraints built into song lyrics.

Translating the celebrated musical Hamilton into another language has all those complications and more. The story told in the musical is deeply enmeshed in the history and mythology of the founding of the United States. The story and its central figures are well known to anyone who has attended school in the US. For those without that background, the dynamics of the exchanges among characters in the musical, taken from historical accounts, will be unfamiliar. Then there is the kind of music in Hamilton, namely hip-hop and rap. While that style of music originated in the US, it has spread across the world, so the musical form will likely be familiar, at least to young theatergoers. However, in the US the cultural context of rap is tied closely to African Americans, and that is reflected in the musical, at least in its original stage version and the movie, in which the main characters are Black.

So, translating Hamilton into German was no easy task, as pointed out in a recent piece in the New York Times: “Hamilton is a mouthful, even in English. Forty-seven songs; more than 20,000 words; fast-paced lyrics, abundant wordplay, complex rhyming patterns, plus allusions not only to hip-hop and musical theater but also to arcane aspects of early American history.” It wasn’t just the challenge of keeping the musical character as close as possible to the original, it was also the problem linguistically of going from English to German, as the piece states, “a language characterized by multisyllabic compound nouns and sentences that often end with verbs”. Translations from English to German often end up being considerably longer than the original. That was going to be a problem here. So, the translators had to be flexible and creative, finding ways to keep the wordage down, while maintaining the essentials of the content and including as many of the artistic effects in the lyrics as possible. The latter included the internal rhyming that is characteristic of rapping and is used extensively in Hamilton. The translators were able to work with the musical’s creator, Lin-Manuel Miranda, who monitored the translations to make sure the lyrics in German fit the spirit of the original. The New York Times article and a piece in National Public Radio provide examples of the wording in German. The clip below shows the results.

The German Hamilton opened this month in Hamburg; it will be interesting to see how German theatergoers react. One of the interesting aspects of the German production is that the makeup of the cast mirrors that of the New York production, with actors of color playing the main roles. That this was possible in Germany demonstrates the perhaps surprising diversity of contemporary Germany, where waves of immigration have significantly changed the homogeneity of the population. In fact, many in the cast are immigrants or the children of immigrants. Not all are German citizens, but they all speak fluent German, mostly as their first language. For my money, the songs sound very good (and natural!) in German.

“I’m a fraud within my own identity”: Racial Impostor Syndrome

Emily Kwong at age 2 with her grandparents

There was a compelling story this week on NPR dealing with individuals from families with mixed ethnic backgrounds. In the story, Emily Kwong discusses her decision to learn Mandarin Chinese at the age of 30. Her dad had grown up speaking Chinese as his first language, but when he started school, he began learning English and soon switched over completely. When she was little, Emily had learned some Chinese from her grandmother but had not retained any. As an adult, she started to feel increasingly that she needed to reconnect with the Chinese side of her identity and decided that meant learning her father’s language.

Emily’s feeling that she is “not Chinese enough” has sometimes been labeled racial impostor syndrome: the sense individuals may have that their lives are in some way inauthentic because they don’t conform to key aspects of the ethnic heritage with which they identify. That might be related to a lack of knowledge about traditional customs or ways of life, or be connected to the individual’s appearance. But quite likely there will be a language component. For Emily, learning Chinese has proven to be a challenge, as it is for many native English speakers, particularly those who start later in life. However, the difficulties she has had with the language pale when compared to the shame she feels in not being able to connect at all with her Chinese-speaking relatives:

I’ve decided that any shame I might feel about imperfect pronunciation, fumbles with grammar is nothing compared to the shame I felt about not knowing the language at all; the shame I feel as my older relatives rattle off dim sum dishes and I stare down the menu pictures, feeling like a fraud within my own identity, missing something I never had in the first place.

This sense of feeling like a fraud within one’s own identity is likely shared by many individuals from immigrant families or those with connections to multiple cultures. A few years ago, NPR did a series on such cases, which are increasing as American identities become more mixed. The accounts of experiences are revealing. A woman whose mother is Panamanian and whose father is a white American recounted:

When I was young (20s) and living in the city, I would get asked multiple times a day where I was from, where my people were from, because Allentown, Pennsylvania, clearly wasn’t the answer they were looking for … It always felt like the undercurrent of that question was, “You aren’t white, but you aren’t black. What are you?” But truthfully, I don’t feel like I fit with Latinas either. My Spanish is atrocious and I grew up in rural PA. Even my cousin said a few weeks ago, “Well, you aren’t really Spanish, because your dad is white.” Which gutted me, truly. I identify as Latina.

Another individual featured was a light-skinned biracial woman:

White people like to believe I’m Caucasian like them; I think it makes their life less complicated. But I don’t identify as 100% white, so there always comes a time in the conversation or relationship where I need to ‘out’ myself and tell them that I’m biracial. It’s a vulnerable experience, but it becomes even harder when I’m with black Americans. It may sound strange — and there are so many layers to this that are hard to unpack — but I think what it comes down to is: they have more of a claim to ‘blackness’ than I ever will and therefore have the power to tell me I don’t belong, I’m not enough, that I should stay on the white side of the identity line.

Emily’s effort to retrieve a heritage through language learning may not apply to everyone, but for many it may be an important tool for feeling less like a “fraud”. The experiences in her family illustrate a frequent pattern in immigrant families, namely that the first generation may give up their first language, as Emily’s dad did, in order to assimilate into mainstream US society. That was widely the case throughout the 20th century in the US. Second- or third-generation children, like Emily, often discover that not speaking the language of their immigrant parent or parents leaves a void in their sense of who they are.

Is Ukraine a country?

Russian troops in Ukraine

One of the issues in the war between Russia and Ukraine is President Putin’s insistence that Ukraine is not a “real” country but historically and culturally belongs to Russia. That is part of his stated rationale for the war, along with the concern he continually expresses that Ukraine will join NATO and thus become an enemy on Russia’s doorstep.

Many have debunked the idea that Ukraine does not have a right to be an independent nation. The historical, cultural, and linguistic reasons are laid out nicely in a recent piece by Keith Gessen in the New York Times Magazine. He cites the conversation from one of the posted videos from the war showing a Ukrainian citizen confronting Russian soldiers after their tank runs out of gas on the road to Kyiv. The Ukrainian asks in Russian, “Can I tow you back to Russia?”

Beyond the courage and the sense of humor on display, Gessen comments that the fascinating aspect of the encounter was not so much how it documented the incompetence of the Russian invasion, but rather that the Ukrainian could communicate so easily, so freely, with the Russian soldiers. Not only do many Ukrainians speak Russian, often as their native language, but they switch easily between Russian and Ukrainian, depending on the context and the situation. Although Russian war propaganda makes much of the minority status of Russian speakers in Ukraine, that does not ring true in the everyday life of most Ukrainians:

Russian propaganda claims that the language is discriminated against, and there are people in Russia who believe that you will get shouted at, or even attacked, for speaking Russian in Kyiv. Yet in the videos now emerging from Ukraine, over and over again, people are speaking Russian. Soldiers speak Russian as they fire rocket-propelled grenades at Russian tanks. Locals speak Russian as they survey annihilated Russian columns.

Gessen points out that Americans are likely to associate war with countries far away, where a foreign language is spoken. He points out that this war is altogether different: “Russia invading Ukraine is less like our wars in Iraq or Vietnam and more like the United States invading Canada.”

The President of Ukraine himself, Volodymyr Zelensky, switches easily between the two languages. In a speech rebutting Putin’s claim of the shared identity of Ukraine and Russia, he switched to Russian in order to address the Russian people:

Responding to the Kremlin’s claims that it was protecting the separatist regions from Ukrainian plans to take them by force, Zelensky asked whom, exactly, Russia thought he was going to bomb. “Donetsk?” he asked incredulously. “Where I’ve been dozens of times, seen people’s faces, looked into their eyes? Artyoma Street, where I hung out with my friends? Donbas Arena, where I cheered on our boys at the Eurocup? Scherbakov Park, where we all went drinking after our boys lost? Luhansk? The house where the mother of my best friend lives? Where my best friend’s father is buried?”

Zelensky was making the argument that Ukraine is indeed a nation, formed not by history or by language, but by the histories and memories of its people and through personal connections among its citizens. Zelensky continued his speech: “Note that I’m speaking now in Russian, but no one in Russia understands what I’m talking about. These place names, these streets, these families, these events — this is all foreign to you. It’s unfamiliar. This is our land. This is our history. What are you going to fight for? And against whom?”

It is a powerful and effective argument that he is making, insisting that Ukraine is not Russia, but instead its own nation, multiethnic and multilingual, and rooted to a particular place, culture, and history. And one whose continued existence is worth fighting for.

Latin/Greek at Princeton: optional for classics

Princeton University

Princeton University recently announced that it will no longer be necessary for students majoring in classics to learn Latin and Ancient Greek. The rationale given is that such language requirements disadvantage students from high schools not offering Latin. I assume few US high schools now teach Latin, not to mention Ancient Greek. It used to be that the area where I live, central Virginia, had the highest enrollment in Latin of all secondary schools in the US. At my university, VCU, we had a thriving Latin program and had difficulty finding enough Latin instructors to accommodate the demand. But those times are now past, a victim of the general decline in language learning in the US. Latin has the additional disadvantage of not being “useful”, i.e. not relating directly to job prospects.

The Princeton decision has generated controversy. While Latin and Greek will continue to be offered as electives, not requiring classics students to take them will inevitably lead to enrollment declines and to classics majors not learning those languages, so crucial for understanding classical culture and literature. Linguist John McWhorter, in an article in the Atlantic, argues that the decision, intended to avoid disadvantaging incoming students from non-elite schools that do not offer Latin, is actually likely to have the opposite effect: “By ending a requirement that classics majors learn Greek or Latin, Princeton risks amplifying racism instead of curing it.” His argument is that the decision, instead of encouraging disadvantaged students, African-Americans and Latinos, deprives them of the opportunity to expand their knowledge and their identities by learning second languages related to the content they are studying:

The Princeton classics decision also deprives students—and, to the extent that the change is racially focused, Black students in particular—of the pleasant challenge of mastering Latin or Greek. With their rich systems of case marking on nouns and flexible word order, both are difficult for an English speaker…Crucially, you often must go through a phase of drudgery—learning the rules, memorizing vocabulary—before you pass into a phase of mastery and comprehension, like dealing with scales on the piano before playing sonatas. The Princeton decision is discouraging students from even beginning this process. Professors may think of the change as a response to racism, but the implicit intention—sparing Black students the effort of learning Latin or Greek—can be interpreted as racist itself.

Whether one agrees or not with McWhorter’s argument, I find one assertion he makes to be absolutely valid, namely that reading the classics (or any literary work not written in English) in translation is far different from being able to read the text in the original language, no matter how good the literary translation is.

Big data and language learning

The big news in artificial intelligence (AI) this past year was the arrival of GPT-3, a substantially improved version of the “Generative Pre-trained Transformer” from OpenAI, an advanced AI system built on a web of artificial neural networks, deep machine learning, and massive collections of human language data. The system has been described as a giant step towards the realization of AGI, “artificial general intelligence”, the ability of a system to use language in virtually any domain of human activity. I wrote about this development in the latest issue of Language Learning & Technology, a special journal issue on big data and language learning. I discuss the breakthrough the system represents:

Normally, an AI system will be able to deal effectively only within a narrowly defined domain, for which the system has been trained, so as to expect specific language patterns typically used in that context. Google Duplex, for example, does a remarkable job in conversing over the phone with human operators in making dinner reservations or reserving a ride on Uber. GPT-3, in contrast, has been shown to interact through language in a wide variety of genres and content areas: creative writing, journalism, essays, poetry, text-based gaming, and even writing software code. The Guardian newspaper ran an article written by the program, while the New York Times asked it to write about love. A blogger used GPT-3 to write multiple blog posts, subsequently receiving numerous subscribers and notice on tech websites. The fact that many readers were not able to tell that the GPT-3 generated texts were written by an AI system raises questions of trust and authenticity, mirroring the concerns raised about audio and video “deepfakes”, based on training an artificial neural network on many hours of real audio or video footage of the targeted individual.

The system represents a remarkable achievement in its ability to write in natural-sounding language (idiomaticity, flow, cohesion). That ability is based on the collection and analysis of huge volumes of text data gathered by crawling the internet, including all of Wikipedia. GPT-3 translates that data into a very large (175 billion!) set of connections or “parameters”, i.e. mathematical representations of patterns. These parameters provide a model of language based not on rules, but on actual language usage. That allows the system to predict word sequencing, based on regularly occurring constructions of words and phrases, thereby enabling the machine production of natural-sounding language utterances. One can imagine how powerful GPT-3 could be if integrated into a smart personal assistant such as Siri. We are already seeing interesting uses of chatbots and intelligent assistants in language learning. A company called LearnFromAnyone is building on top of GPT-3 a kind of automated tutor, which can take on the identity of famous scientists or writers.
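The idea of predicting word sequences from usage patterns, rather than from grammar rules, can be illustrated with a toy model far simpler than GPT-3: a bigram counter that guesses the next word purely from frequencies observed in a small corpus. This is only a minimal sketch of the underlying principle (the corpus and function names here are invented for illustration); GPT-3 itself uses a transformer neural network with billions of learned parameters, not simple counts.

```python
# Toy usage-based language model: predict the next word from counts of
# word pairs (bigrams) observed in a corpus. The principle -- modeling
# language from actual usage rather than rules -- is the same idea that
# GPT-3 scales up to 175 billion parameters.
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the cat ."
).split()

# Count how often each word is followed by each other word
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation seen in the corpus, or None."""
    candidates = following[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("the"))   # "cat" -- the most common word after "the" here
print(predict_next("sat"))   # "on"
```

Even this trivial model produces plausible-sounding continuations within its tiny corpus; the leap GPT-3 makes is conditioning each prediction on long stretches of preceding text rather than a single word.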

While GPT-3 and other advanced AI systems represent a significant technical achievement, there are, as I discuss in the article, plenty of reasons to be cautious and thoughtful in their use, as is the case generally with big data in both social and educational contexts. While the language generated by GPT-3 mimics what a human might write in terms of language use, compositional structure, and idea development, the texts don’t always make sense in terms of lived human experience, i.e. demonstrating an understanding of social norms and cultural practices. Human beings have the advantage in communicative effectiveness of having lived in the real world and having developed the pragmatic abilities to generate language that is contingent on human interactions and appropriate to the context. We also can use crucial non-verbal cues, unavailable to a machine: gesture, gaze, posture, intonation, etc.

I argue in the article that the human element is a crucial mediating factor in implementations of AI systems built on top of big data, particularly in education. Learning analytics (the collection of data about student academic performance) tends to treat students as data, not as human beings with complicated lives (especially these days). I discuss these and other ethical and practical issues with data collection and use in the context of D’Ignazio and Klein’s Data Feminism (2020). The book explores many examples of inequities in data science, as well as providing useful suggestions for overcoming disparities in data collection (favoring standard language use, for example) and for recognizing and compensating for algorithmic bias.

Free academic speech or racial slur?

USC Professor Patton, removed from course

Earlier this month, the University of Southern California removed business professor Greg Patton from his classroom. His offense? In a lecture on linguistics, he used a Chinese word as an example of filler words (like “um” or “like” in English). So far, so good, but that Chinese expression, 那个, or ne ga, sounds a lot like a racial slur in English (the N-word). That word is one that I have found tremendously useful when I’m in China. It means “that one” and comes in handy when ordering in a restaurant: you can just point at a picture of a dish and say “ne ga”, i.e. I’ll have that one. Additionally, native speakers of Mandarin use it in conversation as a filler, as the USC professor was trying to illustrate, making the point that such words or sounds are common across languages. He made clear that the expression was Chinese (not English). Despite that, several African-American students took offense and complained, writing a letter to the dean of the School of Business that described Patton as insensitive and suggested he be removed from his post. They wrote,

There are over 10,000 characters in the Chinese written language and to use this phrase, a clear synonym with this derogatory N-Word term, is hurtful and unacceptable to our USC Marshall community. The negligence and disregard displayed by our professor was very clear in today’s class.

In fact, the letter sent by the students is incorrect, in that the Chinese term is not “a clear synonym with this derogatory N-Word term”; it is not a synonym at all, i.e. a word with an equivalent meaning. It is at most a homophone (a word that sounds like another), but that is not normally seen as significant or meaningful when dealing with two different languages.

As reported in Inside Higher Ed, the complaint and removal have proven controversial, with a petition for Patton’s reinstatement stating:

For him to be censored simply because a Chinese word sounds like an English pejorative term is a mistake and is not appropriate, especially given the educational setting. It also dismisses the fact that Chinese is a real language and has its own pronunciations that have no relation to English.

The professor himself apologized to those students offended, but also told Inside Higher Ed, “Given the difference in sounds, accent, context and language, I did not connect this in the moment to any English words and certainly not any racial slur.”

In a report on the incident in The Atlantic, a fellow professor (from UCLA), Eugene Volokh, suggested how the Business School Dean should have replied:

This should go without saying, but of course many languages have words that sound vaguely like English epithets or vulgarities, and vice versa … Naturally, USC students are expected to understand this, and recognize that such accidents of pronunciation have nothing to do with any actually insulting or offensive meaning. To the extent that our first reaction to hearing such a word might be shock or upset, part of language education (or education of any sort) is to learn to set that aside. The world’s nearly one billion Mandarin speakers have no obligation to organize their speech to avoid random similarities with English words, and neither do our faculty (or students or anyone else) when they are speaking Mandarin.

On the other hand, as the article discusses, this kind of reply, as reasonable as it sounds, does not take into account the real feelings of the USC students who were upset by the incident.

Advanced tech: No need to learn a language?

From Ciklopea (Juraj Močilac)

I’m currently in Belgium, attending a conference on language learning and technology (EuroCALL 2019). Many topics are presented and discussed at such conferences, but one which came up repeatedly at this one is the use of smart digital services and devices which incorporate voice recognition and voice synthesis, available in multiple languages. Those include Apple’s Siri, Amazon’s Alexa, and Google Assistant, available on mobile phones/watches, dedicated devices, and smart speakers. In addition, machine translation such as Google Translate is constantly improving, as artificial intelligence advances (especially through neural networks) and large collections of language data (corpora) are collected and tagged. There are also dedicated translation devices being marketed, such as Pocketalk and Illi.

I presented a paper on this topic at a previous conference this summer in Taiwan (PPTell 2019). I summarized current developments in this way:

All these projects and devices have been continuously expanding the number of languages supported, as well as language variations, such as Australian English alongside British and North American varieties. Amazon has begun an intriguing project to add additional languages to Alexa. An Alexa skill, Cleo, uses crowdsourcing, inviting users to contribute data to support the incorporation of additional languages. Speech recognition and synthesis continue to show significant advancements from year to year. Synthesized voices, in particular, have improved tremendously, sounding much less robotic. Google Duplex, for example, has rolled out a service, now available on both Android and iOS devices, that allows users to ask Google Assistant to book a dinner reservation at a restaurant. The user specifies the restaurant, the date and time, and the size of the party. Google Assistant places a call to the restaurant and engages in an interaction with the restaurant reservation desk. Google has released audio recordings of such calls, in which the artificial voice sounds remarkably human.

Advances in natural language processing (NLP) will impact all digital language services – making the quality of machine translations more reliable, improving the accuracy of speech recognition, enhancing the quality of speech synthesis, and, finally, rendering conversational abilities more human-like. At the same time, advances in chip design, miniaturization, and batteries will allow sophisticated language services to be made available on mobile, wearable, and implantable devices. We are already seeing devices on the market which move in this direction. Those include Google Pixel earbuds, which recognize and translate user speech into a target language and translate the partner’s speech back into the user’s language.

Conference participant Mark Pegrum kindly summarized some of the other information presented in his blog.

The question I addressed at the conference was whether, given this scenario, there will still be a need for language learning in the future. Can’t we all just use smart devices instead? My conclusion was that we can’t:

Even as language assistants become more sophisticated and capable, few would argue that they represent a satisfactory communication scenario. Holding a phone or device, or using earbuds, creates an awkward barrier, an electronic intermediary. That might work satisfactorily for quick information-seeking questions but is hardly inviting for an extended conversation, even if the battery held out long enough. Furthermore, in order to support socially and emotionally fulfilling conversations with a fellow human, a device would need capabilities far beyond transactional language situations. Real language use is not primarily transactional, but social, more about building relationships than achieving a goal. Although language consists of repeating patterns, the direction in which a conversation evolves is infinitely variable. Therefore, language support needs to be very robust, to handle all the twists and turns of conversational exchanges. Real language use is varied, colorful, and creative and therefore difficult to anticipate. Conversations also don’t develop logically — they progress by stops and starts, including pauses and silences. The verbal language is richly supplemented semantically by paralanguage, facial expressions, and body language. This reality makes NLP all the more difficult. Humans can hear irony and sarcasm in the tone of voice and interpret messages accordingly. We understand the clues that nonverbals and the context of the conversation provide for interpreting meaning.

It remains to be seen how technology will evolve to offer language support and instant translation, but despite the advances it is hard to imagine a future in which learning a second language is not needed, if only for the insights it provides into other cultures. Smart technology will continue to improve, offering growing convenience and efficiency in language services, but it is not likely to replace the human process of person-to-person communication and the essentially social nature of language learning.