I’m currently in Belgium, attending a conference on language learning and technology (EuroCALL 2019). Many topics are presented and discussed at such conferences, but one which came up repeatedly at this one is the use of smart digital services and devices which incorporate voice recognition and voice synthesis, available in multiple languages. Those include Apple’s Siri, Amazon’s Alexa, and Google Assistant, available on mobile phones/watches, dedicated devices, and smart speakers. In addition, machine translation such as Google Translate is constantly improving, as artificial intelligence advances (especially through neural networks) and large collections of language data (corpora) are collected and tagged. There are also dedicated translation devices being marketed, such as Pocketalk and Illi.
I presented a paper on this topic at a previous conference this summer in Taiwan (PPTell 2019). I summarized current developments in this way:
All these projects and devices have been continuously expanding the number of languages supported, and have added language variations as well, such as Australian English alongside British and North American varieties. Amazon has begun an intriguing project to add additional languages to Alexa. An Alexa skill, Cleo, uses crowdsourcing, inviting users to contribute data to support incorporation of additional languages. Speech recognition and synthesis continue to show significant advancements from year to year. Synthesized voices in particular have improved tremendously, sounding much less robotic. Google Duplex, for example, has rolled out a service which is now available on both Android and iOS devices to allow users to ask Google Assistant to book a dinner reservation at a restaurant. The user specifies the restaurant, date and time, and the size of the party. Google Assistant places a call to the restaurant and engages in an interaction with the restaurant reservation desk. Google has released audio recordings of such calls, in which the artificial voice sounds remarkably human.
Advances in natural language processing (NLP) will impact all digital language services – making the quality of machine translations more reliable, improving the accuracy of speech recognition, enhancing the quality of speech synthesis, and, finally, rendering conversational abilities more human-like. At the same time, advances in chip design, miniaturization, and batteries will allow sophisticated language services to be made available on mobile, wearable, and implantable devices. We are already seeing devices on the market which move in this direction. Those include Google Pixel earbuds, which recognize and translate user speech into a target language and translate the partner’s speech back into the user’s language.
Conference participant Mark Pegrum kindly summarized some of the other information presented in his blog.
The question I addressed at the conference was: given this scenario, will there still be a need for language learning in the future? Can’t we all just use smart devices instead? My conclusion was no:
Even as language assistants become more sophisticated and capable, few would argue that they represent a satisfactory communication scenario. Holding a phone or device, or using earbuds, creates an awkward barrier, an electronic intermediary. That might work satisfactorily for quick information-seeking questions but is hardly inviting for an extended conversation, even if the battery held out long enough. Furthermore, in order to have socially and emotionally fulfilling conversations with a fellow human, a device would need to support far more than transactional language situations. Real language use is not primarily transactional, but social, more about building relationships than achieving a goal. Although language consists of repeating patterns, the direction in which a conversation evolves is infinitely variable. Therefore, language support needs to be very robust, to support all the twists and turns of conversational exchanges. Real language use is varied, colorful, and creative and therefore difficult to anticipate. Conversations also don’t develop logically; they progress by stops and starts, including pauses and silences. The verbal language is richly supplemented semantically by paralanguage, facial expressions, and body language. This reality makes NLP all the more difficult. Humans can hear irony and sarcasm in the tone of voice and receive messages accordingly. We understand the clues that nonverbals and the context of the conversation provide for interpreting meaning.
It remains to be seen how technology will evolve to offer language support and instant translation, but despite advances it is hard to imagine a future in which learning a second language is not needed, if only for the insights it provides into other cultures. Smart technology will continue to improve and offer growing convenience and efficiency in providing language services, but it is not likely to replace the human process of person-to-person communication and the essentially social nature of language learning.