Telenor’s Senior Research Scientist Pablo Ortiz is one of the experts trying to make automatic speech recognition (ASR) a reality through the “Scribe” project. This project consists of Telenor and a team of enthusiastic collaborators from the Norwegian national broadcaster NRK, the National Library of Norway, and the Norwegian University of Science and Technology (NTNU), who are exploring the broad societal impact of automatically transcribing speech to text.
From improving accessibility for the hearing impaired to advancing machine learning competence for ASR in Norwegian, this project extends far beyond Telenor customer service. It has the potential to empower both Norwegian society and its language.
“For me, this work is very important because of its direct impact on society, the direct application to industry, and because of the opportunity it offers to model such a complex phenomenon,” explains Ortiz. “I’ve spent half my life trying to understand complex systems in the tiniest detail, and almost everything, deep inside, behaves as a wave. In this project, we are building a model for speech – another wave.”
Ortiz holds a Ph.D. in theoretical physics from the University of Leiden. He spent much of his academic career building theories for the pre-Big Bang era and testing them with experimental data from NASA and ESA. After leaving academia, he joined Telefonica in 2016, using his expertise to predict human behaviour with machine learning. He moved to Oslo in 2017 and joined Telenor soon after.
How did you get into the speech-to-text project?
PO: I joined Telenor Norway in 2018, when the work on ASR technology was starting. I came on board to explore how the technology could extract the reasons behind customers’ interactions with customer service. We showed that there was huge potential to automatically classify all conversations with ASR. The project was paused in late 2019. In 2020, I moved to Telenor Research and took the lead on the project, establishing a larger network of academic researchers, building momentum, and collaborating with Telenor Norway, specifically on the data side.
Why would automatic text analysis help Telenor?
PO: It would give us a live dashboard showing whether customers are angry or happy, whether they share common issues, and more. This could help with product development, for example: a live read on customers would help us market and develop our products and react faster and on a more personalised level. It would also help train new customer agents by giving them examples of good and bad conversations to learn from.
Doesn’t this technology exist already?
PO: Performance is acceptable for English but not for Norwegian, partly because we have a massive amount of data in English, whereas in Norwegian it is scarce. We need a great deal more if we’re going to capture the differences within the language, as people from Trondheim and Oslo pronounce some words completely differently. Nevertheless, data alone is not enough when you deal with spontaneous, dialectal speech with multiple participants. This research area is evolving rapidly, and we need new methods to handle real-life situations.
Are there applications for this technology beyond Telenor customer service?
PO: This can have so much impact on a societal level, which is why we applied for a grant with the Research Council of Norway (RCN) and have partnered with NTNU, NRK, and the National Library. Taking this to a broader level helped secure the funding, as the RCN doesn’t fund Telenor customer service projects.
What is it that you find personally motivating about this project?
PO: We are trying to understand how our brains encode knowledge of the world and languages, how we retrieve that information when we hear speech, and how we then translate it into words. To me, there is something beautiful about this mix of mathematics, physics, language, and neuroscience. It’s much more than data: blindly throwing data into a huge neural network is not the answer. My contribution to this research project is to provide a mathematical abstraction of these processes and their interaction. Having a meaningful mathematical description of the speech-to-text process will allow us to solve the problem properly. Then we can do something smart with the data.
What’s the next step for Scribe?
PO: It’s to get the data so that we can train and test our algorithms. Our partners can provide data, but if we can record and publish Telenor data (with prior informed consent from customers), then the National Library can transcribe it for us, a task valued at several million NOK. Otherwise, we’ll start with NRK data to get the ball rolling. In parallel, we’ll be working on the research and development of new methods.
How long will this project last, and what’s the sentiment among the project members?
PO: The collaboration officially starts in August and will last four years, and it will involve supervising master’s and Ph.D. students as well. There’s so much interest and engagement with this project because everyone sees that we really have a golden opportunity. It addresses a set of challenges we face in Norway, some of which have been mastered for high-resource languages but not yet for Norwegian. We see that this can have a great impact both for Telenor and for Norwegian society at large.