Improving Accents Through AI & Innovation, w/ Xavier Anguera, CTO & Co-Founder @ ELSA

Q: Share a bit about yourself and the journey that brought you to ELSA.

I am from Barcelona, Spain. My background is in telecom engineering, electrical engineering, and some computer science, too. I worked a bit with speech processing and fell in love with it. It was the hardest thing I have worked on, but that's what attracted me to it!

I ended up going after a PhD abroad, and did my due time in the corporate world at Telefónica. There, I did a lot of research and attended many conferences, but I was missing the feeling of working on a project.

I am not the CEO type, but the CTO type. I went for ELSA, and it has been almost six years now. To explain, ELSA is a speech pronunciation assessment technology that has nearly 14 million downloads. We pinpoint phonemic errors and give tips for improvement in intonation and at many other levels. It is not for learning the language, but rather for learning to speak it properly. It was inspired by the limitations that my co-founder's and my accents caused.

 

Q: How did you take ELSA from inception to where it is now? When you look back at your progress, what strategies did you apply along the way?

There are some companies that start out and pivot their ideas until they get it right; sometimes they never do. ELSA was never that way: Vu already had a clear idea and was determined to get it done. Over these five years we have planned yearly goals regarding technology, safe bets, and so on. There is still so much to do regarding robust technology, features, and details, but we are on it!

The goal for ELSA has always been very clear. I joined as an expert in the field of accents and speech, because I had previously worked with speech recognition and detection, and that made it obvious that I had to take on bigger roles in that field. I took on roles on the technical and engineering side of ELSA, but I also had to learn management skills.

We started as four founders from various parts of the world, and we began working remotely before it was even a thing; it was first on Skype and then evolved from there. Having teams in the same place and the same time zone is different from having team members distributed all over the world. It was challenging to cooperate and figure out what works for each member, but it was a great learning experience!


Q: What are some insights about the challenges with the technical work and AI at ELSA?

People ask me all the time if it's speech recognition software, and that opens the door for an explanation.

Nowadays, advancements have changed what speech recognition means, but traditionally its goal has been to recognize what you said in the input audio. We worked from the standard approaches, but had to deviate from them very early on, because speech assessment technology compares how you speak against how you should speak.

Our goal was to correctly detect American English phonemes and explain the errors in pronunciation. That is exactly what makes us unique, because that technology is not easy to find. We have evolved the engine to the quality we have today, and we will keep improving and developing it for sure!
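To make the contrast concrete, here is a minimal sketch, not ELSA's actual engine, of what phoneme-level assessment means: align the phonemes a learner produced against the phonemes they should have produced and report the differences. The phoneme strings (ARPAbet-style) are invented for the example.

```python
# Minimal sketch of pronunciation assessment as "what you said" vs. "what you should say".
# This is NOT ELSA's engine; the phoneme transcriptions are hypothetical examples.
from difflib import SequenceMatcher

def pronunciation_feedback(expected, produced):
    """Align expected vs. produced phoneme sequences and list the differences."""
    feedback = []
    for op, i1, i2, j1, j2 in SequenceMatcher(a=expected, b=produced).get_opcodes():
        if op == "replace":
            feedback.append(f"said {produced[j1:j2]} instead of {expected[i1:i2]}")
        elif op == "delete":
            feedback.append(f"dropped {expected[i1:i2]}")
        elif op == "insert":
            feedback.append(f"inserted extra {produced[j1:j2]}")
    return feedback or ["no phoneme-level errors detected"]

# "three" pronounced with /t/ instead of /th/, a common substitution:
print(pronunciation_feedback(expected=["TH", "R", "IY"], produced=["T", "R", "IY"]))
```

A recognizer only has to answer "which word was said?"; an assessment engine has to keep the mispronunciation visible instead of normalizing it away, which is why the standard recognition pipeline had to be adapted.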


Q: How does the process and cycle of innovation work at ELSA?

We're very fortunate that our team of researchers has great experience in linguistics and speech. Our research is driven by ideas: coming up with them, developing them, and launching them.

The second motivation for us is problem-solution based. The management team shared the importance of recommendation engines, which helped us learn, gather some ideas, and test them out before doing anything else. We start by manually annotating speech and testing the algorithm to see whether it is doing what it is supposed to do, which takes many resources and a long time.
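As a rough illustration of that evaluation step, assuming the annotations boil down to a set of (word, phoneme position) pairs flagged as errors by humans, the detector's precision and recall could be checked like this; the data below is invented.

```python
# Sketch: scoring an automatic error detector against manual annotations.
# Hypothetical labels; each item marks a (word, phoneme_index) judged mispronounced.
human_labeled_errors = {("focus", 1), ("three", 0), ("world", 3)}
detector_flagged     = {("focus", 1), ("world", 3), ("ship", 2)}

true_positives = len(human_labeled_errors & detector_flagged)
precision = true_positives / len(detector_flagged)       # how many flags were real errors
recall    = true_positives / len(human_labeled_errors)   # how many real errors were caught

print(f"precision={precision:.2f}  recall={recall:.2f}")
```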

Then we brainstorm ideas without limiting ourselves on feasibility, though we keep clear boundaries. We try to work based on the numbers, but sometimes passion gets in the way and we need help getting back on track.

It takes a lot of work at various levels before giving life to an idea. We work according to the user journey and mother tongue, the language and speech level; we monitor the user's skills and match them with our programs. We have a coaching system that recommends certain levels, sound practices, and so on. That feature is our most used one!

Q: What is your process for working with algorithms when you hit a dead end?

This goes to the core of how research is done: researchers need to know their craft. We read multiple papers on the topic to get an idea of the algorithm; based on that, we download an implementation, and that becomes our baseline among standard algorithms. We analyze its output, try to see where the issues are, and come up with solutions. After that step, we develop our solution, track progress, evaluate against our starting point, and go from there.

Q: What is your process for finding the right people for your team?

It's complicated. We have tried everything to find the right team members; the thing that has worked best is connections: recruiting someone you know, or someone recommended by friends and colleagues. At conferences I always ask myself, "Who might know somebody?" and based on that I get started.

It can be very hard because it's expensive and start-ups can't always afford that, but we are very lucky in that sense. We don't have many limitations regarding location and time zones. Our strategy is to know the right people and get the work done, regardless of whether the employee is in a rural village. As long as they have the necessary means, they're in.

We talk to potential employees first, check their background, and set clear expectations, then give them a challenge that gives us a better understanding of their expertise. After that, we evaluate their work and have them explain how they solved the test. Teamwork is very important to us, and this strategy works really well for us! Sometimes it happens that we hire the wrong people, and from there we evaluate whether they are a lost cause or have room for improvement.

Q: What is the process of content creation?

We don't outsource it because it is very important to us to offer our users the right content. We have a team of linguists with a background in pronunciation teaching, and we give them specific tasks for content creation. After the content is developed, we pass it through a set of programs that test it and give us an idea of possible problems or errors. Unexpected things sometimes happen, but our process helps ensure our program works from start to launch; it might take a couple of weeks, but it's definitely worth it!

Q: How do you integrate AI into your work?

40% of companies in Europe claim to work with AI but, in reality, they don't. With us, fortunately, AI was integrated from the very beginning. We always wanted to use the most accurate AI to help our users: we wanted an acoustic model that handles different pronunciation options, and we needed the error detector to provide corrections and advice for improvement. It wasn't that good at first, but the accuracy evolved and grew; we faked it till we made it!

Q: How do you deal with the constant changes and developments in the world of algorithms?

We don't change ours every time something new comes up because that's just not realistic. We have other features to work on and develop, and we operate based on areas of need. If we see anything new that we like, we test it out and evaluate whether it fits in our systems or not.

Q: Do you take into consideration the original language of the person when coaching them for their accent?

Absolutely yes. We keep track of the mother tongue of our users because it paves the way for how we help them. The mother tongue helps us notice patterns in certain errors and gives us a better understanding of how to fix those errors. It's not high-tech AI, but our users are big fans of it.
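A toy illustration of that idea: keep a table of the pronunciation contrasts each mother tongue typically struggles with and use it to prioritize practice. The mappings below are commonly cited L1 transfer patterns, not ELSA's data.

```python
# Sketch: prioritize practice content by the learner's mother tongue (L1).
# Illustrative, commonly cited confusion patterns; not ELSA's actual data.
COMMON_CONFUSIONS = {
    "spanish":  [("IY", "IH"),  # "sheep" vs. "ship"
                 ("B", "V")],
    "japanese": [("R", "L")],
    "french":   [("TH", "Z")],  # "this" pronounced closer to "zis"
}

def suggest_focus(mother_tongue):
    """Return the phoneme contrasts a learner with this L1 most likely needs to drill."""
    return COMMON_CONFUSIONS.get(mother_tongue.lower(), [])

print(suggest_focus("Spanish"))  # [('IY', 'IH'), ('B', 'V')]
```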

Q: Can you augment data by using text-to-speech from Google and then train your model?

It depends on the task you are trying to solve. We have experimented with many different types of data sets, but that needs a lot of caution, because synthetic audio tends to be perfectly pronounced English. External data sets need background research to see whether they fit our standards and programs. So it all depends on the area of operation and the problem. Bootstrapping our initial data was very simple: we had some volunteers help us by providing audio, and open English data sets also helped us train our model.
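As a sketch of what such an augmentation experiment might look like, with the caveat from above that synthetic speech is pronounced correctly by construction, so it can only help the model learn what correct speech sounds like, not what errors sound like. The `synthesize_speech` helper below is a hypothetical placeholder for whatever TTS service is used.

```python
# Sketch: augmenting training data with synthetic (TTS) speech.
# `synthesize_speech` is a hypothetical wrapper around a TTS provider; only the
# workflow is the point here, not any specific API.
from pathlib import Path

def synthesize_speech(text: str, voice: str) -> bytes:
    """Placeholder: call your TTS provider here and return raw audio bytes."""
    raise NotImplementedError("plug in a real TTS client")

def augment_dataset(prompts, voices, out_dir="tts_augmented"):
    """Generate one clip per (prompt, voice) pair.

    Because TTS output is pronounced correctly, this data can only support the
    "correct speech" side of a pronunciation model, never the error-detection side.
    """
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for i, prompt in enumerate(prompts):
        for voice in voices:
            audio = synthesize_speech(prompt, voice)
            (out / f"{i:04d}_{voice}.wav").write_bytes(audio)

# Example (requires a real TTS client wired into synthesize_speech):
# augment_dataset(["The weather is nice today."], voices=["en-US-voice-A"])
```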

Q: How does the metric for user engagement influence your work, business side and user experience?

You don't always need to look at the market to know what to do. Looking into the future helps us work on things that will have an impact and come in handy later. On the product side, we implement what will move our metrics in the short term. On the AI side, however, developments are more long-term focused, such as accuracy of detection. These things don't happen overnight and need months of proper development. If we get things wrong time and time again, users will abandon us, so timing is critical. Retention and user visits are very important, but we don't need to look at them as much as we need to look further ahead.

Q: How do you decide on your rollouts, features, and launches across different platforms?

We don't normally release things on all platforms; we choose specific features for each platform. It can be a double-edged sword, because what works on iOS might not work on Android. Users are very different from one platform to another, hence the distinct features for each. I am looking into the possibility of having a unified platform. It is not happening anytime soon, but it's closer than it was before. For now, we are dealing with the platforms available and rolling with the flow of things.

Q: What is your approach to processing the different data?

There is a lot of potential for processing on the edge, but we don't do it. We have data processing on the backend that helps us keep things on track at all the different levels. Our user experience helps us know which path to take and which to avoid. In the future we may look into edge computing and on-device processing, but not anytime soon.

Q: What about bandwidth optimization and internet connection?

It is quite optimized in that sense, but we face some difficulties when users don't have a good enough connection and complain. We will have to look more into it.

Q: What is your future vision for ELSA?

Our goal was decided five years ago, from the very beginning. We want to be the best in our field, through the different features and acoustic models that work for non-native users. Eventually, we want to go into more languages, teach Spanish to Americans, and so on. You will have to be on the lookout to find out more!
