As AI models achieve excellence in speech recognition and synthesis, text processing, and multimodality, truly capable voice user interfaces could soon become ubiquitous
It was a typical Friday afternoon at the end of a long week of work on our project: developing a radically new concept and app for molecular graphics in augmented and virtual reality. I found myself in a heated discussion with a friend and colleague, a “hardcore” engineer, web programmer, and designer who has been in the trenches of web development for over a decade. As someone who prides himself on efficiency and control over every line of code, and who always keeps the user and the user experience in mind, he scoffed at my idea that voice interfaces would soon become the norm…
“Speech interfaces? They’re immature, awkward, and frankly, a little creepy,” he said, not in these exact words but certainly meaning them, voicing a sentiment that many in the tech community share. And this was after I had already partly convinced him, maybe by 30–50%, that our augmented/virtual reality tool for molecular graphics and modeling absolutely needs this kind of human-computer interaction, because the users’ hands are busy…