Alex Capecelatro, co-founder and CEO of Josh.ai, takes us behind the scenes of the innovative company to show us how they arrived at their cutting-edge system, and where they plan on taking it.
How did the company achieve the concept for Josh.ai?
Almost four years ago, Tim Gill, my business partner, and I were independently remodeling our homes, his in Denver and mine in Beverly Hills. We were both exploring all sorts of options for technology and looking for easy-yet-powerful interfaces. As software guys, we lamented the fact that the state of the art didn't quite feel as modern as we would have liked. I was in the process of selling a software company focused on machine learning and a delightful user experience, and Tim was in the process of building an AI chatbot for fun after selling his last company for quite a bit. Over a ski trip we started exploring how aspects of his chatbot and my user experience and design focus could make for a home control environment we both would love. Initially the idea wasn't to start a business, but to imagine what the ultimate setup would be for each of us. It quickly became clear that voice technology and AI were on the brink of taking off, and would dramatically change how we interact with our homes. We started getting excited about making this a reality. Good timing, complementary skills, and now an amazing team of almost 30 have taken us from an idea to a fully realized set of products delighting clients across the country.
How does Josh.ai improve upon existing voice-control systems?
Josh.ai is focused on the luxury residential client, with a custom voice engine built for the market. The goal is to speak naturally without having to memorize specific phrases. This is achieved in a few ways. First, the hardware, Josh Micro, is location aware. As a result, you can walk in a room and simply say "lights on" or "bring up the shades" and only that room's devices will respond. This makes it incredibly easy, since you don't have to remember the names of devices or rooms. Next, Josh.ai accepts multiple requests in a single utterance, or what we call "compound commands." As a result, you can walk in a room with no predefined scene and say, "bring up the lights, close the blinds, and turn on Netflix." Further, Josh.ai utilizes a knowledge graph to determine the right content and deep link in. For example, if you said "watch Stranger Things," Josh.ai would determine that content is on Netflix, turn on the TV, switch inputs, and take you directly to the show.
Same thing with music, including songs, artists, albums, and genres. This is particularly difficult because a command like "turn on x" could be for music, video, or a general device and Josh.ai has to determine the right content instantly. In addition, Josh.ai has easy-to-use dealer tools to customize the names of rooms, devices, and scenes, including multiple aliases for each. Each room is also identified as belonging to a floor. As a result, Josh.ai gets smart about commands like "turn off the upstairs lights" or "play the Beatles everywhere except outside." The voice engine behind Josh is built in-house and always evolving, making for a constantly improving experience.
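To make the ideas of compound commands, location awareness, and floor-based scoping concrete, here is a minimal Python sketch. It is not Josh.ai's voice engine, which is built in-house and not described in detail here. The room layout in `ROOMS` and the helper names `split_compound` and `resolve_rooms` are hypothetical, and simple pattern matching stands in for real natural language processing.

```python
from __future__ import annotations

import re

# Hypothetical home layout: room name -> floor. An assumption for illustration,
# not Josh.ai's actual data model.
ROOMS = {
    "kitchen": "downstairs",
    "living room": "downstairs",
    "bedroom": "upstairs",
    "office": "upstairs",
    "patio": "outside",
}


def split_compound(utterance: str) -> list[str]:
    """Split one utterance into individual requests on commas and 'and'.

    Naive on purpose: a name containing 'and' would be split incorrectly.
    """
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", utterance.lower().strip())
    return [p.strip() for p in parts if p.strip()]


def resolve_rooms(request: str, current_room: str) -> list[str]:
    """Decide which rooms a request applies to."""
    text = request.lower()

    # "everywhere" or "everywhere except X": all rooms minus an excluded
    # room or floor.
    match = re.search(r"everywhere(?:\s+except\s+(.+))?$", text)
    if match:
        excluded = (match.group(1) or "").strip()
        return [r for r, f in ROOMS.items() if excluded not in (r, f)]

    # A floor name selects every room on that floor.
    for floor in sorted(set(ROOMS.values())):
        if re.search(rf"\b{re.escape(floor)}\b", text):
            return [r for r, f in ROOMS.items() if f == floor]

    # An explicit room name selects just that room.
    for room in ROOMS:
        if re.search(rf"\b{re.escape(room)}\b", text):
            return [room]

    # Otherwise fall back to the room where the request was heard.
    return [current_room]


if __name__ == "__main__":
    spoken = "bring up the lights, close the blinds, and turn on Netflix"
    for request in split_compound(spoken):
        print(request, "->", resolve_rooms(request, current_room="kitchen"))
```

In this sketch, a request with no room or floor in it falls back to the room that heard it, which is the role location awareness plays in the description above.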
With so much relying on smart home systems — security, temperature, privacy, and so on — how can the information gathered be protected?
Privacy and data security are very important at Josh.ai. We strive to be a trusted entity even in the most intimate locations of the home. This is only achieved by having transparent policies and a dedication to the residential channel. As a rule of thumb, limiting the amount of information that has to go to the cloud is vital. Our goal is to ultimately process everything we can locally.
Currently we do all the NLP (natural language processing), device discovery and control, building configuration, and processing locally in the home. We do have to go to the cloud for certain control like streaming Netflix or Spotify content, or accessing cloud devices like Nest thermostats and cameras. We also have to go to the cloud for the automatic speech recognition (ASR) in order to achieve high accuracy. As technology advances and we can process more of that locally, we will.
Finally, we try to make it extremely clear when Josh is listening, and easy to review and delete what it heard. For example, if Josh is listening, you'll always see the LEDs light up in a spinning rainbow. Josh only listens after a wake word is said, and Josh only listens for about 15 seconds at a time. You can review what Josh heard on the web portal or in the mobile app. You can delete individual commands or the entire history if you'd like. You can also turn off logging, so any command given isn't stored anywhere, providing the maximum security for privacy-conscious clients.
How far are you looking to take Josh.ai in the future?
Our goal with Josh.ai is to continue pushing the frontier in user experience design and engineering. Voice is just the start, but we're already working on a number of advanced AI and learning features. A truly smart home needs to utilize machine learning to improve over time, and personalize to the owner's preferences. Our goal is to partner with the best hardware products from leading manufacturers in the industry, but we'll continue to build hardware when necessary for achieving the best possible experience. Personally I hope we never sell the company; I'd like to continue building and growing this and seeing just how far we can take the home.
What do you see as the role of AI and voice command in the smart home both now and in the near future (5 years)?
Voice control is already proving to be a significant interface for a variety of members in the home. Whether you're tech savvy and wanting the latest and greatest, or less excited about technology and wanting a simple system that just works, voice has the potential to be the leading interface much of the time. Of course, voice isn't the best interface for every task all of the time, but we're seeing clients who use it so frequently they forget there are other options.
AI will take voice control to a level of working seamlessly, including commands never programmed into the system. For example, if you have a room called "Gym," our system will already work if you say "turn off the exercise room lights." The same will ultimately be true for commands like "I'm home" that were never specifically programmed. I see voice as a leading driver in smart home adoption, helping to make these systems more approachable and exciting to use.
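As a rough illustration of the "Gym" example, the sketch below resolves a spoken room phrase to a configured room name even when that exact phrase was never set up. The `SYNONYMS` table and the `match_room` function are hypothetical. A hand-written synonym table is only a stand-in here, since a system that truly handles unprogrammed commands would more likely rely on a learned language model.

```python
from __future__ import annotations

# Hypothetical synonym groups, written by hand for illustration only.
SYNONYMS = [
    {"gym", "exercise room", "workout room", "fitness room"},
    {"living room", "family room", "lounge"},
]


def match_room(phrase: str, configured_rooms: list[str]) -> str | None:
    """Return the configured room that a spoken phrase refers to, if any."""
    spoken = phrase.lower().strip()
    by_name = {room.lower(): room for room in configured_rooms}

    # An exact (case-insensitive) match on a configured name wins first.
    if spoken in by_name:
        return by_name[spoken]

    # Otherwise look for a configured room in the same synonym group.
    for group in SYNONYMS:
        if spoken in group:
            for name in group:
                if name in by_name:
                    return by_name[name]
    return None


if __name__ == "__main__":
    print(match_room("exercise room", ["Gym", "Kitchen"]))
```

The dealer-defined aliases mentioned earlier would cover the names someone thought to enter in advance. A lookup like this one is about the phrases nobody entered.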
Looking even further (up to 10 years), where do you see the smart home and control systems?
It's really hard to predict 10 years out. About 10 years ago the iPhone was first announced. Since then the world has advanced so much. I think it's safe to say we'll still have luxury homes, the need for professional integrators, some level of installation and programming, and quite a bit of support and service. That said, new hardware will continue to come up, costs will continue to go down, systems will become hyper-personalized, and our expectations of the system will go up. For example, I think the home will utilize ambient sensors and data to be more proactive, optimizing the safety, energy, and health of a home for a homeowner's needs.
Currently, most home automation is really home control, and almost all voice is control, not automation. We're working hard to bring about a future where your home is predictive, helps improve your sleep and eating habits, makes sure you don't forget something, alerts you when appropriate, and overall helps you feel more content and at peace when home. I'm sure there will be some impossible-to-forecast technological breakthrough that will have a tremendous reach. I don't think it will be VR (virtual reality) or AR (augmented reality) necessarily, but 10 years out I would expect a new set of sensors or cameras or software that transforms our understanding of how humans and machines interact.