Here is a brief synopsis of my experience with Jasper, the open source platform for developing always-on, voice-controlled applications.
Devices like this have been around for a while, but the announcement of the Amazon Echo made voice-controlled 'assistants' feel far more feasible. I signed up for the chance to buy one and was promptly accepted, but I couldn't quite pull the trigger on what was at the time $149 for Prime subscribers. Having come to that realization, I decided once again to dive into getting an open source alternative working.
Enter Jasper. I've tried putting this together a few times in the past, but had little to no luck. That was mostly due to my lack of patience, though some things were also still a little broken. My parts list, while minimal, is as follows (excluding the essentials for the Raspberry Pi such as power, SD card, etc.).
To ease the installation process as much as possible, I followed the first method in their documentation. This basically involved flashing their pre-made Raspberry Pi image and then following the guide to install the remaining software. Once that was complete, I had a very basic, functional build of Jasper up and running, using the PocketSphinx speech-to-text (STT) engine and the eSpeak text-to-speech (TTS) engine. You quickly realize how terrible the eSpeak TTS sounds and how inadequate the PocketSphinx STT engine is. Because of this, I set out to explore the alternatives for each, which Jasper has outlined quite well in their documentation.
As of right now, the Jasper documentation lists the following STT engines:
Of these five, I tried three that I felt gave a good representation of the bunch: PocketSphinx (the default), Google, and Wit.ai.
While definitely lacking in accuracy, PocketSphinx does have some things going for it. First of all, it was quite simple to set up. Additionally, all the recognition and speech libraries are stored locally on the device, so no internet connection is required. This is especially beneficial for the paranoid types who don't want all their conversations sent up into the cloud for processing. Again, though, the accuracy is lacking, and I often found myself programming in contingencies for incorrect words and including several words that were 'close to' what I intended.
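One way to handle those 'close to' words without hard-coding every mishearing is fuzzy matching against the small vocabulary a module actually cares about. The following is only a sketch using Python's standard `difflib`; the `VOCABULARY` list, the `normalize` helper, and the `0.7` cutoff are all made up for illustration and are not part of Jasper.

```python
import difflib

# Hypothetical vocabulary: the handful of words a module listens for.
VOCABULARY = ["lights", "kitchen", "bedroom", "on", "off", "dim"]


def normalize(transcript, vocabulary=VOCABULARY, cutoff=0.7):
    """Replace each word with its closest vocabulary entry, if one is close enough.

    Words with no sufficiently similar vocabulary entry are left unchanged.
    """
    fixed = []
    for word in transcript.lower().split():
        # get_close_matches returns at most n candidates scoring >= cutoff,
        # best match first.
        match = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        fixed.append(match[0] if match else word)
    return " ".join(fixed)
```

The cutoff is a trade-off: lower values rescue more misheard words but also start rewriting ordinary short words (for example, "of" is close enough to "off" at this setting), so it would need tuning against real transcripts.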
The Google STT engine was very good at recognizing what I said, which is really all this component needs to do. The downside is that there appears to be a limit on the number of queries you can perform in a given day. Also, all speech is sent to the cloud for processing and then returned to the device as text. This kind of behavior shouldn't be a surprise to anyone who uses a modern smartphone, but it can still be a little unnerving when you're coding it yourself.
Wit.ai is very similar to the Google STT engine, but it doesn't have a limit on daily queries. I did run into an issue where it would crash while idling, though I'll chalk that one up to my specific installation. Again, all processing is done in the cloud and depends on an internet connection.
As of this writing, the Jasper documentation lists seven different TTS engines:
- SVOX Pico
- Mac OS X
Again, I only tested a small sampling of these (eSpeak and Google), because once I landed on the Google engine I was quite satisfied and saw no reason to try others.
eSpeak is the default TTS engine on a fresh install of Jasper. It has a very robotic voice and won't win any awards for being charming or eloquent. There are several different accents you can try to get a different feel for it, but even so it didn't quite fit the bill for what I was looking for.
Once I read that the Google TTS engine is the same one used by newer Android devices, I was pretty much sold already. Voice reproduction is smooth and flows decently compared to the alternatives. Those familiar with Android devices and their voice functionality will of course recognize it, as it is one and the same.
As an introduction to open-source voice control and a way to become familiar with the systems and technologies it requires, this was a great little project. I have yet to see whether it will be a lasting installation in my apartment; that depends mostly on how useful it turns out to be.
I am 100% a fan of the modularity that was built into this platform with regard to adding functionality. As an owner of Philips Hue lights, I immediately set out to throw together my own module for accepting various commands. Over the course of just an evening, I was able to piece together a quick and dirty module that could turn individual groups on/off and dim the lights. The overall usability of this feature is very dependent on the STT engine used. Naturally, if the device can't accurately decipher your commands, it has no way of knowing what you want it to do.
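For anyone curious what such a module involves, here is a rough sketch rather than the module described above. It follows Jasper's documented module interface (a `WORDS` list plus `isValid(text)` and `handle(text, mic, profile)`) and the Hue bridge's v1 REST endpoint for group actions. The group names and IDs, the dim level of 64, and the `hue` / `bridge_ip` / `username` profile keys are all assumptions for illustration. It is also written for Python 3, so it would need adapting for a Python 2 based Jasper install.

```python
import json
import re
import urllib.request

# Words Jasper should be able to recognize for this module.
WORDS = ["LIGHTS", "ON", "OFF", "DIM", "BEDROOM", "KITCHEN"]

# Hypothetical group names and IDs; real IDs come from your own bridge.
GROUPS = {"bedroom": 1, "kitchen": 2}


def parse_command(text):
    """Turn a transcript into (group_name, hue_action), or None if unclear."""
    text = text.lower()
    group = next((name for name in GROUPS if name in text), None)
    if group is None:
        return None
    if re.search(r"\bdim\b", text):
        return group, {"on": True, "bri": 64}  # 64 is an arbitrary dim level
    if re.search(r"\boff\b", text):
        return group, {"on": False}
    if re.search(r"\bon\b", text):
        return group, {"on": True}
    return None


def send_action(bridge_ip, username, group_id, action):
    """PUT the action to the bridge's group endpoint (Hue API v1)."""
    url = "http://%s/api/%s/groups/%d/action" % (bridge_ip, username, group_id)
    body = json.dumps(action).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="PUT")
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read()


def isValid(text):
    """Jasper calls this to decide whether this module should handle the input."""
    return bool(re.search(r"\blights?\b", text, re.IGNORECASE))


def handle(text, mic, profile):
    """Jasper calls this with the transcript, the mic object, and the user profile."""
    parsed = parse_command(text)
    if parsed is None:
        mic.say("Sorry, I did not catch which lights you meant.")
        return
    group, action = parsed
    hue = profile["hue"]  # assumed profile section, not a Jasper default
    send_action(hue["bridge_ip"], hue["username"], GROUPS[group], action)
    mic.say("Okay.")
```

Keeping the parsing in its own function means the command logic can be tried out on plain strings without a microphone or a bridge attached.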
After a day or so of running, I did notice it becoming more sluggish and taking longer to respond. I would have to repeat the keyword a few times for it to pick up, and even after picking up my query it would idle for several seconds before actually responding. I have yet to generate a log file or hook up a terminal to watch it long-term, but as a last resort this could be worked around with a cron job that simply restarts the program.
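As an alternative to cron, the same last-resort restart can be done by a small supervisor script that launches the program and replaces it after a fixed uptime. This is a sketch of that swapped-in approach, not something I have run against Jasper; the function name and its parameters are hypothetical, the command to launch is left for you to fill in, and it assumes Python 3.

```python
import subprocess
import time


def run_with_periodic_restart(cmd, max_uptime_s, max_restarts=None, poll_s=1.0):
    """Run cmd, relaunching it whenever it exits or exceeds max_uptime_s.

    max_restarts=None keeps going forever; otherwise the command is launched
    at most max_restarts + 1 times. Returns the number of launches.
    """
    launches = 0
    while max_restarts is None or launches <= max_restarts:
        process = subprocess.Popen(cmd)
        launches += 1
        started = time.monotonic()
        # Poll until the process exits on its own or has run for too long.
        while process.poll() is None:
            if time.monotonic() - started >= max_uptime_s:
                process.terminate()  # ask politely first
                try:
                    process.wait(timeout=10)
                except subprocess.TimeoutExpired:
                    process.kill()  # force it if it ignores the request
                    process.wait()
                break
            time.sleep(poll_s)
    return launches
```

Something like `run_with_periodic_restart(cmd, max_uptime_s=12 * 60 * 60)` would then recycle the process twice a day, where `cmd` is whatever list of arguments starts Jasper on your install. This only masks the slowdown, so a log file is still the better way to find the actual cause.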
In conclusion, for an open source project that appears to be a hobby of the original creators, this is a solid contender to fill the voice-controlled assistant void we never knew we had until now.