AppleInsider reports today on a post on Apple’s Machine Learning blog that details exactly what happens when you use the “Hey Siri” command with your iOS device.
In particular, the blog post explains the role of the detector, which is a speech recognizer that is always listening out for the special wake-up words, but also has to be able to filter out other noises.
Apple explains that the hardware inside your iPhone or Apple Watch turns the human voice into a stream of instantaneous waveform samples at a rate of 16,000 per second. Approximately 0.2 seconds of audio at a time is sent to a “Deep Neural Network,” which classifies it, determines the likelihood of it being the activation phrase, and sends that score to the rest of the system.
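To put those numbers in perspective, here is a rough Python sketch of the arithmetic: at 16,000 samples per second, a 0.2-second slice works out to about 3,200 samples. The function names and the idea of stepping through the audio with a fixed hop are illustrative assumptions, not details from Apple's post.

```python
SAMPLE_RATE_HZ = 16_000   # samples per second, as stated in the post
WINDOW_SECONDS = 0.2      # approximate slice of audio examined at a time


def window_size(sample_rate_hz=SAMPLE_RATE_HZ, window_seconds=WINDOW_SECONDS):
    """Number of samples in one slice of audio."""
    return round(sample_rate_hz * window_seconds)


def sliding_windows(samples, size, hop):
    """Yield overlapping slices of `samples`.

    `hop` (how far the window advances each step) is a hypothetical
    parameter; the article does not say how often a new slice is taken.
    """
    for start in range(0, len(samples) - size + 1, hop):
        yield samples[start:start + size]


if __name__ == "__main__":
    print(window_size())  # samples in a 0.2-second slice at 16 kHz
```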
Several sensitivity thresholds are set. If the score falls in the middle range, the software listens more closely for a second attempt so that the phrase is not missed.
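The threshold logic can be pictured with a small Python sketch. It assumes two thresholds and treats "listening more closely" as accepting a lower score on the very next attempt; the class name, the threshold values of 0.5 and 0.8, and the one-attempt window are all made up for illustration and are not Apple's actual settings.

```python
class TwoThresholdDetector:
    """Toy model of a wake-phrase detector with a second-chance state."""

    def __init__(self, lower=0.5, upper=0.8):
        # Hypothetical values; the article gives no real numbers.
        self.lower = lower
        self.upper = upper
        self.sensitive = False  # True right after a middle-range score

    def observe(self, score):
        # While in the sensitive state, the lower threshold is enough.
        threshold = self.lower if self.sensitive else self.upper
        if score >= threshold:
            self.sensitive = False
            return "trigger"
        # A middle-range score arms the second chance; a low score clears it.
        self.sensitive = score >= self.lower
        return "listen_closely" if self.sensitive else "ignore"
```

In this sketch a score of 0.6 on its own does not wake the device, but a second 0.6 immediately afterwards does, which mirrors the "so that the phrase is not missed" behavior described above.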
The waveform then goes to the Siri server. If the speech recognizer there hears something other than “Hey Siri,” such as “Hey seriously,” the server sends a cancellation signal to the phone so that it goes back into sleep mode.
Because the Apple Watch is so much smaller than an iPhone, it apparently poses a unique set of problems. These are solved by running the Hey Siri detector only when the watch’s motion coprocessor determines that the wrist has been raised.
Sources: Latest Apple machine learning research paper discusses how the 'Hey Siri' invocation works