Enable iOS 10 Speech Recognition in NativeScript Apps

Before iOS 10 was released, it was rather difficult to integrate speech-to-text capabilities (not just voice-to-text dictation) into iOS applications. There was a myriad of frameworks to choose from, most of them commercial products such as Nuance. iOS 10's Speech framework empowers your iOS apps with native, Siri-like speech recognition. In this article, I am going to show you how to use NativeScript to access the Speech Recognition APIs.

If you are not familiar with the Speech API, I advise that you take a look at the documentation articles. You can see that the APIs are available in both Swift and Objective-C. At the time of writing, you can only access those APIs in NativeScript using the Objective-C syntax.

The application that we are going to build will continuously accept voice input from the user and transcribe it into text.

NativeScript Speech Recognition app

Define the Native Classes and Objects

NativeScript allows you to easily access native APIs by exposing the native classes and objects to JavaScript. By reviewing the speech documentation, we determine which classes and objects we want to use, and declare them in the beginning of our page code:

    declare var interop, NSLocale, SFSpeechRecognizer, SFSpeechAudioBufferRecognitionRequest,
    SFSpeechRecognizerAuthorizationStatus, AVAudioEngine, AVAudioSession, AVAudioSessionCategoryRecord, AVAudioSessionModeMeasurement;

We can then get started by initializing our SpeechRecognizer and AudioEngine:

    locale = new NSLocale('en-US');
    speechRecognizer = SFSpeechRecognizer.alloc().initWithLocale(locale);
    audioEngine = AVAudioEngine.new();

Authorization

Speech recognition doesn't occur on the device itself. The audio is recorded and temporarily sent to Apple's servers for processing, so the app must request the user's permission to do so. To begin using the Speech Recognition API, we must add the NSSpeechRecognitionUsageDescription key to our Info.plist; its value is the message displayed when asking for the user's permission. Since our app performs live speech recognition, we also need the NSMicrophoneUsageDescription key, which holds the microphone authorization message. Our added keys look as follows:

    <key>NSMicrophoneUsageDescription</key>
    <string>Your microphone will be used to record your speech when you press the "Start Recording" button.</string>
    <key>NSSpeechRecognitionUsageDescription</key>
    <string>Speech recognition will be used to determine which words you speak into this device's microphone.</string>

Getting Started

To begin using the Speech Framework, we initiate a request for authorization:

    SFSpeechRecognizer.requestAuthorization(function(authStatus) {
        switch (authStatus) {
            case SFSpeechRecognizerAuthorizationStatus.Authorized:
                model.set('recordButtonEnabled', true);
                console.log('User authorized access to speech recognition');
                break;
            case SFSpeechRecognizerAuthorizationStatus.Denied:
                model.set('recordButtonEnabled', false);
                console.log('User denied access to speech recognition');
                break;
            case SFSpeechRecognizerAuthorizationStatus.Restricted:
                model.set('recordButtonEnabled', false);
                console.log('Speech recognition restricted on this device');
                break;
            case SFSpeechRecognizerAuthorizationStatus.NotDetermined:
                model.set('recordButtonEnabled', false);
                console.log('Speech recognition not yet authorized');
        }
    });
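The switch above boils down to a tiny predicate: only one of the four authorization statuses should enable recording. Here is that logic as a sketch, where the helper name is mine and a plain object stands in for the native SFSpeechRecognizerAuthorizationStatus enum (the numeric values mirror Apple's declaration order):

```javascript
// Stand-in for the native SFSpeechRecognizerAuthorizationStatus enum.
var AuthStatus = { NotDetermined: 0, Denied: 1, Restricted: 2, Authorized: 3 };

// Hypothetical helper: only the Authorized status should enable
// the record button; every other status disables it.
function canRecord(status) {
    return status === AuthStatus.Authorized;
}
```

This keeps the UI decision in one place, with the switch statement reserved for logging.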

This will prompt the user to authorize Speech Recognition and Microphone access:

NativeScript Speech app

Once authorization is granted, the user will be able to tap the record button. To begin recording, we configure a new audio session, create the speech recognition request, and install a tap on the audio input node so that incoming audio buffers are appended to the request for transcription. This continues until we stop speaking or tap the Stop Recording button.
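The stop side of that flow is small enough to sketch here. stopRecording is a hypothetical helper (the actual app may wire it differently); it takes the engine and request as parameters so the teardown logic is explicit and easy to follow:

```javascript
// Hypothetical stop handler. Stopping the engine halts audio capture,
// and endAudio() tells the recognizer that no more audio is coming,
// which prompts it to deliver a final result to the task's handler.
function stopRecording(audioEngine, recognitionRequest) {
    audioEngine.stop();
    if (recognitionRequest) {
        recognitionRequest.endAudio();
    }
}
```

In the app this would be called from the Stop Recording button's tap handler, with the AVAudioEngine and SFSpeechAudioBufferRecognitionRequest instances created below.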

Creating the Speech Recognition Request

Let’s create the audio session and ensure that we can accept and record audio input:

    let audioSession = AVAudioSession.sharedInstance();

    let errorRef = new interop.Reference();

    audioSession.setCategoryError(AVAudioSessionCategoryRecord, errorRef);
    if (errorRef.value) {
        console.log(`setCategoryError: ${errorRef.value}`);
    }

    audioSession.setModeError(AVAudioSessionModeMeasurement, errorRef);
    if (errorRef.value) {
        console.log(`setModeError: ${errorRef.value}`);
    }

    audioSession.setActiveError(true, null);

Let’s create the Speech Recognition Request:

    let recognitionRequest = SFSpeechAudioBufferRecognitionRequest.new();

If we don't want to wait until the end of the recording to view the transcription results, we can set a flag on the request so that partial results are reported as recognition progresses:

    recognitionRequest.shouldReportPartialResults = true;

We then create the recognition task that delivers the results of the request. In the handler we check for result; whenever one comes back from the server, we display it in our Label using result.bestTranscription.formattedString, which gives us the best transcription received so far:

    recognitionTask = speechRecognizer.recognitionTaskWithRequestResultHandler(recognitionRequest, function(result, error) {
        let isFinal = false;

        if (result) {
            model.set('speechText', result.bestTranscription.formattedString);
            isFinal = result.isFinal;
        }

        if (error || isFinal) {
            audioEngine.stop();
            inputNode.removeTapOnBus(0);

            recognitionRequest = null;
            recognitionTask = null;

            model.set('recordButtonEnabled', true);
        }
    });
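To make the handler's branching easier to follow, here is the same decision logic factored into a plain function. processResult is a hypothetical name, and the argument shapes below merely mimic the native result and error objects:

```javascript
// Mirrors the result handler above: returns the text to display
// (or null) and whether the recognition session should be torn down.
function processResult(result, error) {
    var update = { text: null, done: false };
    if (result) {
        update.text = result.bestTranscription.formattedString;
        update.done = result.isFinal;
    }
    if (error) {
        update.done = true;
    }
    return update;
}
```

A partial result updates the text and keeps the session alive; a final result or an error triggers the cleanup branch (stopping the engine, removing the tap, and re-enabling the record button).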

We grab the engine's input node and install a tap on it, so that every time audio is received the buffer is appended to the request; we then prepare and start the audio engine so capture actually begins:

    let inputNode = audioEngine.inputNode;

    inputNode.installTapOnBusBufferSizeFormatBlock(0, 1024, inputNode.outputFormatForBus(0), function(buffer, when) {
        if (recognitionRequest) {
            recognitionRequest.appendAudioPCMBuffer(buffer);
        }
    });

    audioEngine.prepare();
    audioEngine.startAndReturnError(null);

That’s it! Now we can run our app and see the Speech Recognition in action!

Limitations

It's important to note that there are limits for Speech Recognition in iOS. For instance, individual devices may be limited in the number of recognitions that can be performed per day. I recommend watching this WWDC 2016 video for more information.

Wrapping Up

In this tutorial, we covered the iOS 10 Speech Recognition APIs that NativeScript allows us to access. Although access is still limited to the Objective-C syntax, it is nevertheless a great example of using the latest features of iOS in my favorite cross-platform framework.

The full source code is available in this GitHub repository.
