Four Things To Know About the iOS Speech Framework

Voice experiences have become more common through interactions with assistants like Alexa, Google Assistant and Siri. What if you wanted to create a voice experience directly in your iOS app? Here are key things to keep in mind when working with Apple’s Speech framework:

Requests may require a connection to Apple’s servers

Apple’s Speech framework may require access to the internet, depending on the language being recognized. This is important to keep in mind if your app needs to work offline or if some users don’t have a reliable internet connection. Since iOS 13, SFSpeechRecognizer’s supportsOnDeviceRecognition property reports whether a given recognizer can run without a network connection.
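As a rough sketch of how that check might look (the locale and request type here are just examples), a request can also be kept offline with the requiresOnDeviceRecognition property:

import Speech

// Sketch: prefer on-device recognition when the current recognizer supports it (iOS 13+).
let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
let request = SFSpeechAudioBufferRecognitionRequest()

if speechRecognizer.supportsOnDeviceRecognition {
	// Keep audio on the device; recognition won't reach out to Apple's servers.
	request.requiresOnDeviceRecognition = true
}

Note that on-device support varies by language, OS version, and device, so it’s worth checking at runtime rather than assuming it’s available.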

Apple allows for speech recognition on prerecorded or live audio

The classes that allow for this are SFSpeechURLRecognitionRequest and SFSpeechAudioBufferRecognitionRequest. SFSpeechURLRecognitionRequest is helpful when developing in a noisy environment, since the speech can be prerecorded without surrounding noise interfering with recognition. These are the steps for setting up an SFSpeechURLRecognitionRequest:

  1. Record your speech.
  2. Add the audio file to your Xcode project.
  3. Double check that the audio file has been added to the app target.
  4. Use the Bundle class to get the URL for the audio file.
  5. Pass the URL to the SFSpeechURLRecognitionRequest initializer.
import Speech

var recognitionRequest: SFSpeechURLRecognitionRequest?

if let audioURL = Bundle.main.url(forResource: "prerecorded-audio", withExtension: "m4a") {
	recognitionRequest = SFSpeechURLRecognitionRequest(url: audioURL)
}

Check information after speech is final

SFTranscriptionSegment provides a way to access information about individual words from the recognized speech: the confidence level, timestamp, and duration of each word. A confidence level is a decimal between 0 and 1; the closer to 1, the more confident Apple is in the accuracy of the recognized speech. Later in this post is an example of printing out confidence levels.

It’s important to make sure the recognition task is finalized before checking those values, though. If it isn’t, the values can change by the time the task is finalized, most likely because calculating them while the user is still speaking is computationally intensive and new speech keeps arriving. I noticed confidence levels of 0 when the task wasn’t finalized, and levels as high as 0.9 once it was. How is a recognition task finalized? By calling finish() on the SFSpeechRecognitionTask instance.
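For example, once the result is final, each segment can be inspected individually (the result variable here stands in for a finalized SFSpeechRecognitionResult):

// Inspect each recognized word once the result is final.
for segment in result.bestTranscription.segments {
	print("\"\(segment.substring)\"",
	      "confidence: \(segment.confidence)",  // 0.0–1.0
	      "starts at: \(segment.timestamp)s",   // offset from the start of the audio
	      "duration: \(segment.duration)s")
}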

Use a timer to stop speech recognition

What if you wanted to stop speech recognition after the user has stopped talking for a couple of seconds? A possible solution is to create a timer that calls finish() on the recognition task after two seconds in which no new speech is recognized. Since the task would be finalized and an SFSpeechRecognitionResult instance would be available, the confidence, timestamp, and duration values would be accurate. Here’s how that might look:

var timer: Timer?
var recognitionTask: SFSpeechRecognitionTask?

recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { [weak self] (result, _) in
	guard let self = self, let result = result else { return }
	if result.isFinal {
		// Check info on recognized speech here
		let transcriptionSegments = result.bestTranscription.segments
		let confidenceLevel = transcriptionSegments.reduce(0) { $0 + $1.confidence } / Float(transcriptionSegments.count)
		print("\(confidenceLevel) - \(result.bestTranscription.formattedString)")
	} else {
		// Restart the two-second countdown each time new speech arrives
		self.timer?.invalidate()
		self.timer = Timer.scheduledTimer(timeInterval: 2,
		                                  target: self,
		                                  selector: #selector(self.finalizeSpeechRecognition),
		                                  userInfo: nil,
		                                  repeats: false)
	}
}

@objc func finalizeSpeechRecognition() {
	recognitionTask?.finish()
}


The important things to remember when working with Apple’s Speech framework are:

  • An internet connection may be necessary, since some languages require reaching out to Apple’s servers.
  • Apple allows for speech recognition on audio files and live recordings.
  • Be sure that recognitionTask.finish() is called before checking info on the recognized speech.
  • A timer can help stop speech recognition after the user has stopped speaking.
