Apple just released information about its new SiriKit API this week. With SiriKit, you can let your users interact with your app through Siri. However, there are some caveats about what exactly you can and can’t use SiriKit for. It’s not as wide open as you might think.

What can you do with SiriKit?

SiriKit supports seven specific domains of functionality:

  • Audio or video calling
  • Messaging
  • Payments
  • Searching photos
  • Workouts
  • Ride booking
  • Automotive commands for CarPlay (mentioned only in a footnote)

Each domain above defines a group of what are called intents. An intent is the object Apple uses to represent a user’s “intention” when a request is made through Siri or Maps.
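To make that concrete, here is a minimal sketch of what a populated intent looks like from your code’s point of view. The describe function is ours, purely for illustration; INSendMessageIntent and its properties are real Intents-framework API.

```swift
import Intents

// A sketch: Siri hands your extension a populated intent object that
// represents what the user asked for. For messaging, that's an
// INSendMessageIntent; other domains have their own classes
// (INStartWorkoutIntent, INRequestRideIntent, and so on).
func describe(_ intent: INSendMessageIntent) {
    let recipients = intent.recipients ?? []  // [INPerson] parsed by Siri
    let text = intent.content ?? ""           // the dictated message body
    print("Send \"\(text)\" to \(recipients.count) recipient(s)")
}
```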

What can’t you do with SiriKit?

Anything that doesn’t fall into one of the specified intents for one of these domains of functionality. So, a lot. This is a pretty closed scope for integrating with Siri. Apple is using the idea of intents to encapsulate the things Siri can do well. With this being the first iteration of this API, I imagine Apple will add additional intents in the future, but for now, there isn’t a way to make a custom intent.

How does SiriKit work?


SiriKit uses these intent objects to communicate with your Intents extension, the app extension that allows your app to interact with Siri. Your extension has three responsibilities:

1. Understand the parameters of the user’s intent. The intent object is created with user-supplied information; for example, when booking a ride, the extension needs to understand the starting point and the number of passengers. Siri asks the app’s intents extension for a handler object for the specified intent, and the intent is then passed to that handler for further processing. This is where your intent handler can resolve any data from the user-supplied information and prepare for actually handling the action later. (All three steps are sketched in the code after this list.)

2. Show the user what will happen if/when they confirm the intent. Siri then asks the app’s intents extension how it plans to handle the intent. At that point, Siri may show the user how the app plans to handle the intent, in case it needs user confirmation. You can augment what Siri displays using an Intents UI extension, which provides a view controller that Siri integrates into the views it displays. This view controller participates in the normal view controller lifecycle, so you can, for example, drive updates from a timer. However, it won’t receive touch events, so it’s not quite as dynamic as a normal view controller in your app.

3. Handle the intent. Once the user accepts the planned actions, Siri asks the handler to go ahead and handle the intent.
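Here’s how those three steps look in code, sketched for the messaging domain. INExtension and INSendMessageIntentHandling are the real types; the method names follow the initial iOS 10 / Swift 3 API (they may differ in later SDKs), and the actual message-sending logic is left as a stub.

```swift
import Intents

// The extension's principal class: Siri asks it for a handler object
// for each incoming intent.
class IntentHandler: INExtension {
    override func handler(for intent: INIntent) -> Any {
        return SendMessageHandler()
    }
}

class SendMessageHandler: NSObject, INSendMessageIntentHandling {

    // Step 1: resolve the intent's parameters from what Siri understood.
    func resolveContent(forSendMessage intent: INSendMessageIntent,
                        with completion: @escaping (INStringResolutionResult) -> Void) {
        if let text = intent.content, !text.isEmpty {
            completion(.success(with: text))
        } else {
            completion(.needsValue())  // Siri will ask the user for the message body
        }
    }

    // Step 2: tell Siri what will happen so it can show a confirmation.
    func confirm(sendMessage intent: INSendMessageIntent,
                 completion: @escaping (INSendMessageIntentResponse) -> Void) {
        completion(INSendMessageIntentResponse(code: .ready, userActivity: nil))
    }

    // Step 3: perform the action once the user accepts.
    func handle(sendMessage intent: INSendMessageIntent,
                completion: @escaping (INSendMessageIntentResponse) -> Void) {
        // … hand the message to your messaging back end here …
        completion(INSendMessageIntentResponse(code: .success, userActivity: nil))
    }
}
```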

What do I provide?

1. Vocabulary. Your app can aid Siri by providing app- or user-specific vocabulary. Siri uses this information to help fill out the intent objects based on the user’s verbal request. For instance, your user may have a nickname for one of their contacts; you can provide this information to Siri so that it knows how to link the contact to that nickname. (A sketch follows this list.)

2. App logic. This is the logic for how your app’s intents extension responds to and handles the intents passed to it.

3. User interface. Using an Intents UI extension (also sketched below), you can customize the way Siri displays the intent handling to the user. This can be simple branding, or it can add views and information Siri wouldn’t normally include.
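For the vocabulary piece, the Intents framework provides INVocabulary. A sketch for the nickname example above; note that user-specific vocabulary is registered from the app itself, and registerNicknames is our own hypothetical helper:

```swift
import Intents

// A sketch: register user-specific contact nicknames with Siri,
// ordered most-important-first, so Siri can match spoken nicknames
// to the user's in-app contacts.
func registerNicknames(_ nicknamesByImportance: [String]) {
    let vocabulary = NSOrderedSet(array: nicknamesByImportance)
    INVocabulary.shared().setVocabularyStrings(vocabulary, of: .contactName)
}
```

For the user interface piece, the Intents UI extension’s view controller adopts INUIHostedViewControlling; a sketch using the iOS 10 signature:

```swift
import UIKit
import IntentsUI

// A sketch: Siri embeds this view controller in its own UI. It never
// receives touch events, so it's display-only.
class IntentViewController: UIViewController, INUIHostedViewControlling {
    func configure(with interaction: INInteraction,
                   context: INUIHostedViewContext,
                   completion: ((CGSize) -> Void)?) {
        // Configure subviews from interaction.intent here, then report
        // the size the hosted view needs.
        let size = extensionContext?.hostedViewMaximumAllowedSize ?? .zero
        completion?(size)
    }
}
```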

Other SiriKit caveats

1. Permissions. Before SiriKit can be used, you must request authorization from the user, just as you do for notifications. You can provide a usage description in your Info.plist file to help explain to the user how you plan to use the integration. Just like with notification permissions, the OS asks only once and remembers the answer for future requests. (A code sketch follows this list.)

2. Localization. One of the nice things about SiriKit is that, out of the box, it handles all 36 locales that native Siri supports without your app having to do any additional localization.

3. Contextual responses. Siri takes into account how she is summoned to determine what level of detail needs to be displayed versus spoken. For instance, if the user says “Hey, Siri” to request an intent, Siri recognizes that the user probably can’t interact with the phone with great dexterity (for instance, while driving) and will opt for speaking over displaying information.
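Here’s what the permission request looks like. INPreferences and the NSSiriUsageDescription Info.plist key are the real API; requestSiriPermission is just our wrapper for the sketch:

```swift
import Intents

// A sketch: ask the user to allow Siri integration. The system shows
// the prompt (with your NSSiriUsageDescription string) only once and
// remembers the answer.
func requestSiriPermission() {
    INPreferences.requestSiriAuthorization { status in
        switch status {
        case .authorized:
            print("Siri can route intents to our extension")
        case .notDetermined, .restricted, .denied:
            print("Siri integration unavailable")
        }
    }
}
```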

For further reading…