Apple's Natural Language Processing (NLP) API

Enhancing your mobile app’s features with machine learning is becoming more and more popular. Gathering intelligence from natural language input is one of the most complex and sought-after AI features, and it involves many underlying processes. Multiple libraries and services can perform natural language processing tasks for you, and choosing between them can be quite challenging. We’ll walk you through Apple’s native API for natural language analysis and give you some code examples in Swift to try out.

Let’s start off with what Apple’s NLP API can do:

  1. Language identification - Leveraging machine learning to detect the language of the text.
  2. Tokenization - Segmenting the text into words, sentences and paragraphs.
  3. Part of speech analysis - Identifying the part of speech of each word.
  4. Lemmatization - Identifying the lemma (dictionary form) of the word.
  5. Named Entity Recognition - Extracting entities of certain types (such as names of organizations and people).

Apple’s natural language processing APIs are not all new. NSLinguisticTagger is the class in the Foundation framework that provides the interface for analyzing natural language. It has been available since iOS 5. So what actually changed in the 2017 release (iOS 11)?

According to Apple, the implementation of NSLinguisticTagger has been completely revamped in the following areas:

  • Higher accuracy - About 90% accurate results for part of speech identification and named entity recognition for English texts.
  • A lot more languages - Language recognition is available for 52 languages. Lemmatization, part of speech, and named entity recognition are now available in English, French, Italian, German, Spanish, Portuguese, Russian, and Turkish.
  • Faster and multi-threaded - Part of speech tagging speed, for example, has been estimated at 80,000 tokens/sec compared to 50,000 tokens/sec for the older implementation. Named entity recognition speed has increased from about 40,000 to 65,000 tokens/sec. In practice, that means processing hundreds of articles in a matter of seconds.
  • Highly optimized on-device for all platforms.

As for the code for all of these different NLP tasks, well… it is rather short and very similar for each task. In all cases, the NSLinguisticTagger class is initialized with an array of NSLinguisticTagScheme values. As you will see in the examples below, the schemes correspond to the kinds of information that you want to retrieve with NSLinguisticTagger. The returned result is a collection of tags (NSLinguisticTag constants) that depends on the specified scheme. The enumerateTags method lets you go through all tags in the desired range for the desired scheme and calls a block for each of those tags.
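
Part of speech analysis, listed among the features earlier, follows the same pattern. Here is a minimal sketch, assuming the .lexicalClass scheme; the helper name partsOfSpeech(in:) and the sample sentence are illustrative and not taken from Apple’s materials. The tags you get back depend on the models available on the device.

```swift
import Foundation

// Hypothetical helper for this sketch: collects (token, part of speech) pairs.
func partsOfSpeech(in text: String) -> [(token: String, tag: String)] {
    // Ask the tagger for lexical classes (noun, verb, adjective, ...).
    let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)
    tagger.string = text

    // NSRange is measured in UTF-16 code units, hence utf16.count.
    let range = NSRange(location: 0, length: text.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]
    var results: [(token: String, tag: String)] = []

    tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange, _ in
        guard let tag = tag else { return }
        let token = (text as NSString).substring(with: tokenRange)
        results.append((token: token, tag: tag.rawValue))
    }
    return results
}

// Sample sentence chosen for illustration only.
for (token, tag) in partsOfSpeech(in: "The quick brown fox jumps over the lazy dog.") {
    print("\(token): \(tag)")
}
```

Returning plain tuples keeps the sketch short; in an app you would more likely act on each tag inside the closure.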

Language identification

import Foundation

let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
tagger.string = "NSLinguisticTagger provides text processing APIs."
// dominantLanguage is an optional BCP 47 language code, such as "en"
let language = tagger.dominantLanguage
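
If you only need the language and not a tagger instance, Foundation also offers the class method NSLinguisticTagger.dominantLanguage(for:), available from iOS 11. The following is a minimal sketch; the sample strings are illustrative, and the detected codes may differ for short or ambiguous input.

```swift
import Foundation

// Sample strings chosen for illustration only.
let samples = [
    "NSLinguisticTagger provides text processing APIs.",
    "Le traitement du langage naturel est un domaine passionnant.",
    "Die Verarbeitung natürlicher Sprache ist ein spannendes Gebiet."
]

for sample in samples {
    // Returns an optional BCP 47 language code; nil when no language can be determined.
    let code = NSLinguisticTagger.dominantLanguage(for: sample) ?? "unknown"
    print("\(code): \(sample)")
}
```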

Tokenization

import Foundation

let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
let text = "NSLinguisticTagger provides text processing APIs."
tagger.string = text

// NSRange is measured in UTF-16 code units
let range = NSRange(location: 0, length: text.utf16.count)
let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]

tagger.enumerateTags(in: range, unit: .word, scheme: .tokenType, options: options) { tag, tokenRange, stop in
    let token = (text as NSString).substring(with: tokenRange)
    // Do something with each token
}

Lemmatization

import Foundation

let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
let text = "Great hikes make great pics! Wonderful afternoon in Marin County."

tagger.string = text
let range = NSRange(location: 0, length: text.utf16.count)
let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]

tagger.enumerateTags(in: range, unit: .word, scheme: .lemma, options: options) { tag, tokenRange, stop in
    // With the .lemma scheme, the tag's raw value is the lemma itself
    if let lemma = tag?.rawValue {
        // Do something with each lemma
    }
}

Named entity recognition

import Foundation

let tagger = NSLinguisticTagger(tagSchemes: [.nameType], options: 0)
let text = "Tim Cook is the CEO of Apple Inc. which is located in Cupertino, California"
tagger.string = text

let range = NSRange(location: 0, length: text.utf16.count)
// .joinNames treats multi-word names such as "Tim Cook" as a single token
let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
let tags: [NSLinguisticTag] = [.personalName, .placeName, .organizationName]

tagger.enumerateTags(in: range, unit: .word, scheme: .nameType, options: options) { tag, tokenRange, stop in
    if let tag = tag, tags.contains(tag) {
        let name = (text as NSString).substring(with: tokenRange)
        // Do something with the name
    }
}

Isn’t it great that such a complex task as natural language analysis can be done in only a few lines of code that is easy to understand?

Here are a couple of helpful debugging tips suggested at the NLP session at WWDC 2017:

  1. If the language is known, set it explicitly to increase accuracy. There are orthographically identical words, i.e. words that are written precisely the same in different languages. If you try to identify the language of a string containing a single word that also exists in other languages, you might not get the result that you expect.
  2. If, after performing part of speech tagging or named entity recognition, you see that the resulting tags have the NSLinguisticTagOtherWord type (.otherWord in Swift), your device may be missing the corresponding machine learning model. Apple delivers machine learning models to devices over the air and regularly updates them with more accurate ones. On iOS, as soon as the keyboard for a particular language is installed, the models and all the assets relevant to that language are added to your device.
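
Both tips can be sketched in code. The example below assumes German ("de") and the single word "Hand", which is spelled the same in English and German; the helper name lexicalTags(for:language:) is hypothetical. It uses setOrthography(_:range:) together with NSOrthography.defaultOrthography(forLanguage:) to fix the language, and availableTagSchemes(for:language:) to check what the device can provide. Treat it as one possible approach rather than the method recommended in the session.

```swift
import Foundation

// Hypothetical helper: tags each word of `text`, telling the tagger which
// language to assume instead of letting it guess.
func lexicalTags(for text: String, language: String) -> [(token: String, tag: NSLinguisticTag)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)

    // Tip 1: set the orthography explicitly when the language is known.
    let orthography = NSOrthography.defaultOrthography(forLanguage: language)
    tagger.setOrthography(orthography, range: range)

    var results: [(token: String, tag: NSLinguisticTag)] = []
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]
    tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange, _ in
        guard let tag = tag else { return }
        results.append((token: (text as NSString).substring(with: tokenRange), tag: tag))
    }
    return results
}

// Tip 2: check which schemes the device can actually serve for a language.
let available = NSLinguisticTagger.availableTagSchemes(for: .word, language: "de")
if !available.contains(.lexicalClass) {
    print("No part of speech model for German on this device")
}

for (token, tag) in lexicalTags(for: "Hand", language: "de") {
    // .otherWord can be a sign that the model for this language is missing.
    let note = (tag == .otherWord) ? " (model may be missing)" : ""
    print("\(token): \(tag.rawValue)\(note)")
}
```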

One final note: Apple continues to place emphasis on privacy, so when you use NSLinguisticTagger, all the data is processed solely on the device. Also, if you are building counterparts for your app on other Apple platforms, the user experience will be the same. For these reasons, if you need to solve the NLP tasks mentioned earlier, Apple’s native API is definitely worth considering.

Resources: For deeper information on how to use NSLinguisticTagger, please see Apple’s official documentation.

You can also watch the recording of WWDC 2017 session on NLP here.
