Text Searching with MongoDB

Let’s say you have a blog with tons of posts and want to give your readers an easy way to search for content without paging through a long list of previous articles. One way you can accomplish this is by providing users with a search field that makes an API call and returns all posts matching some of the specified criteria. But how do you get up and running quickly with text search on specified fields for documents in your collection?

One of the fastest ways is to use MongoDB’s $text operator. You pass $text a search string, and MongoDB runs it against the collection’s text index and returns all matching documents. There are dedicated search engines available for this task as well, mainly Solr and Elasticsearch, and from what I’ve read, it seems pretty easy to migrate from MongoDB’s built-in text search to a service like Elasticsearch if you need to. Before getting started, let’s take a look at an example document from a sample post collection:

{
    "_id" : ObjectId("55637f1dfe8766e1393ddd35"),
    "title" : "Why MongoDB is awesome 321",
    "author" : "55637e59fe8766e1393ddd34",
    "body" : "This is the body content in **markdown**",
    "bodyHtml" : "<p>This is body content in <strong>markdown</strong></p>\n",
    "description" : "Post description",
    "tags" : "MongoDB",
    "coverImage" : null,
    "slug" : "2015/5/why-mongodb-is-awsome-321",
    "uniqueViews" : 4,
    "totalViews" : 16,
    "createdAt" : ISODate("2015-05-25T19:59:25.865Z"),
    "updatedAt" : ISODate("2015-05-25T19:59:25.865Z"),
    "status" : "ACTIVE",
    "publishedAt" : ISODate("2015-05-25T19:59:25.865Z")
}

Now that the Post model is outlined, you’ll need to determine which fields are most valuable to search against. For example, you might choose to search against the body and title, or maybe the body, title and tags. Once you’ve identified the fields, you need a way to tell MongoDB how to search your post collection, matching against the search fields you chose (for this example we’ll go with body and title). To do this, you’ll add a MongoDB text index to your post collection.

NOTE: Text search is enabled by default as of MongoDB 2.6, which is also the version that introduced the $text operator. Version 2.4 offers text search only as a beta feature that must be enabled manually, and any version below 2.4 requires an upgrade.

To add a text index from the command line you can run the following:

db.posts.createIndex( { title: "text" } );

You can then verify that the text index was added by running:

db.posts.getIndexes();

And the output should look similar to the following:

[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "portfolio.posts"
    },
    {
        "v" : 1,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        },
        "name" : "title_text",
        "ns" : "portfolio.posts",
        "weights" : {
            "title" : 1
        },
        "default_language" : "english",
        "language_override" : "language",
        "textIndexVersion" : 2
    }
]

Great, now all we have to do is add an index to search the body property:

db.posts.createIndex( { body: "text" } );

Oops… MongoDB allows only one text index per collection, so this second command fails. So how do you perform a search on two or more properties? First, let’s get rid of the first text index:

db.posts.dropIndex('title_text');
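
If you don’t remember what the existing text index is called, you can pick it out of the getIndexes() output, since a text index shows up there with an "_fts" : "text" key. The following is only a sketch: findTextIndexName is a hypothetical helper, and sampleIndexes is a trimmed copy of the output shown earlier.

```javascript
// Hypothetical helper (not part of MongoDB): returns the name of the first
// text index found in the array returned by db.posts.getIndexes(), or null.
function findTextIndexName(indexes) {
    var textIndex = indexes.find(function(idx) {
        // Text indexes are reported with the internal key { _fts: "text", _ftsx: 1 }.
        return idx.key && idx.key._fts === 'text';
    });
    return textIndex ? textIndex.name : null;
}

// Trimmed copy of the getIndexes() output from earlier in this post.
var sampleIndexes = [
    { v: 1, key: { _id: 1 }, name: '_id_', ns: 'portfolio.posts' },
    { v: 1, key: { _fts: 'text', _ftsx: 1 }, name: 'title_text', ns: 'portfolio.posts' }
];

console.log(findTextIndexName(sampleIndexes)); // title_text
```

In the mongo shell, the result could then be passed straight to db.posts.dropIndex().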

Now create a new text index specifying the two properties we want to target:

db.posts.createIndex({
    title: 'text',
    body: 'text'
});

And there you have it. When you check the indexes on the posts collection, you should find the following among them:

{
    "v" : 1,
    "key" : {
        "_fts" : "text",
        "_ftsx" : 1
    },
    "name" : "body_text_title_text",
    "ns" : "portfolio.posts",
    "weights" : {
        "body" : 1,
        "title" : 1
    },
    "default_language" : "english",
    "language_override" : "language",
    "textIndexVersion" : 2
}
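
Notice that every field in the "weights" section defaults to 1. If a match in the title should count for more than a match in the body, createIndex accepts a weights option (and a name option, which keeps the index name readable). Here is a rough sketch of how those arguments could be put together. buildTextIndexArgs, the index name posts_text_search, and the weights 10 and 1 are all made up for illustration.

```javascript
// Hypothetical helper: turns a map of field name -> weight into the two
// arguments that createIndex expects (the key spec and the options).
function buildTextIndexArgs(weights) {
    var keys = {};
    Object.keys(weights).forEach(function(field) {
        keys[field] = 'text'; // every searchable field uses the "text" index type
    });
    return {
        keys: keys,
        // 'posts_text_search' is an arbitrary, illustrative index name.
        options: { weights: weights, name: 'posts_text_search' }
    };
}

// Illustrative weights: a title match counts ten times as much as a body match.
var indexArgs = buildTextIndexArgs({ title: 10, body: 1 });

// In the mongo shell this would be used as:
//   db.posts.createIndex(indexArgs.keys, indexArgs.options);
console.log(JSON.stringify(indexArgs));
```

Since a collection can only hold one text index, the existing one has to be dropped before creating a weighted replacement.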

Now that the text index covers both fields, let’s make a query and test it out. I’m going to use the Node.js MongoDB driver, but you can reference the documentation of the driver specific to the language you’re using, and it should be very similar.

/**
 * Takes user input 'search' string out of the query parameters and passes the value to
 * MongoDB's text query.
 *
 * @method search
 * @return {Array} posts
 */
search: function(req, res) {
    // $search holds the user's search string; $language is optional.
    var query = {$text: {$search: req.query.search, $language: 'en'}};

    Posts.find(query).toArray(function(err, posts) {
        if (err) return res.status(500).json({error: "Internal Server Error"});
        res.status(200).json(posts);
    });
}
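
By default the matching posts don’t come back in any particular order. MongoDB can expose a relevance score for each match through {$meta: 'textScore'}, which can be used both as a projected field and as a sort key. Below is a minimal sketch of how the pieces could be assembled, with a guard for a missing or empty search parameter. buildTextSearch and the field name score are my own choices here, not part of the driver.

```javascript
// Hypothetical helper: validates the raw query parameter and returns the
// filter, projection and sort needed for a relevance-ordered text search.
// Returns null when there is nothing usable to search for.
function buildTextSearch(rawSearch) {
    if (typeof rawSearch !== 'string' || rawSearch.trim() === '') {
        return null;
    }
    return {
        filter: { $text: { $search: rawSearch.trim() } },
        // 'score' is an arbitrary field name that will hold the relevance score.
        projection: { score: { $meta: 'textScore' } },
        // Sorting on the same $meta expression orders results by relevance.
        sort: { score: { $meta: 'textScore' } }
    };
}

var search = buildTextSearch('  mongodb  ');
console.log(JSON.stringify(search.filter)); // {"$text":{"$search":"mongodb"}}

// Assuming a driver version whose cursor supports project() and sort():
//   Posts.find(search.filter).project(search.projection).sort(search.sort).toArray(callback);
```

When buildTextSearch returns null, the handler could respond with a 400 or an empty list instead of sending the query to MongoDB.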

That’s all there is to it. You may have noticed the optional $language parameter. By providing this parameter, you tell MongoDB which language’s stop words and stemming rules to apply to the search string. There are some additional features not covered in depth here, like wildcard text indexes and excluding documents that contain a given term, so for further reading I recommend checking out the MongoDB docs.
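
To give a quick taste of the exclusion syntax: inside the $search string, a term prefixed with a hyphen excludes documents containing it, and a phrase wrapped in double quotes has to match as a phrase. Here is a small sketch of composing such a string. buildSearchString and its terms, phrases and excluded inputs are hypothetical names used only for this example.

```javascript
// Hypothetical helper: composes a $search string from plain terms,
// exact phrases, and terms that must not appear in the results.
function buildSearchString(parts) {
    var terms = parts.terms || [];
    var phrases = (parts.phrases || []).map(function(phrase) {
        // Strip embedded quotes, then wrap the phrase in double quotes.
        return '"' + phrase.replace(/"/g, '') + '"';
    });
    var excluded = (parts.excluded || []).map(function(term) {
        return '-' + term; // a leading hyphen negates the term
    });
    return terms.concat(phrases, excluded).join(' ');
}

var searchString = buildSearchString({
    terms: ['mongodb'],
    phrases: ['text index'],
    excluded: ['solr']
});

console.log(searchString); // mongodb "text index" -solr
```

The resulting string would simply take the place of req.query.search in the $text query.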

Matthew O'Connell