How to Improve MongoDB Performance

Using MongoDB Indexes to Improve MongoDB Query Performance

Vladimir Topolev
Enlear Academy

--

Indexes help speed up slow queries. Without an index, the database has to scan every document in a collection to find the targeted ones. The more documents a collection contains, the slower the query becomes.

An index can be thought of as an array of keys sorted in ascending or descending order, where each key holds a reference to a particular document (figure 1). A key is the value of the field on which the document has been indexed.

Figure 1 — Index by last_name field

MongoDB uses a structure called a B-tree to store its indexes. If a collection contains N documents, a query that uses an index has a search complexity of O(log N). Compare that with a collection scan, where the database has to examine every document: O(N) (figure 2)
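To make the O(N) vs O(log N) difference concrete, here is a small plain-JavaScript sketch (an illustration only, not MongoDB internals; all names are made up) that contrasts a full collection scan with a binary search over a sorted array of keys, which is roughly what an index lookup does:

```javascript
// Illustration only: a full scan (O(N)) vs a binary search over
// sorted keys (O(log N)), similar in spirit to an index lookup.

const docs = [
  { ssn: '300-11-2222', last_name: 'Adams' },
  { ssn: '100-22-3333', last_name: 'Brown' },
  { ssn: '200-33-4444', last_name: 'Clark' },
];

// COLLSCAN: examine every document until a match is found.
function collectionScan(docs, ssn) {
  let examined = 0;
  for (const doc of docs) {
    examined++;
    if (doc.ssn === ssn) return { doc, examined };
  }
  return { doc: null, examined };
}

// An "index": keys sorted ascending, each referencing a document.
const index = docs
  .map((doc, i) => ({ key: doc.ssn, ref: i }))
  .sort((a, b) => a.key.localeCompare(b.key));

// IXSCAN: binary search over the sorted keys.
function indexScan(index, docs, ssn) {
  let lo = 0, hi = index.length - 1, examined = 0;
  while (lo <= hi) {
    examined++;
    const mid = (lo + hi) >> 1;
    const cmp = index[mid].key.localeCompare(ssn);
    if (cmp === 0) return { doc: docs[index[mid].ref], examined };
    if (cmp < 0) lo = mid + 1; else hi = mid - 1;
  }
  return { doc: null, examined };
}

console.log(collectionScan(docs, '200-33-4444').examined); // examines up to N documents
console.log(indexScan(index, docs, '200-33-4444').examined); // examines ~log2(N) keys
```

The scan touches every document in the worst case, while the binary search halves the key range on each step.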

Figure 2 — Compare O(N) vs O(log N)

How many indexes can I create per collection?

You may create as many indexes per collection as you want. But keep in mind that index creation isn’t free: each additional index decreases the write speed for the whole collection. Every time a new document is inserted into a collection, the index trees may need to be updated. The B-trees may also need to be rebalanced when documents are removed from a collection or updated.
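The write cost can be sketched in plain JavaScript (again, an illustration with made-up names, not MongoDB internals): every insert into the collection must also place a key into each index’s sorted structure, so more indexes means more work per write.

```javascript
// Illustration only: each extra index adds work on every insert,
// because the new key must be placed into each sorted structure.

const collection = [];
const indexes = {
  ssn: [],       // sorted array of { key, ref }
  last_name: [], // sorted array of { key, ref }
};

function insertSorted(arr, entry) {
  // Binary search for the insertion point keeps the index sorted.
  let lo = 0, hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (arr[mid].key < entry.key) lo = mid + 1; else hi = mid;
  }
  arr.splice(lo, 0, entry);
}

function insertDocument(doc) {
  const ref = collection.push(doc) - 1;
  // The more indexes a collection has, the more updates per write.
  for (const field of Object.keys(indexes)) {
    insertSorted(indexes[field], { key: doc[field], ref });
  }
}

insertDocument({ ssn: '300-11-2222', last_name: 'Adams' });
insertDocument({ ssn: '100-22-3333', last_name: 'Brown' });
```

Each insertDocument call updates every index, which is why write throughput drops as indexes accumulate.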

Is there a way to explore query effectiveness?

MongoDB provides a special tool for exploring the effectiveness of a query. Let’s assume that we have a collection where each document contains personal information, including a unique personal identifier, ssn. We would like to find the person with the particular ssn equal to 683-46-9400. To explore the query’s effectiveness, we invoke the method explain('executionStats') chained after our query:

db.people.find({ssn: '683-46-9400'}).explain('executionStats')

It returns the following results (some fields skipped for brevity):

Figure 3

Here you may see that the winningPlan is a COLLSCAN, which means Mongo scans all documents in the collection ( totalDocsExamined = 50474 ) to find the only document matching the query ( nReturned = 1 ). That is not a good ratio between nReturned and totalDocsExamined.
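Since figure 3 is a screenshot, here is an abridged sketch of the shape of that output, with the field values quoted above; the exact structure varies between MongoDB versions:

```
// Abridged explain('executionStats') output for the unindexed query
{
  queryPlanner: {
    winningPlan: { stage: 'COLLSCAN', direction: 'forward' },
    rejectedPlans: []
  },
  executionStats: {
    nReturned: 1,
    totalKeysExamined: 0,     // no index was used
    totalDocsExamined: 50474, // every document in the collection
    executionStages: { stage: 'COLLSCAN' }
  }
}
```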

Ok, let’s optimize this query. As you may already assume, we need to create an index for the ssn field. Here is the pattern for creating an index:

db.<collection>.createIndex({<field>: <order>})

You may also use dot notation for <field> to index a nested field, for example: address.street

Let’s come back to our case and create an index for the ssn field:

db.people.createIndex({ssn: 1})

Let’s explore the previous query again:

Figure 4

Here, Mongo first finds the targeted document in the index ( stage=IXSCAN ) and then fetches the whole document by the reference the index keeps ( stage=FETCH ). As you can see, totalDocsExamined=1 in this case, which significantly improves the query. Also pay attention to totalKeysExamined=1 : it means Mongo used the index to find the document, whereas in the previous case it equaled 0.
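For comparison with the unindexed run, here is an abridged sketch of what figure 4 shows (the default index name ssn_1 is assumed; the exact structure varies between MongoDB versions):

```
// Abridged explain('executionStats') output once the ssn index exists
{
  queryPlanner: {
    winningPlan: {
      stage: 'FETCH',
      inputStage: { stage: 'IXSCAN', indexName: 'ssn_1', keyPattern: { ssn: 1 } }
    }
  },
  executionStats: {
    nReturned: 1,
    totalKeysExamined: 1, // the index located the key directly
    totalDocsExamined: 1  // only the referenced document was fetched
  }
}
```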

Mongo may use indexes not only when you search for one document, but also when you need to extract a range of documents:

db.people.find({ssn: {$gte: '455-00-000', $lte: '500-00-000'}})
.explain('executionStats')

Even when you search for documents not only by the ssn field but also by last_name , Mongo will use the index to optimize the query. Mongo first narrows the range of targeted documents using the index on ssn , and only after that filters them by the last_name field.

db.people.find({
ssn: {$gte: '455-00-000', $lte: '500-00-000'},
last_name: {$gte: 'P' }
}).explain('executionStats')

What are covered queries?

In the previous example, we created an index on the ssn field, and when we run this query:

db.people.find({ssn: {$gte: '455-00-000', $lte: '500-00-000'}})
.explain('executionStats')

Mongo first finds all matching documents via the index ( stage=IXSCAN ) and then fetches them by reference ( stage=FETCH ). So far so good. But if your application only needs to show a list of ssn values and doesn’t care about the other fields in the document, you may optimize this request even further by setting a projection for the query that includes only fields that exist in the index:

db.people.find(
{ssn: {$gte: '455-00-000', $lte: '500-00-000'}},
{_id: 0, ssn: 1}
).explain('executionStats')

In this case, all the expected information is inside the index, so we can skip the FETCH stage entirely (figure 5); instead, the data is extracted from the index itself ( stage=PROJECTION_COVERED )

Figure 5 — Compare a simple query with a covered query (created with MongoDB Compass)

As you can see, we eliminated the FETCH stage, we no longer need to examine documents ( totalDocsExamined=0 ), and query execution time dropped from 6 to 3 ms.

A covered query is a query in which all requested fields are included in the index itself; it helps significantly reduce query execution time.
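Continuing the plain-JavaScript illustration (made-up names, not MongoDB internals), a covered query can be answered from the sorted keys alone, without ever dereferencing a document:

```javascript
// Illustration only: a "covered" range query answered purely from
// the sorted index keys, with no document fetches at all.

const index = ['455-11-0000', '460-22-0000', '499-33-0000', '600-44-0000']; // sorted ssn keys

function coveredRange(index, min, max) {
  // Walk only the keys; never touch the documents they reference.
  return index.filter((key) => key >= min && key <= max).map((ssn) => ({ ssn }));
}

console.log(coveredRange(index, '455-00-000', '500-00-000'));
// documents examined: 0, because every requested field lives in the index
```

Note that in the real query above the projection excludes _id ( {_id: 0, ssn: 1} ); without that, Mongo would still have to fetch each document to read _id.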

How does Mongo choose which index to use if a collection has more than one?

First of all, as a rule Mongo uses only one index to process a query, even if a collection has two or more indexes. You may think of it as Mongo trying the possible plans with different indexes in parallel and picking the one that is more efficient and returns results quicker. In figure 3 you may see the array of rejectedPlans that Mongo tried while processing the query and the winningPlan that actually turned out to be more effective than the others.

Can indexes help to optimize a query that sorts by a particular field?

Of course, indexes may help to optimize a query with sorting. There are two ways sorting can be done in Mongo:
- in memory ( stage=SORT )
- using an index ( stage=IXSCAN )

Let’s imagine that we don’t have any indexes on the collection so far. We need to find documents whose last_name equals Johnson and sort them by the ssn field:

db.people.find({last_name: 'Johnson'}).sort({ssn: 1})

Here is a result:

Figure 6

First of all, Mongo has to scan all documents to find the ones matching the query, since there are no indexes ( stage=COLLSCAN ). After that, all matching documents have to be sorted, and Mongo does this in memory ( stage=SORT )

Here you should keep in mind two variables associated with the SORT stage: memLimit and totalDataSizeSorted . totalDataSizeSorted means how much memory Mongo used to handle the sort in memory, and if totalDataSizeSorted exceeds memLimit , the Mongo server cancels the query altogether.

Ok, let’s optimize our query and add an index on the ssn field to the collection:

db.people.createIndex({ssn: 1})

Let’s have a look at how the query processing has been changed:

Figure 7

Since we need to sort all documents by the ssn field, instead of scanning all documents in no particular order, it’s better to extract documents in index order and match each one against the query. Here you can see that we eliminated the SORT stage entirely, since the ordering comes from the index in the IXSCAN stage.
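In plain-JavaScript terms (illustration only, made-up names), scanning in index order means the results come out already sorted, so no separate sort step is needed:

```javascript
// Illustration only: walking the ssn index in order and filtering by
// last_name yields sorted results with no separate SORT step.

const docs = [
  { ssn: '300-11-2222', last_name: 'Johnson' },
  { ssn: '100-22-3333', last_name: 'Brown' },
  { ssn: '200-33-4444', last_name: 'Johnson' },
];

// Index on ssn: sorted keys, each referencing a document.
const ssnIndex = docs
  .map((doc, ref) => ({ key: doc.ssn, ref }))
  .sort((a, b) => a.key.localeCompare(b.key));

// IXSCAN + filter: results arrive in ssn order "for free".
function findSortedBySsn(lastName) {
  return ssnIndex
    .map(({ ref }) => docs[ref])
    .filter((doc) => doc.last_name === lastName);
}

console.log(findSortedBySsn('Johnson').map((d) => d.ssn));
// ascending ssn order without any in-memory sort
```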

Compound indexes

So far we have used only single-field indexes (which include just one indexed field). Mongo also allows creating compound indexes, which may include two or more fields:

db.<collection>.createIndex({
<field1> : <direction1>,
<field2> : <direction2>,
...
<fieldN> : <directionN>
})

Note that the order of the fields matters here.

Also keep in mind that if any of your collections has a set of indexes similar to these:

INDEX1: {<field1>: <order1>}
INDEX2: {<field1>: <order1>, <field2>: <order2>}
INDEX3: {<field1>: <order1>, <field2>: <order2>, <field3>: <order3>}

You should eliminate INDEX1 and INDEX2, since it’s enough to have only INDEX3: it covers all queries supported by INDEX1 and INDEX2, because they are its prefixes (this is called index prefixes). Keeping them just wastes server time maintaining redundant indexes during document insertion and deletion.

Also keep in mind that if you want to find documents by field2 alone, INDEX3 isn’t applicable, since field2 by itself isn’t a prefix of the field1-field2-field3 index (the only prefixes of INDEX3 are those listed above)

Are there any caveats when using compound indexes with sorting?

The same prefix rule applies here. Let’s assume that we create this compound index:

{f1: 1, f2: 1, f3: 1}

Then the following queries fall within this compound index, and it will be optimized for both the search predicate and the sorting:

db.collection.find({}).sort({f1: 1})
// f1 is a prefix of f1-f2-f3

db.collection.find({}).sort({f1: 1, f2: 1})
// f1-f2 is a prefix of f1-f2-f3

db.collection.find({}).sort({f1: 1, f2: 1, f3: 1})
// f1-f2-f3 is a prefix of f1-f2-f3

db.collection.find({f1: 'val1'}).sort({f1: 1, f2: 1, f3: 1})
// f1-f2-f3 is a prefix of f1-f2-f3

db.collection.find({f1: 'val1', f2: 'val2'}).sort({f1: 1, f2: 1, f3: 1})
// f1-f2-f3 is a prefix of f1-f2-f3

db.collection.find({f1: 'val1'}).sort({f2: 1, f3: 1})
// equality on f1 plus the sort keys f2-f3 together form the prefix f1-f2-f3

db.collection.find({f1: 'val1', f2: 'val2'}).sort({f3: 1})
// equality on f1-f2 plus the sort key f3 together form the prefix f1-f2-f3

But if you break the order of the fields, Mongo can’t use this compound index anymore, for example:

db.collection.find({}).sort({f2: 1, f1: 1, f3: 1})
// f2-f1-f3 isn't a prefix of f1-f2-f3

db.collection.find({f1: 'val1'}).sort({f3: 1})
// f1-f3 isn't a prefix of f1-f2-f3
// Mongo finds the targeted documents using the index (IXSCAN), but
// it has to sort them in memory (SORT)

db.collection.find({f2: 'val1'}).sort({f1: 1})
// f2-f1 isn't a prefix of f1-f2-f3
// Mongo may scan the whole index in f1 order to avoid sorting in memory

Take some time to understand this pattern.

Equality-Sort-Range rule

Let’s assume that we have a collection of restaurants, where each document contains the fields zipCode , cuisine , and stars , and we would like to run the following query:

collection.find({
zipCode: {$gte: '40000'},
cuisine: 'Pizza'
}).sort({stars: -1})

We may create a compound index with all those fields so that the query falls within the index. But as usual, the order matters.

Well, let’s have a look. If we put zipCode as the first field in the compound index, Mongo would extract a large share of documents from the collection, which isn’t very selective. zipCode in this query is a range condition, and in the worst case it may even fetch all records from the collection ( zipCode: {$gte: '0'} literally extracts every record)

It’s better to put an equality condition first, since in the first step it reduces the number of targeted documents far better than a range condition does. In our case, the equality condition is on the cuisine field.

Well, let’s assume that we created this compound index:

{cuisine: 1, zipCode: 1, stars: 1}

When we run the query, we get this result:

Here we have a very good ratio between documents examined and documents returned, but Mongo sorts the records in memory. In this case we can do better. Remember that an index can be used for both filtering and sorting when the query predicate keys that precede the sort keys are equality conditions. In our case the search by zipCode is a range condition, so the index couldn’t be used for sorting. Let’s then swap zipCode and stars in our index:

{cuisine: 1, stars: 1, zipCode: 1}

Let’s rerun the query with the brand new index in place:

We still have a good ratio between documents examined and documents returned, but now we have totally eliminated sorting in memory, and execution time dropped from 95 to 69 ms.

Remember the phrase EQUALITY-SORT-RANGE when you create a compound index: putting fields in this order gives the most optimized query.
