Optimise MongoDB queries: all the tools I used

Cris Pintea

July 25, 2022

Last week I had to optimise a lot of MongoDB queries. Here’s how I did it.

Introduction

I run uptimeline.com. It’s a website monitoring tool that makes thousands of calls every minute to check that websites are up and running.

Every call runs inside of its own AWS lambda function which needs to obtain the information of the website to check from the database. This means that there are thousands of lambdas that query the DB every minute.

Moreover these lambdas are spawned by a central lambda that executes once a minute and figures out which websites are due for checking and which are not. This means scanning the entire table of monitors on the DB every minute.

After releasing a large refactor that introduced the architecture I just described, I got a notification from MongoDB Atlas that my spending for the DB had spiked. I had burned hundreds of dollars in a matter of a few hours.

Cost spike of the Serverless DB instance in question.

I don’t like to think about optimisation until performance becomes an issue as a way to make sure not to over-optimise. Now however, the time to optimise had come. This article is a how-to guide for the past me to solving the issue I had.

How to analyse a MongoDB query

I’ll jump straight to the interesting part.

The MongoDB Explain command was the protagonist of the workflow I’ve devised to optimise my queries. Since I use “NoSQLBooster for MongoDB” as a GUI, which conveniently comes with a GUI for explain, I’ll be showing that here.

Here’s a query that’s not optimised

The way explain works is that you write a query and provide it as input to the explain command.

MongoDB query

In this collection there is no index on issues so I expect this query to scan the whole collection to find the documents that match the filter. Here’s what explain returns:

Result of MongoDB Explain inside NoSQLBooster GUI

What you can note is the following:

“No index available for query” indicates that no index was used
Documents returned tells us how many documents match our query, in this case 70k
Documents examined shows how many documents were actually read, in this case 6M

How to optimise your MongoDB query

If we created an index on the issues field we would save up 6M document reads every time this query is run. This last part is very important. The more often a query runs, the more important it is that it be optimised.

Therefore, in your effort to optimise your app, you should prioritise queries to optimise by how often they’re executed.

Here are the results of the same query after adding the index

As you can see the query used the index now, and the number of documents examined matched the number of documents returned. The execution time also went from 500ms to 130ms.

The full workflow

So you now know how to analyse a single query. Next step is to list all the queries that your app makes, and for each determine:

how often we make the query
how big is the collection the query runs on

Sort your queries by decreasing frequency * collection size and analyse and optimise each query like described above!

Optimising Aggregations

The same workflow works for aggregations. Here are a few things to keep in mind however that made me lose some time:

$facet aggregation stage doesn’t use indexes, ever. If you want to optimise a query with a $facet you need to remove the $facet. Rather split the query into multiple indexed queries.
$group aggregation stage only uses indexes in a very specific scenario, described here

Optimising queries on MongoDB Atlas Serverless

If like me you’re running on MongoDB Atlas Serverless, here are some tools that I have found handy.

Monitoring Dashboard

If your objective is to decrease cost, you want to minimise document reads and writes (here’s why). You can check your current performance in that sense from the monitoring tab of your cluster.

MongoDB Atlas Serverless monitoring RPU and WPU

Performance Advisor

A handy tool to prioritise queries is the Performance Advisor. It only shows the most obvious performance issues so don’t assuma that because it’s not showing you anything you’ve already optimised. However if your app has some serious performance issues, it might point you towards the worst performing ones.

MongoDB Atlas performance advisor

Profiler

Finally, through the profiler you can view your queries sorted by their documents scanned : returned ratio. Again it’s not perfect and won’t allow you to find all issues, but it’s a start.

MongoDB Atlas profiler

TL;DR

Key takeaways:

minimise the docs scanned : returned ratio
do this by making sure that all queries use an index
use the explain command to check wether queries use an index, and their docs scanned : returned ratio

and

you want to prioritise queries that execute often AND scan large collections
$facet doesn’t support indexes, $group only supports them in a very specific scenario

I hope you can now go optimise your MongoDB queries!

If you found my article useful and wanna receive useful stuff I learn and write about, consider signing up for my newsletter in the footer below!

Check out the rest of my articles