In this post about Cosmos DB, I’d like to dive into what Request Units are and what it means to work with them in Cosmos. A Request Unit, like other units of measure in Azure, rolls up things like CPU, memory and IOPS, so you have a single, unified unit to work with.

When it comes to Request Units, the baseline is that reading a single 1 KB JSON document costs one Request Unit. So, if I want to read a 1 KB JSON document once per second, that is 1 Request Unit per second.

Request Units are also provisioned per second. As you start to figure out the capacity you’re going to need, remember that it’s a per-second allowance; it essentially gets renewed every second. If I request 20 RUs per second, then every second I get 20 RUs for the region I’m working in.

As for writes, a write operation will require more Request Units than a read. When you write a document, Cosmos doesn’t just write the document in. Maybe the actual process of writing the document into the database is 1 Request Unit, but how many indexes must be updated? Each of those will cost a fraction of a Request Unit, a full one, or even more than one.

Cosmos DB updates all of those indexes for you, and that automation is part of what makes the whole process powerful, but you need to account for it when you’re estimating. I recommend that you look at the Cosmos DB capacity planner provided by Microsoft.

The planner looks at your mix of operations and helps you estimate based on your average document size and the number of times you’ll read and write it per second. It then gives you a Request Unit recommendation to start with, and you can decide to scale up or down as needed.
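To get a feel for the kind of arithmetic behind such an estimate, here is a rough back-of-the-envelope sketch. It is not the formula Microsoft's planner uses. The function name, the assumed write cost of 5 RUs per 1 KB (to stand in for the index updates), and the assumption that cost scales linearly with document size are all mine, for illustration only.

```python
import math


def estimate_rus_per_second(reads_per_sec, writes_per_sec, doc_size_kb,
                            read_ru_per_kb=1.0, write_ru_per_kb=5.0):
    """Rough RU/s estimate for a simple read/write workload.

    ASSUMPTIONS (not from any official formula):
    - a read costs 1 RU per 1 KB, matching the baseline described above
    - a write costs 5 RUs per 1 KB, a placeholder for document + index updates
    - cost scales linearly with document size, rounded up to whole KB
    """
    size_units = max(1, math.ceil(doc_size_kb))
    read_cost = reads_per_sec * size_units * read_ru_per_kb
    write_cost = writes_per_sec * size_units * write_ru_per_kb
    return read_cost + write_cost


# 10 reads and 2 writes per second of a 1 KB document
print(estimate_rus_per_second(reads_per_sec=10, writes_per_sec=2, doc_size_kb=1))  # 20.0
```

Treat the result as a starting point only; the real cost of a write depends on how many indexes your documents touch, so measure your own workload before you settle on a number.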

Another thing to be aware of is that, unlike other kinds of capacity, you’re buying this capacity up front. Unused RUs don’t carry over or get re-used, and the charge doesn’t go away, because Microsoft is putting an SLA behind the throughput that you’ve requested. So, if you request 20 RUs a second, that’s what you’re getting charged for. That is also the ceiling at which you’ll be throttled if you go over.

If you end up needing 25 RUs in that second, the 5 RUs’ worth of requests over your limit will be throttled back (Cosmos DB rate-limits them with an HTTP 429 response), and you will either get errors in read scenarios or slow down the write process. Once that second is over, the allowance resets and is ready to go. Most applications can absorb that without it being a big deal, but if it happens regularly, you may want to increase the number of RUs so you’re getting the capacity that you need to support your application.
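The 20-versus-25 example can be sketched as a tiny simulation. This is a deliberately simplified model of a per-second budget, not how Cosmos DB actually schedules requests; the function name and the rule of serving requests in arrival order are assumptions for illustration.

```python
def simulate_second(provisioned_rus, request_costs):
    """Serve requests in arrival order until this second's budget is spent.

    Simplified model (an assumption, not Cosmos DB internals): each request
    has a fixed RU cost, and any request that no longer fits is throttled.
    """
    remaining = provisioned_rus
    served, throttled = [], []
    for cost in request_costs:
        if cost <= remaining:
            remaining -= cost
            served.append(cost)
        else:
            throttled.append(cost)
    return served, throttled


# 25 one-RU reads arriving in the same second against a 20 RU/s budget
served, throttled = simulate_second(20, [1] * 25)
print(len(served), len(throttled))  # 20 5

# The next second starts with a fresh 20 RUs; nothing carries over
served, throttled = simulate_second(20, [1] * 10)
print(len(served), len(throttled))  # 10 0
```

Each call stands for one second, so the leftover 10 RUs in the second call are simply lost rather than banked for later.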

Another point: you’re reserving capacity at the region level, not for your entire application. If you have 5 read regions and you have 20 RUs per second set up for the read environment, you’re being charged for 100 RUs per second, since you’re supporting every region at that capacity.
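The region math is simple multiplication, but it is easy to forget when budgeting. A minimal sketch, with a hypothetical helper name:

```python
def total_billed_rus(rus_per_second, region_count):
    """RU/s you pay for when the same throughput is reserved in every region."""
    return rus_per_second * region_count


# 20 RU/s reserved across 5 regions
print(total_billed_rus(20, 5))  # 100
```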

The region level is another area where you’ll have to do some capacity planning, to make sure what you’re paying for matches the capacity you need to support your apps. As you build out, you’re provisioning for every region, and your primary region will carry more load, since it handles writes as well as reads.

Hopefully I’ve given you a better understanding of Request Units. Basically, it’s just another unit of measure that we get billed by in Azure, but it’s important to match it to the scale that you need.
