
Breaking Down AWS Prime-Day Compute Metrics

Here's a back-of-the-napkin calculation of the cost of indexing 3 petabytes of data with B-trees and keeping the data available for fast query on a sharded cluster of running servers.

By Craxel Founder and CEO David Enga

August 14, 2024

I love that AWS posts their Prime Day compute metrics because you can extrapolate some interesting numbers about what certain capabilities cost.

This one caught my eye: https://aws.amazon.com/blogs/aws/how-aws-powered-prime-day-2024-for-record-breaking-sales/

6,311 Aurora DB instances processed 376 billion transactions (storing almost 3 petabytes) during Prime Day. That's ~59 million transactions per instance. Conservatively assuming most activity happened during 10 hours of Prime Day, that's 10.4 million total transactions per second and ~1,654 per second per instance. Based on the storage number, the average size per transaction is ~8,000 bytes (which seems quite large), and that works out to ~1/2 a TB of storage per instance. (My apologies if I mess up any of the math or use any incorrect assumptions; consider this a back-of-the-napkin calculation.)
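If you want to check the napkin, here's a minimal sketch that reproduces those numbers from the figures in the AWS post. The 10-hour activity window is my assumption from above, not an AWS figure.

```python
# Back-of-the-napkin numbers from the AWS Prime Day 2024 post.
instances = 6_311
transactions = 376e9          # 376 billion transactions
storage_bytes = 3e15          # almost 3 petabytes stored
window_seconds = 10 * 3600    # assumption: most activity within 10 hours

per_instance_txns = transactions / instances               # ~59.6 million
total_tps = transactions / window_seconds                  # ~10.4 million/s
per_instance_tps = total_tps / instances                   # ~1,654/s
bytes_per_txn = storage_bytes / transactions               # ~7,979 bytes
storage_per_instance_tb = storage_bytes / instances / 1e12 # ~0.48 TB

print(f"{per_instance_txns / 1e6:.1f}M txns/instance, "
      f"{total_tps / 1e6:.1f}M total TPS, "
      f"{per_instance_tps:,.0f} TPS/instance, "
      f"{bytes_per_txn:,.0f} bytes/txn, "
      f"{storage_per_instance_tb:.2f} TB/instance")
```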

This Aurora DB cluster provides fast, selective O(log N) query (I assume using B-tree indexes) and ACID transactions. Assuming AWS is really good at pricing their products, we can probably assume that the cost to run this cluster is similar to the cost of running other sharded clusters of relational DBs large enough to organize 3 petabytes of data with transactional consistency for fast query.
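To put the O(log N) claim in perspective, here's a rough depth estimate for a B-tree over all 376 billion rows. The fanout of ~500 keys per page is a hypothetical figure for a typical 16 KB index page, not anything AWS has published.

```python
import math

rows = 376e9   # total transactions stored
fanout = 500   # hypothetical keys per 16 KB B-tree page

# B-tree depth grows as log_fanout(N): each point lookup touches
# roughly this many pages from root to leaf.
depth = math.ceil(math.log(rows) / math.log(fanout))
print(f"~{depth} page reads per point lookup")  # ~5 for these numbers
```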

All we need for a rough order-of-magnitude cost is to guess the instance size and plug it into AWS's pricing calculator. The cost will vary quite a bit by instance type, and it's hard to know for sure which type they're using.

A db.r6g.xlarge instance is fairly modest and likely a decent guess (4 vCPUs and 32 GB of RAM). The estimated on-demand cost for 6,311 instances came out to roughly $2,391,048.57 per month, or ~$28 million a year, plus storage and I/O charges.
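As a sanity check, that monthly figure works back to roughly $0.52 per instance-hour, which is consistent with the published on-demand rate for db.r6g.xlarge Aurora instances in us-east-1; the sketch below just derives the implied hourly rate and annualizes the total. The 730 hours/month is AWS's standard pricing assumption.

```python
instances = 6_311
monthly_cost = 2_391_048.57   # figure from the AWS pricing calculator
hours_per_month = 730         # AWS's standard hours-per-month assumption

hourly_rate = monthly_cost / instances / hours_per_month
annual_cost = monthly_cost * 12

print(f"~${hourly_rate:.3f} per instance-hour")               # ~$0.519
print(f"~${annual_cost / 1e6:.1f}M/year before storage/I/O")  # ~$28.7M
```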

For us here at Craxel, this gives a pretty good idea of the cost of indexing 3 petabytes of data using B-trees and keeping the data available for fast query on a sharded cluster of running servers. Hit us up if you would like to hear how our O(1) multi-dimensional indexing algorithm changes the cost structure for organizing petabytes of data for fast query.

Thanks AWS for sharing!!!