Datastore & transactions & batching

Colt McAnlis
3 min read · Aug 9, 2018

COIN ALL THE THINGS (CATT) is a company riding the wave of Cryptocurrency technology, and if their IPO paperwork says anything, they are serious contenders in the space.

One service they have in their technology stack is to mirror validated transactions on the blockchain, back in typical cloud relational storage, so that they can more quickly deliver results to users of their service, who want to look at historical records.

Being a Cloud Native application, CATT reached out to me when they realized they were having a problem recording transactions to Datastore. It turned out they were bumping into the documentation’s recommendation of only one write per second per entity group, and they wanted to figure out if there was a way around it.

After a few emails and a couple of hours looking at their tech stack, we were able to pinpoint a critical optimization for their Datastore usage: transactions & batching.

What’s a transaction?

In a nutshell, a transaction is how you perform a group of operations in Datastore as a single unit. As long as those operations are independent of each other (i.e. they don’t overlap on any entity group), they can be done in parallel.

So, for example, updating two properties on the same entity as two separate operations would not be independent, since both operations touch the same entity group.

However, updating the data on two separate entities, residing in two separate groups, is an independent action, and perfect to add into a transaction.

And for CATT’s banking model, they were constantly doing operations where users were trading funds between accounts, which means it’s a perfect opportunity for improvement.
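To make the all-or-nothing behavior concrete, here’s a toy, in-memory sketch of that fund-transfer pattern (plain Python, no Datastore client; all names are illustrative): stage every update first, and commit only if every operation succeeds.

```python
def apply_atomically(store, ops):
    """Apply all (account, delta) ops, or none of them."""
    staged = dict(store)  # work on a copy so a failure leaves store untouched
    for account, delta in ops:
        balance = staged.get(account, 0) + delta
        if balance < 0:
            return False  # abort: the original store is unchanged
        staged[account] = balance
    store.clear()
    store.update(staged)  # commit every staged change at once
    return True

accounts = {"alice": 100, "bob": 50}
apply_atomically(accounts, [("alice", -30), ("bob", +30)])    # commits
apply_atomically(accounts, [("alice", -500), ("bob", +500)])  # aborts, no change
```

The copy-then-swap is the key design choice: a partial failure never leaves the store half-updated, which is exactly the guarantee a Datastore transaction gives you.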

The difference, really

Before committing to the changes, CATT wanted to see a small test showing the performance difference, which we were happy to oblige. I set up a small benchmark covering the two use cases: the first linearly applied updates to a number of entities that were in different entity groups, and the second applied the entire set as a single transaction. Note the difference below.

Much like other types of batch systems, the per-call overhead of the action is what dominates the time here. Each linear execution call has overhead to spin up, handle the HTTP request, make the change, and return. Meanwhile, the transaction amortizes all that overhead across a set of calls that are grouped together.
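A back-of-the-envelope model shows why grouping wins. The numbers below are made up purely for illustration (they are not measured Datastore latencies); what matters is that the fixed per-call cost is paid N times in the linear case and once in the grouped case.

```python
OVERHEAD_MS = 50  # assumed fixed per-RPC cost: spin-up, HTTP, teardown
WRITE_MS = 2      # assumed cost of the write itself

def linear_ms(n):
    # n separate calls: pay the overhead every time
    return n * (OVERHEAD_MS + WRITE_MS)

def grouped_ms(n):
    # one grouped call: pay the overhead once
    return OVERHEAD_MS + n * WRITE_MS

print(linear_ms(100), grouped_ms(100))  # 5200 vs 250 under these assumptions
```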

Better performance, fewer restrictions

It’s worth noting a couple of limitations about transactions.

  1. Transactions are capped at 25 entity groups per call.
  2. Transactions are never partially applied. Either all of the operations in the transaction are applied, or none of them are applied.
  3. As the amazing Valentin Deleplace points out, if you’re really in the market for raw performance, batch operations are the way to go.
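For reference, a transactional transfer with the Python google-cloud-datastore client looks roughly like this. This is a sketch, not CATT’s actual code: it assumes an already-configured client, two existing account entities, and a numeric `balance` property, and it omits error handling and retries.

```python
def transfer(client, from_key, to_key, amount):
    """Move `amount` between two account entities atomically.

    Sketch only: assumes a google.cloud.datastore.Client and that both
    entities exist with a numeric "balance" property.
    """
    with client.transaction():  # every read/write inside commits or aborts together
        src = client.get(from_key)
        dst = client.get(to_key)
        src["balance"] -= amount
        dst["balance"] += amount
        client.put_multi([src, dst])  # both writes land in the same transaction
```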

Much like transactions, batch operations are more efficient because they perform multiple operations with the same overhead as a single operation. You can leverage batching with the get_multi and put_multi commands. When we put the three head to head, we see quite a performance difference:
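As a sketch of the batch path (again assuming the Python google-cloud-datastore client; the kind name and function are illustrative, and the import is deferred so credentials are only needed when the function is actually called):

```python
def batch_write(rows, kind="Transaction"):
    """Write many entities with a single put_multi RPC instead of N puts."""
    from google.cloud import datastore  # deferred: needs credentials at call time
    client = datastore.Client()
    entities = []
    for row in rows:
        entity = datastore.Entity(key=client.key(kind))
        entity.update(row)  # copy the row's properties onto the entity
        entities.append(entity)
    client.put_multi(entities)  # one batched call for the whole set
    return [e.key for e in entities]
```

Reads work the same way: `client.get_multi(keys)` fetches a whole list of keys in one round trip.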

The hype is real

As far as COIN ALL THE THINGS was concerned, the main takeaway here was that using some sort of grouped operation (be it transactions, or batching) resulted in a significant performance improvement over the linear code they were using before.

For your use cases, transactions and batches both offer a faster way to execute groups of operations, and each comes with its own pros and cons. Again, Valentin offers the following advice:

  • If you are confident that all entities to be updated are distinct (e.g. a gigantic one-shot write of millions of entries), then batching is really the best choice.
  • If you need strong consistency, and contention is expected to be low (rare concurrent writes), then multiple clients + multiple workers + transactions is a good idea.
