Datomic: Direct index lookup (August Lilleaas' blog)

A couple of years ago, I read about someone whose MySQL performance bottleneck was SQL parsing. Most of the queries were straightforward primary key lookups, so the author wrote code that called InnoDB (MySQL's storage engine) directly and removed the bottleneck. I thought this was really cool, and I feel like I've reached a small milestone today for doing the same thing myself, only in Datomic.

In our system, we have a querying API that lists everything if an empty query is passed. This means we need to ask datomic for potentially 1000s of entities, and also make it happen in a reasonable amount of time. In the future, we we'll probably be more clever and lazily load stuff, but I wanted to see if there was a quick fix, since lazy loading in the UI takes a lot of work to get right.

Here's the old code:

(require '[datomic.api :as d])

(map
  (fn [public-id]
    (d/entity
      db
      (ffirst
        (d/q
          '[:find ?attendant
            :in $ ?pub-id
            :where
            [?attendant :attendant/public-id ?pub-id]]
          db
          pub-id))))
  public-ids)

The attribute :attendant/public-id is a string attribute where we insert Datomic squuids. We only expose these IDs to the outside world, the Datomic entity IDs stay internal to Datomic.

In the extreme cases, 2600 such public IDs is what we mapped over, resulting in 2600 queries being performed. This took about 1 second, and was not acceptable. The solution is to not query at all, but just look up the value in the Datomic index. This ended up taking about 0.1 seconds, so a ten-fold performance increase. Not bad at all! Here's the updated code:

;; Yields exactly the same output as the above query
(map
  (fn [public-id]
    (d/entity db (:e (first (d/datoms db :avet :attendant/public-id public-id)))))
  public-ids)

Datomic has multiple indexes, and all of them are full copies of the data. So looking things up in an index is the only thing you need to do in order to access the data, and this is what the Datomic query engine does under the hood anyway, since all your data in the database is available to you as an immutable (and lazy) value.

The code to d/datoms flows as following:

Pass the datomic db whose indexes we want to look up.
Pass the index to use. We want the :avet index. (Attribute, value, entity, transaction)
- The following arguments are the components to look up in the index.
The first entry in :avet is :a. So we pass :attendant/public-id, the attribute we want to look up.
The second entry in :avet is :v. So we pass public-id, the value we want to look up.

This will give us a raw seq of facts. Since we know that the attribute :attendant/public-id is set to unique, we can just get the (first) one.

Aside: We could also just have passed one component, the attribute, and gotten back a sequence of facts for all facts for that attribute.

When we have our fact, we extract the entity ID from it. The fact is map-like, so all we need to do is (:e fact).

Then we proceed as normal, and look up the full aggregated entity just as we do with the entity IDs we get from Datomic query results.

So there you go! Again, we should probably be more clever and lazy load stuff, but we just wanted a quick fix, and a ten-fold performance increase is more than good enough for now.