Querying Datomic for the history of an entity

Published January 20, 2013

This post is part of a series: Datomic

The main reason for choosing Datomic for our project was our requirement to have a page that lists all the changes that ever occurred to a specific entity. This turned out to be a bit more tricky than I anticipated, but after I finally wrapped my head around the basics, the solution ended up being quite elegant.

The first step is to get a hold of all the transactions related to the entity in question. This is done with a query on (datomic.api/history db), a special version of the database containing the full history of assertions and retractions.

(:require ['datomic.api :as 'd])

(d/q
 '[:find ?e ?a ?v ?tx ?added
   :in $ ?e
   :where
   [?e ?a ?v ?tx ?added]]
  (datomic.api/history db)
  123123)

This is the generic query to find them all. This query returns the datoms in the history for the specified entity only.

My actual query was a bit different though. I had the entity ID, and needed to find all datoms that touched a referred attribute for that entity. So simply finding all transactions for my entity would not suffice. Here's a short summary of my schema:

;; Attendants

 {:db/id #db/id[:db.part/db]
  :db/ident :attendant/public-id
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one
  :db.install/_attribute :db.part/db}

 {:db/id #db/id[:db.part/db]
  :db/ident :attendant/attributes
  :db/valueType :db.type/ref
  :db/cardinality :db.cardinality/many
  :db.install/_attribute :db.part/db}

 ;; Attendant attributes

 {:db/id #db/id[:db.part/db]
  :db/ident :attendant-attr/value
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one
  :db.install/_attribute :db.part/db}

 {:db/id #db/id[:db.part/db]
  :db/ident :attendant-attr/schema-attr
  :db/valueType :db.type/ref
  :db/cardinality :db.cardinality/one
  :db.install/_attribute :db.part/db}

The entity I'm tracking is the attendant, but the attributes of an attendant isn't stored directly on the attendant entity. The attributes are a reference, in order to join the attribute values to the schema-attr for the event the attendant belongs to. The user of the system is able to specify, per event, the attributes that are available to an attendant, so this join maintains data integrity.

Here's the actual query we run:

(d/q
 '[:find ?tx ?attr ?added
   :in $ ?attendant
   :where
   [?attendant :attendant/attributes ?attr]
   [?attr _ _ ?tx ?added]]
  (d/history db)
  123123)

This gets all the transactions that touches the attribute :attendant/attributes for the specified attendant.

And this is where the fun begins.

The next step is to take the tuples of entity IDs we get from the query, and turn it into a full changeset of entities we can use to display the list of changes to the user. Essentially, we want to return a list of maps, where each map contains the timestamp of the change, and the actual list of changes. We also need to handle all the cases: an attribute being created for the first time, and attribute changing from something to something else, and an attribute being removed entirely. As a word of caution, this means we have to be careful when we create the actual transactions for our attendant, and only create transactions for the attributes that actually change, and make sure we retract facts when fields are blanked out. What we're about to do wouldn't be possible if the blanking out of a field simply blanked out the attribute in the database. We need to have a clean transaction history.

We extract the entities for the changeset with a map over the result of the query above.

(map
 (fn [[tx attr added]]
   (let [tx-db (d/as-of db tx)]
     {:tx (d/entity tx-db tx)
      :attr (if added
              (d/entity tx-db attr)
              (d/entity (d/as-of db (dec (d/tx-t tx))) attr))
      :added added}))
 result-of-transaction-query)

It's important to make sure you use the database as of the transaction inside the map. If you just use db as of now (specifically, the same db as you used when querying for all the transactions just before), you'll get the entities as of now, which you don't want. So in the map, we make sure to always use (db/as-of db tx) when we retrieve entities, so we get the entity as of that transaction. This "small" detail caused me lots of pain and trouble when implementing this feature, so make sure you get it right.

I haven't explained what added actually means. In the history db we queried for transactions, ?added is a boolean that tells us whether the datom was an assertion or a retraction.

If the datom was an assertion to the database, the entity is just the entity as of the transaction.

If the datom was a retraction, the entity as of the transaction will (perhaps surprisingly) not have any attributes other than the entity id. When we show the details of the change to the user, we want to show what the old values for the entity were, so we need to know what the entity looked like prior to deletion. Thankfully, Datomic doesn't delete any old data, even when we retract entities from the database. So to get the values, we simply get the entity as of the transaction before the transaction where the entity was retracted. d/tx-t is used to turn our transaction entity id into a timestamp, and dec decrements it by one. So our call to d/as-of gets the instant just before the transaction, leaving us with the database as it was the moment before our transaction occurred.

Note: I'm not entirely sure how safe this is; it's probably better to somehow get the transaction entity ID from before. For all I know, there was a large amount of writes going on at the time of our transaction, so the dec of the transaction instant might be inaccurate.

Phew! We now have a list of {:tx tx-entity :attr attr-entity :added true/false}. I'll leave it as an exercise to the reader to morph this list into something you can show to your users. Essentially, you need to group by :tx, sort by the transactions (:db/txInstant tx) (all transactions automatically gets a timestamp associated with them by Datomic), and traverse the data to figure out which combination of added you have in order to determine whether the transaction added, updated or removed the value.

We're really happy with Datomic so far. We haven't hit production yet, so I'm sure this query will be tweaked even further (we already know we want to add more stuff, such as tracking the user that made the change). But the fact that this feature is just a query away is very convenient indeed.


Questions or comments?

Feel free to contact me on Twitter, @augustl, or e-mail me at august@augustl.com.