Coding a transactional database using Clojure’s ‘reader literals’.

I want to show how you can create a simple persistent database by utilising reader literals in Clojure 1.4. You can copy these lines into a Clojure program if you like – the only dependency you require is clojure.java.io.

Let’s start by defining a transaction as an update to some state using a protocol :-

(defprotocol Transaction (update [_ state]))

Here’s some example transactions we want to perform. For this article let’s imagine we’re creating a database of user accounts. We’ll just create two transaction types- creating users and deleting them. Let’s record the name and email address, plus the date they joined or left.

(defrecord CreateUser [name email start-date]
  Transaction
  (update [this db]
    (update-in db [:users] conj
      {:name name :email email :start-date start-date})))

(defrecord DeleteUser [email remove-date]
  Transaction
  (update [this db]
    (update-in db [:users]
      (fn [coll] (remove (fn [r] (= (:email r) email)) coll)))))

Data is important. We’ll want to keep records of these transactions.

(defprotocol TransactionLog (record [_ tx]))

Here’s an implementation that uses an agent to write the records to a print-stream.

(defrecord DefaultTransactionLog [ag]
  TransactionLog
  (record [this tx]
    (send-off ag
      (fn [out]
        (binding [*out* out *flush-on-newline* true]
          (io! (prn tx)) out)))))

The agent will hold the print-stream. Agents are good for I/O because messages to them only get delivered if the transaction completes its update succesfully. If a retry is needed the message doesn’t get delivered. We let the Clojure prn function figure out how to serialize it. It’s optional, but I’ve wrapped the actual print statement in an io! wrapper as a safety check.

Having written the records we’ll need some way of reading them back. We’ll just use the standard Clojure reader for this. This function returns a list of all the transactions in a file.

(defn read-transactions [f]
  (if-not (.exists f) []
    (let [rdr (java.io.PushbackReader. (clojure.java.io/reader f))]
      (take-while identity
        (repeatedly 
          (fn [] (read rdr false nil)))))))

Now we move on to the database state. Let’s decorate the transaction log with something that will hold this state. As transactions are added to it they will update the in-memory state and be recorded on disk. This can support the TransactionLog protocol as well. The state will be a Clojure ref. The delegate will be a backing transaction log.

(defrecord Database [state delegate]
  TransactionLog
  (record [this tx]
    (dosync
     (alter state (partial update tx))
     (record delegate tx))))

Now we’re ready to create our database with the Clojure ref and backing transaction log. We’ll initialize the ref with a result of updating an empty map with a series of transactions read from our file.

(def db 
   (Database.
     (ref (reduce (fn [db tx] (update tx db)) {}
                    (read-transactions
                      (clojure.java.io/file "my.db.clj"))))
     (DefaultTransactionLog. 
       (agent (clojure.java.io/writer
                (clojure.java.io/file "my.db.clj")
                :append true)))))

Remember those transactions we coded earlier? Let’s add some convenience functions that will create and apply these transactions :-

(defn create-user [name email]
    (record db (CreateUser. name email (java.util.Date.))))

(defn delete-user [email]
    (record db (DeleteUser. email (java.util.Date.))))

Notice how we’re creating java.util.Date instances.

Now, let’s test it.

(create-user "Bob" "bob@example.org")
(create-user "Alice" "alice@example.org")
(create-user "Carol" "carol@example.org")
(delete-user "alice@example.org")

What’s the state of the database?

(deref (:state db))    

You should see something like this :-

{:users 
  ({:name "Carol",
    :email "carol@example.org",
    :start-date #inst "2012-04-19T16:00:35.903-00:00"}
   {:name "Bob",
    :email "bob@example.org",
    :start-date #inst "2012-04-19T16:00:35.901-00:00"})}

What’s in our database file? Below is the raw content. Unlike some database files it’s quite easy to read and even possible to edit by hand if the need arises.

#blog.CreateUser{:name "Bob", :email 
"bob@example.org", :start-date #inst 
"2012-04-19T16:00:35.901-00:00"}
#blog.CreateUser{:name "Alice", :email 
"alice@example.org", :start-date #inst 
"2012-04-19T16:00:35.903-00:00"}
#blog.CreateUser{:name "Carol", :email 
"carol@example.org", :start-date #inst 
"2012-04-19T16:00:35.903-00:00"}
#blog.DeleteUser{:email "alice@example.org", 
:remove-date #inst 
"2012-04-19T16:00:35.904-00:00"}

Note, unless you aren’t using Clojure 1.4 or above you won’t see the #inst tags!

Notice how Clojure is writing out and reading back the java.util.Date instance. You can also register your own Java types to it through the *data-readers* dynamic binding (more details in the Clojure 1.4 release notes). This makes things really easy to create some fairly complex atomic transactions.

Now try recompiling – you should find the database is restored. You can also try removing the user. The transaction log will store both the create-user event and the delete-user event, both with timings.

That’s around 30 lines of code to code a transactional database, and a few more lines to code some custom transactions.

You don’t have to limit yourself to a map as your state. At Deutsche Bank we’ve used a similar design but with RDF triple-stores that are queryable in a datalog syntax. A key benefit with using Clojure’s persistent data-structures for simple in-memory datastores is that you don’t have to worry about locks or concurrent updates.

6 thoughts on “Coding a transactional database using Clojure’s ‘reader literals’.

  1. Hello. I don’t understand all of this but it looks awesome. One question though: isn’t the record inside the transaction a side effect that could be replayed?

    • Hi Huey – I don’t quite understand your question but I’ll do my best do answer anyway. The record inside the transaction is just some data for that transaction and a function which applies this data to an existing state. It doesn’t stay around to be replayed though, once it is used (and logged) it is disposed of. It all takes place under Clojure’s STM so you don’t have to worry about co-ordinating transactions.

  2. Huey is saying that it’s unsafe to call functions with side effects inside of dosync because they may be called more than once. http://clojure.org/refs So you shouldn’t do I/O from within a dosync.

    Agents do not have this restriction, so they would be an option in this case.

    • I understand now, thanks Paul (and thanks Huey). Of course you’re both right. I’ve re-jigged the TransactionLog to use an agent as you suggest and added some explanation in the text.

  3. If I’m following along properly, the agent design isn’t actually Durable.

    Normally, a transactional database ensures that the log data has reached
    durable storage before permitting the caller of the transaction to proceed.
    The agent won’t write the log until an arbitrary amount of time after the
    transaction’s been committed.

    If the computer fails after the transaction completes, but before the
    agent writes the log, there’ll be no record of the transaction.

    In the interim, the transaction caller could’ve done anything:
    report the result to an end-user, send it to another process,
    perform another transaction.

    In practice, the gap should be short, and the likelihood of a lost
    log record small — but there’s a hole here.

  4. That’s quite true, thanks for spotting that.

    You could extend the design with an ‘await’ on the agent. There’s a lot more to make the database production-ready. The intention of the blog was to show a realistic example of using reader literals, a new feature which has just arrived in the latest version of Clojure.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>