Reflections on a real-world Clojure application (take 2)

Last night I gave a talk at the London Clojure Users Group (LCUG) about a ‘real-world’ (16K lines-of-code) application we built in less than a year with Clojure at Deutsche Bank. I really enjoyed the event; thanks to SkillsMatter, who were fantastic hosts.

There were a lot of questions during the Q&A at the end, which I did my best to answer at the time. Now that I’ve had some more thinking time, I’d like to add a few extra comments.

If you couldn’t attend the talk you can catch it here.

Below is the original presentation in blog form (thank you Markdown!). My extra comments can be found in the epilogue – feel free to ask further questions in the comments area.

Reflections on a real-world Clojure application

Background

  • Java background, especially early J2EE circa 1999-2002
  • Test Driven Development – ran 20 courses
  • Mastering TDD helped me to write Java using values rather than objects
  • Began to write Java in a more functional way – but much more verbose!!
  • Started using Clojure at work for user web interfaces in November 2009
  • Began to attend Clojure Dojos in London
  • February 2011 – Clojure used extensively on a new application, now 16K LOCs!

The ‘main’ function

Developer bootstrap

For developers

$ mvn dependency:copy-dependencies
$ ./run

which does this :-

#!/bin/sh
echo "Starting Fandango run script..."

export PATH=$PATH:target/bin

# Set debug to nil to disable JVM debugging.
classpath='src/main/clojure:target/dependency/*'
main=src/main/clojure/com/db/mis4/fandango/main.clj

java -cp "${classpath}" clojure.main "${main}"

Then slime in with Emacs!

(Let’s look at configuration in more detail)

Configuration

Requirements of a configuration system

  • Flexibility – we should be able to add configuration where we need it
  • Distributed ownership – we shouldn’t have to know the live passwords
  • Source agnostic – we’d like to be able to use local files and centralised storage.

Candidates?

  • Java properties files
  • JSON/YAML
  • XML – tree based, schemas enforce structure rather than value
  • Databases – records for configuration are too diverse
  • RDF – graph based, queryable

Clojure as configuration?

“Protocols and file formats that are Turing-complete input languages are the worst offenders, because for them, recognizing valid or expected inputs is UNDECIDABLE: no amount of programming or testing will get it right… A Turing-complete input language destroys security for generations of users. Avoid Turing-complete input languages!” — Cory Doctorow

So…

Be careful if you choose Clojure as your configuration format!!

‘Open Data’

All our data (application & environment configuration, report definitions, user details & entitlements, etc.) are stored as RDF statements

  • The cat sat on the mat
    • Subject: the cat
    • Predicate (also known as property): sat on
    • Object: the mat
  • Relations are at an individual level rather than at a set (ie. table) level.

  • More intro to RDF here:
    • http://www.bbc.co.uk/blogs/radiolabs/s5/linked-data/s5.html
    • http://linkeddatabook.com

Our configuration system

  • RDF files (mostly Turtle format)
  • SPARQL queries
  • Uses a dynamic var: (with-config ...)
  • Delays to avoid unnecessary queries

Example

create-associations :-

(defn create-associations [model]
  {::directories
   (delay
    (sparql/select1-map
     model
     [:proc cmdb/host :host]
     [:proc cmdb/install-dir (as-uri (format "file://%s" (or (System/getenv "FANDANGO_INSTALL_DIR")
                                                             (System/getProperty "user.dir"))))]
     [:host a cmdb/Host]
     [:host cmdb/hostname (get-hostname)]
     [:proc cmdb/userid (System/getProperty "user.name")]
     [:proc ["http://mis4.gto.intranet.db.com/fandango/" "dataDirectory"] :data-dir]
     [:proc ["http://mis4.gto.intranet.db.com/fandango/" "logDirectory"] :log-dir]
     [:proc ["http://mis4.gto.intranet.db.com/fandango/" "workDirectory"] :work-dir]
     [:proc ["http://mis4.gto.intranet.db.com/fandango/" "pidDirectory"] :pid-dir]
     [:optional [:proc cmdb/source-dir :source-dir]]))})

Security

Entitlements

All users are given FOAF ‘profiles’, with added VCARD and other statements.

Given these prefixes

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

This statement (in the configuration) gives all users a ‘Guest’ role.

foaf:Person rdfs:subClassOf <Guest> .
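That single statement is enough because of the RDFS subclass rule: anything typed as a `foaf:Person` is also a member of every superclass of `foaf:Person`. A minimal sketch of that rule in plain Clojure follows. It assumes triples held as vectors of strings, an acyclic class hierarchy, and a made-up user `users/alice`; a real RDF store would apply the rule with its own reasoner.

```clojure
;; Hypothetical data: triples as [subject predicate object] vectors.
(def triples
  #{["foaf:Person" "rdfs:subClassOf" "Guest"]
    ["users/alice" "rdf:type" "foaf:Person"]})

(defn superclasses
  "Direct and indirect superclasses of class c (assumes no cycles)."
  [triples c]
  (let [direct (for [[s p o] triples
                     :when (and (= s c) (= p "rdfs:subClassOf"))]
                 o)]
    (set (concat direct (mapcat #(superclasses triples %) direct)))))

(defn types-of
  "Asserted types of subject plus the types implied by rdfs:subClassOf."
  [triples subject]
  (let [asserted (set (for [[s p o] triples
                            :when (and (= s subject) (= p "rdf:type"))]
                        o))]
    (into asserted (mapcat #(superclasses triples %) asserted))))
```

Here `(types-of triples "users/alice")` includes `"Guest"` even though no statement says so directly.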

N-triples

Statements are then added to create users, request roles, and approve or reject roles.

Creating a user

<events/5afcf604-16c0-4cab-a6d1-656ed3f3420c> <time> "2011-12-25T12:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
<events/5afcf604-16c0-4cab-a6d1-656ed3f3420c> a <CreateUser> .
<events/5afcf604-16c0-4cab-a6d1-656ed3f3420c> <eventfor> <users/malcolm.sparks%40db.com> .

Request a role

<events/b5bed531-a324-4aec-9ace-2785c65a19b7> <time> "2011-12-25T14:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
<events/b5bed531-a324-4aec-9ace-2785c65a19b7> a <RequestRole> .
<events/b5bed531-a324-4aec-9ace-2785c65a19b7> <role> <Administrator> .
<events/b5bed531-a324-4aec-9ace-2785c65a19b7> <eventfor> <users/malcolm.sparks%40db.com> .

Language integrated query

Data can be queried directly from Clojure

(defn get-approved-roles-for-user [user]
  (sparql/select-map
   [(get-combined-model) (config/get-config-model)]
   [:approval a events-ns/RoleApproved]
   [:approval events-ns/time :approval-time]
   [:approval roles/approver :approver]
   [:approver foaf/name :approver-name]
   [:optional [:approver foaf/homepage :approver-homepage]]
   [:approval roles/cause :request]
   [:request roles/requester user]
   [:request events-ns/time :request-time]
   [:request roles/role :role]
   [:role rdfs/label :role-name]))
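To show why vectors of patterns are a natural fit for Clojure, here is a toy matcher over in-memory triples. It is not the `sparql` library used above, just a sketch under these assumptions: keywords are variables, strings are constants, there is no `:optional` support, and the names `match-triple`, `select-maps`, `events` and `requested-roles` are made up.

```clojure
(defn variable? [x] (keyword? x))

(defn match-triple
  "Extends bindings so that pattern matches triple, or returns nil."
  [bindings pattern triple]
  (reduce (fn [b [p t]]
            (cond
              (not (variable? p)) (if (= p t) b (reduced nil))
              (contains? b p)     (if (= (b p) t) b (reduced nil))
              :else               (assoc b p t)))
          bindings
          (map vector pattern triple)))

(defn select-maps
  "Returns every binding map that satisfies all patterns against triples."
  [triples & patterns]
  (reduce (fn [solutions pattern]
            (for [b solutions
                  t triples
                  :let [nb (match-triple b pattern t)]
                  :when nb]
              nb))
          [{}]
          patterns))

;; Hypothetical event data, loosely modelled on the statements above.
(def events
  [["req1" "type" "RequestRole"]
   ["req1" "requester" "users/alice"]
   ["req1" "role" "Administrator"]
   ["req2" "type" "RequestRole"]
   ["req2" "requester" "users/bob"]
   ["req2" "role" "Guest"]])

(defn requested-roles [user]
  (map :role
       (select-maps events
                    [:request "type" "RequestRole"]
                    [:request "requester" user]
                    [:request "role" :role])))
```

Each pattern narrows the set of candidate bindings, so a variable such as `:request` that appears in several patterns acts as a join.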

Deployment

Releasing to production

$ git clone http://github....db.com/.../fandango.git
$ git verify-tag 4.5.0
$ git checkout tags/4.5.0
$ make release

Derive the version from git!

GNU Make incantation …

describe := $(subst -, ,$(shell git describe --tags --long HEAD))
version := $(word 1,$(subst -, ,$(describe)))
release := $(shell expr 1 + $(word 2,$(describe)))
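For readers who don't speak Make, the same derivation can be written in Clojure. This is purely an illustration of what the variables compute; the build itself uses Make, and the function name `parse-describe` and the sample inputs are assumptions. `git describe --tags --long` prints `<tag>-<commits since tag>-g<short sha>`, so the version is the first hyphen-separated field and the release is one more than the second.

```clojure
(require '[clojure.string :as str])

(defn parse-describe
  "Splits `git describe --tags --long` output into version and release.
  Like the Make version, this assumes the tag itself contains no hyphen."
  [describe]
  (let [[tag commits _sha] (str/split (str/trim describe) #"-")]
    {:version tag
     :release (inc (Long/parseLong commits))}))
```

A checkout sitting exactly on the tag, described as `4.5.0-0-g1a2b3c4`, gives version `4.5.0` and release `1`.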

And generate the pom.xml – ie. in Make :-

pom.xml: pom.template.xml
	sed -e "s/@VERSION@/$(version)/g" $< >$@

mvn dependency:copy-dependencies
cp -r src/ dest/

We use RPM but the principle of copying the source and dependency jars over is the same.

Installation

Installation is easy

$ rpm --dbpath /opt/privatedb -Uvh fandango-4.5.0-1-x86_64.rpm

Production bootstrap

$ fandango start

A lot more complex than the developer bootstrap.

  • Init script (from Java Service Wrapper – enhanced with roqet to read environment variables from configuration)
  • Init script generates the wrapper.conf, then calls Java Service Wrapper native executable
  • Native binary spawns JVM with 2 args clojure.main boot.clj
  • boot.clj sets up a classloader which pulls in the dependency jars
  • boot.clj hands off to main.clj, rest is as the developer bootstrap.

But source code is still copied onto the system as is.

Logging

Getting started

Logging is important because it’s what everyone expects to find.

These will get you started :-

(clojure.core/println)
(clojure.pprint/pprint)

However, as your application grows you will eventually need a more sophisticated logging system. We use Log4J and configure it with clj-logging-config.

You’ll need the following packages to do this :-

(use 'clojure.tools.logging)
(use 'clj-logging-config.log4j)

(with-logging-config)

(with-logging-config 
  [:root {:level :debug 
          :out (io/file workdir "job.log")}]
  ...

(with-logging-context)

For using the NDC and MDC of Log4J.

(with-logging-config
  [:root {:pattern "%d [%p] (for Customer %X{customer}) %m%n"}]
   ...

   (with-logging-context {"customer" "John Smith"}
     ...

Reflections

The Good

  • Retain the JVM
  • No class files, yippee!
  • Sliming in! EDD: Eval Driven Development!
  • Separation of value, identity, state: State is a timeline of changing values.
  • Learning time – even our DBA is now comfortable with Clojure.

The Bad

  • People are justifiably afraid of new things
  • Tooling (for those not comfortable with Emacs)
  • Java interop can bite you

The Ugly

  • Stack traces
  • Debugging

Quality versus value

“Value is what you are trying to produce, and quality is only one aspect of it, intermixed with cost, features, and other factors.” — John Carmack, http://altdevblogaday.com/2011/12/24/static-code-analysis/

cf. ‘Agile’ absolutes

  • Always write the tests first
  • Tests should always pass
  • Always fix the build before working on new features
  • Integrate continuously
  • Refactor prior to adding new features
  • Consistent code style

Our experience of Git + Clojure is prompting us to question certain assumptions.

More info

http://blog.malcolmsparks.com

Q & A

Over to you…

Epilogue

Many of the questions related to the RDF portion of my presentation. There were many others too, though I can’t remember them all now.

How big is your team and how did it grow?

We started with 2 developers and grew to 4. Forcing Clojure on developers is unwise. I know that was tried somewhere else and most developers only used the Java interop!

Why do you use RDF for configuration rather than XML or JSON or even Clojure itself?

JSON is certainly more conventional as a configuration format (or XML in the Java world).
There isn’t a strong reason not to use Clojure itself (I had a slide warning of the dangers of Turing-complete input languages, but the point stands nevertheless). I don’t think my answer was very good last night, so here are some advantages of RDF :-

  • Meaning – RDF allows you to make logical, set-based statements about classes of what would otherwise be plain name/value pairs.
  • Metadata – RDF allows you to make statements about statements. You can use metadata to label configuration values, add annotations (in multiple languages if you like), or constrain the values to some valid range or set, or say something about the nature of the property. You can do this in a very limited way with XML (perhaps with attributes) but with JSON there’s nothing built-in or idiomatic.
  • Mergeability – RDF allows you to source statements from a wide variety of sources and merge the models together, whereas there’s nothing built-in or idiomatic in XML or JSON. In tree formats config statements have to nest inside each other in a single hierarchy – designing this hierarchy is a job in itself. Graphs are more flexible since nodes can exist in multiple hierarchies if need be.
  • Inference – in RDF, having some data allows you to infer other data which you would otherwise have to state explicitly. This has the potential to reduce data discrepancies. For example, given a database name, listener host and port you can ‘infer’ a database connection string.
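The mergeability and inference points can be sketched with nothing more than Clojure sets. This is an illustration only, under stated assumptions: the triples, the property names, the host name and the `db://host:port/name` string format are all invented, and a real RDF store would do the merging and inference for you.

```clojure
(require '[clojure.set :as set])

;; Two hypothetical sources describing the same database node.
(def local-file
  #{["db1" "name" "reports"]
    ["db1" "port" "1521"]})

(def central-store
  #{["db1" "host" "dbhost.example.com"]
    ["db1" "port" "1521"]})

(defn merge-models
  "Merging graphs is set union: duplicate statements simply collapse."
  [& models]
  (apply set/union models))

(defn properties
  "Map of predicate to object for one subject."
  [model subject]
  (into {} (for [[s p o] model :when (= s subject)] [p o])))

(defn infer-connection-strings
  "Adds a connectionString statement for each subject with name, host and port."
  [model]
  (into model
        (for [s (set (map first model))
              :let [p (properties model s)
                    db-name (p "name")
                    host (p "host")
                    port (p "port")]
              :when (and db-name host port)]
          [s "connectionString" (format "db://%s:%s/%s" host port db-name)])))
```

Neither source on its own has enough data to produce a connection string; the merged model does, and nobody had to design a hierarchy to make the two sources fit together.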

That said, I’m not really pushing RDF as a config format. We took a gamble on it and it paid off in our case. Other projects are different. JSON is a great format that enables fast and simple data exchange (when you control both ends).

I also suggested that a domain model is more valuable for persistent data than for transient data structures. Object oriented languages encourage you to design the domain model internal to a program. But in my view there is more value in a domain model you can communicate between systems, and keep for longer periods, than in a domain model that you can only use privately (ie. in a single memory address space) and only while your application is running. This is the exact opposite of designing domain models in Java/C# classes and serializing out to a database or JSON/XML files, hence the need to illustrate with a real-world example (in this case, configuration).

What other Clojure frameworks do you use?

  • Compojure/Ring/Hiccup/Clout for web pages.
  • Plugboard for REST, but the intention is to move towards something like compojure-rest.
  • Swank – couldn’t manage without it!

It’s a surprise to me how much we manage to do with just the standard Clojure libraries.

Do you think functionality rises linearly or exponentially with lines of code?

I thought this was a great question because it points to the huge amount of algorithmic re-use that we enjoy in Clojure.
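A small, made-up example of the kind of re-use I mean: one function built from core sequence functions works unchanged on strings, vectors or anything else that can be treated as a sequence. The name `top-n` and the sample data are hypothetical.

```clojure
(defn top-n
  "The n most frequent items in any seqable collection."
  [n coll]
  (->> coll
       frequencies
       (sort-by val >)
       (take n)
       (map key)))

;; Characters of a string and HTTP verbs in a vector, same function.
(def letters (top-n 2 "banana"))
(def verbs   (top-n 1 [:get :post :get :get :put :post]))
```

Nothing in `top-n` knows or cares what the items are, so each new data shape costs no new code.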

Did you have a specific business problem that led you to Clojure?

Honestly, no. In my case it was a growing frustration with large Java systems. But since we’ve been using Clojure in our team there have been a number of business problems that have cropped up that are ideally suited to Clojure. Certainly in my industry (banking) the business is built on mathematical functions and data transformations for which functional languages like Clojure are ideal.

Do you think Clojure will be around in 5 years’ time?

This final question was asked by someone sitting in the front row. I don’t think they would have asked this if they’d seen how many people were in the room! Clojure is building momentum, at least in London, and as I said in my talk I think it’s beyond the point of critical mass now.

But on reflection I think it’s an important question. Why should anyone invest a lot of time in learning something that isn’t going to be around in a few years? However, technology is always about betting on certain horses (VHS or Betamax?) and you can never be 100% certain. LISP is a good bet though: it has survived over 50 years and people keep rediscovering it. So even if Clojure doesn’t survive, I’m confident the knowledge you get from learning it will remain relevant.