Elasticsearch is a search server based on Apache Lucene. For a developer it is easy to use, has an expressive query DSL, and everything is based on JSON serialization. Often, though, I find myself needing to adapt queries frequently and non-trivially, say in a demonstration in front of customers or product owners.

The questions are mostly similar: “what if I also filter for X”, “how does the ranking change when I add a freshness function”, “do I get a better result if I boost document types Y” and so on. While these are easy to add, two things bother me. First, I need to come up with the queries during a presentation, which adds pauses to the meetings. By the time I have finished the query, the discussion has moved on. Second, I don’t want to be the enabler. If they can find out what they want without me, it also means a faster feedback loop for them. In brainstorming sessions it then becomes easy to focus on arguments and skip the sometimes lengthy pauses for working out a query. A win-win for everyone, it seems.

For this I have started working on meta-DSLs for Elasticsearch projects. The idea is simple: given a base query, I want to be able to quickly alter or enhance it using simple functions that are aligned with the mapping and index structure. Given my current addiction to Clojure, this is what I’ve come up with:

(query
 (freshness "1h")
 (tags ["politics" "sports"])
 (prefer-category {"politics" 2,
                   "sports" 0.5}))

Ok, it’s not yet a graphical interface, but it is a start. And it’s intuitive. After demonstrating this to customers a few times they like it and request more features. Their own feedback loop has shortened considerably. And the best of it is that I am out of the loop and can focus on adding features.

In this example the domain will be news articles. They have a title, tags, a published time and categories. Something like this:

{
    :title "The news"
    :tags ["obama" "kerry" "merkel"]
    :timestamp "2015-07-28T10:00:00Z"
    :category ["politics"]
}

This post is my story of how I implemented this. Publishing this as a library might be an idea, but in the end all of this is tied to an exact mapping, index structure and use case. If there is interest, though, starting something similar as a library could be worthwhile.

DSLs in Clojure

Creating a domain-specific language is pretty straightforward in Clojure, assuming you expose Clojure or Lisp syntax to the user. Using the Clojure reader and eval, parsing a DSL into Clojure code is simple, and defining the DSL itself then only involves implementing the functions.

In the next part I focus on the DSL implementation itself and the functions for manipulating the query. The last section covers parsing and evaluating the DSL into a real Elasticsearch query.

The DSL

For the custom DSL I started with a base query structure upon which all other functions build. It has four parts: query, scoring, filtering and aggregations:

(def default-query
 {:query
  {:filtered
   {:query {:function_score {:query {}
                             :functions []}}
    :filter {:bool {:must []
                    :must_not []}}}}
  :aggregations {}})

All functions that work with the query take it as their first argument. This simplifies the code later on, as I can use the thread-first macro to chain the individual function calls.
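To illustrate why the query-first convention pays off, here is a minimal sketch of the thread-first macro. The helpers `with-size` and `with-from` are hypothetical and not part of the DSL; they only stand in for any function that takes the query map as its first argument.

```clojure
;; Hypothetical helpers, used only to illustrate the query-first convention.
(defn with-size [query n] (assoc query :size n))
(defn with-from [query n] (assoc query :from n))

;; -> inserts the result of each step as the first argument of the next form.
(def paged
  (-> {:query {:match_all {}}}
      (with-size 10)
      (with-from 20)))

;; The threaded form is equivalent to the nested call.
(def nested
  (with-from (with-size {:query {:match_all {}}} 10) 20))

(= paged nested)
;; => true
```

Because every step returns a new map, the steps can be reordered or dropped without touching the others.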

Defining functions to set the query and to add aggregations is straightforward and does not even need a helper function:

(defn- set-query
 "Given a valid ES query `q` add this to the generated query and return the
  new version."
 [query q]
 (assoc-in query [:query :filtered :query :function_score :query] q))

(defn- add-aggregation
 "Add a new aggregation to the query"
 [query agg]
 ;; :aggregations sits at the top level of default-query, next to :query
 (update-in query [:aggregations] merge agg))

To work with this data structure, a few helper functions come in handy when developing the individual DSL functions. The first one helps when manipulating vectors in a nested map. Basically, each scoring function or filter needs to be added like this:

(defn- append-in-nested-list
 "Given a map, append the elements of `elms` to the vector nested under `ks`."
 [q ks elms]
 (assoc-in q                        ; the query
           ks                       ; the path of keys into the query
           (apply conj              ; append
                  (get-in q ks)     ; to the existing vector
                  elms)))           ; each of the new elements
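As a quick check of the helper, the sketch below applies it to a small map. The `example` map is made up for illustration, and the helper is repeated so that the snippet runs on its own.

```clojure
;; Same helper as above, repeated so this snippet is self-contained.
(defn- append-in-nested-list [q ks elms]
  (assoc-in q ks (apply conj (get-in q ks) elms)))

;; A made-up map with one existing must filter.
(def example {:filter {:bool {:must [{:term {:a 1}}]}}})

(def extended
  (append-in-nested-list example [:filter :bool :must] [{:term {:b 2}}]))
;; extended is {:filter {:bool {:must [{:term {:a 1}} {:term {:b 2}}]}}}

;; Caveat: the path has to point at an existing vector. If get-in returns nil,
;; conj builds a list instead and the new elements end up in reverse order.
(def on-missing-path
  (append-in-nested-list {} [:must] [1 2]))
;; on-missing-path is {:must (2 1)}
```

This is why the base query initialises every list it expects to grow with an empty vector.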

With this basic function, adding more expressive functions that manipulate specific parts of the query is easy:

(defn- add-function-score-function
 "Add a function score function to the query"
 [query fs-function]
 (append-in-nested-list query
                        [:query :filtered :query :function_score :functions]
                        [fs-function]))

(defn- add-must-filter
 "Add a must filter to the query"
 [query must-filter]
 (append-in-nested-list query
                        [:query :filtered :filter :bool :must]
                        [must-filter]))

(defn- add-must-not-filter
 "Add a must_not filter to the query"
 [query must-not-filter]
 (append-in-nested-list query
                        [:query :filtered :filter :bool :must_not]
                        [must-not-filter]))

DSL functions

The individual functions now compose the DSL. A DSL expression such as (q "merkel") maps onto the following Clojure function:

(defn q
  "Simple query"
  [query user-query]
  (set-query query
             {:query_string {:query user-query
                             :default_operator "AND"}}))

Filtering for tags in our dataset ((tags ["merkel"])) is equally trivial:

(defn tags
  "Filter for a list of tags"
  [query tags]
  (add-must-filter query {:terms {:tags tags}}))

Freshness seems more complicated but in the end I can simply add a function score function using an exponential decay. With this the user can even change parameters and see the effects:

(defn freshness
  "Add freshness preferences to the query. `hours` is the scale of the
   exponential decay, given as an Elasticsearch time string such as \"1h\"."
  [query hours]
  (add-function-score-function query
                               {:exp {:publishTime {:decay 0.9
                                                    :scale hours}}}))

Preferring some categories over others is another function score function. Basically, I add a boost (weight) to all documents matching a certain filter:

(defn prefer-category
  "Prefer categories over all other categories."
  [query category-preferences]
  (let [nested-keys [:query :filtered :query :function_score :functions]
        functions (map (fn[x] {:filter {:term {:category (first x)}}
                               :weight (second x)})
                       (seq category-preferences))
        existing (get-in query nested-keys)]
    (assoc-in query nested-keys (apply conj existing functions))))
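The core of this function is the step that turns the preference map into a list of weighted filters. The sketch below shows that step in isolation; it uses destructuring instead of first and second, which is only a stylistic variation, and the name `preferences->functions` is made up for this example.

```clojure
;; Hypothetical name; isolates the map-to-functions step of prefer-category.
(defn preferences->functions [category-preferences]
  ;; Each map entry destructures into [category weight].
  (mapv (fn [[category weight]]
          {:filter {:term {:category category}}
           :weight weight})
        category-preferences))

(preferences->functions {"politics" 2, "sports" 0.5})
;; => [{:filter {:term {:category "politics"}}, :weight 2}
;;     {:filter {:term {:category "sports"}}, :weight 0.5}]
```

Small map literals like this one keep their insertion order, which is why the functions show up in the order they were written.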

Aggregations help in understanding the data. The classic example here is counting the documents in the result set per category. In ES this is a simple terms aggregation:

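(defn aggregate-categories
 "Aggregate the result by categories."
 [query]
 ;; Elasticsearch expects every aggregation to be named; :categories is the name chosen here
 (add-aggregation query {:categories {:terms {:field :category}}}))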

To tie everything together I need to be able to wrap all functions into one expression. For this I create a new macro called query:

(defmacro query [& body]
`(-> default-query
     ~@body))
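To make the effect of the macro concrete, here is a minimal sketch with simplified stand-ins: the `default-query` and `q` below are deliberately reduced versions for this example, not the definitions from the post. The macro only splices the body forms into a thread-first call that starts from the base query.

```clojure
;; Simplified stand-ins so the snippet runs on its own.
(def default-query {:query {}})

(defn q [query user-query]
  (assoc-in query [:query :query_string :query] user-query))

(defmacro query [& body]
  `(-> default-query
       ~@body))

;; (query (q "test")) expands to (-> default-query (q "test")),
;; which in turn is (q default-query "test").
(def via-macro (query (q "test")))
(def by-hand   (q default-query "test"))

(= via-macro by-hand)
;; => true
```

An empty body is also valid: (query) simply returns the base query unchanged.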

Using this macro a query can now be defined like this:

(def simple-query (query
                   (q "test")))

Parsing the DSL

Doing this in Clojure is nice and easy for me, but I don't want the PO to have to contact me about getting into the REPL. So in the final step I need a function that converts a string to Clojure code. First I parse the string using read-string, then I eval the resulting code. For this to work as expected I need to bind the special var *ns* to the namespace of my DSL functions above, which I look up with the-ns.

(ns demo.dsl
 ;; the-ns throws unless the namespace has already been loaded
 (:require [demo.dsl-functions]))

(defn dsl
 "Compile the DSL string into code"
 [dsl-string]
 (binding [*ns* (the-ns 'demo.dsl-functions)
           *read-eval* false]
  (eval (read-string dsl-string))))

The binding form binds the special var *ns* to the namespace containing my DSL functions, so the evaluated code has access to all functions declared there. I also bind *read-eval* to false, which disables the reader's eval macro (#=) inside the string; the explicit eval call afterwards still runs whatever form was read. read-string converts the string into Clojure data and eval executes it. In this case it simply returns the final Elasticsearch query.
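The following sketch shows both bindings at work in a single namespace. It rests on some assumptions: `default-query` and `q` are simplified stand-ins, and the namespace is captured with a plain def (`dsl-ns`) instead of the-ns so that the snippet runs without a second file. The helper `rejects-reader-eval?` is made up for this example.

```clojure
;; Simplified stand-ins for the DSL functions.
(def default-query {:query {}})

(defn q [query user-query]
  (assoc-in query [:query :query_string :query] user-query))

(defmacro query [& body]
  `(-> default-query ~@body))

;; Assumption: capture the namespace holding the definitions above.
(def dsl-ns *ns*)

(defn dsl [dsl-string]
  (binding [*ns* dsl-ns          ; symbols in the string resolve in this namespace
            *read-eval* false]   ; the reader rejects #=(...) forms
    (eval (read-string dsl-string))))

(def result (dsl "(query (q \"merkel\"))"))
;; result is {:query {:query_string {:query "merkel"}}}

;; With *read-eval* false the reader throws on #=, but plain forms still read.
(defn rejects-reader-eval? [s]
  (try
    (binding [*read-eval* false]
      (read-string s))
    false
    (catch Exception _ true)))

(rejects-reader-eval? "#=(+ 1 2)") ; => true
(rejects-reader-eval? "(+ 1 2)")   ; => false
```

Note that *read-eval* only guards the reading step. The form handed to eval can still call any function that resolves from the bound namespace, so the input should come from trusted users.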

Result

In essence this allows me to have a web frontend where a user can input the query from the beginning

(query
 (freshness "1h")
 (tags ["politics" "sports"])
 (prefer-category {"politics" 2,
                   "sports" 0.5}))

and get back the equivalent Elasticsearch query:

{:query
 {:filtered
  {:query
   {:function_score
    {:query {},
     :functions
     [{:exp {:publishTime {:decay 0.9, :scale "1h"}}}
      {:filter {:term {:category "politics"}}, :weight 2}
      {:filter {:term {:category "sports"}}, :weight 0.5}]}},
   :filter
   {:bool
    {:must [{:terms {:tags ["politics" "sports"]}}], :must_not []}}}},
 :aggregations {}}

which I can execute in the backend and display the results. With all the domain functions in place I can then keep improving the DSL or the frontend and enable the PO to experiment a lot more easily without my direct involvement, at least in the parts that I am not really interested in.