Pixelated Noise is a software consultancy and we're always looking for interesting projects to help out with. If you have unmet software development needs, we would be happy to hear from you. |
Recently a colleague was looking for a good way to flatten reitit routes in order to perform some security testing on all the possible URLs of a website using ZAP.
Essentially he needed to go from this kind of data structure:
[["/yo" [["/man" :man] ["/manman" :manman]]] ["/foo" [] [[]] ["/bar" :bar] ["/baz" ["/quux" :quux] [["/qux0" :qux0] [["/qux1" :qux1]]]] ["/ba" ["/zz" ["/bazz" :bazz]] ["/baq" :baq]]]]
To something that looked like this:
{"/yo/man" :man "/yo/manman" :manman "/foo/bar" :bar "/foo/baz/quux" :quux "/foo/baz/qux0" :qux0 "/foo/baz/qux1" :qux1 "/foo/ba/zz/bazz" :bazz "/foo/ba/baq" :baq}
In this article we present a few different approaches that we came up with. We
start off with a couple of "vanilla" Clojure solutions, a recursive one (that
doesn't use recur
so in theory it could blow the stack at great depths), and
an iterative solution that takes an "instruction stream" approach and simulates
its own stack. We then proceed to look into a more straightforward solution
using zippers and finally we combine zippers with the iteration
function that
was introduced in Clojure 1.11.
All functions presented here produce the same output as above so it won't be listed every time. If you'd like to follow along in your editor, you can clone the repo.
So one approach would be to use run-of-the-mill recursive Clojure to achieve this:
(defn flatten-routes-recursive [[path-part & [sec :as rst] :as all]] (cond (keyword? sec) ; form 1 {path-part sec} (string? path-part) ; form 2 (update-keys (reduce merge (map flatten-routes-recursive rst)) (partial str path-part)) :else ; form 3 (reduce merge (map flatten-routes-recursive all))))
It's not a super-easy function to read, so let's break it down a bit. By looking at the input, we observe a grammar describing a tree of nodes, and each node can be have one of 3 different forms:
[<string> <keyword>]
this is the base case where the recursion "bottoms
out", you can't go any deeper. Example: ["/qux1" :qux1]
[<string> <node-0> <node-1> ... <node-n>]
where the string is a prefix to
all the nested routes represented as nodes to the right of the
prefix. Example: ["/zz" ["/bazz" :bazz]]
[<node-0> <node-1> ... <node-n>]
sibling routes. Example: [["/man" :man]
["/manman" :manman]]
So let's destructure everything we might need: [path-part & [sec :as rst] :as all]
path-part
sec
rst
all
nil
value
The basic building block we are going to use is a single-entry map, which is the
first clause in the cond
and it's also where the recursion bottoms out. These
will then be combined together with reduce merge
in the other clauses. The
first clause matches inputs such as ["/bar" :bar]
by testing that sec
is a
keyword (form 1). Using single-entry maps is preferrable to the - possibly more
idiomatic - approach of using 2-element vectors that are then combined with
into {}
because we prefer to have the same type being returned by all
invocations of the recursive function as it makes things more uniform. This also
ensures that the function will work correctly in the case of the routes being
just ["/foo" :bar]
(which is valid in reitit): (flatten-routes-recursive
["/foo" :bar])
will return {"/foo" :bar}
.
Getting to the second clause, we've established sec
is not a keyword, but
path-part
is a string, so we are in form 2 territory. The rst
part of our
desctructuring is made of nested routes which should be flattened and merged
together ((reduce merge (map flatten-routes-recursive rst))
) and then the keys
of the resulting map (which are strings representing route prefixes) should be
further prefixed using path-part
((update-keys ... (partial str path-part))
).
In the third clause we're dealing with form 3 (sibling routes) we just flatten
each element of all
((map flatten-routes-recursive all)
) and we just merge
everything together with reduce merge
.
update-keys
is a function that appeared in Clojure 1.11, so for previous
versions the code instead becomes:
(defn flatten-routes-pre-1:11 [[path-part & [sec :as rst] :as all]] (cond (keyword? sec) {path-part sec} (string? path-part) (->> (reduce merge (map flatten-routes-pre-1:11 rst)) (reduce-kv (fn [m k v] (assoc m (str path-part k) v)) {})) :else (reduce merge (map flatten-routes-pre-1:11 all))))
But, recursive functions build up call stack frames. This is not so terrible, as allocation is normally much faster on the stack than in the heap and in this case the stack grows by O(path-depth), which shouldn’t be large.
Just for fun though, let’s pursue this a bit further. To turn a recursive process to an iterative one, we’ll have to introduce auxiliary variables to hold the state of the computation, effectively introducing our own "stack":
(defn flatten-routes-iter ([x] (flatten-routes-iter {} [] x)) ([routes path [fst & rst :as returns]] (cond (empty? returns) routes ;1 (keyword? fst) (recur (assoc routes (str/join path) fst) path rst) ;2 (string? fst) (recur routes (conj path fst) rst) ;3 (fn? fst) (recur routes (fst path) rst) ;4 :else (let [pop-marker (when (string? (first fst)) [pop])] ;5 (recur routes path (concat fst pop-marker rst))))))
The traversal gradually transforms the tree into an "instruction stream", held
in returns
. We process each instruction (each first element of returns
)
as follows:
routes
map according to the
current path
(2)path
(3)clojure.core/pop
which is used to remove the last element of the path
(which is a vector) (4)concat
the contents of the first element of returns
in
front of the rest of its elements. If the first element's first element is a
string, we also place a pop instruction between the contents of the first
element and the rest (5)returns
: we've run out of instructions, return routes
(1)
The concat
in (5) is the main operation that performs the flattening of the
tree. What we're essentially doing is case (5) is that we take the contents of a
vector (fst
) and and break them free out of the vector and add them as the
first elements of returns
. When we encounter a string in the first position of
fst
, according to reitit convention, we have a new prefix and therefore we
have gone down a level into the tree. So apart from adding fst
's elements in
front of returns
, we also add the pop
function as an element after the
children of that particular branch so that we know where to pop
the path
later.
Let's walk through this with smaller input to make it a bit more manageable:
(def the-routes ["/yo" ["/hey" :hh]]) (flatten-routes-iter the-routes)
Let's print the values right after destructuring, along with the matching case
in cond
:
routes {} path [] returns ["/yo" ["/hey" :hh]] case 3: Initial call with empty auxiliary variables. We encounter a string, add it to path and recur. ------------------------------------------ routes {} path ["/yo"] returns (["/hey" :hh]) case 5: The first element is a vector with a string as its first element. We "flatten" the first element into returns and add a pop marker after the elements (rst is empty). ------------------------------------------ routes {} path ["/yo"] returns ("/hey" :hh #function[clojure.core/pop]) case 3: Add string to path and recur. ------------------------------------------ routes {} path ["/yo" "/hey"] returns (:hh #function[clojure.core/pop]) case 2: Found a route name. assoc path (as a joined string) to the routes, with the route name as the value. ------------------------------------------ routes {"/yo/hey" :hh} path ["/yo" "/hey"] returns (#function[clojure.core/pop]) case 4: Pop marker found. pop the path and recur. ------------------------------------------ routes {"/yo/hey" :hh} path ["/yo"] returns nil case 1: Ran out of returns. Stop recursion and return routes. ------------------------------------------
One thing to note about returns
is that in essence, it mostly functions as a
pointer stack, therefore it's not as wasteful as it may seem. In our terms
(Clojure doesn't really support pointers), it shouldn't be difficult to
imagine an equivalent implementation where :returns
contains get-in
paths
(like [0 2 1]
) into tree nodes.
So, we have an explicit stack. We also have something like an environment, as the path determines the scope of path chunks, much like bindings do for symbols. Cue the meme:
You can insist on a functional approach, but sometimes sprinkling Clojure with a tiny bit of imperative thinking makes things much simpler. But with zippers you can still maintain functional purity while providing you with a paradigm that feels somehow imperative - so you can have your cake and eat it.
Let's have a look of how we would approach our problem with zippers:
(require '[clojure.zip :as z] '[clojure.string :as str]) (defn flatten-routes-zipper [routes] (loop [res {} z (z/vector-zip routes)] (cond (z/end? z) res (keyword? (z/node z)) (recur (assoc res (->> z z/path (map first) (filter string?) str/join) (z/node z)) (z/next z)) :else (recur res (z/next z)))))
We define a standard vector zipper that will allow us to traverse the nested
vector structure. The first clause just returns our accumulator res
. The last
clause advances the zipper to the next position using z/next
and recurs.
The middle clause matches when the value pointed to by the zipper ((z/node z)
)
is a keyword and this is where all the work happens. In this case, we assoc the
value pointed to by the zipper into res
. In order to calculate the path of
string prefixes down to the current position we do the following:
(->> z z/path (map first) (filter string?) str/join)
z/path
returns all the nodes down to the current position, and then we take
the first element of each. Because empty vectors and grouping of routes
([["/man" :man] ["/manman" :manman]]
) are allowed in reitit, we then need to
filter for strings in the path. Finally we concatenate the strings together with
str/join
.
The loop/recur
in the previous example is imperative and its use can sometimes
be a code smell, especially among Clojure beginners who seek to reproduce their
beloved for-loops in an unfamiliar functional landscape. But unfortunately, with
zippers, loop/recur
often becomes necessary. The main problem with
loop/recur
is the micromanagement of "state" that has to be threaded through
each call.
Clojure 1.11 includes the iteration
function that turns out to fit well with
zippers. Borrowing some phrasing from the official JIRA ticket, the iteration
function was designed to "represent iteration e.g. HTTP endpoints that return
continuation tokens". So iteration
helps you cosume multiple pages of data
from an HTTP endpoint when each page is accessible via a continuation token
returned as part of the previous call. But what is a zipper if not a token that
contains information on how to get the next element?
So let's use iteration
to declutter the "state" out of our previous
loop/recur
solution:
(defn flatten-routes-iteration [routes] (->> (iteration identity :initk (z/vector-zip routes) :kf z/next :somef (complement z/end?) :vf #(when (-> % z/node keyword?) [(->> % z/path (map first) (filter string?) str/join) (z/node %)])) (into {})))
step
) which defines how to produce the next value based
on the continuation token has to be identity
because we would like to keep a
reference to the zipper during the whole iteration.:initk
initializes the zipper.:kf
uses z/next
to move to the next location.:somef
decides whether there are more values to be consumed, so it's the
opposite of z/end?
.:vf
produces the actual value based on the zipper, very similarly to how
it's done in flatten-routes-zipper
. The main difference is that in this case
we produce a 2-element vector (or nil
) which is then collected into a map
with into {}
.This version has the advantage of making the various parts of the iteration very explicit and you could easily imagine abstracting out of it a reusable utility function for zipper iteration:
(defn zipper-iteration [zipper vf] (iteration identity :initk zipper :kf z/next :somef (complement z/end?) :vf vf)) (defn flatten-routes-generic [routes] (into {} (zipper-iteration (z/vector-zip routes) #(when (-> % z/node keyword?) [(->> % z/path (map first) (filter string?) str/join) (z/node %)]))))
Clojure's evolution (both in the core and in community libraries) has provided
us with many opportunities of delightful composability which is so rare to find
in other ecosystems. The combinations of zippers with the iteration
function
is another example of that power that comes from composability that often goes
beyond the original intentions of the authors of the libraries.
Once the essential grammar of the initial input has been worked out, the initial recursive solution is the most "economical", in the sense that it uses core Clojure functions to get to the essence of what we are trying to achieve. The iterative solution borrows a leaf out of Forth's book, essentially being a tiny virtual machine that walks the input tree (program) while maintaining one stack for the path components (environment) and one for return addresses (control flow). Finally zippers possibly produce the most comprehensible code, but that also depends on who's doing the reading!
The iterative one is quite an involved solution, but notice that it shares an
interesting similarity with zippers: in the case of zippers, we also have a
complex data structure that allows us to navigate the input, albeit in all
directions, while our iterative solution allows us to move only forwards. The
similarity extends to having to attach functions to the navigation data
structure: we saw the use of the pop
function as a marker in our iterative
function, and zippers attach functions to the navigation structure via metadata.
Just a quick note on workflow: during the development of the code for the functions presented here, we kept all the functions in the same namespace, and at the bottom of the file we had this expression:
(assert (= (flatten-routes-recursive the-routes) (flatten-routes-pre-1:11 the-routes) (flatten-routes-iter the-routes) (flatten-routes-zipper the-routes) (flatten-routes-iteration the-routes) (flatten-routes-generic the-routes)))
In this way we were able to get quick feedback about the equivalence of all the implementations (at least for the specific input) every time we reloaded the namespace. Of course there are fancier ways to achieve the same with unit test runners and such, but for a small example project we thought it was a neat lightweight workflow trick.