Pixelated Noise is a software consultancy and we're always looking for interesting projects to help out with. If you have unmet software development needs, we would be happy to hear from you. |
This post is a technical tour of Hike, a new library for operating on
spreadsheet grids and identifying cells in them. To this end, hike
offers
an open set of operations, supports undoing and redoing them, and provides a
durable way to identify cells even after the grid is modified via row/column
additions or removals. It can answer queries like "what's the ID of the cell
here?", "where's the cell with this ID?" and "which cells lie between those
with this and that ID?". Moreover, hike
can work with any ID
representation, possibly required by other application components. If you're
interested in its internals and how they came to be, read on!
Hike
started out as part of a spreadsheet application of ours, which is
still under development. Hike
is just a component of the spreadsheet
application. The core feature of spreadsheets is that they let users use
formulas to compute a cell's value based on values of other cells. Beside
individual cells, users can also use slices (a range of neighboring cells)
as inputs, which behave intuitively when applying grid operations, such as
row or column insertions, removals, and transpositions. By intuitively, we
mean that cells preserve their values and interdependencies as these
operations are applied. This implies that values follow cells as we move
them around, references to the moved cells are appropriately updated and
every slice contains exactly the cells between its start and end, wherever
these end up after a change. For example:
We could do all this, if we could refer to each cell via an immutable ID
instead of its coordinates, which change as operations are
applied. Additionally, we would need to translate coordinates to IDs (e.g.,
when storing user-inserted values) and vice versa (e.g., when rendering
automatically recomputed values). Enter hike
.
To ease into requirements and the technical solution later we'll simplify the setting. Since operations such as inserting or removing rows/columns affect a single dimension of the grid, we'll reduce the grid to a single dimension (row) for the time being (we'll extend our solution to more dimensions later, of course). Still, the user should be able to not only insert/modify data in cells, but also define computational dependencies among them (both individual and slices), such that modification of data triggers appropriate updates in dependent cells. Furthermore, the user can also insert, remove and transpose cells, as well as undo/redo such operations.
The basic requirements we need to cover are as follows:
nil
in such case.Dynamic slice contents. Cell slices in spreadsheets are used for referring to a range of cells collectively when defining the derived value of another cell. A slice contains all cells from start up to the end (inclusive). If the end cell lies before the start cell, the slice is considered empty. The following examples show the effects of the supported operations, where numbers inside cells indicate their position. Green and red represent inserted and to-be-removed cells, respectively, while magenta outlines the transformation of existing slices before and after the operation. Other colors, such as cyan and orange show where the original values of the spreadsheet end up after the operation.
Insertion The inserted cells are placed between index
(inclusive)
and index
+ count
(exclusive). Therefore, cells with position
larger than or equal to index
are moved count
positions to the
right. Also, if the insertion point lies inside a slice, this slice
will contain the inserted cells.
{:action :insert :index 5 :count 3} |
slice is pushed to the right after cells are inserted before it |
{:action :insert :index 6 :count 3} |
slice grows after cells are inserted inside its range |
{:action :insert :index 7 :count 3} |
slice grows after cells are inserted inside its range |
{:action :insert :index 8 :count 3} |
slice unaffected after cells are inserted after it |
Removal The cells between index
(inclusive) and index
+ count
(exclusive) are removed. Cells with
position larger than or equal to index
+ count
are moved count
position to the left. Removed cells are of course removed from any
slice that contained them. If a slice's start/end is removed, the
immediately next/previous visible cell (respectively) is from then on
considered to be its replacement. In any case, the slice "contracts",
to include only its remaining contents and no surrounding cells, by an
inward bypass of any removed boundaries.
{:action :remove :index 4 :count 3} |
slice is shortened at its start |
{:action :remove :index 6 :count 3} |
slice is shortened in the middle |
{:action :remove :index 8 :count 3} |
slice is shortened at its end |
{:action :remove :index 5 :count 5} |
the whole slice is removed |
Transposition Cells from index
(inclusive) up to index
+ count
(exclusive) are transposed
offset
positions to the left/right for negative/positive offsets,
respectively. This operation is useful for reordering rows/columns to
a different place in the spreadsheet. This operation may reorder a
slice's boundaries and, as mentioned earlier, if the end comes before
the start, the slice is considered empty (see the second example
below). Observe that there are always two cell groups involved, one
moving left and one right, hence there are two equivalent ways to
describe any transposition.
{:action :transpose :index 5 :count 2 :offset 1} (moving right) |
{:action :transpose :index 7 :count 1 :offset -2} (moving left) |
{:action :transpose :index 5 :count 1 :offset 3} (moving right) |
{:action :transpose :index 6 :count 3 :offset -1} (moving left) |
As a remark, transposition is its own inverse, in the sense that a
second transposition (bringing the transposed cells back to their
original positions) can always nullify the effect of the first one and
return us to a grid that is indistinguishable from the one we started
with. For the two examples above, they are {:action :transpose :index 6
:count 2 :offset -1}
and {:action :transpose :index 8 :count 1 :offset
-3}
, respectively. In the same way, removal is the inverse of
insertion, since we can remove any inserted cells (e.g., with {:action
:remove :index 3 :count 4}
after a {:action :insert :index 3 :count
4}
). Be careful though, removal has no inverse operation! We may be
able to insert cells where we removed some, but we have no way of
bringing back their values, other than undoing the removal itself.
When we first faced with these requirements, our initial approach was to assign identity to every cell that needed one. When the user first inserts data (static or a formula) into a cell, we create and store a record with the data, current cell position, and a fresh UUID. Then, for the construction of the graph that describes the interdependencies between cells, we used references that utilised the UUIDs (because they are durable) instead of the cell positions (because they are mutable).
From then on, we update the data and position every time they change due to
an operation. As explained earlier, application events will refer to cells
by both ID (e.g., to render recomputed values) and position (e.g., to
handle updates from the user), we'll therefore need two indexes over it –
one by ID and one by cell position – if we are to access them fast in both
cases. The position index must be updated after every operation, undo, and
redo. Recall that for insertions and removals every cell past the index
is affected, so, as a result, this process will become slow enough to be
user-perceptible as the number of records grows.
Before complacently engaging in arduous bookkeeping of shared mutable state, and in the interest of deriving IDs instead of assigning them explicitly, let's ponder: what's unique about each cell? If we think of cells as particles, the grid as space, and operations as time-defining events, each particle follows a world line in spacetime. These world lines do not intersect: no two particles occupy the same position at the same time. Because of this, we can identify a world line just by specifying one of its points via spatial and temporal coordinates.
Bringing the analogy back to the spreadsheet grid, we'll call the world line a cell path: it's the cell's trajectory through a sequence of grid layouts created by successive operations. A cell path has two endpoints:
The coordinates of any point (including origin and head) is a tuple of
layout index (time) and grid position (space). For example, a cell with
origin [5 7]
was created by the fifth operation (creating the fifth
layout) at position 7, whereas one with [0 8]
was at position 8 in the
original, initial layout (before any operation). Remember we are still
operating in a single dimension, so the grid position is defined using a
single integer.
To take advantage of the origin's uniqueness and immutability for our purposes, we need to implement two bidirectional mappings:
Origin ↔ Head The mapping between the two ends of a cell path is useful because the head's spatial coordinate is the cell's current position. To find one given the other, we need to be able to trace cell paths, back and forth. But how do we go about it? The key insight is that cell paths are defined by operations, which in turn have a highly regular effect on them. For instance, an insertion of i cells at position p:
Such transitions from a layout to the next can be aptly represented by unidirectional transformers, functions which, given the position of a cell before the operation, return its position after it. Unsurprisingly, it's easy to construct such transformers for the opposite direction, too. By composing such transformers appropriately, it is possible to traverse a cell's history over multiple edits in arbitrary ways.
To facilitate the composition of transformers we need to organize them into a suitable data structure. First, we can pair an operation's transformers (two, one for each direction) into a bidirectional transformation, which completely encapsulates the operation's effect on cell paths. Finally, drawing from (deprecated) OpenGL matrix stacks, we can store transformations in a stack, as each one operates on the layout resulting from those below it. Thus, to apply an operation, all we have to do is push its transformation onto the stack, and to undo it, to simply pop it off.2 The resulting transformation stack looks like this:
In summary, we'll identify cells by their origin. Operation history will be represented by a stack of procedural transformations, which describe cell paths between layouts. To find a cell's ID based on its current position, we descend through the transformation stack to trace its path to the origin, which we then encode into an ID. Similarly, to find the current position of a cell by ID, we decode the ID to recover its origin, and then ascend through the transformation stack to follow the path to its head. Let's see how the implementation of this plan plays out!
As demonstrated earlier, operations transform a grid layout to a new one, as dictated by their semantics. In this section, we'll visualize and study the effect of operations (insertions, removals and transpositions) on cell paths. Afterwards, we'll construct unidirectional transformers and build bidirectional transformations out of them.
Let's begin by specifying the data description of operations, informally introduced previously:
(s/def ::position integer?) (s/def ::nat (s/and integer? (complement neg?))) (s/def ::action keyword?) (s/def ::index ::position) (s/def ::count ::nat) (s/def ::offset integer?) (defmulti ^:private operation :action) (defmethod operation :insert [_] (s/keys :req-un [::action ::index ::count])) (defmethod operation :remove [_] (s/keys :req-un [::action ::index ::count])) (defmethod operation :transpose [_] (s/keys :req-un [::action ::index ::count ::offset])) (s/def ::operation (s/multi-spec operation :action))
Now, to get a feel of unidirectional transformers, let's examine cell insertion, considering the insertion of a single cell at position 2. We'll draw lines connecting the same cells in the two layouts (numbers represent positions in an 1D grid):3
Observe that the cell paths split at the insertion point on ascent, to make up space for the inserted cell. On the other hand, the path of the inserted cell faces a dead end on descent towards layout-0, simply because it does not exist in that layout.
Similarly, for the removal of the cell at position 2:
The picture is the symmetrical opposite of the previous one: there is a dead-end on ascent and a split on descent.
Now with this intuition, we can define a few higher-order functions, which accept an operation and return a unidirectional transformer. This transformer is itself a function that accepts the position of a cell in the layout above or below (doesn't matter, operations are symmetrical!) and returns the new (or old) position in a different layout.
Given that index
is the point on which the operation takes place and
count
how many cells are inserted or removed, the implementation of
split
and dead-end
is quite simple:
(defn- split [{:keys [index count]}] (fn [pos & _] (if (< pos index) pos (+ pos count)))) (s/def ::bypass (s/nilable #{:min :max})) (defn- dead-end [{:keys [index count]}] ;; `pos` passes if outside the non-inclusive (`min-pass`, `max-pass`) ;; interval, otherwise returns `nil`. (let [min-pass (dec index) max-pass (+ index count)] ;; overcome the dead-end by following the specified `bypass` direction (fn [pos & [bypass]] (cond (<= pos min-pass) pos (< pos max-pass) (get {:min min-pass :max index} bypass) :else (- pos count)))))
Note that the function returned by dead-end
accepts a second, optional
argument named bypass
, specifying a direction towards which one can
overcome the dead-end (prevent it from returning nil
), and is used to find
the "immediately next/previous visible cell" when determining which cells
belong to a given slice. For signature compatibility, the function returned
by split
also accepts (and ignores) extra arguments.
Finally, consider transposing the cell at position 2 one position to the right:
Here, the paths cross in both directions. It is important to note in this case that, even when moving multiple adjacent cells, their paths are again crossing those of a "complementary" group of cells. For example, if we transpose 4 cells (at 2-5) 3 positions to the right, the next 3 cells (at 6-8) will be transposed 4 positions to the left. In code:
(defn- cross [{:keys [index count offset]}] (s/assert ::nat offset) ;; `pos` passes if outside the non-inclusive (`min-pass`, `max-pass`) ;; interval, `cross-index` marks the start of the complementary cell group. (let [min-pass (dec index) cross-index (+ index count) max-pass (+ cross-index offset)] (fn [pos & _] (cond (or (<= pos min-pass) (<= max-pass pos)) pos (<= index pos (dec cross-index)) (+ pos offset) :else (- pos count)))))
See the (s/assert ::nat offset)
bit? It's actually there to simplify the
implementation by ensuring that the transposition is to the right. This
incurs no loss of generality, due to the aforementioned symmetry between the
two complementary groups of cells crossing paths as a result of a
transposition. This means that we can translate every transpose-left to an
equivalent transpose-right, which we'll take advantage when constructing
transformers in the next paragraph.4
With these unidirectional transformers at our disposal, we can start building the bidirectional transformations that lie between layouts as pictured in the transformation stack diagram. Beside the transformers themselves, a transformation also contains a status flag (allowing us to selectively disable/enable them), as well as an optional data description of the operation. The code serves nicely as a section summary:
(s/def ::ascend fn?) (s/def ::descend fn?) (s/def ::transformers (s/keys :req-un [::ascend ::descend])) (s/def ::active boolean?) (s/def ::transformation (s/merge ::transformers (s/keys :req-un [::active] :opt-un [::operation]))) (defmulti make-transformers :action) (s/fdef make-transformers :args (s/cat :operation ::operation) :ret ::transformers) (defmethod make-transformers :insert [op] {:descend (dead-end op) :ascend (split op)}) (defmethod make-transformers :remove [op] {:descend (split op) :ascend (dead-end op)}) (defmethod make-transformers :transpose [{:keys [index count offset] :as op}] ;; normalize `transpose` (ensuring a positive `offset`) to enable swapping ;; of `count` with `offset` for the `descend` transformation (let [{:keys [count offset] :as op} (if-not (neg? offset) op {:index (+ index offset) :count (- offset) :offset count})] {:descend (cross (merge op {:count offset :offset count})) :ascend (cross op)})) (defn- make-transformation [op] (-> op make-transformers (assoc :active true :operation op))) (s/fdef make-transformation :args (s/cat :operation ::operation) :ret ::transformation)
The methods of make-transformers
merely express in code what we saw in the
corresponding cell path diagrams. The only noteworthy part is the input
normalization in the :transpose
method (translation of transpose-left to
equivalent transpose-right operations).
Finally, the make-transformers
multimethod also serves as an extension
point for implementing other operations. New implementations should return a
::transformers
map, containing :ascend
and :descend
transformers with
conforming signatures (accept (s/cat :pos ::position :bypass (s/?
::bypass))
and return (s/nilable ::position)
).
This section will present the construction and usage of the transformation stack by functions that thread through the transformations. It will also provide a case study demonstrating the inner workings and capabilities (e.g., selective undo/redo) of our system.
The transformation stack implementation is very simple:
(s/def ::transformation-stack (s/coll-of ::transformation)) (defn- make-transformation-stack [] []) (defn- push [tf-stack op] (conj tf-stack (make-transformation op))) (defn- disable [tf-stack tf-idx] (assoc-in tf-stack [tf-idx :active] false)) (defn- enable [tf-stack tf-idx] (assoc-in tf-stack [tf-idx :active] true))
The stack is represented by a vector of transformations constructed by
make-transformation
and pushed in the order they were applied. This way,
the index of each transformation coincides with the index of the layout on
which it was applied (base-layout
): layout-0 is the base-layout
of the
first transformation, layout-1 of the second, and so on. Finally, a
transformation's status flag can be set to enable
or disable
it, as a
basis for selective undo/redo.
We'll now delve into the functions that traverse the transformation stack
to trace the cell origins (descend
) and heads (ascend
). Remember that
cell coordinates consist of both the position (space), and the layout index
(time):
(s/def ::layout ::nat) (s/def ::coordinate (s/tuple ::layout ::position))
We can now refine our preliminary plan for computing the mapping between visible cell positions (heads, essentially) and origins:
descend
) We'll construct the cell head as a vector
of the topmost layout index and the cell position. We will then descend
from the top through active transformations until we reach a dead-end or
the bottom, returning the found origin.ascend
) We'll enter the stack at the layout
specified by the cell's origin and ascend through active
transformations. If we reach a dead-end before the top, the cell has been
removed, and we will therefore return nil
. If we reach the top, we will
return the position of the found head.
Let's see the implementation of descend
:
(defn- descend [tf-stack pos] (reduce (fn [[layout pos] {:keys [active descend]}] (if-not active ;; `dec` used for tracking the layout regardless of status [(dec layout) pos] (if-let [to-pos (descend pos)] [(dec layout) to-pos] (reduced [layout pos])))) [(count tf-stack) pos] (rseq tf-stack))) (s/fdef descend :args (s/cat :tf-stack ::transformation-stack :pos ::position) :ret ::coordinate)
Indeed, the code does pretty much what's in the plan above.
On the contrary, ascend
is more involved:
(defn- ascend ([tf-stack coordinate] (ascend tf-stack coordinate nil)) ([tf-stack [layout pos :as coordinate] bypass] (letfn [(active? [layout] (or (zero? layout) (:active (get tf-stack (dec layout))))) (to-active [[layout pos :as coordinate]] (transduce (take-while (complement :active)) (completing (fn [[layout pos] {:keys [descend]}] [(dec layout) (descend pos bypass)])) coordinate (rseq (subvec tf-stack 0 layout)))) (to-visible [[layout pos]] (transduce (filter :active) (completing (fn [pos {:keys [active ascend]}] (or (ascend pos bypass) (reduced nil)))) pos (subvec tf-stack layout)))] (cond (active? layout) (to-visible coordinate) bypass (-> coordinate to-active to-visible))))) (s/fdef ascend :args (s/cat :tf-stack ::transformation-stack :coordinate ::coordinate :bypass (s/? ::bypass)) :ret (s/nilable ::position))
Most of the deviation from the description has to do with a few extra
complications that affect ascend
but not descend
:
descend
always returns an origin
coordinate, but ascend
may return nil
.ascend
also accepts an optional
third argument (bypass
), specifying the direction to follow in order to
overcome dead-ends (:max
for slice start and :min
for slice end) as
we thread through the stack.descend
the entry point for the stack traversal
always corresponds to an active (the user-visible, topmost) layout, this
does not hold for ascend
, because the origin may lie in a now inactive
layout. This can occur if we insert some cells, define some dependencies
on them and then undo the insertion.
Having considered these points, the body of ascend
becomes easier to
understand. If the origin belongs to an active layout, we follow the cell's
path upwards through the transformation stack towards the user-visible grid
(to-visible
). Otherwise, and only when given a bypass
direction
(meaning we're resolving a slice boundary), we have to first descend
through the stack to find the nearest cell in the first active layout we
encounter (to-active
). Afterwards, we ascend to find that cell's (or
nearest one's) position in the user-visible layout (using to-visible
, as
in the previous case – note that both to-active
and to-visible
inherit
the initially supplied bypass
direction, if any).
The following diagram illustrates such a case. Enabled transformations and
active layouts are drawn with gold, while disabled
and inactive ones with gray. This means for example
that transformation-8
at the top of the stack has been disabled,
therefore the user sees layout-8
. The magenta arrow
traces ascend
's traversal and ellipses mark transformation application
points along the way. The input, a slice boundary, originates from
layout-5
(inactive):
Slice boundaries with origin in disabled layouts are resolved this way to both ensure that the slice contains no cells originating from inactive layouts (such as the slice boundary itself). The downward traversal reverses the operations undone by the user after defining the slice and before performing the next still-active operation higher up in the stack.
Let's see how this system works and explore its capabilities by going through an example together. Suppose that we first insert 4 cells at index 2, and then remove 3 cells at index 4 (the removal intentionally includes cells present from the start, as well as cells inserted by the previous operation). We can easily construct the corresponding transformation stack:
(def our-tfs (-> (make-transformation-stack) (push {:action :insert :index 2 :count 4}) (push {:action :remove :index 4 :count 3})))
Now, if we query for origins of the first 10 cells, we get:
(map (partial descend our-tfs) (range 10))
([0 0] [0 1] [1 2] [1 3] [0 3] [0 4] [0 5] [0 6] [0 7] [0 8])
The first element of each pair represents the layout index and the second element represents the position of each cell within the layout.
A diagram of the transformation stack, following the same visual language built up until now, accompanies the result – we'll keep presenting the stack this way throughout our case study. Read this diagram from bottom to top as before. To reiterate, green and red represent inserted and to-be-removed cells, respectively, while magenta outlines the transformation of existing slices before and after the operation. Other colors, such as cyan and orange show where the original values of the spreadsheet end up after the operation. Finally, enabled transformations are drawn with gold, while disabled ones with gray (inactive layouts are grayed out, too).
Also recall that coordinates are tuples of the form [layout position]
. We
can now interpret the above result: only 2 of the 4 inserted cells ([1 2]
and [1 3]
) survived the removal (the other 2 cells created at layout 1,
[1 4]
and [1 5]
, were removed).
Indeed, if we undo the removal we are able to see all 4 inserted cells:
(map (partial descend (disable our-tfs 1)) (range 10))
([0 0] [0 1] [1 2] [1 3] [1 4] [1 5] [0 2] [0 3] [0 4] [0 5])
What happens if we selectively undo the insertion? Let's see:
(map (partial descend (disable our-tfs 0)) (range 10))
([0 0] [0 1] [0 2] [0 3] [0 7] [0 8] [0 9] [0 10] [0 11] [0 12])
Since no cells were ever inserted, all three removed cells originate from
the initial layout ([0 4]
, [0 5]
and [0 6]
).
Finally, to see the bypass
direction in action, suppose we define a slice
in the intermediate layout (after the insertion and before the removal),
between cells 2 and 6 (inclusive):5
(def our-slice (map (partial descend (pop our-tfs)) [2 6]))
Therefore, the origins of the slice boundaries are:6
our-slice
([1 2] [0 2])
We now query for their positions in the layout after the removal:
(map (partial ascend our-tfs) our-slice)
(2 nil)
Bummer. That nil
tells us that the cell corresponding to the slice end is
not currently visible, which means that its path reached a dead-end on its
ascent through the transformation stack. With the appropriate bypass
directions though (:max
and :min
for the slice's start and end,
respectively), we get:
(map (partial ascend our-tfs) our-slice [:max :min])
(2 3)
which really are the correct slice boundaries. Observe for example that [0
3]
at position 4 is not included, despite being inside the absolute range
of the original slice. And now, the final challenge: what are the slice
boundaries if we selectively undo the insertion?
(map (partial ascend (disable our-tfs 0)) our-slice [:max :min])
(2 2)
Interesting. The algorithm correctly determines the nearest visible cell in
the appropriate direction for both the inactive ([1 2]
) and active ([0
2]
) cell origin, in the respective bypass
directions. Furthermore,
observe that if the slice only contains cells of the inactive layout (2-5),
disabling its underlying transformation results in an empty slice:
(map (partial ascend (disable our-tfs 0)) (map (partial descend (pop our-tfs)) [2 5]) [:max :min])
(2 1)
More documented examples/scenarios for all layers of this library may be
found in the project's README as well as our tests, written using the
excellent com.cognitect/transcriptor
library.
In the previous section, we saw how the transformation stack is a simple yet powerful representation of history for a single-dimensional grid. In this section, we'll use it as a building block for the complete chart, capable of handling multi-dimensional grids. The chart will also contain the missing pieces that are necessary for fulfilling our requirements, such as the translation of origins to IDs (encoding) and back (decoding).
Up until now, we've restricted our system to a single dimension, which allowed for an elegant procedural representation of transformations between layouts. For a spreadsheet though (our driving example), this is inadequate, as it will have to accommodate at least two (rows and columns). It turns out that, if we admit that dimensions are mutually independent and we restrict operations to one dimension at a time (so that our logic/code continues to work as-is), we can just use one transformation stack per dimension. These conditions don't pose any practical problem, since in the (usual) case of mutually orthogonal (independent) dimensions, any complex operation can be decomposed into a sequence of elementary ones.
To construct a chart, we'll have to specify either the :dimensionality
of
the grid or the :dimensions
by name, which the dimensionality can be
derived from.
(s/def ::dimensions (s/and (s/coll-of keyword?) (partial apply distinct?))) (s/def ::dimensionality nat-int?)
We'll store one transformation stack per dimension under :tf-stacks
. We
also need a function (:dim->index
) to absorb the difference when
referring to dimensions by name (when provided at construction) or directly
by index.
(s/def ::tf-stacks (s/coll-of ::transformation-stack)) (s/def ::dim->index ifn?)
To support linear undo/redo, we'll keep a :trail
of operations. This
trail consists of a collection of :op-pointers
, as well as the index of
the :last-active
one (going up/down the pointers as we redo/undo
operations, respectively). Each pointer points to the transformation pushed
by the operation onto the respective dimension's stack.
(s/def ::dim-index ::nat) (s/def ::tf-index ::nat) (s/def ::op-pointers (s/coll-of (s/tuple ::dim-index ::tf-index))) (s/def ::last-active (s/or :none #{-1} :idx ::nat)) (s/def ::trail (s/keys :req-un [::op-pointers ::last-active]))
Finally, we need two functions to encode
origins to IDs and decode
them
back. Each is the inverse of the other, so that both (comp encode decode)
and (comp decode encode)
are equivalent to identity
(for applicable
data types).
(s/def ::encode ifn?) (s/def ::decode ifn?)
Now we can describe the chart's structure and construction:
(s/def ::chart (s/keys :req-un [::tf-stacks ::dim->index ::trail ::encode ::decode] :opt-un [::dimensions])) (defn make-chart "Constructs a multi-dimensional chart. If `dimensions` are supplied, dimensions are named after them, otherwise anonymous. Iff anonymous, their number is defined by `dimensionality`. The `encode`/`decode` functions convert cell origins to/from IDs (both default to `identity`)." [{:keys [dimensions dimensionality encode decode] :or {encode identity decode identity}}] (let [named? (seq dimensions)] (cond-> {:tf-stacks (vec (repeat (if named? (count dimensions) dimensionality) (make-transformation-stack))) :dim->index (if named? (zipmap dimensions (range)) identity) :trail {:op-pointers [] :last-active -1} :encode encode :decode decode} named? (assoc :dimensions dimensions)))) (s/fdef make-chart :args (s/cat :chart-options (s/keys :req-un [(or ::dimensions ::dimensionality)] :opt-un [::encode ::decode])) :ret ::chart)
Both encode
and decode
default to identity
, but any pair of mutually
inverse functions to and from the ID data type will do.7
The encode
and decode
functions were the last missing piece of the
pipeline. Its API consists of three functions, two for mapping (visible)
cells to IDs (cell->id
) and vice versa (id->cell
), as well as one for
computing slice contents (slice->cells
). Mirroring the juxtaposition of
transformation stacks for addressing multiple dimensions, cells are
represented by a similar juxtaposition of positions, and origins by a
juxtaposition of coordinates:
(s/def ::cell (s/coll-of ::position)) (s/def ::origin (s/coll-of ::coordinate))
This way, positions, coordinates, and transformations for each dimension are located at the index of that dimension.
Mapping a cell to its ID is straightforward:
(defn cell->id "Returns the ID for `cell` according to `chart`." [cell {:keys [tf-stacks encode] :as chart}] (let [cell->origin (partial map descend tf-stacks)] (-> cell cell->origin encode)))
It's taking advantage of the correspondence between transformations
and
cell
for calling descend
to obtain the origin. The inverse function is
similar:
(defn- origin->cell ([origin chart] (origin->cell origin chart nil)) ([origin {:keys [tf-stacks]} bypass] (reduce (fn [cell-position [coordinate tf-stack]] (if-let [position (ascend tf-stack coordinate bypass)] (conj cell-position position) (reduced nil))) [] (map vector origin tf-stacks)))) (defn id->cell "Returns the cell for `id` according to `chart`, `nil` if not visible." [id {:keys [decode] :as chart}] (-> id decode (origin->cell chart)))
This time, the origin->cell
helper is factored out and accepts an
optional bypass
argument, because it will also be used for computing
slice contents. It is watching for a possible nil
return by ascend
,
which in turn triggers an early nil
return for itself (meaning the cell
isn't currently visible).
Lastly, slices. Slices are represented by their boundaries:
(s/def ::id any?) (s/def ::start ::id) (s/def ::end ::id) (s/def ::slice (s/keys :req-un [::start ::end]))
To get the cells in a slice, we invoke id->cell
on its boundary IDs. To
ensure that we include only cells that actually belong to it, we specify a
:max
bypass
direction for the start and a :min
for the end. We then
compute the Cartesian product (from org.clojure/math.combinatorics
) of
the ranges delimited by the resolved bounding cells along each dimension to
generate the positions of all cells in between them:
(defn slice->cells "Returns the `slice`'s cells according to `chart`." [{:keys [start end]} {:keys [decode] :as chart}] (let [start (-> start decode (origin->cell chart :max)) end (-> end decode (origin->cell chart :min)) ranges (->> (map inc end) (map range start))] (apply comb/cartesian-product ranges)))
As explained earlier, the chart should support linear undo/redo.8 Recall that every operation in such a system discards
previously undone (and not redone) operations. This is precisely the role
of discard-undone
, which will be called inside operate
before pushing a
new operation to the appropriate stack. Note that operate
supports charts
with named and anonymous dimensions (as determined by the chart
's
construction parameters); it treats the dimension
parameter as a name in
the former case and as a numerical index in the latter.
(defn- discard-undone [{{:keys [op-pointers last-active]} :trail :as chart}] (assoc-in chart [:trail :op-pointers] (subvec op-pointers 0 (inc last-active)))) (defn operate "Performs `operation` on the `dimension` in `chart`. For charts with named dimensions, `dimension` is a name, otherwise an index. Discards any previously undone operations." [{:keys [dim->index tf-stacks] :as chart} dimension operation] (let [dim-idx (dim->index dimension) tf-count (count (nth tf-stacks dim-idx))] (-> chart discard-undone (update-in [:tf-stacks dim-idx] push operation) (update-in [:trail :op-pointers] conj [dim-idx tf-count]) (update-in [:trail :last-active] inc))))
Notice that discard-undone
truncates the
collection of op-pointers
inside trail
at the point indicated by
last-active
, but leaves all tf-stacks
intact. This is crucial for
preserving the meaning of the mapping between origins and IDs. To see why,
observe that if we remove a transformation from a stack, a later operation
would push a new one onto it, in the same place. Afterwards, IDs from the
discarded layout (created by the undone operation) would be erroneously
mapped to the new layout. In other words, transformations may only be
pushed to stacks, never popped.
Using the chart
's trail
, we can implement undo
and redo
:
(defn undo "Deactivates the last active operation in `chart`, if any." [{{:keys [op-pointers last-active]} :trail :as chart}] ;; check that there is some operation to undo (if (neg? last-active) chart (let [[dim-idx tf-idx] (get op-pointers last-active)] (-> chart (update-in [:tf-stacks dim-idx] disable tf-idx) (update-in [:trail :last-active] dec))))) (defn redo "Reactivates the last deactivated operation in `chart`, if any." [{{:keys [last-active op-pointers]} :trail :as chart}] ;; get the operation *after* `last-active` (if-let [[dim-idx tf-idx] (get op-pointers (inc last-active))] (-> chart (update-in [:tf-stacks dim-idx] enable tf-idx) (update-in [:trail :last-active] inc)) chart))
Both of these functions first perform a sanity check, i.e., that there is
an operation to be undone/redone. Afterwards, they disable
/ enable
its
transformation in the corresponding stack, before moving the last-active
pointer in the appropriate direction along the operation trail
.
While small, this library should hopefully offer some useful general precepts. For instance, it demonstrates how a composition of simple concepts can provide an elegant solution to a non-trivial problem. If we'd like to extrapolate a bit, we'd say that it also presents a case where a pure, abstract representation for implementing identity can be both effective and efficient. The derivation (as opposed to assignment) of identity, as well as the capture of an operation's procedure (rather than its effect) were key design aspects which helped to approach the problem's essence. As always, there should still be room for fixes of possible bugs (none known yet), improvements, optimizations, and possibly new features. Please feel free to report issues, provide feedback/suggestions or ask questions in the project's GitHub repository!
Actually, undo/redo is a bit more involved (e.g., we'll only simulate pop), for reasons that we'll be able to discuss towards the end.
The term "same" here appeals to intuition: two cells are considered "same" across two layouts if they should be associated with the same ID.
The cross
function is private,
and therefore acceptable for it to place somewhat peculiar restrictions on
its input. As will become apparent shortly, such a translation in its caller
improves clarity.
We're using pop
on the
transformation stack instead of the equivalent disable
on the last
transformation merely for conceptual clarity: to emphasize that the slice
was defined before removal and any undo.
In an actual application, a slice would be stored as a pair of IDs, which may be represented differently from coordinates. This doesn't affect our case study though, since the encoder and decoder should preserve an one-to-one correspondence between them.
While at it,
one should always measure before getting carried away with sophisticated
encodings. For example, to construct integer IDs, one might be tempted to
implement pairing functions, whereas the performance of something as simple
as (fn [origin] (-> origin pr-str .getBytes BigInteger.))
and (fn [id]
(-> id .toByteArray String. clojure.edn/read-string))
might be better than
a rudimentary implementation of the former.
Although our underlying system is also capable of selective undo/redo (as
demonstrated by the case study), the chart
API only supports linear for
the time being.