Refactoring & Type Errors in Clojure: Experience and Prevention

While refactoring a relatively simple Clojure code to use a map instead of a vector, I have wasted perhaps a few hours due to essentially type errors. I want to share the experience and my thoughts about possible solutions since I encounter this problem quite often. I should mention that it is quite likely that it is more a problem (an opportunity? :-)) with me rather than the language, namely with the way I write and (not) test it.

The core of the problem is that I write chains of transformations based on my sometimes flawed idea of what data I have at each stage. The challenge is that I cannot see what the data is and have to maintain a mental model while writing the code, and I suck at it. Evaluating the code in the REPL as I develop it helps somewhat but only when writing it - not when I decide to refactor it.

Refactoring & Type Errors in Clojure

While refactoring a relatively simple Clojure code to use a map instead of a vector, I have wasted perhaps a few hours due to essentially type errors. I want to share the experience and my thoughts about possible solutions since I encounter this problem quite often. I should mention that it is quite likely that it is more a problem (an opportunity? :-)) with me rather than the language, namely with the way I write and (not) test it.

The core of the problem is that I write chains of transformations based on my sometimes flawed idea of what data I have at each stage. The challenge is that I cannot see what the data is and have to maintain a mental model while writing the code, and I suck at it. Evaluating the code in the REPL as I develop it helps somewhat but only when writing it - not when I decide to refactor it.

A typical example of an error-prone code:

(->> data
     expr1  ; this is often a core function, a keyword, sometimes my own fn
     ...
     exprI  ; <-- suddenly I get here a nil, an empty seq, an exception, or just miss the expected data
     ...
     exprN)

I have struggled with the question how to do a refactoring of data in Clojure in a safer way before and got some useful tips.)

Solutions for preventing type errors

I see three possible solutions:

  1. Make it possible to see the data as it flows through the transformation, i.e. the last data produced by each expression during the last execution. LightTable's watches and insta-repl can do this and J.P. Posma called this "omniscient debugging" and demonstrated how it can be done in JavaScript in his StrangeLoop talk Visualising program execution. This is how programming should work anyway. (Preferably with having a control over what/how data is shown and being able to interact with it to explore it.) However I haven't managed to get this working in LT with my ClojureScript project (due to a weird "Can't dynamically bind non-dynamic var: clojure.tools.reader/resolve-symbol at line 1 :: {:file nil, :line 1, :column 61, :tag :cljs/analysis-error").

  2. Use core.typed to define the shape of the data and let the type checker verify that my code conforms to it at each stage. [Core.typed has its own challenges](TODO: CircleCI blog) but in this case it would help me nicely. Sadly the ClojureScript integration is currently not up to date. Alternatively I could use Prismatic Schema, which makes it possible to specify & check the expected shape of the data but can't, I believe, really help to check that each step of a transformation makes sense.

  3. Develop a little differently. David Nolen has recommended the following approach that I could try to apply more, though it doesn't feel 100% suitable solution for the particular problem I have encountered: Try something at the REPL first, then and pre/post-conditions around that. but do not get crazy with :pre/post, f.ex. only add them at entry points. And of course have a good dose of cljs.test. David a small blog series about this: Life with Dynamic Typing, Faster Validation Through Immutability, and Lazy Contracts in 30 lines.

Let's now have a look at my data, code, and the flawed refactoring.

The data

Input (simplified):

(def hardwareCatalog {
    :productId111 {
        :type "ACCESSORY"; ...:variations {
            :111222 {
                :sku 111222:name "Awesome Watch"
            }
        }
    }
    ; ...
    })
(def availability {:111222 { :availableQuantity 0, #_(...) }})

Expected output:

([:7056418 "Mordor Design Flipcover Xperia Z3 Bloody" 0]
 [:7056419 "Mordor Design Flipcover Xperia Z3 Dark" 3]
 ...)

The original, vector-based code

We want to provide a list of variations with their SKU (an ID), name, and available quantity.

(def hardwareCatalog ...)
(def availability ...)
(def low-stock-treshold 10)

(defn- ->sku+name [hardwareCatalog]
  (->> hardwareCatalog
       vals
       (mapcat :variations)
       (map (fn [[sku {:keys [name]}]] [sku name]))))

(defn low-on-stock [hardwareCatalog availability]
  (->> hardwareCatalog
       ->sku+name
       (map (fn [[sku _ :as row]]
              (conj row
                    (->> availability
                         sku
                         :availableQuantity))))
       (filter (fn [[_ _ stock]] (<= stock low-stock-treshold)))
       (sort-by second)))

(comment (low-on-stock hardwareCatalog availability))

The bug-ridden (partial) refactoring

(Update: Erik Assum was so kind that he invested the time and wrote a much better, simpler version of this code.)

(defn ->vec"Helper to change a map back to a vector so we can refactor stepwise"
    [keys vec-or-map]
 (if (map? vec-or-map)
   ((apply juxt keys) vec-or-map)
   vec-or-map))

(defn- ->sku+name [hardwareCatalog]
 (->> hardwareCatalog
      vals
      ; WAS: (mapcat :variations) (map (fn [[sku {:keys [name]}]] [sku name]))
      (mapcat
          #(map (fn [variation] (merge variation (select-keys % [:type])))
                (:variations %))) ; <-- MISTAKE 1: I forgot that variations isn't; sequential but a map; should be wrapped in (vals ...)
      (map #(->vec [:sku :name] %))
      (map (fn [[sku {:keys [name]}]] [sku name]))
      #_(map #(select-keys % [:sku :name :type])) ;; replace the prev 2 lines with this when :type fixed
      ))

(defn low-on-stock [hardwareCatalog availability]
 (->> hardwareCatalog
      ->sku+name
      (->vec [:sku :name]) ; <--- MISTAKE 2: It should be (map ->vec ..) but doesn't fail, leading to an error later; it should have been (map ->vec)
      (map (fn [var] (assoc var
                       :availableQuantity
                       (->> availability
                            (:sku var) ; <-- MISTAKE: After fixing #1 this will stop working, yielding nil everywhere; it should have become ((keyword (:sku var))):availableQuantity))))
      (filter #(<= (:availableQuantity %) low-stock-treshold))
      (sort-by :name)
      #_(map #(->vec [:sku :name :availableQuantity] %))))

The problems:

  1. I forgot that variations was a map, not a sequence. The code "worked" but the :type wasn't added to the variations. Spotting why was made even more difficult by having limited print to max 10 items and depth 3 (to prevent timouts) and thus not printing the keys and structures relevant for the mistake.
  2. Due to the inconsistency between ->sku+name taking a collection and ->vec taking a single item, I've mistakenly used (->vec) instead of (map ->vec). Again the code "worked" but did not do the expected thing so after fixing #1 and sending through a map, the code started to fail at another place and I couldn't figure out why.
  3. After fixing #1 and #2, the transformation again stopped doing the expected thing and all availableQuantities were nil. It wasn't easy to spot that it was because I have switched from using :<sku> to just <sku> (we could blame it on inconsistency in data, where SKUs used as keys have been turned into keywords but not those used as values).

Conclusion

Due to the use of nil as widely acceptable Null Object and the fact that many functions work on different types of inputs, mistakes in Clojure can travel far and manifestate at unexpected ways and places. That is a problem for people like me how aren't good at maintaining a correct mental model of the data as it is being transformed.

The problem could be mitigated by providing more visibility and displaying the data produced by each expression and making it possible to explore it interactively (as you can with a console.log-ed object in a browser's Console). It could be also mitigated by expressing you expectations about the shape of data and letting the computer verify regularly that they hold, using core.typed or pre- and post-conditions, Prismatic Schema etc.

(You can comment at the associated blog post or here.)


Tags: clojure refactoring ClojureScript


Copyright © 2024 Jakub Holý
Powered by Cryogen
Theme by KingMob