Clojure: How To Prevent "Expected Map, Got Vector" And Similar Errors
What my Clojure code is doing most of the time is transforming data. Yet I cannot see the shape of data being transformed - I have to know what the data looks like on the input and hold a mental model of how they change at each step. But I make mistakes. I make mistakes in my code so that the data does not correspond anymore to the model it should follow. And I make mistakes in my mental model of what the data currently looks like, leading likely to a code error later on. The end result is the same - a little helpful exception at some later step regarding wrong shape of data. There are two problems here: The error typically provides too little useful information and it usually manifests later than where the code/model mistake actually is. I therefore easily spend an hour or more troubleshooting these mistakes. In addition to that, it is also hard to read such code because a reader lacks the writer's mental model of the data and has to derive it herself - which is quite hard especially if the shape of the input data is not clear in the first place.
I should mention that I of course write tests and experiment in the REPL but I still hit these problems so it is not enough for me. Tests cannot protect me from having a wrong model of the input data (since I write the [unit] tests based on the same assumptions as the code and only discover the error when I integrate all the bits) and even if they help to discover an error, it is still time-consuming the root cause.
Can I do better? I believe I can.
The hard to troubleshoot errors with delayed manifestation and hard to understand code that communicates only half of the story (the transformations but not the shape of the data being transformed) is the price we pay for the power of dynamic typing. But there are strategies to lower this price. I want to present three of them: small, focused functions with good names, destructuring as documentation, and judicious use of pre- and post- conditions.
The content of this post is based on what I learned from Ulises Cerviño Beresi during one of the free Clojure Office Hours he generously offers, similarly to Leif in the US.
So we need to make the shape of data more obvious and to fail fast, preferably with a helpful error message.
The main idea is:
We have a webshop that sells discounted cars. We also have occasional campaigns with increased discounts for selected cars. For each car we have also a number of keywords people can use to find it and categories it belongs to. Below is code that processes raw car + campaigns + search keywords data from a DB query, first the original and then the refactored one with checks:
We have a webshop that sells discounted cars. Each car we sell has a base discount (either an absolute amount or percentage) and we also have occasional campaigns for selected cars. For each car we have also a number of keywords people can use to find it.
Below is code that processes raw car + campaigns + search keywords data from a DB query, selected the best applicable campaign and computing the final discount:
I had originally two [discovered] errors in the code and both took me quite a while to fix - first I forgot to convert JSON from string into a map (wrong assumption about input data) and then I run merge-campaigns directly on the list of car+campaign lists instead of mapping it (the sequential? precondition did not help to detect this error). So the transformations are clearly too error-prone.
The stack traces did not contain enough helpful context info (though a more experienced Clojurist would have certainly found and fixed the root causes much faster):
This is the code refactored into smaller functions with checks (and it certainly can be improved much more):
The main problem with pre- and post-conditions is that they do not provide any useful context in their error message and do not support adding a custom message. An error like
is better than not failing fast but does not tell as what the invalid value was and which of the thousands of cars had the invalid value.
Also, the checks are performed at runtime so they have a performance cost. This might not be a problem with checks such as (map?) but could be with f.ex. (every?).
Do you repeat the same checks again and again? Then you could either copy them using
This looks arguably like a good case for static types. And yes, I come from Java and lack them. On the other hand, even though static typing would solve the main category of problems, it creates new ones and has its liits.
A) I have actually quite a number of "types" here so it would require lot of classes to model fully:
B) A key strength of Clojure is the use of generic data structures - maps, vectors, lazy sequences - and powerful, easy to combine generic functions operating on them. It makes integrating libraries very easy since everything is just a map (and not a custom type that needs to be converted) and you can always transform these with your old good friends functions - whether it is a Korma SQL query definition, result set, or a HTTP request. Static types take this away.
C) Types permit only a subset of checks that you might need (that is unless you use Haskell :)) - they can check that a thing is a car but not that a return value is in the range 7 ... 42.
D) Some functions do not care about the type, only its small part - f.ex. jdbc-array-to-set only cares about the argument being a map, having the key, and if set, the value being a java.sql.Array.
Using smaller functions and pre+post conditions, I can discover errors much earlier and also document the expected shape of the data better, even more so with destructuring in fn signatures. There is some duplication in the pre/post conditions and the error messages are little helpful but is much better. I guess that more complex cases may warant the use of core.contracts or even core.typed / schema.
What strategies do you use? What would you improve? Other comments?
I encourage you to fork and improve the gist and share your take on it.
Updates
I should mention that I of course write tests and experiment in the REPL but I still hit these problems so it is not enough for me. Tests cannot protect me from having a wrong model of the input data (since I write the [unit] tests based on the same assumptions as the code and only discover the error when I integrate all the bits) and even if they help to discover an error, it is still time-consuming the root cause.
Can I do better? I believe I can.
The hard to troubleshoot errors with delayed manifestation and hard to understand code that communicates only half of the story (the transformations but not the shape of the data being transformed) is the price we pay for the power of dynamic typing. But there are strategies to lower this price. I want to present three of them: small, focused functions with good names, destructuring as documentation, and judicious use of pre- and post- conditions.
The content of this post is based on what I learned from Ulises Cerviño Beresi during one of the free Clojure Office Hours he generously offers, similarly to Leif in the US.
So we need to make the shape of data more obvious and to fail fast, preferably with a helpful error message.
The main idea is:
- Break transformations into small, simple functions with clear names
- Use destructuring in function arguments to document what data is expected
- Use pre- and post-conditions (and/or asserts) both as checks and documentation
- (All the testing and interactive exploration in REPL that you already do.)
A simplified example
We have a webshop that sells discounted cars. We also have occasional campaigns with increased discounts for selected cars. For each car we have also a number of keywords people can use to find it and categories it belongs to. Below is code that processes raw car + campaigns + search keywords data from a DB query, first the original and then the refactored one with checks:
(require '[cheshire.core :refer [parse-string]]) | |
(defn- json->data [key m] | |
(update-in m [key] #(parse-string % true))) | |
(defn- select-campaign [car+campaigns] | |
(first car+campaigns)) | |
(defn- jdbc-array-to-set | |
[key m] | |
(update-in m [key] #(apply hash-set (.getArray %)))) | |
(defn refine-cars-simple | |
"Process get-cars query result set - derive additional data, transform values into better ones | |
There is one row per car and campaign, a car may have more campaigns - we pick the best one. | |
" | |
[cars-raw] | |
(->> | |
cars-raw | |
(map (partial json->data :json)) ;; <- this I originally forgot | |
;; group rows for the same car => [[car1 ..][car2 ..]] | |
(group-by :id) | |
(vals) | |
;; join all car1 vectors into one car .. | |
(map select-campaign) | |
(map (fn [car] | |
(->> | |
car | |
(jdbc-array-to-set :category_ref) | |
(jdbc-array-to-set :keywords)))))) |
(require '[cheshire.core :refer [parse-string]]) | |
(require '[clojure.set :refer [subset? difference]]) | |
(defn- car? [{:keys [id] :as car}] | |
(and (map? car) id)) | |
(defn- json->data [key m] | |
{:pre [(contains? m key) (string? (get m key))], :post [(map? (get % key))]} | |
(update-in m [key] parse-string true)) | |
(defn- select-campaign [[first-car :as all]] | |
{:pre [(sequential? all) (car? first-car)], :post [(car? %)]} | |
first-car) | |
(defn- jdbc-array-to-set | |
[key m] | |
{:pre [(contains? m key) (instance? java.sql.Array (get m key))], :post [(set? (get % key))]} | |
(update-in m [key] #(apply hash-set (.getArray %)))) | |
(defn group-rows-by-car [cars-raw] | |
{:pre [(sequential? cars-raw) (car? (first cars-raw))] | |
:post [(sequential? %) (vector? (first %))]} | |
(vals (group-by :id cars-raw))) | |
(defn refine-car [car] | |
{:pre [(car? car) (:keywords car) (:category_ref car)]} | |
(->> car | |
(jdbc-array-to-set :category_ref) | |
(jdbc-array-to-set :keywords))) | |
(defn refine-cars-simple | |
"Process get-cars query result set - derive additional data, transform values into better ones | |
There is one row per car and campaign, a car may have more campaigns - we pick the best one. | |
" | |
[cars-raw] | |
(->> | |
cars-raw | |
(map (partial json->data :json)) ;; <- this I originally forgot | |
(group-rows-by-car) | |
(map select-campaign) | |
(map refine-car))) | |
(defn empty-array [] (reify java.sql.Array (getArray [_] (object-array [])))) | |
(refine-cars-simple [{:id 1, :json "{\"discount\":5000}", :campaign_discount 3000, :category_ref (empty-array), :keywords (empty-array)}]) |
A real-world example
We have a webshop that sells discounted cars. Each car we sell has a base discount (either an absolute amount or percentage) and we also have occasional campaigns for selected cars. For each car we have also a number of keywords people can use to find it.
Original code
Below is code that processes raw car + campaigns + search keywords data from a DB query, selected the best applicable campaign and computing the final discount:
(require '[cheshire.core :refer [parse-string]]) | |
(defn db-array [col] (reify java.sql.Array (getArray [_] (object-array col)))) | |
(defn- json->data [data fields] | |
(map (fn [data] | |
(reduce (fn [data field] | |
(assoc data field (parse-string (get data field) true))) | |
data fields)) data)) | |
(defn- discount-size [discount] | |
(if (or | |
(> (:amount discount) 10000) | |
(> (:percent discount) 5)) | |
:high | |
:normal)) | |
(defn- jdbc-array-to-set | |
"Convert a PostgreSQL JDBC4Array inside the map `m` - at the key `k` - into a se" | |
[key m] (update-in m [key] #(some->> % (.getArray) (into #{})))) | |
(defn- compute-discount | |
"Derive the :discount map based on the car's own discount and its active campaign, if applicable" | |
[car] | |
(let [{:keys [json campaign_discount_amount campaign_discount_percent]} car | |
{:keys [discount_amount discount_percent]} json | |
discount? (:use_campaign car) | |
amount (if discount? | |
(apply + (remove nil? [discount_amount campaign_discount_amount])) | |
discount_amount) | |
percent (if discount? | |
(apply + (remove nil? [discount_percent campaign_discount_percent])) | |
discount_percent) | |
discount {:amount amount | |
:percent percent} | |
discount-size (discount-size discount) | |
] | |
(assoc car :discount discount :discount-size discount-size))) | |
(defn merge-campaigns | |
"Given vector of car+campaign for a particular car, return a single car map | |
with a selected campaign. | |
" | |
[car+campaigns] | |
{:pre [(sequential? car+campaigns) :id]} | |
(let [campaign-ks [:use_campaign :campaign_discount_amount :campaign_discount_percent :active] | |
car (apply dissoc | |
(first car+campaigns) | |
campaign-ks) | |
campaign (->> car+campaigns | |
(map #(select-keys % campaign-ks)) | |
(filter :active) | |
(sort-by :use_campaign) ;; true, if any, will be last | |
last)] | |
(assoc car :best-campaign campaign))) | |
(defn refine-cars | |
"Process get-cars query result set - derive additional data, transform values into better ones | |
There is one row per car and campaign, a car may have more campaigns - we pick the best one. | |
" | |
[cars-raw] | |
(->> cars-raw | |
(#(json->data % [:json])) | |
(#(map merge-campaigns | |
(vals (group-by :id %)))) | |
(map (comp ;; reminder - the 1st fn is executed last | |
compute-discount | |
(fn [m] (update-in m [:keywords] (partial remove nil?))) ;; {NULL} => [] | |
(partial jdbc-array-to-set :keywords) | |
(partial jdbc-array-to-set :category_ref) | |
)) | |
)) | |
(refine-cars [ | |
{:id 1 | |
:json "{\"discount_amount\":9000,\"discount_percent\":0}" | |
:campaign_discount_amount 2000 | |
:campaign_discount_percent nil | |
:use_campaign false | |
:active true | |
:keywords (db-array ["fast"]) | |
:category_ref (db-array [])}]) |
Defects and me
I had originally two [discovered] errors in the code and both took me quite a while to fix - first I forgot to convert JSON from string into a map (wrong assumption about input data) and then I run merge-campaigns directly on the list of car+campaign lists instead of mapping it (the sequential? precondition did not help to detect this error). So the transformations are clearly too error-prone.
The stack traces did not contain enough helpful context info (though a more experienced Clojurist would have certainly found and fixed the root causes much faster):
## Forgotten ->json:
java.lang.NullPointerException:
clojure.lang.Numbers.ops Numbers.java: 961
clojure.lang.Numbers.gt Numbers.java: 227
clojure.lang.Numbers.gt Numbers.java: 3787
core/discount-size cars.clj: 13
core/compute-discount cars.clj: 36
-------------
## Forgotten (map ..):
java.lang.ClassCastException: clojure.lang.PersistentVector cannot be cast to clojure.lang.IPersistentMap
RT.java:758 clojure.lang.RT.dissoc
core.clj:1434 clojure.core/dissoc
core.clj:1436 clojure.core/dissoc
RestFn.java:142 clojure.lang.RestFn.applyTo
core.clj:626 clojure.core/apply
cars.clj:36 merge-campaigns
...
Refactored
This is the code refactored into smaller functions with checks (and it certainly can be improved much more):
(require '[cheshire.core :refer [parse-string]]) | |
(require '[clojure.set :refer [subset? difference ]]) | |
(defn db-array [col] (reify java.sql.Array (getArray [_] (object-array col)))) | |
(defn- json->data [data fields] | |
{:pre [(sequential? data) (sequential? fields)]} | |
(map (fn [data] | |
(reduce (fn to-json [data field] | |
{:pre [(map? data) (string? (get data field)) (keyword? field)]} | |
(assoc data field (parse-string (get data field) true))) | |
data fields)) data)) | |
(defn- discount-size [{:keys [amount percent] :as discount}] | |
{:pre [(number? amount) (number? percent) (<= 0 amount) (<= 0 percent 100)] | |
:post [(#{:high :normal} %)]} | |
(if (or | |
(> amount 10000) | |
(> percent 5)) | |
:high | |
:normal)) | |
(defn- jdbc-array-to-set | |
"Convert a PostgreSQL JDBC4Array inside the map `m` - at the key `k` - into a se" | |
[key m] | |
{:pre [(keyword? key) (map? m) (let [a (key m)] (or (nil? a) (instance? java.sql.Array a)))]} | |
(update-in m [key] #(some->> % (.getArray) (into #{})))) | |
(defn car? [{:keys [id] :as car}] | |
(and (map? car) id)) | |
(defn- compute-discount | |
"Derive the :discount map based on the car's own discount and its active campaign, if applicable" | |
[{{:keys [discount_amount discount_percent] :as json} :json | |
:keys [campaign_discount_amount campaign_discount_percent] :as car}] | |
{:pre [(car? car) (map? json) (number? discount_amount) (number? discount_percent)] | |
:post [(:discount %) (:discount-size %)]} | |
(let [discount? (:use_campaign car) | |
amount (if discount? | |
(apply + (remove nil? [discount_amount campaign_discount_amount])) | |
discount_amount) | |
percent (if discount? | |
(apply + (remove nil? [discount_percent campaign_discount_percent])) | |
discount_percent) | |
discount {:amount amount | |
:percent percent} | |
discount-size (discount-size discount) | |
] | |
(assoc car :discount discount :discount-size discount-size))) | |
(defn select-campaign | |
"Return a single car map with a selected campaign." | |
[{:keys [campaigns] :as car}] | |
{:pre [(car? car) (sequential? campaigns)] | |
:post [(contains? % :best-campaign)]} | |
(let [best-campaign (->> campaigns | |
(filter :active) | |
(sort-by :use_campaign) ;; true, if any, will be last | |
last)] | |
(-> car | |
(dissoc :campaigns) | |
(assoc :best-campaign best-campaign)))) | |
(defn nest-campaign [car] | |
;; :pre check for campaing keys would require too much repetition => an assert instead | |
{:pre [(car? car)] | |
:post [((comp map? :campaign) %)]} | |
(let [ks (set (keys car)) | |
campaign-ks #{:campaign_discount_amount :campaign_discount_percent :use_campaign :active} | |
campaign (select-keys car campaign-ks)] | |
(assert (subset? campaign-ks ks) | |
(str "Campaign keys missing from the car " (:id car) ": " | |
(difference campaign-ks ks))) | |
(-> (apply dissoc car campaign-ks) | |
(assoc :campaign campaign)))) | |
(defn group-rows-by-car [cars-raw] | |
{:pre [(sequential? cars-raw) (every? map? cars-raw)] | |
:post [(sequential? %) (every? vector? %)]} | |
(vals (group-by :id cars-raw))) | |
(defn join-campaigns [[car+campaign :as all]] | |
{:pre [(sequential? all) (:campaign car+campaign)] | |
:post [(:campaigns %)]} | |
(-> car+campaign | |
(assoc :campaigns | |
(map :campaign all)) | |
(dissoc :campaign))) | |
(defn refine-car [car] | |
{:pre [(car? car)] | |
:post [(:discount %)]} ; keywords and :category_ref are optional | |
(->> car | |
(jdbc-array-to-set :category_ref) | |
(jdbc-array-to-set :keywords) | |
(#(update-in % [:keywords] (partial remove nil?))) ;; {NULL} => [] | |
(select-campaign) | |
(compute-discount))) | |
(defn refine-cars | |
"Process get-cars query result set - derive additional data, transform values into better ones | |
There is one row per car and campaign, a car may have more campaigns - we pick the best one. | |
" | |
[cars-raw] | |
(->> cars-raw | |
(#(json->data % [:json])) | |
(map nest-campaign) | |
(group-rows-by-car) | |
(map join-campaigns) | |
(map refine-car) | |
)) | |
(refine-cars [ | |
{:id 1 | |
:json "{\"discount_amount\":9000,\"discount_percent\":0}" | |
:campaign_discount_amount 2000 | |
:campaign_discount_percent nil | |
:use_campaign false | |
:active true | |
:keywords (db-array ["fast"]) | |
:category_ref (db-array [])}]) |
Downsides
The main problem with pre- and post-conditions is that they do not provide any useful context in their error message and do not support adding a custom message. An error like
Assert failed: (let [a (key m)] (or (nil? a) (instance? java.sql.Array a))) cars.clj:18 user/jdbc-array-to-set
is better than not failing fast but does not tell as what the invalid value was and which of the thousands of cars had the invalid value.
Also, the checks are performed at runtime so they have a performance cost. This might not be a problem with checks such as (map?) but could be with f.ex. (every?).
What about duplication?
Do you repeat the same checks again and again? Then you could either copy them using
with-meta
(they end-up in metadata anyway) or reuse the explicitely:
(defn with-valid-car [f] (fn [car] {:pre [:make :model :year]} (f car)))
(def count-price (with-valid-car (fn [car] (do-something car))))
;; or make & use a macro to make it nicer
What about static types
This looks arguably like a good case for static types. And yes, I come from Java and lack them. On the other hand, even though static typing would solve the main category of problems, it creates new ones and has its liits.
A) I have actually quite a number of "types" here so it would require lot of classes to model fully:
- Raw data from the DB - car with campaign fields and keywords, category_ref as java.sql.Array
- Car with keywords as a sequence
- Car with category_ref as a sequence
- Car with a nested :campaign "object"
- Car with a nested :best-campaign object and with :rate (you could have :rate there from start, set initially to nil, but then you'd still need to ensure that the final function sets it to a value)
B) A key strength of Clojure is the use of generic data structures - maps, vectors, lazy sequences - and powerful, easy to combine generic functions operating on them. It makes integrating libraries very easy since everything is just a map (and not a custom type that needs to be converted) and you can always transform these with your old good friends functions - whether it is a Korma SQL query definition, result set, or a HTTP request. Static types take this away.
C) Types permit only a subset of checks that you might need (that is unless you use Haskell :)) - they can check that a thing is a car but not that a return value is in the range 7 ... 42.
D) Some functions do not care about the type, only its small part - f.ex. jdbc-array-to-set only cares about the argument being a map, having the key, and if set, the value being a java.sql.Array.
What else is out there?
- core.typed
- prismatic/schema
- contracts programming with core.contracts
Conclusion
Using smaller functions and pre+post conditions, I can discover errors much earlier and also document the expected shape of the data better, even more so with destructuring in fn signatures. There is some duplication in the pre/post conditions and the error messages are little helpful but is much better. I guess that more complex cases may warant the use of core.contracts or even core.typed / schema.
What strategies do you use? What would you improve? Other comments?
I encourage you to fork and improve the gist and share your take on it.
Updates