Code Study: Making code more functional

Having a pure function, what makes it more or less "functional" (as in "functional programming")? To me, "functional" includes favouring higher-level constructs over low-level "bit twiddling". Here I would like to demonstrate how I made one function more functional in this regard.

The awesome book Grokking Simplicity by Eric Normand explains very well the foundations of functional thinking, i.e. primarily the focus on separating data, (pure) calculations, and (side-effectful) actions. It then goes on to explore the next level of FP, namely using higher-order functions, functional iteration with sequence functions, and chaining of these, i.e. “express[ing] complex calculations as a series of steps called a chain”. That is exactly what we will be playing with here. (It might be a good idea to read/listen to Eric’s The 3 levels of functional thinking now.)

Word of caution: Don’t try this at work! If the function works and you do not need to touch it, you’d better spend your employer’s money on producing value. However what you do after office hours is up to you and I cannot be blamed for it.

This is by no means a criticism of the original code (which has been written ages ago and works just fine). It is only an exercise in functional thinking. It can be even argued that the more functional version is worse in other regards (such as performance and lines of code) and thus not worth adopting.

Context

We have named entities, let’s say Projects, and we need to create a short, unique, human-friendly key for this - similarly how Jira turns "Ardoq" into "ARD" to use in URLs. For whatever historical reasons, the algorithm is approximately:

  1. Remove non-ASCII characters, upper-case, break the name into words. If empty, use "XXX" instead

  2. Take the first letter from each word

  3. If the resulting key is not unique then add more letters until it is unique or the total length is 4+

    • Take the extra letters from the first word until exhausted, then from the second, …​. So from "Il Nino" we will get these candidates: "IN", "ILN", "ILNI"

  4. If still not unique then append a number, starting with 1 and going up until it is unique

The non-functional version 1

Here is the core part of the code. You do not need to understand the details - just look at the "shape" of it.

Why is it not sufficiently functional for me? Because it uses the lowest-level mechanism, namely recursion with loop-recur. The checks for uniqueness are spread everywhere, there is manual management of border conditions (length vs. 0 / 1 / 3), and it uses mutable state to construct the output. Finally, there is one big function that does all the heavy lifting.

(defn add-suffix [abbr kset]
  (loop [n 1]
    (let [candidate (str abbr n)]
      (if-not (contains? kset candidate)
        candidate
        (recur (inc n))))))

(defn maybe-add-suffix [abbr kset]
  (loop [candidate abbr
         n 1]
    (if-not (contains? kset candidate)
      candidate
      (recur (str abbr n) (inc n)))))

(defn name->unique-key
  [name existing-keys]
  (let [kset (set existing-keys)]
    (loop [char-seqs (->ascii-uppercase-words name)]
      (let [prefix (char-seqs->str char-seqs)
            length (count prefix)]
        (cond (= length 0) (maybe-add-suffix "XXX" kset)
              (and (> length 1) (not (contains? kset prefix))) prefix
              (> length 3) (add-suffix prefix kset)
              :else (let [found (atom false)
                          updated-char-seqs
                          (walk/prewalk
                            (fn [c]
                              f (and (char? c) ; not sure what this `and` does..
                                     (Character/isLowerCase ^char c)
                                     (not @found))
                              (let [upper-case (Character/toUpperCase ^char c)]
                                (if (not= c upper-case)
                                  ; => not a letter; not removed yet?!
                                  (do (reset! found true)
                                      upper-case)
                                  c))
                              c))
                            char-seqs)]
                      (cond (not @found) (add-suffix prefix kset)
                            :else        (recur updated-char-seqs))))))))

The functional version

So what does "functional" mean to me in this context?

  • Prefer sequence functions over low-level recursion

  • As somebody said, one and many are fine but two is wrong - i.e. write code that works either with a single thing or with a sequence of things, avoid the need for special cases such as two or three things

  • Leverage higher-order functions

  • Avoid manually maintaining state; leverage sequence functions instead

  • Split the algorithm into small, more independent steps

The core idea here is that instead of constructing the key manually until it is unique I want to have a sequence of all possible candidate keys, starting from the simplest one, and I will filter that to find the first one that fits, i.e. is unique and long enough.

The trouble is implementing the candidate-key fn, which produces a single candidate, plus the fact that we switch the algorithm in the middle from using more characters to appending a number (fortunately that can be modelled as a concatenation of two sequences, each generated by the respective algorithm). The complication with candidate-key is that turning each word into a letter(s) to be included in the key is not independent of the other words - it depends on how many extra characters are needed for the disambiguation and how many of those have already been supplied by the previous word(s). Thus we cannot simply (map word→prefix words). Read on to learn how to solve that puzzle!

(def desired-max-key-chars 5)
(def min-key-len 2)

(defn candidate-key
  "Make a candidate key from the given words with `extra-chars-cnt`
  extra characters (beyond the 1 / word)"
  [words extra-chars-cnt]
  (let [;; Task: Find out how many chars to take from each word (min 1);
        ;; it depends on how many we took from the previous ones.
        ;; I'd rather not keep state (how many extra chars remain?)
        ;; => reductions FTW!
        cumulative-length (reductions + 0 (map count words))
        ;; = how many extra characters have we been able to supply so far?
        prefix-lens (sequence (comp (map #(- extra-chars-cnt %))
                                    ; = how many chars remain to be taken?
                                    (map #(max 0 %))
                                    (map inc)) ; we always take at least 1
                              cumulative-length)]
    (some->> (seq (mapcat take prefix-lens words))
             ; isn't this beautiful? Just core fns and data!
             (str/join))))

(defn prefix-only-candidate-keys
  "A sequence of possible candidate keys based off the given `words`,
  using consequently more characters from the words"
  [words]
  (let [all (->> (map #(candidate-key words %)
                      (range 0 (dec desired-max-key-chars)))
                 (remove nil?)
                 dedupe
                 seq)]
    (or
      (seq (take-while #(< (count %) desired-max-key-chars) all))
      (seq (take 1 all)) ; if already the 1st candidate has 5+ chars
      ;; fallback:
      ["XXX"])))

(defn candidate-keys
  "A sequence of possible candidate keys - first using only the prefixes
  of the words then appending a number to disambiguate"
  [words]
  (let [pref-candidates (prefix-only-candidate-keys words)
        last-candidate (last pref-candidates)]
    (concat pref-candidates
            (map #(str last-candidate %) (next (range))))))

(defn name->unique-key [name existing-keys]
  (->> (candidate-keys (->ascii-uppercase-words name))
       (remove (set existing-keys))
       (drop-while #(< (count %) min-key-len))
       first))

Summary

I took a complicated, low-level function that does manual state management and border condition checking and replaced it with a few small, independent functions that leverage the Clojure sequence library. The resulting code is likely slower and possibly as hard to understand for someone not familiar with obscurities such as reductions. And it took some serious thinking to design it. But it was fun.


Tags: clojure


Copyright © 2022 Jakub Holý
Powered by Cryogen
Theme by KingMob