Holy Dev Newsletter Sep 2023

Welcome to the Holy Dev newsletter, which brings you gems I found on the web, updates from my blog, and a few scattered thoughts. You can get the next one into your mailbox if you subscribe.

What is happening

As is often the case, the month didn’t go as planned. But it was very productive, especially for Fulcro (see below). I had planned to dig more into Rama, but alas…

For my tiny ERP, I have put in place the building blocks of the last key feature necessary for production use, namely backups of data to a Google Sheet. This was inspired by a downtime I experienced with Fly.io’s DNS that shut down access to the app for over a day. Since the ERP is supposed to be critical for manufacturing in a small company, which cannot afford big problems, it is essential that they can access their data no matter what happens to Fly.io. A frequent backup to a Google Sheet seemed a sufficient, low-cost solution. I have learned and written about accessing Google API with OAuth2 and a service account from Clojure (without relying on Google SDK) and implemented a PoC of backing up key data from Datomic to a Sheet via the light-weight happygapi library. I still need to put in place the actual scheduled job and to improve the feature by pushing key changes more frequently. I have also added uptime monitoring with the free plan of UptimeRobot.

I have discovered pagefind.app, a Rust CLI tool that can generate a static JS search engine for a bunch of HTML files, and used it to finally replace my custom Google search for the blog.

The month of Fulcro

Users have prompted me to fix my DIY Fulcro workshop (which required updating dependencies for jack-in to work again in Calva), and to bring back to life FulcroDemos - a growing (?🙏) set of small examples, i.e. tiny Fulcro apps exploring various problems and capabilities. I have also upgraded minimalist-fulcro-template-backendless, a minimalistic template for Fulcro apps with in-browser Pathom “backend”, to the latest dependencies (neil deps upgrade FTW!) and finally switched it over from Pathom 2 to Pathom 3. I have also been prompted to improve colors in and warnings from fulcro-troubleshooting, a Fulcro "addon" that helps detect possible problems early, with in-app notifications.

The biggest unplanned work, and the one that gave me the most satisfaction, was making it possible to live-code Fulcro applications in your browser through the power of Michiel Borkent’s SCI. You can read more about it and play with it in Include interactive Clojure/script code snippets in a web page with SCI & friends. I believe it will be a boost to teaching and demonstrating Fulcro. Michiel has also asked me to help set up a static site with the editor and all the libraries with SCI support. I imagine we could hook it up with loading gists so that you could easily share editable, rendered Fulcro apps with others. I am indebted to Michiel, Tony Kay, and Thomas Heller for their invaluable help.

Gems from the world wide web

👓 candera/causatum: A Clojure library for generating streams of events based on stochastic state machines. [clojure, library, testing]
A Clojure library for generating streams of timed events based on stochastic state machines - i.e. like a Finite State Machine (FSM), but each possible transition has a probability (and also a delay, which I read as "cost ~ time to perform the transition"). The event-stream then gives you a stream of [current state, time counter, info about transition taken (weight, delay)]. You could use this e.g. to model a user's behavior on a webpage. It gives you more control than randomly generating actions with test.check.
(For test.check, there is also fsm-test-check, described in the post Verifying State Machine Behavior Using test.check).
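
To make the idea of a stochastic state machine concrete, here is a minimal sketch in plain Clojure. It does not use causatum's API; the model shape and the names `model`, `pick-weighted`, and `event-stream` are made up for illustration, and the delays are simply added to a time counter.

```clojure
;; Hypothetical model: state -> possible transitions, each with a weight and a delay.
(def model
  {:home     [{:to :product  :weight 3 :delay 5}
              {:to :exit     :weight 1 :delay 1}]
   :product  [{:to :checkout :weight 1 :delay 10}
              {:to :home     :weight 2 :delay 3}]
   :checkout [{:to :exit     :weight 1 :delay 2}]})

(defn pick-weighted
  "Picks one transition with probability proportional to its :weight."
  [^java.util.Random rng transitions]
  (let [total (reduce + (map :weight transitions))
        roll  (* (.nextDouble rng) total)]
    (loop [[t & more] transitions, acc 0]
      (let [acc (+ acc (:weight t))]
        (if (or (< roll acc) (empty? more))
          t
          (recur more acc))))))

(defn event-stream
  "Lazy seq of {:state :time :transition} maps. Ends when the current
   state has no outgoing transitions (here: :exit)."
  [model rng state t-now]
  (lazy-seq
    (when-let [transitions (seq (get model state))]
      (let [{:keys [to] dt :delay :as transition} (pick-weighted rng transitions)
            t-next (+ t-now dt)]
        (cons {:state to :time t-next :transition transition}
              (event-stream model rng to t-next))))))

;; A seeded Random makes a run repeatable; `take` guards against long walks.
(def events (take 20 (event-stream model (java.util.Random. 42) :home 0)))
```

Each element of `events` says which state was entered, at what time, and via which transition, which is roughly the information the library's event stream is described as carrying.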

👓 Tree-sitter|Introduction [library, parsing]
"Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited." Intended, it seems, primarily for editors. Written in C, with bindings for Java and other languages.

👓 xdrop/fuzzywuzzy: Java fuzzy string matching implementation of the well known Python's fuzzywuzzy algorithm. Fuzzy... [library, java, search]
Dependency-free fuzzy string matching for Java based on the FuzzyWuzzy Python algorithm, using a fast implementation of Levenshtein distance to calculate similarity between strings. Simple to use, lightweight.
Similar but heavier: clj-fuzzy (a native clj(s) implementation of a bunch of popular algorithms dealing with fuzzy strings and phonetics), org.apache.commons.text.similarity.

👓 API Hub - Free Public & Open Rest APIs | Rapid [SaaS, startup]
API providers (such as startups) can leverage RapidAPI to easily add auth*, billing, monitoring, test sandbox UI for users, etc. I have no experience with it but it could be useful to speed up building/testing your API-focused startup.

👓 carp-lang/Carp: A statically typed lisp, without a GC, for real-time applications. [programming languages, learning]
Carp - if Rust and Clojure had a baby :-) A Clojure-inspired, statically typed & compiled Lisp without garbage collection (but with ownership tracking instead). Interesting and already usable, though with a bunch of bugs. Marked as a research project, with a warning not to use it for anything important just yet (v0.5). Written in Haskell, started in 2016.

🎥 A great talk by Christian Johansen, a Clojurist and highly experienced frontend dev (one time author of the 📘 Test-Driven...
A great talk by Christian Johansen, a Clojurist and highly experienced frontend dev (one-time author of the 📘 Test-Driven JavaScript Development) about framework-agnostic best practices for building maintainable web applications (and a little about his Portfolio tool). The main points? First, UI should be a "pure" function of data (what he calls data-driven UIs). I.e. components only get data from their parent - they do not fetch anything themselves with useEffect or a Re-frame subscription. Second, the UI is written in the language of the UI and does not use any domain terms - the incoming domain data first passes through a pure prepare function, which translates it into the UI domain. E.g. you don't have a FacilitySelector with meter-name and street, but a Dropdown with a title and details - which you can reuse for displaying highly different kinds of data. The argument is that a frontend dev is in the domain of building UIs and not in the business domain of e.g. providing electricity to end consumers. And I trust Christian's experience on this point.
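
The split between domain data and UI data could look roughly like the sketch below. It is only an illustration of the idea, not code from the talk: the `:facility/*` keys, `prepare-dropdown`, and the hiccup-style `dropdown` component are all hypothetical names.

```clojure
;; Domain data, as the backend might deliver it (hypothetical keys).
(def facilities
  [{:facility/id 1 :facility/meter-name "Meter A" :facility/street "Main St 1"}
   {:facility/id 2 :facility/meter-name "Meter B" :facility/street "Side St 7"}])

(defn prepare-dropdown
  "Pure function: translates domain data into the vocabulary of the UI.
   Nothing downstream of it knows what a facility or a meter is."
  [facilities selected-id]
  {:label   "Facility"
   :options (mapv (fn [{:facility/keys [id meter-name street]}]
                    {:id        id
                     :title     meter-name
                     :details   street
                     :selected? (= id selected-id)})
                  facilities)})

(defn dropdown
  "Generic component: a pure function from UI data to hiccup.
   It fetches nothing and can be reused for any kind of data."
  [{:keys [label options]}]
  [:div.dropdown
   [:h3 label]
   (into [:ul]
         (map (fn [{:keys [id title details selected?]}]
                [:li {:key id :class (when selected? "selected")}
                 [:strong title] " " [:span details]]))
         options)])

;; Rendering is then just function composition:
(def rendered (dropdown (prepare-dropdown facilities 1)))
```

Because both functions are pure, the translation can be tested with plain data and no browser.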

👓 Norris Numbers [opinion, productivity, software development]
A great post, posing the hypothesis that applications at different scales (LoC) need (radically) different approaches to write. With a particular approach, you will eventually hit a wall of how much complexity you can manage - e.g. at ± 2000, 20k, 200k, ... lines. Just as running will eventually plateau and require you to switch to a car, then a plane, ...

What’s the key to breaking past that [the 20k limit]? For me, it was keeping things simple. Absolutely refuse to add any feature or line of code unless you need it right now, and need it badly. I already touched on this in Every Line Is a Potential Bug (and sophomorically before that in Simple is Good).

And (highlight mine):
The real trick is knowing when a new feature adds linear complexity (its own weight only) or geometric complexity (interacts with other features). Both should be avoided, but the latter requires extra-convincing justification.

The author also points out that it is difficult to justify the 20k/200k techniques to somebody who has not experienced such large code bases, because they only make sense at scale.

👓 Kitemaker [SaaS, project management]
"End-to-end product development from user feedback to shipping things people want." A new SaaS tool for managing project work. Contrary to Jira, it aims for the whole lifecycle, from specifying what to do, through individual work items, to user feedback. Having that in a single, integrated tool is attractive. I haven't tried it but would like to.

👓 Announcing HoneyEQL 1.0 - Tamizhvendan [clojure, library, database] - HoneyEQL is a Clojure library that enables you to query the database declaratively using the EDN Query Language(EQL). It aims to simplify th
This library looks interesting for when you use an RDBMS with next.jdbc and use joins to fetch an entity with sub-entities. With HoneyEQL, you specify an EQL query such as [:presentation/title {:presentation/slides [:slide/title ...]}] and HoneyEQL will generate the correct SQL with joins and then post-process the result with group-by or something similar to turn the tables back into the tree you actually want: {:presentation/title "Clj Rocks", :presentation/slides [{:slide/title "Intro"},{:slide/title "Conclusion"}]}.
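
To show what turning flat join rows back into a tree involves, here is a hand-rolled sketch in plain Clojure. This is not how HoneyEQL does it internally (I have not checked); the `rows` data and `rows->tree` are hypothetical, and the extra `:presentation/id` column is an assumption standing in for a primary key.

```clojure
;; What a JOIN of presentations and slides might return: one row per slide,
;; with the parent's columns repeated on every row.
(def rows
  [{:presentation/id 1 :presentation/title "Clj Rocks" :slide/title "Intro"}
   {:presentation/id 1 :presentation/title "Clj Rocks" :slide/title "Conclusion"}])

(defn rows->tree
  "Groups the flat rows by their parent columns and nests the child
   columns under :presentation/slides. The order of parents is only
   reliable for a handful of groups, since it follows the map returned
   by group-by."
  [rows]
  (->> rows
       (group-by #(select-keys % [:presentation/id :presentation/title]))
       (mapv (fn [[parent children]]
               (-> parent
                   (dissoc :presentation/id)
                   (assoc :presentation/slides
                          (mapv #(select-keys % [:slide/title]) children)))))))

(def tree (rows->tree rows))
```

Doing this by hand for every query and every level of nesting gets tedious quickly, which is where a library that derives both the SQL and the regrouping from one declarative query earns its keep.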

I work in gaming, so I cannot speak to your specific experiences. Entity Component Systems are extremely performant, really good science, and shipping in middlewares like Unity. However, in order to ship an ECS game, in my experience, you have to have already made your whole game first in a normal approach, in order to have everything be fully specified sufficiently that you can correctly create an ECS implementation. In practice, this means ECS is used to make first person shooters, which have decades of well specified traditions and behavior, and V2 of simulators, like for Cities Skylines 2 and Two Point Campus. - doctorpangloss at Hacker News

I have heard about ECS before, and it is interesting to hear that you actually need to have already designed the game correctly to know how to redesign it with ECS, i.e. that you can't do ECS from scratch.

👓 Stateless, data-driven UIs [opinion, webdev, productivity]
The article Stateless, data-driven UIs explores how to deal with the complexity of frontend development by separating UI from state management and event handling. Christian uses simple, generic components, and all the state management logic is in a pure, testable function. Event handling is also described with data, and thus testable. ❤️
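
Describing event handling as data might look like the following sketch. The action vocabulary (`:assoc-in`, `:toggle`) and the function names are hypothetical and not taken from the article; the point is only that an event is a list of plain data actions, interpreted by one pure function.

```clojure
(defn apply-action
  "Pure interpreter for a single action, a vector of [action path value].
   Unknown actions leave the state unchanged in this sketch."
  [state [action path value]]
  (case action
    :assoc-in (assoc-in state path value)
    :toggle   (update-in state path not)
    state))

(defn handle-event
  "Applies all actions attached to an event, in order."
  [state actions]
  (reduce apply-action state actions))

;; A component carries its event handlers as data, not as functions:
(def menu-button
  {:text     "Menu"
   :on-click [[:toggle [:menu :open?]]
              [:assoc-in [:menu :last-opened-by] :button]]})

(def new-state
  (handle-event {:menu {:open? false}} (:on-click menu-button)))
```

Only a thin impure layer is then needed to call `handle-event` when the click actually happens and swap the result into the app state.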

👓 Polylith in a Nutshell - Polylith [clojure, tool, architecture]
A new and great introduction page for Polylith. Just remember that "software architecture" in this context refers to how your code is structured into modules (which differs from how it is deployed). Polylith is an approach + tooling for structuring your code into small, separate modules that can easily use each other, and be combined in somewhat arbitrary ways to produce deployable artifacts.

👓 Introducing runes [webdev, framework, javascript]
Interesting development in the upcoming Svelte 5, introducing "runes, which unlock universal, fine-grained reactivity." Universal = not limited to .svelte files, more explicit (with $state(val) rather than (only) top-level let ...) and thus easier to control and understand, extending from compile-time detection of dependencies to also runtime. With the pre-5 approach "[..] code is hard to refactor, and understanding the intricacies of when Svelte chooses to update which values can become rather tricky beyond a certain level of complexity." All this is, under the hood, based on Signals, used e.g. by Knockout since 2010, and more recently by Solid. Reportedly, "Signals unlock fine-grained reactivity, meaning that (for example) changes to a value inside a large list needn't invalidate all the other members of the list."
Reading the list of syntax/features that become obsolete with runes, it sounds like a great simplification. And it once again shows that explicit is better than unreliably magical.
There is also a 12 min intro video, if you prefer.

👓 OrbStack 1.0: Fast, light, easy way to run Docker containers and Linux [productivity, devops, tool]
OrbStack 1 is here! This is a great replacement for Docker Desktop, although paid for commercial use ($8/month). We run our dev stack (some 8-10 containers) via Docker, and under Docker Desktop it regularly used to eat quite a big chunk of my CPU. Since I switched to OrbStack a few months back, I have not noticed any such issues. It brands itself as "the fast, light, and easy way to run Docker containers and Linux. Develop at lightspeed with our Docker Desktop alternative." You can read more on its what/why page. And it can also run Kubernetes, though I haven't tried that.

👓 Pagefind | Pagefind — Static low-bandwidth search at scale [tool, authoring]
Pagefind is a single binary you run on your HTML files to produce an index and add a JS + CSS file, which enable you to search the content. I have used it for my blog's search, to replace Custom Google search with its ads. Pagefind is trivial to use - you can just run npx -y pagefind --site <dir> and add their built-in search UI to any page with a few lines of code. Or build your own UI with the JS API. Remember, all this is static, and the search runs fully in the browser. Pagefind is reportedly efficient and can search even large sites, downloading chunks of the index as needed. You can even influence the indexing with a few data- attributes, or take full control with their JS indexing API. You can check out the few changes I had to make to adopt Pagefind. Or go to my search and type e.g. "fulcro". (And of course it is written in Rust! It indexed my 575 pages / 25303 words in ± 5s.)

👓 Roadmap to Tauri 2.0 [library, GUI, Rust, javascript]
Tauri is the attractive, secure-by-default and far more efficient alternative to Electron for building cross-OS desktop apps. The key differences are that Tauri leverages the OS's native WebView instead of bundling Chromium and Node, and its focus on security. You might want to check out this Electron vs Tauri comparison. The upcoming v2 brings support for also creating mobile apps, a much more powerful plugin system (dogfooding FTW!), and support for Swift and Kotlin plugins.

👓 The State of Async Rust: Runtimes [rust, criticism]
An insightful article about the state of async in Rust, warts and all. Key point: Async is hard, especially multi-threaded, so only use it if truly necessary. Often, multithreading without async has a much better cost/benefit ratio. My highlights:
Tokio is by far the most used async runtime. Async-std, the would-be async counterpart of the standard library (std), is effectively abandoned. But Tokio is much more than a runtime, with its extra modules for fs, io, net, etc. That makes it more of a framework for asynchronous programming than just a runtime. The author's main concern with Tokio is that it makes a lot of assumptions about how async code should be written and where it runs. Quote:
The Original Sin of Rust async programming is making it multi-threaded by default. If premature optimization is the root of all evil, this is the mother of all premature optimizations, and it curses all your code with the unholy Send + 'static, or worse yet Send + Sync + 'static, which just kills all the joy of actually writing Rust. — Maciej Hirsz
Multi-threaded also means you need Arc / Mutex for most state. The choice to use Arc or Mutex might be indicative of a design that hasn't fully embraced the ownership and borrowing principles that Rust emphasizes. It's worth reconsidering if the shared state is genuinely necessary or if there's an alternative design that could minimize or eliminate the need for shared mutable state.
Going beyond Tokio, several other runtimes deserve more attention. These runtimes are important, as they explore alternative paths or open up new use cases for async Rust:

  • smol: A small async runtime, which is easy to understand.
  • embassy: An async runtime for embedded systems.
  • glommio: An async runtime for I/O-bound workloads, built on top of io_uring and using a thread-per-core model.
Modern operating systems come with highly optimized schedulers that are excellent at multitasking and support async I/O through io_uring and splice. We should make better use of these capabilities.
Async Rust might be more memory-efficient than threads, at the cost of complexity and worse ergonomics. (In a recent benchmark, async Rust was 2x faster than threads, but the absolute difference was only 10ms per request.)
Thread-based frameworks, like the now-inactive iron, showcased the capability to effortlessly handle tens of thousands of requests per second. This is further complemented by the fact modern Linux systems can manage tens of thousands of threads.

👓 Choosing a more optimal `String` type [rust, performance, experience]
Intriguing post from a Sentry Rust SDK maintainer, exploring alternatives to String that have performance characteristics better suited to their use case, i.e.: immutable, copied often, small, Option-al, etc. There is a bunch of alternatives to choose from, such as Arc<str> and smol_str. The author concludes that there is no free lunch, but smol_str with its O(1) clone, small string optimization, wrapping in Option adding nothing to its size, and its heavy use in rust-analyzer seems to be the best candidate for them.
I liked another observation there:

[..] but all other protocol types have way too detailed typing, and are not extensible on the other hand. In a ton of situations we might be better served with just having the option to manually add arbitrary JSON properties.

I.e. types are more of an obstacle than help, and an open, extensible system is preferable when user-controlled data flows through your system (which doesn't care about their details). Something Clojure has been advocating for on a more general level.

🎥 A good 30 min overview of key developments in Java since v7, starting with syntax improvements (try with resources, var with...
A good 30 min overview of key developments in Java since v7 (slides), starting with syntax improvements (try with resources, var with derived type, records, sealed interfaces [i.e. only a fixed number of implementations], switch as an expression - possibly on records, etc.) and concluding with virtual threads, problems with Go-like concurrency (which also apply to core.async) and avoiding them with "structured concurrency".

Handle conflicting files when uberjar-ing in Leiningen

Context: Groovy libraries may include /META-INF/groovy/org.codehaus.groovy.runtime.ExtensionModule to register some extensions to install into JDK. If you uberjar multiple such libraries, you will end up with just a single one of these (last wins). In our case, this is not acceptable and we want to filter out one we don't care about, and fail if there are still multiple. We use Leiningen, and (mis)use its :uberjar-merge-with for this. It takes a map of file pattern -> [fn1: stream->X, fn2: X+X->X, fn3: stream, X -> void].

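  ;; project.clj fragment; the regex key, fn1 and fn2 are an assumed reconstruction (filtering of the unwanted module is not shown)
  :uberjar-merge-with {#"META-INF/groovy/org\.codehaus\.groovy\.runtime\.ExtensionModule"
   [(fn [in] [(slurp in)]) ; fn1: stream -> vector holding the file's content
    into                   ; fn2: merge the vectors of contents
    (fn [out contents]     ; fn3: write the single module, or fail on a conflict
      (if (> (count contents) 1)
        (throw (ex-info "More than 1 groovy extensions in uberjar! (see :uberjar-merge-with in project.clj)"
                        {:cnt (count contents)
                         :modules (mapv #(first (clojure.string/split-lines %)) contents)}))
        (spit out (first contents))))]}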

I hope to never need this again.

👓 Mock Service Worker [library, testing, javascript]
We use this at work in frontend tests, to mock the REST backend. The nice thing is that it intercepts requests at the network level, so it doesn't care which library or API you use to issue HTTP requests.

👓 anteoas/broch: A library for handling numbers with units. [clojure, library]
A library for handling numbers with units - conversion between compatible units, comparison and arithmetic, data literals. Ex: (b/> #broch/quantity[1 "km"] (b/meters 999)) ;=> true

👓 Understanding htmx [webdev, opinion, productivity, architecture]
Intro to HTMX, why Biff uses it, and when it is not suitable (a lot of complex state, as in Google Sheets, or superfast response, as in Google Maps).
Thomas Heller in his critique of HTMX argues (I believe) that you can relatively easily roll your own htmx, thus gaining full control and making sure it will 100% fit your unique needs, while with HTMX you will run into the design walls, since it is made for general use, without knowledge of your needs. Jacob and Thomas have a discussion about it here, where Thomas asserts that the flexibility of your own solution is worth the trade-offs (mentioning "HTMX itself seems a bit too limited for most stuff I do, which often involves some more interactivity than HTMX is capable of."). Jacob reasonably proposes that htmx is superior for people who are still getting up to speed with Clojure web dev and aren't already familiar with CLJS / JS. Though Thomas objects that learning htmx + likely hyperscript is nontrivial, and it is better to spend the time to learn cljs + DOM properly. In Jacob's own apps, he did not run into htmx's limitations (and uses Hyperscript to get more interactivity where needed.) You might also want to check out htmx's own When Should You [not] Use Hypermedia? (meaning htmx).

👓 Intro to Running LLMs Locally [learning, llm, ai]
What, how, and why of running LLMs (Large Language Models - think ChatGPT & friends) locally, from Clojure. Reportedly, many models are available to download and run locally even with modest hardware. Conclusion: LLMs really only have one basic operation (~ given a seq of tokens, predict probabilities of tokens coming next) which makes them easy to learn and easy to use. Having direct access to LLMs provides flexibility in cost, capability, and usage.
I only skimmed the article, but it seems like something useful to have at hand for when I need it.
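
The "one basic operation" can be illustrated with a toy stand-in for a model. The sketch below is not from the article and does not call any real LLM library: `toy-model` is a hard-coded lookup table, and the generation loop uses the simplest possible strategy (always pick the most probable token).

```clojure
(defn toy-model
  "Stand-in for an LLM: given the tokens so far, returns a map of
   {next-token probability}. A real model computes this from billions
   of parameters; here it is a lookup on the last token only."
  [tokens]
  (get {"the" {"cat" 0.6 "dog" 0.4}
        "cat" {"sat" 0.9 "<end>" 0.1}
        "dog" {"sat" 0.5 "<end>" 0.5}
        "sat" {"<end>" 1.0}}
       (peek tokens)
       {"<end>" 1.0}))

(defn greedy-next
  "Picks the most probable next token."
  [probs]
  (key (apply max-key val probs)))

(defn generate
  "Repeatedly asks the model for next-token probabilities and appends
   the chosen token, until <end> or max-tokens new tokens."
  [model prompt max-tokens]
  (loop [tokens (vec prompt), n 0]
    (let [next-token (greedy-next (model tokens))]
      (if (or (= next-token "<end>") (>= n max-tokens))
        tokens
        (recur (conj tokens next-token) (inc n))))))

(def completion (generate toy-model ["the"] 10))
```

Everything else (chat, summarization, sampling with temperature) is built by varying how the prompt is constructed and how the next token is chosen from those probabilities.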


Thank you for reading!

Tags: newsletter

Copyright © 2024 Jakub Holý