Holy Dev Newsletter December 2023
Welcome to the Holy Dev newsletter, which brings you gems I found on the web, updates from my blog, and a few scattered thoughts. (Also available online.) I am always eager to read your comments and ideas so do not hesitate to press the reply button!
I have published three blog posts, about Postgres and Rama:
PostgreSQL & JDBC: How to select rows matching pairs of values, which solves a problem I have long pondered: Given a sequence of tuples of property values in app code, how do you efficiently select all records in your DB that match, without constructing a monstrous SQL like
(prop1=? AND prop2=? AND …) OR …? I need this e.g. when searching entity history for entities with particular IDs and versions. The solution is unnest with multiple arguments - see the post for details and a number of other goodies.
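The core of the trick, as I understand it, is Postgres' multi-argument unnest, which zips parallel arrays into a multi-column row set you can join against. A minimal sketch (the table and column names are made up for illustration):

```sql
-- Pass the IDs and versions as two parallel array parameters; unnest(a, b)
-- zips them into a two-column row set that we join against the target table.
SELECT e.*
FROM entity_history e
JOIN unnest(?::bigint[], ?::int[]) AS pairs(id, version)
  ON e.id = pairs.id
 AND e.version = pairs.version;
```

The query text stays the same size no matter how many pairs you send, and the planner can still use an index on (id, version).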
Exploring Rama, the platform for writing backends 100x more efficiently - I have finally finished the post I began 4 months ago, when I started studying Rama. The pitch: Red Planet Labs’ Rama is an integrated platform for writing and operating robust, distributed, and scalable backends 100x more efficiently. With it comes a new paradigm - dataflow oriented programming. So what is it all about? In this post I aim to give you a rough idea of what Rama is, what it offers, and why you absolutely should be interested in it.
Hands on Rama, day 1: Setup, idempotent create & update - following up on the theoretical study that resulted in the previous blog post, here I write about my experiences getting my hands dirty with Rama. I hope to get some more time soon to continue with this exciting journey.
My main focus in the past and coming weeks is learning Rama and building Wolframite.
Regarding Wolframite, I have long been excited about Stephen Wolfram’s incredibly powerful programming language with superpowers of the same name, which aims at making the world "computable," with a huge built-in knowledge base, the ability to combine all its entities (e.g. zoom and repeat an image, rendering it in an interactive UI), and high-level capabilities such as self-optimizing machine learning algorithms. Wolframite brings this power to Clojure, making it possible to call Wolfram from Clojure (by sending computations to a Wolfram Kernel).

This library has existed for a long time under the name Clojuratica, then was taken over by Pawel Ceranka and radically refactored. However, Pawel was sucked into his startup before he managed to wrap up this work. Somehow, I have ended up committing to bring this to completion, so that Wolframite can be added to the toolbox of our awesome, growing SciCloj community. I have been incredibly lucky in meeting Thomas Clark, a physicist and Clojurist who is proficient with Wolfram/Mathematica and highly motivated to make it awesome. (You can watch his and Daniel’s Conj 2023 talk to learn why.)

So far, I have merged in Pawel’s work, improved docstrings, sped up internalization of Wolfram functions into a namespace, and added a bunch of small improvements. The next big thing is updating and rewriting the docs, adding tests, and renaming the namespaces. You can keep an eye on the progress and plans here.
A GPU-powered web app and React library to visualize huge network graph data and machine learning embeddings, to gain insights into complex relationships and patterns that are often challenging to discern. Leverage components such as Timeline, Histogram, Search, and more. Free for non-commercial use.
Not sure how well it works in practice, or what the cost/benefit ratio is, but it sounds very interesting. I'd want such "honeypots" in my infrastructure.
Why is the database testing tool Jepsen written in Clojure? Because of: Great support for concurrency (JVM + immutable data, atoms, etc.), clients for all DBs thanks to JVM, REPL great for interactive explorations, best library for transforming data, very stable language. Cons: smaller community, no broadly accepted static type system.
A wonderful small library by a colleague for generating (and optionally persisting) graphs of test data. Its awesome, brief yet knowledge-packed and highly digestible tutorial will give you a very good idea about everything it can do. In short: create inter-related entities, where some values may be derived from other values or generated by no-arg functions, persist them to a database, and propagate DB-generated values (e.g. IDs) to all affected test data. Easily build your test data stepwise in exactly the shape and quantities you need, mixing randomness and control as you want, overriding defaults where necessary. Check out Why Fabrikk? for a better understanding and alternatives.
An interesting idea: log a message + values with Timbre but also add each log entry, with arbitrary key-value entries, to an in-memory graph, which can be queried and explored via RDF triplets and everything IGraph offers.
Transform data structures based on powerful patterns and substitutions. Based upon the core ideas described in the excellent book Software Design for Flexibility by Hanson and Sussman.
See the Nov 2022 talk for London Clojurians.
Amusing but also sobering tips for sabotaging one’s company as any kind of technical or product leader. Are you sure your company is not doing one of these things? (Thanks, Erik!)
A concise introduction to blockchain, NFTs, and Web3, and an eloquent argument for why they are bad solutions. Written by a programmer and one-time expert for the German Bundestag (parliament) on the topic of blockchains, their value, and their regulation.
Web3 is an unclear concept, but it typically includes decentralisation, leveraging blockchains. The author explains blockchains (and Proof of Work, Proof of Stake), NFTs, and DAOs (“decentralized autonomous organizations”, basically smart contracts with a mission). The hard problem of blockchains: getting consensus in a decentralised network to resolve conflicts. Web3 criticism:
- bad engineering - blockchains neither perform (e.g. 5 or 30 transactions per second, compared to Visa’s 2,000, with capacity for 24,000) nor scale
- a security disaster - no consumer protection, if you lose a token (or it gets stolen, defrauded), it is lost
- NFTs don’t really do what they claim to do; you own a thing that says you own a thing.
- The Oracle Problem: from within a system, you cannot determine the truth of statements about the outside of that system. Thus a blockchain can’t function on its own for anything in the real world (such as ownership of a physical object); you’d need “oracles” you trust for that => there goes the no-authority/decentralization approach.
- Climate cost
- Web3 doesn’t deliver - economics and societal pressure will produce centralisation (e.g. there are already only a few exchanges for tokens)
A tiny intro into what flame graphs are, then we dig into interpreting a real one, filtering, sorting, slicing, etc. Highly useful, especially if you use clj-async-profiler. We learn why we might want to use sorting by width or the reversed view, how to remove irrelevant data, collapse recursive calls, and re-merge unnecessarily split code paths (in this case, by lazy-seq), to get a clearer picture.
Learn to compare profiling results in two flame graphs by using differential flame graphs (built from the two flame graphs’ .txt files).
📜 Programming as Theory Building, an essay by Peter Naur
The wonderful 1985 (yes, still very relevant) essay Programming as Theory Building argues that a program is a shared mental construct (a “theory”) that lives in the minds of the people who work on it. If you lose the people, you lose the program. The code is merely a written representation of the program, and it’s lossy, so you can’t reconstruct a program from its code.
I.e. programming is about achieving a certain kind of insight, a theory, of the matters at hand - not about artifacts. It is about building knowledge. Source code plus all available documentation can never capture this knowledge fully, as demonstrated by a few experiences from the real world.

The programmers’ knowledge should be regarded as a theory, which is something that characterizes an intellectual, knowledge-based activity: “[..] a person who has or possesses a theory in this sense knows how to do certain things and in addition can support the actual doing with explanations, justifications, and answers to queries [..]”. A key factor is that the person is able to apply the knowledge to novel situations. The theory includes the mapping between the program and aspects of the real world, knowing why the program is the way it is, and the ability to extend the program to meet new needs, based on understanding it and the similarities between the new and existing needs. This cannot be reduced to a set of rules.

If we adopt the Theory Building View, as opposed to programming as text editing, then we have no ground to expect a program modification to be cheap. Building in flexibility is not an answer, because it is itself very costly, and it relies on the (severely lacking) ability to predict what future modifications will be needed. You need a person with a live insight into the program to find similarities between existing capabilities and the newly requested one and to determine the best way to add it. “For a program to retain its quality it is mandatory that each modification is firmly grounded in the theory of it.” The theory of a program is something that could not conceivably be expressed in full, as it is inextricably bound to human beings.
Building a theory to fit and support an existing program text is a difficult, frustrating, and time-consuming activity. You have a better chance of success if you build a new program, which will likely cost no more, and possibly less.
Programmer education should also focus on building their ability to formulate theories [in this sense].
Delivering acceptable performance is not a technical problem per se — it's a management issue, and one that teams can conquer with the right frame of mind and support. Performance is about reducing latency and variance across interactions in a session, with a particular focus on the tail of the distribution (P75+). Bad performance => lost users and thus revenue, reduced engagement, etc.
Teams progress through a hierarchy of performance management practice phases:
Level 0: fully unaware.
Level 1 (firefighting, starting to track): pick and balance (all) the right metrics => look at industry standards (e.g. Web Vitals) & get advice. Build a strong model of the user [needs] and an understanding of your systems to know what matters most.
Level 2: Global Baselines & Metrics - found objective, industry-standard metrics / reference points that correlate with their business success. A sense of shared ownership over performance, performance work framed in terms of business value. Continual reporting against these standard metrics. Do strive to uncover what matters most to you, not to drown in metrics.
Level 3: P75+, Site-specific Baselines & Metrics - realise that global metrics/values can’t fully fit your product’s UX and find what really matters for you. Begin to map key user journeys and track the influence of performance across the full conversion funnel => add custom, relevant metrics. Percentile thinking: median (P50) isn’t the most important, P75, P90, and P95 are. Histograms are key. Teams at Level 3 begin to understand their distributions are nonparametric, and they adopt more appropriate comparisons in response. Enable slicing the data by percentile, geography, device type, etc. Integrate metrics with experimentation frameworks to track the effect of new changes. Build a lab for accurate measurements, correlate lab and production metrics. Management support for consistent performance.
Level 4: Variance Control & Regression Prevention - realise the impact that variance has on UX and start managing the tail latency (i.e. P75+). Automated tests check the performance of PRs and block those that impact user flows badly => you must understand which flows and scenarios are worth the effort (cost x benefit of developing and running them). Watch for slow, cumulative performance degradation; watch trends over a longer time. Start with latency budgeting and attribute slowness to product features. Start a "performance team", or a group of experts whose job it is to run investigations and drive infrastructure to better inform inquiry.
Level 5: Strategic Performance - fully institutionalise performance management and come to understand it as a strategic asset, and it becomes a part of the culture. “Teams that reach top-level performance have management support at the highest level. Those managers assume engineers want to do a good job but have the wrong incentives and constraints, and it isn't the line engineer's job to define success — it's the job of management.”
The article closes with a great list of Questions for Senior Managers, including “Is there a shared understanding in the leadership team that slowness costs money/conversions/engagement/customer-success?”, and about the constraints and support given to teams.
An API that returns the requested number of random person records, each with a name, email, address, coordinates, and more. Supports various query params, such as results=1000&seed=123.
“Transforming data programmatically is great, but we don't have to stop there. We can describe also the data models and data transformations as data and write an interpreter or compiler for it” - with Malli (schema def, data coercion) and Meander (a great library for creating transparent data transformations). A neat idea to convert between source and target data formats in a declarative way, with schema validation. Meander does the conversion based on a pattern -> expression you manually write (with any custom inline functions, so it is not purely declarative), while Malli validates and coerces the in/output data.
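A minimal sketch of the idea (my own toy example, not from the article, with made-up schema and field names): Malli validates the input shape, while Meander's pattern/expression pair does the reshaping.

```clojure
(require '[malli.core :as malli]
         '[meander.epsilon :as m])

;; Source data model, described as data (a Malli schema):
(def Person
  [:map [:name :string] [:address [:map [:city :string]]]])

(defn ->target
  "Transform a Person into the (hypothetical) target format."
  [person]
  ;; The left-hand pattern is matched against the input; logic variables
  ;; (?name, ?city) carry the captured values into the output expression.
  (m/match person
    {:name ?name, :address {:city ?city}}
    {:full-name ?name, :location ?city}))

(let [input {:name "Ada", :address {:city "London"}}]
  (when (malli/validate Person input)
    (->target input)))
;; => {:full-name "Ada", :location "London"}
```

As the article notes, the inline functions you can embed in the Meander expression mean this is not purely declarative, but the pattern still documents the source-to-target mapping far better than hand-written transformation code.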
Thank you for reading!
This newsletter is produced by Jakub Holý, a blogger and programming buddy / mentor for hire. If you don’t want to receive it anymore, simply respond with "unsubscribe" in the subject.