How generating test data saved us from angry customers and legal trouble

Me: Simple! First we sum all the raw charges. Then we sum the totals over all categories - the two should be equal.
Computer: … Nope!
Me: What?! Let me see… Are you telling me that -0.01 + -0.05 is different from -0.06?!
Computer: Yep!

That’s how I learned (again) never to use doubles for monetary amounts. I would never have thought of it myself (though I should have) had we not used generative testing to produce random test data (popular troublemakers included): data I wouldn’t have thought of, such as -0.01 and -0.05, which cannot be represented exactly as doubles. Now that we have safely switched to BigDecimal and angry customers and lawsuits are off the table, you might wonder what generative testing is about and how it works.
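The surprise from the dialogue can be reproduced in a few lines of Java. This is a minimal sketch (the class and method names are made up for illustration): the double sum of -0.01 and -0.05 does not compare equal to -0.06, while the same sum computed with BigDecimal does.

```java
import java.math.BigDecimal;

public class MoneySum {
    // -0.01 and -0.05 have no exact binary representation, so their double sum
    // rounds to the double next to the one nearest -0.06, and == fails.
    static boolean doubleSumMatches() {
        double sum = -0.01 + -0.05;
        return sum == -0.06;
    }

    // BigDecimals built from strings hold the exact decimal values.
    static boolean bigDecimalSumMatches() {
        BigDecimal sum = new BigDecimal("-0.01").add(new BigDecimal("-0.05"));
        // compareTo ignores the scale (-0.060 vs -0.06); equals would not.
        return sum.compareTo(new BigDecimal("-0.06")) == 0;
    }

    public static void main(String[] args) {
        System.out.println("double:     " + doubleSumMatches());     // false
        System.out.println("BigDecimal: " + bigDecimalSumMatches()); // true
    }
}
```

Note that the BigDecimal values are created from strings: new BigDecimal(-0.01) would faithfully copy the binary rounding error of the double literal.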

Instead of hardcoding inputs and expected outputs, as in traditional “example-based” testing, you randomly generate inputs and check the outputs against rules (“properties”) that you define, such as “the output of (sort list) should have the same elements and length as list; also, each element is >= its predecessor”. You can generate as many inputs as you want; 100 is a popular choice.
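To make the idea concrete, here is a hand-rolled sketch of the sort property in plain Java. It is not how the tests in this post were written (a library such as Clojure's test.check, or jqwik for Java, would also shrink a failing input to a minimal one); the generator and its choice of “troublemaker” values are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class SortProperty {
    // The property: same length and elements, and each element >= its predecessor.
    static boolean holds(List<Integer> input, List<Integer> output) {
        if (input.size() != output.size()) return false;
        if (!counts(input).equals(counts(output))) return false;
        for (int i = 1; i < output.size(); i++) {
            if (output.get(i) < output.get(i - 1)) return false;
        }
        return true;
    }

    // Multiset of elements: value -> number of occurrences.
    static Map<Integer, Integer> counts(List<Integer> xs) {
        Map<Integer, Integer> m = new HashMap<>();
        for (int x : xs) m.merge(x, 1, Integer::sum);
        return m;
    }

    // Random list with "troublemakers": empty lists, duplicates and extreme values
    // (Integer.MIN_VALUE/MAX_VALUE catch comparator bugs such as `a - b`).
    static List<Integer> randomList(Random rnd) {
        int n = rnd.nextInt(20); // 0..19 elements, so the empty list shows up too
        List<Integer> xs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int pick = rnd.nextInt(4);
            if (pick == 0) xs.add(Integer.MIN_VALUE);
            else if (pick == 1) xs.add(Integer.MAX_VALUE);
            else xs.add(rnd.nextInt(11) - 5); // -5..5: a small range => many duplicates
        }
        return xs;
    }

    // Check the property on `trials` generated inputs; return the first failing input, or null.
    static List<Integer> check(int trials, long seed) {
        Random rnd = new Random(seed); // fixed seed => reproducible failures
        for (int t = 0; t < trials; t++) {
            List<Integer> input = randomList(rnd);
            List<Integer> output = new ArrayList<>(input);
            Collections.sort(output);
            if (!holds(input, output)) return input;
        }
        return null;
    }

    public static void main(String[] args) {
        List<Integer> failure = check(100, 42L);
        System.out.println(failure == null ? "100 trials passed" : "Failed for " + failure);
    }
}
```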


Continue reading →

Highlights from the talk 'Exploring four hidden superpowers of Datomic'

During our regular “tech lunch,” our minds were blown by the talk Lucas Cavalcanti & Edward Wible - Exploring four hidden superpowers of Datomic (slides), which summarizes the key benefits a startup bank in Brazil got from using this revolutionary database as the core of its technical backbone. We would like to share our highlights from the talk and a few other interesting resources.


Continue reading →

Translating an enterprise Spring webapp to Clojure

How do we translate the concepts and patterns we know from enterprise Java applications to Clojure? It is not just a different syntax; the whole philosophy of the language is different. The thing is, many concepts and patterns do not translate: you just do things differently. We will briefly look at how we can solve common enterprise concerns in Clojure, compared to Java.

This post is intended for experienced Java developers curious about how their object-oriented, enterprise know-how would translate into the world of functional programming.

If you are short on time, just scan the Summary table and read Basic principles, perhaps together with the Clojure primer, to make sense of it.


Continue reading →

Clojure vs Java: Troubleshooting an application in production

I have just gone through the painful experience of troubleshooting a remote Java webapp in a production-like environment and longed for Clojure’s explore-and-edit-running-app REPL. I want to demonstrate and contrast the tools the two languages offer for this case.


Continue reading →

Clojure vs Java: The benefit of Few Data Structures, Many Functions over Many Unique Classes

In Clojure we use the same few data structures again and again and have many functions operating on them. Java programmers, on the other hand, create a unique class for every grouping of data, with its own “API” (getters, setters, return types, …) for accessing and manipulating the data. Having been forced to translate between two such “class APIs”, I want to share my experience and thus demonstrate in practical terms the truth in the maxim:

It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures.
- Alan Perlis in Epigrams on Programming (1982)
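As a hypothetical illustration (not code from the post): once data lives in plain maps rather than bespoke classes, translating from one shape to another means applying generic functions, here Java counterparts of Clojure's select-keys and clojure.set/rename-keys, instead of writing a mapper for each pair of classes. The customer data and key names are invented.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GenericData {
    // Like clojure.core/select-keys: keep only the given keys (missing keys are skipped).
    static Map<String, Object> selectKeys(Map<String, Object> data, List<String> keys) {
        Map<String, Object> result = new HashMap<>();
        for (String k : keys) {
            if (data.containsKey(k)) result.put(k, data.get(k));
        }
        return result;
    }

    // Like clojure.set/rename-keys: rename keys according to `renames` (old -> new).
    // All old keys are removed before the new ones are added, so swapping keys works too.
    static Map<String, Object> renameKeys(Map<String, Object> data, Map<String, String> renames) {
        Map<String, Object> result = new HashMap<>(data);
        result.keySet().removeAll(renames.keySet());
        for (Map.Entry<String, String> r : renames.entrySet()) {
            if (data.containsKey(r.getKey())) result.put(r.getValue(), data.get(r.getKey()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> customer = Map.of("orgNr", "12345678", "name", "ACME AS", "internalId", 42);
        // Translate to another "API" shape with two generic calls instead of a hand-written mapper;
        // the result contains only organizationId and name.
        Map<String, Object> forOtherService =
                renameKeys(selectKeys(customer, List.of("orgNr", "name")), Map.of("orgNr", "organizationId"));
        System.out.println(forOtherService);
    }
}
```

The same two functions work for any pair of shapes, which is the point of the maxim: the functions are written once, not once per class.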


Continue reading →

Solution design in Java/OOP vs. Clojure/FP - I/O anywhere or at the boundaries? - experience

As a Clojure developer thrown into an “enterprise” Java/Spring/Groovy application, I have a unique opportunity to experience and think about the differences between functional (FP) and object-oriented programming (OOP) and their approaches to design. Today I want to compare how the solution would differ for a small subsystem responsible for detecting data discrepancies and driving the process of fixing them. The main question we will explore is where we deal with external effects, i.e. I/O.

(We are going to explore here an application of the Functional Core, Imperative Shell architecture. You can learn more about it in the Related resources section.)
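A minimal sketch of the Functional Core, Imperative Shell idea in Java (16+, for records). Everything here is hypothetical: the types and rules are invented for illustration and are not the subsystem from the post. The core decides what to do from plain data; only the shell, which receives its I/O as functions, touches the outside world.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class DiscrepancyCore {
    // Hypothetical domain types, for illustration only.
    record Discrepancy(String id, boolean fixRequested) {}

    interface Action {}
    record RequestFix(String id) implements Action {}
    record Close(String id) implements Action {}

    // Functional core: a pure function from data to decisions. No I/O, trivial to unit test.
    static List<Action> decide(List<Discrepancy> open, Set<String> fixedUpstream) {
        List<Action> actions = new ArrayList<>();
        for (Discrepancy d : open) {
            if (fixedUpstream.contains(d.id())) {
                actions.add(new Close(d.id()));      // fixed elsewhere: close it
            } else if (!d.fixRequested()) {
                actions.add(new RequestFix(d.id())); // not reported yet: ask for a fix
            }                                        // otherwise: keep waiting
        }
        return actions;
    }

    // Imperative shell: all I/O happens here, at the boundaries.
    static void runOnce(Supplier<List<Discrepancy>> loadOpen,
                        Supplier<Set<String>> loadFixedUpstream,
                        Consumer<Action> perform) {
        List<Action> actions = decide(loadOpen.get(), loadFixedUpstream.get());
        actions.forEach(perform);
    }

    public static void main(String[] args) {
        // In a real app the suppliers would query a DB/REST API; here they return canned data.
        runOnce(() -> List.of(new Discrepancy("a", false), new Discrepancy("b", true)),
                () -> Set.of("b"),
                action -> System.out.println("Would perform: " + action));
    }
}
```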


Continue reading →

It will only take one hour… (On why programmers suck at estimating and the perils of software development)

“It will only take about an hour,” I said to her. Two days later, a pull request awaits review. Where has all that time gone? What are the sources of delay in software development and how can we make it faster?


Continue reading →

Java/Spring App Troubleshooting on Steroids with Clojure REPL

(Published originally at the Telia Engineering blog.)

We have a Java/Groovy Spring Boot webapp, mainly running a bunch of batch jobs fetching, transforming and combining data. It is challenging to troubleshoot production issues because some production APIs are only accessible from the production servers and it is difficult and possibly dangerous to run the application in full production setup locally. Fortunately, we can now connect a REPL to the running application, get hold of its Spring beans, and interact with it (invoking remote calls, checking the returned data, …), which is a real life-saver and something I want to demonstrate and describe here.

Aside: What is a REPL? A REPL - or Read-Eval-Print Loop - is an “interactive terminal” into your live application, where you can inspect data, call functions, and (re)define code - in the context of the running application. It enables interactive development (and troubleshooting). The REPL puts all the power of our development tools and programming language at our fingertips, and the immediate feedback we get from it enables us to iterate quickly toward the solution or answer we are looking for.

First an example. Imagine that the logs tell you that a job failed to fetch /subscribers information for 30 of our business customers (we are a mobile operator, among other things) due to 404 Not Found. What do you do? Simple; first, get into the REPL:

$ kubectl-shell-into-jobs-app.sh # runs kubectl exec ...
/var/app# ./repl-in.sh # runs env LOADER_MAIN=nrepl.main .. java -cp jobs-app.jar  ..

then:

user=> (def orgs ["12345678" ...]) ;; paste from logs
user=> (map #(-> (.getAgreementStatus (bean "CustomersServiceImpl") %)
                 deref
                 jbean
                 (select-keys [:status :name])) orgs)
({:status "ACTIVE_AGREEMENT", :name "TROLLBYGG HOLDING AS"} ...
user=> (def results *1)
user=> (filter #(not= "ACTIVE_AGREEMENT" (:status %)) results)
() ;; none => all are active

We have just used Clojure and a few helper functions (bean to find Spring Beans, jbean = clojure.core/bean) to fetch the status of each of the troublesome organizations and verified that all are active customers. Now we have enough information to talk to the Customer Service developers and ask them for help.

Other examples where the REPL in the live app proved incredibly useful were:

  • Find out the actual REST service URL used in production (fetched from a configuration service with complex rules)

  • Check whether a static property contains the version + git sha that we tried to get into it

  • Retry a failed call after infrastructure fixes

Soon I will answer the questions certainly swirling in your head:

  1. How do I start Clojure REPL from my Java application?

  2. How do I expose Spring Beans to it?

  3. How do I connect to it?

  4. What helper functions do I need/want?

Implementation

We create a standard Spring bean with an @Autowired ApplicationContext and start the REPL from a method annotated with @PostConstruct so that it is only run after the app context has been made available.

The challenge here was to expose the context to Clojure, which we do via (intern 'user '<var name> <var value>) (we cannot def it from the outside). (Alternatively, we could make our Clojure setup code expose a function and invoke that function, passing the context as a parameter.)

The REPL server itself is started from within the clojure-repl-init.clj (see below) loaded within start. We could just .invoke the start-server function from the Groovy code but since we have other Clojure code to load, it is simpler to just do it there as well.

Here is the relevant (Groovy) code:

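import clojure.java.api.Clojure
import clojure.lang.Agent
import clojure.lang.IFn
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.context.ApplicationContext
import org.springframework.stereotype.Component

import javax.annotation.PostConstruct
import javax.annotation.PreDestroy

@Component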
class ClojureReplServer {

    // public so that we can access it from Clojure
    public final static int port = 55555

    @Autowired
    private ApplicationContext ctx

    private IFn symbol = Clojure.var("clojure.core", "symbol")
    private IFn intern = Clojure.var("clojure.core", "intern")
    private IFn stopReplServer = Clojure.var("user", "stop-repl-server")
    private Object symUser = symbol.invoke("user")

    @PostConstruct
    start() {
        String res = ""
        res += intern.invoke(symUser, symbol.invoke("_injected-spring-ctx"), ctx)
        res += intern.invoke(symUser, symbol.invoke("_injected-port"), port)
        res += intern.invoke(symUser, symbol.invoke("_injected-ClojureReplServer"), this)

        // Run the init code and start the server
        IFn loadReader = Clojure.var("clojure.core", "load-reader")
        def reader = new InputStreamReader(getClass().getClassLoader().getResourceAsStream("clojure-repl-init.clj"))
        res += "|" + loadReader.invoke(reader)
        println("ClojureReplServer started at port ${port}, res=" + res)
    }

    @PreDestroy
    stop() {
        // Here we use `stop-repl-server` defined in clojure-repl-init.clj:
        stopReplServer.invoke()
        Agent.soloExecutor.shutdown()
    }

    /** Called from Clojure when we screw up and need to reset vars etc */
    void reset() {
        // Calling start is enough; the server will not be started again thanks to `defonce`
        start()
    }
}

Gotchas: You might have noticed I use clojure.core/load-reader to load the Clojure setup code. I originally used load-string but it evaluates the code within the clojure.core namespace and you cannot change that, while you want to be able to def[n] stuff in the REPL user’s user namespace.

Here is the Clojure setup code, with helper functions and initialization of the REPL server:

(in-ns 'user)
(require
    '[clojure.reflect :refer [reflect]]
    '[clojure.repl :refer [doc]]
    '[clojure.string :as s]
    '[nrepl.server :as n]))

;; Provide an alias since we are going to use `bean` for Spring:
;; (we need the value to be fn? and to have the bean's docstring; I don't know of a better way:)
(intern 'user (with-meta 'jb    (meta #'clojure.core/bean)) clojure.core/bean)
(intern 'user (with-meta 'jbean (meta #'clojure.core/bean)) clojure.core/bean)

;; Helper functions
(defn help
    "List our helper functions (and vars)"
    []
    (println "Helper functions available in the user namespace:")
    (->> (vals (ns-publics 'user))
        (filter #(fn? (deref %)))
        (map #(let [{:keys [name doc]} (meta %)]
                (str "* " name (if doc (str " :- " doc) ""))))
        (sort)
        (s/join "\n")
        (println))
    (println "\nYou can also use `(doc a-fn)` and `(reflect an-object)`.")
    (println "Remember that *1 holds the result of the last call and *e the last error."))

(defn list-beans
    "List all Spring Beans; ex: `(list-beans)`"
    []
    (seq (.getBeanDefinitionNames user/_injected-spring-ctx)))

(defn find-bean
    "List all Spring bean names containing the given substring (case-insensitive)"
    [substring]
    (filter
        #(re-matches
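            (re-pattern
                ;; Pattern/quote makes characters such as `.` or `$` in the substring match literally
                (str "(?i).*" (java.util.regex.Pattern/quote substring) ".*")) %)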
        (list-beans)))

(defn bean
    "Get Spring Bean by a name (from (list-beans)); ex: `(bean \"configService\")`"
    [name]
    (.getBean user/_injected-spring-ctx name))

(defn members
    "Show public methods, fields of a bean; ex: `(members aBean)`"
    [bean]
    (->> bean clojure.reflect/reflect :members (filter (comp :public :flags)) (map :name)))

(defonce server (n/start-server :port user/_injected-port))

(defn stop-repl-server
    "Called from ClojureReplServer upon exit; don't use directly"
    []
    (n/stop-server server))

(defn reset
    "Reset the pre-defined functions and vars in case you messed them up. Does not remove vars you created (we'd need something like clojure.tools.namespace.repl/refresh for that)."
    []
    (.reset user/_injected-ClojureReplServer))

And the Gradle dependencies:

// build.gradle:
compile "org.clojure:clojure:1.10.0"
compile "nrepl:nrepl:0.5.0"

Comments

I use nREPL, but it is probably overkill; I could just as well use the built-in Clojure Socket REPL.

Connecting to the REPL

If we used the Clojure Socket REPL or nREPL with the tty transport, we could simply use telnet localhost 55555 (or nc). We could even install the unravel REPL client for a rich user experience.

But since we run nREPL with its default transport, we need to use the nREPL client. It is a little tricky, but it is possible to invoke it from the Spring Boot application jar:

env LOADER_MAIN=nrepl.main LOADER_ARGS="--connect --host 127.0.0.1 --port 55555" \
  rlwrap java -cp myapp.jar org.springframework.boot.loader.PropertiesLauncher

(rlwrap is optional and can be omitted from the command line; however it makes editing in the REPL much nicer.)

Aside: Security

It might seem scary to enable REPL access to a production application. Whether it is something for you or not depends on multiple factors: the trust and skill level in your team and the domain you work in.

If you are security-conscious, you can mandate and enforce that any live coding is done by a pair of developers (our experience is that having two pairs of eyes also really helps to get things right) and you can log and review REPL sessions. (I don’t need to mention that you should secure access to the REPL port by all means, do I?)

If you are afraid that a change/fix will be applied in production but not to the version-controlled source code, you can automatically restart or even re-deploy the application after a finished REPL session.

Conclusion

Having a REPL to the live prod application has been invaluable for troubleshooting. Truth be told, it doesn’t really matter whether it is Clojure, Groovy or perhaps CRaSH. Anything that allows us to invoke Java methods and process and display data would do. (Though I have made good use of Clojure’s map/filter/deref/… and the fact that it has a remote REPL built in gives it a head start.)

Despite its usefulness, a Clojure REPL in a Java app is a far cry from the interactive development and troubleshooting afforded by a Clojure app. I can only invoke existing methods; I cannot change them, for example to check whether adding a particular query parameter to a REST URL would fix a problem (it would) or to inject more logging.


Continue reading →

How to use Clojure 1.10 pREPL to connect to a remote server (WIP)

Clojure 1.10 includes a new, program-friendly REPL, or prepl (pronounced “preppy,” not p-repl). However, there is still very little documentation about it, though some is reportedly in the making (it is alpha, after all). Here I want to demonstrate how to start it and how to connect to it in a primitive way (I hope to improve the user experience of the client eventually).

Update 22/3: Check out O. Caldwell’s Clojure socket prepl cookbook.


Continue reading →

AWS RDS: How to find out login credentials to the database

To log in to your AWS RDS database (Oracle in my case) you need login credentials, but what are these for a newly created DB? The password is the master user password you entered during DB creation and which you can change via the Console.

To find out the master user name:


Continue reading →

Migrating from Wordpress.com to a static site generated by GatsbyJS

I am moving my blog over from Wordpress.com to a statically generated site using Gatsby. Wordpress has served me well for many years, but it isn’t really fit for writing (about) code and the latest updates have made it even more difficult for me. With Gatsby I get a fast site and full control over everything (using JavaScript, React, and any of the tons of plugins for Gatsby).

The content from my old blog is coming soon, it is work in progress :-)

If you are interested in the details of the migration, check out the blog code at GitHub, especially blog2md-master/index.js and gatsby-node.js. The core of the process is using blog2md to turn the Wordpress export XML into .json files, then loading those into Gatsby and using custom code in gatsby-node.js to create an “adapter” node type ContentPage for both the new Markdown-based posts and the old ones from the .json. In practice it is somewhat more complicated than it sounds, just as always.


Continue reading →

Java: Simulating various connection problems with Toxiproxy


Simulate various connection issues with Toxiproxy and Java's HttpURLConnection to see what kind of errors get produced: connect timed out vs. read timed out vs. connection refused, etc.

Results:

System: openjdk 11.0.1 2018-10-16

  1. (.setConnectTimeout 1) => java.net.SocketTimeoutException: connect timed out
  2. (.setReadTimeout 1) => javax.net.ssl.SSLProtocolException: Read timed out on HTTPS, java.net.SocketTimeoutException: Read timed out on HTTP (or Toxiproxy with 5s latency or timeout)
  3. Nothing listening at the port => java.net.ConnectException: Connection refused
  4. Toxiproxy with no upstream configured (i.e. the port is open, but nothing happens with the connection) => javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake on HTTPS, java.net.SocketTimeoutException: Read timed out on HTTP
  5. limit_data_downstream => java.io.IOException: Premature EOF

(What I haven't been able to simulate (yet) is "connection interrupted/broken", i.e. java.net.SocketException Connection reset (perhaps you closed it and try to write to it anyway?) and java.net.SocketException Connection reset by peer (perhaps when dropped by a firewall/the server/...?).)
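Cases #3 and #4 (the plain-HTTP variant) can also be reproduced without Toxiproxy, using only a local ServerSocket. This is a sketch under that substitution, not the setup used for the results above: closing a freshly bound socket leaves nothing listening, while a ServerSocket that never calls accept() still lets the OS complete the TCP handshake, so the client only fails once the read timeout expires. The class and method names are made up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.ConnectException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

public class ConnectionProblems {
    // Issue a plain-HTTP GET to 127.0.0.1:port and report which kind of failure occurs.
    static String classify(int port) {
        try {
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://127.0.0.1:" + port + "/").toURL().openConnection();
            conn.setConnectTimeout(5000); // generous, so a refused connection is not reported as a timeout
            conn.setReadTimeout(500);
            try (InputStream in = conn.getInputStream()) {
                return "ok";
            }
        } catch (SocketTimeoutException e) { // "connect timed out" or "Read timed out"
            return "timeout: " + e.getMessage();
        } catch (ConnectException e) {       // e.g. "Connection refused"
            return "refused: " + e.getMessage();
        } catch (IOException e) {
            return "other: " + e;
        }
    }

    // Bind to any free port, then release it again, so nothing is listening there.
    static int freePort() {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Case #3: nothing listening at the port.
    static String nothingListening() {
        return classify(freePort());
    }

    // Like case #4 over HTTP: the port is open and the OS completes the TCP handshake
    // from the listen backlog, but nobody ever calls accept() or sends a response.
    static String silentServer() {
        try (ServerSocket s = new ServerSocket(0)) {
            return classify(s.getLocalPort());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nothingListening());
        System.out.println(silentServer());
    }
}
```

The exact exception messages may differ between JDK versions and operating systems, which is why the sketch classifies by exception type.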

The setup

Prerequisites

To /etc/hosts add:

127.0.0.1       proxied.google.com

The toxiproxy setup

Start toxiproxy:

docker pull shopify/toxiproxy
# BEFORE we `run` it: case #3
docker run --rm -p 5555:5555 -p 6666:6666 -p 8474:8474 --name toxiproxy -it shopify/toxiproxy

Configure it (we could just POST to :8474 but using the CLI is easier):

$ docker exec -it toxiproxy /bin/sh
/ # cd /go/bin/
# ./toxiproxy-cli create google -l 0.0.0.0:6666 -u www.google.com:443 # BEFORE this is run: case #4
# ./toxiproxy-cli toxic add google -t latency -a latency=5000 # case #2
Added downstream latency toxic 'latency_downstream' on proxy 'google'
# ./toxiproxy-cli toxic remove google -n latency_downstream
Removed toxic 'latency_downstream' on proxy 'google'

# ./toxiproxy-cli toxic add google -t  timeout -a timeout=2000 # case #2
Added downstream timeout toxic 'timeout_downstream' on proxy 'google'
# ./toxiproxy-cli toxic remove google -n timeout_downstream
Removed toxic 'timeout_downstream' on proxy 'google'

# ./toxiproxy-cli toxic add google -t limit_data -a bytes=5000 # case #5
Added downstream limit_data toxic 'limit_data_downstream' on proxy 'google'

The test code

(import '[java.net URL HttpURLConnection])
(->
  (doto ^HttpURLConnection (.openConnection (URL. "https://proxied.google.com:6666/"))
    ;; BEWARE: JVM *must* be started with `-Dsun.net.http.allowRestrictedHeaders=true` to allow setting the Host:
    (.setRequestProperty "Host" "www.google.com")
    (.setConnectTimeout 1000)
    (.setReadTimeout 1000))
  (.getInputStream)
  slurp)

Background

Read my Simulating network timeouts with toxiproxy to learn why we need to bother with /etc/hosts and the Host header.


Continue reading →

Clojure - comparison of gnuplot, Incanter, oz/vega-lite for plotting usage data

What is the best way to plot memory and CPU usage data (mainly) in Clojure? I will compare gnuplot, Incanter with JFreeChart, and vega-lite (via Oz). (Spoiler: I like Oz/vega-lite most but still use Incanter to prepare the data.)

The data looks like this:

;; sec.ns | memory | CPU %
1541052937.882172509 59m 0.0
1541052981.122419892 78m 58.0
1541052981.625876498 199m 85.9
1541053011.489811184 1.2g 101.8


The data has been produced by monitor-usage.sh.
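Before plotting, the raw lines need to be parsed. A minimal Java sketch, assuming that the columns are whitespace-separated and that the m/g suffixes mean MiB/GiB (as in the output of tools like top); monitor-usage.sh itself is not shown here, so these units are an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class UsageData {
    // One sample: epoch seconds (as a double, which keeps well under microsecond precision,
    // enough for plotting), memory in MiB, CPU in percent (may exceed 100 on multi-core machines).
    record Sample(double epochSeconds, double memoryMiB, double cpuPercent) {}

    // ASSUMPTION: "m" = MiB, "g" = GiB.
    static double toMiB(String mem) {
        char unit = mem.charAt(mem.length() - 1);
        double value = Double.parseDouble(mem.substring(0, mem.length() - 1));
        switch (unit) {
            case 'm': return value;
            case 'g': return value * 1024;
            default: throw new IllegalArgumentException("Unknown memory unit in: " + mem);
        }
    }

    // Parse lines like "1541052937.882172509 59m 0.0", skipping blank and ";;" comment lines.
    static List<Sample> parse(List<String> lines) {
        List<Sample> samples = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith(";;")) continue;
            String[] cols = trimmed.split("\\s+");
            samples.add(new Sample(Double.parseDouble(cols[0]), toMiB(cols[1]), Double.parseDouble(cols[2])));
        }
        return samples;
    }
}
```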

The tools



Gnuplot 5



Gnuplot is the simplest, with a lot available out of the box. But it is also somewhat archaic and not very flexible.


Continue reading →

How I got fired and learned the importance of communication and play time

When I came to the office one late autumn morning in 2005, I was shocked to find out that, without any warning signs whatsoever, I had been fired. That day I learned the importance of communication. Their criticism was justified, but the thing is, nobody bothered to tell me anything during my 11 months in the company. I received exactly 0 feedback about my behaviour or work. The company ended up in court with its client, which both explains why they were stressed and was itself caused by bad communication. So communication, even or especially under stress, is really important. It must be open, transparent, and broad.



The funny thing is that I still do the things they fired me for.




Continue reading →

How good monitoring saved our ass ... again

You know how it goes - suddenly people complain your app does not work, you are getting plenty of timeouts or other errors in your error tracking tool, you find the backend app that is misbehaving and finally "fix" the problem by restarting it. Phew!

But why? What caused the downtime? A glitch in an upstream system? Sudden overload due to a spike in concurrent users? Trolls?

You know that it helps sometimes to zoom out, to get the right perspective. Here the perspective was 7 days:



It was enough to look at this chart with the right zoom to see at once that something happened on October 23rd that caused a significant change in the behavior of the application. A quick search confirmed that the change in CPU usage corresponded with a deployment, and a quick revert to the previous version confirmed the culprit. (It would have been even easier if we showed deployments on these charts.)

This is not the first time good monitoring saved us. A while ago we regularly struggled with the application becoming sluggish and had to restart it again and again. A graph of the Node.js event loop lag showed it increasing over time. Once it was on the same dashboard as Node's heap usage, we could see at once that it correlated with increasing memory usage, indicating a memory leak. A few hours of experimenting and heap dump analysis later, the problem was fixed.

So good monitoring is paramount.

Of course the trick is to know what to monitor and to display all relevant metrics in such a way that you can spot important relations. I am still working on improving that...
Continue reading →

Copyright © 2020 Jakub Holý
Powered by Cryogen
Theme by KingMob