Holy on Dev
Simulating network timeouts with toxiproxy
Goal: Simulate how a Node.js application reacts to timeouts.
Solution: Use toxiproxy and its timeout "toxic" with the value of 0, i.e. the connection won't close, and data will be delayed until the toxic is removed.
1. Start toxiproxy, exposing the port 6666 that we intend to use:
docker pull shopify/toxiproxy
docker run --name=toxiproxy --rm --expose 6666 -p 6666:6666 -it shopify/toxiproxy
(If I were on Linux and not OSX, I could use --net=host and wouldn't need to expose and/or map the port.)
2. Tell toxiproxy to serve requests at 6666 via an upstream service:
docker exec -it toxiproxy /bin/sh
/ # cd /go/bin/
/go/bin # ./toxiproxy-cli create upstream -l 0.0.0.0:6666 -u google.com:443
3. Modify your code to access the local port 6666 and test that everything works.
Since we want to access Google via HTTPS, we would get a certificate error when accessing it via localhost:6666 (e.g. "SSLHandshakeException: PKIX path building failed: [..] unable to find valid certification path to requested target" in Java, or (much better) "(51) SSL: no alternative certificate subject name matches target host name 'localhost'" in curl), so we will add an alias for localhost to /etc/hosts and use https://proxied.google.com:6666 in our connecting code (instead of the https://google.com:443 we had there before). Verify that it works and the code gets a response as expected.
Note: google.com is likely a bad choice here since it will return 404 unless you specify the header "Host: www.google.com"; only with it will you get 200 OK back.
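Concretely, the alias and a quick sanity check could look like this (a sketch; the 127.0.0.1 assumes toxiproxy's port is mapped to the local machine as in step 1):
# /etc/hosts
127.0.0.1 proxied.google.com
# verify - should return 200 OK thanks to the Host header:
curl -i -H "Host: www.google.com" https://proxied.google.com:6666/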
4. Tell toxiproxy to have an infinite timeout for this service
Continuing our toxiproxy configuration from step 2:
./toxiproxy-cli toxic add -t timeout -a timeout=0 upstream
(Alternatively, e.g. timeout=100; then the connection will be closed after 100 ms.)
5. Trigger your code again. You should get a timeout now.
Tip: You can simulate the service being down via disabling the proxy:
./toxiproxy-cli toggle upstream
Aside: Challenges when proxying through Toxiproxy
The host header
Servers (e.g. google.com, example.com) don't like it when the Host header (normally derived from the URL) differs from what they expect. So you either need to make it possible to access localhost:<toxiproxy port> via the upstream server's hostname, by adding it as an alias to /etc/hosts (but how do you then access the actual service?), or you need to override the Host header. In curl that is easy with -H "Host: www.google.com", but not so in Java.
In Java (openjdk 11.0.1 2018-10-16) you need to pass -Dsun.net.http.allowRestrictedHeaders=true to the JVM at startup to enable overriding the Host header (the Oracle JVM might allow doing that at runtime) and then:
(doto ^HttpURLConnection (.openConnection (URL. "https://proxied.google.com:6666/"))
  (.setRequestProperty "Host" "www.google.com"))
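For completeness, a minimal Clojure sketch of the whole call; the proxied.google.com alias and port 6666 come from the steps above, and the JVM must have been started with the flag just mentioned:
(import '(java.net URL HttpURLConnection))

(let [^HttpURLConnection conn (.openConnection (URL. "https://proxied.google.com:6666/"))]
  ;; without -Dsun.net.http.allowRestrictedHeaders=true this header is silently dropped
  (.setRequestProperty conn "Host" "www.google.com")
  (println "Status:" (.getResponseCode conn))
  (slurp (.getInputStream conn)))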
SSL certificate issues
As described above, when talking to HTTPS via Toxiproxy, you need to ensure that the hostname you use in your request is covered by the server's certificate, otherwise you will get SSL errors. The solution described here - adding e.g. proxied.<server name, e.g. google.com> to your /etc/hosts - works provided that the certificate is also valid for subdomains, i.e. is issued for both <server> and *.<server>, which is not always the case.
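To find out whether a certificate covers such subdomains, you can inspect its Subject Alternative Names, e.g. with openssl:
openssl s_client -connect google.com:443 -servername google.com </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"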
Alternatively, you can disable certificate validation - trivial in curl with -k but much more typing in Java.
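For the curious, this is roughly what the "much more typing" amounts to - a sketch using standard JDK classes via Clojure interop that trusts everything (only ever for local experiments, never production):
(import '(javax.net.ssl HttpsURLConnection SSLContext TrustManager X509TrustManager HostnameVerifier)
        '(java.security.cert X509Certificate))

;; DANGER: disables all certificate and hostname verification
(let [trust-all (reify X509TrustManager
                  (checkClientTrusted [_ _ _])
                  (checkServerTrusted [_ _ _])
                  (getAcceptedIssuers [_] (make-array X509Certificate 0)))
      ctx (doto (SSLContext/getInstance "TLS")
            (.init nil (into-array TrustManager [trust-all]) nil))]
  (HttpsURLConnection/setDefaultSSLSocketFactory (.getSocketFactory ctx))
  (HttpsURLConnection/setDefaultHostnameVerifier
    (reify HostnameVerifier
      (verify [_ _ _] true))))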
Demonstration: Applying the Parallel Change technique to change code in small, safe steps
The Parallel Change technique makes it possible to change code in small, safe steps by first adding the new way of doing things without breaking the old one ("expand"), then switching over to the new way ("migrate"), and finally removing the old way ("contract", i.e. make smaller). Here is an example of applying it in practice to refactor code producing a large JSON document that contains a dictionary of addresses in one place and refers to them by their keys in other places. The goal is to rename a key. (We can't use simple search & replace for reasons.) A sketch of the three phases follows below.
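A hypothetical Clojure sketch of the three phases (the key names are invented for illustration):
;; Phase 1 - expand: write the address under both the old and the new key,
;; so that consumers reading either of them keep working
(defn store-address [data addr]
  (-> data
      (assoc-in [:addresses :billing-addr] addr)      ; old key
      (assoc-in [:addresses :billing-address] addr))) ; new key

;; Phase 2 - migrate: switch each consumer, one at a time, from
;; (get-in data [:addresses :billing-addr]) to
;; (get-in data [:addresses :billing-address]), releasing after each change

;; Phase 3 - contract: once nothing reads :billing-addr any more,
;; remove it from store-address so that only the new key is produced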
It Is OK to Require Your Team-mates to Have Particular Domain/Technical Knowledge
Should we write stupid code that is easy to understand for newcomers? It seems like a good thing to do. But it is the wrong thing to optimise for, because a newcomer is the rare case. Most of the time you will be working with people experienced in the code base. And if there is a new member, you should not just throw her into the water and expect her to learn and understand everything on her own. It is better to optimise for the common case, i.e. people that are up to speed. It is thus OK to expect and require that the developers have certain domain and technical knowledge - and to spend resources to ensure that is the case with new members. Simply put, you should not dumb down your code to match the common knowledge but elevate new team-mates to the baseline that you have defined for your product (based on your domain, the expected level of experience and dedication, etc.).
Don't add unnecessary checks to your code, pretty please!
Defensive programming suggests that we should add various checks to our code to ensure the presence and proper shape and type of data. But there is one important rule: only add a check if you know the condition it guards against can really happen. Don't add random checks just to be sure - you would be misleading the next developer.
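A hypothetical illustration of such a misleading check:
;; Misleading: if users always have a :name (say, a NOT NULL column), this
;; branch suggests a case that can never happen and sends the next developer
;; hunting for nameless users
(defn display-name [user]
  (if (:name user)
    (:name user)
    "Unknown"))

;; Better: state the assumption explicitly and fail fast if it is violated
(defn display-name [user]
  {:pre [(:name user)]}
  (:name user))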
2015 in review
The WordPress.com stats helper monkeys prepared a 2015 annual report for this blog.
A Costly Failure to Design for Performance and Robustness
I have learned that it is costly not to prioritise expressing one's design concerns and ideas early. As a result, we have a shopping cart that is noticeably slow, goes down whenever the backend experiences problems, and is a potential performance bottleneck. Let's have a look at the problem, the actual design and my ideal one, and their pros and cons.
We have added shopping cart functionality to our web shop, using a backend service to provide most of the functionality and to hold the state. The design focus was on simplicity: the front-end is stateless, any change to the cart is sent to the backend, and the current content of the cart is always fetched anew from it, to avoid the complexity of maintaining and syncing state in two places. Even though the backend wasn't designed for the actual front-end needs, we managed to work around that. The front-end doesn't need to do much work, and in this regard the design is a success.
Why we practice frontend-first design (instead of API-first)
Cross-posted from the TeliaSonera tech blog
Troubleshooting And Improving HTTPS/TLS Connection Performance
Our team has struggled with slow calls to the back-end, resulting in unpleasant, user-perceivable delays. A direct (HTTP) call to a backend REST service took around 50 ms, while our median time - using HTTPS and going through a proxy between us and the service - was around 300 ms.
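A good first step in such troubleshooting is to see where the time goes; with curl this is easy (the URL is a placeholder):
curl -o /dev/null -s -w "DNS: %{time_namelookup}s TCP: %{time_connect}s TLS: %{time_appconnect}s total: %{time_total}s\n" \
  https://backend.example.com/api/resource
If time_appconnect dominates, the TLS handshake itself is the culprit, pointing towards e.g. missing connection reuse or TLS session resumption.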
Moving Too Fast For UX? Genuine Needs, Wrong Solutions
Cross-posted from the TeliaSonera tech blog
Our UX designer and interaction specialist - a wonderful guy - has shocked us today by telling us that we (the developers) are moving too fast. He needs more time to do proper user experience and interface design – talk to real users, collect feedback, design based on data, not just hypotheses and gut feeling. To do this, he needs us to slow down.
We see a common human "mistake" here: the expression of a genuine need gets mixed in with a suggestion for satisfying it. We are happy to learn about the need and will do our best to satisfy it (after all, we want everybody to be happy, and we too love evidence-based design), but we want to challenge the proposed solution. There is never just one way to satisfy a need - and the first proposed solution is rarely the best one (not to mention that this particular one goes against the needs of us, the developers).
To upgrade or not to upgrade dependencies? The eternal dilemma
Failed attempt one: Let tools do it
Originally we let npm do minor upgrades automatically, but that turned out to be problematic: even minor version changes can introduce bugs, and having potentially different (minor) versions on different machines and in production makes troubleshooting difficult.
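One way to avoid such drift is to pin exact versions and lock the whole dependency tree (a sketch; save-exact and shrinkwrap are standard npm features):
# make npm save exact versions instead of ^x.y.z ranges
npm config set save-exact true
# freeze the versions of the entire dependency tree
npm shrinkwrap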
Storytelling as a Vehicle of Change: Introducing ClojureScript for the Heart and Mind
People don't really like changes, yet change we must in this fast-developing world. How to introduce a change - or rather, how to inspire people to embrace a change? That is one of the main questions of my professional life.
- An experienced speaker once recommended sharing personal experiences (even - or especially - if they make me vulnerable) as it is much easier for people to relate to them than to general statements.
- A Cognicast episode mentioned storytelling as a great tool for introductory guides. We humans are natural storytellers; we think in stories and relate to them much more easily - so a story should also be a great way to communicate the value of a change.
- My ex-colleague Therese Ingebrigtsen gave an inspiring talk presenting some points from The Switch - mainly that we need to address the recipients' minds with rational arguments but also their hearts to engage their emotions (e.g. by painting a picture of the bright new future), and that it is important to show a clear path forward.
An answer to CircleCI's "Why we're no longer using Core.typed"
CircleCI has recently published a very useful post, "Why we're no longer using Core.typed", that raises some important concerns regarding Typed Clojure which, in their particular case, led to the costs outweighing the benefits. CircleCI has a long and positive relationship with Ambrose Bonnaire-Sergeant, the main author of core.typed, who has addressed their concerns in his recent Strange Loop talk "Typed Clojure: From Optional to Gradual Typing" (gradual typing is also explained in his 6/2015 blog post "Gradual typing for Clojure"). For the sake of searchability, and for those of us who prefer text to video, I would like to summarise the main points of the response (spiced with some thoughts of my own).
Refactoring & Type Errors in Clojure: Experience and Prevention
While refactoring relatively simple Clojure code to use a map instead of a vector, I wasted perhaps a few hours due to what were essentially type errors. I want to share the experience and my thoughts about possible solutions, since I encounter this problem quite often. I should mention that it is quite likely more a problem (an opportunity? :-)) with me than with the language - namely with the way I write and (don't) test it.
The core of the problem is that I write chains of transformations based on my sometimes flawed idea of what data I have at each stage. The challenge is that I cannot see what the data is and have to maintain a mental model while writing the code, and I suck at it. Evaluating the code in the REPL as I develop it helps somewhat but only when writing it - not when I decide to refactor it.
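A hypothetical miniature of this kind of failure - code written against a vector of pairs that breaks after the data is refactored into a map:
(def addresses [[:home "Oslo"] [:work "Bergen"]])

;; works fine on the vector of pairs:
(map (fn [[k v]] (str (name k) ": " v)) addresses)
;; => ("home: Oslo" "work: Bergen")

;; after refactoring the data to a map, the destructuring above still works
;; (map entries destructure like vectors), but positional access breaks:
(def addresses {:home "Oslo" :work "Bergen"})
(nth addresses 0)
;; => UnsupportedOperationException: nth not supported on this type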
Nginx: Protecting upstream from overload on cache miss
These 2 magical lines will protect your upstream server from possible overload when many users try to access the same uncached or expired content:
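For reference, directives to this effect in standard Nginx would plausibly look like this (a sketch, assuming proxy caching is already configured):
# serve stale content while the cache entry is being refreshed...
proxy_cache_use_stale updating;
# ...and let only one request per key populate the cache at a time
proxy_cache_lock on;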
Shipping a Refactoring & Feature One Tiny Slice at a Time, to Reduce Risk
You don't need to finish a feature - and your users don't need to see it - to be able to release it and start battle-testing it. Slice it as much as possible and release the chunks ASAP to shorten the feedback loop and decrease risk.
My colleagues have been working on a crucial change in our webshop: replacing our legacy shopping cart and checkout process with a new one and implementing some new, highly desired functionality that this change enables. We had decided to decrease the risk of the change by doing it first only for product accessories. However, the business wanted the new feature included, and that required changes to the UI. But the UI has to be consistent across all sections, so we would need to implement it also for the main products before going live - which would necessitate implementing also the more complex process used by the main products (and not yet supported by the new backend). And suddenly we had a load of work that would take weeks to complete and would be released in a big-bang deployment.
Such a large-scale and time-consuming change, made without any feedback from reality whatsoever and then released all at once, impacting all our sales - I find that really scary (and I have fought against this before). It is essentially weeks of building up risk and then releasing it in one big kaboom. How could we break it down and release it in small slices, without making the business people unhappy?