Holy on Dev: Performance & Performance Testing for Webapps

! work in progress !

Key Performance Goals

Requirement format: “X should be less than L in P % of cases when the load is U users”

Request throughput
Latency
Max/avg response time from the end user’s point of view
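
A requirement written in the format above can be checked mechanically against measured samples. The following is only a minimal sketch: the function name `meets_requirement` and the sample values are assumptions made for illustration, not something taken from a particular tool.

```python
def meets_requirement(samples_ms, limit_ms, percent):
    """Return True if at least `percent` % of the samples are below `limit_ms`."""
    if not samples_ms:
        raise ValueError("no samples collected")
    below = sum(1 for s in samples_ms if s < limit_ms)
    return 100.0 * below / len(samples_ms) >= percent


# Hypothetical response times (ms) measured under a load of U = 100 users.
samples = [120, 135, 150, 180, 200, 210, 240, 260, 300, 900]

# "Response time should be less than 500 ms in 90 % of cases": 9 of 10 samples qualify.
print(meets_requirement(samples, limit_ms=500, percent=90))
# The stricter limit of 250 ms is met by only 7 of 10 samples.
print(meets_requirement(samples, limit_ms=250, percent=90))
```

Note that a single slow outlier (900 ms here) does not break a percentile-based requirement, which is the reason for preferring this format over a plain maximum.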

Performance Metrics

The Utilization Saturation and Errors (USE) Method

The USE Method of performance analysis focuses on getting a complete overview of a system (without forgetting anything) and quickly discovering most of its performance problems. The main tool is a system-dependent checklist of resources and, for each resource, metrics of utilization, saturation (i.e. work that has to wait), and errors. Read more on its page, which also contains checklists for some systems.

Java EE App Stress Testing (A. Bien)

Goal: Test contention (⇒ dead/live-locks), transaction isolation, caching behavior, consistency, robustness, performance, memory consumption.

Memory, current heap size
Typical / peak # of worker threads
Usual depth of the request queue
# rolled back transactions (f.ex. due to optimistic locking) vs. successful ones
# requests/sec
# & length of major garbage collections
# DB connections
Size of JPA caches
All of these should be stable, i.e. not grow (too much) over time or with growing load.
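
One way to make "stable" concrete is to fit a straight line through the samples of a metric and look at its slope. The sketch below does this with a plain least-squares fit; the function names, the 1 % threshold and the sample values are all assumptions chosen for illustration, and a real test would need a threshold tuned to the metric in question.

```python
def growth_per_sample(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def is_stable(values, max_relative_growth=0.01):
    """Assumed rule: stable if the metric grows by less than 1 % of its mean per sample."""
    mean = sum(values) / len(values)
    if mean == 0:
        return all(v == 0 for v in values)
    return growth_per_sample(values) / abs(mean) < max_relative_growth


# Hypothetical heap sizes (MB) sampled after each major GC during a stress test.
steady_heap = [300, 305, 298, 302, 301, 299]
leaking_heap = [300, 320, 345, 360, 390, 410]

print(is_stable(steady_heap))   # fluctuates around 300 MB
print(is_stable(leaking_heap))  # grows by roughly 22 MB per sample
```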

Ex.: JMeter + VisualVM (with the MBean plugin to monitor custom caches and with Visual GC) to observe the behavior live.

Level: Browser

TBD (network latency, rendering time, … - use Firebug’s timing capability or some similar browser plugin)

Level: Server

Resource utilization (from New Relic docs):

CPU busy [%] - the percentage of the time that the system is using the CPU
Disk busy [%] - the percentage of the time that the system is performing Disk IO
Memory used [%]
Disk space used [%]
Network utilization [Mb/s]
Drilling down into processes - their count, CPU, memory
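
As an illustration of what "CPU busy" means in practice, the sketch below derives it from two snapshots of the aggregate `cpu` line of `/proc/stat` on Linux. This is one common way to compute it, not necessarily how New Relic does it; treating iowait as non-busy time is an assumption, and the snapshot values are made up.

```python
def cpu_busy_percent(before, after):
    """CPU busy % between two aggregate 'cpu ...' lines from /proc/stat (Linux)."""
    def parse(line):
        # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
        # Guest time is already included in user/nice, so only the first 8 are summed.
        fields = [int(x) for x in line.split()[1:9]]
        idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait (assumption)
        return sum(fields), idle

    total_before, idle_before = parse(before)
    total_after, idle_after = parse(after)
    total = total_after - total_before
    if total <= 0:
        raise ValueError("snapshots must be taken in order and at different times")
    return 100.0 * (total - (idle_after - idle_before)) / total


# Hypothetical snapshots taken a few seconds apart.
snapshot_1 = "cpu  100 0 50 800 50 0 0 0"
snapshot_2 = "cpu  200 0 100 1100 100 0 0 0"
print(f"CPU busy: {cpu_busy_percent(snapshot_1, snapshot_2):.1f} %")
```

On a real machine the two lines would come from reading `/proc/stat` twice with a pause in between.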

Layer: Servlet Container / EJB Container

TBD (JVM heap, threads, …)

Servlet Container
- # threads (i.e. # concurrent requests being processed, unless NIO is used)
- …
Database
- No. free connections in connection pools
- Avg. time connection is used by a thread (how long away from the pool)
EJB Container
- Bean pool utilization
- …
…

Layer: Application

TBD (# concurrent users, # errors/exceptions, …)

Layer: Database

TBD

Common Performance Issues

Apache

(I don’t remember the resource for this :-( )

No more File Descriptors
- symptoms: entry in error log, new httpd children fail to start, fork() failing everywhere
- solution: increase system-wide limits, increase ulimit via apachectl
Sockets stuck in time_wait
- symptoms: unable to accept new connections, CPU under-utilized & httpd processes idle, not swapping, netstat shows many sockets in time_wait
- many time_wait sockets are to be expected; they are only a problem when new connections are failing ⇒ decrease the system-wide TCP/IP FIN timeout
High Mem Usage (swapping)
- symptoms (ignore system free memory, it is misleading): heavy disk activity, top/free show high swap usage, load gradually increasing, ps shows processes blocking on disk I/O
- solution: add memory, …
CPU overload
- symptoms: top shows little/no CPU idle time, not swapping, high load, much CPU time spent in userspace
- solution: add CPU
Interrupt (IRQ) overload
- symptoms (frequent on 8+ CPU machines): not swapping, 1-2 CPUs busy and the rest idle, low total load
- solution: add NIC
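
To check the time_wait symptom listed above, the socket states reported by netstat can be tallied. The sketch below assumes the column layout printed by `netstat -tan` from Linux net-tools (state in the sixth column); the function name and the sample output are hypothetical.

```python
from collections import Counter


def socket_states(netstat_output):
    """Count TCP sockets per state in the text output of `netstat -tan`."""
    counts = Counter()
    for line in netstat_output.splitlines():
        parts = line.split()
        # Data rows look like: tcp 0 0 <local addr> <foreign addr> <STATE>
        if len(parts) >= 6 and parts[0].startswith("tcp"):
            counts[parts[5]] += 1
    return counts


# Hypothetical excerpt of `netstat -tan` output.
sample_output = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:80             10.0.0.9:51234          TIME_WAIT
tcp        0      0 10.0.0.5:80             10.0.0.9:51235          TIME_WAIT
tcp        0      0 10.0.0.5:80             10.0.0.7:40112          ESTABLISHED
"""

states = socket_states(sample_output)
print(states["TIME_WAIT"], "sockets in TIME_WAIT out of", sum(states.values()))
```

A large TIME_WAIT count on its own is normal; it only matters in combination with failing new connections.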

Doing Performance Testing Correctly

Jetty on Load Testing

The Jetty High Load Howto has some good tips on creating realistic load testers and on configuring the load-testing and server machines (TCP buffer sizes, in/outbound connection queue size, # file descriptors, # ports, congestion control). For example:

A common mistake is that load generators often open relatively few connections that are kept totally busy sending as many requests as possible over each connection. This causes the measured throughput to be limited by request latency (see Lies Damned Lies and Benchmarks for an analysis of such an issue).
Another common mistake is to use a TCP/IP connection for a single request and to open many, many short-lived connections. This will often result in accept queues filling and limitations due to file descriptor and/or port starvation.
A load generator should model well the traffic profile from the normal clients of the server. For browsers, this is mostly between 2 and 6 connections that are mostly idle and that are used in sporadic bursts with read times in between. The connections are mostly long-held HTTP/1.1 connections.

It recommends the Cometd Load Tester as a good example of a realistic load generator.
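
The first mistake can be illustrated with simple arithmetic. If every connection sends its next request only after the previous response has arrived, the measured throughput can never exceed the number of connections divided by the latency, no matter how fast the server is. The sketch below only computes that bound; the function name and the numbers are assumptions for illustration.

```python
def max_closed_loop_throughput(connections, latency_s):
    """Upper bound on requests/sec when each connection waits for a response before sending again."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return connections / latency_s


# Four fully busy connections: doubling the latency halves the measured throughput.
for latency_ms in (10, 20, 40):
    rps = max_closed_loop_throughput(connections=4, latency_s=latency_ms / 1000)
    print(f"{latency_ms} ms latency -> at most {rps:.0f} requests/sec")
```

Such a test therefore measures the latency of the network and server rather than the capacity of the server.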

Tools

Simple tools

Web servers

ab (Apache Benchmark) - ex: ab -n 10000 -c 250 <page URL> - generate 10k GETs for the URL, issuing 250 in parallel (limited by the number of sockets a process can open on the test machine)
siege - http/https stress tester - available in the software repositories of most Linux distros
wrk - a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue.
fortio - similar to siege; runs at a specified number of queries per second (qps), records a histogram of execution times and calculates percentiles (e.g. p99, i.e. the response time such that 99% of the requests take less than that number, in seconds, the SI unit). It can run for a set duration, for a fixed number of calls, or until interrupted.
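
To make the meaning of a figure such as p99 concrete, here is a minimal sketch of the nearest-rank percentile over raw samples. It is not necessarily how fortio or any other tool listed above computes its percentiles; the function name and data are assumptions for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least p % of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: 1, 2, ..., 100.
response_times_ms = list(range(1, 101))
print("p50:", percentile(response_times_ms, 50))
print("p99:", percentile(response_times_ms, 99))
```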

Databases

MySQL: mysqlslap - simulates a number of clients connecting to the DB and performing a query of your choosing

On-line tools

Blitz.io - load-testing service based on a curl-like tool, very easy to use and with nice graphs; can simulate tens of thousands of concurrent users from different locations around the world
Pingdom - “uptime and performance monitoring made easy” - I haven’t tried this

Apache JMeter

General

Max Number of Concurrent Users

The number of concurrent virtual users JMeter can efficiently simulate depends on the resources of the test machine (memory, thread limits, socket limits, network card speed, …), the complexity of the test, and other load on the system. In general it’s recommended to use 1000 or fewer threads, assuming of course that you avoid extensive report gathering/rendering and otherwise reduce JMeter's own resource consumption, since it takes resources away from the load generation itself. With a badly configured or underpowered machine you can run into problems at a much lower thread count.

Notice also that if you don’t introduce any “think time” into your test plans, then a single JMeter thread can generate much higher load than a human user could, and thus a single thread can correspond to e.g. ten humans.
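
The ratio can be estimated with simple arithmetic: a thread without think time issues one request per response time, while a human issues one request per response time plus think time. The sketch below computes that ratio; the function name and the example timings are assumptions for illustration.

```python
def humans_per_thread(response_time_s, think_time_s):
    """Number of real users (pausing think_time_s between requests) that one non-pausing thread equals."""
    if response_time_s <= 0:
        raise ValueError("response time must be positive")
    return (response_time_s + think_time_s) / response_time_s


# With a 0.5 s response time and 4.5 s of reading between clicks,
# one thread without think time generates the load of ten humans.
print(humans_per_thread(response_time_s=0.5, think_time_s=4.5))
```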

If you need more virtual users, then you need multiple JMeter instances (preferably on multiple machines), either in the master-slave configuration or, if the overhead of master-slave communication is unacceptable for you, as independent instances whose individual reports you compile yourself.

Check this blog for some tips (2009).

From the Gatling stress test tool docs (referring to JMeter 2.5.1):

JMeter creates one thread per user simulated. If there is not enough memory allocated to the JVM, it can crash trying to create these threads. For instance, JMeter could not run 1500 users with 512 MB (what was used for Gatling even with 2000 users); OutOfMemoryErrors are recorded in the table as OOM.
Another problem occurred with the 2000 users simulations; it seems that JMeter can not simulate more than 1514 users independently from the memory that was allocated to the JVM.

Gatling

Gatling is a new (2012?) stress test tool, written in Scala and using Akka. Tests are described by a fluent API in a “text” or richer Scala format. It claims high efficiency (2000 users simulated where JMeter couldn’t handle over 1500 and with much lower memory consumption of 512M). So far I haven’t noticed anything about distributed testing (certainly needed for 10s of thousands of users).

Benchmarking

SysBench

“SysBench is a modular, cross-platform and multi-threaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load.” Last release 2004.

Disk: hdparm -t, bonnie++, iozone

See the blog post Disk IO and throughput benchmarks on Amazon’s EC2 (2009) for examples of use.

DBT-{1-5} - The Database Test Suite

DBT-* is a suite of database tests: DBT-1™ (Web Server) simulates the activities of web users browsing and buying items, DBT-2™ is an OLTP transactional performance test, DBT-3™ is a decision support workload (business oriented ad-hoc queries and concurrent data modifications), DBT-4™ is an application server and Web services workload, DBT-5™ is an OLTP workload simulating the activities of a brokerage firm.

For example, Xeround used it to compare its Cloud Database with Amazon RDS (7/2011?).

Web page performance testing

Google PageSpeed tools (online/Chrome extension) - provides recommendations
- Also Mobile-Friendly Test
YSlow - browser plugin
Chrome/FF DevTools - Timing

The Value and Perils of Performance Benchmarks in the Wake of TechEmpower’s Web Framework Benchmark

Resources

Web page performance

Ilya Grigorik: Website Performance Optimization (Udacity course) [1:13:57] - 2014; Critical Rendering Path, optimizing content size and number of requests, minimizing blocking elements. How to use Chrome dev tools’ Timing and performance testing on a mobile device.
Ilya Grigorik: High Performance Browser Networking - free online book - Author Ilya Grigorik, a web performance engineer at Google, demonstrates performance optimization best practices for TCP, UDP, and TLS protocols, and explains unique wireless and mobile network optimization requirements. You’ll then dive into performance characteristics of technologies such as HTTP 2.0, client-side network scripting with XHR, real-time streaming with SSE and WebSocket, and P2P communication with WebRTC.
Google
- Speed requirements for mobile websites & tips (Mobile Analysis in PageSpeed Insights)
- Optimizing the Critical Rendering Path and Analyzing Critical Rendering Path Performance

Testing

Core

Facebook Engineering: The Mature Optimization Handbook (or go directly to the pdf, ePub, Mobi). If you get bored, jump directly to ch 5. Instrumentation.

Performance Tips

HTTP Caching

Google devs: Optimize caching. Key tips: Set one “strong” (unconditional) caching header - Cache-Control: max-age=N [sec] (or Expires) - and one “weak” (conditional, checked for updates) - ETag (fingerprint/hash) or Last-Modified. Set the Cache-Control: public directive to enable caching by HTTP proxies (and HTTPS caching for Firefox) - but make sure the response does not set any cookies, as most proxies would not cache it anyway in that case. Notice that many proxies do not cache resources with query params.
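
As a rough illustration of these tips, the sketch below builds one "strong" header (Cache-Control with max-age) and one "weak" validator (an ETag derived from a hash of the content), and shows the conditional check a server would make. The function names, the choice of SHA-256 and the truncation to 16 hex characters are assumptions for illustration, not a prescribed scheme.

```python
import hashlib


def caching_headers(body: bytes, max_age_s: int, public: bool = True) -> dict:
    """Build a strong (Cache-Control) and a weak (ETag) caching header for a response body."""
    directives = ["public" if public else "private", f"max-age={max_age_s}"]
    fingerprint = hashlib.sha256(body).hexdigest()[:16]  # content fingerprint (assumed scheme)
    return {
        "Cache-Control": ", ".join(directives),
        "ETag": f'"{fingerprint}"',  # ETag values are quoted strings
    }


def is_not_modified(request_headers: dict, current_etag: str) -> bool:
    """True if the client's cached copy is still valid, i.e. the server may answer 304."""
    return request_headers.get("If-None-Match") == current_etag


stylesheet = b"body { margin: 0; }"
headers = caching_headers(stylesheet, max_age_s=3600)
print(headers["Cache-Control"])

# After max-age expires the client revalidates with the ETag it received earlier.
print(is_not_modified({"If-None-Match": headers["ETag"]}, headers["ETag"]))
```

Following the tip about cookies, a response served with these headers should not also carry a Set-Cookie header.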
