Performance & Performance Testing for Webapps
- Key Performance Goals
- Performance Metrics
- The Utilization Saturation and Errors (USE) Method
- Java EE App Stress Testing (A. Bien)
- Level: Browser
- Level: Server
- Layer: Servlet Container / EJB Container
- Layer: Application
- Layer: Database
- Common Performance Issues
- Doing Performance Testing Correctly
- Simple tools
- On-line tools
- Apache JMeter
- Web page performance testing
! work in progress !
Requirement format: “X should be less than L in P % times when the load is U users”
Max/avg response time from the end user’s point of view
The USE Method of performance analysis focuses on getting a complete overview of a system (without forgetting anything) and discovering quickly most of the performance problems. The main tool is a system-dependant checklist of resources and metrics of utilization, saturation (i.e. work that has to wait), and errors for each resource. Read more on its page, which also contains checklist for some systems.
Goal: Test contention (⇒ dead/live-locks), transaction isolation, caching behavior, consistency, robustness, performance, memory consumption.
Memory, current heap size
Typical / peak # of worker threads
Usual depth of the request queue
# rolled back transactions (f.ex. due to optimistic locking) vs. successful ones
# & length of major garbage collections
# DB connections
Size of JPA caches
all of these should be stable, i.e. not grow (too much) with time / growing load.
Ex.: JMeter + VisualVM (with the MBean plugin to monitor custom caches and with Visual GC) to observe the behavior live.
TBD (network latency, rendering time, … - use FireBug’s timing capability or some similar browser plugin)
Resource utilization (from New Relic docs):
Cpu busy [%] - the percentage of the time that the system is using the CPU
Disk busy [%] - the percentage of the time that the system is performing Disk IO
Memory used [%]
Disk space used [%]
Network utilization [Mb/s]
Drilling down into processes - their count, CPU, memory
TBD (JVM heap, threads, …)
# threads (unless NIO used, i.e. # concurrent requests being processed)
No. free connections in connection pools
Avg. time connection is used by a thread (how long away from the pool)
Bean pool utilization
(I don’t remember the resource for this :-( )
No more File Descriptors
• symptoms: entry in error log, new httpd children fail to start, fork() failing everywhere
• solution: increase system-wide limits, incr. ulimit via apachectl
Sockets stuck in time_wait
• sympt.: unable to accept new conn., CPU under-utiliz. & httpd proc. idle, not swapping, netstat shows # sockets in time_wait
• many t_w are to be expected, only a problem when new conn. failing ⇒ decrease sys-wide TCP/IP FIN timeout
High Mem Usage (swapping)
• sympt.: (ignore system free mem, misleading): # disk activity, top/free show high swap usage, load gradually increasing, ps shows processes blocking on disk i/o
• sol.: add mem, …
• sympt.: top shows little/no cpu idle time, not swapping, high load, much cpu spent in userspace
• sol.: add cpu
Interrupt (IRQ) overload
• sympt: (freq. on 8+ cpu machines) not swapping, 1-2 cpu busy rest idle, low total load
• sol.: add NIC
The Jetty High Load Howto has some good tips on creating realistic load testers and on configuring the load testing and server machines (TCP buffer sizes, in/outbound connection queue size, # file, # ports, congestion control). F.ex.:
A common mistake is that load generators often open relatively few connections that are kept totally busy sending as many requests as possible over each connection. This causes the measured throughput to be limited by request latency (see Lies Damned Lies and Benchmarks for an analysis of such an issue.
Another common mistake is to use a TCP/IP for a single request and to open many many short lived connections. This will often result in accept queues filling and limitations due to file descriptor and/or port starvation.
A load generator should well model the traffic profile from the normal clients of the server. For browsers, this if mostly between 2 and 6 connections that are mostly idle and that are used in sporadic bursts with read times in between. The connections are mostly long held HTTP/1.1 connections.
It recommends the Cometd Load Tester for a good example of a realistic load generator
ab (Apache Benchmark) - ex: ab -n 10000 -c 250 <page URL> - generate 10k GETs for the URL, issuing 25o in parallel (limited by the number of sockets a process can open on the test machine)
siege - http/https stress tester - available in the software repositories of most Linux distros
wrk - a HTTP benchmarking tool wrk is a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue.
fortio - similar to siege; runs at a specified query per second (qps) and records an histogram of execution time and calculates percentiles (e.g. p99 ie the response time such as 99% of the requests take less than that number (in seconds, SI unit)). It can run for a set duration, for a fixed number of calls, or until interrupted
MySQL: mysqlslap - simulates a number of clients connecting to the DB and performing a query of your choosing
The number of concurrent virtual users JMeter can efficiently simulate depends on the resources of the test machine (memory, thread limits, socket limits, network card speed, …), the complexity of the test, and other load on the system. In general it’s recommended to use 1000 or less threads - of course assuming that you don’t perform any extensive report gathering/rendering and reduce JMeter resource consumption as it would steal resources available for the testing. With bad configuration or machine you can experience problems already with a much lower thread count.
Notice also that if you don’t introduce any “think time” into your test plans than a single JMeter thread can generate much higher load than a human user could and thus a single thread can correspond to e.g. ten humans.
If you need more virtual user then you need multiple JMeter instances (preferably on multiple machines) in the master-slave configuration or as independent instances, compiling the individual reports yourself, if the overhead of master-slave communication is unacceptable for you.
Check this blog for some tips (2009).
From the Gatling stress test tool docs (referring to JMeter 2.5.1):
JMeter creates one thread per user simulated. If there is not enough memory allocated to the JVM, it can crash trying to create these threads. For instance, JMeter could not run 1500 users with 512 MB (what was used for Gatling even with 2000 users); OutOfMemoryErrors are recorded in the table as OOM.
Another problem occurred with the 2000 users simulations; it seems that JMeter can not simulate more than 1514 users independently from the memory that was allocated to the JVM.
Gatling is a new (2012?) stress test tool, written in Scala and using Akka. Tests are described by a fluent API in a “text” or richer scala format. It claims high efficiency (2000 users simulated where JMeter couldn’t handle over 1500 and with much lower memory consumption of 512M). So far I haven’t noticed anything about distributed testing (certainly needed for 10s of thousands of users).
“SysBench is a modular, cross-platform and multi-threaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load.” Last release 2004.
See the blog post Disk IO and throughput benchmarks on Amazon’s EC2 (2009) for examples og use.
DBT-* is a suite of database tests: DBT-1TM (Web Server) simulates the activities of web users browsing and buying items, DBT-2TM is an OLTP transactional performance test, DBT-3TM is decision support workload (business oriented ad-hoc queries and concurrent data modifications), DBT-4TM is an application server and Web services workload, DBT-5TM is an OLTP workload simluating the activities of a brokerage firm.
For ex. Xeround used it to compare its Cloud Database with Amazon RDS (7/2011?).
Ilya Grigorik: Website Performance Optimization (Udacity course) [1:13:57] - 2014; Critical Rendering Path, optimizing content size, number requests, minimizing blocking elements. How to use Chrome dev tools’ Timing and perf.testing on a mobile.
Ilya Grigorik: High Performance Browser Networking - free online book - Author Ilya Grigorik, a web performance engineer at Google, demonstrates performance optimization best practices for TCP, UDP, and TLS protocols, and explains unique wireless and mobile network optimization requirements. You’ll then dive into performance characteristics of technologies such as HTTP 2.0, client-side network scripting with XHR, real-time streaming with SSE and WebSocket, and P2P communication with WebRTC.
Google devs: Optimize caching Key tips: Set one “strong” (unconditional) caching header - Cache-Control: max-age=N [sec][.Apple-converted-space]# (or Expires) - and one “weak” (conditional, checked for updates) - ETag (fingerprint/hash) or Last-Modified. Set Cache control: public directive to enable caching by HTTP proxies (and HTTPS caching for Firefox) - but make sure it does not set any cookies as most proxies would not cache it anyway in that case. Notice that many proxies do not cache resources with query params.#