Performance & Performance Testing for Webapps
- Key Performance Goals
- Performance Metrics
- The Utilization Saturation and Errors (USE) Method
- Java EE App Stress Testing (A. Bien)
- Level: Browser
- Level: Server
- Layer: Servlet Container / EJB Container
- Layer: Application
- Layer: Database
- Common Performance Issues
- Doing Performance Testing Correctly
- Tools
- Related
- Resources
! work in progress !
Key Performance Goals
Requirement format: “X should be less than L in P % of cases when the load is U users” - f.ex. “search results are displayed within 2 s in 95 % of requests at a load of 500 concurrent users”
Request throughput
Latency
Max/avg response time from the end user’s point of view
Performance Metrics
The Utilization Saturation and Errors (USE) Method
The USE Method of performance analysis focuses on getting a complete overview of a system (without forgetting anything) and on quickly discovering most performance problems. The main tool is a system-dependent checklist of resources and, for each resource, metrics of utilization, saturation (i.e. work that has to wait), and errors. Read more on its page, which also contains checklists for some systems.
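A minimal sketch of applying USE to a couple of resources on a Linux box with standard tools (the exact flags and column names may differ between distributions):
  mpstat -P ALL 1        # CPU utilization: user + system time per CPU
  vmstat 1               # CPU saturation: run-queue length (the “r” column) vs. number of CPUs
  free -m; vmstat 1      # memory utilization; saturation = swap-in/out (si/so columns)
  iostat -xz 1           # disk utilization (%util) and saturation (await, queue size)
  dmesg | tail           # errors: recent kernel messages for any resource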
Java EE App Stress Testing (A. Bien)
Goal: Test contention (⇒ dead/live-locks), transaction isolation, caching behavior, consistency, robustness, performance, memory consumption.
Memory, current heap size
Typical / peak # of worker threads
Usual depth of the request queue
# rolled back transactions (f.ex. due to optimistic locking) vs. successful ones
# requests/sec
# & length of major garbage collections
# DB connections
Size of JPA caches
All of these should be stable, i.e. not grow (too much) over time / with growing load.
Ex.: JMeter + VisualVM (with the MBean plugin to monitor custom caches and with Visual GC) to observe the behavior live.
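Besides VisualVM, a quick way to watch some of these numbers during a test run is the JDK’s command-line tools (a sketch; <pid> stands for the application server’s process id):
  jstat -gcutil <pid> 5000      # every 5 s: heap occupancy in % plus number and total time of young & full GCs
  jmap -heap <pid>              # current heap configuration and usage
  jstack <pid> | grep -c 'java.lang.Thread.State'   # rough count of live threads (inspect the dump itself for blocked ones)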
Level: Browser
TBD (network latency, rendering time, … - use FireBug’s timing capability or some similar browser plugin)
Level: Server
Resource utilization (from New Relic docs):
CPU busy [%] - the percentage of the time that the system is using the CPU
Disk busy [%] - the percentage of the time that the system is performing Disk IO
Memory used [%]
Disk space used [%]
Network utilization [Mb/s]
Drilling down into processes - their count, CPU, memory
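If no monitoring agent à la New Relic is installed, roughly the same numbers can be read with standard Linux tools (a sketch; column names like %util are assumptions about a typical setup):
  top                 # CPU busy, memory used, per-process drill-down
  iostat -xz 5        # disk busy ≈ the %util column per device
  df -h               # disk space used
  sar -n DEV 5        # network throughput per interface (rxkB/s, txkB/s)
  pidstat 5           # CPU per process (add -r for memory)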
Layer: Servlet Container / EJB Container
TBD (JVM heap, threads, …)
Servlet Container
# threads (unless NIO is used this ≈ # concurrent requests being processed)
…
Database
No. free connections in connection pools
Avg. time a connection is used by a thread (how long it is away from the pool)
EJB Container
Bean pool utilization
…
…
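Most of these container and pool metrics are exposed as JMX MBeans; a common way to watch them remotely with JConsole/VisualVM is to start the server JVM with the JMX remote options (a sketch - the port is illustrative and authentication/SSL should only be disabled in a test lab):
  java -Dcom.sun.management.jmxremote \
       -Dcom.sun.management.jmxremote.port=9010 \
       -Dcom.sun.management.jmxremote.authenticate=false \
       -Dcom.sun.management.jmxremote.ssl=false \
       … (the usual server start-up arguments)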
Common Performance Issues
Apache
(I don’t remember the resource for this :-( ; a sketch of diagnosis commands follows the list)
No more File Descriptors
• symptoms: entry in error log, new httpd children fail to start, fork() failing everywhere
• solution: increase system-wide limits, incr. ulimit via apachectl
Sockets stuck in time_wait
• sympt.: unable to accept new conn., CPU under-utilized & httpd processes idle, not swapping, netstat shows many sockets in TIME_WAIT
• many TIME_WAITs are to be expected; it is only a problem when new conn. are failing ⇒ decrease the system-wide TCP/IP FIN timeout
High Mem Usage (swapping)
• sympt.: (ignore system free mem, it is misleading) high disk activity, top/free show high swap usage, load gradually increasing, ps shows processes blocked on disk I/O
• sol.: add mem, …
CPU overload
• sympt.: top shows little/no cpu idle time, not swapping, high load, much cpu spent in userspace
• sol.: add cpu
Interrupt (IRQ) overload
• sympt.: (freq. on 8+ CPU machines) not swapping, 1-2 CPUs busy while the rest are idle, low total load
• sol.: add NIC
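A sketch of commands for spotting the issues above on a typical Linux box (the limit and sysctl values are illustrative, check your distribution’s defaults):
  ulimit -n                               # current per-process file descriptor limit
  netstat -an | grep -c TIME_WAIT         # how many sockets are stuck in TIME_WAIT
  sysctl -w net.ipv4.tcp_fin_timeout=30   # shorten the FIN timeout (default 60 s)
  vmstat 5                                # si/so columns > 0 ⇒ swapping; r column ⇒ CPU run queue
  cat /proc/interrupts                    # are interrupts spread over all CPUs or hitting only 1-2?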
Doing Performance Testing Correctly
Jetty on Load Testing
The Jetty High Load Howto has some good tips on creating realistic load testers and on configuring the load-testing and server machines (TCP buffer sizes, in/outbound connection queue size, # open files, # ports, congestion control). F.ex.:
A common mistake is that load generators often open relatively few connections that are kept totally busy sending as many requests as possible over each connection. This causes the measured throughput to be limited by request latency (see Lies Damned Lies and Benchmarks for an analysis of such an issue).
Another common mistake is to use a TCP/IP connection for a single request only and thus to open many short-lived connections. This will often result in accept queues filling up and in limitations due to file descriptor and/or port starvation.
A load generator should closely model the traffic profile of the server’s normal clients. For browsers, this is mostly between 2 and 6 connections that are mostly idle and used in sporadic bursts with read times in between. The connections are mostly long-held HTTP/1.1 connections.
It recommends the Cometd Load Tester as a good example of a realistic load generator.
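A sketch of the kind of OS-level tuning the Howto talks about, applicable to both the load-generator and the server machine (the values are illustrative, not recommendations):
  ulimit -n 65536                                       # allow many open files/sockets
  sysctl -w net.ipv4.ip_local_port_range="1024 65535"   # more local ports for outgoing connections
  sysctl -w net.core.somaxconn=4096                     # longer accept queue
  sysctl -w net.core.rmem_max=16777216                  # higher ceiling for TCP receive buffers
  sysctl -w net.core.wmem_max=16777216                  # higher ceiling for TCP send buffers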
Tools
Simple tools
Web servers
ab (Apache Benchmark) - ex: ab -n 10000 -c 250 <page URL> - generate 10k GETs for the URL, issuing 250 in parallel (limited by the number of sockets a process can open on the test machine)
siege - http/https stress tester - available in the software repositories of most Linux distros
wrk - a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue.
fortio - similar to siege; runs at a specified query-per-second rate (qps), records a histogram of execution times and calculates percentiles (e.g. p99, i.e. the response time such that 99% of the requests take less than that number (in seconds, SI unit)). It can run for a set duration, for a fixed number of calls, or until interrupted. Example invocations of these tools are sketched below.
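Hedged example invocations (the URL and all the numbers are made up for illustration; double-check the flags against each tool’s docs):
  siege -c 50 -t 1M http://example.com/page                   # 50 concurrent users for 1 minute
  wrk -t 4 -c 200 -d 30s http://example.com/page              # 4 threads, 200 connections, 30 seconds
  fortio load -qps 500 -c 50 -t 60s http://example.com/page   # 500 qps over 50 connections for 1 minute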
Databases
MySQL: mysqlslap - simulates a number of clients connecting to the DB and performing a query of your choosing
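An illustrative run (the schema, query, and concurrency values are placeholders):
  mysqlslap --host=localhost --user=test --password \
    --create-schema=loadtest --query="SELECT * FROM orders WHERE customer_id = 42" \
    --concurrency=50 --iterations=10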
Apache JMeter
General
Max Number of Concurrent Users
The number of concurrent virtual users JMeter can efficiently simulate depends on the resources of the test machine (memory, thread limits, socket limits, network card speed, …), the complexity of the test, and other load on the system. In general it’s recommended to use 1000 threads or fewer - assuming, of course, that you don’t perform any extensive report gathering/rendering during the run and that you reduce JMeter’s resource consumption, as it would otherwise steal resources needed for the testing itself. With a bad configuration or machine you can run into problems at a much lower thread count already.
Notice also that if you don’t introduce any “think time” into your test plans, then a single JMeter thread can generate a much higher load than a human user could, and thus a single thread can correspond to e.g. ten humans.
If you need more virtual users, you need multiple JMeter instances (preferably on multiple machines), either in the master-slave configuration or as independent instances - compiling the individual reports yourself - if the overhead of master-slave communication is unacceptable for you (see the command-line sketch below).
Check this blog for some tips (2009).
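For reference, a sketch of running JMeter headless and distributed from the command line (host and file names are placeholders):
  jmeter -n -t stress-test.jmx -l results.jtl                    # non-GUI run of a test plan
  jmeter -n -t stress-test.jmx -R slave1,slave2 -l results.jtl   # drive remote JMeter slaves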
From the Gatling stress test tool docs (referring to JMeter 2.5.1):
JMeter creates one thread per simulated user. If there is not enough memory allocated to the JVM, it can crash trying to create these threads. For instance, JMeter could not run 1500 users with 512 MB (the amount used for Gatling even with 2000 users); OutOfMemoryErrors are recorded in the table as OOM.
Another problem occurred with the 2000-user simulations; it seems that JMeter cannot simulate more than 1514 users, independently of the amount of memory allocated to the JVM.
Gatling
Gatling is a new (2012?) stress test tool, written in Scala and using Akka. Tests are described via a fluent API, either in a plain-text or a richer Scala format. It claims high efficiency (2000 users simulated where JMeter couldn’t handle more than ~1500, and with a much lower memory consumption of 512 MB). So far I haven’t noticed anything about distributed testing (certainly needed for tens of thousands of users).
Benchmarking
SysBench
“SysBench is a modular, cross-platform and multi-threaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load.” Last release 2004.
Disk: hdparm -t, bonnie++, iozone
See the blog post Disk IO and throughput benchmarks on Amazon’s EC2 (2009) for examples of use.
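Illustrative invocations (the device and directory are placeholders; bonnie++ and iozone have many more options):
  hdparm -t /dev/sda         # raw sequential read throughput of the device
  bonnie++ -d /mnt/testdir   # file-system level read/write/seek benchmark (add -u <user> when run as root)
  iozone -a                  # automatic mode: a matrix of record and file sizes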
DBT-{1-5} - The Database Test Suite
DBT-* is a suite of database tests: DBT-1 (Web Server) simulates the activities of web users browsing and buying items, DBT-2 is an OLTP transactional performance test, DBT-3 is a decision support workload (business-oriented ad-hoc queries and concurrent data modifications), DBT-4 is an application server and web services workload, DBT-5 is an OLTP workload simulating the activities of a brokerage firm.
For ex. Xeround used it to compare its Cloud Database with Amazon RDS (7/2011?).
Web page performance testing
Google PageSpeed tools (online/Chrome extension) - provides recommendations
Also Mobile-Friendly Test
YSlow - browser plugin
Chrome/FF DevTools - Timing
Resources
Web page performance
Ilya Grigorik: Website Performance Optimization (Udacity course) [1:13:57] - 2014; Critical Rendering Path, optimizing content size, number of requests, minimizing blocking elements. How to use the Chrome dev tools’ Timing and performance testing on a mobile.
Ilya Grigorik: High Performance Browser Networking - free online book - Author Ilya Grigorik, a web performance engineer at Google, demonstrates performance optimization best practices for TCP, UDP, and TLS protocols, and explains unique wireless and mobile network optimization requirements. You’ll then dive into performance characteristics of technologies such as HTTP 2.0, client-side network scripting with XHR, real-time streaming with SSE and WebSocket, and P2P communication with WebRTC.
Google
Speed requirements for mobile websites & tips (Mobile Analysis in PageSpeed Insights)
Optimizing the Critical Rendering Path and Analyzing Critical Rendering Path Performance
Testing
Core
Facebook Engineering: The Mature Optimization Handbook (or go directly to the pdf, ePub, Mobi). If you get bored, jump directly to ch 5. Instrumentation.
Performance Tips
HTTP Caching
Google devs: Optimize caching. Key tips: Set one “strong” (unconditional) caching header - Cache-Control: max-age=N [sec] (or Expires) - and one “weak” (conditional, checked for updates) one - ETag (a fingerprint/hash) or Last-Modified. Set the Cache-Control: public directive to enable caching by HTTP proxies (and HTTPS caching for Firefox) - but make sure the response does not set any cookies, as most proxies would not cache it in that case anyway. Notice that many proxies do not cache resources with query params.
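An illustrative set of response headers following these tips (the values are made up):
  Cache-Control: public, max-age=86400
  ETag: "5d8c72a5edda8d6a"
  Last-Modified: Mon, 07 Apr 2014 10:00:00 GMT
A later request carrying If-None-Match: "5d8c72a5edda8d6a" can then be answered with 304 Not Modified and an empty body.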