I've recently been evaluating website benchmarking tools, so I thought I'd take a moment to highlight two that I've been using.
ab is the Apache Benchmark tool, and it comes bundled with Apache. It's a pretty simple CLI tool used to test the throughput of a website. It has a bunch of different options you can pass to it, but the most important are -c (number of concurrent connections) and -n (number of requests). Its man page is pretty well written, so I'll let you explore the other options on your own.
(Note: you need to specify both the protocol and the page, otherwise ab will complain.)
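For example, ab rejects a bare hostname but is happy once the scheme and a trailing path are included:

```
# Missing the protocol and page: ab will refuse to run this
ab -n 200 -c 20 alextheward.com

# Correct: 200 total requests, 20 at a time, with scheme and page specified
ab -n 200 -c 20 http://alextheward.com/
```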
And the results of that benchmark:
```
$ ab -n 200 -c 20 http://alextheward.com/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Server Software:        Apache/2.2.22
Server Hostname:        alextheward.com
Server Port:            80

Document Path:          /
Document Length:        19858 bytes

Concurrency Level:      20
Time taken for tests:   26.306 seconds
Complete requests:      200
Failed requests:        0
Write errors:           0
Total transferred:      4076928 bytes
HTML transferred:       3986634 bytes
Requests per second:    7.60 [#/sec] (mean)
Time per request:       2630.642 [ms] (mean)
Time per request:       131.532 [ms] (mean, across all concurrent requests)
Transfer rate:          151.35 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       69  2121 1728.4   1554  9394
Processing:     0   296  372.5    148  2138
Waiting:        0    63  166.3      0   897
Total:        331  2417 1776.0   1846  9684

Percentage of the requests served within a certain time (ms)
  50%   1846
  66%   2623
  75%   3189
  80%   3586
  90%   4958
  95%   6051
  98%   8059
  99%   8518
 100%   9684 (longest request)
```
Looks like I could be doing a better job serving up web requests! Lowering the concurrency certainly helped, getting the request time under 1000 ms for 90% of requests, so I need to see what's going on with Apache when I'm serving concurrent requests.
There's another gotcha with ab: it cannot handle SSL requests to a server with a self-signed cert, and there does not appear to be any way to tell it to ignore SSL errors either.
JMeter is actually really cool, but it does have a bit of a learning curve. I've attached a couple of images which show a basic configuration.
The first thing you have to add is a Thread Group. This is where you tell it how many threads to run and how many requests each thread will make. After that, you need to add HTTP Request Defaults, so you can specify the default server and the default request URI. Next, you add an HTTP Request sampler and give it the URI you want to test; you can add as many of these as you want. Finally, you need something to read the results. I've added two listeners: one which shows the sample results, and another which shows the average request times over an interval.
After you hit the run button, you will see the results appear in the listeners you added!
That should give you a pretty good intro to web benchmarking with JMeter. There are a bunch of other features which are outside the scope of this post, but it's a solid tool for all kinds of performance testing.
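As an aside, once you've saved a test plan from the GUI, JMeter can also run it headlessly from the command line, which is handy when benchmarking from a server. A minimal sketch, assuming a saved plan named plan.jmx (the file names here are hypothetical):

```
# -n = non-GUI mode, -t = test plan to run, -l = file to log sample results to
jmeter -n -t plan.jmx -l results.jtl
```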
My last post was about using ab and JMeter to get some performance benchmarks on a website. Now I figure I should mention how to profile your application to look for memory leaks, bad calls, and which parts of the code are taking the most time.
The first thing to do is get Xdebug installed. There are several ways to go about this, the easiest being to let a package manager do the hard work for you. For example, if you're running Ubuntu with the default PHP from apt-get, you would do:
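A quick sketch; the package name assumes the PHP 5 of this era (on newer releases it's just php-xdebug):

```
# Install Xdebug through apt; php5-xdebug was the package for the default PHP 5
sudo apt-get install php5-xdebug
```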
You usually want to turn on profiler_enable_trigger, then pick out a page you want to profile (maybe one that is fairly slow) and add ?XDEBUG_PROFILE=1 to the end of the URL. Then go check the directory you set in profiler_output_dir, and check out the cachegrind files it generated.
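For reference, here's roughly what that looks like in php.ini (or the Xdebug ini file under conf.d); the extension path and output directory below are assumptions for illustration, so adjust for your system:

```
; Extension path is an example; yours will vary by PHP version
zend_extension=/usr/lib/php5/20100525/xdebug.so

; Only profile when XDEBUG_PROFILE is passed (via GET, POST, or cookie)
xdebug.profiler_enable_trigger = 1

; Where the cachegrind.out.* files get written (defaults to /tmp)
xdebug.profiler_output_dir = /tmp/profiler
```

Remember to restart Apache after changing the config so the settings take effect.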
What the heck do you do with cachegrind files?
If you're on Windows, you'll want to dig out WinCacheGrind; on Linux, KCacheGrind. On Mac, I've tried out MacCallGrind. It's a paid app, but the trial allows you to open files up to 3MB in size, which will generally be enough for smaller page loads, though if you have a serious call stack problem it won't cut it.
So, fire up your cachegrind parser of choice and load up a cachegrind.out file. You will be presented with a window that looks like this (WinCacheGrind).
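On Linux, for instance, you can open a profile straight from the terminal (the file name here is hypothetical; Xdebug names profiles cachegrind.out.<pid> by default):

```
# Open a generated profile in KCacheGrind
kcachegrind /tmp/profiler/cachegrind.out.12345
```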
Learning to read the output isn't too difficult. The left side of the window is the entire call stack, starting at the outermost call and allowing you to drill down into all of the subsequent calls; you can do the same in the main panel. The different columns of the main window are the amounts of time each call took to execute. Self is the amount of time that particular invocation took by itself; in the screenshot, my main file (but nothing it included) took 1ms. The Cum. column is the cumulative amount of time that all of the child calls inside the main method took (22 seconds here).
Just stepping through the entire application can be useful, but it's more useful to be able to sort by the most expensive calls happening in your application. Simply click on the Overall tab, and you will be presented with a screen like the following.
As you can see, there are several more columns which provide you with more information. **Avg. Self** is the average amount of time that individual call took across all of the invocations. **Avg. Cum.** is the average amount of time that the call took including all of the child calls that method invoked. **Total Self** is the total amount of time the individual call took across all invocations. **Total Cum.**, not surprisingly, is the total amount of time the call took including child calls. **Calls** is the total number of times that method was called during script execution.
That **Total Cum.** column is the one you will want to look at in order to determine where large chunks of time are being used up. As you can see, I've sorted on that field, and PHP's call_user_func_array is taking the largest amount of time. That by itself isn't particularly useful, but you can drill down into the call stack and get to the meat of the problem by double-clicking (in the line-by-line section) until you find a more concrete culprit. How do I know it's not actually call_user_func_array's fault? For one, its Total Self is 18ms, which is almost nothing. For two, it doesn't do anything except make another call.
Drilling down, I find that ultimately the longest call is split between view->build and view->render, indicating I have a problem with a Drupal view I'm including on a bunch of pages. Going even deeper, the majority of the time in those calls is spent in a MySQL call, indicating that a slow query is the bottleneck.
Optimizing MySQL is beyond the scope of this article, but it's next in my series of site optimization articles, so stay tuned!