Grand National in Aintree.
Even people usually not betting have a flutter heading to the nearest betting shop or hitting the net. Good news for an online gambling company, mixed news when it comes to the servers that need to cope with a lot more traffic than any other time of the year. Quite a few online gaming companies had problems on the day.
Our servers held and we only load tested one aspect of our system. Here's the story.
About two weeks before the race it was decided that we want our payment section tested after all. Right, plenty of time, better get started. We looked into JMeter, however there's limited knowledge in the team. So we asked some developers for help. After a lot of headscratching it was decided, yes, it can be done within about 3 weeks and 2 devs. So the idea of using JMeter was out of the window. A commercial solution was not feasable due to timescales and some other factors.
We decided, since it was to be something that had to be quick and was to be a one off (we'll worry about next year later), to do an in-house solution. While I was running around, gathering required information, investigating possible pitfalls, etc our automation expert (don't tell him, he will ask for more money) used AutoIT to code the load tool.
Yes, you read right. That's about the most unlikely tool for a performance test that I can think of but I employed most of my team members, I have trust in their abilities so time to put my money where my mouth is.
We knew that we wanted to get 8+ transactions a second to test that our system and our third party payment provider could take the heat. And we wanted that sustained over a period of time, say half an hour before the race is busiest so let's just run it for that time. That's the detailed requirements gathering out of the way...
I won't go into the details of the system for obvious reasons, rather explain how we went about it. We ran a transaction manually and found that it took between 10-25 seconds to complete, including filling out the form again for a second try. We also found that we could fit 8 browser windows on a monitor and show all necessary fields and buttons when setting the window size to 35%.
The idea was then to click "Submit" on all 8 browsers at the same time (or a millisecond apart). That worked fine. So, we could hit the system with 8 transactions. Once.
Ok, no problem, since we needed about 25 seconds for a round trip that was maintainable, we added another 5 seconds in case the system slowed down and then went to hunt down 30 PCs in the office (an experience in itself) to start them a second apart.
After having identified the machines we set all 30 to the same screen resolution (I didn't know how many different monitors we had until that day), deployed our performance tool to each machine and set up the machines. That involved setting the IE homepage to the desired URL, running the setup tool, clicking the necessary buttons and fields so that the correct X/Y coordinates were captured. The AutoIT application would then calculate the relative difference for the other 7 windows.
That was a lot of manual setup but we haven't had the time to code for more. Each machine got it's own ID (set up in the application and post it on the monitor) from 1 to 30 and we coded an editable start time into it. For example, machine with ID 1 would start at 08:30:01, ID 2 would start at 08:30:02, etc and then loop round starting again at 08:30:31.
Quite a bit of thinking and discussion has gone into this bit because we could have let each window just start again after each transaction was finished. Depending on the system response time transaction would have drifted apart though resulting in potentially a lot higher transaction number for some seconds and none at others. I was more concerned about the former as it could have been a theoretical 30*8=240 transactions at the same time which would very likely have created serious problems.
So we had our 8 transactions running for 30 seconds, looping around indefinetely until we stopped them. Setup was completed in the evening after a 13 hour+ day. We also did a small test run just to make sure it would work.
Before starting the next morning we had sys admins and several other people in place to monitor the various system components during the test and in case something went seriously wrong.
Execution threw up some problems, not least some network issues that weren't identified before. Also, some pages threw errors that the application couldn't recover from. Time was running out and people came into the office wanting their PCs to work on, so work was stopped. Two days later, after resolving the issues and having two more days to put some resilience into the AutoIT scripts so that recovery after errors was better we tried again.
This time it worked really well. Some machines had to be started again but overall we put the desired load on the system.
Lessons learnt:
- Regardless how much you ask people for information, there's always something that someone has forgot to mention that will ruin your day. Do a test run and plan for round 2.
- Don't use IE if you don't have a standard company build on all PCs. Use a browser that you install yourself and control so you don't run into configuration issues when you least need it.
- Ask people for help. Running it yourself will only stress you out. I found that once we explained the somewhat mad plan people were happy to assist
- Tell people if you're highjacking their PCs, you may not get a chance to revert all changes. That was done for most but the ones we forgot were understandably miffed that we changed their settings without warning.
- Get a demonstration of what will be monitored and what can be saved for analysis later. Assuming it will all be there can lead to disappointment and repeats of the test.
- Don't ask others to do the boring jobs. Being hands on helps the rest of the team to see that you're serious about making this work.
- It's possible to get the job done, regardless if you're prepared, have the tools or the time. Determination, the belief that the job can be done and trust in others will get you a long way.
Thanks for reading.
No comments:
Post a Comment