This was originally a document I was writing for my company, but I think it is useful, so I am sharing it with everyone here.
Build Up a Full Performance Model From Scratch
Performance issues are a big deal for any system. We often need a performance model to get detailed performance information.
This document describes how to build a performance model from scratch.
Checkpoint
First of all, the goal of performance modeling and testing is to find the bottleneck.
When you use the system and it feels slow, is it really slow, or is that just your impression? Which component or sub-system is dragging the speed down? Without measurements, nobody knows.
The checkpoint is the answer. A checkpoint measures the time that one component or sub-system spends on its work. A checkpoint's target may be big or small, depending on what you need.
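A checkpoint can be as simple as a timer wrapped around one call. The following is a minimal sketch, not a prescribed implementation: the `checkpoint` helper, the `RECORDS` store, and the `load_article` function are all names made up for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory store: checkpoint name -> list of elapsed seconds.
RECORDS = defaultdict(list)


@contextmanager
def checkpoint(name):
    """Record how long the wrapped block takes under the given name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        RECORDS[name].append(time.perf_counter() - start)


def load_article():
    # Stand-in for a real component call, for example a database query.
    time.sleep(0.01)
    return "article body"


with checkpoint("db.load_article"):
    body = load_article()

print(RECORDS["db.load_article"])
```

In a real system the records would probably go to a log file or a metrics store rather than a dictionary in memory.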
The Level of Checkpoint
The object being measured can be big or small.
A high-level checkpoint covers the top layer of the system and helps you find the worst component or sub-system. With it, you can determine which part is the worst and needs improvement immediately.
When you dive into a big component, you need more detailed checkpoints, which tell you more about its internal state.
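One way to keep high-level and detailed checkpoints apart is to record each one under a path that reflects its nesting. This is only a sketch under that assumption; the `CheckpointTree` class and the checkpoint names are hypothetical.

```python
import time
from contextlib import contextmanager


class CheckpointTree:
    """Collects timings keyed by a path such as 'request/database'."""

    def __init__(self):
        self._stack = []   # names of the checkpoints currently open
        self.timings = {}  # path -> total elapsed seconds

    @contextmanager
    def checkpoint(self, name):
        self._stack.append(name)
        path = "/".join(self._stack)
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.timings[path] = self.timings.get(path, 0.0) + elapsed
            self._stack.pop()

    def at_level(self, depth):
        """Return only the checkpoints at the given depth (1 = top level)."""
        return {p: t for p, t in self.timings.items()
                if p.count("/") == depth - 1}


tree = CheckpointTree()
with tree.checkpoint("request"):
    with tree.checkpoint("database"):
        time.sleep(0.002)
    with tree.checkpoint("render"):
        time.sleep(0.001)

print(tree.at_level(1))  # the high-level view
print(tree.at_level(2))  # the detailed view inside "request"
```

Reading level 1 first and only then drilling into level 2 matches the order described above: find the worst part, then look inside it.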
Performance Model Prototype
Checkpoint data is important, but you often cannot collect enough of it from a real-world system, especially at the most detailed levels. At other times you need to know the performance capacity of a new system. In both cases, a performance testing environment is often the only choice.
Starting from scratch, build a very simple prototype system. Focus on only one checkpoint: the database, the network, or some functional component or sub-system. Whatever it is, one checkpoint or just a few is enough.
Try to build your testing environment as close to your real-world system as possible.
Performance Testing
There are two ways to run performance tests.
Per-Feature Testing
Per-feature testing mostly focuses on a single component or sub-system. Its goal is to find the maximum capacity of that part. Per-feature testing is typically used to tune a component or sub-system and improve its performance.
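A very small per-feature test can just call one component in a loop and count how many calls complete per second. The sketch below assumes a single-threaded caller and a made-up `render_topic` component; a real capacity test would usually also vary the number of concurrent callers.

```python
import time


def measure_throughput(component, duration_s=0.2):
    """Call `component` in a tight loop and return calls per second."""
    calls = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        component()
        calls += 1
    elapsed = time.perf_counter() - start
    return calls / elapsed


def render_topic():
    # Hypothetical component under test.
    return "".join(str(i) for i in range(100))


rate = measure_throughput(render_topic)
print(f"render_topic: {rate:.0f} calls/s")
```

Running the same measurement before and after a tuning change gives a direct comparison for that one component.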
Integration Testing
Of course, a system (huge or not) has many sub-systems and components. In a real-world environment, the parts may affect one another and drive down each other's performance, so integration testing is necessary.
Integration testing is not easy. Simply putting the parts together and running the tests is the wrong approach, because each part behaves very differently under different levels of stress.
The first step in integration testing is to simulate real-world visiting behavior. As a simple example, if you run a forum website, you must know how many people view topics and how many people create or edit them. Then divide your testing resources according to that ratio.
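Dividing the test load by ratio can be sketched as a weighted random choice over operations. The operation names and the percentages in `MIX` are assumptions for the forum example; real values should come from your own traffic.

```python
import random
from collections import Counter

# Assumed request mix for the forum example, not measured data.
MIX = {"view_topic": 0.80, "create_topic": 0.15, "edit_profile": 0.05}


def plan_requests(mix, total, seed=0):
    """Pick `total` operations at random, weighted by the given mix."""
    rng = random.Random(seed)  # fixed seed so a test run can be repeated
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=total)


plan = plan_requests(MIX, 10_000)
counts = Counter(plan)
print(counts)
```

A load generator would then walk through `plan` and issue the matching request for each entry.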
Run the test, and then …
Checkpoint Data Collection
After a test run, which may take a long time, the checkpoints have produced many time and speed records. This data is our answer; it shows you the truth.
Collect the records and group them by level and by test type (per-feature and/or integration). You now have the time and speed figures between components and sub-systems, so you know whether your system is really slow and which part is the worst.
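Grouping the records might look like the sketch below. The record layout (test type, checkpoint name, elapsed seconds) and the sample numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (test type, checkpoint name, elapsed seconds).
records = [
    ("per-feature", "database", 0.012),
    ("per-feature", "database", 0.018),
    ("integration", "database", 0.030),
    ("integration", "render", 0.005),
    ("integration", "render", 0.007),
]


def summarize(rows):
    """Group rows by (test type, checkpoint) and compute basic statistics."""
    grouped = defaultdict(list)
    for test_type, name, seconds in rows:
        grouped[(test_type, name)].append(seconds)
    return {key: {"count": len(v), "mean": mean(v), "max": max(v)}
            for key, v in grouped.items()}


summary = summarize(records)
slowest = max(summary, key=lambda key: summary[key]["mean"])
print(slowest, summary[slowest])
```

Keeping the test type in the key makes it easy to see when a component that looked fine in per-feature testing slows down under integration load.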
Speed up the System
Find the worst components or sub-systems, or the ones that most need an upgrade, then debug, refactor, or, unfortunately, rewrite them. Keep in mind at this step that the work should go to the parts that most need it, which may not be the slowest parts.
It depends on your real-world requirements. In the forum example above, suppose the slowest part is editing the user profile, but it accounts for less than 1% of all requests. Meanwhile, the component for reading topics is not the worst one, and may be much faster, but it accounts for more than 80% of your website traffic. Then you know which component most needs improvement.
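One simple way to express this priority is to multiply each component's share of requests by its mean time and sort by the result. The numbers below are assumptions chosen to match the forum example, not measurements.

```python
# Assumed share of requests and mean seconds per request for each component.
components = {
    "edit_profile": {"share": 0.005, "mean_s": 2.0},
    "view_topic": {"share": 0.85, "mean_s": 0.3},
    "create_topic": {"share": 0.145, "mean_s": 0.5},
}


def rank_by_total_cost(stats):
    """Sort components by share * mean time, highest first."""
    cost = {name: s["share"] * s["mean_s"] for name, s in stats.items()}
    return sorted(cost.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_total_cost(components)
for name, cost in ranking:
    print(f"{name}: {cost:.4f} s per average request")
```

With these numbers, `view_topic` comes out on top even though `edit_profile` is the slowest single operation.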
Continually Improve the Performance Model
After you upgrade your system, the job isn't over; you've only just started. Testing and improvement should keep pace with the growth of your website traffic. You should also collect checkpoint data from the real-world environment.
Real World Performance
The real world is more dangerous than your lab. There are network nightmares, misestimated user load, and other expected and unexpected problems. We need to know the system's health in the real-world environment, not only in the lab.
Real World Checkpoint Data Collection
Two types of data are the most important:
- Website access log
- Top-level checkpoint data, or checkpoint data at some other level of detail
The reason is straightforward. The first item describes the real-world load on the site. The second gives a brief view of how your system behaves in the real world.
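The access log can be turned into a request mix with a small parser. This sketch assumes the log is in Common Log Format and that requests can be classified by method and URL prefix; the sample lines, the URL scheme, and the `classify` rules are all made up.

```python
import re
from collections import Counter

# Hypothetical sample lines in Common Log Format; paths are invented.
LOG = """\
10.0.0.1 - - [10/Oct/2009:13:55:36 +0000] "GET /topic/42 HTTP/1.1" 200 2326
10.0.0.2 - - [10/Oct/2009:13:55:37 +0000] "GET /topic/7 HTTP/1.1" 200 1840
10.0.0.1 - - [10/Oct/2009:13:55:40 +0000] "POST /topic/new HTTP/1.1" 302 0
10.0.0.3 - - [10/Oct/2009:13:55:41 +0000] "GET /topic/42 HTTP/1.1" 200 2326
"""

# Matches the quoted request field: method, path, protocol.
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*"')


def classify(method, path):
    """Map a request to an operation name (assumed URL scheme)."""
    if method == "POST":
        return "create_topic"
    if path.startswith("/topic/"):
        return "view_topic"
    return "other"


def request_mix(log_text):
    """Return the fraction of requests per operation found in the log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REQUEST.search(line)
        if match:
            counts[classify(match["method"], match["path"])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {op: n / total for op, n in counts.items()}


mix = request_mix(LOG)
print(mix)
```

The resulting fractions are the kind of ratio that the integration test load should follow.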
Performance Estimate
After we collect the checkpoint data, especially from integration testing, however much or little there is, you can give a rough estimate of how your system will perform in the real world. The point is that every estimated time and speed figure is based on real testing, not on guesswork. We can then use the real-world checkpoint data to correct any wrong numbers.
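As a rough illustration of such an estimate, the lab timings can be weighted by the real-world request mix, and the result compared with what production actually shows. The timings, the mix, the observed value, and the idea of a single correction factor are all assumptions of this sketch.

```python
# Assumed inputs: lab timings in seconds and the request mix seen in production.
lab_mean_s = {"view_topic": 0.30, "create_topic": 0.50}
real_mix = {"view_topic": 0.75, "create_topic": 0.25}


def estimate_mean_response(timings, mix):
    """Expected seconds per request: sum of share * lab mean."""
    return sum(mix[op] * timings[op] for op in mix)


def correction_factor(estimated, observed):
    """Ratio used to scale lab estimates toward what production shows."""
    return observed / estimated


estimate = estimate_mean_response(lab_mean_s, real_mix)
# 0.42 is a made-up production measurement of mean seconds per request.
factor = correction_factor(estimate, observed=0.42)
print(f"estimate: {estimate:.3f} s, correction factor: {factor:.2f}")
```

A factor far from 1.0 suggests that the testing environment or the assumed mix differs noticeably from the real world and should be adjusted.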
Upgrade the Model
When a loop is finished, the model should be upgraded for the following two reasons:
- To cover a new feature or a refactoring.
- To bring the testing environment closer to the real world.
We use the data collected from the real world (the bird's-eye view of the checkpoint data and the site access load) to adjust the testing data so that it simulates the real world. This makes the checkpoint data from the lab more reasonable.
Performance Checkpoint Report
In the end, we compile the collected data, from both the lab and the real world, into a report. The report is important because:
- You know the system's health in every time period.
- You can give a more reasonable estimate for a new feature or for a new server in a different location, such as a different city or country.
- Combined with an SCM tool, such as Subversion or Bazaar, it helps you learn how to approach system design, architecture, implementation, and refactoring.
Post it to Google Docs, I'd like to contribute. I have never really had time to join the projects you set up, but I can spare some time for a doc :D