| ISSRE 2006 | START Conference Manager |
To solve this problem, a performance-oriented methodology was proposed and applied to HPC cluster system testing. The key to this methodology is to measure the performance of every major subsystem that could affect system performance at each stage of system testing, most importantly at the cluster bring-up stage. Full-scale system testing does not start until the performance of all major subsystems is understood and acceptable system performance is achieved. Performance is then monitored closely throughout the testing cycle so that performance-related defects can be detected.
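The bring-up gate described above can be sketched as a simple check. This is only an illustration under stated assumptions, not the tooling used in the work: the names `SubsystemResult`, `failing_subsystems`, and `ready_for_full_scale_test`, the 90% threshold, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubsystemResult:
    name: str        # illustrative label, e.g. "network" or "storage"
    measured: float  # measured throughput, in any consistent unit
    baseline: float  # expected throughput for this hardware (assumed known)


def failing_subsystems(results, min_ratio=0.9):
    """Return the names of subsystems measuring below min_ratio of baseline."""
    return [r.name for r in results if r.measured < min_ratio * r.baseline]


def ready_for_full_scale_test(results, min_ratio=0.9):
    """Full-scale testing may start only when no subsystem fails the check."""
    return not failing_subsystems(results, min_ratio)


# Hypothetical bring-up measurements (made-up numbers for illustration).
results = [
    SubsystemResult("network", measured=950.0, baseline=1000.0),
    SubsystemResult("storage", measured=600.0, baseline=1000.0),
]

if not ready_for_full_scale_test(results):
    print("Hold full-scale test; investigate:", failing_subsystems(results))
```

Running the same check at each later stage of the test cycle, not only at bring-up, would correspond to the continuous monitoring the methodology calls for.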
As an example, this methodology was applied in testing the GPFS 3.1 release, and the results were compared with testing of the earlier GPFS 2.3 release, when the methodology had not yet been adopted. In testing GPFS 2.3, many network and storage performance-related problems were encountered during the regular test cycle. Considerable effort was spent debugging these network and storage problems, resulting in only 75% completion of planned testing by the scheduled delivery date.
Based on the lessons learned from the GPFS 2.3 release, we adopted this performance-oriented methodology for testing GPFS 3.1. Setup time was shortened, and both the number of returned defects caused by network problems and the system downtime spent debugging network and storage problems were reduced. 100% of planned testing was completed on schedule.