STRESS TESTING YOUR COMPUTER

BACKGROUND
----------

Today's computers are not perfect. Even brand new systems from major
manufacturers can have hidden flaws. If any of several key components such
as CPU, memory, cooling, etc. are not up to spec, it can lead to incorrect
calculations and/or unexplained system crashes.

Overclocking is the practice of increasing the speed of the CPU and/or
memory to make a machine faster at little cost. Typically, overclocking
involves pushing a machine past its limits and then backing off just a
little bit.

For these reasons, both non-overclockers and overclockers need programs
that test the stability of their computers. This is done by running
programs that put a heavy load on the computer. Though not originally
designed for this purpose, this program is one of a few programs that are
excellent at stress testing a computer.

RESOURCES
---------

This program is a good stress test for the CPU, memory, L1 and L2 caches,
CPU cooling, and case cooling. The torture test runs continuously,
comparing your computer's results to results that are known to be correct.
Any mismatch and you've got a problem! Note that the torture test
sometimes reads from and writes to disk but cannot be considered a stress
test for hard drives.

You'll need other programs to stress video cards, PCI bus, disk access,
networking and other important components. In addition, this is only one
of several good programs that are freely available. Some people report
finding problems only when running two or more stress test programs
concurrently. You may need to raise prime95's priority when running two
stress test programs so that each gets about 50% of the CPU time.

Forums are a great place to learn about available stability test programs
and to get advice on what to do when a problem is found.

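To illustrate the idea of comparing your computer's results against known
correct results, here is a minimal Python sketch. It is not prime95's
actual code, and the names in it (lucas_lehmer, KNOWN_RESULTS,
torture_pass, run_torture) are made up for this example. It runs the
Lucas-Lehmer primality test on a handful of small Mersenne exponents whose
answers are well established and reports any mismatch. A real stress test
uses far larger numbers and heavily tuned code to keep the CPU and memory
under load; this toy version will not stress anything.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime. p must be an odd prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


# Hypothetical table of known correct answers for this sketch:
# exponent p -> is 2**p - 1 prime?
KNOWN_RESULTS = {
    3: True, 5: True, 7: True, 11: False, 13: True,
    17: True, 19: True, 23: False, 29: False, 31: True,
}


def torture_pass(compute=lucas_lehmer):
    """Run every test case once; return the exponents that mismatched."""
    return [p for p, expected in KNOWN_RESULTS.items()
            if compute(p) != expected]


def run_torture(passes=3, compute=lucas_lehmer):
    """Repeat the pass several times, stopping at the first mismatch."""
    for i in range(passes):
        bad = torture_pass(compute)
        if bad:
            return f"FAILED on pass {i + 1}: exponents {bad}"
    return f"passed {passes} passes"


if __name__ == "__main__":
    print(run_torture())
```

The compute argument exists only so that a deliberately wrong function can
be passed in to see what a failure report looks like. On healthy hardware
the default should never report a mismatch.
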
The currently popular stability test programs are (sorry, I don't have web
addresses for these):
    Prime95 (this program's torture test)
    3DMark2001
    CPU Stability test
    SiSoft Sandra
    Quake and other games
    Folding@Home
    SETI@home
    Genome@home

Several useful websites for help (look for overclocking community or forum):
    http://www.overclockers.com
    http://www.arstechnica.com
    http://www.hardocp.com
    http://www.anandtech.com
    http://www.tomshardware.com
    http://www.sharkyextreme.com
Also try the alt.comp.hardware.overclocking Usenet newsgroup.

Utility programs you may find useful (I'm sure there are others - look
around):
    Motherboard Monitor from http://mbm.livewiredev.com
    Memtest86 from http://www.memtest86.com
    Cpuburn by redelm: http://pages.sbcglobal.net/redelm/
    TaskInfo2002 from http://www.iarsn.com/

WHAT TO DO IF A PROBLEM IS FOUND?
---------------------------------

The exact cause of a hardware problem can be very hard to find.

If you are not overclocking, the most likely cause is an overheating CPU
or memory DIMMs that are not quite up to spec. Another possibility is that
you need a better power supply. Try running Motherboard Monitor and
browsing the forums above to see if your CPU is running too hot. If so,
make sure the heat sink is properly attached, fans are operational, and
air flow inside the case is good. For isolating memory problems, try
swapping memory DIMMs with a co-worker's or friend's machine. If the
errors go away, then you can be fairly confident that memory was the cause
of the trouble. A power supply problem can often be identified by a
significant drop in the voltages when prime95 starts running. Once again,
the overclocker forums are a good resource for what voltages are
acceptable.

If you are overclocking, then try increasing the core voltage, reducing
the CPU speed, reducing the front side bus speed, or changing the memory
timings (CAS latency). Also try asking for help in one of the forums
above - they may have other ideas to try.

CAN I IGNORE THE PROBLEM?
-------------------------

Ignoring the problem is a matter of personal preference. There are two
schools of thought on this subject.

Most programs you run will not stress your computer enough to cause a
wrong result or system crash. If you ignore the problem, then video games
may stress your machine, resulting in a system crash. Also, stay away from
distributed computing projects where an incorrect calculation might cause
you to return wrong results. Bad data will not help these projects! In
conclusion, if you are comfortable with a small risk of an occasional
system crash then feel free to live a little dangerously! Keep in mind
that the faster prime95 finds a hardware error, the more likely it is that
other programs will experience problems.

The second school of thought is, "Why run a stress test if you are going
to ignore the results?" These people want a guaranteed 100% rock solid
machine. Passing these stability tests gives them the ability to run CPU
intensive programs with confidence.

FREQUENTLY ASKED QUESTIONS
--------------------------

Q) My machine is not overclocked. If I'm getting an error, then there must
be a bug in the program, right?
A) The torture test is comparing your machine's results against KNOWN
CORRECT RESULTS. If your machine cannot generate correct results, you have
a hardware problem. HOWEVER, if you are failing the torture test in the
SAME SPOT with the SAME ERROR MESSAGE every time, then ask for help at
http://mersenneforum.org - it is possible that a recent change to the
torture test code may have introduced a software bug.

Q) How long should I run the torture test?
A) I recommend running it for somewhere between 6 and 24 hours. The
program has been known to fail only after several hours and in some cases
several weeks of operation. In most cases though, it will fail within a
few minutes on a flaky machine.

Q) Prime95 reports errors during the torture test, but other stability
tests don't. Do I have a problem?
A) Yes, you've reached the point where your machine has been pushed just
beyond its limits. Follow the recommendations above to make your machine
100% stable or decide to live with a machine that could have problems in
rare circumstances.

Q) A forum member said "Don't bother with prime95, it always pukes on me,
and my system is stable! What do you make of that?" or "We had a server at
work that ran for 2 MONTHS straight, without a reboot. I installed Prime95
on it and ran it - a couple minutes later I get an error. You are going to
tell me that the server wasn't stable?"
A) These users obviously do not subscribe to the 100% rock solid school of
thought. THEIR MACHINES DO HAVE HARDWARE PROBLEMS. But since they are not
presently running any programs that reveal the hardware problem, the
machines are quite stable. As long as these machines never run a program
that uncovers the hardware problem, then the machines will continue to be
stable.