A number of bugs get reported that really don't make a lot of sense. They cause all sorts of head-scratching among kernel developers. Whilst most bug reporters don't like to hear that their shiny new hardware may be broken/crap, sadly this is sometimes the case. Here are a few tips that may help root-cause hardware problems.

- Be sure to use adequate cooling. In some cases, it has been discovered that even the CPU coolers that come with some OEM systems aren't quite as good as some 3rd party coolers. It's worth spending a little extra to ensure your CPU stays within its temperature limits whilst under sustained heavy load. Also, not all thermal grease is equivalent. The more expensive options really do get things running cooler. Be sparing when applying this stuff btw: too much is a bad thing, and adversely affects its ability to conduct heat away from the CPU.
- Spring clean your cooler(s). Periodically take your system apart and clean out the dust and buildup from any parts with fans, including graphics boards.
- Use the CPU clock speed that the CPU was rated for. Don't overclock. Don't overclock. Don't overclock. Even if Windows XP works fine on your 6GHz water-cooled Pentium4, this is no sign of stability. In some cases Linux can push the theoretical limits of the machine (for example, by utilising all available memory bandwidth for sustained periods of time). Under such extreme load, the CPU will be generating a lot more heat than it will sitting idle on a Windows XP desktop.
- Make sure your power supply is adequate to power all the peripherals you have attached. The gotcha here is that whilst it may be adequate to get the OS booted, when the machine is actually doing some work (like a big compile, or running doom), it's going to use more power than it would whilst idling. All of this power has to come from somewhere. If the PSU can't supply enough, something is going to be underpowered, which can result in very strange kernel panics.
- Run memtest86. Yes, it takes ages to run.
Sometimes it takes at least a day before a bit error in some DIMM shows up. (The worst I've seen was an error that only showed up after a week-long run.) It's really worth the time testing though. If you don't run this test and the problem really is flaky RAM, then the 'bug' will never be fixed, and will just cause extensive head-scratching.
- Where possible, avoid double-sided RAM. Some boards don't seem to be as well tested with double-sided sticks as they are with single-sided sticks. In one reported case, double-sided RAM only worked in one of the two slots.
- Reset BIOS to safe defaults.
  - A number of times, users have reported issues that manifest themselves as really obscure oopses that don't make a lot of sense. They turned out to be things like 'CAS timing' set too aggressively on systems with cheap RAM. (A number of times, these settings worked fine until the user added an extra DIMM.) Interestingly, this problem didn't show up under memtest86 [although maybe it would if left to run long enough].
  - Also, there may be an option in the BIOS labelled "installed OS", offering various choices of operating system. There's no hard and fast rule as to which one is the right one to pick (unless there's an explicit "Linux" option), as BIOS vendors put different choices in their lists, and this option affects different things from vendor to vendor. If you experience hard-to-diagnose problems, you may want to try experimenting with a different option here.
  - Finally, if your BIOS has a 'reset ESCD' option, and you've been experiencing problems which could be related to interrupts, resetting this has made things work for some people.
- Check cabling isn't obstructing airflow. Fans should be completely unobstructed, ensuring that air can circulate throughout the case.
- Be sure to be running the latest BIOS from your motherboard manufacturer.
Modern Linux kernels make heavy use of ACPI during bootup for things ranging from hardware discovery to interrupt routing and power management. From time to time, broken tables are discovered in the BIOS, which can cause crashes or lockups before the system even boots. Even non-ACPI tables are wrong from time to time, so this is one to check even if you don't use ACPI.
- Check cabling is secure. We've seen various strange reports, such as "floppy occasionally writes/reads bad sectors", "hard disk spews random error messages" and "graphics get corrupted", which turned out to be caused by power cables that hadn't been fully pushed home. If you use Y-shaped 'splitter' cables to power devices and you see problems on one specific device, try a different cable; sometimes these things aren't built quite as well as the connectors coming from the PSU.
- Check RAM timings. Mixing and matching DIMMs of different speeds is a sure-fire way to introduce really bizarre bugs and stability problems. Also make sure that the speed settings in the BIOS are either set to AUTO (sometimes labelled 'SPD' [serial presence detect]) or set to the correct speed for the DIMMs. As noted above, double-sided DIMMs can be problematic. Don't mix and match RAM where possible.
- Check your power. There have been a number of bugs reported that 'disappeared' when the reporter moved the affected machine onto a UPS or an alternative power circuit. There are a number of surge protector type devices that 'clean' the incoming power before it reaches the computer. They make for a valuable investment.
- Balance the power cables out. Don't have multiple Y cables coming off a single PSU spur and one device on each of the other spurs. Use one Y cable per spur where possible. (If you need more, it may be a sign that you need a larger PSU.)
- USB devices plugged into hubs. Not all hubs are powered; an unpowered hub draws its power from the computer instead of having its own power supply.
If the devices you plug into one of these hubs together draw more power than the hub can draw from the computer, you may observe strange USB failures. Choose a powered hub if you think you will be using devices which may draw lots of power (printers, scanners, etc.).
- Cable converters. Attaching a device to an interface through a dozen cable converters and gender changers is going to weaken the signal, which may result in errors. Make sure you buy the right cables rather than hacking together a frankencable.
- Make sure you have the right cables. An 80-conductor IDE cable is essential for anything above ATA33, for example. Using a 40-conductor cable might work at low speeds, but you'll get errors later when you try to do faster transfers.
- Put slow IDE devices on their own bus. These are typically CD drives and ATA floppy drives. They aren't particularly fast, and when used as slaves on the same bus as a hard disk they can impact hard drive performance and, in some cases, cause corruption. Additionally, if you have both an ATA33 bus and a higher speed ATA bus, putting the CD drives etc. on the slower bus is advised.
- BIOS options: legacy USB. Just say no.
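
The power supply tip above boils down to simple arithmetic: add up what everything draws under load, not at idle, and compare the total with what the PSU can deliver. The sketch below is only an illustration of that sum. The function name, the component figures and the 80% derating factor are all assumptions made for the example, not measured values or anything the PSU vendor guarantees.

```python
def psu_margin_watts(rated_watts, peak_draws_watts, derate=0.8):
    """Return the spare capacity in watts under load.

    rated_watts      -- the wattage printed on the PSU label
    peak_draws_watts -- dict of component name -> estimated peak draw (W)
    derate           -- assumed fraction of the rating to rely on
                        (0.8 is an illustrative guess, not a standard)

    A negative result means the load may exceed what the PSU can supply.
    """
    usable = rated_watts * derate
    return usable - sum(peak_draws_watts.values())


# Hypothetical peak figures for a machine doing a big compile.
load = {"cpu": 90, "graphics": 150, "disks": 30}
margin = psu_margin_watts(400, load)
print("spare capacity under load: %.0f W" % margin)
```

If the margin comes out negative, or only barely positive, that is the "adequate to boot, not adequate to work" situation described in the list.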
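
The same kind of sum applies to the unpowered USB hub tip. The sketch below assumes USB 2.0 figures, where a bus-powered hub may draw at most 500 mA from the computer; the 100 mA allowance for the hub's own electronics and the device figures are hypothetical. On Linux, `lsusb -v` lists a MaxPower value for each device, which is one place to get real numbers.

```python
# Assumption: USB 2.0 limit for what a bus-powered hub may draw upstream.
UPSTREAM_BUDGET_MA = 500
# Hypothetical allowance for the hub's own electronics.
HUB_OVERHEAD_MA = 100


def hub_headroom_ma(device_draws_ma,
                    budget_ma=UPSTREAM_BUDGET_MA,
                    overhead_ma=HUB_OVERHEAD_MA):
    """Return the remaining current in mA; negative means oversubscribed."""
    return budget_ma - overhead_ma - sum(device_draws_ma)


def is_oversubscribed(device_draws_ma, **kwargs):
    """True if the devices together want more than the hub can pass on."""
    return hub_headroom_ma(device_draws_ma, **kwargs) < 0


# Hypothetical devices: a mouse, a keyboard and a bus-powered scanner.
devices = [100, 100, 250]
if is_oversubscribed(devices):
    print("hub is oversubscribed; use a powered hub")
```

With these example figures the hub is asked for more than it can draw, which is the case where a powered hub is the better choice.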