Channel: All ProLiant Servers (ML,DL,SL) posts

Re: DL380G7 WHEA-logger Event ID: 47 "A corrected hardware error has occurred"

If your WHEA errors indicate a memory problem, it's probably a bad module.

The RAM test the server runs at POST whenever the memory configuration changes is pretty basic. Unless a module is *really* bad, it will probably pass that quick test.

I had a DL360 G7 that tested fine on every RAM test I could find, even the full, comprehensive ones you boot from a floppy or USB stick. Every time, no problems.

But when I ran MSSQL on that server, as soon as memory usage grew large enough to hit one particular memory block, boom: the system would blue screen and the IML (Integrated Management Log) would log an error.

Fortunately for me, it logged the specific memory slot, so I was able to replace that module, and it's been fine ever since.

The lesson: just because your memory tests come up clean, don't assume you don't have a memory problem, especially when the server's own monitoring is telling you that you do. :)

Fortunately, with ECC memory a single-bit error is correctable, and that's most likely what's happening if you're getting "corrected hardware error" messages like that. A double-bit error is detected but not corrected, and if you start getting those you'll probably BSOD sooner or later.
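To make the single-bit vs. double-bit distinction concrete, here's a toy Python sketch of a SECDED (single-error-correct, double-error-detect) code, which is the general idea behind ECC memory. It's an extended Hamming(8,4) code over just 4 data bits; real ECC DIMMs protect 64 data bits with 8 check bits, and this sketch does not claim to reproduce the exact code the ProLiant memory controller uses. The names `encode` and `decode` are purely illustrative.

```python
"""Toy SECDED code: extended Hamming(8,4).

Illustration only. Real ECC DIMMs apply the same idea to 64 data bits
with 8 check bits; this is NOT the exact scheme any specific server uses.
"""
from __future__ import annotations

DATA_POSITIONS = (3, 5, 6, 7)   # Hamming positions that carry data bits
PARITY_POSITIONS = (1, 2, 4)    # Hamming positions that carry parity bits


def encode(nibble: int) -> list[int]:
    """Encode a 4-bit value into 8 bits; index 0 is the overall parity bit."""
    word = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (nibble >> i) & 1
    for p in PARITY_POSITIONS:
        # Parity bit p covers every position whose index has bit p set.
        word[p] = sum(word[k] for k in range(1, 8) if k & p) % 2
    word[0] = sum(word[1:]) % 2  # overall parity makes the total even
    return word


def decode(word: list[int]) -> tuple[str, int | None]:
    """Return ('ok' | 'corrected' | 'uncorrectable', data or None)."""
    syndrome = 0
    for p in PARITY_POSITIONS:
        if sum(word[k] for k in range(1, 8) if k & p) % 2:
            syndrome |= p
    overall_even = sum(word) % 2 == 0
    fixed = list(word)
    if syndrome == 0 and overall_even:
        status = "ok"
    elif not overall_even:
        # Odd number of flips: treat it as a single-bit error and fix it.
        # Syndrome 0 here means the overall parity bit itself flipped.
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Parity checks fail but overall parity is even: two bits flipped.
        # Detected, but we can't tell which bits, so don't "fix" anything.
        return "uncorrectable", None
    nibble = sum(fixed[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return status, nibble


if __name__ == "__main__":
    cw = encode(0b1011)
    one_flip = list(cw)
    one_flip[5] ^= 1
    two_flips = list(cw)
    two_flips[3] ^= 1
    two_flips[6] ^= 1
    print(decode(cw))         # ('ok', 11)
    print(decode(one_flip))   # ('corrected', 11)
    print(decode(two_flips))  # ('uncorrectable', None)
```

In the two-flip case the syndrome works out to 5, a bit that wasn't flipped at all, so blindly "correcting" it would corrupt the data. The extra overall parity bit is what lets the code report "uncorrectable" instead, which is roughly the difference between a corrected-error log entry and a crash.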

There are more advanced memory modes like lockstep, mirroring, and online spare, and some of them can detect *and* correct double-bit errors too, but they cost you memory performance (and, with mirroring or sparing, usable capacity). Unless it's a mission-critical system, you probably want the performance, and you'll just replace bad modules when you find them.

Hopefully the system is logging which memory slot had the error.

Then again, it could be something else and not a memory problem at all, but that's less likely as long as you have the latest firmware and drivers installed.
