16 October 2006

Bart vs. BAD_POOL_CALLER

BAD_POOL_CALLER is one of those scary STOP errors that one may see in XP (that is, once you kill the duhfault "restart on system errors" setting). This particular case was an XP SP2 system that was said to crash straight into this on startup.

An uneventful 12 hours in MemTest86, with no spontaneous lockups or resets (which would have booted the Bart CDR I left in place during the test); motherboard caps OK. On Bart CDR boot, HD Tune passes SMART, temperature and surface scan on both hard drives, and ChkDsk finds the file systems OK on all volumes.

Formal malware scans run fine, until the first test that requires RunScanner to access the registry hives on the hard drive. As soon as I click OK in RunScanner's dialog, the system STOPs.

Riiiight... next, I harvest spare registry hives from SR Restore Points in the C:\SVI subtree. Then I pick a trivial scanner that I've set to run via RunScanner; in this case, Stinger. It doesn't matter what it is; all I want to test-to-fix is whether I can initiate registry access to the hard drive hives. If that no longer dies, I may have fixed the problem.

I run Stinger in this way, each time choosing a different user account and not checking the "use all hives" checkbox in the RunScanner dialog box. Every user account is fine except the one they actually use, which dies the blue death.

Now that I've narrowed it down to a single file, I rename away that user account's NTUSER.DAT (which is the per-user registry hive), copy in the most recent spare from the most recent Restore Point, rename that into action as NTUSER.DAT, and re-test; this time it works as well as the other accounts did.
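For anyone who would rather script the rename-and-replace step than do it by hand, here is a minimal Python sketch of the same idea. It assumes you are working from outside the installed Windows (as in Bart), and the function name, the `.bad-` suffix and the paths you pass in are my own illustration, not anything Bart or RunScanner provides.

```python
import shutil
import time
from pathlib import Path


def swap_hive(profile_dir: Path, replacement: Path) -> Path:
    """Rename the suspect NTUSER.DAT aside, then copy a spare into its place.

    Returns the path of the preserved (renamed) original.
    """
    live = profile_dir / "NTUSER.DAT"
    if not live.is_file():
        raise FileNotFoundError(live)
    if not replacement.is_file():
        raise FileNotFoundError(replacement)

    # Keep the damaged hive for later comparison; never delete it.
    # The ".bad-<timestamp>" suffix is a hypothetical naming choice.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    preserved = profile_dir / f"NTUSER.DAT.bad-{stamp}"
    live.rename(preserved)

    # Copy (not move) the spare, so the Restore Point copy stays intact.
    shutil.copy2(replacement, live)
    return preserved
```

Copying rather than moving the spare means you can fall back to the same Restore Point file again if the first attempt does not help.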

I'm interested in a single hive causing this common head-scratching problem, so I keep the "bad" and "fixed" copies of the hive, which appear to be the same length. I'll FC them to see if there's some specific difference (either structural, or a recent install... tho I'd expect the latter to change the file length) that causes the problem, and update this article if that looks interesting.
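FC in binary mode will list every differing byte, which can be a lot to read. As a rough alternative, this Python sketch collapses the differences between two files into byte ranges, so you can see whether the damage is one small patch or scattered throughout. The function name and chunk size are my own choices, and it only compares the length the two files have in common.

```python
from pathlib import Path


def diff_ranges(bad: Path, good: Path, chunk: int = 65536):
    """Return a list of (start, end) byte offsets, end exclusive, where the
    two files differ within their common length."""
    ranges = []
    start = None   # start offset of the current run of differing bytes
    offset = 0     # file offset of the chunk being compared
    with open(bad, "rb") as fa, open(good, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            n = min(len(a), len(b))
            if n == 0:
                break
            if a[:n] == b[:n]:
                # Whole chunk matches: close any run left open by the previous chunk.
                if start is not None:
                    ranges.append((start, offset))
                    start = None
            else:
                for i in range(n):
                    if a[i] != b[i]:
                        if start is None:
                            start = offset + i
                    elif start is not None:
                        ranges.append((start, offset + i))
                        start = None
            offset += n
    if start is not None:
        ranges.append((start, offset))
    return ranges
```

A single short range would point to localized corruption, while many ranges spread across the file look more like ordinary registry churn between the two snapshots.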

Meantime, Bart has saved the day yet again; what could so easily have been "just" wipe and re-install, turned out to be a few unattended hours on "the prelim" plus around one hour of interactive work in Bart. Did I mention I liked Bart?

That drill-down method again...
  • check RAM and hardware first
  • boot into Bart CDR
  • use a tool via RunScanner
  • choose one user account at a time
  • if all break, suspect a system hive (common to all accounts)
  • if only one account breaks, it's that account's hive
  • preserve the damaged hive by renaming it away, not deleting it
  • harvest replacement hives from recent Restore Points
  • try with the newest, then second-newest etc.
  • do not try any of the above in hard-drive-booted Windows
  • compare bad and good copies of the hive for differences

There are probably scores of reasons to STOP on BAD_POOL_CALLER, and many of these may have no pattern if the underlying hardware level of abstraction is bad (as is often the case with registry damage). Even so, if you have a consistent STOP on every boot, at the same point in the boot, then this approach may find the solution if your case is like this one.
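The "all break" versus "only one breaks" rule from the list can be written down as a tiny Python function, if that helps to see the logic. This is just a sketch: the function name and the wording of the results are mine, and the "inconclusive" case covers situations the list above does not address.

```python
from typing import Mapping


def suspect_hive(results: Mapping[str, bool]) -> str:
    """results maps account name -> True if the RunScanner test survived,
    False if the system STOPped."""
    if not results:
        return "no accounts tested"
    failed = sorted(name for name, ok in results.items() if not ok)
    if not failed:
        return "no hive implicated by this test"
    if len(results) > 1 and len(failed) == len(results):
        return "system hive (common to all accounts)"
    if len(results) > 1 and len(failed) == 1:
        return f"per-user hive of {failed[0]}"
    # Assumption: a single tested account, or several-but-not-all failing,
    # cannot be classified by the rule of thumb alone.
    return "inconclusive: " + ", ".join(failed)
```

In the case described here, every account passed except the one in daily use, so the function would point at that account's NTUSER.DAT.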

3 comments:

Anonymous said...

mate! you saved my day

Anonymous said...

Saved my day too!!! Thanks!

I have an HP Pavilion laptop (dv5320).
A couple of notes from my journey...

- Getting Bart up and running... Easy enough, but I had to do some research because my C drive wasn't visible. Finally discovered that I had to include the drivers in the BartPE image. Getting the drivers wasn't simple since my laptop is pre-installed with Windows, so I didn't have a drivers CD. Nonetheless I managed to get them from the HP support site and lo and behold, Bart saw my C drive!

- I don't really understand the registry too well and don't really understand what RunScanner does... but as you suggested, I selected user by user and quickly came across the user that crashes the system (my daughter... figures!).

- I had a little trouble figuring out how to harvest a spare hive. The link between the files in the snapshot and the NTUSER.DAT wasn't very clear to me. Anyway, I guess I did the right thing because it worked. I renamed my daughter's NTUSER.DAT, copied one of the NTUSER files in the snapshot (took one with the same file size) and renamed it to NTUSER.DAT.

- Rebooted the system and no more blue screen!!!

Thanks again!!!

Chris Quirke said...

Hi, Anon II :-)

Yes, most new motherboard chipsets have S-ATA that isn't supported by the native i386 code set from which Bart is built. How did you integrate the drivers? I've been pulling the HDs and working in a compatible frame while the parent PC does time in MemTest.

The registry is a repository of information that defines how the system works, how applications interact, etc. It's stored as binary files (the hives), which are (nearly?) always approached only via OS APIs.

The challenge with operating from Bart is that the registry in effect is Bart's, not the HD installation's, so tools that operate on the registry will miss their mark. RunScanner is a plugin that overcomes this by transparently re-routing a program's registry access to the hives on the hard drive.

Yes, there's a bit of file renaming and subfolder navigation required to find matching hives in the Restore Points. It looks fairly intuitive to the eye, but is a challenge if you want to automate the process via batch file, or just haven't done it before.
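As a starting point for automating the search, here is a hedged Python sketch rather than a batch file. It assumes the usual XP layout as I understand it: restore point folders named RP followed by a number, each with a snapshot subfolder holding per-user hives named _REGISTRY_USER_NTUSER_ plus the account's SID, and a higher RP number meaning a newer restore point. Check those assumptions against what you actually see on the drive; the function name and the SID you pass in are purely illustrative.

```python
import re
from pathlib import Path

# Assumed naming conventions; verify against the actual Restore Point folders.
RP_DIR = re.compile(r"^RP(\d+)$", re.IGNORECASE)
PREFIX = "_REGISTRY_USER_NTUSER_"


def find_spare_hives(root: Path, sid: str):
    """Return candidate NTUSER hive paths for `sid`, newest restore point first."""
    found = []
    for snap in root.rglob("snapshot"):
        match = RP_DIR.match(snap.parent.name)
        if not (snap.is_dir() and match):
            continue
        candidate = snap / f"{PREFIX}{sid}"
        if candidate.is_file():
            found.append((int(match.group(1)), candidate))
    # Assumption: higher RP number means a more recent restore point.
    found.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in found]
```

You would point `root` at the Restore Points subtree and try the first result, then the second, and so on, copying each one into place as NTUSER.DAT before re-testing.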

Sorry my walk-through wasn't clearer on the above, but at least it had you digging in the right place - glad it worked :-)