16 October 2006

Bart vs. BAD_POOL_CALLER

BAD_POOL_CALLER is one of those scary STOP errors that one may see in XP (that is, once you kill the duhfault "restart on system errors" setting). This particular case was an XP SP2 system that was said to crash straight into this on startup.

Uneventful 12 hours in MemTest86 with no spontaneous lockups or booting of the Bart CDR I left in place during the test, motherboard caps OK. Bart CDR boot, HD Tune passes SMART, temperature and surface on both hard drives, file systems OK on ChkDsk for all volumes.

Formal malware scans fine, until the first test that requires RunScanner to access the registry hives on the hard drive. As soon as I click RunScanner's dialog OK, the system STOPs.

Riiiight... next, I harvest spare registry hives from SR Restore Points in the C:\SVI subtree. Then I pick a trivial scanner that I've set to run via RunScanner; in this case, Stinger. It doesn't matter what it is; all I want to test-to-fix is whether I can initiate registry access to the hard drive hives. If that no longer dies, I may have fixed the problem.

I run Stinger in this way, each time choosing a different user account and not checking the "use all hives" checkbox in the RunScanner dialog box. Every user account is fine except the one they actually use, which dies the blue death.

Now that I've narrowed it down to a single file, I rename away that user account's NTUSER.DAT (which is the per-user registry hive), copy in the most recent spare from the most recent Restore Point, rename that into action as NTUSER.DAT, and re-test; this time it works as well as the other accounts did.
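The rename-then-copy step can be sketched in Python, for anyone who'd rather script it than do it by hand. This is a minimal sketch, not what I actually ran in Bart; the function name and the ".bad" suffix are made up for illustration, and it assumes you are working from a maintenance OS so the hive isn't in use.

```python
import shutil
from pathlib import Path


def swap_in_spare_hive(live_hive: Path, spare_hive: Path, suffix: str = ".bad") -> Path:
    """Rename the suspect hive aside, then copy the spare into its place.

    Returns the path of the preserved (renamed) original.
    The ".bad" suffix is an arbitrary choice for this sketch.
    """
    preserved = live_hive.with_name(live_hive.name + suffix)
    if preserved.exists():
        # Never overwrite an earlier preserved copy; it may be the only evidence.
        raise FileExistsError(f"{preserved} already exists")
    live_hive.rename(preserved)          # keep the damaged hive for later study
    shutil.copy2(spare_hive, live_hive)  # the spare takes over the original name
    return preserved
```

The point of renaming rather than deleting is that the step is reversible, and the "bad" copy stays available for comparison afterwards.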

I'm interested in a single hive causing this common head-scratching problem, so I keep the "bad" and "fixed" copies of the hive, which appear to be the same length. I'll FC them to see if there's some specific difference (either structural, or a recent install... tho I'd expect the latter to change the file length) that causes the problem, and update this article if that looks interesting.
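As a rough stand-in for FC's binary compare, here is a small Python sketch that reports whether two files are the same length and where the first few bytes differ. The function name and the default limits are my own choices, and it knows nothing about hive structure; it only compares raw bytes over the common length.

```python
from pathlib import Path


def differing_offsets(path_a, path_b, limit=20, chunk_size=65536):
    """Return (same_length, offsets) with up to `limit` differing byte offsets."""
    same_length = Path(path_a).stat().st_size == Path(path_b).stat().st_size
    offsets = []
    base = 0  # file offset of the start of the current chunk
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while len(offsets) < limit:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a or not b:
                break  # reached the end of the shorter file
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        offsets.append(base + i)
                        if len(offsets) >= limit:
                            break
            base += min(len(a), len(b))
    return same_length, offsets
```

A handful of scattered offsets would hint at localized damage, while differences everywhere would suggest the two copies simply come from different points in time.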

Meantime, Bart has saved the day yet again; what could so easily have been "just" wipe and re-install, turned out to be a few unattended hours on "the prelim" plus around one hour of interactive work in Bart. Did I mention I liked Bart?

That drill-down method again...
  • check RAM and hardware first
  • boot into Bart CDR
  • use a tool via RunScanner
  • choose one user account at a time
  • if all break, suspect a system hive (common to all accounts)
  • if only one account breaks, it's that account's hive
  • preserve damaged hive by renaming away, not deleting
  • harvest replacement hives from recent Restore Points
  • try with newest, then second-newest etc.
  • do not try any of the above in hard-drive-booted Windows
  • compare bad and good copies of the hive for differences

There are probably scores of reasons to STOP on BAD_POOL_CALLER, and many of these may have no pattern if the underlying hardware level of abstraction is bad (as is often the case with registry damage). Even so, if you have a consistent STOP on every boot, at the same point in the boot, then this approach may find the solution if your case is like this one.
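The "all break" versus "one breaks" reasoning from the drill-down can be written out as a tiny Python function. It's only a sketch of the decision step; the function name and account names are hypothetical, and treating "several but not all accounts fail" as several bad per-user hives is my assumption rather than something this case tested.

```python
def interpret_results(results):
    """Interpret per-account RunScanner tests.

    `results` maps account name -> True if the tool ran, False if the system STOPped.
    """
    if not results:
        return "no accounts tested"
    failed = sorted(name for name, ok in results.items() if not ok)
    if not failed:
        return "no hive implicated by this test"
    if len(failed) == len(results):
        # Every account dies, so look at what they all share.
        return "suspect a system hive (common to all accounts)"
    # Assumption: each failing account has its own damaged hive.
    return "suspect per-user hive(s): " + ", ".join(failed)
```

For example, `interpret_results({"Owner": False, "Guest": True})` points at Owner's hive, which is the pattern seen in this case.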

15 October 2006

Open Source Eudora

Most of the time you'd be reading rants about how awful and shifty vendors are - perhaps every industry is as bad, but I'm "further away" from most? - so it's a pleasure to celebrate software vendors who do the right thing...

http://www.eudora.com/faq/

Now here's a vendor with a popular product, but one that isn't their central interest. They could have just killed it, and told those who complain that it is their right to do so; after all, we see that all the time with music corporations, who delete titles that are "too old and unpopular to make money" while still forbidding even the original artist to distribute them for free.

But instead, they are shifting the product into Open Source, while committing to honor their obligations to those who have purchased the Paid version. Those who use Sponsored mode (myself included) can now stay in this mode with full functionality forever, even when the ads stop. Unless there's some sting in the tail so hidden I can't see it, it looks like an excellent result!

Previous "Do The Right Thing" award

I've been as impressed a few times before, and the last time was when Computer Associates ceased the popular free InoculateIT antivirus suite. Again, they announced the move and then supported the free version with updates for a longer period than commercial vendors' one-year subscription, and they offered a low conversion price to the feeware eTrust that replaced it.

The InoculateIT story was particularly impressive, as the initial announcement that stated "free updates until we have to change the scanning engine code" was made within a few months of the release of a new version of Windows. It would have been so easy to claim the need to create a new engine to overcome compatibility issues with the new Windows version, but they didn't do so; InoculateIT remained free for many months thereafter.

When InoculateIT ceased to be free, it also ceased to be the de facto free/non-warez antivirus product. AVG stepped into those shoes; it was always around, along with Avast, later AntiVir, and some others, but there was greater confidence in InoculateIT at that time. AVG also did the right thing when they dropped the free AVG 6 product to consolidate on AVG 7 as the sole code base; they offered a free version of AVG 7, and pushed alerts to AVG 6 installations about the cut-off date for some months before updates ceased for the old version.

When you have such good "no strings attached" free antivirus products, why would anyone want to put up with Symantec's embedded commercial malware in Norton AV? If you do a Google( "Why I don't use Norton" ), you will see I'm not the only one who avoids it.

14 October 2006

Rungbu.A Exploits Bad Design

This case study illustrates several issues I've raised before, as well as a few lessons, such as "there's no 'one problem per case' rule", "best practice isn't bullet-proof" and "one antivirus scanner isn't enough".

I was on site doing something else, when I was called to check out a problem with opening Word documents, which the user attributed to an encounter with a dubious diskette.

The first thing I noticed was that her PC wasn't showing file name extensions, contrary to the way I generally set up PCs...

"Hey, you can't see the file name extensions! Without that, you don't know what type of file you're about to open! That's dangerous!"

'No, that's OK; I can see the Word icon, so I know the files are Word documents'

This was followed by an explanation of why this can't be trusted, while she insisted it was OK, and 'was always like that'. I pointed to two files as an example; a pale (normally hidden) one called "Some file name" and a bold one also called "Some file name". I right-clicked on each, and sure enough, the hidden one was the .DOC while the visible one was an .SCR - so I wasn't too surprised when the setting to not hide file name extensions would not "stick".

"You're malwared", I said, and after shutting down and setting CMOS to boot CD, I booted up the Bart CDRW I tend to have on me at all times. Bart would boot on this crusty old Win98SE system (333MHz, 64M RAM)... if only the 32-speed CD-ROM would read CDRW disks... so it's heigh-ho, back to base we go.

Who's stupid?

As a geek, my first reaction was to consider the user foolish for trusting icons as an indication of file type. Then I thought; why should a user know that the most dangerous file types can set whatever icon they like, and that .scr files are raw code, and thus dangerous? Why doesn't the user interface clearly flag which files are code and which are data, as well as the type, and disallow any content to misrepresent itself? Why are file name extensions hidden by duhfault, anyway, and why are things still as brain-dead in Vista?

That's the problem with bad design - it never gets patched, because it "works as designed". We had years of MS Office macro and VB malware before that was fixed, years of Outlook and Outlook Express auto-running scripts in HTML "message text", and we still have Format in the middle of hard drive context menus while Backup, Check for errors etc. are buried under Properties, Tools. Stupidity is found not only in end users.

Best Practice can still fail

As usual, I started work on the system with several hours in MemTest86, to make sure the system was safe to run at all. Then I booted my Bart CDR, eyeballed SMART details in HD Tune, did a surface scan of the 4G hard drive, and checked SMART details again; no change in SMART, no surface errors, OK. A ChkDsk confirmed the four file systems were OK, so I created a session base directory on one volume, and set that as Bart's Temp location. I could then shrink the Bart RAM disk as it's no longer needed for Temp, and create a pagefile on hard drive to relieve constraints imposed by 64M of RAM.

Then I started my antivirus scanning wizard and went about my other work. A while later, I see the second av scan is still stuck on the same file, so I run HD Tune again; it shows blank SMART details, and a surface scan picks up "one bad sector".

I immediately pull the mains, pull the hard drive into another PC, copy off everything from DOS mode using the LCopy from Odi's LFN Tools, starting with the data set and carrying on until most stuff is backed up. I had hard failures on C:\Windows (bad disk) and the session subtree to which the Bart av wizard would have been logging the scans (file system corruption).

Next, I went in with DiskEdit, confirming bad clusters throughout the entire C:\Windows directory chain. Noting the cluster address of the Windows directory, I searched for subdirectories on C: (fortunately it's a small C:, not the whole hard drive) and ballpointed the . cluster addresses for all that had .. pointing back to the lost Windows directory. Then I created scratch directory entries in C:\ to point to these, and copied them off.

I then did a raw image copy of the fortunately-small C: volume in case I needed to recover more stuff later, and finally back in DiskEdit, I "erase-marked" the Windows directory so that scanners traversing the file system wouldn't fall into a pit of bad sectors.

Having got what I could off the stricken hard drive, I put it back in the PC it came from and got back to my Bart antivirus scanners etc.

Rungbu.A

Four out of five of the initial "detect only" scans detected the same files as infected, but each called the virus something different. One called it a generic trojan, another called it SillyWorm, each with a high variant suffix. Only Sophos gave it the unique name of Rungbu.A, though their site only had a descriptive page for the Rungbu.B variant. The sixth scanner was set to kill, and did; thereafter there was nothing for the remaining scanners to detect.
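Because every scanner uses its own name for the same thing, it helps to line results up by file rather than by detection name. Here is a minimal Python sketch of that idea; the input shape, the scanner labels other than Sophos, and the file path in the usage note are all hypothetical, not output from the tools I ran.

```python
from collections import defaultdict


def detections_by_file(reports):
    """Regroup {scanner: {file_path: detection_name}} as {file_path: {scanner: detection_name}}."""
    merged = defaultdict(dict)
    for scanner, hits in reports.items():
        for path, name in hits.items():
            # Windows paths are case-insensitive, so fold case before grouping.
            merged[path.lower()][scanner] = name
    return dict(merged)
```

Feeding it something like `{"Sophos": {"C:\\Docs\\Some file name.scr": "Rungbu.A"}, "ScannerB": {"c:\\docs\\some file name.scr": "SillyWorm"}}` yields a single entry for the file, with both names side by side.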

Reading the description's Advanced page revealed this malware to be anything but "generic". It left the system tattoo'd so that I had to Regedit before I could stop Windows from hiding file name extensions.

We get annoyed when vendors don't patch known exploitable surfaces, and highly irate when there is ITW (in the wild) malware already exploiting those surfaces. Yet we've seen so much malware with double file name extensions such as README.TXT.pif, and these raw code file types can and do set their icons to match the faked file type.
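The double-extension trick is easy to test for mechanically, which makes it all the odder that the UI doesn't. A minimal Python sketch follows; both extension lists are my own short, non-exhaustive assumptions, and the function name is made up for illustration.

```python
import ntpath

# Assumed, non-exhaustive set of extensions that Windows runs as code.
CODE_EXTENSIONS = {".exe", ".com", ".scr", ".pif", ".bat", ".cmd", ".vbs"}
# Assumed set of harmless-looking extensions a name might fake.
DECOY_EXTENSIONS = {".txt", ".doc", ".jpg", ".gif", ".pdf", ".xls"}


def has_faked_extension(filename):
    """True if the real (last) extension is code and the one before it looks like data."""
    stem, last = ntpath.splitext(filename.lower())
    _, inner = ntpath.splitext(stem)
    return last in CODE_EXTENSIONS and inner in DECOY_EXTENSIONS
```

With extensions hidden, README.TXT.pif displays as README.TXT, and this check flags exactly that case while leaving plain README.TXT and setup.exe alone.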

But hey, not a problem; it "works as designed".

Re-entry

Finding the plethora of vintage application disks for this PC would not be fun, so I decided to preserve the old installation instead. I called the site and asked them to find the disks if they could, in case I'd have to rebuild, then set out to fix the installation.

First, I partitioned a replacement hard drive (a used 40G, jumpered to act as a 32G in deference to the old PC's BIOS limitations) and copied everything to one of the logical volumes. Then I fresh-installed Windows 98, and copied that subtree to the logical volume as well. Next, I copied everything except the old Windows child subtrees into place, then finally identified and copied the recovered child subtrees over what was installed with Windows.

All of this was done from DOS mode, but I couldn't extract recovered registries etc. from the latest RB*.CAB from there, so I had to go back into Windows at this point. That crashed Explorer, so I set shell=Winfile.exe in System.ini, and from there I could Extract the registry files. Back to DOS mode to drop in these registry files, as well as older backed-up Vmm32.vxd and "Exit to DOS.pif", and now everything looks OK - though I'll try everything out in case there are needed files that are missing from the Windows base directory.

Duhfaults are forever

It's a good thing the variant of Rungbu that infected this PC didn't also put "hide hidden and system files" into effect, and that we don't use Microsoft's duhfault settings. If we did, there would be no visible indication the system was malware'd; we'd see only one "Some Name" file, which would appear to "open" just fine (the malware code runs invisibly and then spawns and opens the original Word document). Unless someone tried to change the Explorer settings, and became puzzled when the changes didn't "stick", there'd be no indication that anything was wrong.

And that means the companion malware files would have found their way into every data backup, too.
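A backup set (or any folder listing) can be checked for this companion pattern: a code file sitting next to a document with the same base name. The Python below is a hedged sketch of that check, not a description of how any scanner finds Rungbu; the extension sets and function name are my assumptions.

```python
import ntpath
from collections import defaultdict

# Assumed, non-exhaustive extension sets for this sketch.
CODE_EXTENSIONS = {".exe", ".com", ".scr", ".pif"}
DOCUMENT_EXTENSIONS = {".doc", ".xls", ".txt"}


def find_companions(filenames):
    """Return sorted (code_file, document_file) pairs that share a path and base name."""
    by_stem = defaultdict(list)
    for name in filenames:
        stem, ext = ntpath.splitext(name)
        by_stem[stem.lower()].append((name, ext.lower()))
    pairs = []
    for entries in by_stem.values():
        code = [n for n, e in entries if e in CODE_EXTENSIONS]
        docs = [n for n, e in entries if e in DOCUMENT_EXTENSIONS]
        pairs.extend((c, d) for c in code for d in docs)
    return sorted(pairs)
```

Run over a listing containing "Some file name.doc" and "Some file name.scr", it reports the pair; a lone document produces nothing.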

It's all very well saying "it's only the default setting; you can change it", but defaults are forever. These unsafe defaults are all you get in "Safe Mode", will recur after "just" formatting and re-installing Windows, will be the baseline for every newly-created user account, may be re-asserted by domain servers or when account rights are limited, and will be what users see whenever they use arbitrary PCs elsewhere. Defaults should always be truly safe!

2 October 2006

Vista's Maintenance OS

This is one of the best bits of news I had from the Vista Labs a few months back! We were told to spread the word about what wasn't NDA, but this item was NDA at the time, so I had to sit on it. But today's Google( Vista boot DVD WinPE ) shows it's public knowledge now :-)

http://www.apcstart.com/site/jbannan/2006/08/1082/windows-pe-20-a-tiny-version-of-windows-for-system-maintenance

This turns what would have been a crisis (would Bart CDR boot be compatible with Vista's NTFS, registry, etc.?) into what may be a reason for cautious consumers to favor Vista over XP.

I tested Vista beta 2 a while ago in a bit more depth than recent time permits with the newer Customer Preview build I have now, and it's certainly come a long way since that earlier build. I specifically wanted to test the mOS, and that turned out to be very interesting indeed...

Those familiar with Bart PE would guess what I'd be looking for first - can it boot off a USB stick? Can you hot-swap USB flash drives? Can you use the optical drive or will the system crash if you eject the Vista boot DVD? Is there a GUI?

No, there's no GUI - it's more like Safe Mode Cmd Only, which is a good thing in many ways. I'd have been worried if Explorer was there as the shell, in case Vista's richer shell offered exploit surfaces to malware on the maintained system.

Yes, you can eject the boot DVD! In this recent build I tested, the Vista installation DVD is the mOS boot disk, and just as you'd UI your way to Recovery Console after booting an XP CD, so it is that you GUI your way to "command prompt", which is likely to be WinPE 2.0 itself.

After booting, the DVD gets a different drive letter, compared to the booted OS files. The free space and a few other cursory tests indicated these were different volumes, and neither is an alias of hard drive space. You can eject the Vista DVD, insert other CDRs or DVDRs, and use them directly. I suspect the mOS runs from a RAM drive - and it worked quite happily in 512M RAM. What I didn't check was whether it uses a page file on the HD.

Vista development takes off from the most recent Server 2003 SP1 code base - and this is a good code base for a mOS, because it no longer resets the USB during the boot process, as XP SP2 does. So the odds are favorable for booting Vista mOS off USB flash drives, etc.

Unlike a Bart boot, Vista mOS will "see" USB flash drives inserted and changed on the fly - they don't have to be present at boot time, as they do with Bart, and swapping them is OK too. (Tip for Bart users; a memory card reader present at boot will generally allow hot-swapping of cards after boot - so I share SD cards between Bart sessions and my camera, instead of using slower and more write-limited flash drives).

Running tools from Bart CDR

Just for laffs, I ejected the Vista DVD and popped in my Bart CDR. The nu2menu (the standard "Start button" menu shell for Bart) worked fine, and many of the tools worked too. Because the Bart drive letter is not the same as the boot drive letter, my own "Is this booted or Autorun?" batch file logic, e.g....

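Rem Tool to launch, the RunScanner wrapper, and RunScanner's options
Set Prog=Ad-Aware.exe
Set Launch=%~dp0..\RunScanner\RunScanner.exe
Set Opt=/m /t 0

...

Rem Same drive as the running OS means Bart was booted, so go via RunScanner
If "%SystemDrive%"=="%~d0" (
  Start "" "%Launch%" %Opt% "%~dp0%Prog%"
) Else (
  Start "" "%~dp0%Prog%"
)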

...concludes Bart is being run from the "native" system OS, and not as the booted OS.

That means my tools weren't running through RunScanner, which is probably prudent at this stage. Yes, that means registry-orientated tools such as AdAware or HiJackThis will not "see" the HD installation's registry, but until we know RunScanner and legacy registry access methods are compatible with Vista's registry, it's safer this way.
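For those who don't read batch, the same "booted or Autorun?" test can be restated in Python. This is only an illustrative sketch with a made-up function name; it uses ntpath so the drive-letter parsing behaves the same on any platform, and unlike the batch comparison it ignores letter case.

```python
import ntpath


def booted_from_same_drive(system_drive, script_path):
    """True when the script lives on the drive the running OS was booted from."""
    script_drive, _ = ntpath.splitdrive(script_path)
    return script_drive.upper() == system_drive.upper()
```

On a real system the first argument would come from the SystemDrive environment variable. When the function returns False, as it did under the Vista mOS, the tool is started directly rather than through RunScanner.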

Many tools didn't work, because they relied on files and settings within the running OS. The Bart plugins for these tools would have included these in the Bart mOS, but that's not the OS that's in effect here - so if these resources aren't in the standard Vista code set, then the tools won't work. That's to be expected; after all, if I'd just scraped them onto Bart without using the plugin system, they wouldn't have worked there either.

All this testing was with the original Vista DVD - I haven't gone as far as building a new Vista mOS boot disk, nor have I explored "plugging in" tools as one does for Bart. I'm not sure if either of these things would be possible, or whether the answers would change between the build I tested and the final release.

Ah, for the time to really explore this stuff!

Conclusions

It's really good news to see a mOS for Vista, even if it's still not really orientated to mOS work. For example, it won't operate unless there's a visible Vista installation on the hard drive, and the RAM testing component writes to and then boots from the hard drive installation - both of which are bad practices when dealing with systems that are really ill.

I think this is because the mOS is still rooted in its origins as a "(p)re-install environment", originally intended for use on perfect fresh hardware. It was partly this, along with seeing some Bart off-shoots that also break some mOS best practices, that prompted an earlier "How to design a maintenance OS" post in this blog.

The important thing is that it's there, on the installation DVD (a break-through, if you'd ever peered longingly at MS WinPE through the previous licensing sphincter) and that the architecture seems fundamentally sound. It needs to be tested more rigorously to see how well it stays within the rules of mOS best practice, but it's already more than I'd dared hope for!