30 August 2005

When it all comes together...

Every once in a while, one has a case that illustrates the value of changes in default practice that one's made over the years. Here's one...

A system came in because Eudora had "lost all the mail".

Indeed, the entire "My Documents" object had been punched out, and it wasn't in the Recycle Bin either. Score is Murphy 1, Chris 0 so far.

Fortunately, this data set was on FATxx not NTFS, so the trail did not end there - I could go in with UnErase and DiskEdit to attempt recovery. So now the score is Murphy 1, Chris 1.

Normally, deleted data would be safer from overwrite than you'd expect, because I relocate data off C: (thus avoiding incessant temp, TIF, swap writes). Murphy 1, Chris 2. Plus I disable SR on D:, given that there's no core code there anyway, so that should avoid that source of spontaneous writes to (what could be at any time) at-risk disk. Murphy 1, Chris 3.

But this system had re-duhfaulted to turning on SR (with maximum disk use, of course) for all volumes, probably as a side-effect of disabling and re-enabling SR as a means of clearing it. So when I went in with my tools, I found the data set not only deleted, but also overwritten. Murphy 2, Chris 3.

Fortunately, the user had left the PC running one night a week, which meant my overnight auto-backup Task ran once a week. So I could go to F:\BACKUP and choose the latest of the last 5 such backups, and thus recover all data, even though the user had not explicitly initiated a backup in years. Had the PC been running every night, perhaps they'd have lost 1 day's work instead of 7, but even so, it's quite a win; Murphy 2, Chris 4.

Plus they are using Eudora for email, which separates it into malware-safe messages in mailboxes, and malware-risky attachments that can be stored somewhere else. Eudora doesn't run scripts in messages, and can be prevented from using IE's code to interpret them, so the messages really are malware-safe. So any data backup on a system I set up will automatically include the email stores; Murphy 2, Chris 5.

However, to restore this data, I'd have to overwrite whatever deleted data hadn't been destroyed already - Murphy 3, Chris 5. The client wants the PC back RSN, so what do I do: take an extra day searching raw disk for loose data, or restore their backup and close that door forever?

Fortunately, I can have my cake and eat it, because the volume I store data on is a tiny FAT16, 2G in size. So I can simply peel off the entire volume as 4 CDR-sized slabs of raw sectors, paste that onto another HD somewhere, and carry on doing deep recovery while the PC's back in the field and working on the data I restored. Murphy 3, Chris 6.
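
The arithmetic is simple: 2G split 4 ways is 512M per slab, which fits on a CDR. As a rough sketch of the peel-and-paste idea (not the tools I actually used), here is how it might look in Python, assuming the volume is readable as a file, either an image file or a raw device path opened with sufficient rights. The function and file names are hypothetical.

```python
from pathlib import Path
import shutil

SLAB_BYTES = 512 * 1024 * 1024  # 2 GiB / 4 slabs; small enough for a CDR
CHUNK = 1024 * 1024             # copy 1 MiB at a time to keep memory use low


def split_image(source: Path, out_dir: Path,
                slab_bytes: int = SLAB_BYTES) -> list[Path]:
    """Copy `source` into consecutive slab files of at most `slab_bytes`."""
    slabs = []
    index = 0
    with open(source, "rb") as src:
        while True:
            slab_path = out_dir / f"slab{index:02d}.bin"
            written = 0
            with open(slab_path, "wb") as dst:
                while written < slab_bytes:
                    block = src.read(min(CHUNK, slab_bytes - written))
                    if not block:
                        break  # end of source reached
                    dst.write(block)
                    written += len(block)
            if written == 0:
                slab_path.unlink()  # source ended exactly on a slab boundary
                break
            slabs.append(slab_path)
            index += 1
            if written < slab_bytes:
                break  # last, partial slab
    return slabs


def join_slabs(slabs: list[Path], target: Path) -> None:
    """Concatenate the slabs, in order, back into one raw image."""
    with open(target, "wb") as dst:
        for slab in slabs:
            with open(slab, "rb") as src:
                shutil.copyfileobj(src, dst)
```

The rejoined image should be byte-for-byte identical to the original volume, so deep recovery can carry on against the copy while the real volume gets overwritten by the restore.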

Security is not the only thing that is "a process"; the same could be said for working around dumb-ass vendor duhzign and duhfaults - and Murphy wins whenever the vendor's code discards rather than respects your choice of settings!

6 August 2005

Safe Fall-Through: Not...

I've just worked on a system with a BIOS that tried to get smart, and IMO got it horribly wrong.

This was a working PC that came in for a RAM upgrade, so I set the boot order to not boot the HD at all (I want to avoid any chance of bad RAM getting to eat the installation) and booted a 1.44M MemTest86 diskette, which promptly rebooted. So I tried my CDR, which uses a different version, and that showed RAM errors. And so on.

At some point, during POST, the BIOS prompted me to enter Setup, as it had noticed several failed boot attempts, concluded all was not well, and thus had (irreversibly) flushed all CMOS settings back to defaults.

What's wrong with that picture? Firstly, the notion that factory-set defaults are safe, when in fact they could be incompatible with bootability. Secondly, BIOS duhfaults are often self-serving or controversial, such as disabling S.M.A.R.T. monitoring of hard drives. Thirdly, this re-defaulted hard drive bootability, which is often a bad idea in such circumstances (and the reason why I had specifically prevented this). Fourthly, the change is irreversible and permanent, whereas the cause could have been unrelated and transient (such as my corrupted 1.44M diskette).

There's more; for example, malware could boot-fatigue its way to weaker BIOS-level defences, by forcing a reset of settings such as blocked flash BIOS updates. I'm sure you can think of others, too.

Bad BIOS, no biscuit :-)

4 August 2005

Know Your Nemeses

One of the biggest benefits of theory is spotting inevitable pain points, before wasting resources on longer scenic routes that just bring you back to the same crunch later.

Another is to identify when it's insufficient to bet the farm on one of several parallel strategies, because these strategies do not fully encompass each other after all.

Let's apply these concepts to that old hobby-horse of mine - which just so happens to be one of consumerland IT's most common crises - management of active malware.

We know that malware can embed itself in the core code set, hold control so that other tasks can't start, detect system changes, and take punitive action. That's enough to predict that formal "look-don't-touch" detection scanning will be safe, but that informal detection scanning and formal clean-up may not be, and informal clean-up is even less likely to be. By "formal", I mean "without running any code from the infected system".

From that I conclude the only lasting SOP to detect malware safely is to do so formally, without leaving any detectable footprints in the system being scanned. I also conclude that one can't predict an always-safe SOP to clean active malware, so it's best to unlink the detection and cleanup phases of the operation so that off-system research on what has been detected can guide the cleanup process around any caveats that may apply.

Maintain or wipe?

This is one of several common bunfights that assume one of the two alternatives fully encompasses the other. With good enough maintenance, you'd never need to suffer the collateral damage of "just wipe, rebuild and restore". With good enough "backups", you'd never need to bother with malware identification or cleaning, and suffer the risk of thinking everything's been cleaned when it has not.

One can point out that circumstances may force one approach or the other, and thus no matter how well you develop one strategy, you cannot afford to abandon the other. Or that adopting a "wipe, rebuild and restore" strategy does not obviate the need to identify malware, in case it is in the "data" you restored or in case it's using an entry point that will be as open on the rebuilt system as it was on the originally-infected system.

Two further points arise from the above, when it comes to the thorny issue of backup theory. Firstly, we see that the pain point of distinguishing data from code is a nemesis that can't be avoided. Secondly, we see there's a classic backup conundrum of how to scope out the unwanted changes that the restore is meant to undo, when it comes to the "rebuild" part of "just wipe, rebuild and restore 'data'".

When code was expected to be a durable item, it was meaningful to speak of rebuilding the code base from a cast-in-stone boilerplate that dated from the software's initial release, and that is definitely free of malware. Once you entertain the notion of "code of the day" patching, you cannot be sure your code base is new enough to be non-exploitable, and yet old enough not to contain malware that's been designed to stay hidden until payload time.

"Ship now, patch later" is another nemesis that won't go away - theory predicts that no matter how you design the system, you will always need bullet-proof code, just as no matter how you manage the system, you will always need to be able to safely identify malware. For example, how do you know your updates are not malware spoofing themselves as such? Whatever code applies that verification, has to be bullet-proof.

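To make the update question concrete, here is a minimal Python sketch of one common check: comparing a downloaded file's SHA-256 digest against a known-good value. This is an illustration only, not a description of how any particular vendor's updater works; the function names are hypothetical. It is also only as good as the channel the expected digest came from: if the attacker can supply both the file and the digest, the check proves nothing.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 digest of a file as lowercase hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large update files need not fit in memory
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def update_matches(path: Path, expected_hex: str) -> bool:
    """True only if the file's digest equals the expected digest.

    Assumption: `expected_hex` was obtained out of band, from a source
    that is trusted independently of wherever the file came from.
    """
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

Note how small this code is, and how much rides on it; that is the part that has to be bullet-proof.
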
PS: Yes, I know how to spell "nemesis", singular.
You didn't seriously think you'd have only one, did you?