11 September 2008

Google Chrome - Born Dead?

Web browsers are serious risk surfaces, so there's always room for a better one - but so far, most new browsers are a lot dumber than the incumbents.

So it was with Apple's Safari, when that was ported to Windows as a beta - it was found to be exploitable within two hours of release.  So it is with Google's Chrome, which should be no surprise, as it uses the pre-fix, already-exploited code from Safari!

By-design safety

Google talk a good talk, with these security features widely quoted:

  • Privacy mode for trackless browsing
  • Each tab runs in its own context, can't crash other tabs
  • Tabs run in a "sandbox", can't attack the rest of the system
  • Updates list of bad sites from Google's servers, to spot phishing scams
  • Web pages can open without browser UI elements (uhh... why is this "secure"?)

My first reaction when I read this was, "wow, Google scooped IE8's feature set", given that IE8 builds IE7's phishing filter into a more comprehensive, updated system, runs tabs in separate processes so they don't crash the whole browser, and Vista has run IE7 (and thus IE8) in a safer "protected mode" for well over a year now.  I don't know whether Google's "sandbox" is stronger and safer than IE7-on-Vista's "protected mode", or whether either of these constitutes an effective "sandbox".

Then I thought: Hang on, this is a newly-released beta, whereas IE8 has been in beta for a while now and has already been more widely released as beta 2... so who's first to offer these features?

I have to wonder why Google thinks it's a good idea to spawn web content (basically, stuff foisted from the web) as generic stand-alone windows, when we already have so many problems with pop-ups forging system dialog boxes to push fake scanners etc.  Why is it considered a good idea to let sites hide the address bar, when phishing attacks so often use misleading URLs that HTML allows to be covered with arbitrary text, including completely different fake URLs?

Code safety

Google talks about a large sandboxed system to interpret JavaScript, which sounds a bit like the idea behind Java.  Well, we've seen how well that works, given the long list of security updates that Sun have to constantly release to keep up with code exploits - so we'd have to hope Google are really good at crafting safe, non-exploitable code.

So it doesn't bode well that the public beta they released is based on a known-exploitable code base, which is already being attacked, at a time when patched versions of this code are already being retro-fitted to existing Safari installations.

Why would Google not build their beta on the fixed code base?  It's Open Source, and already available, so why not use it?  Would it have killed them to delay the hitherto-secret web browser beta until they'd adopted the fixed code?  Or is the need to leverage pre-arranged hype etc. more important than not shipping known-exploited code to users?  And why does the fixed release still report the exploitable code base version?

Trust me, I'm a software vendor

How do you feel about vendors who silently push new code into your system and are slow to tell you what it does?  Here's what Google is quoted as saying about that:

"Users do not get a notification when they are updated. When there are security fixes, it's crucial that we update our users as quickly as possible in order to keep them safe. Thus, it's important for us to not require user intervention. There are some security fixes that we'll keep quiet because we don't want to disclose security vulnerabilities to attackers"

To me, that reads like a dangerous combination of Mickey-Mouse attempts at security via obscurity, plus supreme vendor arrogance. 

But wait, there's more...

Further things have come to light when searching for links for this post, such as installing in a "data" location (thus side-stepping Vista's protection for "Program Files") and a rather too-effective search that finds supposedly private things.

"Well, it's a beta", I can hear you say.  That's why it's safely tucked away deeply within Google's developer site, so that only the adventurous and knowledgeable will find it, right?  I mean, it's not as if it's being shoved at everyone via popular or vendor-set web pages so that it's gaining significant market share, is it?

10 September 2008

Compatibility vs. Safety

Once upon a time, new software was of interest because it had new features or other improvements over previous versions.  This attracted us to new versions, but we still wanted our old stuff to work - so the new versions would often retain old code to stay compatible with what we already had.

Today, we're not so much following the carrot of quality, but fleeing the stick of quality failure.  We are often told we must get a new version because the old version was so badly made, it could be exploited to do all sorts of unwanted things.  In this case, we want to break compatibility so that the old exploit techniques will no longer work!

Yet often the same vendors who drive us to "patch" or "upgrade" their products to avoid exploitation risks, still seem to think we are attracted by features, not driven by fear.

Sun's Java

I've highlighted the long-standing problems with Sun's Java before, and they are still squirming around their promise to mend their ways.  In short, they may still leave old exploitable versions of the Java JRE on your system, but it's no longer quite as easy for malware to select these as their preferred interpreter.  Still, you're probably safer if you uninstall these old JREs (as Sun's Java updater typically does not do) than trust Sun to deny code access to them.

Microsoft's Side By Side

Here's an interesting article on the Windows SxS (Side By Side) facility, which aims to appease software that was written for older versions of system .DLLs, and thus ease the pain of "DLL Hell".  This works by retaining old versions of these .DLLs so that older software can specify access to them, via their manifests.

How is that different from Sun's accursed practice? 

Well, it generally isn't, as far as I can tell, until a particular exploit situation is recognized where this behaviour poses a risk.  The current crisis du jour involves exploits against GDIPlus.dll - yep, the same one that was fixed before - and the patch this time includes a facility to block access to old versions of the .DLL, leveraging a feature already designed into the SxS subsystem.

5 September 2008

The Most Dangerous File Type Is...

The most dangerous file type is... what?

Well, you pass if you said ".exe", and get bonus marks for ".pif, because it's just as dangerous thanks to poor type discipline, and more so because of poor UI safety that hides what it is".  But today's answer may be neither.

By the time a code file lands up on your system, there's a chance your antivirus will have been updated to know what it is, and may save the attacker's shot at goal.  But a link can point to fresh malware code that's updated on the server side in real time; that's far more likely to be "too new" for av to detect, and once it's running, it can kill or subvert your defences.

We need to apply this realization to the way we evaluate and manage risk, to up-rate the risk posed by whatever can deliver Internet links.  Think "safe" messages without scripts or attachments, and blog comment spam (including the link from the comment poster's name). 

Think also about how HTML allows arbitrary text to overlie a link, including text that looks like the link itself.  This link could obviously go to www.bad.com, but it's less obvious that www.microsoft.com could go there instead.  Then think how HTML is ubiquitously tossed around as a generic "rich text" interchange medium, from email message "text" to .CHM Help files.
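To make that concrete, here's a minimal sketch of the markup involved, using the example URLs above; the visible text and the real destination are completely independent:

<a href="http://www.bad.com">www.microsoft.com</a>

The status bar hint on hover helps, but only if the reader thinks to look at it.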

13 August 2008

Bart Plugin for Spybot 1.6

See previous post about the new version 1.6 of Spybot SD and its issues.  I've updated my Bart plugin (tested with XP SP2 code base, Bart Builder 3.1.3) to address these, and offer it here, along with .REG for control in Windows.

To use the plugin, do this:

  • Navigate into your Bart Builder plugin folder
  • Create new folder called SpybotSD and enter it
  • Copy this post's plugin files to this location
  • Create a subfolder Files within this location and enter it
  • Copy the installed Spybot 1.6 subtree contents into here

The plugin is written with these assumptions and dependencies:

  • Standard Bart PE Builder with nu2menu as shell
  • Cmdow utility in Bart's included Bin folder (not essential)
  • Paraglider's RunScanner plugin in plugin\RunScanner

Cmdow

Cmdow hides windows for processes, and I use it to hide the .CMD launcher; it's purely cosmetic, so if missing, the plugin will still work.  Because Cmdow can be dropped on systems and used maliciously, many scanners will detect it as a "potentially unwanted program", and fair enough!

RunScanner

RunScanner allows registry-aware tools to run relative to an inactive set of hives, rather than those of the booted OS.  Spybot has native awareness of this situation, so theoretically doesn't need RunScanner, but I find I get better detections if I use it anyway.  If RunScanner isn't present, you'd have to revise the .INF and .XML to remove the references to it, else the plugin won't work.

SpybotSD.inf

This determines how Spybot 1.6 is integrated into the Bart CDR at build time.

; spybotsd.inf
; PE Builder v3 plug-in INF file for Spybot - Search & Destroy by Safer Networking Ltd.
; Created by Patrick M. Kolla, Jochen Tösmann and modified by cquirke for Spybot 1.6

[Version]
Signature= "$Windows NT$"

[PEBuilder]
Name="Spybot - Search & Destroy"
Enable=1
Help="spybotsd.htm"

[WinntDirectories]
a="Programs\SpybotSD",2
b="Programs\SpybotSD\Dummies",2
c="Programs\SpybotSD\Excludes",2
d="Programs\SpybotSD\Help",2
e="Programs\SpybotSD\Includes",2
f="Programs\SpybotSD\Languages",2
g="Programs\SpybotSD\Plugins",2

h="Programs\SpybotSD\HelpHTML",2
i="Programs\SpybotSD\HelpHTML\css",2
j="Programs\SpybotSD\HelpHTML\html",2
k="Programs\SpybotSD\HelpHTML\images",2

[SourceDisksFiles]
*.cmd=a,,1

files\blindman.exe=a,,1
files\SDMain.exe=a,,1
files\SDUpdate.exe=a,,1
files\SDWinSec.exe=a,,1
files\SpybotSD.exe=a,,1
files\TeaTimer.exe=a,,4
files\Update.exe=a,,4
files\advcheck.dll=a,,1
files\aports.dll=a,,1
files\DelZip179.dll=a,,1
files\SDHelper.dll=a,,4
files\Tools.dll=a,,4
files\messages.zres=a,,1
files\Tools.dll=a,,1
files\sqlite3.dll=a,,4

files\Dummies\*.*=b,,1
files\Excludes\*.*=c,,4
files\Help\*.*=d,,4
files\Includes\*.*=e,,1
files\Languages\*.*=f,,4
files\Plugins\*.*=g,,1

files\HelpHTML\*.*=h,,4
files\HelpHTML\css\*.*=i,,4
files\HelpHTML\html\*.*=j,,4
files\HelpHTML\images\*.*=k,,4

[Software.AddReg]
0x4, "Safer Networking Limited\Tweaks", "DisableTempFolderCleaning", 0x1
0x1, "Paraglider\RunScanner\SpybotSD.exe", "HKLM", "Software\Safer Networking Limited\Tweaks"

[Append]
nu2menu.xml, spybotsd_nu2menu.xml

Ensure that when you copy and paste these files, they are free of HTML tags and formatting junk, and that long lines (e.g. the two lines in the last section) are not broken.  The above differs from Safer Networking's plugin for 1.5, in that:

  • It includes new code file sqlite3.dll
  • It suppresses automatic temp file clearance
  • It persists the above setting through RunScanner

The last is useful, so you don't have to use non-zero /t parameters in an attempt to delay registry redirection until Spybot has checked for the "disable temp clearance" setting.

SpybotSD_nu2menu.xml

This integrates Spybot 1.6 into the Bart menu system, and is referenced from the .INF during build time. 

<!-- Nu2Menu entry for SpybotSD -->
<NU2MENU>
<MENU ID="Programs">
  <MITEM TYPE="ITEM" DISABLED="@Not(@FileExists(@GetProgramDir()\..\SpybotSD\SpybotSD.exe))" CMD="RUN" FUNC="@GetProgramDir()\..\SpybotSD\SpybotSD.exe">Spybot 1.6</MITEM>
</MENU>
</NU2MENU>

You may change this to strip references to RunScanner, relocate it to a different menu flyout, etc.  Or, if you're fed up with disordered menus, you may simply leave out this file (and comment it out in the .INF) and add your reference directly to plugin\nu2menu\nu2menu.xml - once again, watch out for long lines; there is in fact only one line between the MENU ID and /MENU tags.

SpybotSD.cmd

This launches Spybot 1.6 from the nu2menu entry at runtime.

@Echo Off

SetLocal

Set Debug=
Set Prog=SpybotSD.exe
Set Launch=%~dp0..\RunScanner\RunScanner.exe
Set Opt=/t 0

If Not Defined Debug (
  Cmdow @ /HID
  %~dp0..\..\Bin\Cmdow @ /HID
) Else (
  Title Debug
  Echo.
  Echo ProgDir  %~dp0
  Echo Prog     %Prog%
  Echo Launch   %Launch%
  Echo Opt      %Opt%
  Echo.
  Pause
  Title %~dp0%Prog%
)

If Exist "%~dp0Files\%Prog%" Set ProgDir=%~dp0Files\
If Exist "%~dp0%Prog%"       Set ProgDir=%~dp0
If Defined ProgDir (
  If "%SystemDrive%"=="%~d0" (
    Start %Launch% %Opt% %ProgDir%%Prog%
  ) Else (
    Start %ProgDir%%Prog%
  )
) Else (
  Title Error - target executable not found!
  Echo "%Prog%" not found in %~dp0 or %~dp0Files\ - abort!
  Pause
  EndLocal
  Exit /b 1
)

If Defined Debug (
  Echo.
  Echo Done!
  Echo.
  Pause
)

EndLocal

Exit /b 0

You can edit this to strip out the "debug" part (define the Debug variable to enable it), as well as references to Cmdow and RunScanner.  By changing the variables, you can use this for other "easy" tool plugins (e.g. HiJackThis).

The logic goes as follows: if the boot drive is the same drive we're running from, then we're Bart-booted and need to apply RunScanner redirection; otherwise we're not, and can run the tool directly.  This logic will also not use RunScanner if run from a WinPE 2.0 boot disk, which is OK with me as I don't know how safe RunScanner is for Vista hives.

An extra bit of logic is applied to deriving the path to the tool, so that the .CMD will work when run from the pre-build subtree.  This is also why the .XML uses relative "GetProgramDir()\..\" paths, rather than the more commonly used "GetProgramDrive()\Programs\" paths that break in the pre-build or pre-iso environments.

Windows .REG

You can also control some of Spybot's potentially unwanted behaviours via .REG in Windows, similar to the Software.AddReg section in the .INF above:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Safer Networking Limited\Tweaks]
"DisableTempFolderCleaning"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Paraglider\RunScanner\SpybotSD.exe]
"HKLM"="Software\\Safer Networking Limited\\Tweaks"

The second part of the above will pre-load appropriate settings for a Bart session using RunScanner, in case RunScanner's parameters cause it to read its settings from the hard drive's hives.

Some settings can be changed interactively, e.g. disabling the intrusive Tea Timer feature, while others have to be excluded at the time of installation.  One of the latter is the right-click context menu action to scan using Spybot, which annoyed these folks, who offer this fix:

Windows Registry Editor Version 5.00

[-HKEY_CLASSES_ROOT\*\Shell\sdfiles]

[-HKEY_CLASSES_ROOT\Folder\shell\sdfiles]

[-HKEY_LOCAL_MACHINE\SOFTWARE\Classes\*\Shell\sdfiles]

The * association is applied to all things, hence all things can be right-clicked and scanned.  There's an Undo .REG in the same post in that thread.

12 August 2008

Spybot 1.6 and Bart PE

Malware scanners tend to focus on resident protection rather than intervention and clean-up, but Spybot has always had a clue there.  Not only does Spybot explicitly support Bart PE as a formal scanning platform, it can also be aware of inactive registry hives, e.g. if you were to drop an infected hard drive into a Windows host system to clean it from there.

Bart has a plugin facility to integrate tools, and whenever there's a new version of a plugged-in tool, there may be changes required, or new unwanted behaviours to work around.  Such is the case with the new Spybot 1.6.

Spybot 1.6 plugin changes

A Bart plugin is a set of files that control how a program is integrated into a Bart CDR.  Build-time instructions are defined in an .inf, menu integration via an nu2menu.xml, runtime control via a .cmd (if needed), and human documentation via an HTML file.

The .inf defines what files are to be copied to the CDR and where they are to be located, in the SourceDisksFiles and SourceDisksFolders sections.  If you've used SourceDisksFiles to explicitly name every file from within Spybot 1.4 or 1.5 to be copied to CDR, and you then drop in the Spybot 1.6 file set and build a new Bart disk, then you'll find Spybot will fail to launch from the disk.

If so, you can fix this by adding a line to include sqlite3.dll, which is a new file not present in earlier versions of Spybot SD.  Or you can use wildcard syntax to include all dll files, i.e. *.dll as files to be included.
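As a sketch, either of these SourceDisksFiles lines would do it (the location letter and flags here are examples - match them to whatever your existing entries use):

files\sqlite3.dll=a,,4
; or catch any new .DLLs with a wildcard:
files\*.dll=a,,4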

Unwanted behaviour

Spybot 1.6 has a controversial new feature; it deletes Temp files when it starts up.  This is "controlled" by a 6-second dialog box that appears as Spybot starts up (so if you start it and walk away, you'll miss it) and defaults to "Yes, delete temp files".

This is a bigger problem within the Bart environment, which often has troublesome graphics due to unrecognised display chipsets.  In my first Bart session with Spybot 1.6, I expected the dialog, but it appeared with blank buttons.  By the time I checked out what button was what, testing on another PC, the 6 seconds were up, and I'd lost material I'd have preferred to include in further malware scans.

There is a rather obscure fix for this, which I will add to my Bart plugin's .inf file, using one of the registry modification sections.  If using the RunScanner plugin to launch Spybot (should not be required, as Spybot "knows" about such needs), then you'd want to delay the RunScanner redirection until this value had been read by Spybot after starting up - else it will look for it in the inactive (target) hives instead.

9 August 2008

Lazarus of Bad Hard Drives

Here's a failure pattern worth keeping in mind:

  • Unbootable OS
  • Attempts to access hard drive lock up or fail
  • Hardware diagnostics show or imply bad sectors
  • You image the raw partition to a good hard drive
  • Data access and even OS bootability miraculously OK

This is Lazarus of Bad Hard Drives, as opposed to Lazarus of Bethany.

What's happening here is that effects from deeper abstraction layers are creating what appear to be unfixable problems in higher layers.  What is counter-intuitive is that fixing the underlying layer can fix the upper layers too, i.e. that the state of these layers may not be irreparably botched by the lower-layer failure.

So don't give up hope, if you hit the first three items in the list above.

27 July 2008

This Hard Drive, Which PC?

Let's take two unrelated ideas and draw them together...

If you take Fred's brain and transplant it into Martin's body, have you performed a brain transplant on Martin, or a body transplant of Fred?

If the processor is your computer's brain, then the computer's hard drive is your mind.

Do you think the second sentence should have been "...the computer's hard drive is the computer's mind"?  Both statements are true, but from your perspective, the first may be more important.

Why might this matter?

The practical importance of this arises if you take a hard drive out of a PC, and then come back after a while with that hard drive in your hand, and a collection of PCs without hard drives on the bench.

It's easy to tell who the hard drive belongs to, because the files it contains will be full of cues.  But which PC belonged to that particular user?  Less easy, because all information uniquely linking that computer to the user is on the hard drive.  Unless you have kept some other record, e.g. serial number tracking, ownership sticky notes, arrival photos etc. you could have a problem.

Solving this problem

On old PCs that don't auto-detect hard drives, you can examine CMOS to look for CHS geometry etc. that matches the hard drive in your hand.  But new PCs generally don't persist CHS or other hard drive parameters in CMOS; they re-detect such devices on POST instead.

You can examine the hard drive's files to look for links to the particular hardware, e.g. an OS product key that matches a case sticker, or a collection of device drivers that map to the rest of the PC hardware. 

That is not as easy as looking for cues to the user of the PC; you may have to bind registry hives and look for cues in there.  You could do that implicitly by CDR-booting into Bart and using RunScanner to wrap Regedit or other tools so they map to the hard drive installation's hives, or you can do that explicitly by running Regedit from any suitable host OS, and manually binding the hard drive's hives under HKLM.
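The explicit route can also be scripted with reg.exe from an XP or later host; a rough sketch, assuming the transplanted drive's installation is visible as D: (a hypothetical drive letter) and its OS subtree is \Windows (it may be \WINNT on Windows 2000):

Rem Bind the drive's SOFTWARE hive under a temporary key, peek at it, then release it
reg load HKLM\OldSoftware D:\Windows\System32\config\software
reg query "HKLM\OldSoftware\Microsoft\Windows NT\CurrentVersion" /v ProductId
reg unload HKLM\OldSoftware

Remember to unload the hive before pulling the drive, or the hive files stay locked.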

Don't boot in the wrong PC!

Whichever approach you use to examine the hard drive, it's crucial not to allow the hard drive to boot in the wrong PC, in all but the most trivial of OSs.  DOS would be safe, but anything more complex is likely to go wrong for various reasons.

Win9x will use Vmm32.vxd as the core driver set upon which all other drivers are loaded, and that core driver code was derived from the particular PC hardware that was in effect when Windows was installed on it.  If it's incompatible, then the PC will crash before the OS has reached sufficient "consciousness" to Plug-n-Play. 

This is a common crisis when changing the motherboard under a Win9x system, and can be solved by rebuilding a new Vmm32.vxd appropriate to the new hardware.  Yes, there's one on the installation disk, but it's an empty stub upon which Windows Setup builds the "real" file at install time.

You could rebuild Vmm32.vxd by re-installing the Win9x over itself, but that is messy; it breaks patches and subsystem upgrades, loses settings, and so on.  Or you could do a fresh install of Win9x on a different hard drive, harvest the new Vmm32.vxd from there, and drop that into the "real" hard drive from DOS mode.
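From DOS mode, the drop-in itself is just a couple of copies; a sketch, with hypothetical drive letters (C: is the "real" installation, D: holds the fresh donor install):

rem Keep the old core driver set so the change can be backed out
copy C:\Windows\System\Vmm32.vxd C:\Windows\System\Vmm32.old
rem Drop in the donor's Vmm32.vxd, built against the new hardware
copy D:\Windows\System\Vmm32.vxd C:\Windows\System\Vmm32.vxd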

Having got this far, Windows should now boot to the point that Plug-n-Play can detect the rest of the new hardware and nag you for drivers.  Expect problems from hardware-specific code that was not installed as drivers, and is thus not disabled when PnP detects the hardware as no longer present - think packet-writing CD software, modem fax and especially voice software, hardware-specific Properties tabs added to Display Settings, etc.

XP, Windows 2000 and older NT-based OSs may fail in a similar way, though the installation-time hardware-specific code file will be the HAL (Hardware Abstraction Layer) rather than Vmm32.vxd - same difference, in other words.

You will probably have to do a repair installation to fix this, with similar impact as installing a Win9x over itself.  I'm not sure if cleaner fixes would work.

XP may have another surprise for you, if you aren't BSoD'ed by a HAL compatibility STOP error; Windows Product Activation may see the changed PC as "too different", and trigger the DoS (Denial of Service) payload.

Vista doesn't have the same HAL mechanism as earlier NT, so it may start up without a STOP error, but you'd still have the Product Activation payload to contend with - as you would with MS Office versions XP and later, regardless of OS.

I can't tell you how Linux, BSD or MacOS (pre- or post-BSD) would fare when booted in the "wrong" computer, but I suspect similar issues may apply.  Aside from artificial crises caused by deliberately malicious product activation code, you may still have the core problem of needing hardware-specific information to boot the logic that can respond to altered hardware.

Safety First

If you are not certain that you're putting the hard drive in the correct PC, then you should back up C: in such a way it can be restored as a working OS, if mistakes lead to a PnP or product (de)activation mess.

For a Win9x, that's as simple as copying off all files from the root directory, all of the OS subtree, and (easiest) everything in "Program Files".  For best-practice on WinME, you should preserve the _Restore subtree too.  These are the only contents of C: that should be affected by the OS "waking up" in the wrong PC.
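A rough sketch of that copy, run from a host OS or Bart session that can see the drive (drive letters and the E:\Backup destination are hypothetical, and the switch list is from memory - check xcopy /? on the system in question):

xcopy C:\*.* E:\Backup\Root /h /k /c /i
xcopy C:\Windows E:\Backup\Windows /e /h /k /c /i
xcopy "C:\Program Files" E:\Backup\ProgFiles /e /h /k /c /i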

For NT, it's messier, because Windows 2000 and later (I haven't tested earlier NT) will not boot after a file-copy transfer from one hard drive to another, even if you meticulously include all files.  So you're obliged to do a partition-level copy (e.g. save C: as a partition image via BING) to maintain undoability.

25 July 2008

Should You Detect Old Malware?

We've gone from thinking of software as a "durable good" to thinking of it as something that evolves under selection pressure.  This certainly applies to malware and blacklist-driven scanner countermeasures, which are assumed to become either extinct or irrelevant over time.

Who needs old scanners?

You may want to keep an old scanner if it still detects stuff other scanners can miss, or if it was the last version that ran in your environment - though as an "extra" on-demand scanner, not as sole resident protection, of course.

For example, Kaspersky's CLI scanner no longer runs under Bart, and I haven't adapted AdAware 2007 (which I don't particularly like) to run in Bart either.  There are still manual updates for AdAware SE, so that's still "current", and Kaspersky CLI still works in Safe Cmd.  However, I may want the safety of formal scanning with Kaspersky CLI, and for that, I'd need the last "old" version and updates that still worked from Bart.

Another example is McAfee's Stinger.  Just as an "old" Kaspersky CLI may find stuff other updated scanners will miss, so it is with the even older Stinger - it's particularly good at catching TFTP-dropped malware and some bots, both of which are likely to be found in an NT that has not patched RPC against Lovesan et al.

F-Prot's DOS and Win32 CLI scanners are also discontinued, i.e. no further updates, but are still useful.  Specifically, these scanners will often detect "possibly new version of ... Maximus", and sometimes that rather loose and false-positive-prone detection still finds things others miss.  These scanners also find other false positives unrelated to Maximus, so handle with care.

Does old malware matter? 

Firstly, you may encounter vintage malware on vintage systems and diskettes (e.g. boot sector infectors on old DOS or Win95-era PCs, old MS Office macro infectors in "documents" from old systems).  Malware of that era was mostly self-contained and fully automated, and often had destructive payloads, so it will still bite... so you'd want to detect it.

Secondly, think of the spammer equivalent of the guy who still uses a PC built from old parts running MS Office 2000 on Windows 98, because these old feeware programs pre-date automated defence against piracy - i.e. no software budget.

A less-obvious feature of botnets is that those who own them, don't want folks controlling them for free.  So if you want to send spam through a modern botnet, you will probably have to find someone and pay them.

On the other hand, old bots that are still in the wild, may have been cracked so they can be operated for free - or may simply pre-date the rise of malicious info-business and thus lack modern mechanisms to block control.  In which case, our impoverished spammer may use these instead - so it may still be prudent to detect and kill them off, especially in the context of poorly-patched or defended systems (e.g. unpatched Windows 2000, no firewall, outdated or missing av).

7 July 2008

XP Repair Install

Re-installing Windows XP isn't a good idea as a blind first step in troubleshooting problems, but there are specific contexts where it is necessary, as the cleanest way to "make things work".  One of these contexts is after a motherboard change that invalidates XP's core assumptions, typically causing a STOP BSoD on any sort of attempted XP boot (from Safe Cmd to normal GUI).

This is the situation that edgecrusher is in, as posted in comments to the previous post in this blog, and this post is my response.

Before you start

Firstly, I'm going to assume you have all the necessary installation and driver disks, have your XP product key (or have retrieved it via Nirsoft Produkey or similar), have excluded malware, and have verified RAM overnight (e.g. via MemTest86 or MemTest86+) and the hard drive (e.g. via HD Tune).

Make sure the edition (OEM vs. retail, Home vs. Pro, etc.) of the XP installation disk you will use for the repair install is one that matches your product key, that the disk actually has the ability to do a non-destructive install (as many OEM disks do not), and that the disk can be read without errors (as tested by copying all files to a subdir on the hard drive before you start).

It's a good idea to make a partition image backup of your XP installation before you start, using something like BING.  Simply copying off every file is not enough, because unlike Windows 9x, XP will not work when copied in this way.

Also before you start, you may want to uninstall any OS-bundled subsystems that you've upgraded past the baseline of your XP installation disk, such as IE7 or recent versions of Windows Media Player.  Things are cleaner and more likely to be "supported" if you uninstall these before the repair, and re-install them afterwards, plus you'll have valid entries in Add/Remove Programs should you need to uninstall them again later (e.g. as a troubleshooting step).

Several sites describe the XP repair install process, starting from how to start the process, and going on to a step-by-step slide show or providing more detail.  In this post, I will mention a few specific gotchas to avoid...

137G capacity limit

If your hard drive is over 137G in size, then the Service Pack level of the Windows XP installation disk must be at least SP1 to install, and SP2 to live with.  In other words, you cannot safely install XP "Gold" (SP0) on a hard drive over 137G, and should apply SP2 or SP3 over an XP SP1 installation. 

If your install disk pre-dates SP1, you need to slipstream a later Service Pack into this and make a new installation disk that includes SP1 or later, built in.  Your other option is to install XP "Gold" onto a hard drive smaller than 137G, apply SP1 or later, and then use a partition transfer utility to copy the partition to the larger hard drive where the partition can then be resized to taste.
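The slipstream step itself is a one-liner; a sketch, with hypothetical names (C:\XPCD holds a complete copy of the original installation CD's files, and xpsp3.exe stands in for whatever the Service Pack's standalone installer is called):

xpsp3.exe /integrate:C:\XPCD

After that, C:\XPCD has the Service Pack built in, and can be made into a new bootable installation disk.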

XP "Gold" has no awareness of hard drives over 137G and is very likely to mess them up.  XP SP1 is supposed to be safe on such hard drives, but there are some contexts where the code that writes to disk is unsafe and may cause corruption and data loss; from memory, these contexts typically apply to C:, e.g. writing crash dumps to the page file.  XP SP2 and SP3 are truly safe over 137G.

F6 driver diskette

Yep, you read right; that's "diskette" as in "ancient crusty old stiffy drive"! 

Most current motherboards have S-ATA hard drive interfaces that are not "seen" by the native XP code set (affecting Bart and Recovery Console boot disks as well).

The trouble is, the latest PCs often have no diskette drive, and the latest motherboards often have no legacy diskette controller.  You may come right with an external diskette drive plugged in via USB.  You'll also have to find and download the relevant driver diskette image and make a diskette from this, if yours is missing or unreliable.

If you use a USB keyboard, and this is not initiated at the BIOS level, then your F6 keystroke to read the driver diskette will be missed.  If so, you can plug in a PS/2 keyboard... as long as your new motherboard has PS/2 sockets; the newest ones don't.

Sometimes your mileage may vary, depending on the mode your S-ATA is set to operate in via CMOS Setup.  RAID and AHCI will generally not be "seen" natively by XP's code, whereas IDE mode may be.  But some nice S-ATA features may not work in IDE mode, e.g. hot-swapping external S-ATA or NCQ, and changing this after XP is installed may precipitate the same crisis as the motherboard swap... requiring a repair install to fix, again.

All of this is a reason why I consider the XP era to be over, when it comes to new PCs.  I appreciate how old OSs run beautifully fast on new hardware, and how attractive that is for gamers in particular - but XP's getting painful to install and maintain, and this is going to get worse.

Duplicate user accounts

Later in the GUI part of the installation process, you will be prompted to create new user accounts.  You can try to skip this step (best, if that works... I can't remember if it does), or create a new account with a different name that you'd generally delete later. 

But many users are likely to create a new account with the same name as their existing account, and that's likely to hurt...

The two accounts will show the same name at the Welcome screen, but both will be selectable via this UI; I have no idea what will happen if you were to force the more secure legacy logon UI, which requires the account name to be typed in.

Each account will have a unique Security Identifier (SID), which is the real "name" used behind the scenes - but you can't login with that.  There will also be separate account subtrees in "Documents and Settings"; the one with the plainest name is likely to be the original, and the one with numbers or the PC name added to it is likely to be for the newly-spawned account.

At this point I'll mention another user account hassle that I generally don't see, because I avoid NTFS where I can.  If you find you can "see" your old user account's data, but aren't permitted to access the files, then you may have to "take ownership" of these files from a user account that has full administrative rights. 

This issue is well documented elsewhere; search and ye will find!

Broken update services

It's a given that the "repair" is going to blow away all patches subsequent to the baseline SP level of the XP installation disk you are using, unless you've slipstreamed these into your installation disk.

What's less obvious is that after you do the "repair" install, you won't be able to install updates.  It doesn't matter whether you try via Automatic Update, Windows Update or Microsoft Update, the results will be the same; the stuff downloads OK (costing you bandwidth) but will not install, whether you are prompted to restart or not.

The cause is a mismatch between the "old" update code within the installation CD, and the newer update code that was controversially pushed via update itself.  I can see Microsoft's logic here; if you ever wanted updates to work (e.g. you'd chosen "download but don't install", or disabled updates while planning to enable them later), then the update mechanism has to be updated - but doing so, invalidates the original installation disk's update code.

This topic is well-covered, as is the fix; manually re-registering a number of .DLLs that are needed for the update process to work.
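From memory, the usual recipe re-registers the Windows Update client's own components, along these lines (treat the exact .DLL list as an assumption and check it against the usual write-ups; run from Start, Run or a command prompt):

regsvr32 /s wuapi.dll
regsvr32 /s wuaueng.dll
regsvr32 /s wucltui.dll
regsvr32 /s wups.dll
regsvr32 /s wups2.dll
regsvr32 /s wuweb.dll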

Broken settings

It's often asserted that a repair install "won't lose your settings", yet it is waved around as a generic fix for undiagnosed problems.  Part of why it sometimes works as a "generic fix" is precisely because it can and does flatten some settings, which may have been deranged to the point that the OS couldn't boot!

So if you do apply any non-default settings, you should check these to see if they've survived.  I always check the following, and can't remember with certainty which ones survive and which don't:

  • System Restore (may be re-enabled on all volumes)
  • System Restore per-volume capacity limits
  • Automatically restart on system errors
  • RPC Restart the computer on failures (may survive)
  • Show all files, extensions, full paths, etc. (may survive)
  • NoDriveTypeAutoRun and NoDriveAutoRun (see the .REG sketch after this list)
  • Standard services you may have disabled
  • Hidden admin shares, if you'd disabled them
  • Recovery Console enabling settings
  • AutoChk parameters in BootExecute setting
  • Shell folder paths
  • Windows Scripting Host, if you'd disabled it
  • Settings detail in IE, including grotesquely huge web cache
  • Windows Firewall settings; may be disabled if < SP2 !!
  • Anything else you've dared to change from duhfaults
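On the AutoRun items above: the policy value usually quoted looks like this as a .REG sketch (0xFF masks out AutoRun for every drive type; there's an equivalent value under HKEY_CURRENT_USER for per-user settings):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\Explorer]
"NoDriveTypeAutoRun"=dword:000000ff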

It's particularly crucial to enable the Windows Firewall (or install a 3rd-party alternative) before letting your PC anywhere near any sort of networking, especially the Internet, if your installation is "Gold" or SP1.  Not only do these dozeballs duhfault to "no firewall", they're also unpatched against RPC (Lovesan et al) and LSASS (Sasser et al) attacks, so you'd be "open and revolving".

By now, the original PoC Lovesan and Sasser worms may be extinct, but these exploits are often crafted into subsequent workaday bots and worms.  You may still get hit within an hour of plugging in the network cable if so, and probably before you can pull down updates for the OS, antivirus scanners, etc.

Cause and Distance

"Send this habitat module to Sirius Prime, now!"

' OK ...'

Right-click habitat module, Properties, Location tab, highlight "Earth", enter new text "Sirius Prime", press Enter.  Drone work, really, but Fred just counts himself lucky at being able to find a summer job.

Sharp distance runaround

I find it interesting that despite the strange and counter-intuitive models we've developed for sub-atomic matter, we still cling to the Newtonian idea that cause is carried by force, and force involves objects banging into each other.

So we've had "The Aether" before we could get our heads around empty space, the idea that waves must travel in a medium (neatly solved by counter-generating electro and magnetic fields), and the "problem" of action at a distance that is modelled on throwing particles around.

We understand space and time as interrelated through the speed of light, so that "distance" can be envisaged in either terms. 

Trapped in the mesh

Trying to resist the idea of desire-guided evolution, consider the need for senses (or sensors, if you like).  We sense what we need to attain, avoid or overcome, not what we can happily ignore as irrelevant.  Does hard drive S.M.A.R.T. monitor the tides of the ocean?  Nope.  Does the body monitor radiation levels?  Nope, as this had not been relevant during the timescale when shaped by selection pressure.

We are of the universe and are unable to transcend "distance" (be it conceptualised as time or space) at will, though we have a limited ability to physically move towards or away from things.  So we perceive distance as a dominant property of our environment, shaping concepts such as "cause and effect" and "the arrow of time". 

But this perspective may be a platform-specific perception issue, rather than a universal truth.  Perhaps if we visualize things differently - e.g. consider the distribution of mass as a constant, and "distance" as a particular parameter, then some things may snap in to focus, such as gravity as a "curvature of space". 

Often a graph that has a shape that is hard to grapple with, becomes a tame line drawing when the scaling of an axis is changed; certain problems, such as shapes that tend towards but never reach zero, may resolve themselves.  So it may be with "distance".

After writing this, I found the articles I've linked to, along with this one.  It would probably be more enlightening to fan out from here than to read the post you have just finished reading  :-)

28 June 2008

XP SP3 "Stuck" Activation Dialog

You may see this failure pattern:

  • Windows XP (SP3) demands activation
  • You get the first dialog of the activation wizard
  • But no matter which option you choose, Next doesn't

Specifically; this is the first dialog page of the activation wizard, from which you choose "activate via Internet", "activate via telephone" or "no, I'll do it later".  When you press the Next button, the button appears to depress fine, but when you let it go, the dialog stays where it is. 

If you were trying to activate by phone, then that means you don't see the list of locations to call, or the key to read to the call center if you do call.  So when you call the activation center, the first thing they ask you to do ("please read me your installation ID"), you can't do, and frustration follows.

Context details

I've seen this once, in the following context.  A PC had suffered hard drive failure, and was also in need of RAM upgrade and software updates.  So I first repaired the hardware by imaging to a good hard drive, added more RAM, replaced the duff CD-ROM drive with a working one. 

Then I did "the prelim"; MemTest86 to verify RAM, Bart CDR boot to verify HD via HD Tune, file system checks OK, formal malware scans OK.

Next, I booted into Windows, and was not too surprised when it told me I needed to activate, as the hardware had "changed too much".  I deferred this, and did what I usually do when updating XP systems this month:

  • get off all networks and Internet
  • uninstall free AVG 7.5
  • uninstall Internet Explorer 7
  • move all $.. folders from OS subtree to another HD volume
  • defrag to consolidate free space
  • apply XP SP3 from offline installer
  • verify firewall is on
  • install free AVG 8
  • connect to Internet so AVG 8 can update
  • upgrade other software; Java, Acrobat Reader, Firefox etc.
  • allow Automatic Update to pick up IE7 and other updates
  • attempt activation before applying OS and IE7 updates

During this process, I restarted Windows several times for various reasons, but the activation dialog would not work.  It only worked after I applied the pending Automatic Updates; then after the restart that followed, activation was fine.

Suspected cause

I suspect that pending updates cause the activation dialog to "stick".  This may apply specifically to XP SP3 or be a general XP issue that I had not encountered until now, as I seldom (if ever) have activation demands and pending updates at the same time. 

That situation can arise in the context of installing XP SP3, because:

  • you want to uninstall IE7 (or IE8 beta) before applying SP3
  • Windows Media Player falls back to old version
  • you can't install Media Player 11 as it won't "validate"
  • you can't install IE7 from pre-downloaded file
  • you can't use the Update web site until you activate

So you rely on Automatic Update to feed in the patches you want, but may feel the need to defer installation of these until you've activated.  This applied in my case, because a lot of the updates were for IE6 which I intended to replace with IE7 anyway - so before applying updates, I wanted to install Media Player 11 and IE7, so that the updates I downloaded and applied would be "after" these.

If my hunch about the cause of this failure pattern is correct, then this combination of circumstances can create a "deadly embrace" of cross-dependencies; can't activate until updates are applied, but user doesn't want to apply updates until the system is activated.

Can't install IE7 on XP SP3?

On "you can't install IE7 from pre-downloaded file"; this seems to be a different XP SP3 issue. 

Usually, I can at least initiate the IE7 install from a pre-downloaded installation executable, though this needs to be online so it can pull down updates to IE7 as part of the installation process.  But this fails after XP SP3 has been applied; instead, one has to induce an IE7 install via Automatic, Windows or Microsoft Update.  Sometimes it's offered as a critical update, other times not.

15 April 2008

When Add/Remove Doesn't Remove

What do you do when you go to XP's Control Panel, Add/Remove Programs, find the software you want to remove, and the entry has no Remove button on it?

I found an answer in a forum thread, as follows...

HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\{Program Name}

If NoRemove is set to 1, the Remove button will be unavailable
If NoModify is set to 1, the Change button will be unavailable

Note that as the {program name} may be a CLSID, you may need to search for the product name (e.g. "Intel Audio Studio", using what you saw in Add/Remove) to locate the correct entry in which to find the relevant NoRemove setting.
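So once you've found the right subkey, a .REG along these lines should bring the Remove button back ({Program Name} is a placeholder for the actual subkey you found, which may be a CLSID):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\{Program Name}]
"NoRemove"=dword:00000000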

Intel Audio Studio

In my case, I'd had to replace a failed Intel 945G chipset motherboard with a new Intel G33 chipset motherboard, with attendant processor upgrade.  XP died on a BSoD STOP error on all boots, as expected (a self-serving product fragility that helps limit "license creep"), so the next step was to "just" do a repair install... but that's another day's blogging.

The other motherboard came with Intel Audio Studio, which installs and uninstalls along with the sound drivers.  But when the motherboard changes, the old sound device vanishes from Device Manager, so you no longer have an UI from which the device (and thus drivers, and thus associated bundleware) can be removed.

Hence the problem: Add/Remove Programs has an entry for Intel Audio Studio, but that entry has no Remove button.  It "doesn't need it" (so implies the text within the entry) because it is "installed and uninstalled with the drivers".

The meta-bug behind the bug

Alun Jones said "Don't solve problems, solve classes of problems", and by meta-bugs, I mean the classes of problems behind the bugs you step on one by one.

Have you noticed how few USB-interfaced hardware vendors create driver installations that work with the Windows PnP detection, prompt, and install sequence?  Most vendors tell you to avoid that by first auto-running their installation CD or running their Setup.exe, and then plugging in the USB device only after the "drivers" are loaded.

One reason may be because the vendor wants to install a range of software that is broader than that handled by the device driver installation purpose.  And this is where the meta-bug comes in, because whereas drivers are suppressed when the device is not found, the rest of the bundled software is not.  This is what causes your Display Properties dialogs to crash when you replace your graphics card, if the old graphics card left device-specific code hooked into the Properties page.

In this case, the meta-bug caused an apparent inability to uninstall Intel Audio Studio.  With the new motherboard in place, a dialog appears on every boot to the effect that Intel Audio Studio doesn't work with the system's (different) audio, prompting my attempts to uninstall the software via Add/Remove Programs.

What is particularly annoying, is that Intel's web site has nothing I could find on this issue - either via their own site search, or combining www.intel.com with appropriate search terms in a general Internet Google search.

11 April 2008

Web Forum Login Blues

One of the challenges in web design and web forums in particular, is how to handle the requirement to log in for more functional access.

Whenever a login prompt pops up, folks will often simply leave, rather than log in, even if they are already registered at the site.  So immediate login isn't usually what you want to do with web forums; for one thing, you will lose folks who find your forums via search results that jump into forum content.

So web forums generally let you browse around until you want to do something that requires you to login, such as start new threads, reply to posts, etc.

Even at this point, I'm still inclined to give up rather than continue, when I see a login prompt.  Why?  Because I anticipate the hassle of not only logging in, which I can handle, but having to re-navigate my way back to what I was doing - i.e. which forum, which thread, which one of 27 pages of posts (sometimes requiring 22 "next" clicks because there's no "go to end" where the most recent stuff is found) and which post on that page, and then the edit box I may have already started.

The Bart forums hosted here get it right; you sign in, and you are returned exactly to the context you left before you had to log in.

Microsoft's web reader for their newsgroups gets it wrong, at least in the context of linking to their forums via other Microsoft sites.  Here's the repro; start here...

http://www.microsoft.com/windows/ie/ie8/welcome/en/default.html

...and then at the bottom right of the page, right-click, Open In New Tab (the method I used, a straight click will likely do the same) on "Community Forums", taking you here:

http://www.microsoft.com/communities/newsgroups/list/en-us/default.aspx?dg=microsoft.public.internetexplorer.beta&cat=en_us_2BAF8EC5-645C-4477-A380-0F1CF6C102F9&lang=en&cr=us

So far, so good; we're in blah.blah.blah.ie8.blah which is where we want to be, and we start reading, then decide we want to post a new thread.  Oops, now we want to login, so we do, and when we get back, we are no longer in the context of the newsgroup we wanted to visit.  We didn't note the name of that newsgroup because we thought we'd always be able to link into it directly from the IE8 beta page.

Now I know Microsoft are battling to win over many of us usenet die-hards to embrace the web UI to their newsgroups.  This is IMO a serious stumbling block, not only for us, but also for folks discovering newsgroups for the first time, without usenet preconceptions and expectations.

The above is as tested on XP SP2 with IE8 beta 1 installed over IE7, IE8 Standards Mode, set to prompt on active content, all such prompts OK'd.

Workaround 1: Remember what the newsgroup is called, duh!  Not so easy, in that it's not content that can be cut and pasted, in terms of the UI, and the nature of the data is a pain to remember.  It breaks the concept of "let the PC do it", too; plus that only gets you back as far as the start of the newsgroup, before you navigated within it.

Workaround 2: Log in, then go back to the IE8 page and repeat the link to the forums (which Microsoft may also refer to as "communities").  This time you will navigate into a logged-in state, and all will be well, though any navigation you did within the forum would have to be re-done.  For best practice (i.e. shouldn't be required, but...) remember to log out of both forum tabs when done.

30 March 2008

NTFS vs. FATxx Data Recovery

By now, I've racked up some mileage with data recovery in FATxx and NTFS, using R-Studio (paid), GetDataBack (demo), manually via ye olde Norton DiskEdit (paid), and free Restoration, File Recovery 4 and Handy Recovery, and a pattern emerges.

Dispelling some myths 

NTFS has features that allow transactions to be reversed, and there's much talk of how it "preserves data" in the face of corruption.  But all it really preserves is the sanity of the file system and metadata; your actual file contents are not included in these schemes of things. 

Further, measures such as the above, plus automated file system repair after bad exits from Windows, are geared to the interruption of sane file system activity.  They can do nothing to minimize the impact of insane file system activity, as happens when bad RAM corrupts addresses and contents of sector writes, nor can they ameliorate the impact of bad sectors encountered on reads (when the data is not in memory to write somewhere else, it can only be lost).

From an OS vendors' perspective, there's no reason to consider it a failing to not be able to handle bad RAM and bad sectors; after all, it's not the OS vendor's responsibility to work properly under these conditions.  But they occur in the real world, and from a user's perspective, it's best if they are handled as well as possible.

The best defence against this sort of corruption is redundancy of critical information, such as the duplication of FATs, or less obviously, the ability to deduce one set of metadata from another set of cues.  Comparison of these redundant metadata allows the integrity of the file system to be checked, and anomalies detected.

Random sector loss

Loss of sector contents to corruption or physical disk defects is often not random, but weighted towards those parts of the disk that are accessed (bad sectors) or written (bad sectors and corruption) the most often.  This enlarges the importance of critical parts of the file system that do not change location and that are often accessed.

When this happens, there are generally three levels of recovery.

The first level, and easiest, is to simply copy off the files that are not corrupted.  Before doing so, you have to exclude bad hardware that can corrupt the process (e.g. bad RAM), and then you make a beeline for your most important files, copying them off the stricken hard drive - even before you surface scan the drive to see if there are in fact failing sectors on it, or attempt a full partition image copy.  This way, you get at least some data off even if the hard drive has less than an hour before dying completely.

You may find some locations can't be copied for various reasons that break down to invalid file system structure, physical bad sectors, overwritten contents, or cleanly missing files that have been erased.  If the hard drive is physically bad, you'd then attempt a partition copy to a known-good drive.  If you want to recover cleanly erased files, or attempt correction of corrupted file systems, then this partition copy must include everything in that space, rather than just the files as defined by the existing file system.

The second level of recovery is where you regain access to lost files by repairing the file system's logic and structure.  This includes finding partitions and rebuilding partition tables, finding lost directory trees and rebuilding missing root directories, repairing mismatched FATs and so on.  I generally do this manually for FATxx, whereas tools like R-Studio, GetDataBack etc. attempt to automate the process for both FATxx and NTFS. 

In the case of FATxx, the most common requirements are to rebuild a lost root directory by creating scratch entries pointing to all discovered directories that have .. (i.e. root) as their parent, and to build a matched and valid pair of FATs by selectively copying sectors from one FAT to the other.

Recovered data is often in perfect condition, but may be corrupted if file system cues are incomplete, or if material was overwritten.  Bad sectors announce their presence, but if bad RAM had corrupted the contents of what was written to disk, then these files will pass file system structural checks, yet contain corrupted data.

The third level of logical data recovery is the most desperate, with the poorest results.  This is where you have lost file system structural cues to cluster chaining and/or the directory entries that describe the files.

Where cluster chaining information is lost, one generally assumes sequential order of clusters (i.e. no fragmentation) terminated by the start of other files or directories, as cued by found directories and the start cluster addresses defined by the entries within these.  In the case of FATxx, I generally chain the entire volume as one contiguous cross-linked file by pasting "flat FATs" into place.  Files can be copied off a la first level recovery once this is done, but no file system writes should be allowed.

If directory entries are lost, then the start of files and directories can be detected by cues within the missing material itself.  Subdirectories in FATxx start with . and .. entries defining self and parent, respectively, and these are the cues that "search for directories" generally use in DiskEdit and others.  Many file types contain known header bytes and known offsets (e.g. MZ for Windows code files) and this is used to recover "files" from raw disk by Handy Recovery and others - a particularly useful tactic for recovering photos from camera storage, especially if the size is typical and known.

Results

I have found that when a FATxx volume suffers bad sectors, it is typical to lose 5M to 50M of material from a file set ranging from 20G to 200G in size.  The remainder is generally perfectly recovered, and most recovery is level one stuff, complicated only by the need to step over "disk error" messages and retry bog-downs.

When level two recovery is needed, the results are often as good as the above, but the risks of corrupted contents within recovered files are higher.  The risk is higher if bad RAM has been a factor, and is particularly high if a "flat FAT" has to be assumed.

In contrast, when I use R-Studio and similar tools to recover files from NTFS volumes with similar damage, I typically get a very small directory tree that contains little that is useful.  Invariably I have to use level three methods to find the data I want.  Instead of getting 95% of files back in good (if not perfect) condition, I'll typically lose 95%, and the 5% I get is typically not what I am looking for anyway.

Level three recovery is generally a mess.  Flat-FAT assumptions mean multi-cluster files are often corrupted, and the loss of meaningful file names, directory paths and actual file lengths makes it hard to interpret and use the recovered files (or "files").

Why does mild corruption of FATxx typically return 90%+ of material in good condition, whereas NTFS typically returns garbage?  It appears as if the directory information is particularly easy to lose in NTFS.  I find it hard to believe that every tool I've tried is simply unable to match the manual logic I apply when repairing FATxx file systems via DiskEdit.

Survivability strategies

Sure, backups are the best way to mitigate future risks of data loss, but realistically, folks ask for data recovery so often that one should look beyond that, and set up file systems and hard drive volumes with an eye to survivability and recovery.

Data corruption occurs during disk writes, and there may be a relationship between access and bad sectors.  So the first strategy is to keep your data where there is less disk write activity, and disk access in general.  That means separating the OS partition, with its busy temp, swap and web cache writes, from the data you wish to survive.

At this point, you have opposing requirements.  For performance, you'd want to locate the data volume close to the system partition, but survivability would be best if it was far away, where the heads seldom go.  The solution is to locate the data close, and automate a daily unattended backup that zips the data set into archives kept on a volume at the far end of the hard drive, keeping the last few of these on a FIFO basis.
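
Here's a sketch of that unattended backup (in Python purely for illustration - a zip tool called from a scheduled batch file does the same job).  The drive letters, paths and retention count are assumptions:

    import datetime, glob, os, shutil

    DATA_DIR   = r"D:\Data"               # the "close" data volume
    BACKUP_DIR = r"F:\Backups"            # volume at the far end of the hard drive
    KEEP_LAST  = 5                        # how many daily archives to retain

    stamp   = datetime.date.today().strftime("%Y%m%d")
    archive = os.path.join(BACKUP_DIR, "data_" + stamp)
    shutil.make_archive(archive, "zip", DATA_DIR)        # creates data_YYYYMMDD.zip

    # FIFO: throw away the oldest archives beyond the retention count
    archives = sorted(glob.glob(os.path.join(BACKUP_DIR, "data_*.zip")))
    for old in archives[:-KEEP_LAST]:
        os.remove(old)

Run daily as a scheduled task, the far volume always holds the last few days' data without anyone having to remember anything.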

One strategy to simplify data recovery is to use a small volume to contain only your most important files.  That means level three recovery has less chaff to wade through (consider picking out your 1 000 photos from 100 000 web cache pictures in the same mass of recovered nnnnn.JPG files), and you can peel off the whole volume as a manageable slab of raw sectors to paste onto a known-good hard drive for recovery while the rest of the system goes back to work in the field.

The loss of cluster chaining information means that any file longer than one cluster may contain garbage.  FATxx stores this chaining information within the FATs, which also cue which clusters are unused, which are bad, and which terminate data cluster chains.  NTFS stores this information more compactly; cluster runs are stored as (start, length) value pairs, while a single bitmap holds the used/free status of all data clusters, somewhat like a 1-bit FAT.

Either way, this chaining information is frequently written and may not move on the disk, and both of these factors increase the risk of loss.  A strategy to mitigate this common scenario is to deliberately favour large cluster size for small yet crucial files, so that ideally, all data is held in the first and only data cluster.  This is why I still often use FAT16, rather than FAT32, for small data volumes holding small files.
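
A quick way to sanity-check that strategy is to count how many of your crucial files would actually fit inside a single cluster of the size you're considering.  A rough sketch (Python; path and cluster size are assumptions):

    import os

    DATA_DIR     = r"D:\Data"             # hypothetical small data volume
    CLUSTER_SIZE = 32 * 1024              # e.g. FAT16 with 32K clusters

    fits, spills = 0, 0
    for root, _dirs, files in os.walk(DATA_DIR):
        for name in files:
            size = os.path.getsize(os.path.join(root, name))
            if size <= CLUSTER_SIZE:
                fits += 1                 # survives even if cluster chaining is lost
            else:
                spills += 1               # anything past the first cluster is at risk
    print("%d files fit in one cluster, %d still depend on chaining info" % (fits, spills))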

Another strategy is to avoid storing material in the root directory itself (for some reason, this is often trashed, especially by some malware payloads on C:) and to also avoid long and deeply-nested paths.  Some recovery methods, e.g. using ReadNTFS on a stricken NTFS volume, require you to navigate through each step of a long path, which is tedious due to ReadNTFS's slowness, the need to step over bad sector retries, and the risk of the path being broken by a trashed directory along the way.

Some recovery tools (including anything DOS-based, such as DiskEdit and ReadNTFS) can't be safely used beyond the 137G line, so it is best to keep crucial material within this limit.  Because ReadNTFS is one of the only tools that accesses NTFS files independently of the NTFS.sys driver, it may be the only way into NTFS volumes corrupted in ways that crash NTFS.sys!
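
The 137G line is the 28-bit LBA boundary: 2^28 sectors of 512 bytes comes to 137 438 953 472 bytes, i.e. roughly 137G.  A trivial sketch of the arithmetic, and of checking that a partition stays below it (the partition figures here are made up):

    LBA28_LIMIT_BYTES = (2 ** 28) * 512   # 137,438,953,472 bytes, i.e. the "137G line"

    partition_start_sector = 63           # hypothetical values from a partition table
    partition_sectors      = 200 * 1000 * 1000
    partition_end_bytes    = (partition_start_sector + partition_sectors) * 512

    print("LBA28 limit:", LBA28_LIMIT_BYTES, "bytes")
    print("Partition ends below the 137G line:", partition_end_bytes < LBA28_LIMIT_BYTES)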

Given the poor results I see when recovering data from NTFS, I'd have to recommend using FATxx rather than NTFS as a data survivability strategy.  If readers can attain better results with other recovery tools for NTFS, then please describe your mileage with these in the comments section!

27 March 2008

Why "One Bad Sector" Often Kills You

Technorati tags:

Has it ever seemed to you, that if there's "one bad sector" on a hard drive, it will often be where it can hurt you the most?

Well, there may be reasons for that - and the take-home should affect the way file systems such as NTFS are designed.

As it is, when I see early bad sectors, they are often in frequently-accessed locations.  This isn't a sampling bias from only looking for bad sectors when a PC fails; I routinely do surface scans whenever PCs come in for any sort of work.  It's good CYA practice, saving you from making excuses when the work you were asked to do causes damage due to unexpected pre-existing hardware problems.

Why might frequently-accessed sectors fail?

You could postulate physical wear of the disk surface, especially if the air space is polluted with particulate matter, e.g. from a failed filter or seal, or debris thrown up from a head strike.  This might wear the disk surface most wherever the heads are most often positioned.

You could postulate higher write traffic to increase the risk of a poor or failed write that invalidates the sector.

Or you could note that if a head crash is going to happen, it's most likely to happen where the heads are most often positioned.

All of the above is worse if the frequently-accessed material is never relocated by file updates, or defrag.  That may apply to files that are always "in use", as well as structural elements of the file system such as FATs, NTFS MFT, etc. 

Core code files may also be candidates if they have to be repeatedly re-read after being paged out of RAM - suggesting a risk mechanism that involves access rather than writes, if so.

As it is, I've often seen "one bad sector" within a crucial registry hive, or within one of the core code files back in the Win9x days.  Both cause particular failure patterns that I've seen often enough to recognize, e.g. the Win9x system that rolls smoothly from boot to desktop and directly to shutdown, with no error messages, which happens when one of the core code files is bent.

I've often seen "one bad sector" within frequently-updated file system elements, such as FATs, NTFS "used sectors" bitmap, root directory, etc. which may explain why data recovery from bad-sector-stricken NTFS is so often unsatisfactory. 

But that's another post...

24 March 2008

Google Desktop vs. Vista Search

Technorati tags: , , , ,

Google accuses Microsoft of anti-competitive behaviour, in that Vista currently leverages its own desktop search over Google Desktop and other alternatives.  This issue is well-covered elsewhere, but some thoughts come to mind...

Isn't Google hardwired as the search engine within Apple's Safari?

Isn't Apple pushing Safari via the "software update" process as bundled with iTunes and QuickTime, even if the user didn't have Safari installed to begin with?

I'm seeing a lot of black pots and kettles here.

More to the point: if an alternate search is chosen by the user or system builder, is the built-in Microsoft indexer stripped out?  This article suggests it won't be.

That's the ball to watch, because so far, Microsoft's approach to enabling competing subsystems has been to redirect UI to point to the 3rd-party replacement, without removing the integrated Microsoft alternative. 

That means the code bloat and exploitability risks of the Microsoft stuff remains, and that in turn makes it impossible for competitors to reduce the overall "cost" of that functionality (as using something else still incurs the "cost" of the Microsoft subsystem as well).

This is particularly onerous when the Microsoft subsystem is still running underfoot. 

For an example of the sort of problems that can arise; if you have an edition of Vista that does not offer the "Previous Versions" feature, you still have that code running underfoot, maintaining previous versions of your data files.  If someone subsequently upgrades Vista to an edition that does include "Previous Versions", then they can recover "previous versions" of your data files, even though those files were altered before Vista was upgraded.

So it's not enough to give Google (and presumably others; this complaint is not just for the benefit of the search king, is it?) equal or pre-eminent UI space.  If one has to accept the runtime overhead of some 3rd-party's indexer, then it's imperative that Microsoft's indexer is not left running as well. 

As it is, indexer overhead is a big performance complaint with Vista.  If 3rd-party desktop search has to suffer the overhead of two different indexers, the dice are still loaded against the competition, because no matter how much more efficient the 3rd-party indexer may be, the overall result is worse performance.

21 February 2008

SysClean on Bart PE

Technorati tags: , , ,

Trend SysClean is a self-contained, stand-alone malware cleaner that can be used from the Bart PE boot CDR environment.  Unlike many free cleaners such as McAfee Stinger and Avast Cleaner, it detects most things that a full-range resident av would detect, rather than a small subset of these.

On the face of it, it should be easy to run SysClean from a Bart CDR boot, but there are a few gotchas that can mess you up.  Or perhaps you're one of those who sorted the gotchas out ages ago, yet have recently found that SysClean no longer works in Bart. 

Either way, read on...

How SysClean works

SysClean exists as a SysClean.com engine, plus a larger signature data file with names such as LPT$VPN.xxx, where xxx is a 3-digit number that rolls over from .999 to .000 or .001 whenever there have been that many updates.  You can wildcard the data file as LPT$*.*, LPT$VPN.*, LPT$*.???, etc.
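
If you script the copy or update step, picking the "latest" data file by its 3-digit extension gets awkward once it rolls over, so a sketch like this (Python, with a made-up folder path) simply takes the newest LPT$VPN.* file by modification time:

    import glob, os

    SIG_DIR  = r"C:\SysClean"             # hypothetical folder holding engine + signatures
    patterns = glob.glob(os.path.join(SIG_DIR, "LPT$VPN.*"))
    if patterns:
        newest = max(patterns, key=os.path.getmtime)
        print("Using signature file:", newest)
    else:
        print("No LPT$VPN.* signature file found - download one before scanning")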

SysClean does not have to be installed before use, which makes it attractive as an intervention scanner.  There is no integrated updater, so you'd manually download the latest signature data before use.  As the engine is also subject to change, I'd recommend downloading a fresh engine along with new signature data.

SysClean.com is not a true .com file, i.e. it is not a DOS-era 16-bit memory image of code that runs with all segment registers set to the same 64k space.  Instead, it is a Win32 executable that unpacks itself and then chains into the unpacked code to run.

From all of the above, you can predict pitfalls when using SysClean from Bart CDR. 

General issues

The easy way to avoid these pitfalls is to copy the files to a HD location and run them from there, all within the same Bart boot session.

If you want to integrate via a SysClean plugin into Bart, you have to essentially automate this process, as well as avoiding a few other issues.

As SysClean writes to its own location, that location must be writable (i.e. can't run directly off the CDR) and must have enough free space to unpack (which may not be the case if running within a small RAM drive).

As SysClean.com chains into the SysClean.exe that it spawns, you must ensure your automation logic does not prematurely continue, i.e. after SysClean.com terminates but while SysClean.exe is still running.  The Start /W approach is likely to fail in this way.
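
To make that concrete, here's a sketch of launch-and-wait logic that doesn't fall into the Start /W trap: after launching SysClean.com, it polls the task list until the unpacked SysClean.exe has actually gone away.  It's written in Python purely to show the logic (a real Bart plugin would do this from its wizard batch file), the working folder is an assumption, and tasklist availability depends on what's in your build:

    import os, subprocess, time

    WORK_DIR = r"C:\SysClean"             # writable location the files were copied to

    def running(image_name):
        out = subprocess.run(["tasklist", "/FI", "IMAGENAME eq " + image_name],
                             capture_output=True, text=True).stdout
        return image_name.lower() in out.lower()

    subprocess.run([os.path.join(WORK_DIR, "SysClean.com")], cwd=WORK_DIR)
    time.sleep(10)                        # give SysClean.com time to spawn SysClean.exe
    while running("SysClean.exe"):
        time.sleep(5)                     # only continue once the real engine has exited
    print("SysClean.exe has exited - safe to launch the next scanner")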

SysClean launches sub-tasks, and that means it may fail in environments that impose limits on the number of tasks that can run at the same time.  WinPE, Bart PE and Windows XP Starter Edition may fall into this category.  If you have one "wizard" batch file that launches another batch file that launches a set of scanners in sequence, then scanners that start additional processes may hit the limit.

Some of the scans are launched as sub-tasks that run in "DOS-style" CLI windows.  If you used a tool within your launcher batch file to hide the batch file window, this may hide the CLI subtasks as well - creating the impression these aren't running from Bart.  I haven't dug into this (e.g. by removing my CLI window hider) and thus am not sure whether the CLI scans are actually done in Bart, or really are skipped.

Recent failure pattern

If you've beaten all of the above problems years ago, you may have hit the following failure pattern recently...

SysClean extracts itself and runs OK, presenting you with its GUI.  You then click the Scan button and it starts scanning memory, before scanning files.  But it never completes this process; even after the CD and hard drive burbling stops, it just sits there "scanning memory..." forever.

The system and app haven't crashed.  If you click to stop the scan, nothing happens, but if you click the [x] to close SysClean's window, that works.

Quick fix

If you run SysClean again, within the same Bart session, it works perfectly!

Why does it fail the first time?

A few months ago, SysClean changed its behaviour; at the start of the scanning process, between checking memory and scanning files, it now pops up an "OK" status dialog, to the effect that no viruses were found in memory.

When run in Bart, this dialog never appears - so you can't see it and you can't click it away.  And thus, SysClean will stall, waiting for an "OK" click that will never come.

Why does it work the second time?

When SysClean runs, it spawns a resident process called TSC.BIN that remains running after SysClean is done.  This is spawned before the failed "OK" prompt; I suspect it's spawned as early as possible, to run as "air cover" should any active malware code try to interfere with the scanning process.

The problematic prompt is only launched if TSC.BIN is not already running when SysClean starts its scan (perhaps TSC.BIN is itself the origin of the prompt, as part of its initialisation). 

So the first scan starts TSC.BIN and suffers the UI stall, whereas all subsequent scans during the same Bart session will already have TSC.BIN running and are OK.

I see that one SysClean plugin approach may side-step this issue by scooping the extracted files into the plugin, rather than having the plugin run SysClean.com to extract them at runtime.  This may avoid the problem if it is the extraction process that triggers the dialog - though that seems unlikely.