20 November 2009

Sysprep Fails, WinPE Sees Wrong Drive Letters

If you…

…then you may find…

  • Sysprep fails before it completes
  • WinPE “sees” partitions with incorrect drive letters

The impact can be severe: you may find you’ve built your .WIM from the wrong partition, or that Sysprep has ruined both the .WIM you harvest and the reference system you’d built, etc.  Attempts to maintain the Windows installation via WinRE, or to “just” re-install Windows, may fail too; I haven’t tested those scenarios.

The fix is to make sure the Windows boot partition is set as active in the partition table before you apply Sysprep or attempt access from WinPE, WinRE, an OS installation disk, etc.  You can do this after Windows and Ubuntu have been installed; it won’t affect either OS, or how grub works.

The cause is a combination of the way grub works (which bypasses the normal MBR logic of “boot the partition that is set as active”) and the way Microsoft code assigns drive letters to Windows-visible partitions and logical volumes.

Standard MBR logic

The Master Boot Record (MBR) is the first sector of the physical hard drive, and acts as an extension of the system BIOS.  It exists outside of any OS, running as it does before any particular OS has come into effect.

The standard MBR contains a partition table defining up to 4 partitions, one of which may be flagged as “active”.  The standard MBR code logic is to look for the active partition (only one entry is expected to carry the flag) and chain into the code in the first sector of that partition, its partition boot record (PBR).  At this point, the system phase of the boot process ends, and the OS phase begins.
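
To make the standard logic concrete, here is a minimal Python sketch that decodes the four partition entries of a raw 512-byte MBR and picks the entry flagged active (status byte 0x80), i.e. the partition a standard MBR would chain into.  The offsets follow the usual MBR layout (table at byte 0x1BE, 16 bytes per entry, 0x55AA signature at the end); the helper names, the example layout and the choice to reject tables with zero or several active entries are assumptions for illustration, not a copy of any real MBR code.

```python
import struct
from dataclasses import dataclass

TABLE_OFFSET = 0x1BE          # the four 16-byte partition entries start at byte 446
ENTRY_FORMAT = "<B3xB3xII"    # status, (CHS start), type, (CHS end), LBA start, sector count
SIGNATURE = b"\x55\xaa"       # bytes 510-511 of a valid MBR


@dataclass
class Entry:
    slot: int        # 1..4, position in the partition table
    status: int      # 0x80 = active, 0x00 = inactive
    ptype: int       # partition type byte, e.g. 0x07 NTFS, 0x83 Linux
    lba_start: int   # first sector of the partition, where its boot record (PBR) lives
    sectors: int     # partition length in sectors


def parse_table(mbr: bytes) -> list[Entry]:
    """Decode the primary partition entries of a 512-byte MBR, skipping unused slots."""
    if len(mbr) != 512 or mbr[510:512] != SIGNATURE:
        raise ValueError("not a valid MBR sector")
    entries = []
    for slot in range(4):
        off = TABLE_OFFSET + 16 * slot
        status, ptype, lba, count = struct.unpack(ENTRY_FORMAT, mbr[off:off + 16])
        if ptype != 0:  # type 0x00 marks an unused slot
            entries.append(Entry(slot + 1, status, ptype, lba, count))
    return entries


def boot_target(mbr: bytes) -> int:
    """Model of standard MBR logic: the LBA of the active partition's first sector."""
    active = [e for e in parse_table(mbr) if e.status == 0x80]
    if len(active) != 1:
        # Assumption: treat "no active entry" and "several active entries" as failures.
        raise ValueError(f"expected exactly one active entry, found {len(active)}")
    return active[0].lba_start


def build_mbr(fields: list[tuple[int, int, int, int]]) -> bytes:
    """Hypothetical helper: fabricate an MBR from (status, type, lba_start, sectors) tuples."""
    mbr = bytearray(512)
    for slot, entry in enumerate(fields):
        off = TABLE_OFFSET + 16 * slot
        mbr[off:off + 16] = struct.pack(ENTRY_FORMAT, *entry)
    mbr[510:512] = SIGNATURE
    return bytes(mbr)


# Hypothetical layout loosely modelled on the test PCs described later, Ubuntu flagged active.
mbr = build_mbr([
    (0x80, 0x83, 63, 62914560),         # Ubuntu (30G)
    (0x00, 0x82, 62914623, 8388608),    # Ubuntu swap (4G)
    (0x00, 0x07, 71303231, 134217728),  # Windows 7, NTFS (64G)
    (0x00, 0x0F, 205520959, 1000000),   # extended partition (size arbitrary)
])
print(boot_target(mbr))  # 63: a standard MBR would chain into the Ubuntu partition's PBR
```

Grub’s MBR code, by contrast, never consults the status byte at all, which is how the flag can drift out of step with what you actually boot.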

How grub works

The grub boot manager adds some initial code to the MBR which links to the bulk of its code within the Ubuntu partition.  At boot time, this modified MBR code will always chain into the rest of grub, irrespective of which partition entry in the partition table is set as “active”.  The partition table is still referenced to find partitions, but the “active” setting is now ignored, and is thus irrelevant.

You may assume that the partition you booted via grub will be set as “active” in the partition table, but this is not the case; grub (at least grub 2, as contained in Ubuntu 9.10) does not update the “active” flag to match what you booted last, even when it is set to default to the last-booted entry on the next boot.

How Microsoft assigns drive letters

Microsoft OSs can “see” two groups of partition types: primary partitions, which may be bootable and define a single volume, and an extended partition, which is not bootable but can contain multiple logical volumes.  Each volume contains a single file system and is typically assigned a single drive letter.

Drive letters have validity only within a Microsoft OS.  In the absence of “remembered” settings within that OS, they are assigned as follows…

A: and B: reserved for legacy diskette drives
For each physical hard drive…
  Assign ascending letters to each “active” primary partition
…next drive until all done
For each physical hard drive…
  Assign ascending letters to each logical volume in extended partition
…next drive until all done
For each physical hard drive…
  Assign ascending letters to each “inactive” primary partition
…next drive until all done

For example, if you have an NTFS primary partition and an extended partition containing three logical volumes, these will be lettered as C:, D:, E: and F: if the primary is set as “active”, and F:, C:, D: and E: if the primary is not set as “active” – and so…
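
The lettering rules above can be expressed as a short Python sketch.  It encodes only the simplified three-pass model described here (Windows-visible volumes only, letters starting at C:), not Microsoft’s actual code, and real Windows behaviour may include cases it doesn’t cover; the Volume type and the function names are hypothetical.

```python
from dataclasses import dataclass
from string import ascii_uppercase


@dataclass
class Volume:
    name: str
    disk: int             # physical hard drive number, 0 = first
    kind: str             # "primary" or "logical"
    active: bool = False  # only meaningful for primary partitions


def assign_letters(volumes: list[Volume]) -> dict[str, str]:
    """Letter Windows-visible volumes using the three passes of the model above."""
    letters = iter(ascii_uppercase[2:])  # A: and B: stay reserved for diskette drives
    passes = [
        lambda v: v.kind == "primary" and v.active,      # pass 1: active primaries
        lambda v: v.kind == "logical",                   # pass 2: logical volumes
        lambda v: v.kind == "primary" and not v.active,  # pass 3: inactive primaries
    ]
    disks = sorted({v.disk for v in volumes})
    result = {}
    for wanted in passes:
        for disk in disks:        # each physical hard drive in turn...
            for v in volumes:     # ...taking its volumes in table order
                if v.disk == disk and wanted(v):
                    result[v.name] = next(letters) + ":"
    return result


def example(primary_active: bool) -> dict[str, str]:
    """The example above: one NTFS primary plus an extended partition with three logicals."""
    return assign_letters([
        Volume("primary", disk=0, kind="primary", active=primary_active),
        Volume("logical 1", disk=0, kind="logical"),
        Volume("logical 2", disk=0, kind="logical"),
        Volume("logical 3", disk=0, kind="logical"),
    ])


print(example(True))   # primary -> C:, logicals -> D: E: F:
print(example(False))  # logicals -> C: D: E:, primary -> F:
```

With the Ubuntu partitions invisible to Windows, the test PCs’ disk reduces to exactly this example, so an inactive Windows primary lands on F:, which matches the WinPE result in the test procedure below.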

Here comes the pain

When Windows boots off the hard drive, it can override the above logic in two ways. 

Firstly, it is aware of which partition it booted from, and which volume contains the bulk of its own code; these drive letters are recorded within the OS and can’t be changed.

Secondly, it remembers drive letters assigned to volumes it has “seen” before.  Unlike the letters for boot and OS volumes, these can be changed by the user, causing new values to be “remembered” and applied on subsequent boots.

But when you don’t boot this OS code, e.g. you boot WinRE, WinPE or the OS installation disk instead, none of those remembered settings apply.  I suspect Sysprep applies fresh drive-letter logic during its processing as well, thus breaking its own assumptions and causing it to fail.

Further, you may not be aware that the “active” flag status is at variance with boot history, and therefore assume that because you last booted Windows, the Windows partition is the one currently set as “active”.  But that is not what happens when grub is in effect.

Best practices

I would suggest the following, to reduce these sorts of risks…

1.  Always do an image backup prior to Sysprep

Sysprep can be as destructive as “just” re-installing Windows, or shifting/resizing existing partitions.  In practice, I have had far more destructive failures with Sysprep than with XP repair installs, upgrades over older OS versions and partition management, all of which have been safer than Service Pack installs.  So if you would always back up before doing those sorts of things, then all the more reason to back up before Sysprep.

Unlike with Win9x and older Microsoft OSs, accurately copying every single file from one drive to another will not result in a bootable system, even if the drives and partitions are identical in size and you also copy over PBR contents that exist outside the file system.

That is why you have to do a partition image backup (e.g. from BING boot, using Drive Image from Bart boot, etc.) to preserve your “undo” trail.

2.  Check that the Windows primary is set as “active”

This should now be added to your sanity-checks before signing off on a system build, running Sysprep, harvesting .WIM images from WinPE, etc. 

If you have a WinRE installation set up to boot in the event of Windows boot failure, then it may be important for the correct partition to be set as “active” at all times.
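
Checking the flag doesn’t require any particular tool.  As one hedged example, this Python sketch reads the first sector of a disk and reports whether the Windows primary is the one and only entry flagged active.  It assumes the Windows primary can be recognised as the only primary of partition type 0x07 (NTFS), which holds for the test layout described later but not in general; the script name and /dev/sda are placeholders.  It only reads; setting the flag is still a job for BING or another partition manager.

```python
import sys

NTFS = 0x07  # assumption: the Windows primary is the only primary entry of type 0x07


def check_active(mbr: bytes) -> list[str]:
    """Return problems found; an empty list means the Windows primary is the sole active entry."""
    if len(mbr) != 512 or mbr[510:512] != b"\x55\xaa":
        return ["not a valid MBR sector"]
    entries = []  # (slot, status byte, type byte) for each used partition entry
    for slot in range(4):
        off = 0x1BE + 16 * slot
        if mbr[off + 4] != 0:  # type byte 0x00 marks an unused slot
            entries.append((slot + 1, mbr[off], mbr[off + 4]))
    windows = [e for e in entries if e[2] == NTFS]
    active = [e for e in entries if e[1] == 0x80]
    problems = []
    if len(windows) != 1:
        problems.append(f"expected one NTFS primary, found {len(windows)}")
    if len(active) != 1:
        problems.append(f"expected one active entry, found {len(active)}")
    elif windows and active[0] not in windows:
        problems.append(f"slot {active[0][0]} is active, not the Windows primary")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. as root from Ubuntu: python3 check_active.py /dev/sda  (device name is an assumption)
    with open(sys.argv[1], "rb") as disk:
        issues = check_active(disk.read(512))
    print("\n".join(issues) or "OK: Windows primary is the active partition")
    sys.exit(1 if issues else 0)
```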

3.  Apply descriptive names to disk volumes

I apply the names “C-Drive”, “D-Drive” etc. to partitions and volumes as I create them in BING, so that these are the names I will see when working in BING to manipulate them as partitions. 

BING writes these names into the boot record of the volume, whereas the name you apply in Windows is held as a Volume Label entry within the root directory of that volume.  So you can have “pretty” names in Windows, Bart CDR boot, etc. and accurate names in BING.

My own practice is to choose “pretty” names that happen to start with the expected drive letter, so I get a quick visual sanity-check before operating on them in Windows.  For example, if I see “Core” is C: but “Data”, “Extras” and “Factory” are E:, F: and G:, then I know something’s gone wrong and should be fixed before I generate new paths based on these wrong letters.  I’d know to look for an optical drive or other intruder that has become “D:”, and fix that.
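
The naming habit also turns into a mechanical check.  A minimal Python sketch, assuming you have already collected a letter-to-label mapping for the hard-drive volumes by some means (it is typed in by hand here); the function name is hypothetical.  Fed the example above, it flags the three mismatched labels and the gap at D:.

```python
from string import ascii_uppercase


def label_check(volumes: dict[str, str]) -> list[str]:
    """volumes maps drive letters ("C:") to labels; flag labels whose initial disagrees."""
    problems = [
        f"{letter} is labelled {label!r}"
        for letter, label in sorted(volumes.items())
        if not label or label[0].upper() != letter[0].upper()
    ]
    # A letter skipped between C: and the highest one supplied suggests an intruder.
    used = {letter[0].upper() for letter in volumes}
    highest = max(used, default="C")
    for letter in ascii_uppercase[2:ascii_uppercase.index(highest)]:
        if letter not in used:
            problems.append(f"{letter}: is missing; look for an optical drive or other intruder")
    return problems


print(label_check({"C:": "Core", "E:": "Data", "F:": "Extras", "G:": "Factory"}))
```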

How this was tested

I tested this on new PCs built with the following hardware:

  • Intel “GoldTree” G43 chipset motherboard, latest BIOS applied
  • E6300 processor, VT enabled in BIOS (is off by duhfault)
  • 2 x 2G = 4G DDR2-800 Kingston Value RAM
  • S-ATA Seagate 1.5T hard drive as S-ATA 0
  • S-ATA LG DVD writer as S-ATA 3 (last)

Partitions and OSs were:

  • 30G Ubuntu 9.10 partition (not visible to Windows)
  • 4G Ubuntu swap partition (not visible to Windows)
  • 64G primary partition, Windows 7 64-bit, as C:
  • Extended partition containing FAT32 logicals D:, E: and F:
  • MBR contains grub 2 as installed with Ubuntu 9.10

Two PCs were tested, one with Home Basic and one with Pro as the Windows 7 edition, both being DSP (small OEM) installations.  The grub menu was set to default to the OS that was booted last, and this was always Windows during these tests. 

BING was used to create and manage partitions (unlike Windows, it can format FAT32 volumes larger than 32G) and was not installed as the boot manager.

Test procedure:

  • BING boot, image backup Win7 primary to logical E:
  • Set Win7 primary as “active”
  • Boot hard drive; grub defaults to last selected (Windows), OK
  • Boot Windows; works, drive letters OK
  • Boot WinPE; what should be C: D: E: F: seen as C: D: E: F:, OK
  • Boot Windows, run Sysprep; works OK
  • BING boot; now…
  • Set Ubuntu primary as “active”
  • Boot hard drive; grub defaults to last selected (Windows), OK
  • Boot Windows; works, drive letters OK
  • Boot WinPE; what should be C: D: E: F: seen as F: C: D: E: - Fail
  • Boot Windows, run Sysprep; fails before post-processing boot
  • Windows is now not functioning, and remains so after reboot - Fail

In each case, Sysprep was run without answer file or CLI parameters; OOBE was selected, Generalize was checked, and Reboot selected as the post-processing action.

17 November 2009

XP Blank Desktop, No Task Manager, No UI

I hit this failure pattern while doing an XP SP3 repair install over an existing XP SP3 installation with IE 8 installed, and it may be a generic issue.  Searching the web did not find a solution, which is why I’m writing this.

The fix is to install IE 8 from Safe Mode.

Failure pattern

Windows XP boots to the desktop, showing wallpaper, but nothing else; no icons, Taskbar, Start button, etc.  Mouse pointer present and moves OK, and the “lock” keys toggle the appropriate keyboard LEDs, so the system is still running.  Safe Mode works OK.

Pressing Ctl+Alt+Del does not bring up Task Manager, pressing Alt+Tab or the Flag key does nothing, and pressing (but not holding) ATX power off does not initiate a shutdown.  Pressing the case Reset button forces a hard reset and holding down ATX power button forces ATX “power off”; both cause some file system damage due to bad exit with files open for writes.

Note how this failure pattern differs from some others that are more common; no icons but UI elements present (desktop properties setting to hide icons, icons unselected, etc.), other “shell” failures where Ctl+Alt+Del and ATX power press still work, and malware effects that specifically knock out Task Manager while leaving the desktop UI functioning.

Typical Scenario

It is often necessary to do a “repair install” of Windows XP if one has changed core hardware that breaks compatibility with XP’s pre-PnP code base.  This is what happened to me, when I had to replace a dead motherboard with a different one, even though this was based on the same chipset and the hard drive interface remained the same. 

In addition, folks often “just” re-install Windows for all sorts of problems, even when there are cleaner ways to fix the problem, and/or when doing so is likely to fail.  So I’m surprised there haven’t been solutions visible via Internet search for this issue – if indeed it is as generic as I suspect it may be.

A few weeks earlier I’d done a similar replacement on a similar PC, using a completely different motherboard chipset.  In that case, Windows booted just fine, did a bit of PnP device detection and driver installation, tossed its Activation cookies out of the cot, but was ultimately fine.  But in the second case, I had the half-expected STOP BSoD error (“Windows has been shut down to prevent damage to your computer”) early in the boot process.

What didn’t work

Writing to an at-risk system, especially installing new software, can often make things worse – so installing IE 8 was far from the first thing I tried.

The PC had already been through “the prelims”: RAM, motherboard capacitors, hard drive and file system were all checked OK, malware had been managed from a Bart boot, and the C: partition had been backed up as a partition image using BING.

First, I hunted down and removed all startup items and drivers for old hardware, working both from Safe Mode and Bart boot.  No joy.

Because Safe Mode worked OK, I suspected a shell integration factor, so I tried setting the shell to Cmd.exe and starting Windows normally.  This failed in the same way; no Cmd.exe window appeared, and ATX power, Ctl+Alt+Del etc. still didn’t work.  Disabling shell extensions using Nirsoft Extension Viewer didn’t work either.

Then I thought there might be a problem with launching the shell, so I edited a batch file run by an existing Task so that it launched Explorer.exe instead of doing what it had done before.  I’d noted this Task running behind the dead desktop, but from Safe Mode one cannot create new Tasks properly; some properties, such as “run only when logged on”, can’t be set.  That’s why I edited the batch file of an existing Task, rather than creating a new one.  No joy, again.

The fix

Re-installing Windows often causes problems when bundled subsystems (e.g. Windows Media Player, Internet Explorer) are forced back to older versions.  With that in mind, I tried re-installing IE 8 from Safe Mode, half expecting the usual “Windows Installer is not running” failure pattern.  Much to my surprise, not only did the installation of IE 8 from Safe Mode work, the next boot brought up a functioning shell in normal Windows mode.  Problem solved!