20 November 2009

Sysprep Fails, WinPE Sees Wrong Drive Letters

If you…

…then you may find…

  • Sysprep fails before it completes
  • WinPE “sees” partitions with incorrect drive letters

The impact can be severe: you may find you’ve built your .WIM from the wrong partition, or have Sysprep ruin both the .WIM you harvest and the reference system you’d built.  Attempts to maintain the Windows installation via WinRE, or to “just” re-install Windows, may fail too; I haven’t tested those scenarios.

The fix is to make sure the Windows boot partition is set as active in the partition table before you apply Sysprep or attempt access from WinPE, WinRE, OS installation disk, etc.  You can do this after Windows and Ubuntu have been installed; it won’t affect these, or how grub works.

The cause is a combination of the way grub works (which bypasses the normal MBR “boot the partition that is set as active” code logic) and the way Microsoft code assigns drive letters to Windows-visible partitions and logical volumes.

Standard MBR logic

The Master Boot Record (MBR) is the first sector of the physical hard drive, and acts as an extension of the system BIOS.  It exists outside of any OS, running as it does before any particular OS has come into effect.

The standard MBR contains a partition table defining up to 4 partitions, one of which may be flagged as “active”.  The standard MBR code logic is to look for the (first?) active partition and chain into code within the first sector of this space.  At this point, the system phase of the boot process ends, and the OS phase begins.
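As a rough illustration of where the “active” flag lives, here is a minimal Python sketch that reads the four partition table entries from a raw 512-byte MBR sector.  The layout used (table at offset 446, 16 bytes per entry, flag value 0x80, 0x55 0xAA signature at the end) is the standard MBR layout; the function names and the sample builder are hypothetical, and treating type 0x07 as “the Windows primary” is an assumption that suits the NTFS setup described in this post.

```python
PARTITION_TABLE_OFFSET = 446   # partition table starts here in the MBR sector
ENTRY_SIZE = 16                # each of the 4 entries is 16 bytes
ACTIVE_FLAG = 0x80             # first byte of an entry; 0x80 means "active"
TYPE_NTFS = 0x07               # assumption: the Windows primary is type 0x07


def list_partitions(mbr: bytes):
    """Return (slot, type_code, is_active) for each non-empty table entry."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    found = []
    for slot in range(4):
        start = PARTITION_TABLE_OFFSET + slot * ENTRY_SIZE
        entry = mbr[start:start + ENTRY_SIZE]
        type_code = entry[4]
        if type_code == 0x00:          # empty slot
            continue
        found.append((slot + 1, type_code, entry[0] == ACTIVE_FLAG))
    return found


def windows_primary_is_active(mbr: bytes) -> bool:
    """True if some type 0x07 primary carries the active flag."""
    return any(t == TYPE_NTFS and active for _, t, active in list_partitions(mbr))


def build_sample_mbr(active_slot: int) -> bytes:
    """Hypothetical sample: Linux, Linux swap, NTFS, extended, in that order."""
    types = [0x83, 0x82, 0x07, 0x0F]
    table = b""
    for slot, type_code in enumerate(types, start=1):
        flag = ACTIVE_FLAG if slot == active_slot else 0x00
        table += bytes([flag, 0, 0, 0, type_code]) + bytes(11)
    return bytes(PARTITION_TABLE_OFFSET) + table + b"\x55\xaa"


if __name__ == "__main__":
    for slot, type_code, active in list_partitions(build_sample_mbr(active_slot=1)):
        print(slot, hex(type_code), "active" if active else "")
```

On a real system the 512 bytes would have to be read from the raw disk device with suitable privileges; the sample builder only stands in for that here.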

How grub works

The grub boot manager adds some initial code to the MBR which links to the bulk of its code within the Ubuntu partition.  At boot time, this modified MBR code will always chain into the rest of grub, irrespective of which partition entry in the partition table is set as “active”.  The partition table is still referenced to find partitions, but the “active” setting is now ignored, and is thus irrelevant.

You may assume that the partition you booted via grub will be set as “active” in the partition table, but this is not the case; grub (at least grub 2, as contained in Ubuntu 9.10) does not update the “active” flag status according to what you booted last, even if set to default to this on next boot.

How Microsoft assigns drive letters

Microsoft OSs can “see” two groups of partition types; primary partitions that may be bootable and define a single volume, and an extended partition type that is not bootable but can contain multiple logical volumes.  Each volume contains a single file system and is typically assigned a single drive letter.

Drive letters have validity only within a Microsoft OS.  In the absence of “remembered” settings within that OS, they are assigned as follows…

A: and B: reserved for legacy diskette drives
For each physical hard drive…
  Assign ascending letters to each “active” primary partition
…next drive until all done
For each physical hard drive…
  Assign ascending letters to each logical volume in extended partition
…next drive until all done
For each physical hard drive…
  Assign ascending letters to each “inactive” primary partition
…next drive until all done
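
The three passes above can be sketched in Python as follows.  This is only a model of the ordering as I have described it, not Microsoft’s actual code; the Volume class, the function names and the sample layout are hypothetical, and the sketch ignores optical drives, removable media and any “remembered” letters.

```python
from dataclasses import dataclass
from string import ascii_uppercase


@dataclass
class Volume:
    name: str
    kind: str              # "primary" or "logical"
    active: bool = False   # only meaningful for primaries
    visible: bool = True   # False for partitions Windows cannot "see"


def assign_letters(drives):
    """Map volume name to drive letter, using the three passes described above."""
    letters = iter(ascii_uppercase[2:])   # A: and B: are reserved, start at C:
    passes = [
        lambda v: v.kind == "primary" and v.active,      # pass 1
        lambda v: v.kind == "logical",                   # pass 2
        lambda v: v.kind == "primary" and not v.active,  # pass 3
    ]
    assigned = {}
    for wanted in passes:
        for drive in drives:              # "for each physical hard drive"
            for vol in drive:
                if vol.visible and wanted(vol):
                    assigned[vol.name] = next(letters) + ":"
    return assigned


def sample_layout(windows_active: bool):
    """Hypothetical single-drive layout: two Linux primaries, one NTFS primary, three logicals."""
    return [[
        Volume("Ubuntu", "primary", active=not windows_active, visible=False),
        Volume("Swap", "primary", visible=False),
        Volume("Windows", "primary", active=windows_active),
        Volume("Logical1", "logical"),
        Volume("Logical2", "logical"),
        Volume("Logical3", "logical"),
    ]]


if __name__ == "__main__":
    print(assign_letters(sample_layout(windows_active=True)))
    print(assign_letters(sample_layout(windows_active=False)))
```

The only input that changes between the two calls is which primary carries the “active” flag, yet every visible volume ends up with a different letter.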

For example, if you have an NTFS primary partition and an extended partition containing three logical volumes, these will be lettered as C:, D:, E: and F: if the primary is set as “active”, and F:, C:, D: and E: if the primary is not set as “active” – and so…

Here comes the pain

When Windows boots off the hard drive, it can override the above logic in two ways. 

Firstly, it is aware of which partition it booted from, and which volume contains the bulk of its own code; these drive letters are recorded within the OS and can’t be changed.

Secondly, it remembers drive letters assigned to volumes it has “seen” before.  Unlike the letters for boot and OS volumes, these can be changed by the user, causing new values to be “remembered” and applied on subsequent boots.

But when you don’t boot this OS code, e.g. you boot WinRE, WinPE or the OS installation disk instead, then all those remembered settings do not apply.  I suspect Sysprep applies fresh logic during its processing as well, thus breaking its assumption base and causing it to fail.

Further, you may not be aware that the “active” flag status is at variance with boot history, and may therefore assume that because you last booted Windows, the Windows partition will be the one currently set as “active”.  But that is not what happens when grub is in effect.

Best practices

I would suggest the following, to reduce these sorts of risks…

1.  Always do an image backup prior to Sysprep

Sysprep can be as destructive as “just” re-installing Windows, or shifting/resizing existing partitions.  In practice, I have seen far more destructive failures with Sysprep than with repair installs of XP, upgrades over old OS versions and partition management, all of which have been safer than Service Pack installs.  So if you would always back up before doing those sorts of things, then there is all the more reason to back up before Sysprep.

Unlike Win9x and older Microsoft OSs, accurately copying every single file from one drive to another will not result in a bootable system, even if the drives and partitions are identical in size and you also copy over PBR contents that exist outside the file system. 

That is why you have to do a partition image backup (e.g. from BING boot, using Drive Image from Bart boot, etc.) to preserve your “undo” trail.

2.  Check that the Windows primary is set as “active”

This should now be added to your sanity-checks before signing off on a system build, running Sysprep, harvesting .WIM images from WinPE, etc. 

If you have a WinRE installation set up to boot in the event of Windows boot failure, then it may be important for the correct partition to be set as “active” at all times.

3.  Apply descriptive names to disk volumes

I apply the names “C-Drive”, “D-Drive” etc. to partitions and volumes as I create them in BING, so that these are the names I will see when working in BING to manipulate them as partitions. 

BING writes these names into the boot record of the volume, whereas the name you apply in Windows is held as a Volume Label entry within the root directory of that volume.  So you can have “pretty” names in Windows, Bart CDR boot, etc. and accurate names in BING.

My own practice is to choose “pretty” names that happen to start with the expected drive letter, so I get a quick visual sanity-check before operating on them in Windows.  For example, if I see “Core” is C: but “Data”, “Extras” and “Factory” are E:, F: and G:, then I know something’s gone wrong and should be fixed before I generate new paths based on these wrong letters.  I’d know to look for an optical drive or other intruder that has become “D:”, and fix that.
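
That visual sanity-check is simple enough to sketch in Python.  The function name and the way the letter-to-label mapping is obtained are hypothetical; the sketch only assumes you have some way of listing each drive letter with its volume label.

```python
def label_mismatches(volumes):
    """Return the letter/label pairs whose label does not start with the drive letter.

    volumes maps a drive letter such as "C:" to its volume label.
    """
    return {
        letter: label
        for letter, label in volumes.items()
        if not label.upper().startswith(letter[0].upper())
    }


if __name__ == "__main__":
    seen = {"C:": "Core", "E:": "Data", "F:": "Extras", "G:": "Factory"}
    for letter, label in label_mismatches(seen).items():
        print(f"{label!r} is showing as {letter}, check for an intruder before using paths")
```

An empty result means every label lines up with its letter; anything else is a cue to stop and look before generating new paths.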

How this was tested

I tested this on new PCs built with the following hardware:

  • Intel “GoldTree” G43 chipset motherboard, latest BIOS applied
  • E6300 processor, VT enabled in BIOS (is off by duhfault)
  • 2 x 2G = 4G DDR2-800 Kingston Value RAM
  • S-ATA Seagate 1.5T hard drive as S-ATA 0
  • S-ATA LG DVD writer as S-ATA 3 (last)

Partitions and OSs were:

  • 30G Ubuntu 9.10 partition (not visible to Windows)
  • 4G Ubuntu swap partition (not visible to Windows)
  • 64G primary partition, Windows 7 64-bit, as C:
  • Extended partition containing FAT32 logicals D:, E: and F:
  • MBR contains grub 2 as installed with Ubuntu 9.10

Two PCs were tested, one with Home Basic and one with Pro as the Windows 7 edition, both being DSP (small OEM) installations.  The grub menu was set to default to the OS that was booted last, and this was always Windows during these tests. 

BING was used to create and manage partitions (unlike Windows, it can format FAT32 volumes larger than 32G) and was not installed as boot manager.

Test procedure:

  • BING boot, image backup Win7 primary to logical E:
  • Set Win7 primary as “active”
  • Boot hard drive; grub defaults to last selected (Windows), OK
  • Boot Windows; works, drive letters OK
  • Boot WinPE; what should be C: D: E: F: seen as C: D: E: F:, OK
  • Boot Windows, run Sysprep; works OK
  • BING boot; now…
  • Set Ubuntu primary as “active”
  • Boot hard drive; grub defaults to last selected (Windows), OK
  • Boot Windows; works, drive letters OK
  • Boot WinPE; what should be C: D: E: F: seen as F: C: D: E: - Fail
  • Boot Windows, run Sysprep; fails before post-processing boot
  • Windows is now not functioning, and remains so after reboot - Fail

In each case, Sysprep was run without answer file or CLI parameters; OOBE was selected, Generalize was checked, and Reboot selected as the post-processing action.

17 November 2009

XP Blank Desktop, No Task Manager, No UI

I hit this failure pattern in the context of doing an XP SP3 repair install over XP SP3 with IE 8 installed, and it may be that this is a generic issue.  Searching the web did not find a solution, which is why I’m writing this. 

The fix is to install IE 8 from Safe Mode.

Failure pattern

Windows XP boots to the desktop, showing wallpaper, but nothing else; no icons, Taskbar, Start button, etc.  Mouse pointer present and moves OK, and the “lock” keys toggle the appropriate keyboard LEDs, so the system is still running.  Safe Mode works OK.

Pressing Ctl+Alt+Del does not bring up Task Manager, pressing Alt+Tab or the Flag key does nothing, and pressing (but not holding) ATX power off does not initiate a shutdown.  Pressing the case Reset button forces a hard reset and holding down ATX power button forces ATX “power off”; both cause some file system damage due to bad exit with files open for writes.

Note how this failure pattern differs from some others that are more common; no icons but UI elements present (desktop properties setting to hide icons, icons unselected, etc.), other “shell” failures where Ctl+Alt+Del and ATX power press still work, and malware effects that specifically knock out Task Manager while leaving the desktop UI functioning.

Typical Scenario

It is often necessary to do a “repair install” of Windows XP if one has changed core hardware that breaks compatibility with XP’s pre-PnP code base.  This is what happened to me, when I had to replace a dead motherboard with a different one, even though this was based on the same chipset and the hard drive interface remained the same. 

In addition, folks often “just” re-install Windows for all sorts of problems, even when there are cleaner ways to fix the problem, and/or when doing so is likely to fail.  So I’m surprised there haven’t been solutions visible via Internet search for this issue – if indeed it is as generic as I suspect it may be.

A few weeks earlier I’d done a similar replacement on a similar PC, using a completely different motherboard chipset.  In that case, Windows booted just fine, did a bit of PnP device detection and driver installation, tossed its Activation cookies out of the cot, but was ultimately fine.  But in the second case, I had the half-expected STOP BSoD error (“Windows has been shut down to prevent damage to your computer”) earlier during the boot process.

What didn’t work

Writing to an at-risk system, especially installing new software, can often make things worse – so installing IE 8 was far from the first thing I tried.

The PC had already done “the prelim”; RAM, motherboard capacitors, hard drive, file system were all OK and malware had been managed from a Bart boot, and the C: partition had been backed up as a partition image using BING.

First, I hunted down and removed all startup items and drivers for old hardware, working both from Safe Mode and Bart boot.  No joy.

Because Safe Mode worked OK, I suspected a shell integration factor, so I tried setting the shell to Cmd.exe and starting Windows normally.  This failed in the same way; no Cmd.exe window appeared, and ATX power, Ctl+Alt+Del etc. still didn’t work.  Disabling shell extensions using Nirsoft Extension Viewer didn’t work either.

Then I thought there may be a problem with launching the shell, so I edited a batch file run via an existing Task so that it launched Explorer.exe instead of doing what it had done before.  I’d noted this Task running behind the dead desktop, but from Safe Mode one cannot create new Tasks properly – some properties can’t be set, such as “run only when logged on”.  That’s why I edited the batch file of an existing Task, rather than creating a new one.  No joy, again.

The fix

Re-installing Windows often causes problems when bundled subsystems (e.g. Windows Media Player, Internet Explorer) are forced back to older versions.  With that in mind, I tried re-installing IE 8 from Safe Mode, half expecting the usual “Windows Installer is not running” failure pattern.  Much to my surprise, not only did the installation of IE 8 from Safe Mode work, the next boot brought up a functioning shell in normal Windows mode.  Problem solved!

26 May 2009

Vista UI Annoyances

When a user interface has different behaviours, and you can’t predict which one will arise, it can drive you nuts.  Sometimes this is due to cues it takes from material you haven’t seen yet, and sometimes there’s something you need to do slightly differently to select one or other behaviour – but the difference in what you do is too subtle to learn.

File operation, multi-selection, or re-ordering?

This was always a pain in XP’s Start Menu; you try to drag an item to re-order it, and it doesn’t go with the mouse because for some reason the OS didn’t know that’s what you wanted to do.  So I’d stamp the mouse button on the thing, hold still with button down for a while, then drag smartly while staying on the menu – trying to make my gestures and timing as clear as possible.  No joy; sometimes it works, sometimes not.  Then I tried making the first move sideways vs. up or down, making the start of the move gradual vs. sudden, and I still could not get consistent results.

In Vista, the problem sprawls over to all folder views, making the problem that much more annoying as it now pervades the whole shell.  Even if you deliberately choose List view in an attempt to avoid useless icon positioning info clogging up the registry, Vista still seems to remember item positioning, as imposed via dragging within the pane.

The effect is the reverse of the Start Menu context, because usually in the shell I’m trying to select a large number of items by “lasso’ing” them, or move one or a selected wad of items from one pane to another.  Sometimes I get what I want; sometimes Vista thinks I’m trying to lasso-select items when I’m trying to drag what I’ve stamped the mouse on, and other times it does the re-ordering thing, which is never what I want.

In XP, I didn’t have that confusion between lasso-selection and dragging items.  As long as I started by lasso-select from an “empty” point in the folder, I’d know I’d get lasso behaviour, and not drag-and-drop behaviour.

But there’s something different in the way Vista selects things, and that’s a problem of its own that we’ll come to later.  Perhaps that difference affects this UI behaviour as well?

Letter case for drive volume names

In the days of Windows 95, to avoid the overhead of LFN directory entries for valid 8.3 names, you’d have to stick to ALLCAPS.  The NT family may use other more economical cues for ALLCAPS, Sentence.Case and allsmalls names, which is one reason to be less tense about all this… so today, I usually use the letter case that I want to see, rather than try to reduce system overhead of LFNs.

This works fine for files and folders, but gets wobbly when it comes to the names used by hard drive volumes.  The problem is common to both XP and Vista – it appears to be impossible to force your choice of letter case; sometimes you get ALLCAPS, other times Sentence Case.

Now volume labels are tricky things, down at the file system where they are stored.  Each volume actually has two separate name locations; one is embedded within the volume’s boot record, and the other is held in the root directory as a “legacy” 8.3 entry.  There’s a twist to the way that 8.3 entry is interpreted; all 11 characters are seen as one name (not as 8 character name plus 3 character extension), and lower case and space characters are allowed.  This behaviour goes back as far as pre-Windows MS-DOS.

When you set the volume name via the shell, only the root directory entry is affected, and this is what is displayed if it exists.  If it does not exist, the name embedded within the boot record is shown; if that is blank, you will see “local disk” instead.  If you want to operate on the embedded boot record name, you can do that from BING after booting it from CDR, cancelling the install prompt, and using it in partition maintenance mode.
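
To make the two name locations concrete, here is a minimal Python sketch of the lookup order just described, for a FAT volume.  It relies on the standard FAT directory entry layout (32-byte entries, 11-byte name field, attribute byte at offset 11, volume label attribute 0x08); the function names are hypothetical, and the fallback order is simply what I have observed rather than documented shell behaviour.

```python
ATTR_VOLUME_ID = 0x08   # directory entry attribute marking the volume label
ATTR_LONG_NAME = 0x0F   # LFN entries also have the 0x08 bit set, so exclude them


def root_dir_label(root_dir: bytes):
    """Return the label held in the root directory, or None if there is none."""
    for offset in range(0, len(root_dir) - 31, 32):
        entry = root_dir[offset:offset + 32]
        if entry[0] == 0x00:        # end-of-directory marker
            break
        if entry[0] == 0xE5:        # deleted entry
            continue
        attr = entry[11]
        if attr != ATTR_LONG_NAME and attr & ATTR_VOLUME_ID:
            # All 11 bytes are one name, not 8 + 3; case and spaces are kept
            return entry[:11].decode("ascii", errors="replace").rstrip()
    return None


def displayed_name(root_dir: bytes, boot_record_label: bytes) -> str:
    """Root directory label first, then the boot record name, then the default."""
    label = root_dir_label(root_dir)
    if label:
        return label
    embedded = boot_record_label.decode("ascii", errors="replace").strip()
    return embedded or "local disk"


if __name__ == "__main__":
    entry = b"Core Data  " + bytes([ATTR_VOLUME_ID]) + bytes(20)
    print(displayed_name(entry, b"C-Drive    "))
    print(displayed_name(bytes(32), b"C-Drive    "))
```

The sketch only covers which name is shown; it does not explain the letter case oddities discussed next.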

On the face of it, preserving letter case should be even easier than for normal files and directories, because the legacy behaviour does this even without LFNs.  The shell appears to restrict itself to the original 8.3 entry, as it accepts only 11 characters as input. 

But whether I use F2, right-click Rename or right-click Properties etc., I cannot impose my choice of letter case, whether I “break the rules” with spaces or not.  Often I have some volumes displaying as ALLCAPS and some in Sentence case, after using the same UI methods to name all of these in the same way… very strange.

Content-sensitive folder views

I’m not the only one who hates this with a passion!  When you view the items in a folder, Vista gropes those items to “smell” what type of things they are, then selects the appropriate view.  AutoPlay does a similar thing when it populates the pop-up list of things you can do.

If these behaviours were restricted to clearly-defined contexts, such as defined shell folders and true audio CDs, I wouldn’t mind.  The problem is the cues that Vista is using to determine the context, are far too variable and flaky – one image file doesn’t mean this is a collection of photos, and one .MP3 or .WAV doesn’t make it a music collection either.  Several apps will include a few image or audio files in the same directory, yet if anything these should be handled as “mixed content”. 

Vista’s guessing is as absurd as vintage Windows 95’s auto-resolution of shortcuts that point to missing targets (remember “can’t find WINWORD.EXE, should I point to SMARTDRV.EXE instead, and do so forevermore if you click OK?”).  Navigate into a new Start menu folder containing the Skype icon, and it will always be shown as thumbnails view; other Start menu folders containing other icons typically look “normal”.  Bizarre.

There are safety aspects to this as well – when I view a folder, it may be because I know there’s malware in there and I intend to delete files without “opening” them.  Having the shell code automatically groping all this material is exactly what I DON’T want – and that applies especially to the “autoplaying” of arbitrary CDRs and external storage devices.

Dead icons for living shortcuts

This also drives me nuts in Vista, and may be related to the way that Windows Installer smurfs pointers to files and icons through CLSIDs.  Specifically, shortcuts created by Windows Installer’s processing of .MSI files, will not point to the actual executable, but to a spare copy of this that is held within %WinDir%\Installer – and yes, this junk can’t be relocated off C:.

Well, all of that’s just Windows Installer; irrespective of whether that’s on XP or Vista, it’s equally flaky and tedious, e.g. prone to spontaneously demanding install disks for stuff you thought you’d already installed, and weren’t even running at the time.

What’s particular to Vista, is how icons within the Start Menu – both (All) Programs and the “recently used” and pinned lists – often flip to the generic “file not found” icon, and stay that way even if you can right-click the shortcut and re-assert the icon.  Given that CLSID-based post-.MSI shortcuts preclude user UI editing of target filespecs etc., maybe this isn’t related to Windows Installer after all – then again, I’ve enough other reasons to wish Installer and .MSI to go away forever.

Is it selected or not?

Vista feels “different” to XP when one is selecting items, as well as nodding the current item through these.  For example, for item1 … item5, if you’re holding down the Control key and using the arrow keys to nod the current item along, after using Space to select item2 and item4, then the appearance of these items can be confusing.  The selection colours are usually quite pale, and the difference between “selected”, “unselected”, “current and selected” and “current but not selected” is extremely subtle. 

In contrast, XP uses different and mutually exclusive UI techniques for selected vs. unselected (different background color) and current item (outline rectangle).  Vista’s fancy 3D pastels may be pretty to look at, and thus nice for the first few minutes, but they’re hard to work through, and thus a pain for ever.

In fact, it really amuses me how we’ve “progressed” as far as monitors and GUIs are concerned.  First, we used curved reflective tube monitors that picked up reflections from lights and windows, and all we wanted was a matt non-reflective screen, or better yet, a screen with a flat surface that didn’t show these highlights.  Now that we have flat LCDs that don’t reflect the room back at us while we’re trying to work, we add fake highlights all over everything.  Then we’re told we need higher-performance (and power-hogging) hardware so we can see these added imperfections – very strange.

5 March 2009

Automatic Update May Force-Feed You

Here’s a fun thing to try, as re-verified tonight in XP SP3: Set Automatic Updates to “Download updates for me, but let me choose when to install them”, then when the yellow shield shows new updates are ready to install, go “Advanced”, UNcheck all of them, and ignore the prompt.

Now do the same thing with the yellow shield.  See how the updates are checked again?  UNcheck them again, as you did before.

Now go to the Start Menu, Turn Off Computer.  Notice how the dialog box is set to install updates, with the non-icon link text to shutdown without installing them?

In tonight’s case, there were two updates; one, a self-serving “Genuine Advantage” for MS Office, and the other, something to do with updating Windows Live Sign-In Assistant.

I’ve debated this topic in a security newsgroup, who are gung-ho to have us consumers swallow updates immediately, even if they advise against immediately rolling updates across their own corporate network “production machines” before these are tested.  Well, as a consumer, I have a network of one crucial “production machine”, I don’t have pro-grade in-house testing capabilities, and yet I don’t want my system adversely impacted either.

As it stands, you can read “Download updates for me, but let me choose when to install them” to mean “let me choose whether to install them right now, or have them forced into the system on next shutdown” or (as I did), “let me choose whether or not to install them at all”.  I want my downloaded updates stored somewhere in a redirectable location (hint: Not C:) so that I can initiate installation when I choose.  Yes, pre-check them in the yellow shield dialog box, but if I assert my desire to NOT install them by UNchecking them, then DO NOT install them.