18 November 2020

SSDs and Windows 7

Geek summary: This post attempts to collect all you need to use a SATA SSD in Windows 7:

  1. Check CMOS Setup SATA is AHCI, not IDE or RAID (no TRIM)
  2. Check whether SSD is listed for Defrag
  3. Check/refresh WEI Storage score; should be > 5.9
  4. Check SuperFetch and PreFetch via Regedit; DWORD 0
  5. Check SuperFetch via Services; ?Disable
  6. Admin Cmd: fsutil behavior query disabledeletenotify = 0

Windows abstracts SSDs and disk drives together as "storage", although the two technologies are completely different, with different gamuts of strengths and weaknesses. Windows Vista, XP and older will treat SSDs as if they were hard drives, which will shorten their life due to increased write operations.

Windows 7 can safely use SSDs, but the process is less transparent than it is on Windows 8.x and Windows 10, so you have to dig deeper to ensure all is set and working as it should be.

AHCI vs. IDE and RAID

SSDs that connect via PCI-e or M.2 slots are beyond the scope of this post; I don't expect Windows 7 to "see" these interfaces at all, especially if the M.2 SSD is NVMe.  Motherboards new enough to include an M.2 slot are probably "too new" for Windows 7 anyway.

CMOS Setup can set SATA to operate as legacy IDE, AHCI, or RAID.  Windows 7 may only apply TRIM via AHCI mode, but switching between these modes may cause the next Windows 7 boot to fail on a BSoD, so ensure Windows is OK for AHCI before changing this setting in CMOS!

Note: Bart PE and Windows XP and older will need "F6 diskette" boot drivers added in order to boot in AHCI mode, else they will fail with the same BSoD.

While in CMOS Setup, you may also want to disable Hot Plugging (eSATA) for SATA ports connecting to internal drives, so these are not shown as "removable" in the Safe To Remove UI.  This is far easier than chasing registry settings etc. via methods that vary with Windows version, Microsoft vs. Intel drivers, etc. as it fixes the issue at its source.

Does Windows 7 see drive as SSD?

This is easy to determine; right-click a drive letter, Properties, Tools tab, Defrag, drill into Schedule, Drives, and see if the drive letters from the SSD are listed.  If they are, Windows 7 is seeing and treating these as hard drives; bad news!  If they are not listed to be selected, they are seen as SSD, OK.

Note that you'll still see SSD drive letters listed to be manually defragged in Windows 7 - but even if you leave Scheduled Defrag enabled, the SSD won't be defragged automatically, which is good.

This is where Windows 8.x/10 are so much better; "Defrag" is now "Optimize", each drive is listed as SSD or Hard Drive, and the Optimize operation will show as Defrag or TRIM.  But while Windows 7 may have the right clue behind the scenes, everything is still UI'd as if SSDs don't exist!

Does Windows 7 see the speed?

ReadyBoost, Prefetch and SuperFetch are storage behaviors that are enabled or disabled according to the detected speed of the storage device.  This is tested by the WinSat CLI tool, which is in turn called by the Windows Experience Index (WEI) UI. Subsystem scores are shown by WEI up to the 7.9 maximum value, but are stored in the registry as decimal integer values; e.g. the typical hard drive score shown as 5.9 is stored in the registry as 59 decimal.
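The score arithmetic above can be sketched in a few lines of Python.  This is only an illustration of the "59 means 5.9" convention described here; the function names are made up for this example, the 1.0 lower bound is my assumption about the Windows 7 scale, and the "faster than a hard drive" test simply reuses the > 5.9 rule of thumb from the summary list.

```python
def wei_display_score(stored: int) -> str:
    """Convert a stored integer score (e.g. 59) to the form WEI shows (e.g. '5.9')."""
    # Assumption: Windows 7 WEI scores run from 1.0 to 7.9, i.e. 10..79 when stored.
    if not 10 <= stored <= 79:
        raise ValueError(f"score {stored} is outside the assumed 10..79 range")
    return f"{stored // 10}.{stored % 10}"


def scored_faster_than_hard_drive(stored: int) -> bool:
    """Apply the rule of thumb from this post: an SSD should score above 5.9."""
    return stored > 59
```

So a stored 59 is displayed as 5.9 and fails the check, which is the cue to refresh WEI after fitting the SSD.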

WinSat and WEI are not updated automatically, so the old hard drive score may persist, causing a newly-added SSD to be treated as if it were a slower hard drive; Prefetch and Superfetch will be enabled to help speed it up, while ReadyBoost will be disabled as the drive will still be considered too slow to speed up anything else.

There are registry entries to manage Prefetch and Superfetch, but in my experience they don't "stick", so it's better to tackle the root of this issue by refreshing the WEI scores.  This post suggests why; the behaviors periodically check the storage speed score and act accordingly, and this may include overriding the registry settings that "control" these behaviors. 
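To see whether those registry settings have "stuck" or been reset, they can be read rather than set.  The sketch below assumes the usual location (the PrefetchParameters key with its EnablePrefetcher and EnableSuperfetch DWORDs) and the commonly documented meanings of the values 0 to 3; neither is spelled out in this post, so verify both on your own system.  It uses Python's standard winreg module and only reads, never writes.

```python
import sys

# Assumed registry location and value names; check these in Regedit first.
PREFETCH_KEY = (r"SYSTEM\CurrentControlSet\Control\Session Manager"
                r"\Memory Management\PrefetchParameters")
VALUE_NAMES = ("EnablePrefetcher", "EnableSuperfetch")

# Commonly documented meanings of the DWORD values (an assumption here).
MEANINGS = {
    0: "disabled",
    1: "application launch only",
    2: "boot only",
    3: "boot and application launch",
}


def describe(value):
    """Translate a DWORD value into a human-readable description."""
    return MEANINGS.get(value, "unknown")


def read_prefetch_settings():
    """Return {value name: int or None}. Windows only; does not modify anything."""
    import winreg  # imported here so the rest of the file loads on any OS
    settings = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PREFETCH_KEY) as key:
        for name in VALUE_NAMES:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                value = None  # value not present
            settings[name] = value
    return settings


if __name__ == "__main__" and sys.platform == "win32":
    for name, value in read_prefetch_settings().items():
        print(f"{name} = {value} ({describe(value)})")
```

Running this before and after a WEI refresh would show whether the values changed behind your back.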

However, manually re-testing the storage speed, either via the WEI UI or digging into WinSat, may fail.  Probable causes include boot-time storage filter drivers installed as part of resident antivirus; the failure can thus persist in Safe Mode, where such drivers remain integrated.  There is a Microsoft Hotfix for this, but as it's no longer available from Microsoft, you have to grope for it elsewhere.  After checking the digital signature and uploading to VirusTotal for safety, I installed the Hotfix, and WEI then worked without having to uninstall Avast on my client's PC (disabling Avast did not fix the problem).

I also tried editing the storage score via Regedit; that was ignored, and a good thing too - else "price hero" PC vendors could more easily fake these scores to hide poor performance!

I didn't delve into ReadyBoost, but found these interesting posts.

Is TRIM enabled?

The following details are from my fuzzy memory, and are not required for the scope of this "how to" reference; search and read up multiple sources for more accurate information.

At the SSD firmware level, there is no awareness of partitions, file systems, or whether things are deleted or not, so any block which has ever been written and/or has non-zero contents will be preserved by the firmware.  Flash memory blocks are large, and the firmware's wear-leveling logic may spread writes to unused blocks first, which are also faster to write (hence best out-of-the-box performance and benchmarks).  Eventually, all blocks will have been used (even if the drive has never contained much data), and write amplification may set in.

TRIM is an SSD firmware feature that optimizes write operations, by adding awareness of what blocks need not be preserved and/or can be erased and/or zero'd out.  Windows 7+ can send the firmware "hints" as to what blocks are to be considered deleted, so the firmware can better inform its internal garbage collection etc. - but it's a horse-and-water situation as to whether the firmware will apply these "hints" immediately, enqueue them for action when "idle", lose them from the queue when it fills up before "idle", or ignore them completely.

Windows 7 sends these hints to the SSD, if this "as admin" command...

fsutil behavior query disabledeletenotify

...returns the value 0.  It's far harder to determine whether the SSD firmware is actually doing TRIM at all, and I've not delved into such detail; hopefully we can assume recent SSDs should "just work" for TRIM?
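If you check several PCs, the same test can be scripted.  The Python sketch below runs the command above and looks for "DisableDeleteNotify = 0" in its output; the exact wording of that output is my assumption (newer Windows versions add extra text around it), and the function names are made up for this example.  The parsing is kept separate from the command itself so it can be tried on pasted output.

```python
import re
import subprocess


def parse_delete_notify(output: str):
    """Return the first DisableDeleteNotify value found in fsutil output, or None."""
    match = re.search(r"DisableDeleteNotify\s*=\s*(\d+)", output, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def trim_hints_enabled(output: str) -> bool:
    """True only if the output reports the value 0, i.e. delete notifications are on."""
    return parse_delete_notify(output) == 0


def query_fsutil() -> str:
    """Run the query; needs Windows and an elevated ("as admin") prompt."""
    result = subprocess.run(
        ["fsutil", "behavior", "query", "disabledeletenotify"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

From an elevated Python session, trim_hints_enabled(query_fsutil()) gives the yes/no answer; as with the manual check, this says only that Windows is sending the hints, not that the SSD firmware acts on them.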

Windows 7 has no UI to initiate a TRIM (strictly speaking, a "re-trim" operation to send hints to the SSD in the hope its firmware will TRIM).  Once again, Windows 8.x/10 are far better at this; the updated "Defrag" (now "Optimize") UI shows SSDs as such, and that it is TRIM rather than Defrag that happens when an SSD is Optimized.  These newer OSs also improve the chances that the SSD firmware will actually TRIM, by sending re-trim hints when idle, etc.

TRIM, NTFS and Defrag

In Windows 10, TRIM requires NTFS, and presumably this is the case behind Windows 7's opaque UI also.  Ironically, the design of NTFS means it needs Defrag, though less often, for different reasons, and with different logic to be effective.

The SSD itself doesn't need Defrag at all; not only is there no head travel to optimize, the actual location of data within flash memory addressing is most likely scrambled by the SSD firmware wear-leveling logic.  TRIM helps clean up the garbage, whereas Defrag just moves the garbage around.

But whereas FATxx file systems set aside fixed and duplicated File Allocation Tables to hold cluster chaining information, NTFS attempts to improve scalability by holding pointers to each extent (i.e. run of contiguous clusters) within each file or directory's metadata.

This is fine when the cluster chain is unfragmented; the overhead is the same as a FATxx directory entry (one pointer, to the start of the cluster chain), effectively dumping the FAT tables for free. But a fragmented file needs an additional metadata pointer for each fragment, and that can bloat the metadata into trouble.
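A toy model makes the "one pointer per fragment" point concrete.  The Python below is not the real NTFS on-disk format, just a sketch that collapses a file's ordered list of cluster numbers into (start, length) extents, so the number of extents stands in for the number of metadata pointers; the function name and the cluster numbers are invented for illustration.

```python
def extents(clusters):
    """Collapse an ordered list of cluster numbers into (start, length) runs."""
    runs = []
    for cluster in clusters:
        if runs and cluster == runs[-1][0] + runs[-1][1]:
            # Cluster directly follows the current run; extend that run.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            # Gap in the chain; a new extent (and a new pointer) is needed.
            runs.append((cluster, 1))
    return runs


contiguous = [100, 101, 102, 103, 104, 105]
fragmented = [100, 101, 500, 501, 502, 200]

print(len(extents(contiguous)), "extent for the contiguous file")
print(len(extents(fragmented)), "extents for the fragmented file")
```

Both files occupy six clusters, but the contiguous one needs a single extent while the fragmented one needs three; scale that up to a huge, badly fragmented file and the metadata grows accordingly.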

For this reason, Windows 10 will do some sort of defrag on SSD volumes where the Volume Shadow Copy service is active, once a month or so, to prevent NTFS from blowing itself up.  This wouldn't be necessary for FATxx, but Windows won't TRIM FATxx, so you can't avoid the issue.

Volume Shadow Copy (VSC) is the engine behind System Protection, System Restore, File History, and the ability to back up files that are "in use".  It creates and populates "\System Volume Information", where ChkDsk results are also stored.  You may be able to avoid VSC by disabling System Protection, either entirely or for particular volumes that are on SSD.

Windows 7 may predate this awareness of the need to defrag NTFS, and so there may be reason to do a manual Defrag of SSD volumes, though rarely; perhaps twice a year or so?  

However, the standard Defrag logic will not defrag files larger than 64M - exactly the files that are most likely to blow up NTFS, as they will have the largest number of metadata cluster chain pointers.  Concurrent write operations will do the most damage, and huge slowly-growing files (hello, Outlook .pst) are most likely to get into trouble.  VSC increases the risk of concurrent writes, which is probably why Windows 10 takes that as a cue to surreptitiously defrag SSD volumes, at the risk of horrified Internet posters screaming "never defrag an SSD!!"

14 November 2020

Intel 10nm GPU Driver Blanks BCD Boot Menu

This is the second case of a brand new laptop based on Intel's 10nm 10xxGx processors, in which the Intel Display Adapter driver causes the BCD {BootMgr} menu to be invisible (black-on-black), though still working.  Here's how to demonstrate the bug:

  1. Run BCDEdit from an "As Admin" Cmd or PowerShell
  2. Add an OSLoader entry to BCD {BootMgr}, so BCD boot menu will be invoked
  3. Set the Timeout to 15 or so
  4. Restart system from cold, i.e. not "Fast Startup", Resume from Sleep or Hibernate, etc.
  5. Rotating dots will vanish to blank black screen, where you should have seen the boot menu
  6. Wait for timeout or press Enter; system will boot as expected
  7. Device Manager, select Intel Display Adapter, Disable
  8. Repeat from (4), menu will now appear as it should, OK
  9. Device Manager, select Intel Display Adapter, Enable
  10. Repeat from (4), menu will fail to be visible again
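Steps 1 to 3 can be sketched as two BCDEdit commands, wrapped here in Python so they can be reviewed before anything is changed.  This is one possible way to do it, not necessarily how the test was originally set up: it assumes that copying the {current} entry is an acceptable way to add a second OSLoader entry, and the entry description is made up.  By default the commands are only printed (dry run).

```python
import subprocess


def boot_menu_commands(timeout_seconds=15, description="Boot menu test entry"):
    """Build BCDEdit commands that add a second loader entry and set the menu timeout."""
    return [
        # Copy the running OS entry so there are two entries and the menu is shown.
        ["bcdedit", "/copy", "{current}", "/d", description],
        # Give the (possibly invisible) menu time to be observed.
        ["bcdedit", "/timeout", str(timeout_seconds)],
    ]


def apply(commands, dry_run=True):
    """Print each command; run it only if dry_run is False (needs an admin prompt)."""
    for command in commands:
        print(subprocess.list2cmdline(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    apply(boot_menu_commands())  # dry run: prints the commands only
```

The copy command reports the identifier of the new entry, which is what you would pass to bcdedit /delete to remove the test entry once you are done.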

The current case is a Dell Inspiron 3593 based on i7-1065G7, whereas the first case was an Asus X509JA-i541GT based on i5-1035G1.  Both processors are 10nm but have different integrated GPUs.  Windows 10 versions 1909 (Dell original), 2004 (Asus updated) and 20H2 (Dell updated) are all equally affected.

Click the link words to drill down into detail, and here for a fuller description of the problem.

Don't Kick Away The Ladder

You climb a ladder onto the roof, then kick away the ladder. 
How do you safely get down to put the ladder back up so you can safely get down?

Such an obvious f-up, you wonder why I need to write this post.  Like a cartoon character who runs off a cliff, panics while briefly suspended in mid-air, then inevitably plunges to a cartoon death; surely, system designers aren't that stupid?

Examples abound, especially in the age of Class 3 UEFI that forces us to share these stupid risks. The "Extensibility" of UEFI allows code to be integrated before any OS can boot, and this code can persist into the OS runtime, so it's hard to see how this can be "more secure" when fundamentally unsafe.  Years of (U)EFI's buggy "growing up in public" make it clear such code is insufficiently trivial to be free of exploitable bugs.  A ladder should be trivial enough to never break; flaky firmware can break systems in ways that cannot be fixed!

I'm setting up a new Dell laptop, and in the firmware setup there is an on-by-default option to allow UEFI firmware to connect to the Internet and grope for "updates", before any OS is booted from which malware could be tackled, as if vendor supply-chain attacks had not already happened.  Specifically, if attempts to boot Windows fail successively "too many times", the firmware will try to launch Dell's repair, and if that in turn fails, will try to download the repair material via the Internet.

This Dell laptop also suffers from this bug, based as it is on the same 10nm Intel processor family.  The problem applies to both the original Windows 10 1909 installation, and after this was upgraded to 20H2 as per current Media Creation Tool.  The .iso created by this tool no longer fits a standard DVDR disc, so a bootable USB stick was created instead, and the file set copied from there. 

The laptop's nice NVMe SSD can't be seen from my rescue WinPE boot discs, nor from a freshly-downloaded Kaspersky Rescue Disk; none of these can see the drive via the PCI interface.  Once again, the ladder is kicked away; crucial boot-time code should "always just work", i.e. standard trivial code baked into the firmware, not requiring "special drivers" to work.

As it is, the nature of the UEFI display bug demonstrates how a setting at the top of the ladder (Windows 10 Device Manager, enabling or disabling the Intel display adapter) kicks away the bottom of the ladder (OS-level setting affects pre-OS UEFI, such that pre-OS BCD boot menu fails to display).

Do we really have to wait for more shoes to drop?