09 September 2025

KB5063878: Too many NTFS Extents?

Still thinking about KB5063878 (when you get to remember a KB number, it's usually a bad one) and several things in the code stack may apply; Device Encryption, Ring -2, bigLITTLE cores and threads, file system resource depletion, motherboard and device firmware, processor microcode, motherboard chipset, and those elusive "Hardware Error" items that turn up in Reliability.

Disable Sandbox?

This recent article holds a clue, if you scroll about a third of the way down, and I paste:

“I myself was able to recreate the same initial error I got while copying the 151G file. Not only that, but the epic fail originated a WHEA hardware error in the event viewer related to the PCIe controller, which eventually forced me to restart. I then disabled sandbox, uninstalled the update, and the file copied just fine without a hitch… no errors, no freezes, no hangs.”

“I have a Crucial T710 2T, and I also suffered a glitch. Not as serious, but nevertheless, a glitch. I tried transferring a 151G file; it failed, and it lingered in my SSD as a ‘ghost’ file. I could not delete it, access it, or anything. After 3 attempts, I was able to delete it via Safe Boot Minimal,” another tester told Windows Latest.

After that, the article blandly states...

We don’t know how some people have a botched-up SSD after the recent Windows updates, but it appears to affect a very small number of users, and unless Microsoft finds something in telemetry data, we’ll never know what really happened.

No, it's not OK to shrug off data and storage loss as JOOTT (Just One Of Those Things), even if affecting "a very small number of users".  

Good to know that disabling Sandbox may be a workaround, and a lot cleaner than "just" trying to uninstall the face-hugging KB; disabling Sandbox may in any case be required first, before the uninstall will get beyond an error and failure.

Dead runtimes don't talk

Forget telemetry, it can't tell you anything about the most significant failures that kill the runtime, if not the entire system.  A bullet through the brain means you can't even log "Something went wrong"; don't get distracted by the tyranny of the measurable!

Reliability: Hardware Error

Part of the support ritual is to check Reliability, a useful feature prototyped in Vista and maturing somewhat thereafter, as a manageable tap into the fire-hose of Event Spewer.

There, I often see "Hardware error" in systems that are otherwise fine, with nothing amiss on DISM and SFC do-it-for-me code fixers.  There are no further details for these entries, and so far my limited efforts to link them to Event Viewer items have not shed any light.  Perhaps they are related to GPU glitches, or something else too deep in the hardware, such as... PCIe.

Ring -2

I can't find links for this, or recall the name of the subsystem involved, but I remember what I read; that a deep processor ring -2 mode is how "BIOS" presents USB keyboard and mouse to software as if they were PS/2, and by implication, possibly legacy hardware emulation in general.

In this mode, regular CPU execution (including kernel Ring 0) is paused while the Ring -2 code does its thing.  Any bugs here are likely to hard-hang the system, but could delay return to the point that time-sensitive code may time out or fail.  This code is so deep under the kernel carpet, perhaps Windows can only report "Hardware error"?

Under-the-rug stuff like this, or "remote admin" opportunities, are a great place for malware to dabble.

Current favorite: NTFS Extents

We know not to defrag SSDs, as that "just moves the junk around" and hammers the flash memory cells' limited write life; it's better to ask the SSD firmware to Trim, and hope it will eventually do so.  

Keeping track of where the file system thinks things are, and where the SSD firmware chooses to (eventually) write them, is the black art of the SSD firmware, and likely a big reason why SSDs cost more than the bare flash memory sold as camera cards and USB flash drives.  There's very likely to be resource limitations and opportunities for things to go wrong in this space, which may be why Phison found themselves in the cross-hairs after KB5063878 brought our new crisis du jour.

Upstairs in the NTFS, there's a known resource depletion risk; cluster chaining info "Extents".  Whereas FATxx dedicates slabs of pre-booked space for cluster chaining info (i.e. which storage block is next after reading the current one), NTFS stores the start of each run of contiguous clusters, and presumably how long the chain will be before the next extent is to continue the chain.

This avoids the scalability impact of FATxx File Allocation Tables, at the risk of adding the "lie to me" meta-bugs of "thin provisioning", e.g. where assumed compression, sparse files etc. fail to actually fit within available space.  There's also a lot of hop, skip and jump when MFT and other files have to be extended to arbitrary fragments in the storage map, inviting further resource depletions and errors elsewhere, boosting write amplification, and widening critical periods while our digital Superman is poised mid-leap between skyscrapers.
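To make the FATxx-versus-extents difference concrete, here's a minimal Python sketch.  It is a toy model only, not the real FAT or NTFS on-disk formats; the cluster numbers and the fat_chain_to_extents() helper are made up for illustration.  It walks a FAT-style "what comes next" table and collapses it into (start, length) runs, which is roughly what an extent list records.

```python
# Toy model only: NOT the real FAT or NTFS on-disk layout.

def fat_chain_to_extents(fat, first_cluster, end_marker=-1):
    """Walk a FAT-style next-cluster table and collapse it into (start, length) runs."""
    extents = []
    cluster = first_cluster
    run_start, run_len = cluster, 1
    while fat[cluster] != end_marker:
        nxt = fat[cluster]
        if nxt == cluster + 1:
            # Next cluster is contiguous, so the current run just grows.
            run_len += 1
        else:
            # A jump elsewhere on the volume closes this run and starts a new one.
            extents.append((run_start, run_len))
            run_start, run_len = nxt, 1
        cluster = nxt
    extents.append((run_start, run_len))
    return extents


# Hypothetical 9-cluster file stored in three fragments.
fat = {10: 11, 11: 12, 12: 40, 40: 41, 41: 42, 42: 43, 43: 7, 7: 8, 8: -1}
extents = fat_chain_to_extents(fat, 10)

print(extents)
print(len(fat), "chain entries vs", len(extents), "extents")
```

The FAT-style table needs one entry per cluster no matter what, while the extent list needs one entry per fragment.  That's a win for contiguous files, but the extent count grows with fragmentation, which is where the depletion risk comes in.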

The mystery is then not why things go wrong during massive file ops on a busy NTFS that has never been defragged, but why this is only rearing its head after KB5063878?  

  • What code has KB5063878 changed?  
  • Is it specific to the Sandbox subsystem?  
  • Is it affected by Intel's bigLITTLE mix of "real" and "eco" cores, and how Windows assigns threads to these? 
  • Does it happen less if stealth Device Encryption is not imposed?  
  • Does it still happen when offline, excluding incoming pokes?  
  • Does it still happen if all Power Management is disabled and flattened, including Modern Connected Standby and network "magic packet" wakes?  
  • Is it related to any particular hardware or firmware, aside from SSD controllers?  
  • Does it happen to low-spec SSDs, eMMCs and hard drives, or over SATA or USB?

These may be the next set of questions to test, now that we may have repro(ducibility) at last.


04 September 2025

KB5063878 Bug: Gotcha!!

I think I've figured out the data-killing KB5063878 bug; it's when UAC tangles with UI-less activities, as described in this report.  Throws back to Vista's birth pains, when attempting to add back lost immutability to the many-to-many relationship between things that happen, to what should not be allowed  :-)

The report speaks of unexpected UAC prompts that now pop up due to changes added by KB5063878.  

If that collides with last month's changes to UI-less "pre-Windows" code, e.g. before BCD is processed to menu OSLoaders (or blunder into {default}), or when WinRE boots instead, or when a "mini-Windows" is applying deep code changes before Windows loads, then you'll never see an error message, let alone UAC prompt to which the user can respond.  

This is vendor-knows-best territory, locking out users and administrators alike.

Now if those changes were attempting to change partitioning, e.g. to shrink C: for space to be assigned to a new WinRE Recovery partition, then things could get messy - especially in a multi-threaded environment, and/or failure of the "black box" of steps to be properly atomic.  

For example, imagine if one part of the process is allowed to message updated partition info to other threads, but another part of the process is blocked from actually applying those changes, then the runtime hopefully will screw up and crash out of functioning before it writes raw data to the wrong storage addresses, trashing file systems and/or partitions.  If less lucky, such writes may trash these structures, so the storage is latched into a corrupted state.

So, is an apparent automatic recovery on reboot, actually safe?  Well, if NTFS C: is foreshortened, everything may still appear valid and work.  If the runtime crashes out, then the "dirty bit" should remain set, prompting the next boot's AutoChk to "kill, bury, deny" the file system's partition-end mismatch, "fixing" it to something at least valid for future file system operations. 

The user may or may not see an AutoChk prompt to press a key to skip checking drive C:, but with fast-enough SSDs and the trend to hide details from users ("don'tcha wurry your pretty littul haid, Sue-Ellen, ever'thing's gonna be fine-just-fine"), perhaps that will be hidden, too.

This assumes Fast Startup doesn't just resume the doomed runtime and botch everything not already botched thus far, but that's likely to have been disabled for the next boot by some sort of "update in progress, boot properly, run this OSLoader instead" or similar logic.

Anyway, I think that is where I'd dig next, if trying to fix this mess, rather than just claiming "not my dog" after testing to clear a path to legally disclaim responsibility.

Why does bulk testing miss this?

Previous post explains that; bug may only arise when the full set of real-world conditions apply.  Simply trashing an SSD controller with bulk writes from a KB5063878-updated Win11 won't do it, and neither may a virginal Win11 24H2 update to KB5063878 do it, if bulk writes don't happen at a time to overlap other factors that may not be present, etc.

Why only with bulk operations?

There may be race conditions involving various levels of OS and component cache management that arise only in the context of cache saturation and flush periods extending beyond safe time-outs or blind wait periods.  There may also be interplay with Delayed Start and Scheduled Tasks, especially when certain OEMs trigger underfootware to run every few minutes.

Vendor disclosure attempted

I've alerted Microsoft via Feedback Hub, and my ex-MVP colleagues via private email list, in case they don't experience the same lightbulb effect I did, when reading the "feed seed" article linked above.


02 September 2025

KB5063878 Storage Corruption: External Factors?

Following up on reports and post-test denials of August Cumulative + SSU KB5063878 corrupting and possibly destroying storage when under a 50G+ bulk operation load, some likely scenarios come to mind, that may be missed during artificial accelerated testing sessions.

The initial focus was on SSDs based on Phison controllers, prompting Phison to test and exclude their controllers as a cause of the problem, while recommending heat sinks to protect SSDs against load-related failures.  Subsequent reports suggest other SSDs, and even hard drives, can also be affected.

Accelerated testing?

Phison claims 4,500 cumulative testing hours across the drives reported as potentially impacted and conducted over 2,200 test cycles, which would be 187 days of testing if done on a single device.  Testing 1,000 devices in parallel would reduce testing clock time to 4.5 hours per device, each iterating to a bit over 2 test cycles per device.  You can shift the numbers around, e.g. 100 devices etc. limited by the number of clock days since the problems were first reported - but it's unlikely testing would have been real-world, i.e. based on individually-installed Windows 11 with a wide range of co-installed software, etc.

From Phison's perspective, all those software variables are irrelevant; as long as the hardware itself can be shown to work, it's some other vendors' problem if it's a software thing.  In any case, attention shifts off Phison, and hardware specifics, once reports of other storage devices are considered.  This spectrum of affected devices also suggests this isn't limited to overheating hi-performance SSDs.
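The arithmetic behind the Phison figures quoted above is easy to shift around for yourself.  This small Python sketch uses only the two reported totals (4,500 hours, 2,200 cycles); the device counts are my own what-if assumptions, not anything Phison has stated.

```python
# Reported totals; the device counts below are hypothetical what-ifs.
total_hours = 4500
total_cycles = 2200

# Clock time if everything ran back-to-back on one device.
days_single_device = total_hours / 24
print(f"{days_single_device:.1f} days on a single device")

# Share the same totals across N devices tested in parallel.
per_device = {}
for devices in (10, 100, 1000):
    hours_each = total_hours / devices
    cycles_each = total_cycles / devices
    per_device[devices] = (hours_each, cycles_each)
    print(f"{devices:>5} devices: {hours_each:.1f} h each, {cycles_each:.1f} cycles each")
```

However the totals are split, there's a trade-off; more devices means more hardware variety but less clock time on each, and it's clock time that catches the rare stuff.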

The MemTest86 experience

A familiar type of artificial accelerated testing is MemTest86, in search of "bad RAM", but also as proof of hardware ability to not crash, power off, reset or lock up over a "long enough" clock-time period.  I've done this for decades of PC builds, troubleshooting, laptop pre-acceptance testing, etc. and have settled on 24 hours as the shortest 99%-certain test period.  

I've seen one case where the first error showed up at around 25 hours, and one where the first error showed up in an over-weekend 100+hour unattended test run.  In both cases, the first error was the only error, and neither system latched into a persistent error state thereafter.

Shorter test periods, e.g. 18 hours, would be more convenient, e.g. allowing in and out turnaround within the same time of day, but I saw too many first-errors within the 18 to 24 hour period.  Clearly, this makes the typical default 4-pass loops completing in an hour or two, unfit to be trusted as exclusionary.

Even so, "burn-in" testing with MemTest86 is not real-world, as it exercises a very limited subset of what the tested hardware has to do.  It doesn't test GPU or DMA access to RAM, localized heat related to different kinds of CPU activity, and obviously anything to do with storage or other components.

Cache and Race Conditions

Microsoft's test methods are reportedly thorough, but not detailed, and are likely also to involve accelerated automated test methods that may be as narrow in their way, as is MemTest86's testing of processor and RAM.  

Variables may include how soon after Windows 11 boot the tests are started, bearing in mind how "underfootware" can be triggered at arbitrary times - consider Delayed Start, that seeks to pretend Windows boots faster than it actually completes all inits and startups; application pre-loading, that seeks to pretend applications aren't slow, but your Windows and hardware may be; stuff triggered via Scheduled Tasks; and hidden ServiceWorkers that may be triggered remotely.

Bulk operations will saturate caches, possibly revealing lower raw direct transfer speeds.  These caches will be full at the "end" of file operations, needing time to spool out to where the data has already pretended to have been written... can you guess the problems that may happen next?

Modern PCs are more like networks of DOS-sized systems.  A modern CPU has enough cache RAM to run a Windows 9x installation, while firmware logic within black-box devices such as hard and solid-state "disk" storage is at least a 20th-century DOS or BIOS.

Computational scale

There's a certain size of code that we may expect to be bug-free, at least if kept fully encapsulated; I'd guess somewhere between a DOS in a 1M memory map, and a Win9x in 4M or so.  

Beyond that, things rapidly bog down such that attempts to complete a project will fail and have to be abandoned (WinAmp 3, Netscape, the original Microsoft Edge, even Windows 11 24H2's attempts to become acceptably reliable before 25H2 is due), or it will become a bunch of separate boxes of code linked together, as is the case with modern PC hardware subsystems and "web apps", or a thin layer of new top-soil over a decades-old mass of existing code, e.g. just about every OS other than Windows that is based on ancient *NIX, or how Windows 95 had to re-use solid 16-bit Assembly code to dance within 4M of RAM.

Whether it's a team of human workers, or a map of code black-boxes, new challenges and inefficiencies arise in how these interact.  Add race conditions that arise when critical periods shift in phase, and exclusionary testing becomes a really hard problem that may defy automation.

So yes; Phison may prove their controllers are OK, and Microsoft may conclude KB5063878 is OK, but neither may satisfy our need to be sure the KB will be safe on our particular systems, for reasons that both vendors can blow off as "not our problem".  And what happens with this KB, may happen again with others, so we need a systematic fix for future scenarios.

Power (mis-)management

One very likely scenario involves power management, when power to a subsystem is cut before that subsystem has actually done the work that it claimed to have finished.
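Here's a toy Python model of that failure, assuming a device that acknowledges each write as soon as it lands in volatile cache.  The WriteBackDevice class and its method names are invented for illustration, and don't represent any real drive firmware or Windows API.

```python
class WriteBackDevice:
    """Toy storage device that says "done" before the data is durable."""

    def __init__(self):
        self.cache = {}  # volatile: contents vanish on power cut
        self.media = {}  # durable: survives power cut

    def write(self, block, data):
        # Acknowledge straight away; the data is only in cache so far.
        self.cache[block] = data
        return "ok"

    def flush(self, max_blocks=None):
        # Spool cached blocks to media; max_blocks models a flush that
        # runs out of time before the cache is empty.
        for block in sorted(self.cache)[:max_blocks]:
            self.media[block] = self.cache.pop(block)

    def power_cut(self):
        # Whatever is still in cache is gone; report how many blocks.
        lost = len(self.cache)
        self.cache.clear()
        return lost


dev = WriteBackDevice()
acks = [dev.write(block, f"data{block}") for block in range(8)]

dev.flush(max_blocks=5)  # power management cuts in before the flush completes
lost = dev.power_cut()

print("acknowledged:", len(acks), "durable:", len(dev.media), "lost:", lost)
```

The caller was told "ok" for every write, so nothing upstream has any reason to retry or complain about the blocks that never made it.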

Consider an external USB hard drive and "Safe To Remove".  To be aware that an external device is connected, it helps to see the relevant icon in the SysTray (sorry, "notification area"), but it's hidden under the More... by default.  To click on it in order to initiate a "Safe To Remove" data flush to storage, you have to see the icon, which is instead hidden on the assumption you only need to see it when it has something to "notify" you.  So far, so... not good.

Let's say you do remember you have an external plugged in and you do click the icon, then await the feedback that the device is safe to remove, or should not be removed because it is still in use.  If you have Focus Assist enabled, you will never see that feedback, because that Notification is not considered sufficiently important, even though it's a part of your Focus that is to be Assisted.  So, how are you supposed to know you can safely unplug the storage device?

We're told this "doesn't matter", but I've seen enough corrupted external storage to know that it does.  The damage may be hidden by the "kill, bury, deny" logic of NTFS transaction rollback, AutoChk and ChkDsk, but you're still losing data that you expected to have been written to storage.

Finally, listen to how external USB hard drives often burble on after the "Safe To Remove" notification pops up.  Has the drive firmware really flushed its cache to platters, or did it lie when it told the parent subsystem that it had finished all pending writes?  Which drives do you think will look faster when tested by hardware reviewers, and feel faster to users (at least while all appears to be well)?

The same glitches that can corrupt external drives, can lose data if a component is expected to be idle, having completed all pending tasks, thus safe to be powered off.

Fast Startup and partition changes

So far, we've considered loss of pending write operations when caches take too long to flush, and/or when subsystems are prematurely disconnected and/or powered off - but there's another aspect to KB5063878 that could trash file systems and partitioning, if factors excluded from automated and/or accelerated testing were to pop up in real-world scenarios as occasional race conditions.

By duhfault, Windows 11 fakes "shutdown" as part of the Fast Startup "feature".  Specifically, instead of doing a true shutdown (which has its own risks when wait time is shorter than time needed to complete tasks), Fast Startup hibernates the system state after all users are logged out.  

The next startup then appears to be faster, because the previous runtime state is Resumed, persisting any runtime glitches, resource depletions, etc.  More to the point, all sorts of sanity-checks and initializations are bypassed, on the assumption that the way things were at (fake) "shutdown", are still holding true when the runtime session is resumed.

So, if an external USB drive was disconnected while the system was "shut down", anything that was still to be saved within the hibernated runtime, will be lost.  And if any changes to partitioning were made in between the (fake) "shutdown" and "startup", then the continued runtime will be unaware, and will write raw storage data blocks to where the partitions and file systems... used to be.

It was this that alerted me to the dangers of "Fast Startup", after booting USB partitioning tools to resize and shift partitions while the system was "shut down".  The next Windows boot then promptly destroyed C: and other partitions, by overwriting the raw areas of storage where these partitions and their file systems were defined.  Fast Startup is Not Your Friend.
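The stale-layout hazard can be sketched as a toy Python model.  Everything in it is hypothetical (a 100-block "disk", made-up partition sizes, and the make_disk() and write_file() helpers); it is not how Windows actually tracks partitions, only the shape of the problem.

```python
DISK_SIZE = 100  # blocks on our pretend disk


def make_disk(layout):
    """Label every block with the partition that owns it under `layout`."""
    disk = ["free"] * DISK_SIZE
    for name, (start, length) in layout.items():
        for i in range(start, start + length):
            disk[i] = name
    return disk


def write_file(disk, cached_layout, partition, offset, nblocks):
    """Write via the in-memory layout; return which partitions really got hit."""
    start, _ = cached_layout[partition]
    hit = set()
    for i in range(start + offset, start + offset + nblocks):
        hit.add(disk[i])  # who owned this block on the real disk?
        disk[i] = f"{partition}-data"
    return hit


# Layout at (fake) "shutdown"; the hibernated runtime keeps believing this.
cached = {"C": (0, 60), "D": (60, 40)}

# An offline tool shrinks C: and moves D: down while the system is "off".
layout_on_disk = {"C": (0, 40), "D": (40, 60)}
disk = make_disk(layout_on_disk)

# After resume, a write near the old end of C: uses the stale layout.
damaged = write_file(disk, cached, "C", offset=35, nblocks=10)
print(sorted(damaged))
```

The resumed runtime thinks it wrote ten blocks inside C:, but half of them landed at the start of what is now D:, which is just where a file system keeps the structures it can least afford to lose.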

The failure pattern I saw when Fast Startup missed out-of-runtime partition changes, is similar to that reported for KB5063878; drives "stop responding" and/or vanish, once write operations can no longer carry on using stale and invalid in-memory assumptions on partition and file system raw locations, which may only happen once those raw storage blocks are lost from cache and have to be reloaded, only to find the structures trashed by preceding mis-directed writes.

The next boot may or may not silently "fix" things, either via AutoChk file system "repair", or by WinRE's startup recovery, or even the new "call home and auto-fix" facility that may or may not yet be in play.  These "fixes" may cover up the damage and data loss, but is that enough for you?

KB5063878 and WinRE

KB5063878 does more than change code within Windows; it also changes WinRE.  The previous monthly Cumulative also changed pre-OS code, such as the mini-Windows that hosts the installer for Windows, and/or that which processes the BCD before the decision to boot Windows is taken, as well as the Servicing Stack.  These deeper changes make it harder to uninstall the KB, as the code that manages the uninstallation is itself subject to changes imposed by the KB being uninstalled!

When setting up a new laptop already running 24H2, I noted an 850M Recovery partition as expected.  From within Windows, I shrank C: to 150G, creating a new D: partition to fill the remaining space on the 500G SSD up to the 850M Recovery and vendor-specific 260M MyAsus partitions at the "end".

Windows Update installed only one Cumulative update, being KB5063878, along with the usual Defender and Dot Net updates.  This may or may not have included changes added by the July 2025 Cumulative, thus creating a "YMMV" for those who had already installed the July Cumulative separately, to become the baseline to which uninstalling KB5063878 would return.

So right there, we have a divergence that would probably be missed by automated testing by Microsoft and Phison, as they seek to disclaim responsibility for reported problems.

After these updates, the space previously occupied by the 850M Recovery partition was left empty, while C: was now smaller, with space between C: and D: being allocated to a new 950M Recovery partition.  It's unclear as to when these partitioning changes were applied, and there may be opportunities for these changes to be mis-merged with bulk file operations, lost to un-flushed caches in prematurely disconnected subsystems and/or hardware devices, tangled up with Modern Connected Standby and/or fake "Shutdown" of Fast Startup, etc.  

"Many a slip between cup and the lip", as they say.  These are the specific scenarios I'd set out to test, if I had the resources to do so - KB5063878 and/or Phison may not be "to blame" when considered in isolation, but nothing exists in isolation in today's sprawling, over-connected infosphere.

What next?

Microsoft is still pushing KB5063878 while still investigating reports of significant data loss.

So, as we can't trust vendors to block dangerous updates from the server side, we need ways to block specific updates before they install, especially when these are too entangled to be uninstalled once injected into the system.

And yes, we can expect scenarios where a malicious FUD campaign may socially-engineer users into delaying updates, to hold the door open to exploit code defects the updates would have fixed.

As it is, this risk is greater when we have to advise users to Pause all updates altogether, as the only way to avoid a specific update reported to be toxic.


29 August 2025

Microsoft: Stop Pushing KB5063878 "Death Patch"

Please, Microsoft; place a "hold" on KB5063878 August Cumulative Update for Windows 11, until it can be trusted not to destroy storage devices, installations and data.

We're encouraged to trust vendors to know best, including blocking updates known to cause trouble and/or suffer from compatibility issues.  Microsoft knows that KB5063878 corrupts storage (although this is not documented here), including destroying storage hardware; yet even after testing indicated the issue affects not just a few particular SSDs but hard drives as well, it still pushed this update yesterday (28 August 2025) to a brand new laptop.

Yes, the issue may "only affect a few systems" and only when doing bulk file transfers of 50G+, but that's exactly what a new system will do straight after mandatory updates; bulk transfer data onto the new system from the one it is to replace.  This scenario is even more likely at a time Microsoft is telling us to replace perfectly capable Windows 10 systems so we can "be supported" on Windows 11 - even as that "support" involves pushing known-lethal updates "to keep us more secure".

As it is, KB5063878 is a nasty face-hugger beast, including as it does a Servicing Stack Update as well as changes to code outside Windows itself; since the previous month's Cumulative changed WinRE, thus the automatic recovery system for failed boot, and likely WinPE, perhaps pre-BCD and UEFI, who knows?  Now that "BIOS" is "extensible", toxic OS drivers can permeate that space via UEFI drivers, as already afflicting the BCD Boot Menu.

So, it's not as simple as uninstalling the update, and/or blocking it by hiding it from future Windows Update activity.  Uninstalling the update may fail with errors, requiring a more elaborate approach via DISM, possibly disabling WinRE and Sandbox first, etc.  Advice then suggests Pausing updates in the hope that Microsoft fixes what is quite a deep change to the code base, in the hope that this happens before the maximum allowed Pause time expires, that rushed fixes don't create new issues, and that exploits don't start hitting whatever KB5063878 may have fixed while we wait.

If we cannot trust Microsoft to place a "hold" on updates that can destroy data, installations and hardware - surely the biggest impact possible - then we need a way to block particular updates before they get rammed into the system.  We should not have to first accept the update before uninstalling and blocking it, nor should we have to Pause updates altogether, just to avoid a crisis du jour.

25 August 2025

Windows Update: Control Bandwidth

Here's a performance tip affecting online traffic in particular; control the bandwidth allowed for manual and background Windows Update activity.  Use this to speed up manual updates when you aren't doing anything else, and slow down background activity when you are busy!

The settings for this are buried deep in Settings, Update and Security.  Go down to "Advanced options", then scroll down below what may appear to end with "Pause updates", to "Delivery Optimization" and click into that link.  Then "Advanced options" on the next page that appears, to finally reach the playground.

I choose the "Percentage..." radio button, then check the boxes for "...background" and "...foreground", to accept the defaults, or set to taste.  The "...foreground" setting should apply when you manually visit Settings, Updates and click "Check for updates", then "Download and install".

UI safety tip

It's so easy to beef about mandatory Windows updates, that you can miss what control is offered, and it's easy to miss things hidden below a "scroll" you didn't know was there.  

You can fix that UI risk via Settings, Ease of Access, scroll down (of course), "Automatically hide scrollbars in Windows", and turn that Off.  UIs should tell the truth, the whole truth, and nothing but the truth; hence the advice to "always go Custom, check every Advanced" etc. when dealing with software, especially in contexts where vendor goals may mis-align with yours.

Vive la difference!

I was working on two laptops, neither of which is supported by Windows 11.  Both had Intel i5 processors, before AMD shamed Intel into offering more cores in the 8xxx generation, when up-spending on laptop "i5" and "i7" got you what would be called "i3" in a desktop PC - but one was a Generation Zero xxx with a SATA SSD, and the other was a generation 7xxx with a hard drive.

Guess which was slower, and why?  Yep, the 7th generation system went 100% storage and bogged down so badly, WhatsApp Web would give up and release the link, windows would grey out as Not Responding, and it was hard to get a click in edgeways.  The hogs were Windows background rubbish; not just the visible Windows Update but "App" updates and Feed, even though the News pimple on the Taskbar was long disabled.  It felt like something one just had to accept... at the time.

Later I was manually catching up a fresh Windows 10 to 11 upgrade, which had downloaded and installed overnight.  Next morning I clicked to Restart, and the new Windows installed fairly quickly and cleanly; then off to Windows Update, Check for updates... and that took hours to download 10%.

I finally got a clue, and checked the settings described above - and yes, way back when originally setting up Windows 10, I'd set both bandwidth % sliders to minimum, when still allowing peer-to-peer update traffic over LAN, but not Internet.  Once I changed the "...foreground" slider to max, the remaining 90% of the download, and the install, was done within minutes!

So if you're wondering why a routine Cumulative is taking as long as an OS download and install, go dig into that "advanced..." etc. UI, and ye may find what ye seek  :-)


22 August 2025

Windows 10 EOL is not about Windows 11, it's about OneDrive "Backup"

The end of Windows 10 support is not about Windows 11; it's about stampeding everyone on to OneDrive cloud storage - either as a pure money-maker, and/or to extend geopolitical reach.

Microsoft has to patch Windows 10 anyway

Consider: If Windows 10 will have code repair updates developed against exploitability for those who choose to buy extended support for three years, then that work has to be done anyway.  Extending update delivery to systems is the same cost, whether they are on Windows 10 or 11.  A wider pool of Windows 10 users may mean more niche testing, but also means more involuntary testers, making it easier and quicker to find out what needs fixing next.  

In the worst-case scenario, a massive exploitable base of unpatched Windows 10 systems could pose risk to everything connected to the Internet, which may compel Microsoft to "support" (fix) all those systems.

Folks aren't joyously flooding to Windows 11, throughout years of ongoing development, right up to these final days before Windows 10 (and Windows 11 23H2 and older) go out of support.  Many systems are disqualified due to TPM 2.0 and other requirements, but others are simply user refusal, as well as the inertia of large managed networks.  

In response, Microsoft offers Extended Security Updates (ESU) programs to both professional network administrators and consumers.  The "pro network" crowd have to pay, but a new "free" option has been added for consumers that looks too good to be true... and is.

Baked-in ransomware

If you allow the Windows Out Of Box Experience (OOBE) to lead you by the nose ("a little WiFi here, a little sign-in there"), then this is what happens:
  • you sign in to an online Microsoft Account
  • your internal storage is encrypted without your knowledge or consent
  • the encryption key is available only from your online Microsoft Account
  • all appears to work as normal, so you don't get the key you didn't know you'd need

Now if that smells like a ransomware attack, that's because it is exactly that.  Microsoft doesn't extort money upfront, payment doesn't involve complicated crypto currencies, and it's not about payment anyway; it's about the option to deny access, either for breach of the vendor's private law (the EUL"A" that no-one reads) or at the behest of US policy, such that sanctions can apply to data as well as money.

Data survivability is now brittle, as various local situations can trigger a demand for the key:

  • you need to boot into Safe Mode
  • you need to access your storage from a different system
  • something glitches the TPM, e.g. a firmware ("BIOS") update

There may be server-side issues too, e.g. your online account is hacked, or deleted by the vendor, or you follow advice to discard the account to use a Local account instead, or your account hasn't been signed in for "too long", or the vendor or US policy applies a "data sanction" on you.

Theft-to-cloud as "backup"

Backup is actually a hard problem; the aim is to keep all wanted data changes, while excluding all unwanted changes - a mix of sheep and goats, needles in the haystack.  Strategies vary, but always involve multiple copies of data such that if one is afflicted, the other is available to restore.

Sync is the opposite of backup; if anything bad happens on any one system, that unwanted change is immediately propagated to all systems.  The server beyond your reach is now the dog, and all "your" devices are now its chew-toys.  Whatever entity signs into that online account is deemed to be "you", and shares some control with the vendor; if you're locked out, sorry for you!

Once you get past the OOBE (if not before), you're pestered to "backup" to OneDrive.  If you swallow the bait, the content of many shell folders is automagically copied to the OneDrive cloud storage service, while what you see locally as your files may appear to work, but in reality may have been replaced with stubs to online files, "to save space".  Just like automatic Device Encryption, this payload is hidden, with delayed effects that arise when you try to work offline, and "your" files aren't found within the large cache footprint that the cloud service uses as an ashtray.

So, now your data is exfiltrated, local copies destroyed, and you're even more vendor-dependent.  When you run out of free space at the server end, you'll have to pay up for more space, or buy some other service that bundles the extra space you need.  

This is a straightforward hook-and-reel-in sales scam, similar to a time-bombed "free trial" that lasts just long enough to create data you can't use unless you pay (especially Outlook's .pst walled garden).  That's a significant incentive to the vendor, leaving aside any geopolitical implications.

You can use Windows 11 safely

As at August 2025, there are ways to skirt these risks while still upgrading to Windows 11 for more effective ongoing support.  There are ways to break into the OOBE to trigger a restart that will add small links for "I don't have Internet" and "Continue with limited setup"; there are ways to craft a bootable USB installer that bypasses various compatibility checks and onerous UI pressures.  As long as you can install Windows 11 and run the OOBE while safely not connected to the Internet, you have the potential to be safe; staying that way requires ongoing resistance to embedded sales pitches.

There are also ways to block Device Encryption by policy, as applies via a .reg import or direct Regedit; to hide OneDrive, or at least stop it reducing your files to online pointer stubs, and so on.  If Device Encryption is already in effect, you can step over the scary warning to turn that off; maybe it will take a long time, as the warning states, or maybe not - the whole process is so opaque (or transparent, in that one sees through it even if trying to see it is harder) that I've no idea if it completes as quickly as it seems, or if it grumbles along unseen for hours or days.

Upgrade carefully, if system is compatible

In my opinion, it's better to carefully upgrade Windows 10 to 11, after suitable backups, as long as your system is compatible, than stay on Windows 10.  It's also a good time to upgrade the OS hard drive to a speedier SSD, as that way the original hard drive can be your "undo" fallback backup.

If your system is incompatible, you may be able to force the upgrade via Rufus or similar tools, and/or more manual methods - but I'd be reluctant to do that.  "Hard" incompatibilities include PopCnt instruction and SSE 4.2 support; without these, Windows 11 24H2 (the minimum version supported after October 2025) will likely BSoD on boot.

Some of the softer requirements may be attained by changing partitioning from MBR to GPT, changing boot mode from CSM BIOS emulation to UEFI, enabling Secure Boot, and enabling TPM, either as such, or drilling down into CMOS Setup to where the processor vendor implements this as a fTPM.

So far, so good - but if you can't pass the PC Health Check or the Windows 11 Installation Assistant won't install, then you'd have to resort to an "unsupported" state via bypass methods, e.g. Rufus.  I don't see a great future there; the PC may be fine, until some update or annual new version starts to invoke things that are not there, which could leave the system unable to run, or even boot up.

"If a bad guy can run code on your system"...

Microsoft's 2000 Ten Immutable Laws of Security still make sense to me, even if the battle to keep our computers "Personal" has long been lost.  The list has been weaseled to pass off "the cloud" (= other people's computers) as safe enough, but the original is here.  The first Law:

"If a bad guy can persuade you to run his program on your computer, it's not your computer anymore"

I'd show you the rest of the laws, virtually all of which are broken by the unwanted intimacy of current vandor (vandal/vendor) practice, but the WayBack archive page first bordered on the unusable (refreshing banner ad moving the page, refusal to copy selected text to clipboard or print the page), then after coerced "donation", lost where I'd come from and failed to load the page when the URL was re-pasted.  Enshittification is certainly not unique to a few big vendors; buggy code is everywhere, and it can be hard to distinguish stupidity from perfidy!  Bah, humbug, etc.

Should you trust your vendor?

At the top of the Trust Stack is the intent of the party to be trusted; at the bottom is the competence to do what they intend to do.  There have been updates that either trashed data completely, or accidentally placed it out of reach, raising concerns about the bottom of the trust stack, as if the need to constantly fix code via "updates" wasn't enough.

Most of the links are about a scenario where the user profile subtree in C:\Users was shunted off and replaced with a new profile, so at least the files could be found... unless they couldn't in some cases.  However, I remember a much harder crisis where user data was deleted completely, not in the recycle bin, due to a side-effect related to... OneDrive (formerly called SkyDrive, until someone watched the Terminator movies and suggested a branding change).

It is utterly indefensible for a code vendor to delete user data, no matter what they were trying to do with it; the sheer arrogance beggars belief.  Moving directory file entries to a C:\Windows.old is one thing, but to delete completely and irreversibly is quite another risk to take with what is not yours.

So... should you trust the "man behind the curtain"?  After all, well-resourced professionals with a large budget should keep their servers running more reliably than a home user's system, and you may feel that is true for you.  

But leaving aside vendor priorities and intent, the fact is that there's nothing magical about what the cloud is made of; it's all stacked layers of code with significant error rates, whether it's the cross-platform compilers, the microcode squirted into what used to be "hard logic" processors, a UEFI as complex as Windows 95 (or "extensions" thereof), the firmware within off-processor components as complex as MS-DOS, web browsers and the unwashed junk they have to run, or the strapping together of these things in de-featured web Apps and PWAs.

We have no insight into what goes on in cloud servers; the fixes, emergency kludges, crises narrowly avoided or not, the data loss affecting "only a few users", etc. until a Cloudflare mess wakes us up.

21 August 2025

AI: The Computer Revolution, Again

I'm lucky to have experienced personal computing from the beginning, from Sinclair's Black Watch, ZX80, ZX81 and Spectrum through Pick R83 and MS-DOS 3.3 to today's herding into Microsoft's pen for Windows 11 survivability.

We are perhaps the last human generation to believe we could understand electronic digital computers at every level, from transistors and logic gates through to automated online banner ad auctions, Bitcoin mining, botnets, etc.  It's got so complex, newcomers have to choose the slice of the tottering stack in which they will specialize; code is becoming ephemeral, beyond human scope if it is to be kept on track.

In the 8-bit home computing era, computers became our toys, while futurists were still telling us we needed to learn binary arithmetic at school to prepare for tomorrow's careers.  Hobbyists were asked to justify the time and effort they lavished on their FREDs (Folking Ridiculous Electronic Devices), who would reply "Look, I can create a page of printed text in minutes!" ...not counting minutes spent waiting for code to load off audio cassette at one end, and the noisy printer at the other.

So it is now, with "AI", i.e. the extension of expert-system pre-loaded wisdom, to machine learning.  We play with ChatGPT etc. as toys, while bigger budgets put AI to work; new but transient careers beckon, and early adopters may find the skills built in advance are as mis-aligned as trying to apply only binary arithmetic to higher-level programming languages.

Performance is not yet there; early AI-capable laptops and PCs are as rare and costly as the first round of "multimedia" (sound card, CD-ROM, video playback, a handful of available titles) before Windows 95. Nvidia's monster AI chips are as unattainable for us as 3DFX's dedicated 3D accelerators were way back in the day, when affordable "Windows Accelerator" graphic cards just couldn't cut it for new 3D games.  Industrial-grade AI thrives on ye olde IBM mainframe budgets of half a century ago.

This time round, I'm content to watch from the sidelines - if I was 20 years old again, I'd have jumped into Android when toy smartphones escaped Apple's iron grip, I'd have years of Linux under my belt, and I'd be very actively involved in "playing with AI".


05 August 2025

Bug: Windows 11 Safe Cmd OSLoader loads Explorer.exe as shell

Using BCDEdit to /Copy {default} to two new GUIDs, setting those to Safe Mode and Safe Cmd, then adding the GUIDs to {bootmgr} via /DisplayOrder, is a great way to pause the Windows boot process at a BCD boot menu, to either power off or choose a safer option when needed.
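
The steps above can be sketched as follows; this is a dry run in Python that only builds the BCDEdit command lines as text for review, and runs nothing.  The function names and the {<GUID2>} / {<GUID3>} placeholders are assumptions for illustration; the real identifiers are whatever BCDEdit /Copy reports on your system, and the commands themselves would need an elevated Command Prompt.

```python
# Dry-run sketch: builds BCDEdit command lines as text, runs nothing.
# Function names and GUID placeholders are hypothetical.

def copy_command(description):
    """Command that clones {default}; BCDEdit then reports the new GUID."""
    return 'bcdedit /copy {default} /d "' + description + '"'


def entry_commands(guid, alternate_shell=False):
    """Commands that turn a copied entry into a Safe Mode menu item."""
    commands = ["bcdedit /set " + guid + " safeboot minimal"]
    if alternate_shell:
        # Ask for Command Prompt as the shell instead of Explorer.exe
        commands.append("bcdedit /set " + guid + " safebootalternateshell yes")
    # Show driver names while booting
    commands.append("bcdedit /set " + guid + " sos on")
    # Append the entry to the boot menu list held by {bootmgr}
    commands.append("bcdedit /displayorder " + guid + " /addlast")
    return commands


if __name__ == "__main__":
    print(copy_command("Safe Mode"))
    for line in entry_commands("{<GUID2>}"):
        print(line)
    print(copy_command("Safe Cmd"))
    for line in entry_commands("{<GUID3>}", alternate_shell=True):
        print(line)
```

Printing rather than executing is deliberate: each GUID has to be taken from the /Copy output before the later commands make sense, and it's worth reading every line before it touches the BCD.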

I started doing this in Windows 7 and it's worked well up until Windows 11, possibly version 24H2, where the alternate shell directive (SafeBootAlternateShell) in the Safe Cmd OSLoader is ignored, causing Explorer.exe to load as the shell instead.  This may run unwanted code integrated into Explorer.exe, or cause the system to crash if something is seriously awry within the Explorer.exe shell sub-system - so advice to "just RegEdit HKLM...Winlogon, Shell and restart" won't avoid that risk.

If I navigate the BCD boot menu via the Tab key or mouse to the "Change defaults or choose other options" section below the OSLoader list, and use the Boot Options there to force Cmd as shell, that works after the usual restart and boot.  Command Prompt also works when selected from OSLoaders that launch a .wim via RAM Drive, e.g. the built-in WinRE or added WinPE, such as offered by Macrium Reflect, EaseUS To Do Backup, or your own "home-rolled" WinPE.

So there's something amiss with how Windows 11's pre-OS code interprets OSLoader settings, such that the setting to use the alternate shell is ignored, or something else is wrong at that fork in the BCD interpretation logic.

Here's what these OSLoaders look like, from a working Windows 10 22H2 system:

C:\WINDOWS\system32>BCDEdit /Enum OSLoader

Windows Boot Loader
-------------------
identifier              {current}
device                  partition=C:
path                    \WINDOWS\system32\winload.efi
description             Windows 10
locale                  en-US
inherit                 {bootloadersettings}
displaymessageoverride  Recovery
recoveryenabled         Yes
isolatedcontext         Yes
osdevice                partition=C:
systemroot              \WINDOWS
resumeobject            {<GUID1>}
nx                      OptIn
bootmenupolicy          Standard

Windows Boot Loader
-------------------
identifier              {<GUID2>}
device                  partition=C:
path                    \WINDOWS\system32\winload.efi
description             Safe Mode
locale                  en-US
inherit                 {bootloadersettings}
displaymessageoverride  Recovery
recoveryenabled         Yes
isolatedcontext         Yes
osdevice                partition=C:
systemroot              \WINDOWS
resumeobject            {<GUID1>}
nx                      OptIn
safeboot                Minimal
bootmenupolicy          Standard
sos                     Yes

Windows Boot Loader
-------------------
identifier              {<GUID3>}
device                  partition=C:
path                    \WINDOWS\system32\winload.efi
description             Safe Cmd
locale                  en-US
inherit                 {bootloadersettings}
displaymessageoverride  Recovery
recoveryenabled         Yes
isolatedcontext         Yes
osdevice                partition=C:
systemroot              \WINDOWS
resumeobject            {<GUID1>}
nx                      OptIn
safeboot                Minimal
bootmenupolicy          Standard
safebootalternateshell  Yes
sos                     Yes

It is the safebootalternateshell = Yes that is ignored, in Windows 11 24H2.
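
To compare systems without eyeballing long listings, a small Python sketch can pick out which entries are meant to boot to Command Prompt.  It assumes English-language output saved as text from BCDEdit /Enum OSLoader, laid out as in the listing above; the function names and the trimmed SAMPLE are hypothetical, and the script reads text only, so it can't tell you whether the setting is honoured at boot.

```python
import re

# Trimmed stand-in for saved "BCDEdit /Enum OSLoader" output (hypothetical)
SAMPLE = """Windows Boot Loader
-------------------
identifier              {<GUID2>}
description             Safe Mode
safeboot                Minimal

Windows Boot Loader
-------------------
identifier              {<GUID3>}
description             Safe Cmd
safeboot                Minimal
safebootalternateshell  Yes
"""


def parse_osloaders(text):
    """Return one dict of settings per 'Windows Boot Loader' block."""
    entries = []
    text = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", text):
        lines = [ln for ln in block.split("\n") if ln.strip()]
        # An entry is a title line, a line of dashes, then settings
        if len(lines) < 3 or set(lines[1].strip()) != {"-"}:
            continue
        entry = {}
        for ln in lines[2:]:
            if ln[0].isspace():
                continue  # skip indented continuation lines
            key, _, value = ln.partition(" ")
            entry[key.lower()] = value.strip()
        entries.append(entry)
    return entries


def expects_command_prompt(entry):
    """True if the entry asks for Safe Mode with the alternate shell."""
    flag = entry.get("safebootalternateshell", "").lower()
    return "safeboot" in entry and flag == "yes"


if __name__ == "__main__":
    for entry in parse_osloaders(SAMPLE):
        if expects_command_prompt(entry):
            print(entry["description"], entry["identifier"])
```

Any entry this reports that still boots to the Explorer.exe shell is showing the behaviour described here.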