14 November 2020

Don't Kick Away The Ladder

You climb a ladder onto the roof, then kick away the ladder. 
How do you safely get down to put the ladder back up so you can safely get down?

Such an obvious f-up, you wonder why I need to write this post.  It's like a cartoon character who runs off a cliff, panics while briefly suspended in mid-air, then inevitably plunges to a cartoon death.  Surely system designers aren't that stupid?

Examples abound, especially in the age of Class 3 UEFI that forces us to share these stupid risks. The "Extensibility" of UEFI allows code to be integrated before any OS can boot, and this code can persist into the OS runtime, so it's hard to see how this can be "more secure" when it is fundamentally unsafe.  Years of (U)EFI's buggy "growing up in public" make it clear that such code is insufficiently trivial to be free of exploitable bugs.  A ladder should be trivial enough to never break; flaky firmware can break systems in ways that cannot be fixed!

I'm setting up a new Dell laptop, and in the firmware setup there is an on-by-default option to allow UEFI firmware to connect to the Internet and grope for "updates", before any OS is booted from which malware could be tackled, as if vendor supply-chain attacks had not already happened.  Specifically, if successive attempts to boot Windows fail "too many times", the firmware will try to launch Dell's repair, and if that in turn fails, it will try to download the repair material via the Internet.
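
To make that fallback chain concrete, here is a rough sketch of the decision as I understand it from the setup screens.  This is not Dell's code: the threshold of 3, the names, and what happens when the network option is turned off are all my own assumptions, since the firmware only says "too many times".

```python
from enum import Enum


class BootAction(Enum):
    BOOT_WINDOWS = "boot Windows normally"
    LOCAL_REPAIR = "launch the vendor repair environment"
    NETWORK_REPAIR = "download repair material via the Internet"
    HALT = "stop and wait for the user"  # assumed; not described in the post


# Hypothetical value: the firmware only says "too many times".
MAX_FAILED_BOOTS = 3


def next_boot_action(failed_boots, local_repair_failed, network_allowed=True):
    """Pick the next step in the (assumed) pre-OS recovery chain."""
    if failed_boots < MAX_FAILED_BOOTS:
        # Not enough consecutive failures yet: keep trying the OS.
        return BootAction.BOOT_WINDOWS
    if not local_repair_failed:
        # Too many failures: try the repair material already on the machine.
        return BootAction.LOCAL_REPAIR
    if network_allowed:
        # Local repair failed too: firmware goes online before any OS exists.
        return BootAction.NETWORK_REPAIR
    # Assumed behaviour when the on-by-default network option is switched off.
    return BootAction.HALT


if __name__ == "__main__":
    print(next_boot_action(failed_boots=3, local_repair_failed=True).value)
```

The only branch that worries me is the third one: with the default setting, a run of failed boots plus a failed local repair is enough to put pre-OS firmware on the Internet.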

This Dell laptop also suffers from the UEFI display bug, based as it is on the same 10nm Intel processor family.  The problem applies both to the original Windows 10 1909 installation and to the same installation after it was upgraded to 20H2 using the current Media Creation Tool.  The .iso created by this tool no longer fits a standard DVD-R disc, so a bootable USB stick was created instead, and the file set copied from there.

The laptop's nice NVMe SSD can't be seen from my rescue WinPE boot discs, nor from a freshly-downloaded Kaspersky Rescue Disk; none of these can see the drive via the PCI interface.  Once again, the ladder is kicked away; crucial boot-time code should "always just work", i.e. standard trivial code baked into the firmware, not requiring "special drivers" to work.

As it is, the nature of the UEFI display bug demonstrates how a setting at the top of the ladder (Windows 10 Device Manager, enabling or disabling the Intel display adapter) kicks away the bottom of the ladder (an OS-level setting affects pre-OS UEFI, such that the pre-OS BCD boot menu fails to display).

Do we really have to wait for more shoes to drop?
