Geek summary: First post-install Win10 update of Intel Graphics drivers for i5-1035G1 renders the BCD Boot Menu invisible, although it still works. Fixed if Device Manager, Display Adapter is Disabled; problem reproduced if Enabled, effects taking place after Windows restart.
I suspect the cause is failure of the driver to attain color values when started in the raw EFI context, as using the Win10 Settings, Recovery, Advanced UI will show the boot menu in proper color. That UI reaches a different boot menu, with the normal boot menu seen via Other Operating Systems UI, without restarting through raw EFI boot. Either the first menu applies the needed color settings, or bypassing the raw EFI phase preserves the successful Win10 OS context.
Test system where the problem was encountered: a brand new Asus laptop based on the new 10nm 10xxGx series processor, specifically i5-1035G1. It was not encountered on a new desktop PC built on a Gigabyte motherboard with a Pentium Gold G6400 processor, also set up last week.
Background
EFI boot from internal storage enters that storage via {bootmgr}, which displays a boot menu if there is more than one OSLoader entry in the "DisplayOrder". By default there's only one entry to boot Windows 10, so this boot menu is normally bypassed, and the bug is thus unobserved.
As part of my standard setup, I add boot entries for Safe Mode and Safe Cmd, to float these less-destructive troubleshooting opportunities above the deceptively-named "Refresh Your PC" (a bit more than a F5 web page "refresh") and "Reset Your PC" (far beyond pressing the Reset button to force a bad-exit Restart) bear-traps that you'd have to walk past to eventually find the Safe Modes. This causes {bootmgr} to display the BCD Boot Menu for the Timeout seconds, thus revealing the bug.
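For readers who want to reproduce the extra boot entries, here is a minimal sketch of one way to do it with BCDEdit. The post does not say which tool was actually used, so treat the approach as an assumption. The Python function only builds the command lines and runs nothing. The function names, the "{NEW-GUID}" placeholder and the 10 second timeout are hypothetical; the real GUID is whatever "bcdedit /copy" prints on your system.

```python
def safe_mode_commands(new_guid, description, with_cmd_shell=False):
    """Build (but do not run) BCDEdit commands for one extra Safe Mode entry.

    new_guid is a placeholder for the identifier that 'bcdedit /copy' prints;
    it cannot be known in advance, so the caller must supply it.
    """
    commands = [
        # Clone the running Windows entry under a new description.
        ["bcdedit", "/copy", "{current}", "/d", description],
        # Mark the clone as a minimal Safe Mode boot.
        ["bcdedit", "/set", new_guid, "safeboot", "minimal"],
    ]
    if with_cmd_shell:
        # Safe Mode with Command Prompt instead of the Explorer shell.
        commands.append(
            ["bcdedit", "/set", new_guid, "safebootalternateshell", "yes"]
        )
    return commands


def timeout_command(seconds):
    """Build the command that sets how long the boot menu waits."""
    return ["bcdedit", "/timeout", str(seconds)]


if __name__ == "__main__":
    # Assumed example values; run the printed lines from an elevated prompt.
    for cmd in safe_mode_commands("{NEW-GUID}", "Safe Cmd", with_cmd_shell=True):
        print(" ".join(cmd))
    print(" ".join(timeout_command(10)))
```

Once the DisplayOrder holds more than one entry, {bootmgr} shows the menu for the Timeout period, which is the condition that exposes the bug.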
Failure pattern
This particular system displays a GUI "Asus" image during the EFI firmware phase of the boot process, which fades before the BCD Boot Menu appears. As this logo fades, the color undergoes a subtle shift to a less-blue hue of white; possibly a switch to greyscale, rather than a Win10 "night light" setting (as changing that setting does not change this behavior). When the failure pattern is not in effect, the Asus logo does not change hue as it fades.
Normally, you'd then see the Boot Menu, but instead, the screen stays black. There's still display signal present, and if you blindly use the arrow keys before pressing Enter, the menu works; you load whichever menu item you blindly selected. If you use the trackpad or a mouse to move the mouse pointer, it will appear as the expected white arrow, and blindly clicking will also succeed in selecting and launching a menu entry.
If you do nothing, the screen remains black for Timeout seconds and then boots normally. The initial impression is that the system has "hung" or "crashed" (untrue, as safely tested by pressing Caps Lock to toggle the keyboard LED) or that the system is way slower to boot than expected, especially for an NVMe SSD.
Problem onset
I set up systems offline, to limit problems to one system rather than whatever is being pushed from the entire Internet. During this phase, the BCD Boot Menu worked normally as expected, both before and after upgrading the "new laptop" Windows 10 version to a freshly-made version 2004.
Problem only appeared after attempting to disable Asus's aggressive underfootware, and initially I ascribed it to this and quickly reversed changes back to the default non-Microsoft Services, Startup entries, and Scheduled Tasks. However, this was also the first Restart after going online and letting Windows Update pull down and install updates, which included "driver updates", which in turn included OEM programs now pushed as "drivers" to evade user management via Settings, Apps or Control Panel, Programs and Features.
The fix
None of the following fixes the problem: a BIOS update, re-defaulting CMOS Setup settings, powering off at the mains, holding down the Power switch (part of the keyboard) for 20+ seconds, or a BCDEdit nudge to {bootmgr}. Device Manager, Display Adapter, Update Driver reports the latest (thus surely the "best") driver is already installed, and the Rollback Driver button is greyed out.
What fixes the problem is Device Manager, Display Adapter, Disable, and then a Shutdown UI, Restart to put this change into effect across the EFI boot phase. Enabling the Display Adapter reproduces the failure pattern after the Restart; the problem remains present until Display Adapter is Disabled again.
Note; I also disable the Windows 10 "Fast Startup" setting via the convoluted Settings, Power UI required. So at least we know we're not resuming a flawed system runtime after a fake "shutdown".
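The Fast Startup state can also be checked without walking through the Settings UI. The sketch below assumes the setting is reflected in the HiberbootEnabled registry value under the Session Manager Power key; the function names are hypothetical. It reads the value only on Windows and returns None elsewhere, and hibernation being switched off entirely may also disable Fast Startup regardless of this value, so treat the result as a hint rather than proof.

```python
import sys

# Registry location assumed to hold the Fast Startup switch.
POWER_KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Power"
VALUE_NAME = "HiberbootEnabled"


def describe_fast_startup(raw_value):
    """Translate the raw registry value into a human-readable state."""
    if raw_value is None:
        return "unknown"
    return "on" if raw_value == 1 else "off"


def read_hiberboot_value():
    """Return the raw registry value, or None if it cannot be read."""
    if sys.platform != "win32":
        return None  # winreg only exists on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POWER_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
        return value
    except OSError:
        return None  # key or value missing, or access denied


if __name__ == "__main__":
    print("Fast Startup:", describe_fast_startup(read_hiberboot_value()))
```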
Likely cause
I suspect the Intel graphics driver depends on context established by Windows, which is absent (null pointer, anyone?) when the driver is run from raw EFI. It either sets an incorrect graphics mode, or draws color values from zero'd memory such that "ink" and "paper" are both black.
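To make the "ink and paper both black" idea concrete, here is a toy Python model. It is not how the Intel driver or {bootmgr} actually renders anything; the function names and the 8-byte "context" layout are invented purely for illustration. It shows that text drawn with foreground and background colors read from zeroed memory produces pixels that are all identical, so the menu exists and responds but cannot be seen.

```python
def colors_from_context(context):
    """Read ink and paper RGB triples from a (hypothetical) context buffer."""
    ink = tuple(context[0:3])
    paper = tuple(context[3:6])
    return ink, paper


def render_row(text, ink, paper):
    """One pixel per character: ink where there is a glyph, paper elsewhere."""
    return [paper if ch == " " else ink for ch in text]


def is_visible(pixels):
    """Text is only visible if at least two different colors are present."""
    return len(set(pixels)) > 1


# Context as Windows would have set it up: white ink on black paper.
good_context = bytes([255, 255, 255, 0, 0, 0, 0, 0])
# Context as raw EFI might present it: never initialised, all zeros.
zeroed_context = bytes(8)

good_row = render_row("Safe Mode", *colors_from_context(good_context))
bad_row = render_row("Safe Mode", *colors_from_context(zeroed_context))

print("initialised context visible:", is_visible(good_row))
print("zeroed context visible:", is_visible(bad_row))
```

In this model the menu logic is untouched, which matches the observation that blind key presses and clicks still select entries.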
Safety implications
Class 3 UEFI forces EFI boot, and thus all the flaky complexities of "Extensibility". Whereas the ancient BIOS/MBR code was sufficiently trivial to be free of bugs, EFI is not, and adds the risk of malware positioning itself to run before any OS or storage device can boot.
The fact that a Windows driver can poison the pre-OS EFI boot process is worrying, especially as the choice of driver to load is either read by pre-OS EFI from Windows, or has been latched into pre-OS EFI behavior by a setting applied from within Windows.
Scenario 1
EFI executable .efi files are able to read the Windows registry, and do so, as the BCD is in fact a Windows registry hive in structure. However, {bootmgr} is expected to be OS-agnostic, as at the time the Boot Menu is displayed, no decision has been taken as to what OS to boot - could be any version of installed Windows, a PreOS WinPE, a Linux, anything. So the code that runs before the Boot Menu should not dip into Windows registry hives, e.g. to load drivers or pull variables such as the colors to use for the boot menu, etc.
In fact, safest would be for pre-OS {bootmgr} code to use the lowest default screen resolution, rather than loading any 3rd-party "drivers" for a "better visual experience". This is a similar safety issue as code integration into "safe modes" (e.g. screen savers).
Scenario 2
When a device driver is selected in Windows, e.g. by disabling or enabling a Display Adapter, Windows may also be changing drivers within firmware EFI. If so, then a different EFI driver will load, depending on that Windows setting, and a buggy EFI display driver could cause the problem directly, rather than via using null data.
All this is hard to assess, as modern systems blur hardware, firmware, "BIOS", drivers and OSs. Everything is now likely to contain non-trivial and thus buggy code, and everything is treated as a black-box object that may "leak". The interface programming model is supposed to blacken the boxes of the object-orientated model, hiding the gooey details more effectively; instead of the "calling code" examining exposed variables (object Properties), it now asks the object to return these variables (object Methods), trusting the object's code to do that, which is not a great safety/security idea.