28 August 2007

Design vs. Code Errors

When Microsoft finds a code error, it generally fixes it fairly promptly.

In contrast, design errors generally remain unfixed for several generations of products; sometimes years, sometimes decades.  Typically even when addressed, the original design will be defended as "not an error" or "works as designed".

Old ideas that don't fit

As an example of bad design that has persisted from the original Windows 95 through to Vista, consider the inappropriateness of Format on the top layer of the Drive context menu.

The logic is old, and still true: hard drives are disks, and formatting is something you do to disks, therefore Format belongs on the drive's menu.

But around this unchanged truism, other things have changed. 

We now have more things we can do to disks, many of which should be done more often than they are: backup, check for errors, defrag.  Because these are "new" (as at Windows 95), they are tucked several clicks deeper in the UI, e.g. Properties > Tools.

Also, the word "Format" has come to mean different things to users.  In 1985, users would routinely buy blank diskettes that had to be formatted before use, and so the immediate meaning of the word "format" was "to make a disk empty by destroying all existing contents".  In 2007, users store things on USB sticks or optical disks, none of which have to be formatted (unless you use packet writing on RW disks), and the immediate meaning of the word "format" is "to make pretty", as in "auto-format this Word document" and "richly-formatted text".

The goal of software is to abstract the system towards the user's understanding of what they want to do.  In keeping with this, "hard drives" have taken on a different conceptual meaning, away from the system reality of disks, towards an abstracted notion of "where things go".  In particular, modern Windows tends to gloss over paths and directories with conceptual locations such as "the desktop" and "documents", and to prefer Search to formal file system navigation across disks and directories.

New things that break old truths

When a risk doesn't arise due to hard scopes, one doesn't have to consider it.  For example, if you build a house with a mountain as your back wall, you don't have to think about burglar-proofing the back wall.  Similarly, if your LAN is cable-only in a physically-secured building, you have fewer worries about intrusion than if you'd added WiFi to the mix.

When a risk doesn't arise because a previous team anticipated and definitively fixed it, future teams may be oblivious to it as a risk.  As Windows is decades old, and few programmers stay at the rock face for decades without being promoted to management or leaving, there's a real risk that today's teams will act as "new brooms", sweeping the platform into old risks.

In many of these cases, the risks were immediately obvious to me:

  • \Autorun.inf processing of hard drive volumes (see the sketch after this list)
  • Auto-running macros in "documents"
  • Active content in web pages
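
To make the first of those concrete, here's roughly what a hostile autorun.inf can look like.  The folder and file names are invented for illustration, but the keys are the standard ones Windows reads when it processes \Autorun.inf on a volume:

  [autorun]
  ; "open" names the program to run; "icon" and "action" dress the volume up to look harmless
  open=recycler\payload.exe
  icon=recycler\payload.exe,0
  action=Open folder to view files

Drop that plus the .exe onto a writeable volume, and a system that processes autorun.inf for that volume type will offer (or simply run) the attacker's code when the user does nothing more than "open" the drive.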

In some cases, I missed the risk until the first exploit:

  • Unfamiliar file name extensions (.ext) and scripting languages

But it generally takes at most one exploit example for me to get the message and take steps to wall out that risk.  Alas, Microsoft keeps digging for generations:

  • Auto-binding File and Print Sharing to DUN in Win9x
  • The way WiFi has been rolled out
  • Dropping "network client" NT into consumerland as XP
  • Hidden admin shares
  • Exposing LSASS and RPC without firewall protection
  • Encouraging path-agnostic file selection via Search

All of these are examples of changes that increase exposure to old risks, and/or new brooms that undermine definitive solutions as delivered by previous teams.
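
To pick one of those, hidden admin shares: the NT family quietly shares the root of every drive as C$, D$ etc., plus ADMIN$, and recreates them at each boot.  A quick sketch of how to see and (temporarily) knock them out from a command prompt - bearing in mind they come back after a restart unless the Server service's AutoShareWks setting is turned off:

  net share                   (lists all shares, including the hidden C$, ADMIN$ and IPC$)
  net share C$ /delete        (removes the hidden root share until the next reboot)

The point isn't that this is hard to do; it's that a consumer dropped into "network client" NT has no idea these shares exist in the first place.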

For example, the folks who designed DOS were careful to ensure that the type of file would always be immediately visible via the file name extension, limiting code types to .COM, .EXE and .BAT, and that every file had a unique filespec, so that you'd not "open" the wrong one.

These measures basically solved most malware file-spoofing problems, but subsequent teams hide file name extensions, apply poor file type discipline, dumb "run" vs. "view"/"edit" down to the meaningless "open", act on hidden file type info without checking that it matches what the user saw, and encourage searching for files that may pull up the wrong filespec.
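
To show the shape of the problem, here's a small Python sketch - nothing Windows actually runs, and the extension list and file names are just for illustration - of how "hide extensions for known file types" lets a double-extension file pose as something safer:

  import os

  # Roughly what an Explorer-style "hide extensions for known file types" view displays
  def displayed_name(filename, hide_known_ext=True):
      root, ext = os.path.splitext(filename)
      return root if (hide_known_ext and ext) else filename

  # A few extensions that really mean "run me" (illustrative, nowhere near complete)
  EXECUTABLE_EXTS = {".exe", ".com", ".bat", ".scr", ".pif", ".cmd", ".vbs", ".js"}

  def looks_spoofed(filename):
      # Real type is executable, but the name the user sees still ends in a "safe" extension
      real_ext = os.path.splitext(filename)[1].lower()
      shown_ext = os.path.splitext(displayed_name(filename))[1]
      return real_ext in EXECUTABLE_EXTS and shown_ext != ""

  print(displayed_name("HOLIDAY.JPG.exe"))   # shows "HOLIDAY.JPG"
  print(looks_spoofed("HOLIDAY.JPG.exe"))    # True - user saw a picture, the shell will run code

The user decides on the name they see; the shell acts on the extension they don't - and nothing checks that the two agree, which is exactly the "act on hidden file type info without checking" problem above.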

Avoiding bad design

How would I prevent bad designs reaching the market, and thus creating an installed vendor/user base that creates problems when the design is changed?

  • Keep core safety axioms in mind
  • Maintain old/new team continuity
  • Reassess logic of existing practices
  • Don't force pro-IT mindset on consumers
  • Assume bad intent for any external material
  • Make no assumptions of vendor trustworthiness

The classic safe hex rules...

  • Nothing runs on this system unless I choose to run it
  • I will assess and decide on all content before running it

...seem old and restrictive, but breaking these underlies most malware exploits.

2 comments:

netjustin said...

Bravo, Chris. Another favorite on that list of no-nos is enabling the NIC before Windows XP installation is completed. That, coupled with File and Printer Sharing enabled by default, and of course each of the appropriate pinholes open, makes great wormbait during those installations! ^_^

Chris Quirke said...

There are a number of expansions on "never turn your back on an installer": Stay offline, write-protect all installation disks, don't turn off av (tho oft instructed to), and don't consider the job done until you've done a first-run and gone through all the settings.

That's because the relationship between sware vendor and user is either triangular (vendor makes money from some 3rd-party, thus serves their interests first) or bilaterally hostile (you want to use the sware, they want you to pay for it).

So, aside from network exposure of exploit surfaces, it's best to stay offline during installs - and that's one objection I have to IE7.

The issue gets another dimension when you use a baseline patch-level code base (e.g. boot an installation or mOS disk); should that be allowed to "connect" to anything, e.g. to run online scanners, etc.?