27 March 2008

Why "One Bad Sector" Often Kills You

Has it ever seemed to you that if there's "one bad sector" on a hard drive, it will often be where it can hurt you the most?

Well, there may be reasons for that - and the take-home should affect the way file systems such as NTFS are designed.

As it is, when I see early bad sectors, they are often in frequently-accessed locations.  This isn't because I only look for bad sectors when the PC fails; I routinely do surface scans whenever PCs come in for any sort of work.  It's good CYA practice to do this, as it saves you from making excuses when the work you were asked to do causes damage because of unexpected, pre-existing hardware faults.

Why might frequently-accessed sectors fail?

You could postulate physical wear of the disk surface, especially if the air space is polluted with particulate matter, e.g. from a failed filter or seal, or debris thrown up from a head strike.  This might wear the disk surface most wherever the heads were most often positioned.

You could postulate that higher write traffic increases the risk of a poor or failed write that invalidates the sector.

Or you could note that if a head crash is going to happen, it's most likely to happen where the heads are most often positioned.

All of the above is worse if the frequently-accessed material is never relocated by file updates or defrag.  That may apply to files that are always "in use", as well as structural elements of the file system such as FATs, the NTFS MFT, etc.

Core code files may also be candidates if they have to be repeatedly re-read after being paged out of RAM.  If so, that suggests a risk mechanism that involves access rather than writes.
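To put rough numbers on that reasoning, here is a minimal sketch in Python.  It assumes a toy model in which some fraction of the failure risk is spread evenly over all sectors and the rest follows access traffic.  The function name, the `access_weight` parameter and the example figures (1% of sectors taking 40% of accesses) are all made up for illustration, not measured from any drive.

```python
def first_failure_odds(sector_counts, access_shares, access_weight=0.5):
    """Estimate where the first bad sector is likely to land.

    sector_counts -- number of sectors in each region of the disk
    access_shares -- fraction of all accesses that each region receives
    access_weight -- assumed fraction of failure risk that follows access
                     traffic (0 = purely random, 1 = purely access-driven)

    Returns one probability per region; the probabilities sum to 1.
    """
    total_sectors = sum(sector_counts)
    total_access = sum(access_shares)
    hazards = []
    for sectors, access in zip(sector_counts, access_shares):
        uniform_part = (1 - access_weight) * sectors / total_sectors
        access_part = access_weight * access / total_access
        hazards.append(uniform_part + access_part)
    total_hazard = sum(hazards)
    return [h / total_hazard for h in hazards]


# Hypothetical disk: a "hot" region (file system structures, registry
# hives) holding 1% of the sectors but taking 40% of the accesses.
sectors = [1, 99]
accesses = [0.4, 0.6]

for weight in (0.0, 0.5, 1.0):
    hot, cold = first_failure_odds(sectors, accesses, weight)
    print(f"access_weight={weight}: hot region {hot:.1%}, rest {cold:.1%}")
```

Under these assumed figures, the hot region's share of first failures rises from 1% when failures are purely random to about 20% when half the risk follows access, and to 40% when all of it does.  The point is only that a modest access-related risk is enough to make "one bad sector" land in the worst place far more often than chance would suggest.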

As it is, I've often seen "one bad sector" within a crucial registry hive, or, back in the Win9x days, within one of the core code files.  Both of these cause particular failure patterns that I've seen often enough to recognize.  For example, when one of the core code files is bent, a Win9x system rolls smoothly from boot to desktop and directly to shutdown, with no error messages.

I've often seen "one bad sector" within frequently-updated file system elements, such as FATs, the NTFS "used sectors" bitmap, the root directory, etc., which may explain why data recovery from bad-sector-stricken NTFS is so often unsatisfactory.

But that's another post...

2 comments:

Miha said...

I'm curious to know which tools you use for scanning.

Chris Quirke said...

I'm using HD Tune, a very useful free tool from www.hdtune.com.  It doesn't have to be "installed" and can run from Bart CDR, but it needs admin rights to be useful in Vista and won't run in WinPE 2.0.

There's also a generic limitation with S.M.A.R.T. and USB access to IDE or S-ATA hard drives, which means no temperature or S.M.A.R.T. details can be seen via USB-interfaced external housings.

HD Tune has various tabs to do benchmarking, show drive details, S.M.A.R.T. details, and do a surface scan (I avoid the "quick" checkbox there). The tabs are updated in real time and can be watched when a task is active in another tab, e.g. you can watch S.M.A.R.T. detail counters going up as a surface scan is in progress.

In addition, the drive temperature is visible at all times, both above the tabs, and also in the SysTray if that is running (as it isn't, in Safe Cmd or Bart CDR boot).

The surface scan ignores partitions, and doesn't try to "fix" anything - though the process can trigger the hard drive's firmware to attempt such "fixes", as watched via the S.M.A.R.T. tab.
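For anyone curious about what a read-only surface scan amounts to, here is a rough sketch in Python.  It is not how HD Tune is implemented; it only illustrates the idea of reading every block once, ignoring partitions and file systems, and noting which reads fail without trying to fix anything.  The function name `surface_scan` and the 64 KB block size are my own assumptions, and it presumes the whole drive can be opened as a plain file (a drive image, or a Linux-style device node opened with root rights).

```python
import os


def surface_scan(path, block_size=64 * 1024):
    """Read `path` once from start to end, block by block.

    Nothing is written or repaired.  Returns a tuple of
    (blocks_checked, bad_offsets), where bad_offsets lists the byte
    offset of every block whose read raised an OS-level error.
    """
    bad_offsets = []
    blocks_checked = 0
    # O_BINARY exists only on Windows; elsewhere the flag is simply 0.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            # Seek explicitly so that a failed read doesn't derail the
            # position of the next block.
            os.lseek(fd, offset, os.SEEK_SET)
            try:
                data = os.read(fd, min(block_size, size - offset))
            except OSError:
                bad_offsets.append(offset)
            else:
                if not data:
                    break  # unexpected end of device or file
            blocks_checked += 1
            offset += block_size
    finally:
        os.close(fd)
    return blocks_checked, bad_offsets
```

Calling `surface_scan("drive.img")` on an image file (a hypothetical name) returns the number of blocks read and a list of failing offsets.  Reads done this way may be served from the operating system's cache rather than the platters, and a failed block only narrows a bad sector down to within `block_size` bytes, so treat this as a way to understand the process rather than a substitute for a proper tool.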