25 September 2006

Banking on Java

Way back in 2003, South African bank ABSA were in the news after customers had lost money through hacking. Here's a report from 21 July 2003 and another one with more detail. The story was that some uber-hackers robbed ABSA, were caught, and now Internet banking is safe again.

However, check out the detail on Bugbear B from June 2003: in-the-wild malware noted for stealing information from a number of banking domains in several different countries, including South Africa. Was there one uber-hacker attacking ABSA, or multiple tiny hacks by folks who figured out how to make use of Bugbear B?

The South African banking industry responded to the ABSA debacle by boasting of new improvements in security, implying that what happened to ABSA would never happen at their banks. These improvements included an on-screen, mouse-driven number pad to avoid keylogging, and free (but UI-less and thus uncontrollable) MyCIO antivirus and firewall from McAfee.

At this point, the article you are reading is going to jump around seemingly-unrelated topics. Have faith; it will all come together at the end...

Microsoft Java

Sun sued Microsoft over the MS Java VM that was included in Windows and Internet Explorer, as Microsoft's Windows-specific extensions broke the "write once, run anywhere" goal of cross-platform usability. Sun contended that developers attracted to MS Java would be locked into Windows by these extensions.

Recently, I cleaned up an XP SP2 system that included Java malware, and which was running the old MS Java VM. I found instructions on removing MS Java, and the steps looked like those that should be done automatically by an uninstaller - if Microsoft had followed their own advice to developers and provided one for MS Java.

Not only did Microsoft provide no Add/Remove entry for the MS Java VM, but running one of the manual steps to remove it popped up a dialog box with the odd warning that "Internet Explorer will no longer be able to download from the World Wide Web". Now I can understand Java applets not working or pages being unable to display as the site intended, but not being able to do standard downloads? Smells like a smoking gun to me...

Sun Java

By now, most users of Java will be using Sun's Java Runtime Environment (JRE) instead of Microsoft's Java Virtual Machine (VM). We've also become accustomed to the need to fix code defects by updating subsystems such as Java, applying code patches, and so on.

A long-standing bone of contention with Sun has been that when you install a new JRE, the old one remains in place - and we suspected this old and vulnerable code could still be used, and thus exploited, by Java malware. We bitched about this all the way from 1.4.xx through 1.5.xx, and yet Sun just carried on installing new JREs while leaving old ones (at 100-150M apiece) in place.

It seemed that unlike Microsoft, Sun just didn't "get" what patching was all about. They seemed to think we downloaded and installed new JREs because we wanted kewl new features, and kept the old ones around for backward compatibility - whereas what we really want to do is smash this "backward compatibility", so that malware cannot exploit flaws in the old versions.

Finally, Sun came clean and admitted what we'd always suspected: a Java applet can specify which version of the JRE it would like to be interpreted by, and the current version will obligingly hand off to the applet's JRE of choice.
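
As I understand it, this pinning is visible right in the page markup: Sun's "static versioning" lets a page request a specific JRE update via a version-specific CLSID in an OBJECT tag, or a jpi-version MIME type in an EMBED tag. Here's a rough Python sketch that scans saved HTML for such pinning; the CLSID and MIME patterns are written from memory, so treat the details as assumptions to verify rather than gospel.

    # Rough sketch: look for "static versioning" applet pinning in saved HTML.
    # Assumption: version-specific CLSIDs follow the CAFEEFAC-xxxx-yyyy-zzzz
    # pattern and EMBED tags may use the jpi-version MIME type; verify locally.
    import re
    import sys

    CLSID = re.compile(r"clsid:CAFEEFAC-(\d{4})-(\d{4})-(\d{4})-ABCDEFFEDCBA", re.I)
    JPI = re.compile(r"application/x-java-applet;jpi-version=([\d._]+)", re.I)

    def pinned_jres(html):
        """Return JRE versions the page asks for, e.g. ['1.4.2_03']."""
        found = []
        for family, minor, update in CLSID.findall(html):
            # "0014"-"0002"-"0003" would decode to 1.4.2_03
            found.append("%s.%s.%d_%02d" % (family[2], family[3], int(minor), int(update)))
        found.extend(JPI.findall(html))
        return found

    if __name__ == "__main__":
        page = open(sys.argv[1], "r", errors="ignore").read()
        for version in pinned_jres(page):
            print("Page requests JRE", version)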

Java malware

The first known Java virus was written in 1998, and is detected as StrangeBrew. Since then, Java has been attacked and exploited in various ways, and both the Microsoft VM and old Sun JREs are considered to be high-risk, exploitable surfaces. By now, Java malware abounds, and indeed there was such malware on the system I recently cleaned up. The beat goes on.

Note the dates involved in some of the above links, e.g. Sun JRE 1.4.2_xx was found to be exploitable way back in 2004 (the "Sun" link above) - as well as the versions that are vulnerable, such as 1.4.2_04.

Internet Banking in 2006

After cleaning up the system, I uninstalled the MS Java VM, checked that no old Sun JREs were present, and installed Sun JRE 1.5.0_08 as the only Java engine on the system. After a while I had a call to say that Internet Banking wasn't working anymore.

Indeed, it wasn't working, so I called the bank's tech support, explained the system's history and why the MS Java VM had been removed, and they gave me a link to download a fix. The fix turned out to install the MS Java VM again, which I disallowed.

I called back to ask about an update that would work with current Sun Java, and they said yes, the newest version of the software no longer needs MS Java. I was a bit puzzled to hear it took them this long to switch, given that MS Java was pulled from XP in the days of SP1a, and SP1 is now so old that it's about to lose all further testing and patching, with SP2 as the new baseline.

So we rushed off to the city to collect an installation CD for their newest software, as it is not available as a download. This also did not work, and after another tech call, it turned out that this newest software does not support any Sun JRE beyond 1.5.0_05, so I was advised to fall back to that from the 1.5.0_08 I was using.

I noticed that the new banking software installed Sun JRE 1.4.2_03, which is ancient and has been vulnerable to attack since 2004 at least. I uninstalled that old JRE when the banking software had finished installing, and after shutting down and restarting Windows, I tried the new banking software, which again failed to work.

After a bit of technical discussion, it turned out that the new banking software's real JRE requirement is in fact 1.4.2_03, and the only reason it "works" with versions up to 1.5.0_05 is that it relies on these newer JREs to pass control back to 1.4.2_03.

This is really quite nasty, because users will think they are protected against Java exploits because they installed the latest JRE, while in fact the banking software is undermining this safety by slipstreaming in an old exploitable JRE. It makes a mockery of banking's usual assertion that they do their best to maintain security, but are let down by users who fail to keep their PCs safe and clean. There's something odd in being forced to accept an exploitability risk in order to use security-orientated software.
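
If you want to check whether something like this has been done to a machine you look after, the Sun JREs record themselves in the registry under HKLM\SOFTWARE\JavaSoft\Java Runtime Environment - at least on the XP-era systems I've seen. Here's a minimal Python sketch that lists what's installed, so an unexpected 1.4.2_03 stands out; treat the key layout as an assumption to verify on your own system.

    # Minimal sketch: list Sun JREs recorded in the registry, so an old,
    # exploitable version slipstreamed in by an installer is easy to spot.
    # Assumes the HKLM\SOFTWARE\JavaSoft\Java Runtime Environment layout
    # used by Sun's XP-era installers; run this on the live system.
    import winreg

    BASE = r"SOFTWARE\JavaSoft\Java Runtime Environment"

    def installed_jres():
        jres = {}
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, BASE) as base:
            subkey_count = winreg.QueryInfoKey(base)[0]
            for i in range(subkey_count):
                name = winreg.EnumKey(base, i)   # e.g. "1.5", "1.5.0_08", "1.4.2_03"
                try:
                    with winreg.OpenKey(base, name) as sub:
                        home, _ = winreg.QueryValueEx(sub, "JavaHome")
                        jres[name] = home
                except OSError:
                    pass                         # family keys may lack JavaHome
        return jres

    if __name__ == "__main__":
        for version, home in sorted(installed_jres().items()):
            print(version, "->", home)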

I haven't named the bank in question (it's not ABSA this time), because they are the only bank I've had reason to check out. For all I know, most or all of our local banks may be just as negligent, so it would be unfair to single out this one just because I found out about them first!

15 September 2006

How To Design a mOS

A maintenance OS (mOS) is one that you can use when you daren't trust your system to boot into the OS that is installed on it. Through the DOS and Win9x years, we were used to diskette-booted DOS in this role - but NTFS, > 137G hard drives, USB etc. make this less useful in XP.

As at September 2006, Microsoft provide no mOS for modern Windows, but you can build one for yourself by using Bart PE Builder (and perhaps you should!). Out of the box, Bart CDR meets the criteria for a safe mOS, but you can botch this when "enhancing" it.

There are all sorts of jobs one can do from a mOS, but mainly, it's:
  • Diagnostics
  • Data recovery
  • Malware management
You may need to do all three, when approaching carried-in PCs that "don't work".

Re-establishing safe functioning

Running a PC assumes that various levels of functionality are working properly. When a PC "doesn't work", one has to re-establish each of these in turn, standing on each level to reach the next. At each stage, one has to avoid using what cannot yet be trusted.

Is it safe to plug into the mains?

PCs with metallic rattles when shaken may not be - a loose metal object could short out circuitry and burn it out. It's best to check inside the case for loose objects; salty, wet dust; metal objects, flakes or rinds; and power connectors dangling onto pins on circuit boards. Also check that the power supply is set to the correct mains voltage, and that rain didn't fall into the case and power supply while the PC was being carried in.

Is the hardware logic safe?

This is mainly about RAM, but implicit in a 12-hour RAM test is a test of whether the PC can stay running that long, or will spontaneously reset or hang. The ideal RAM checker would also display processor and motherboard temperatures, and possibly operating voltages, best served with latched lowest and highest detected values.

Is the hard drive safe to use?

This is about the physical condition of the hard drive, which is assessed retrospectively by looking at the S.M.A.R.T. details, and also by test-reading every sector on the drive. It's important not to beat the drive to death; ideally, the surface test should avoid getting stuck in retry loops when a failing sector is encountered, and should abort when the first bad sector is found. The testing process should not attempt to "fix" anything!
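
To make that "read every sector, fix nothing, stop at the first failure" idea concrete, here's a rough Python sketch of a read-only surface pass over a raw physical drive. It's an illustration only: it assumes admin rights and the Windows \\.\PhysicalDriveN naming, it doesn't report S.M.A.R.T. detail, and a purpose-built tool is still the better choice.

    # Rough sketch: read-only surface pass over a raw physical drive.
    # Retries nothing, "fixes" nothing, and stops at the first read error.
    # NB: a real tool would query the drive size (IOCTL_DISK_GET_LENGTH_INFO)
    # rather than reading blind; some systems report end-of-device as an
    # error, which this crude sketch would mistake for a bad sector.
    import sys

    DEVICE = r"\\.\PhysicalDrive0"   # adjust for the drive under test
    CHUNK = 64 * 1024                # 128 sectors per read, keeps reads aligned

    def surface_scan(device):
        offset = 0
        with open(device, "rb", buffering=0) as disk:
            while True:
                try:
                    data = disk.read(CHUNK)
                except OSError as err:
                    print("Read error at byte offset", offset, "-", err)
                    print("Stopping here; do not let anything 'fix' this drive yet.")
                    return False
                if not data:
                    break            # end of device
                offset += len(data)
                if offset % (1024 ** 3) == 0:
                    print(offset // (1024 ** 3), "GiB read OK")
        print("Surface read completed without errors:", offset, "bytes")
        return True

    if __name__ == "__main__":
        surface_scan(sys.argv[1] if len(sys.argv) > 1 else DEVICE)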

Is the hard drive safe to write to?

Certain contexts (e.g. requests to recover deleted data) define the hard drive as being unsafe to write to, because material outside the file system's mapped space is not protected from overwrites. Otherwise, the drive may be considered safe for writes if the file system contains no physical errors, and the hardware and the physical hard drive have passed their tests.

Is the hard drive installation safe to run?

In addition to all of the above, this requires the presence of active malware to be excluded - and in practice, this may form the bulk of your mOS use. There are many challenges here, given that even a combination of anti-malware scanners is likely to miss some things that you'd have to look for and manage by hand.

Is it safe to network?

This is about what's on the rest of the network (i.e. are all the other computers on the LAN clean, and is WiFi allowing arbitrary computers to join this network?) and whether your system is adequately separated (NAT, firewall, patching of network edge code) from the 'net. The latter question has to be asked twice: once for the mOS (if you are networking from it) and once for the hard drive installation when this is finally booted again.

Boot safety

Many boot CDs are not safe, because they will automatically chain into booting the hard drive unless a key is pressed within a short time-out period. This is particularly dangerous, given that the chaining process ignores CMOS settings that would otherwise define what hard drives are visible, what device should boot next, or whether the hard drive should boot at all.

Every bootable Windows installation disk from Microsoft fails this test. Standard Bart PE is safe here, but has a plugin setting that can select the same automatic chaining to hard drive behavior. The Bart-based Avast! antivirus scanning CD enables this, and thus fails the test, as may other Bart-based boot disk projects.

Many mOS tasks take a lot of unattended clock time to run, starting with RAM testing, then hard drive surface testing, then virus scanning or searches for data to recover. If anything should cause the system to reset (remember, this is a sick PC being maintained) then it will fall through to boot the hard drive, thus running possibly-infected code in possibly-bad RAM that writes to an at-risk hard drive and file system. Disaster!

Even if you have tested RAM, hard drive etc. and now consider the hardware to be trustworthy, an unexpected reset will usually dispel that trust. The only safe thing for a mOS boot disk to do under such circumstances is to stop and wait for a keypress (with no time-out fall-through).

It's tempting to have a mOS disk boot straight into a RAM check, as that's generally what one should do after unexpected lockups or resets, but that can make it easy to miss spontaneous resets during an overnight RAM test. You'd wake up, see the test still running and no errors found, but for all you know it may have reset and restarted the test a dozen times.

Testing RAM

At the time one tests RAM and perhaps core motherboard and processor logic, one can assume nothing to be safe. So the mOS and the programs you run from it should not write to the hard drive, or even read it (as a bad-RAM bit-flip can change a "read disk" to a "write disk").

I haven't figured out how to integrate RAM testers such as MemTest86, MemTest86+ , SIMMTester etc. into the same CDR as Bart, so I use a separate CDR for this. I then remove the CDR after it's booted and swap it for another that will boot but not access hard drive, such as a different RAM tester or a DOS boot CDR.

I'd love a RAM tester that showed system temperatures, but I haven't seen one that does.

Hardware compatibility

One would prefer a mOS that works on any hardware without having to have "special" drivers added to it, and Bart generally passes this test, unless oddball add-on hard drive cards or RAID are in use. Even S-ATA hard drives on the current i945 chipsets will work from Bart.

Bart will detect USB storage devices at boot time, but won't detect changes to these thereafter. So you'd have to insert a USB stick before boot, and not pull it out, swap it, add others, or add the same one back after changing the contents elsewhere. However, Bart treats card reader devices as containing removable "disks", so you can add and swap SD cards etc. quite happily. For this and other reasons, I generally use SD cards instead of USB sticks.

You cannot remove the Bart disk during a Bart session, and that means no burning to CDRs from most PCs.

Memory management

A mOS has to take no risks that are not initiated by the user, and on a sick PC, everything is a risk until testing and management re-establishes it as safe.

So a mOS should not make assumptions about the hard drive contents; it should not automatically access or "grope" material on the hard drive, run code from it, or commence networking. That also means not using the hard drive for swapping to virtual memory or temp file workspace - and that makes memory management a challenge, especially when some of the available RAM is already used as a RAM drive.

A standard Bart CDR will create a small RAM drive and locate Temp files there, and will prompt before commencing networking. I've modified mine to leave networking inactive, and added on-demand facilities to change RAM drive size, relocate Temp location, create a page file on a selected hard drive volume, and start networking if required.

My usual SOP is then to divert Temp to a newly-created location on the hard drive, once I've tested the physical hard drive and logical file system. If RAM is low, I shrink the RAM disk and create a page file on the hard drive, before starting programs that will need Temp workspace (e.g. anti-malware scanners that extract archives to scan the contents).
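
For what it's worth, the Temp diversion part is simple enough to script. The sketch below (Python, illustrative only; the folder and scanner names are made up) creates a workspace folder on a volume that has already passed its tests and points TEMP/TMP there for whatever is launched through it. RAM drive resizing and page file creation are Bart-plugin or OS-level jobs, so they're not shown.

    # Illustrative sketch: divert Temp to a tested hard drive volume before
    # launching workspace-hungry tools (e.g. scanners that unpack archives).
    # The environment change affects only this process and its children -
    # nothing global, nothing persistent, which is what a mOS should want.
    import os
    import subprocess

    def run_with_temp_on(volume, command):
        workdir = os.path.join(volume, "mOS_Temp")   # hypothetical folder name
        os.makedirs(workdir, exist_ok=True)
        env = os.environ.copy()
        env["TEMP"] = workdir
        env["TMP"] = workdir
        return subprocess.call(command, env=env)

    if __name__ == "__main__":
        # Hypothetical example: run a scanner with its Temp workspace on D:
        run_with_temp_on("D:\\", ["scanner.exe", "/all"])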

Testing hard drive

The usual advice is to use hard drive vendors' tools, or ChkDsk /R. Neither are really acceptable, but for different reasons.

Hard drive vendor tools tend to display a summary S.M.A.R.T. report, which can be "OK" even when S.M.A.R.T. detail shows multiple failed sectors have been detected and "fixed". The surface scan may be useful, as long as it doesn't "fix" anything. Then there may be "deeper" tests that are data-destructive, such as "write zeros to disk" or a pseudo-"low level format".

ChkDsk /R is unacceptable because it's orientated to "fixing" things without prompting you for permission. First it tests the file system logic and "fixes" it, so that when it tests the surface of the disk, it can "fix" bad clusters by re-writing the contents elsewhere in the file system. All of which is unacceptably destructive if you'd rather have recovered data first.

Instead of these, I use HD Tune for Windows, which will run from Bart CDR just fine. It ignores the contents of the hard drive entirely, reports S.M.A.R.T. detail that is updated in real time even during other tests, can test hard drives over USB and memory cards (neither will show S.M.A.R.T.), and displays the hard drive's operating temperature (again, updated in real time) no matter which test is currently in progress.

Testing file system and data recovery

I haven't any good tools for NTFS, alas, so I use ChkDsk without any parameters that would cause it to "fix" anything. If the file system is FATxx and hard drive is < 137G, I prefer to use DOS mode Scandisk, as that allows interactive repair, and DiskEdit for when I'd rather do such repairs manually.

If data is to be recovered, I have a few semi-automatic tools in my Bart that are sometimes effective - but before using them, I prefer to copy off files and do a BING image backup of any NT-family partition that is to remain bootable.

I usually keep core user data on a 2G FAT16 volume, so if that requires data recovery, it's small enough to peel off as raw CDR-sized slabs using DiskEdit. I can then reformat the stricken data volume and get the PC back into the field, while I operate on the volume as pasted onto a different and working hard drive. FAT16's large data clusters mean that any file which fits in a single cluster can be recovered intact even if the FATs are trashed.
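
The "peel off raw slabs" step doesn't have to be done by hand in DiskEdit; the same idea can be sketched in a few lines of Python, reading the raw logical volume in CDR-sized pieces. The drive letters, slab size and file names below are examples only, and the source volume is only ever opened for reading.

    # Minimal sketch: copy a small (e.g. 2G FAT16) data volume off as raw,
    # CDR-sized slabs for later offline recovery work. Read-only on the
    # source; assumes admin rights and the Windows \\.\X: raw-volume naming.
    import os

    SOURCE = r"\\.\E:"            # the stricken data volume (example)
    OUTDIR = r"F:\recovery"       # somewhere with enough space (example)
    SLAB = 650 * 1024 * 1024      # roughly one CDR per slab

    def peel_off(source, outdir, slab_bytes=SLAB, chunk=1024 * 1024):
        os.makedirs(outdir, exist_ok=True)
        index, written, out = 0, 0, None
        with open(source, "rb", buffering=0) as vol:
            while True:
                data = vol.read(chunk)
                if not data:
                    break                              # end of volume
                if out is None or written >= slab_bytes:
                    if out:
                        out.close()
                    out = open(os.path.join(outdir, "slab%03d.bin" % index), "wb")
                    index, written = index + 1, 0
                out.write(data)
                written += len(data)
        if out:
            out.close()

    if __name__ == "__main__":
        peel_off(SOURCE, OUTDIR)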

Malware management

A mOS will often have to work on infected systems, so it must never run code from them unless the user explicitly initiates this. That requirement goes beyond not booting from the hard drive, to not including the hard drive in the Path, and not handling material on the hard drive in a "rich" enough way to expose exploitable surfaces.

A mOS should not "grope" the hard drive for other reasons, e.g. in case some of the material includes bad sectors that would bog the mOS down in retry loops, or cause it to crash on deranged file system logic. When your file manager of choice lists files, you want no cratching in file content for icons or metatdata.

Standard Bart is safe in this regard. There's no "desktop" in the hard drive file system sense, and the file managers that are included do not grope metadata when they "list" files. However, many Bart projects use XPE or similar to improve the UI by using Explorer.exe as the shell; I prefer not to do this, because doing so may expose exploitable surfaces.

A mOS should perform no automatic disk access - thus no indexing service, no System Restore, no resident antivirus and no thumbnailing.

Many malware scanners and integration checkers require registry access, and that is complicated when you have booted from a different OS installation. If simply used as-is, these tools would report results based on the Bart CDR's registry, not the one on the hard drive.

The solution for Bart is the RunScanner plugin. This redirects registry access to the hard drive installation for the tool that is run through it, but not for child processes that this tool may launch. There are parameters to specify which hives to use, and to delay the switch from Bart to hard drive hives so that the tool can initialize itself against the former before use on the latter.
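
RunScanner does this transparently for the tool it launches, but the underlying idea - point at the hard drive's hives, not the booted mOS's - can be illustrated with plain reg.exe. The sketch below is not how RunScanner works internally; it just loads the hard drive installation's SOFTWARE hive under a temporary key, queries a Run key from it, and unloads it again. Paths are examples, and reg.exe must be available in your build.

    # Illustrative only: inspect the hard drive installation's registry from
    # a mOS session by loading its SOFTWARE hive under a temporary key.
    # This shows the general idea of hive redirection, not RunScanner itself.
    import subprocess

    HIVE_FILE = r"C:\Windows\system32\config\software"   # hard drive's hive (example path)
    MOUNT_AT = r"HKLM\HD_SOFTWARE"                        # temporary mount point (made up)

    def query_run_key():
        subprocess.check_call(["reg", "load", MOUNT_AT, HIVE_FILE])
        try:
            result = subprocess.run(
                ["reg", "query", MOUNT_AT + r"\Microsoft\Windows\CurrentVersion\Run"],
                capture_output=True, text=True)
            print(result.stdout)
        finally:
            subprocess.call(["reg", "unload", MOUNT_AT])

    if __name__ == "__main__":
        query_run_key()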

Any tests that rely on run-time behavior (such as LSPFix, some driver and service managers, and most rootkit scanners) will not return meaningful results during a mOS session (unless you wish to test the behavior of the mOS). In particular, drivers and services may be listed as a mixture of "live" and registry-derived results, thus blending these from the mOS and the hard drive. Interpret such results with care.

Any changes you make from the mOS will not be monitored by the hard drive installation. This is generally desirable, as it prevents malware intervention, or Windows itself updating registry references so that malware may remain integrated. But it also means no System Restore undoability, and quarantined material from various scanners may be lost, and/or may not work when attempts are made to restore it later.

For this reason, I usually scan to kill when dealing with intrafile code infectors and other hard-core malware, but scan to detect only, when it comes to commercial malware that I expect to pose more problems due to botched removal than malicious persistence. I defer clean-up of those to a later Safe Mode Cmd Only boot, so that undoability is maintained.

When it comes to rootkits, these are exposed to normal scanning just like any other inert file. Tools that aim to detect rootkit behavior will not have any such behavior to detect, unless the mOS has triggered the malware into action. It can also help to save integration checks (such as HiJackThis or Nirsoft utility logs) as redirected by RunScanner and compare these with logs saved from Safe Mode or normal Windows. Unexplained differences may suggest rootkit activity during your "Safe" or normal Windows sessions, unless the mOS tests were done based on the mOS's registry rather than the hard drive's hives.
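
The comparison itself can be as crude as a set difference between two saved logs. Here's a sketch, assuming plain-text logs (HiJackThis or similar) saved from the RunScanner-redirected mOS run and from normal Windows:

    # Crude sketch: diff two integration-check logs (e.g. HiJackThis output),
    # one saved via RunScanner from the mOS, one saved from normal Windows.
    # Entries visible from the mOS but missing in normal Windows may point to
    # something hiding itself while Windows runs; interpret with care.
    import sys

    def load(path):
        with open(path, "r", errors="ignore") as f:
            return set(line.strip() for line in f if line.strip())

    def compare(mos_log, windows_log):
        mos, win = load(mos_log), load(windows_log)
        print("== In mOS-redirected log only (possibly hidden from Windows) ==")
        for line in sorted(mos - win):
            print(" ", line)
        print("== In normal Windows log only ==")
        for line in sorted(win - mos):
            print(" ", line)

    if __name__ == "__main__":
        compare(sys.argv[1], sys.argv[2])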

Beyond the mOS session

A mOS disk can be useful even when not being used as a mOS. For example, it can Autorun to provide tools for use from Windows, be used as storage space for updates and installables, and can operate as a diskette builder for tasks the mOS cannot do from itself.

As an example of the last, my own Bart CDR can spawn bootable diskettes for BING, RAM testers, and various DOS boot disks containing various tools. The DOS boot diskettes can then access the Bart CDR and thus extend the range of available tools via an appropriate Path.

I also set up my Bart so that I can test the menu UI against the output build, even before it is committed to disk, and the installation of some tools can double up to be run from both the host system and from Bart CDRs built from it. This is accomplished mainly by careful use of base-relative paths within the nu2menu (the native shell for standard Bart) and batch file logic.

I've found nu2menu to be useful in its own right, and use a stand-alone menu to manage the entire Bart-building process - updating the scanners, selecting wallpaper and UI button graphics, editing and testing the nu2menus, accessing Bart forums and plugin documentation, and building the CDRs themselves.

10 September 2006

"...but you're Not a Programmer"

Never trust a programmer who says something can't be done (so don't worry about it)...

When programmers say something can't be done, they mean they can't see a way to do it - and after all, they made the code, so surely they would know, right?

When an interested non-programmer asks themselves if something can be done, they work from a higher level of abstraction, disregarding the details of how it might be done.

The programmer's views are informed by the intended behaviour of what they made, and may be blind to the full range of possible behaviors.

Look at the track record of exploitability that results from design safety failure; the MS Office macro malware generation, the email script generation, malware like Melissa that scripts Outlook to send itself out, and so on.

The stupidity/perfidity question (see previous blog entry, it's not Googleable yet) arises at this point, but either way, the result is the same; trust in these programmers may be misplaced. Either they weren't aware of the implications of what they created, and are thus likely to fail the lower levels of the Trust Stack, or they have a hidden agenda that fails the upper levels of that stack.

Either way, I wouldn't stop worrying because they tell me to.

Mistake or Malice?

I was going to call this "Perfidity or Stupidity", until I saw the lean number of Google hits for "perfidity", and that Chambers Dictionary can't find the word. In any case, it may be better to avoid the pejorative aspects of "stupidity" :-)

We perform the Turing Test every day (and often lose) whenever we have to consider whether material is from a human (e.g. email from a user) or a bot (e.g. email from a user's infected computer). This is a generalized identity/category test, similar to "is this my bank's site, or is it a phishing site?"

When we find something that sucks (or is downright dangerous) we also ask ourselves; were they stupid and did this by accident, or are they perfidious and did this to further a hidden and possibly malicious agenda?

This question runs as a vertical slash through the Trust Stack. Things that would be errors in the lower levels of the stack if there by mistake, would in fact be a failure in the top levels of the stack if they were there intentionally. This applies particularly at the safety and design layer of the stack, which may be where most exploitability occurs.

7 September 2006

DRM Revocation List

DRM is inherently user-hostile, acting against user interests under a cover of stealth and mystery. As such, I'd classify it as commercial malware. It's politically significant because, by design, it facilitates control over users' digital resources by global agencies. So there are problems at the top of the "trust stack" - but this post isn't about that.

One of the features of DRM is the revocation list concept. This is a list that controls which applications and devices are allowed to work with DRM-protected material, and the EUL"A" will typically allow it to be updated in real time as the list's originators see fit.

The idea is that if media playing device X was cracked to subvert DRM protection of content Y, then the ability to use device X would be revoked.
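
Stripped of all the surrounding infrastructure, the mechanism is just a lookup: the player identifies itself (or is identified by a hash or certificate), and playback is refused if that identity appears on the latest list. Here's a purely conceptual Python sketch - the names, format and hash value are made up and bear no relation to any real DRM system:

    # Purely conceptual sketch of a revocation check. None of these names,
    # formats or identifiers come from a real DRM system; the point is only
    # that whoever updates the list decides what may play.
    import hashlib

    # Imagine this arriving as a signed, real-time update from the list's owner.
    REVOKED_PLAYER_HASHES = {
        "0123456789abcdef0123456789abcdef01234567",   # "device X", hypothetically cracked
    }

    def player_identity(player_binary_path):
        with open(player_binary_path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()

    def may_play(player_binary_path):
        return player_identity(player_binary_path) not in REVOKED_PLAYER_HASHES

    if __name__ == "__main__":
        print("Playback allowed:", may_play(r"C:\Program Files\SomePlayer\player.exe"))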

For now, I'll leave aside the obvious questions, such as:
  • Who controls the list?
  • Who else controls the list, i.e. as associates or legally-mandated?
  • Who else controls the list, by hacking into the list updates?
  • How well-bounded is the list mechanism to what it is supposed to do?
  • Who is accepted as a DRM content provider, and on what basis?
Instead, let's consider a possible obverse of DRM revocation - as a way to disarm media providers found guilty of exploiting users.

For example, if a media provider was caught dropping open-use rootkits from "audio CDs" (hey, that would never happen, right?) one possible remedy would be to revoke all of that provider's rights over all of their material. In essence, such a provider would be found unfit to exert any sort of control over any users, and be swept off the DRM playing field.

Obviously, this would materially reduce the value of that media provider to the artists who are contracted to it - in effect, the penalty undermines the contract with the artist, because the provider can no longer protect the artist's content. So for a certain period (say, a year) the artist has the right to drop their obligations to the provider and seek a new contract elsewhere. The reverse right does not apply, i.e. the provider cannot drop the artist if the artist chooses to stay.

Further, it has to be accepted that all existing protected material from that provider is now unprotected - so for a similar period, artists can sue the provider for damages, either as a class group or outside of any class action.

If this would seem to tip the scales to the extent that the media provider's business would be smashed, then fine. After all, were the provider to be an individual caught "hacking", they'd likely lose their livelihood and do jail time - why should larger-scale criminals get off more leniently? Do we really want to leave known exploiters in the provider pool?

I'll bet there are no plans in place to use DRM revocation lists to defend users' rights in this manner, even though it's technically feasible. That speaks volumes on why one should IMO reject this level of real-time DRM intrusion. On the other hand, once you open up DRM revocation for broader use, why not use it to apply global government censorship, etc.? After all, there's nothing to limit it within the borders of any particular jurisdiction.