28 May 2005

Today's Link...

...is one from before...

http://topicdrift.blogspot.com

...repeated as the last couple of posts have been brilliant, IMO. I'd offer to bear her children, were it not for a certain biological escape clause :-)

23 May 2005

Writing a Decent Application, Part 1

Heh, this is where a non-coder (well, OK; ex-coder) sounds off on programming. File under "teaching grandma to suck eggs" if you like, unless you find yourself asking "I make what I think is pretty good software, yet my clients shout at me and go off and use something else, and I can't figure out why".

Features may make folks warm to a product, but there are two things that make them hate a product and never want to use it again:

1) It corrupts or loses data

2) It acts beyond the user's intention

Sure, "difficult to use" issues may cause folks to bounce off a product, but nothing else engenders pure hatred as the above two crises will do. Today's post goes about (1).

User Data

User data is the only component that cannot be replaced by throwing money at the problem. It's unique to the user, and should be treated with the utmost respect. User preferences and settings should be included within this umbrella of care.

If your application creates and manages user data, then you have to ensure that the full infrastructure exists to manage that data, including backup, transfer between systems, integration into other data sets, and recovery or repair.

The easiest way to do that is to use a generic data structure that already enjoys such infrastructure, to ensure the user can control where the data lives, and to keep the data set free of infectable code, version-specific program material, and other bloat. This way, the user can locate the data within their existing data set, and it will be backed up along with that.
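
To make that concrete, here's a minimal sketch of the idea, assuming a hypothetical note-taking app; the file name, format (plain JSON) and folder arrangement are mine for illustration, not anything prescribed:

# Keep user data in a plain, documented format (JSON here), in a location the
# user chooses, with no executable content mixed in.
import json
from pathlib import Path

def save_notes(notes: dict, data_dir: Path) -> Path:
    """Write the user's notes to a single human-readable JSON file.

    data_dir is chosen by the user (e.g. somewhere inside their Documents
    folder), so ordinary backup and sync tools pick it up with everything else.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / "notes.json"
    target.write_text(json.dumps(notes, indent=2), encoding="utf-8")
    return target

def load_notes(data_dir: Path) -> dict:
    """Read the notes back; an empty set if the file doesn't exist yet."""
    target = data_dir / "notes.json"
    if not target.exists():
        return {}
    return json.loads(target.read_text(encoding="utf-8"))

Because it's one readable file in a folder the user already backs up, any generic copy, backup or sync tool can manage it; no special tooling required.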

Data Survivability

If you have to create a proprietary binary file structure, then it's best (from a recovery perspective) to document this structure so that raw binary repair or salvage is possible. When the app handles the data, it should sanity-check the structure and fail gracefully if need be, with useful error messages. It's particularly important not to allow malformed data to overrun buffers or get other opportunities to act as code.
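
As a sketch of what "sanity-check and fail gracefully" might look like, here's a reader for a made-up binary format (a 4-byte magic value, a version byte, then a length-prefixed payload); none of this is a real format, it just shows every field being validated before it is trusted:

import struct

MAGIC = b"MYDB"                  # hypothetical file signature
MAX_PAYLOAD = 64 * 1024 * 1024   # refuse absurd lengths rather than allocate blindly

class CorruptFileError(Exception):
    """Raised with a human-readable message when the structure is malformed."""

def read_record_file(path: str) -> bytes:
    with open(path, "rb") as f:
        header = f.read(9)  # magic (4) + version (1) + payload length (4)
        if len(header) < 9:
            raise CorruptFileError(f"{path}: truncated header")
        magic, version, length = struct.unpack("<4sBI", header)
        if magic != MAGIC:
            raise CorruptFileError(f"{path}: not a MYDB file (bad signature)")
        if version > 1:
            raise CorruptFileError(f"{path}: written by a newer version ({version})")
        if length > MAX_PAYLOAD:
            raise CorruptFileError(f"{path}: implausible payload length {length}")
        payload = f.read(length)
        if len(payload) != length:
            raise CorruptFileError(f"{path}: payload shorter than declared")
        return payload

The point is that a truncated or mangled file produces a clear message the user (or a repair tool) can act on, rather than a crash, silent corruption, or data being treated as code.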

Large, slowly-growing files pose the greatest risk of fragmentation, long critical periods during updates, and corruption. Don't stick everything in one huge file, such as a .PST, so that if this file blinks, all data is lost. It's also helpful to avoid file and folder names with the first 6 characters in common (as these generate ambiguous 8.3 names) and deeply-nested folders. Ask yourself what you'd like to see if you had to do raw disk data recovery, and be guided by that.
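
A toy example of the 8.3 point, assuming a made-up archive naming scheme: put the part of the name that varies first, so names stay distinct even if they are ever truncated to their short forms:

def archive_name(year: int, month: int, topic: str) -> str:
    # Bad:    "newsletter-archive-2005-05.txt" -> many files collapse to "NEWSLE~n"
    # Better: lead with the part that varies from file to file.
    return f"{year % 100:02d}{month:02d}-{topic}.txt"

print(archive_name(2005, 5, "newsletter"))  # "0505-newsletter.txt"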

Data Portability

Data portability and integration work best if you avoid version dependencies and "special" names. For example, an email app that gives "special" structure to InBox and OutBox is going to be a problem if one wants to drop those files in from another system so they can be integrated into an existing data set. It should be possible to rename them so they don't overwrite what is already there, and have the application see them as ordinary extra mailboxes.

From a survivability perspective, it should be possible to manage the data from the OS, i.e. simply drop the files into place and the application will see and use them. If you fuss with closed-box indexing or data integration, then you're forced to provide your own special import and export tools, and things become very brittle if there is only one "special" code set that can manage the data.
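
A sketch of the "drop the files into place" approach, assuming a hypothetical mail app that keeps each mailbox as its own *.mbox file in one folder:

from pathlib import Path

def discover_mailboxes(mail_dir: Path) -> dict[str, Path]:
    """Map mailbox names to files, purely from what is on disk right now."""
    return {p.stem: p for p in sorted(mail_dir.glob("*.mbox"))}

# e.g. discover_mailboxes(Path.home() / "Mail") might yield
# {"InBox": ..., "OutBox": ..., "old-laptop-inbox": ...}

Copy in a renamed InBox from another machine and it simply appears as another mailbox; nothing gets overwritten, and no closed-box index has to be rebuilt by a "special" tool.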

Don't forget whose data it is - it's the user's, not yours. Warn the user about consequences, but it is not your data to "own" in the sense that nothing else may touch it, or that anything outside the app that nudges the data should leave the app unable to use it.

Data Security

What makes for good survivability may be bad for data security. If there's a need to keep data private, you may have to trade away survivability to the point where you assume full responsibility for data management - something not to be taken lightly.

Data Safety

This is a different issue from security, and is about limiting the behavior of data to the level of risk the user anticipates. But that's another day's bloggery :-)

9 May 2005

I Live... Again

The inevitable first posting drought has come and gone, and so the immortal opening line from Blood (yes, I do think of computer games as a "literature" - but that's another topic) applies!

I needed to think about other things for a while, and so I did, though not the practical things I should have been attending to (e.g. getting paid). I had a night down south near Cape Point, where a wild genet popped into our bridge game to eat chicken with us, and that was the first 24 hours without mains electricity I've spent in several years.

And I read Paul Hoffman's "The Man Who Loved Only Numbers", and that made me think about a lot of things.

Identity

For example, there's Rudy Rucker's assertion that "information" constitutes a fourth "mind tool", up there with number/counting/measurement, space/geometry, and infinity/wholeness. The core thing I come away from Rucker with is this: If you were to accurately identify and reproduce every piece of information about some thing, would that thing be a true clone, and would it have the same consciousness? If it had consciousness, would it be the same one, or a new one?

The answers that came to mind were that for the new version to exist separately from the original, some information would have to change - i.e. it would have to be displaced from the original in space, time, or some other dimension, else it would be the original instance. And once spawned, different "life experience" will cause that instance to diverge from the common starting point, at least initially.

Is the path forward in time always one of divergence? I don't think so, but the initial ratio of sameness vs. difference leaves divergence as the only direction it can initially move in - unless there's some magic "pull" to nullify the single difference that distinguishes this instance from the original. Intuitively, one feels that such closeness has to be weighted one way or the other, i.e. that there's either an attraction to "become one" or a repulsion pushing it until it is as distant from the original as it is from everything else.

In practice, trying to identify all the information in an object is a bottomless pit. Just as infinity is qualitatively different from "very large", and behavior at the speed of light is different from what we experience as "very fast", so the behavior of near-identical instances may differ from that of crudely similar objects. When we work with similar things in this way, they are as artificial as the frictionless masses of physics experiments: magical objects with no properties or content beyond the information we define about them. What is the number "10" in "real life", anyway?

Complexity

This brings me to a recurring concept that underlies my deep interest in computers. We understand most things in one of two ways: from the "top down", i.e. observing crudely visible behavior and delving down into detail, or from the "bottom up", understanding the detail and building up to complexity. This is similar to the difference between bridge and chess, and may explain why few folks are equally good at both. You can also think of this as a core axis within politics: seeing the world in terms of available resources (the cards in a bridge deal) or in terms of what is needed (the end-point goal of chess). These seem closest to the "right" and "left" perspectives.

It's rare that we approach things from the bottom up, which is the perspective of the creator rather than the created. Or rather, when we create things, they rarely get complex enough to be interesting.

For example, if we build an axe out of wood and iron, most of its characteristics are those of the natural constituents; the "value added" by making the axe creates little that is new to study and ponder.

There's a fundamental shift that happened when humans began to compress time. It's one thing to visualize a sequence of visible events, and reproduce these in the same time scale - e.g. from throwing a stone to building a catapult that throws a stone for you. It's another to build something that does things faster than you can observe, e.g. the way an internal combustion engine whirrs along at 5000 RPM. In essence, a computer is a device that compresses time, so that it can act as a brain proxy fast enough to "think" inside small time scales, e.g. processing sound waves as the waves curve their way along.

This stretches what we would normally consider "natural" science. The behavior of electrons passing through a transistor gate - is this natural science, given that transistors don't occur in nature? Yes, in that it is underpinned by the same deeper laws that creating a transistor merely makes visible. Perhaps science is a matter of creating new things to improve visibility?

Computers are complex enough to be interesting. We can no longer deterministically predict what will happen based purely on the initial conditions we started with. That observation is less profound than it sounds, if you consider Turing's Halting Problem; it's just a restatement that you can't solve a heavyweight computational problem with a lightweight computer.

Or is it? Does a computer generate only the rational results of its computations, or does it escape the system as Goedel might predict? If the latter, then does the computational power plus entropic effects constitute a "stronger" computer than the original design, and therefore make it impossible for a computer to predict its own end state?

One answer may be found by asking the question: Can the state of a computer be captured, projected onto another computer, and result in identical future behavior in the two systems? If not, why not? This is similar to my post-Rucker musings mentioned above.

Well, you can express the contents of RAM or hard drive as a large integer, by considering its entire address space as a single binary number. If you capture all such numbers, i.e. RAM, hard drive, CMOS and PnP NVRAM, SVGA memory, processor and other DSP registers, RTC etc. you can claim to have captured the entire digital layer of abstraction. If you trust the underlying analog layer of abstraction to properly support the digital layer without error, then you should be off to a flying start.
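
That "single binary number" claim is quite literal. As a toy illustration (the snapshot below is an invented 8-byte buffer, not a real memory dump):

snapshot = bytes([0xDE, 0xAD, 0xBE, 0xEF, 0x00, 0x12, 0x34, 0x56])  # pretend state capture

# The whole buffer, read as one binary number, is a single (very large) integer...
as_integer = int.from_bytes(snapshot, byteorder="big")

# ...and the mapping is lossless: in this view, the integer *is* the state.
assert as_integer.to_bytes(len(snapshot), byteorder="big") == snapshot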

The trouble is, even if you test the hardware for billions of operations to determine how well the analog layer supports the digital, your confidence in future reliability can only tend towards unity. In fact, test too long, and every system will burn out - by definition, that is "testing to destruction"! There's something almost Heisenberg about the truism that you cannot test the reliability of an object without destroying it, and perhaps it's a restatement of identity; that identity precludes two things from being identical, and still being two (different) things.

Number theory

You can think of the digital layer of abstraction as akin to integers, while the underlying analog world of voltages, wires and milliseconds is akin to real numbers. I'm about to swerve off the road at this point, so get ready to change gears...

Perhaps we don't really understand rational numbers, let alone real ones; we think of them as "integers plus fractions". They aren't; they are simply what they are. Integers are no more a special case than, say, fractions with 455 as the divisor, except for the one thing that defines the rationals: every series based on any divisor will include all the integers.

So perhaps the rational numbers are merely one of several special-case subsets of the real numbers, depending on what you choose. What makes a real number irrational is that a series built on it fails that test: stepping along in multiples of it, you never land on the integers. So maybe we need to "get real".
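
As a small check of that "every divisor series includes all the integers" remark, using exact fractions (the divisor 455 is just the arbitrary example from above):

from fractions import Fraction

step = Fraction(1, 455)                          # the series 0, 1/455, 2/455, ...
series = {n * step for n in range(455 * 5 + 1)}
print(all(Fraction(k) in series for k in range(6)))   # True: 0 through 5 all appear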

We intuitively consider real numbers with integers overlaid, and I think we may get blinkered there. In fact, concepts such as "order" and "entropy" may be nothing more than artefacts that result from this reality distortion. Conceptualizations as ugly as 10-dimensional string theory make me want to step back and look at our mind tools, starting here.

As a mind experiment, consider that integers are just arbitrary values within the real number set. If there is an infinite number of real values between one integer and the next, there may not even be any special ordinal significance to them; perhaps we can choose some other series as a way of visualizing the scaling up of successive real number values.

We've already found this helpful, in that certain things snap into focus (and elegance) only when a different scaling is used, e.g. logarithmic. We've more or less been forced to abandon the comfort of integers when considering sound (decibels), earthquakes (the Richter scale) etc.

Now look back at integers from a chaos theory perspective. As I understand it, chaos theory shakes up the notion of things always proceeding from complexity to... well, that's the point; often chaos isn't the featureless grey sludge you'd expect; new complexity (order?) may arise.

At first blush, integers are all alike, differing only in their distance from each other on the number line. But in fact, there's a wealth of complexity that develops fairly rapidly when you consider primes, and all the other patterns and series that weave through the integers.

Then we find that certain universal constants, such as e or pi, are not only not integers, but are not even rational. Could we have been using the wrong yardstick all along? What would the universe look like if visualized on an axis scaled by the prime number series?

Order and entropy

I love the concept of entropy, but I'm beginning to wonder what it is. We usually think of entropy as that which tends to disorder, but what is order? Is a glass vase more "ordered" than a very specific collection of glass shards? Or does the order lie in specificity, and can entropy be redefined as a drift from that specificity? Is the glass vase just a particular real number that happens to be an integer, but in all real respects just as ordered as a particular collection of shards? Is that collection of shards a particular real number that happens not to be an integer?

For that matter, could the dropping of a vase to produce those specific shards be considered as a highly specialized manufacturing process - more so than the making of the vase itself, if a lower degree of specificity defines the vase as what it must be to be that vase?

I think it may be useful to look at this by asking a seemingly unrelated question: What is the relationship, if any, between information (or "order") and energy? The more computers compress time, the more energy management becomes a problem.

Perhaps order is specificity, and specificity is energy. Perhaps the universe is an infinity of the unspecified, from which things come into existence by specifying them, and that the process of specifying things is an "uphill" (counter-entropic) one that requires energy. Perhaps mass is simply the most familiar form of specificity, or existence.

Well, that's enough for one post; I'll stop here for now :-)