Apr 22 2011

It has become a truism to assert that we are witnessing an information explosion, that we suffer from info-glut or information overload. Relief, apparently, comes either from filtering or summarising all this information, or from storing it in an accessible form for later use. I see the problem differently: as a data deluge, combined with an inability to allow ourselves to be informed by it.

The other night, I gave a presentation to the Institute for Information Management (IIM), Sydney Branch. My talk was one of two collectively billed as “Getting Up To Speed with Text and Data Analytics”.

The talk gave me an opportunity to view information not as ‘stuff’ to be summarised or filtered but as an event to be unleashed. In giving the talk and in the lively discussion that followed, a number of insights emerged which I would like to share and, in so doing, clarify.

Historical Parallel – the abacists vs the algorists

The image that came to me after the discussion was from an earlier time, Renaissance Italy, when battles raged over two ways of dealing with the financial data generated by double-entry bookkeeping, one of the great inventions of the time. The battle was fought between two types of ‘reckoners’: the abacists, who used the abacus, and the algorists, who calculated in writing with the Hindu-Arabic numerals that reached Europe alongside the Arabic mathematics called ‘algebra’. The new methods caused much consternation at the time: they were so different, even allowing ‘nothing’ (in the form of zero) to exist! The woodcut above depicts one of these competitions, with the ‘spirit of arithmetic’ overseeing matters.

As it turned out, the algorists won the day, even if their reckoning initially took more time. Abacus-based calculations of compound interest on loans were found to under-estimate the correct amounts, so merchants took to the new mathematics, even if the universities of the time lagged behind! I came across a cute anecdote of a German merchant in the 1400s who, pondering his son’s future, was advised by his fellow merchants that if he wanted his son only to be able to add and subtract, a German university would do, but if he wanted him to use multiplication and division, he had better go to the merchants of Venice!

The deeper reason for the success of mathematics over method lies in the generality of the former and the false confidence that ‘methods’ create: follow the rules correctly and the results must be right, with no need for analysis, intuition, or wider views.

The cockpit of illusions

Today we again have a new kind of abacist: the ‘information professional’ who yearns to deliver information on time, any time, to whoever wants it, given, of course, the right platform (the new ‘abacus’). On this view, information behaves like water: it is to be filtered, pumped around, made available ‘on tap’ and ready to quench the thirst for insight and oversight.

Current Business Intelligence tools promise ‘cockpit’ views and plasma-screen dashboards of entire enterprises, where managers are made to feel ‘in the know’ as they survey the panorama of orchestrated business processes without the smell of sweat, the treadmill of toil, or the sound of alienated voices emerging from ‘below’.

Many an information professional has been lured into the role of handmaiden to such data-based chimeras, made all the more alluring by the new breed of user-friendly data-analytic and visualisation tools. Apparently, users can point and click their way to deep, data-derived discoveries! If only data were so … serviceable!

The ascent of records management

By good fortune, the people at this talk were not of this ilk. They displayed the kind of patience that comes from working with gritty, even grotty, information resources that need to be cleansed, marshalled and then accessibly housed. The shift in thinking I wanted to explore lay in admitting to a data deluge, not an information overload. The ramification is that we still find it inordinately difficult to release all the information contained in data as a proliferation of self-propagating events of ‘informing’.

Core concept: Information as event, not substance

Information, as a noun, is I think best seen as describing a rapid change, like ‘explosion’. It refers to an event, not a stuff. Just as explosions dramatically alter physical form, so information dramatically alters semantic form, that is, what we believe, perhaps without even thinking.

The verb form of information (to inform) captures this nicely: the difference between telling someone something and informing them of something lies precisely in the assumption that, in the latter case, the person communicating takes the trouble to check that the other has indeed understood what has been communicated. An informed person is thus ‘re-oriented’ to what is going on. The aim of the game might be said to be to “inform beliefs with believable information”.

Sometimes explosions just happen, just as situations sometimes surprise us. But the chemistry and physics of explosions has led to the creation of explosives: materials ready-made to explode, but only when detonated. We need an equivalent term for information: I suggest informatives. I use informatives to refer to carriers of information-as-informing-event. Just as an engineer who sees an obstacle to an objective might deploy well-placed explosives, so when we see obstacles to insight (typically confusions, misunderstandings, prejudices or opinions) we place ‘informatives’ to dissolve them, so that beliefs are better adapted to the situation. Part of the challenge here lies in showing that ‘informatives’ is not simply a fancy name for ‘fact’.

Much of what people mean by having information, even too much of it, really refers to having informatives, just as the military might refer to having explosives rather than explosions. With some flurried pen movements over a whiteboard, I tried to depict four distinct informatives: Experience, Expertise, Data, and Fidelity. The result of informatives ‘going off’ might be so-called facts, but we need to focus on the capabilities, not their activation. And just as explosives may pack a big or little punch or simply fizzle out, so too these informatives may or may not inform. Informatives can be ‘empty’: people can get so focused on an intense experience that they dwell on its intensity rather than on what it is about; experts can get so full of themselves that they imagine they can pontificate on a matter without really delving into the nuances of the situation that posed the question in the first place; data itself might just be noise; and finally, fidelity may draw on transcendental sources, making it seem that answers to any question can be deduced (e.g. among fundamentalists). Here’s what I was trying to draw on the whiteboard:

Data will inform to the extent that it has been captured in records containing relevant fields that describe important objects within the flux and flow of events. These days the vastness of data collections has yet to hit home. Data are mute because they contain only nouns (objects) and adjectives (fields). Putting verbs back in (i.e. ‘verb-alising’ data) requires particular kinds of graphs and associated elaborations (para-graphs). Experience builds on similarities between objects and events (the rows of a data table). By contrast, data informs expertise through the differences and similarities found between fields (the columns). The more data can be modelled through analytical processes, the more it can inform expertise. Which fields matter, and which objects are worth tracking and tagging, is decided by fidelity: abiding and collectively relevant concerns about eventualities. Fidelity does not simply define a question; it motivates a questioning quest. It informs us of what we need to be informed about, that is, our ignorance, together with our commitment to getting to the bottom of the issue with enough resolve, cunning and imagination. (Ironically, fidelity has too often been seen, wrongly I think, as a set of ‘answers’; when that happens it cannot inform, since it does not marshal ignorances into a common plight.)
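The rows-versus-columns distinction can be made a little more concrete with a small sketch. It assumes a toy table of invented call-out records (the field names and values are hypothetical, chosen only for illustration). Similarity between rows (objects) is measured as a distance between standardised records, the ‘this case is like that one’ comparison that feeds experience; the correlation between two columns (fields) measures the co-variation that feeds expertise. Euclidean distance and Pearson correlation are simply two common choices, not the only ones.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy table: rows are objects (e.g. ambulance call-outs),
# columns are fields describing them. All values are invented.
FIELDS = ["age", "response_min", "pain_score"]
TABLE = {
    "call_1": [34, 8, 6],
    "call_2": [36, 9, 6],
    "call_3": [71, 15, 3],
    "call_4": [68, 14, 4],
}

def column(table, j):
    """Return field j as a list of values across all rows."""
    return [row[j] for row in table.values()]

def standardise(table):
    """Rescale each column to mean 0 and standard deviation 1, so no field
    dominates just because of its units (assumes no column is constant)."""
    stats = [(mean(c), pstdev(c)) for c in (column(table, j) for j in range(len(FIELDS)))]
    return {
        name: [(v - m) / s for v, (m, s) in zip(row, stats)]
        for name, row in table.items()
    }

def row_distance(a, b):
    """Euclidean distance between two standardised rows: small means similar cases."""
    return math.dist(a, b)

def pearson(xs, ys):
    """Pearson correlation between two columns: their co-variation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread

# Row view (experience): which recorded case most resembles call_1?
z = standardise(TABLE)
others = [name for name in z if name != "call_1"]
nearest = min(others, key=lambda name: row_distance(z["call_1"], z[name]))
print(nearest)  # call_2

# Column view (expertise): how do age and pain score co-vary in this toy table?
r = pearson(column(TABLE, 0), column(TABLE, 2))
print(round(r, 2))  # -0.98
```

Standardising before comparing rows matters: without it, the age column would swamp the others simply because its numbers are larger.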

Illustration: Ambulance Data

My talk referred to a few more practical cases to illustrate the above, but in this post, I wanted to clarify (to myself at least) the wider issues for data analysis in and for organisations.

The speaker who followed me, Paul Middleton, gave a superb example of the problem I sought to describe. I hadn’t realised how large an operation the NSW ambulance service has become. The service collects massive amounts of data, mainly for billing purposes: every call-out involves data collection. This same data could yield insights into many areas of health, such as how different kinds of interventions (on the run) work in different settings: different painkillers, oxygen delivery, supports, ambient sound, monitoring equipment and communication with the hospital, along with all the contextual variables relating to patient demographics, locale and injury. Paul’s summary of the analysis offered some marvellous examples of good classical statistical design, for instance trying two different approaches to a problem and comparing which was more effective.
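A classical ‘try two approaches and compare’ design with a yes/no outcome is often analysed with a two-proportion test. I don’t know which tests were actually used in the NSW work; the sketch below simply assumes a binary outcome (say, adequate pain relief or not) and uses invented counts to illustrate the pooled two-proportion z-test.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test comparing approach A with approach B.

    Returns (difference in success rates, z statistic, two-sided p-value).
    Assumes the pooled rate is strictly between 0 and 1, otherwise the
    standard error is zero.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value

# Invented counts: approach A helped 120 of 200 patients, approach B 95 of 200.
diff, z, p = two_proportion_z(120, 200, 95, 200)
print(f"difference={diff:.3f}, z={z:.2f}, p={p:.3f}")
```

A small p-value suggests the difference is unlikely to be chance alone; the size of the difference is what tells paramedics whether it is worth changing practice.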

He showed some impressive results that mean more lives can be saved: for instance, some interventions normally used in casualty can be applied by paramedics at the very site of the accident, producing better outcomes. Hospital care can then be less intensive (and less expensive), and insurance companies and governments save money.

Beyond that, more virtuous loops of support can then unfold. Part of these savings could support more data acquisition and research. Paramedics could then not only give care but also take care to collect patient data. Currently paramedics focus, naturally enough, on the patient, apparently even jotting data down on their gloves and transcribing it later onto the relevant forms once they arrive at the hospital. Some of the value the data already has in informing health practices could also fund data acquisition devices: bar code readers, voice recorders, a single storage device that brings together the data from all the instruments, and so on.

All that data can illuminate far more than how much each case cost. Summarised in terms of co-variations, the data can clearly inform medical expertise by suggesting more nuanced causal mechanisms. Displayed as a series of alerts, indicators and visualisations (of healthy functions), it informs the experience of nurses and carers. And these very paramedics and carers, insofar as they have an abiding fidelity to ‘health’ (which, alas, too often goes without saying), can provide valuable insights into what else they would like to know via such indicators and their visualisation.
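To make ‘data displayed as a series of alerts’ a little more concrete, here is a minimal sketch that turns one set of readings into alerts by checking each field against a range. The field names and ranges are placeholders I have made up for illustration, not clinical thresholds. The design point is that the ranges are plain data, so the paramedics and carers who use the alerts could say which indicators they want and where the lines should sit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    low: float
    high: float

# Placeholder ranges, invented for illustration only (not clinical thresholds).
ILLUSTRATIVE_RANGES = {
    "heart_rate": Range(50, 120),
    "spo2": Range(92, 100),
}

def alerts(reading, ranges=ILLUSTRATIVE_RANGES):
    """Return (field, value, 'low' or 'high') for each out-of-range field in one reading."""
    out = []
    for field, value in reading.items():
        r = ranges.get(field)
        if r is None:
            continue  # no indicator has been defined for this field
        if value < r.low:
            out.append((field, value, "low"))
        elif value > r.high:
            out.append((field, value, "high"))
    return out

print(alerts({"heart_rate": 130, "spo2": 95, "resp_rate": 18}))
# [('heart_rate', 130, 'high')]
```

Fields without a defined range are silently skipped, which is exactly where carers’ suggestions about ‘what else they would like to know’ would add new entries.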

Operating in this way raises many software and data-analytic issues. I depicted some of them in the talk, and I have written elsewhere about the software (R, application servers, messaging and visualisation) and about the data analysis.

I end with thanks to all those who asked questions, and special thanks to Brian Bailey for getting me to give the talk. I hope this post adds to what was discussed.
