An assertion for Christmas

Post by Pete Tabord » Fri Dec 26, 2014 10:39 pm

"There is no such thing as unstructured data"

Discuss

My view:

All data has a structure. The structure may be unsuitable for the job you are trying to do (for example, reams of English prose that you are trying to extract names and addresses from) but there must always be some structure.

Input that had no structure whatsoever would not be capable of analysis. Indeed, it would be the very definition of 'Garbage In'.

Those who talk about processing unstructured data are actually talking about processing inconveniently structured data.
Peter J. Tabord
Head of Development
Database Software Ltd.
ptabord@ffenics.com

Re: An assertion for Christmas

Post by Graham Smith » Wed Dec 31, 2014 6:10 pm

Well, let me say this about that. I have often found that, with discussions like this, the fundamental issue is not one of fact but one of meaning. For example, what does 'structure' mean in this context?

Given the data set "cat", "automobile", "sand", "Higgs boson", it is extremely difficult to see any structure unless you define structure so broadly that it can mean anything you want it to.

I could, for example, say that the structure is that these are all tangible objects, provided, of course, that you accept that the Higgs boson actually does exist and can be measured.

Perhaps the best that I can do is to say that these are all words that can be found in a dictionary, assuming your dictionary is up to date and includes theoretical physics trivia.

They are certainly four English words (Higgs boson is really a phrase where boson is an elementary particle and Higgs is the fellow who postulated its existence), so perhaps that is the best we can do when it comes to structure.

But for all intents and purposes, if there is any structure there, it's about as meaningless as any four random words can be.

Or am I missing something?
Graham Smith
DataSmith, Delaware
"For every expert there is an equal and opposite expert.", Arthur C. Clarke (1917 - 2008)
"X-Clacks-Overhead: GNU Terry Pratchett"

Re: An assertion for Christmas

Post by Pete Tabord » Thu Jan 08, 2015 9:49 am

There is a lot of structure there, and you've actually identified one of the most important parts: the words are English, and thus we can easily search on them.

This is not true in all languages, Welsh being an example. It mutates both the beginnings and endings of words, so it is very difficult to define a search that will find all occurrences of the Welsh for 'big'. We are not talking about different words meaning the same thing; we are talking about the same word being spelt in a number of different ways depending on context.
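As a rough illustration of the search problem described above, the sketch below expands a Welsh word into its possible initial-consonant mutations and builds one regular expression that matches all of them. The names `mutated_forms` and `search_pattern` are made up for this example, and it is deliberately incomplete: it assumes only the standard soft, nasal, and aspirate mutations of the first consonant, and does nothing about changes to word endings or irregular forms.

```python
import re

# Standard initial-consonant mutations (soft, nasal, aspirate where they apply).
# "ll" and "rh" are single letters in Welsh, so they are keyed as digraphs.
MUTATIONS = {
    "p": ["b", "mh", "ph"],
    "t": ["d", "nh", "th"],
    "c": ["g", "ngh", "ch"],
    "b": ["f", "m"],
    "d": ["dd", "n"],
    "g": ["", "ng"],  # the soft mutation drops the g altogether
    "ll": ["l"],
    "m": ["f"],
    "rh": ["r"],
}

# Digraphs that begin with a mutable-looking letter but do not themselves mutate.
FIXED_DIGRAPHS = ("ch", "dd", "ff", "ng", "ph", "th")


def mutated_forms(word):
    """Return the word followed by its possible initial mutations."""
    word = word.lower()
    if word.startswith(FIXED_DIGRAPHS):
        return [word]
    # Check the two-letter initials before the single letters.
    for initial in ("ll", "rh", "p", "t", "c", "b", "d", "g", "m"):
        if word.startswith(initial):
            rest = word[len(initial):]
            return [word] + [m + rest for m in MUTATIONS[initial]]
    return [word]


def search_pattern(word):
    """Compile a whole-word regex matching any mutated form of the word."""
    alternatives = "|".join(re.escape(form) for form in mutated_forms(word))
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


if __name__ == "__main__":
    # 'mawr' is the Welsh for 'big'; after a feminine noun it becomes 'fawr'.
    print(mutated_forms("mawr"))
    print(search_pattern("mawr").findall("ci mawr a cath fawr"))
```

Even this small table shows why a plain string search falls short: one dictionary word can surface under several spellings, and the search has to know the language's rules to find them all.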

Further, because the words are in a known language, we know each word only has a limited number of meanings. This would not be true of letters generated randomly.

I accept it is a glass-half-full argument, but completely unstructured data would be useless to anyone, because it would contain no meaning. So we always have something we can start with and build on. We usually have more than just the words alone: we usually know what document they are from, when they were created, who by, and so on. All this is structure and can be used.
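To make the point about surrounding information concrete, here is a minimal sketch that attaches source, author, and creation time to the four "random" words from earlier in this thread. The `Fragment` class and the `by_author` helper are hypothetical names invented for the illustration; the metadata values are simply taken from the post the words appeared in.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Fragment:
    """A piece of text plus the context it arrived with."""
    text: str
    source: str        # the document or thread it came from
    author: str        # who wrote it
    created: datetime  # when it was written


# The four words from the earlier post, tagged with that post's details.
fragments = [
    Fragment(word, "An assertion for Christmas", "Graham Smith",
             datetime(2014, 12, 31, 18, 10))
    for word in ("cat", "automobile", "sand", "Higgs boson")
]


def by_author(items, author):
    """Select fragments using only the metadata, never the text itself."""
    return [item for item in items if item.author == author]


if __name__ == "__main__":
    for fragment in by_author(fragments, "Graham Smith"):
        print(fragment.created.date(), fragment.source, "->", fragment.text)
```

The words themselves are untouched, yet they can already be grouped, filtered, and ordered by where they came from.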

Re: An assertion for Christmas

Post by Graham Smith » Thu Jan 08, 2015 7:31 pm

I guess that this probably says it best:
http://arxiv.org/abs/0909.4061
:D
 
 
