Data mining is hyped. As a result, everything gets called data mining. I suppose some people even call reading a newspaper to find some interesting information “data mining”.
There is only one problem: not everything IS data mining.
To clear up this mess a bit, I list and explain below several activities that are sometimes (mistakenly) called “data mining”.
Data extraction
“the act or process of retrieving data out of (usually unstructured or poorly structured) data sources for further data processing” (Wikipedia)
“Data extraction software can enable agencies to collect data on the race, gender, and ethnicity for the person(s) owning the majority of rights, equity, or interest in a business.” (Mozenda)
My definition is simple: you get the data from somewhere with some data extraction program. What you do with that data afterwards is not relevant.
Reporting
Reporting is making a report: “A Report is a piece of information describing, or an account of certain events given or presented to someone”. (Wikipedia)
“Reporting is just a genre of writing, alongside essays and stories, and bloggers most certainly fall into that genre. Imho, when they talk about reporting on a show like Frontline, they mean the process a reporter goes through.” (Scripting.com)
This seems a bit more complicated than data extraction. I would say: “extracting, from whatever sources of data or information, those pieces of information that are sufficiently important, and structuring and presenting them so they can be communicated to your audience, customers, boss or whatever other party”.
My definition: reporting is not showing raw data, but a communicable description of it. This can take the form of tables, charts, structured drawings, or simply words.
Statistics
“statistics is … a distinct mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data.” (Wikipedia)
“methods to collect, analyze and interpret data” (Nebraska university)
“collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting and then drawing conclusions” (Akila)
This is a very broad definition, and it obviously has a lot to do with data.
For me, apart from “data”, the most important words here are “science”, “methods” and “interpretation”. Statistics is not just extracting data or reporting; here we have to do better.
Hence my definition: we use some mathematical method(s) to extract the right data, to interpret the data, to draw conclusions based on mathematics and to present these results and conclusions.
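To make the three definitions so far a bit more concrete, here is a minimal Python sketch of how they differ on one tiny data set. Everything in it is made up for illustration: the CSV content, the column names and the function names `extract`, `report` and `describe` are assumptions of mine, not an existing tool. The mean and standard deviation are only a small stand-in for "some mathematical method".

```python
import csv
import io
import statistics

# Hypothetical raw source: a small CSV export (all values are invented).
RAW = """customer,region,amount
a,north,120
b,north,80
c,south,200
d,south,40
e,north,100
"""


def extract(raw_text):
    """Data extraction: get the rows out of the source, nothing more."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [{"region": r["region"], "amount": float(r["amount"])} for r in reader]


def report(rows):
    """Reporting: condense raw rows into a communicable summary (total per region)."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def describe(rows):
    """Statistics: apply a mathematical method (here: mean and sample standard deviation)."""
    amounts = [row["amount"] for row in rows]
    return statistics.mean(amounts), statistics.stdev(amounts)


rows = extract(RAW)      # extraction stops here: we merely have the data
print(report(rows))      # reporting: a small table somebody can read
print(describe(rows))    # statistics: numbers that support an interpretation
```

Note that `extract` does not care what happens to the rows afterwards, `report` only restructures them for an audience, and only `describe` brings in mathematics. None of the three searches for patterns in a large data set.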
Data mining
This is the most difficult one, and the most misunderstood.
Some definitions:
“the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management.” (Wikipedia)
“the process of analyzing data from different perspectives and summarizing it into useful information” (UCLA Anderson)
I love this article. I am an Econometrician and I am sick of hearing how easy it is to data mine these days. Without technical knowledge of data all you are doing is playing with tools; you will not be able to determine any usable results. Thank you for writing this and if you have a minute check me out on Twitter, I am @data_nerd (Carla)
By: Carla Gentry CSPO on July 11, 2011
at 5:34 pm
Carla,
Thanks for visiting my blog. I’m glad you like this post.
And, yes, from now on I follow your tweets.
Zyxo
By: zyxo on July 11, 2011
at 7:40 pm
This is the very best definition of data mining, one that everybody can understand.
By: wingara on August 12, 2011
at 2:35 pm
I like your definition of data mining. I really understand what data mining is now. How can I reference you when using your definition?
By: Angel Gboraloo on November 9, 2015
at 10:53 am
Simple: just use it and include the link to my blog.
Glad you like it!
Zyxo
By: zyxo on November 20, 2015
at 8:37 pm