However, there is one problem: not everything is data mining.
To clear up this mess a bit, in what follows I list and explain several activities that are sometimes (mistakenly) called “data mining”.
“Data extraction software can enable agencies to collect data on the race, gender, and ethnicity for the person(s) owning the majority of rights, equity, or interest in a business.” (Mozenda)
My definition is simple: you get the data from somewhere with some data extraction program. What you do with that data afterwards is not relevant.
Reporting is making a report: “A Report is a piece of information describing, or an account of certain events given or presented to someone”. (Wikipedia)
“Reporting is just a genre of writing, alongside essays and stories, and bloggers most certainly fall into that genre. Imho, when they talk about reporting on a show like Frontline, they mean the process a reporter goes through.” (Scripting.com)
This seems a bit more complicated than data extraction. I would say: “extracting, from whatever sources of data or information, the pieces that are sufficiently important, and structuring and presenting them so they can be communicated to your audience, customers, boss or any other party”.
My definition: reporting is not showing raw data, but some communicable description of it. This can be in the form of tables, charts, structured drawings, or simply words.
“methods to collect, analyze and interpret data” (Nebraska university)
This is a very broad definition, and it obviously has a lot to do with data.
For me, apart from “data”, the most important words here are “science”, “methods” and “interpretation”. Statistics is not just extracting data or reporting; here we have to do better.
Hence my definition: we use one or more mathematical methods to extract the right data, to interpret the data, to draw conclusions based on mathematics, and to present these results and conclusions.
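To make the difference between these three activities a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tiny `RAW` data set and the function names `extract`, `report` and `describe` are hypothetical, and a mean with a standard deviation is only one very simple example of a "mathematical method".

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice it would come from a file, a database or a web page.
RAW = """region,sales
north,120
north,135
south,90
south,110
"""

def extract(text):
    """Data extraction: get the rows out of the source, nothing more."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["region"], float(row["sales"])) for row in reader]

def report(rows):
    """Reporting: turn raw rows into a communicable summary (totals per region)."""
    totals = {}
    for region, sales in rows:
        totals[region] = totals.get(region, 0.0) + sales
    return "\n".join(f"{region}: {total:.0f}" for region, total in sorted(totals.items()))

def describe(rows):
    """Statistics: apply a mathematical method (here mean and sample standard deviation)."""
    values = [sales for _, sales in rows]
    return statistics.mean(values), statistics.stdev(values)

rows = extract(RAW)
print(report(rows))
print(describe(rows))
```

The point of the sketch is only the separation of roles: `extract` does not care what happens to the rows afterwards, `report` produces something you can hand to a reader, and `describe` is where a mathematical method and its interpretation come in.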
This is the most difficult one, and the most misunderstood.
“the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management.” (Wikipedia)
“the process of analyzing data from different perspectives and summarizing it into useful information” (UCLA Anderson)