Once a university assistant told a student to write in his master's thesis: "the result of this statistical test is not significant, but probably, if we had data on more cases, it would have been significant". When I heard this, and also that a real statistics professor would judge the quality of the master's thesis, I strongly advised the student not to write this crap in his thesis. A non-significant result only means that the data at hand cannot distinguish the effect from chance; it says nothing about what more data would have shown.
This clearly illustrates one of the problems that many people have with statistical significance.
And, yes, to start with: some have no problem at all with statistical significance because they don't care. Example: I know people who test their direct marketing campaigns against a control group that is much too small and produces only a handful of "hits". Result: in a lot of cases they shout "Hurray, our campaign does a lot better than our control group!" In fact, what they see is merely random noise.
Say you run an email campaign with 100,000 e-mails and get a 5% conversion rate (5,000 people bought your product).
Then you compare this result with a much too small control group of 100 people who got no email, of whom only 3 (3%) bought your product. Conclusion: the campaign delivered two extra percentage points. WRONG. A simple 2×2 table analysis shows that you have about one chance in four of seeing a control result this low even if your email campaign had no influence whatsoever on buying behaviour!
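For readers who want to check that figure, here is a minimal sketch of one way to arrive at it. It assumes the "2×2 table analysis" is a one-sided Fisher exact test (my assumption, other tests give somewhat different numbers), and the function name `control_tail_probability` is made up for this illustration. It uses only the Python standard library.

```python
from math import comb

def control_tail_probability(campaign_buyers, campaign_size,
                             control_buyers, control_size):
    """One-sided Fisher exact p-value for a 2x2 table.

    Returns the chance that the control group contains `control_buyers`
    or fewer buyers if buyers are spread over both groups purely at
    random (that is, if the campaign has no effect at all).
    """
    total_people = campaign_size + control_size
    total_buyers = campaign_buyers + control_buyers
    total_non_buyers = total_people - total_buyers

    # Number of ways to pick the control group out of everybody.
    ways_total = comb(total_people, control_size)

    # Number of ways to pick a control group with at most `control_buyers` buyers.
    ways_low = sum(
        comb(total_buyers, k) * comb(total_non_buyers, control_size - k)
        for k in range(control_buyers + 1)
    )
    return ways_low / ways_total

# The numbers from the example: 5,000 of 100,000 versus 3 of 100.
p = control_tail_probability(5000, 100_000, 3, 100)
print(f"chance of 3 or fewer control buyers with no campaign effect: {p:.2f}")
```

The probability comes out at roughly a quarter, nowhere near the usual 5% threshold, so the "two extra percentage points" are no evidence of anything.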
And this brings us to the other problem with statistical significance. If the probability of finding a pattern just by chance is below 5%, we generally call the result statistically significant, which means we conclude that this is not chance but the observation of a real, existing pattern. A lot of people forget what this threshold really means: if you run 100 tests in situations where there is no pattern at all, on average 5 of them will still show a "significant" pattern!
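The 5-in-100 effect is easy to see in a small simulation. The sketch below is only an illustration under assumptions of my own: two groups of 500 customers who all buy at the same 5% rate, compared with a two-proportion z-test. The function names and parameter values are hypothetical, not taken from any real campaign.

```python
import math
import random

def two_sided_p_value(hits_a, n_a, hits_b, n_b):
    """Two-sided p-value of a two-proportion z-test (normal approximation)."""
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # nobody (or everybody) bought: no difference to test
    z = (hits_a / n_a - hits_b / n_b) / std_err
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))

def false_positive_rate(n_tests=1000, n_per_group=500,
                        true_rate=0.05, alpha=0.05, seed=1):
    """Share of tests that come out 'significant' although both groups
    have exactly the same true conversion rate."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_tests):
        hits_a = sum(rng.random() < true_rate for _ in range(n_per_group))
        hits_b = sum(rng.random() < true_rate for _ in range(n_per_group))
        if two_sided_p_value(hits_a, n_per_group, hits_b, n_per_group) < alpha:
            false_positives += 1
    return false_positives / n_tests

rate = false_positive_rate()
print(f"share of 'significant' tests with no real effect: {rate:.3f}")
```

The share should land in the neighbourhood of 0.05: about one test in twenty cries "significant" although nothing is going on.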
So the lesson is: can you repeat the test or the treatment and still find the same pattern?
That is why, in data mining, we should always test the model on a separate hold-out dataset, to verify that the conclusion still holds on new data.
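A small sketch of why the hold-out check matters, again under made-up assumptions: 50 customer segments that all buy at the same 5% rate, so no segment is truly better than another. We pick the "best" segment on the training half and then look at that same segment on the hold-out half. All names and numbers here are hypothetical.

```python
import random

def conversion_rate(n_customers, true_rate, rng):
    """Observed conversion rate of a simulated group of customers."""
    buyers = sum(rng.random() < true_rate for _ in range(n_customers))
    return buyers / n_customers

def best_segment_on_holdout(n_segments=50, n_per_half=200,
                            true_rate=0.05, seed=7):
    """Pick the best segment on training data, then re-measure it on hold-out.

    Every segment has the same true rate, so any 'winner' is pure chance.
    Returns (rate on training half, rate on hold-out half) for the winner.
    """
    rng = random.Random(seed)
    train = [conversion_rate(n_per_half, true_rate, rng)
             for _ in range(n_segments)]
    holdout = [conversion_rate(n_per_half, true_rate, rng)
               for _ in range(n_segments)]
    best = max(range(n_segments), key=lambda i: train[i])
    return train[best], holdout[best]

train_rate, holdout_rate = best_segment_on_holdout()
print(f"winning segment on training data: {train_rate:.3f}")
print(f"same segment on hold-out data:    {holdout_rate:.3f}")
```

The winner on the training half always looks better than 5%, simply because it was selected for looking good. On the hold-out half the same segment typically falls back towards the 5% that every segment really has.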