Posted by: zyxo | April 10, 2012

The most important decision in data mining


When you start a data mining project, you have a lot of decisions to make:

Which data mining software will you use?

Which algorithms will you use?

Which hardware will you use?

Which data sources will you use?

What will be the size of your hold-out samples?

Will you calculate derived variables? Which ones?

How will you measure the quality of your model(s)?

How will you deploy your model(s)?

Did you notice? I forgot one. I forgot the most important one.

If you are a data miner, you should know it already: the target.

What?

THE TARGET!

Yes, but isn't the target something you are simply told to predict? Is it really YOUR decision?

The fact is that even an ordinary prediction model of the right target is worth much more than a good prediction model of the wrong or suboptimal target.

Let me give some examples of decisions to make:

  • Are you going to calculate the probability of buying, or the probability of buying online?
  • Are you going to calculate the probability of buying beer, or the probability of buying a particular kind of beer?
  • Are you going to calculate the probability of buying product XYZ, or the probability of buying at least nnn items of product XYZ?
  • Are you going to calculate the probability of buying product XYZ, or the probability of buying product XYZ after a purchase of product ABC?
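To make the difference concrete, here is a minimal Python sketch that derives four of the targets listed above from one and the same purchase log. Everything in it is an assumption made for illustration: the toy data, the customer names, the function `build_targets`, the target names and the threshold `min_items=2` standing in for "nnn". It only shows that the same customers get different labels depending on the target you choose.

```python
from collections import defaultdict

# Hypothetical toy purchase log: (customer, day, product, channel, quantity)
PURCHASES = [
    ("ann", 1, "ABC", "store", 1),
    ("ann", 5, "XYZ", "online", 3),
    ("bob", 2, "XYZ", "store", 1),
    ("cat", 3, "XYZ", "online", 1),
    ("cat", 7, "ABC", "store", 2),
    ("dan", 4, "ABC", "online", 1),
]


def build_targets(purchases, product="XYZ", prior="ABC", min_items=2):
    """Return {customer: {target_name: 0 or 1}} for four target definitions."""
    by_customer = defaultdict(list)
    for customer, day, prod, channel, qty in purchases:
        by_customer[customer].append((day, prod, channel, qty))

    targets = {}
    for customer, rows in by_customer.items():
        product_rows = [r for r in rows if r[1] == product]
        prior_days = [r[0] for r in rows if r[1] == prior]
        first_prior = min(prior_days) if prior_days else None
        targets[customer] = {
            # bought the product at all
            "buys": int(bool(product_rows)),
            # bought the product through the online channel
            "buys_online": int(any(r[2] == "online" for r in product_rows)),
            # bought at least `min_items` units of the product in total
            "buys_at_least_n": int(sum(r[3] for r in product_rows) >= min_items),
            # bought the product after a first purchase of the prior product
            "buys_after_prior": int(
                first_prior is not None
                and any(r[0] > first_prior for r in product_rows)
            ),
        }
    return targets


TARGETS = build_targets(PURCHASES)
for customer in sorted(TARGETS):
    print(customer, TARGETS[customer])
```

In this toy log three customers are positives for "buys", two for "buys_online", and only one each for the last two targets. A model trained on one of these label columns answers a different business question than a model trained on another, whatever algorithm you pick.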

Can you think of other examples? Let me know and I will gladly add them to the list.

Admit it: these decisions should not be left to your marketeer alone. You should make them together. Be very aware that they matter more for the commercial result of your marketing campaigns than your choice of the best algorithm!

