What skills would a perfect data miner have?
In short: technical skills, business knowledge, analytical skills, soft skills, creativity and practicality.
– Programming, because as we all know data mining is:
- 95% digging into data with SQL, SAS, C++, Python, or whatever language you can use to manipulate data and create your training, validation and test data sets
- and only some 5% actually generating models and putting them to work
– Statistics, because it makes no sense to calculate, for example, logistic or linear regressions without having any clue as to what you are doing or what the results mean.
– Data mining techniques, because that’s what data mining is all about: actually uncovering the treasures of information, the patterns that are hidden in the data. You have to know which data mining technique is appropriate when. Should you use a decision tree or k-means clustering? Or why not a logistic regression?
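To make the "95% digging into data" part a bit more concrete, here is a minimal sketch of creating training, validation and test data sets in plain Python. The function name `split_dataset`, the 60/20/20 proportions, the seed and the toy `customers` records are all assumptions chosen for illustration, not a prescribed recipe.

```python
import random


def split_dataset(rows, train_frac=0.6, valid_frac=0.2, seed=42):
    """Shuffle rows and cut them into training, validation and test lists.

    The fractions and the seed are illustrative defaults (assumptions).
    Whatever is left after the training and validation cuts becomes the test set.
    """
    if train_frac <= 0 or valid_frac < 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must leave room for a non-empty test set")

    shuffled = list(rows)                  # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, so the split is repeatable

    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)

    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# Hypothetical toy data: one record per customer.
customers = [{"customer_id": i, "bought": i % 3 == 0} for i in range(100)]
train, valid, test = split_dataset(customers)
print(len(train), len(valid), len(test))
```

Fixing the seed means you can recreate exactly the same three sets a year and a few dozen models later. In practice you might also want a stratified split, or a split by time, depending on the business problem.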
Business knowledge, because we as data miners do not work with mere numbers, but with data that have a business meaning. How could we interpret a model, detect an information leak, or spot impossible results that point to some mistake in our data set if we do not know what the data mean?
Analytical skills, because data miners do not just run data mining algorithms because someone tells them to. No, as data miners we have customers who come to us with a problem they want to solve. We must be able to analyze the situation, find out what our customers really want (which is not always what they tell us), and cook up a delicious solution to their problem.
Presentations, because you have to convince your superiors and colleagues that you have the models they need, and explain what your models can and can’t do and how they should use them in their marketing campaigns.
Report writing, because just as you should be able to give a decent presentation, you should be able to write a good, clear and concise report. Not only for anyone interested in your data mining work/art, but also for yourself a year and some dozens of models later, when you want to know what the heck you did back then to arrive at that particular model.
Creativity, because that’s the “art” part of data mining. You cannot blindly apply some algorithms. You have to have a feeling for what will happen with a given algorithm in combination with particular aspects of your data. You need some gut feeling for why you should try something else, and in what direction …
Practicality: keep two feet on the ground. Never lose sight of your ultimate goal: business outcomes. As Avinash Kaushik puts it: “an absolute obsession, with outcomes is mandatory”.
Some further reading: