Data miners who have visited my blog before already know that I like decision trees. They are without any doubt my favorite data mining tool.
Want to know why? Because they are simply the best data mining algorithm.
For a number of reasons:
- Decision trees are white boxes = means they generate simple, understandable rules. You can look into the trees, clearly understand each and every split, see the impact of that split and even compare it to alternative splits.
- Decision trees are non-parametric = means no specific data distribution is necessary. Decision trees easily handle both continuous and categorical variables.
- Decision trees handle missing values as easily as any ordinary value of the variable.
- Decision trees allow elegant tweaking. You can choose to set the depth of the tree, the minimum number of observations needed for a split or for a leaf, the number of leaves per split (in the case of multilevel target variables), and many more.
- Decision trees are one of the best algorithms for independent variable selection. If you really want to build a model with logistic (or linear) regression or with neural networks, but first want to reduce the number of variables by selecting only the relevant ones: use decision trees. They are fast and, unlike simple correlations with the target variable, they also take the interactions between variables into account.
- Decision trees are weak learners. At first sight this seems to be a disadvantage, but NO! Weak learners are great when you want to use lots of them in ensembles, because ensembles like bagging, boosting, random forests and TreeNet become very powerful algorithms when the individual models are weak learners.
- Decision trees identify subgroups. Each terminal or intermediate leaf in a decision tree can be seen as a subgroup/segment of your population.
- Decision trees run fast, even with lots of observations and variables.
- Decision trees can be used for supervised AND unsupervised learning. Yes, even though a decision tree is by definition a supervised learning algorithm that needs a target variable, it can be used for unsupervised learning, like clustering. For this, see one of my previous posts.
- Decision trees are simple. I mean: the algorithm is simple. No complicated mathematics is needed to understand how they work.
- Decision trees deliver high-quality models and are able to squeeze pretty much all the information out of the data, especially if you use them in ensembles.
- Decision trees can easily handle unbalanced datasets. If you have 0.1% positive targets and 99.9% negative ones: no problem for decision trees! (see one of my previous posts)
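To make a few of these points concrete, here is a small sketch of my own using scikit-learn (an assumed tool choice; the original post names no software, and any decision tree package would do). It shows the white-box rules, the tweaking parameters, the built-in variable ranking, the `class_weight` option for unbalanced targets, and an ensemble of shallow weak learners:

```python
# Sketch only: scikit-learn and its bundled breast-cancer dataset are
# assumptions for illustration, not the tools used in the post.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Elegant tweaking: tree depth, minimum observations per split and per leaf.
# class_weight="balanced" re-weights classes for unbalanced datasets.
tree = DecisionTreeClassifier(
    max_depth=3,
    min_samples_split=20,
    min_samples_leaf=10,
    class_weight="balanced",
    random_state=0,
)
tree.fit(X, y)

# White box: print the simple, understandable rules behind every split.
print(export_text(tree, feature_names=list(X.columns)))

# Variable selection: rank variables by contribution (interactions included)
# and keep only the ones the tree actually used.
importances = sorted(
    zip(X.columns, tree.feature_importances_), key=lambda t: -t[1]
)
selected = [name for name, imp in importances if imp > 0]
print(selected)

# Weak learners in an ensemble: many shallow trees, bagged into a forest.
forest = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=0)
forest.fit(X, y)
```

The `selected` list can then feed a logistic regression or neural network, as described above.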
Reasons enough? Do you know other algorithms with such beautiful characteristics?
Please do let me know !