In some of my previous posts I referred to data mining algorithms based on ant behaviour : Ant mining. However, these algorithms only follow a limited set of ant behaviour. For example in the Antminer software of Martens and his coworkers at the university of Leuven each step an ant can take immediatly leads to a new rule, whereas in the real ant world, an ant has to take a lot of steps before it finds some food. Other software looks more like locust mining than ant mining (see my post “ants or locusts” in the data mining category).

Here I want to discribe how I see an antmining algorithm. I do not know if it is effecient in terms of performance, but I am sure I must be efficient in terms of finding qualitatively good algorithms.

To start, you need an ant heap : the nest. let’s put in in the center of a 19*19 grid. Then there is the food : scattered around randomly on the grid, let say 20 rules. Each and every ant starts from the nest and begins a random walk: it randomly choses one of the 8 squares around the nest (later on this randomness will be influenced by pheromone levels). But time goes by for every ant at the same time, so in turn each ant chooses a square. Here comes into play one of the parameters : howmany ants can be on the same square at the same time ?

When each ant has taken his step, the “food level”of the heap, in our data mining algorithm the rule effectiveness, must be evaluated. Until now, there is no rule yet, so the random rule still plays. Let us call this each-ant-step-plus-rule-evaluation the “heap step”, which also means one unit of time.

In the second heap step each ant has to decide : return to the nest or look further for food. He wants only to return to the nest when he has found some food. In our algorithm this ‘food’ is a selection rule : a logical rule of a variable having a particular value (this is simple for categorical variables, but continuous variables have to be binned and then fuzzified). And not only a selection rule, but one that improves the model. Did I say that each ant, when it leaves the nest ‘knows’ the food level of the nest: the selection rule that was in place when he left. So in order to know if the rule he found is beneficial, the ant has to perform a local evaluation: is the new rule an enhancement to the ‘nest rule’ he remembers ?

And so we go on and on.

And now the pheromones : they are placed whenever an ant is walking from one square to another. These pheromones influence the random choice of next squares by giving a greater probability to squares with higher pheromone levels. The influence level is a parameter to be set at the beginning and is a characteristic of the ant heap.

In short :

at each ‘heap step’ each ant takes one step. Before the ant step, the ant evaluates the rule he has found (or no rule) and searches further or returns to the nest. At each heap step also the heap rule is evaluated. When an ant returns to the nest with a rule, in the heap rule evaluation it is accepted or rejected (added or not to the heap rule).

So far (and very short) for an ant heap. Now we all know that the ant world consists of many ant heaps en natural selection works on the heap level. This means that for a full-blown antmining algorithm we must have a lot of ant heaps and select the best ones, let them reproduce (duplicate, mutate (=slight changes to the parameters)) and start all over again.

Must be fun to develop and test for a graduate student. Unfortunately I have to be productive doing datamining, so I do not have the time to try this out. If you do : let me know how it works !

*Enjoyed this post ? Then you might also be interested in the following :*

Ant Colony Optimisation : ants or locusts ?

Ant Colony Optimisation : list of source codes

Swarm Intelligence in Data Mining : Studies in Computational Intelligence

Swarm versus intelligence

I might have the opportunity to try something like this. I’m on a team for the Netflix Prize, and it seems that the best strategy is to get a ton of different classifiers and then blend them together. So we could do a few runs of an algorithm like this to build a classifier and blend those in along with our other stuff…

I’ll let you know if it works!

By:

chthenoson March 30, 2008at 12:32 pm

Chthenos,

About this algoritm : I have a worry about the ‘heap’ rule. I do not think it would be simple to combine the different ant rules into one rule. Each ant rule is a classifier on itself, a bit like a decision tree, but then a lot less greedy. I would be simple to calculate the scores for each ant rule and average the scores to calculate the ‘heap’ score. Sort of bagging.

Anyway : I really look forward to the results. And my your team win the Netflix prize !

By:

zyxoon March 30, 2008at 6:53 pm

Hm, I thought you could just use some kind of boosting type method?

By:

chthenoson March 30, 2008at 11:14 pm