In some of my previous posts I referred to data mining algorithms based on ant behaviour : Ant mining. However, these algorithms only follow a limited set of ant behaviour. For example in the Antminer software of Martens and his coworkers at the university of Leuven each step an ant can take immediatly leads to a new rule, whereas in the real ant world, an ant has to take a lot of steps before it finds some food. Other software looks more like locust mining than ant mining (see my post “ants or locusts” in the data mining category).
Here I want to discribe how I see an antmining algorithm. I do not know if it is effecient in terms of performance, but I am sure I must be efficient in terms of finding qualitatively good algorithms.
To start, you need an ant heap : the nest. let’s put in in the center of a 19*19 grid. Then there is the food : scattered around randomly on the grid, let say 20 rules. Each and every ant starts from the nest and begins a random walk: it randomly choses one of the 8 squares around the nest (later on this randomness will be influenced by pheromone levels). But time goes by for every ant at the same time, so in turn each ant chooses a square. Here comes into play one of the parameters : howmany ants can be on the same square at the same time ?
When each ant has taken his step, the “food level”of the heap, in our data mining algorithm the rule effectiveness, must be evaluated. Until now, there is no rule yet, so the random rule still plays. Let us call this each-ant-step-plus-rule-evaluation the “heap step”, which also means one unit of time.
In the second heap step each ant has to decide : return to the nest or look further for food. He wants only to return to the nest when he has found some food. In our algorithm this ‘food’ is a selection rule : a logical rule of a variable having a particular value (this is simple for categorical variables, but continuous variables have to be binned and then fuzzified). And not only a selection rule, but one that improves the model. Did I say that each ant, when it leaves the nest ‘knows’ the food level of the nest: the selection rule that was in place when he left. So in order to know if the rule he found is beneficial, the ant has to perform a local evaluation: is the new rule an enhancement to the ‘nest rule’ he remembers ?
And so we go on and on.
And now the pheromones : they are placed whenever an ant is walking from one square to another. These pheromones influence the random choice of next squares by giving a greater probability to squares with higher pheromone levels. The influence level is a parameter to be set at the beginning and is a characteristic of the ant heap.
In short :
at each ‘heap step’ each ant takes one step. Before the ant step, the ant evaluates the rule he has found (or no rule) and searches further or returns to the nest. At each heap step also the heap rule is evaluated. When an ant returns to the nest with a rule, in the heap rule evaluation it is accepted or rejected (added or not to the heap rule).
So far (and very short) for an ant heap. Now we all know that the ant world consists of many ant heaps en natural selection works on the heap level. This means that for a full-blown antmining algorithm we must have a lot of ant heaps and select the best ones, let them reproduce (duplicate, mutate (=slight changes to the parameters)) and start all over again.
Must be fun to develop and test for a graduate student. Unfortunately I have to be productive doing datamining, so I do not have the time to try this out. If you do : let me know how it works !
Enjoyed this post ? Then you might also be interested in the following :
Ant Colony Optimisation : ants or locusts ?
Ant Colony Optimisation : list of source codes
Swarm Intelligence in Data Mining : Studies in Computational Intelligence
Swarm versus intelligence
Recent Comments