Posted by: zyxo | February 21, 2010

Data mining at a higher level

Data mining at a higher level ?
What is this higher level ? And what was the “normal” level ?

The classic commercial data mining approach, the “normal” level, goes as follows :

  • take your historical data
  • identify the customers who did something (e.g. bought product xyz) and the ones who did not.
  • make a model to distinguish between the do-ers and the non-do-ers.
  • use that model to calculate for each of your identified customers the probability that they will do that something.
  • contact those with the highest probability in some marketing campaign
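The five steps can be sketched as follows; the customers and the scoring rule are made up, standing in for a real trained model:

```python
# Toy sketch of the "normal" campaign flow: score each customer with a
# purchase-probability model, then contact the highest-scoring ones.
# The scoring function below is a hypothetical stand-in for a real
# classifier trained on historical data.

def purchase_probability(customer):
    """Stand-in for a trained model: returns P(buy product xyz)."""
    score = 0.05
    if customer["bought_before"]:
        score += 0.30
    if customer["recent_visits"] > 3:
        score += 0.15
    return score

customers = [
    {"id": 1, "bought_before": True,  "recent_visits": 5},
    {"id": 2, "bought_before": False, "recent_visits": 1},
    {"id": 3, "bought_before": True,  "recent_visits": 0},
]

# Score everyone, then keep the top of the list for the campaign.
scored = sorted(customers, key=purchase_probability, reverse=True)
campaign_target = [c["id"] for c in scored[:2]]
print(campaign_target)  # the two most promising customers: [1, 3]
```

In practice the model would of course come from a real training step (logistic regression, decision tree, …) rather than a hand-written rule.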

You can find more on this “normal” approach in my previous posts :
data mining for marketing campaigns : interpretation of lift
Mining highly imbalanced data sets with logistic regressions
How many inputs do data miners need ?
Classification, Prior Probabilities and Soft Metrics
data mining with decision trees : what they never tell you

But there are better ways, that need a higher level of data mining :

  • estimation of the $ purchase amount
  • net lift of purchase probability
  • net lift of $ purchase amount
  • the above combined
  • optimal pricing modeling

Estimation of the $ purchase amount

Illustration of linear regression on a data set.
Image via Wikipedia

This is a straightforward one. Instead of only modeling the probability of purchase, you can also model the expected purchase amount. A simple multiple linear regression can probably do the trick (but there are possibilities with decision trees also; I will write about that later). When selecting the people to target in your marketing campaign you simply multiply the estimated absolute purchase probability by the estimated purchase amount. That way you will contact the people who you expect to shop heavily in your stores.
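A toy version of that multiplication, with invented probabilities and amounts from two hypothetical models:

```python
# Combine two model outputs per customer: estimated purchase probability
# and estimated purchase amount in $. Ranking by their product targets
# the expected heavy shoppers. All numbers below are made up.

customers = {
    "Alice": {"p_buy": 0.10, "est_amount": 200.0},
    "Bob":   {"p_buy": 0.40, "est_amount": 30.0},
    "Carol": {"p_buy": 0.25, "est_amount": 120.0},
}

expected_revenue = {
    name: c["p_buy"] * c["est_amount"] for name, c in customers.items()
}
# Alice: 20.0, Bob: 12.0, Carol: 30.0 -> contact Carol and Alice first
ranking = sorted(expected_revenue, key=expected_revenue.get, reverse=True)
print(ranking)  # ['Carol', 'Alice', 'Bob']
```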

Net lift of purchase probability

This involves a lot more effort. It is something you have to do in three big steps :

  1. First step : organize a marketing campaign where you contact (e-mail, snail mail, personalized banner) a sufficiently large random target group of your customers. Be sure to keep an equally large control group, equally randomly chosen.
  2. Second step (the modeling step) : you need to make two “normal level” models for calculating the purchase probability : for the first one, you only use the targeted people (I call it the targeted model), for the second one, you only use the non-targeted people (the control model).
  3. Third step (the scoring step) : for each customer in your database you calculate the purchase probability twice : once with each of the two new models. Since the models are different, the two calculated probabilities for each single customer will differ. What you need is the customers where the calculated probability from the targeted model is a lot higher than the calculated probability from the control model. It means that as a result of the e-mail or the banner the purchasing probability went up. What you really model this way is the campaign effect. It gives you a means of selecting those people where your campaign will have the biggest effect on their purchasing behavior.
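A minimal sketch of the scoring step, with invented probabilities standing in for the two trained models:

```python
# Two-model ("net lift") scoring: each customer gets a probability from
# the targeted model and one from the control model; the difference
# estimates the campaign effect. The scores below are hypothetical.

scores = {
    # (p from targeted model, p from control model)
    "sure thing":  (0.90, 0.88),  # buys with or without the campaign
    "persuadable": (0.60, 0.20),  # the contact makes the difference
    "lost cause":  (0.05, 0.04),  # will not buy either way
}

net_lift = {name: p_t - p_c for name, (p_t, p_c) in scores.items()}
best = max(net_lift, key=net_lift.get)
print(best)  # persuadable
```

Note that the “sure thing” customer scores highest in a normal-level model, but the net lift correctly points at the persuadable one.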

Net lift of $ purchase amount

This goes the same way as modeling the above. Only here you model the campaign impact on the $ purchase amount of your customers.

The above combined

Obviously this is just … the above combined, where you try to maximize
( net increase of purchase probability × net increase of purchase amount ) – contact cost, summed over all the customers in your campaign target group.

That leaves us with one more : optimal pricing modeling

price versus probability to buy

Optimal price chart

This one goes about the same way as the net lift modeling of the purchase probability. There we calculated the probability differences between the target group and the control group. In optimal pricing modeling we also need a first campaign to get the data, and then we calculate several models. Each model is calculated for a different target group. The difference between the target groups is, well, the price. OK, we need a business where we can easily ask different prices from different customers for the same service or item. Say for target group A we ask 5$ for the item, in target group B we ask 5.5$ and in target group C we ask 6$.
After this test campaign we calculate three models : the 5$ model, the 5.5$ model and the 6$ model.
With these models we calculate the three probabilities to purchase for each customer. The price we ask a customer for our item should be the one where this price times his calculated purchase probability is the highest. If John has a probability of 7% of purchasing at 5$, a probability of 6.7% of purchasing at 5.5$ and a probability of 5.5% of purchasing at 6$, you get respectively 0.35, 0.3685 and 0.33. So you should try to sell your item to John at a price of 5.5$, because John will pay you on average 5.5$ × 6.7% = 0.3685$.
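The John example above, as a few lines of code:

```python
# Pick the price that maximizes price x purchase probability
# (expected revenue per contact), using the numbers from the post.

probabilities = {5.0: 0.07, 5.5: 0.067, 6.0: 0.055}
expected = {price: price * p for price, p in probabilities.items()}
# 5.0 -> 0.35, 5.5 -> 0.3685, 6.0 -> 0.33
best_price = max(expected, key=expected.get)
print(best_price)  # 5.5
```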

Reblog this post [with Zemanta]
Posted by: zyxo | February 12, 2010

Data Mining Models ? 10 reasons for not using them !

  1. Modeling is difficult :
    Wrong ! You are a marketer, so you do not have to do the modeling yourself.
    Find or hire a data-loving whizzkid with a solid analysis background. He/she will do the hard work for you
  2. Models are too expensive :
    Wrong ! Wrong for two reasons :
    1. The tools are actually free (download R or Weka from the internet)
    2. The model pays itself if you use it in one or multiple campaigns
  3. Models are black boxes :
    Right ! But that doesn’t mean they are not useful !
    Do you know exactly how your PC works ? Your TV set ? Your car ?
  4. I am not sure the models will work :
    Wrong ! Just ask the whizzkid to test the model with real-life data that he did not use for modelling. He can make you a nice chart showing that the top selected customers do much better than the others.
  5. It is impossible to explain the models to the sales people :
    Right ! Because the models are black boxes.
    Wrong ! Ask your whizzkid to give you the three most significant variables of the model and communicate these to your sales people.
    (Not so black-box after all !)
  6. Selections based on the model are too heterogeneous / targets do not match our campaign materials (banners, ads …) :
    Wrong ! OK, sometimes heterogeneous perhaps, but who said you have to communicate to all of them in the same way ? All you have to do is segment the selection with a communication goal in mind (talk to your whizzkid)
  7. The selection will become too complicated :
    Wrong ! The selection becomes simpler ! Only use the model score as the selection criterion (besides other “administrative criteria”).
  8. I did it for years without data mining models. Why change ? :
    So Sad ! You need to change, otherwise the competition will leapfrog you !
  9. The expected returns based on the model data are way too low. I want to get more out of my campaign ! : Unrealistic ! Ask your whizzkid to compare the expected returns from the model with those from your own selection criteria. Then let the data decide !
    A decent data mining model always outperforms man-made selection rules.
  10. My job is to spend the campaign budget. I do not care about the results (nobody else does anyway) :
    So sad ! You make life easy for your competitors !

You can also view this list as a free to download presentation

Reblog this post [with Zemanta]
Posted by: zyxo | February 7, 2010

Classification, Prior Probabilities and Soft Metrics

I never liked prior probabilities, nor classification.
This is probably a bit weird for someone who likes mining data with decision trees. But I will explain.

For me classification (in data mining) means that you decide with some sophisticated algorithm to which category an observation belongs. “Is someone a terrorist or not ?”

The algorithm calculates the appropriate categories in three steps.

– First the real data mining is done : based on some discriminating, independent variables a model is trained/calculated/derived.
– The second step is to feed observations to that model for which it will calculate probabilities to belong to the different categories. “Jim has a probability of 0.85 for being a terrorist”.
– Third step : decide to which category the observation belongs.

Prior probabilities can be used in steps 2 and 3. If you do not use prior probabilities but you did some over- or undersampling, the model will etc … etc… Sigh ! For a simple data miner like me it becomes too complicated, too artificial.
If you want to know more about it anyway, you should look at some definitions/explanations : [I], [II], [III], [IV]

An easier, more straightforward and more trustworthy way is the following :
– Execute step 1 to obtain your model.
– Use an unseen real life data sample and feed it to the model to calculate the probabilities.
– Use the same unseen real life data sample with known real categories, add the calculated probabilities to the observations and calculate the real probabilities. You can do this simply by sorting the observations from high to low probability and calculating the frequency of each category in bins of, for example, 1% of your observations. Now you have a means of transforming the probabilities calculated by the model into real probabilities.
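The sorting-and-binning step can be sketched like this, with made-up scores and outcomes:

```python
# Calibration sketch: sort scored observations from high to low model
# score, cut them into bins, and use each bin's observed buy rate as
# the "real" probability. Scores and outcomes below are invented.

scores_and_outcomes = [  # (model score, actually bought?)
    (0.95, 1), (0.90, 1), (0.85, 0), (0.80, 1),
    (0.60, 0), (0.55, 1), (0.50, 0), (0.45, 0),
    (0.30, 0), (0.25, 0), (0.20, 1), (0.10, 0),
]
scores_and_outcomes.sort(reverse=True)  # high scores first

bin_size = 4
real_rates = []
for i in range(0, len(scores_and_outcomes), bin_size):
    bin_ = scores_and_outcomes[i:i + bin_size]
    real_rates.append(sum(bought for _, bought in bin_) / len(bin_))

print(real_rates)  # [0.75, 0.25, 0.25]
```

The resulting per-bin rates are the lookup table that turns a raw model score into a real probability.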

You still have no categories ! Right. But why would you need them ?
Example : commercial targeting. You make a model to optimize your target groups because you want to know who has a high probability to buy your product. What would be your categories ? (potential) Buyers and (potential) non-buyers ? This is nonsense ! Models are not perfect. Even the 1 percent of observations with the highest probabilities of belonging to category A will contain a number of observations of the other categories. The only thing you calculate is the probability, not the real category.
Optimizing your target group means finding a balance between 1) the number of people in the group, 2) the cost to contact these people and 3) the expected return.

The probabilities are what they call “soft metrics”. This is sort of a new term for what has long been known as fuzzy logic. It is as if you only want to distinguish black and white in a world of gray scales. It is as if you do not know the temperature, but you know it’s warm.

(pl)Logika rozmyta - temperatura (en)Fuzzy log...
Image via Wikipedia

! Wikipedia has no item for “soft metrics” !

Some definitions :

“Soft metrics evaluate the things that aren’t apparent but may help predict a company’s future: are there heavy hitters on the board of directors? Has the management team succeeded before?” (answers.com)

“An approach to decision making based on soft metrics could allow problems to be solved where no definitive “yes-no” answer is possible” (via @ayoubsciences)

In our current social internet world for marketing it means “sentiment metrics” : engagement, conversations, buzz, interactions, word of mouth, awareness and brand as outcomes of marketing campaigns. (hard metrics : sales figures, number of new customers …).

If you want to read more about soft metrics :
Computational Models of Group Dynamics for National and International Security Applications (Mihaela Quirk)
Marketing metrics : the hard and the soft
How Soft Metrics Can Make Hard Choices Easier

Other posts you might enjoy reading :
Data mining for marketing campaigns : interpretation of lift
How many inputs do data miners need ?
Oversampling or undersampling ?
data mining with decision trees : what they never tell you
The top-10 data mining mistakes
Good enough / data quality

Reblog this post [with Zemanta]
Posted by: zyxo | February 6, 2010

Are Tennis Point Counts Unfair ?

Venus Williams plays Vera Dushevina on the ope...
Image via Wikipedia

Ever wondered about this silly scoring system in tennis ? Me too. I suppose we do not think about the same thing : whether they count 15-30-40 or 1-2-3 does not really make a difference. My concerns here are about the composite system:

  • a game ends at 40+ (or 3+ if you count 1-2-3) if the difference between the two players is 2 points
  • a set ends at 6 if the difference between the two players is 2 games
  • if at 6 the difference is only one game, an extra game is played. This results in either 7-5 or 6-6. In the latter case a tie-break is played. In a tie-break they simply count points up to 7, but there must be a difference of 2. So a tie-break is just a special sort of game.
  • In the last set of the grand slam tournaments tie-breaks are not played. Instead they go on with the games until there is a difference of 2 games.

Now the question : is this a good system ?

But first another question : who wins the match ? Obvious answer : the best of the two players.
OK, but if he is the best, why doesn’t he always win all the sets ?
Or, if he is the best, why doesn’t he always win all the games ?
Or, if he is the best, why doesn’t he always win all the points ?

The answer : variability, noise, variance, standard deviations, standard error, luck, chance …

So let’s rephrase the above.
Who wins the match ? The one who is on average the best in the match (is this true ?)
Who wins the set ? The one who is on average the best in the set (is this true ?)
Who wins the game ? The one who is on average the best in the game (is this true ?)
Who wins the point ? The one who is the best in that point (this is true !)

And : is this a good system ?

Let us look at a game.

a tennis game

tennis quality of the 7 points of a single tennis game

This first chart shows the tennis quality of players A and B during each of the 7 points of a game. Four times player A was the better one and scored. Only three times was player B the better one. Result : at 40-30 player A wins the final point and with it the game. Was he really the best ?
Let us calculate the average quality of the 7 points : A= 5.86 B=5.93. So actually B was the better player, but A was more lucky. Conclusion : in each game there is a portion of luck.
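The same arithmetic in code, with hypothetical per-point quality numbers chosen to reproduce the averages above:

```python
# Invented per-point quality numbers for a 7-point game, picked so
# that the averages match the post (A = 5.86, B = 5.93): A takes 4 of
# the 7 points and the game, yet B is on average the better player.

quality_a = [6, 6, 6, 6, 5, 5, 7]
quality_b = [5, 5, 5, 7, 8, 7, 4.5]

points_a = sum(a > b for a, b in zip(quality_a, quality_b))
avg_a = sum(quality_a) / 7   # 5.86
avg_b = sum(quality_b) / 7   # 5.93

print(points_a, round(avg_a, 2), round(avg_b, 2))  # 4 5.86 5.93
```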

What do we do in statistics ? We repeat the experiments a number of times to “average out” this luck. This is exactly what happens in a tennis set. After 6 (6-0) to 13 (7-6) games we can expect that both players are lucky a number of times and it is only the tennis quality that determines the outcome.
However, consider following set :

Player B wins with 5 – 7, but scores in total three points less than player A. Obviously he was not the best player, but nevertheless he won the set.

And a similar discordance between player quality and match outcome may exist.
Look at the match between Stepanek and Carlovic in the first round of the Australian Open 2010.

Stepanek lost his match, although he was the better player, winning 26 games against only 24 for Carlovic.

It is clear : with the current counting system it is not always the one who played the best, who wins the match.
If we would want this to be the case, the counting would be very simple : just keep on counting point per point, up to a maximal number. If chance plays a role in any point, it would be like tossing a coin : the more you toss, the closer you come to 50% heads or tails.
As men play on average 230.5 points in a match, I suppose 150 points would be a good target.

Why do they never use a system like that ? Sports is not only fun for the ones who play it, it is also entertainment for the supporters. The Romans already knew that : Bread & Circuses ! And when do we like a game or a movie ? When tension builds up towards the end and the final outcome is a surprise. This means that in a sports game chance, luck, surprise must be possible until the end. Games and sets are meant to start the chances all over again. Whether you lose a set 6 – 0 or 6 – 7, the next set still begins at 0 – 0, meaning that a slight difference in your favor during the second set can undo the huge inequality in the first.

Wait ! they do ! In soccer they do, in basketball they do. In waterpolo they do, in handball they do … They just count the number of points.
But in soccer the number of goals is normally so low (1-2, 3-0 are common soccer results) that we can consider most matches for a great deal a “chance game”, especially when an erroneous decision of the referee can result in a penalty kick, with a large probability of scoring.

Basketball is something different. Its scoring system is just based on averaging. No sets, no games, just one match with each point adding up to the total score and with a large number of points at the end of the match. So it is obviously the fairest system possible.

Reblog this post [with Zemanta]
Posted by: zyxo | January 31, 2010

Link list for January 2010

My links for January. Enjoy browsing !

Lifeless prions capable of evolutionary change
Why isn’t the milky way crawling with life (S.Hawking)
100 Job Search Tips From Fortune 500 Recruiters
What is engagement and how do we measure it ?
Three questions an executive should ask for the new year (and more)
When someone googles you, what do you want to happen ?
Angels and Demon’s anti-matter
The state of technology in 2010
Dolphins should be treated as ‘non-human persons’
Future predictions of top scientists
Executives : the data is in your hands !
25-point website usability checklist
Google’s 10 toughest rivals
The tragedy of anti-data leadership and dataphobia
top-10 science/tech stories of the decade
Why most sales forecasts suck…and how Monte Carlo simulations can make them better
Google Blog – Helping computers understand language
top-15 chemical additives in your food
What if a jury could decide if you are guilty by reading your mind?
hyper-heuristic decision tree induction
humans were once an endangered species
scientists develop walking robot maid
How to dual-boot Vista with Ubuntu
Avatar technology could bring back Clint Eastwood at 35 years
You won’t find consciousness in the brain
Emergence of a global brain – will it happen
11 ways to think outside the box
random rules for idea worth spreading

Posted by: zyxo | January 25, 2010

Will rich people become a different species ?

This Wall street journal blog post wonders if rich people will become another species, because they can afford all the new medical and technical remedies and life enhancements.

darth vader

It made me wonder.

They have to stay the same species.

If rich people want to profit from donor organs, they better stay the same species, otherwise they will face tissue incompatibility problems.

It is difficult to form a new species.

Speciation, or the splitting of an existing species into two (or more) new ones, is the result of genetic barriers. When different populations of the same species are genetically isolated, they can evolve into different species. This genetic isolation is the result of some sort of geographical isolation.
Human populations are far from isolated. Globalisation proves the opposite.
Is it possible that rich people can isolate themselves sufficiently to form a new species ?
We all know some examples of isolated human populations : Australian aboriginals, North-American indians. In all those centuries they evolved into a different kind of people, but still remained people of the one human species. Imagine how difficult it would be in our modern world for human speciation to occur.

It is more likely that “species” will lose its meaning.

With that I mean that, at least for humans and perhaps the most intelligent other mammals, the species concept will possibly disappear.
Sounds a bit weird ?
Imagine what the current technological evolution will mean in the future : messing with genes, bionics. A lot of people already have false teeth, hips, knees, heart valves, cochlear implants. Experiments are going on with gene therapy, chip implants in animal and human brains, artificial arms & legs.
On the other hand we fabricate robots that behave (up to a limit) as human beings.
This all means that the border between biological and artificial life is thinning.
Add to that in vitro fertilisation, cloning and similar technologies, and we are heading for a world where we will no longer need to make love in order to have children.

Where were we ? OK : we will make humans (still humans ?) in the laboratory, and will be able to choose among a lot of genetic enhancement options. During the lives of our kids we’ll still be able to add all sorts of technological enhancements.
Can you imagine the diversity of humans at the end? It will be beyond anything earth ever saw.

At the same time rich people will have the same options for their pets. All amazing sorts of dogs, cats, snakes, rats, horses, which finally will sort of blend, so that some dogcats will look like rabbitmonkeys …

There is a saying that dog-owners come to look like their dog. In the end people with plenty of money will have the possibility to really give their dog their own face, or the face of a beloved one who passed away. With the necessary brain implants, this animal (?) will have normal human intelligence.

Do you still know where the species is?

Enjoyed this post ? Then you might be interested in the following :
– Web 5.0: The telepathic web
– Robotic insects or cyber-insects ?
– Self reassembling Robot
– Human brain copy protection by AnyMind Inc.
– Humans 2.0

Reblog this post [with Zemanta]

How many customers does the grocer around the corner have ?
How many customers do you have on your website ?
How many cars are there that use a particular crossroad ?
How many birds of a particular species live in a determined patch of forest ?

How many ? Is there a way of finding out ?
(Aside from the question of whether it is meaningful, I find it an intriguing one.)

A first step is simple : Unique visitors in a given period of time.

– You just watch the grocery for an afternoon and count all people that enter. Make sure you do not count the returning customer who forgot the sugar twice !
– Get the unique visitor number of your website from Google Analytics or whatever web analytics tool you use.
– Get a mojito from the bar at the corner and write down all license numbers you see, for one or two hours. Afterwards eliminate all returning ones.
– take a walk that covers the entire forest patch and count each individual of that particular bird species.

Simple, is it not ?
But did that get you the actual TOTAL number ? NO

What about the loyal grocery customers that came yesterday and will return tomorrow ? You missed them.
Not everyone visits your website all the time.
Some people leave their car at home and take a walk … to the forest where not every bird will show itself or will be singing.

Realise you only got a fraction of the number.

In biology they use something like capture-recapture.
1. First step : capture some birds and put a ring on one of their feet (identify customers, drop a cookie when they visit your website, write down the license numbers of passing cars).
2. Second step : capture some birds again and count the number with and without a ring (count returning customers, count returning vs. first-time customers/cars).
3. Do the simple math : the ringed fraction in the second catch estimates the ringed fraction in the whole population, so total number = number ringed / ringed fraction.
If the first time you captured 100 birds, and the second time you captured the same number and 25 of them were already ringed, you can say that in your forest 1/4 of the birds are ringed, so the forest contains a total of 400 birds.
Idem for your website : if 50 % of your visitors are returning ones, you may say that you have twice as many visitors as cookies dropped.
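The simple math above is the classic capture-recapture (Lincoln-Petersen) estimate; the bird example as a few lines of code:

```python
# Capture-recapture (Lincoln-Petersen) estimate of total population:
# total ~= marked_first_time * caught_second_time / recaptured.

marked = 100      # ringed in the first catch
caught = 100      # size of the second catch
recaptured = 25   # already ringed among the second catch

total_population = marked * caught / recaptured
print(total_population)  # 400.0
```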

These figures are correct … if we accept some assumptions that we cannot accept :

– Birds that are captured once become much more shy. They will be under-captured the second time.
– Not all people have the same activity on the internet, or on the road. They do not all have the same probability of showing up. And some delete their cookies !
– The first day perhaps there was a tourist or two in the grocery who will never return !
– Migration : some come, some go …
– With the same effect as migration : births, deaths.

Perhaps there is more to learn when we take the figures for a number of periods in succession, like, say, capturing birds or monitoring the number of sign-ons on your website for two weeks in a row.

Two things we can learn :
– an estimation of the total population
– the total number of identified individuals (ringed birds, website visitors who signed up)

We can assume that what we get should lie between two extremes :
1) No migration/births-deaths
This first extreme should show us the actual, stable situation.

The chart shows two lines : the highly fluctuating line is the percentage of identified individuals, day per day, whereas the more stable line shows the cumulated data. These cumulated figures tend towards the real % of identified individuals in the population. Here we see that 24% carries an ID. By simply using the proportion we can easily calculate the total number of individuals.
OK, for birds in a forest it is simple, but for website visitors it is a bit more complicated. Not every visitor with a username/password for your website will sign on each time he visits, but let us assume this is the case anyway, for now.

The second chart shows the theoretical cumulative proportion of ID’d individuals. If each day 24% is ID’d, after 14 days nearly 100% of them will be captured, seen or have visited your site.
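A quick check of that claim, assuming each individual independently has the same 24% chance of being seen on any given day:

```python
# If a fraction p of the population is ID'd each day, the chance that
# an individual has been seen at least once after n days is 1-(1-p)^n.
# With p = 0.24, after 14 days almost everyone has been captured.

p = 0.24
seen_after = [1 - (1 - p) ** n for n in range(1, 15)]
print(round(seen_after[-1], 3))  # ~0.979 after 14 days
```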

2) only migration/one-day flies
What about this second extreme ? It really means that you do not have any returning visitors, or that each bird you capture is some migrant passing through.
So you never see ID’d or returning individuals. Not much information to show. Only the average number per day is interesting.
Although …
Let us take our second graph and add the line that corresponds with our second extreme :

The straight line shows the total number after the 1st, second, third, etc. day. Each day about the same number of individuals come along, but each day these are new ones, so they simply add up.

In real life you will find something in between the two lines (the yellow dots in the following chart). The closer your real-life line is to the curved one, the more stable your population of birds or website visitors. The closer to the straight line, the more volatile your population.

“Something between the two lines” actually means that you deal with two populations : your loyal returning customers (or your sedentary birds in the forest) and your one-time customers (or birds accidentally passing through).
Considering this, it should be possible to reconstruct your actual, intermediate line by combining the lines of these two populations … if only you knew them.
Fortunately there is something like Excel, OpenOffice Calc, or whatever spreadsheet you may use. Instead of finding some complicated equation I made something simple and played a bit with the numbers to come up with the following chart :

The blue squares are the “actual” observations, the red diamonds represent the theoretical line for the “sedentary” population and the blue triangles show the one-pass-and-never-return population.
The latter two sum up neatly to match the actual observations.

What are the columns in my spreadsheet ?
1. day number (1, 2, 3, 4 …), is shown on the horizontal axis
2. volatile population (straight line) = one number, let’s call it A, multiplied by the day number from column 1
3. stable population (“theoretical” curve). This is more complicated. The first cell is a fraction (let’s say 25%) of the total number we will have seen on, for example, the 14th day. The second cell equals the first + the same fraction (25%) of the rest of the total number, and so on.
4. the actual data = the cumulative number of individuals observed the first day, the first two days, the first three days etc.

Now you just have to tweak (play with) the three numbers : the one for the volatile population and the two (total number + fraction) for the stable population until the two populations sum up to (nearly) exactly the same data as column 4.
OK, this tweaking is not very scientific. You could do the necessary programming to obtain the desired result automatically or, if you are good at math, derive some equation to reach your goal more efficiently. In the end the result will be the same.
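A sketch of what the tweaking amounts to, written as a brute-force grid search; all numbers are invented for illustration:

```python
# Brute-force version of the spreadsheet tweaking: generate "actual"
# cumulative counts from a volatile plus a stable population, then
# grid-search the three numbers (A, total N, fraction f) that
# reproduce them. All figures here are made up.

DAYS = 14

def volatile(a, day):          # column 2: A * day
    return a * day

def stable(total, frac, day):  # column 3: total * (1 - (1-frac)**day)
    return total * (1 - (1 - frac) ** day)

# "Actual" observations (column 4), built from A=10, N=200, f=0.25:
actual = [volatile(10, d) + stable(200, 0.25, d) for d in range(1, DAYS + 1)]

best, best_err = None, float("inf")
for a in range(5, 16):
    for total in range(100, 301, 20):
        for frac in (0.15, 0.20, 0.25, 0.30):
            err = sum(
                (volatile(a, d) + stable(total, frac, d) - actual[d - 1]) ** 2
                for d in range(1, DAYS + 1)
            )
            if err < best_err:
                best, best_err = (a, total, frac), err

print(best)  # (10, 200, 0.25): the generating numbers are recovered
```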

Did you enjoy this post ?  Then you might like the following :

Are you a good data miner ?
Men are more accurate than women … or lousy statistics ?
Good enough/data quality

Reblog this post [with Zemanta]
Posted by: zyxo | January 8, 2010

Does Avatar show the future?

2154 is the year when an Avatar fell in love with a cute Pandoran aboriginal. Well, I like her too 🙂

A blog post by Seth Grimes made me think. Seth has some problems with what we see in 2154 : mostly contradictions and anachronisms !

Here I borrow his ideas, add mine and some comments.

Anachronisms.

Remember we are talking 150 years from now. If we compare our current world and state of technology with that of 150 years ago, a lot has changed : especially electronics has made the big difference. We are also very much aware that this technological evolution is continuously speeding up (remember Moore’s law : it is exponential).

So, according to Avatar’s creators, what will humans use 150 years from now ?
– a stupid wheelchair
– napalm bombs
– manned air ships and ground battle machines
– very short range missiles
– helicopters with propellers
– mechanical man-machine interfaces
– ground troops

Contradictions

– Humans are able to control the Avatar consciousness from a distance. They must have something like mind transmission. And yet they need physical contact to handle their equipment !
– Pandorans talk to each other, have misunderstandings, quarrels and everything we humans have. And yet they are able to communicate by means of a direct connection to their animals and even their trees. Why not connect directly with each other and form one connected mind ?

What should it really look like in 2154 ?

– As far as I am concerned they should have something like anti-gravitation, for example for someone with amputated legs, for airplanes and the like.
– The same anti-gravitation device should be able to pull the giant tree out of the ground (with sufficient soil around its roots) and plant it back somewhere else, so no need for napalm bombs and all that archaic shit.
– Already now we humans are able to read minds, albeit in a very, very basic way and with huge machines (fMRI and the like). The next steps are brain-machine interfaces, wirelessly connected to the internet, which will make technology-enabled telepathy possible. So everything will be mind-controlled, no actual soldiers needed on the battlefield.
– By then, Artificial Intelligence will have created the so-called singularity, a super-human intelligence. This fellow will sure find better ways of getting to “unobtainium” than bombing some tree.

Did you enjoy this post ?  Then you might like the following :

Web 5.0 : the telepathic web
Humans 2.0 ?
Psychons: elementary particles of the mind.
The human cyborg
Human evolution : the future of men

Reblog this post [with Zemanta]
Posted by: zyxo | January 1, 2010

Link list for December 2009

Enjoy browsing :

Dutch PhD student produces anti-noise to combat noise
tattoos in advertising
machine allows people to type with their minds
co-processors for the human mind
Eureqa calculates scientific laws
is social media worth your time ?
amazing pictures !
highcharts
why we do not care about information overload
Can “nice girls” negotiate ?
Google analytics illegal ! (say German regulators)
The idea that will make twitter more profitable than google
What matters now : 60 important words beautifully explained by 60 important people
coin tosses can be easily rigged
Brain cells reach a decision by computing probabilities
I am a God …
Ben Goertzel and the US military on the ethics of battlebots

Human enhancement : bioliberation versus biothreat
Scientists are drowning in data !
the three sexy skills of data geeks

text data quality (seth grimes) and Manya Mayes
10 things that make humans special
life before computers were invented
Shopping list of cyber criminals
5 ways humans could become obsolete
Golden ratio’s for beauty of female faces

3D fractals
SEO tools
How close are we to colonizing space ?
18 giga pixel photo of Prague
The known universe
Every 20 minutes we lose an animal species
Machine translate thought to speech in real time
Will uploaded minds in machines be alive ?
Resources for newcomers to R
Ecosystems on the run due to climate change
free services to schedule your tweets
Trending twitter topics of 2009
Data, not design is king in the age of google
Kissing the frog : a mathematicians’ guide to mating

Wind dispersal of dandelion seeds.
Image via Wikipedia

As a result of global warming, our ecosystems are running away at a speed of 420 meters per year.   That is about 500 human paces a year, roughly 1.5 per day.

What does that mean ?

At first sight this is not a big deal.  What is 420 meters ?  This is only 42 kilometers in 100 years ! The animals and plants just have to follow !
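The arithmetic is easy to check ; assuming a human pace of about 0.8 meters (my assumption), it comes out at roughly 1.4 paces a day, in line with the rough figure above :

```python
# Rough arithmetic behind the figures above (assumed pace length: 0.8 m).
SHIFT_M_PER_YEAR = 420

km_per_century = SHIFT_M_PER_YEAR * 100 / 1000   # 42.0 km per 100 years
paces_per_year = SHIFT_M_PER_YEAR / 0.8          # ~525 paces a year
paces_per_day = paces_per_year / 365             # ~1.4 paces a day

print(km_per_century, round(paces_per_year), round(paces_per_day, 1))
```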

Wait !

The animals, perhaps they can follow : they can walk, fly, crawl, dig …

What about the plants ? If they are lucky enough to have mechanisms like seed dispersal by wind or by birds (seeds in fruits), they should manage.

No, Wait !

What if there are obstacles ?  Like seas, streams, mountains ?  Then neither animals nor plants can follow their ecosystem, because the ecosystem will cease to exist.  An ecosystem that follows its preferred temperature and hits the seashore will simply drown in the sea.  Have you ever seen a forest crossing a sea ?

What is the alternative ?

Adapt or die !
Species that follow their ecosystem do so because at the trailing edge the environment becomes a bit too difficult to live in.  But when their ecosystem hits the sea, all that will stay behind is this unfavorable environment.  The only solution is to adapt to this new environment.  Which means : evolution, as quickly as possible, before the environment becomes too harsh.

So I predict that we will witness accelerated evolution going on all over the world.

Did you enjoy this post ?  Then you should read the following :
Human evolution : amazingly fast
The direction of evolution : speed matters
Evolution can occur in less than 10 years
The chicken or the egg ?

Posted by: zyxo | December 29, 2009

10 Predictions for 2010 to 2020

Instrumental record of global average temperatures
Image via Wikipedia

What is to come ?

1) nano-stuff.  The potential is huge and the technology is developing fast.  Examples : A nano-window that washes itself or Tracking new cancer-killing particles with MRI

2) Artificial Intelligence.  While the data-mining hype of the 80s was a failure because of computer processing limits, a new wind blows through AI that wants to create a Superhuman Intelligence, a so-called Singularity.

3) Mind-machine communication.  This is still very basic, but one success after another is published.  Example : people type with thoughts alone.  But this means more than mind-machine communication : if you add a second mind on the other side of the machine, you have mind-mind communication : technology-enabled telepathy !

4) electric cars : not really future any more.  With the greener mindset of a lot of people and the investments in wind and solar energy, it is just a matter of years before everyone will buy an electric one.  Mine must have photovoltaic panels on its roof to recharge the battery while I am shopping.

5) photovoltaic windows : photovoltaic panels are ugly when you put them on your roof.  Photovoltaic windows look just like other windows, so why should you not use them instead of normal ones ?  But before the technology becomes really mature, we can take photovoltaics into account when designing our buildings instead of putting the panels on our roofs afterwards.

6) wireless database-driven / data mining medicine (not just doctors with gut feelings) : do you know of cases where a doctor was wrong for months before the patient or his/her family decided to go to another doctor, who saw in the blink of an eye (or after some blood tests) what the real problem was ?  Databases and medical software to assist the doctor now exist.  So he should see you and, guided by his software, ask the right questions, eliminating all impossible diseases, to come up either with the right one or with extra tests to perform in order to detect the real problem.  Exit medicine men !

7) movies without movie-stars (Avatar squared) : movie stars out of work !  1. Select the people you want to be the basis for your stars.  2. Film them in their real life.  3. Load them into your computers.  4. Feed the computers with the scenario/script.  5. Select the looks/feels/characteristics of your stars.  6. Describe the decors.  7. Run the software.  8. Evaluate the result and if necessary go back to step 4 or 5.  9. Do some editing.  10. Ship the movie.

8 ) the semantic web (Web 3.0).  It will become possible to tell your computer (smartphone, whatever is connected to the internet) in your own language what you want.  It will respond based not on the words, but on their meaning (in context).

9) accelerated evolution : global warming changes the environment for a lot of species much faster than usual.  They will either follow their preferred ecosystem as it moves around, or, if they encounter a serious obstacle and cannot move further, they will evolve lightning-fast to the new conditions.

10) robots : real humanlike robots like the Japanese Asimo will stay too expensive for a normal human being.  But a lot of military applications are possible, so expect things like cockroaches offering inspiration for running robots, or flying insects and robots going to war !

I’m not the only one to make predictions about the future :

the futurist
Institute for emerging ethics & technology
Lain Dale’s diary
Wall street pit
ReadWriteWeb
ZDnet
Darwin Central
True/Slant
The Security Blog

Did you like this post ?  Then you might enjoy the following :

Human brain copy protection by Anymind inc.
Job interview or brain scan ?
Adam and Eve : Robot scientists
New laws of robotics
Web 5.0 : Computer telepathy ?

Posted by: zyxo | December 13, 2009

Statistics on 250 twitter tools

Do you know which are the most popular twitter tools ?

Curious as I was : how could you find out such a thing ? Organize a poll ? I fear I do not have enough readers or followers on twitter to end up with sufficient data.

But I had two options left :

  1. count the number of times a twitter tool appears in lists of twitter tools. You know, there are lots and lots of lists of twitter tools on the internet. A tool that appears in every list must be very popular, at least for twitter tools listers.
  2. get the number of hits google returns you when you search for the twitter tool


I decided to use the second method.
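For the record, the first method could be sketched in a few lines of Python ; the tool names and lists below are made up purely for the illustration :

```python
from collections import Counter

# Hypothetical illustration of method 1: count how often each tool
# appears across several lists of twitter tools. The names and lists
# here are invented for the example.
lists_of_tools = [
    ["tweetdeck", "twitpic", "hootsuite"],
    ["tweetdeck", "twitterfeed"],
    ["tweetdeck", "twitpic"],
]

popularity = Counter(tool for tools in lists_of_tools for tool in tools)
print(popularity.most_common(2))  # tweetdeck appears in all three lists
```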

But first I needed the names of “all” the twitter tools. So I started to get them from the various twitter tools lists. Soon I saw that this could be an exercise that goes on forever !
Neither my patience nor my time are endless, so I decided to stop after 15 lists and 250 twitter tools. Feel free to continue the exercise !

First of all here are the 15 lists :

Mashable
BashBosh
techcruising
Rssapplied
dailyseoblog
Brian Solis
techcrunch
The twitter toolbox
online marketeer
99 Essential Twitter Tools And Applications
Top twitter tools
Top twitter tools for business
My Top 10 Free Twitter Tools (and 3 Honorable Mentions)
47 Awesome Twitter Tools You Should be Using
Twittermania: 140+ More Twitter Tools!

( At the end of this post, I give the remaining URLs of the twitter tools lists that I did not use. )

And now the results.
But first there are two remarks to make.

  1. Most searches were straightforward because the tool has some typical “twitter-like” name, so you – and google – cannot confuse it with some other already existing concept. Example : splitweet.
    But there were a lot of more ambiguous names, like “hellotxt” or “glue”. In those cases I used either the website name of the tool (getglue.com) or added “twitter” to the search term.
  2. I know the numbers that google returns are mere … “google numbers”. This means we do not know exactly what is behind them, unless we browse for example all 126.000.000 hits for bit.ly, which is a bit too much for me. I also noted that google never uses more than 3 meaningful digits ; the rest are zeros. So these numbers are not very precise. But at least they give some overall picture, which is interesting to see but, I am aware, does not have very much real meaning or value. Say it is just for fun and curiosity.
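What “no more than 3 meaningful digits” means can be sketched as follows (my own illustration ; whether google truncates or rounds the remaining digits is unknown, truncation is assumed here) :

```python
import math

def to_3_significant_digits(n: int) -> int:
    """Keep the first 3 digits of a positive integer and zero the rest,
    the way the hit counts below look (e.g. 126,123,456 -> 126,000,000).
    Truncation is assumed; google might round instead."""
    if n < 1000:
        return n  # three digits or fewer: nothing to zero out
    digits = int(math.floor(math.log10(n))) + 1
    factor = 10 ** (digits - 3)
    return (n // factor) * factor

print(to_3_significant_digits(126123456))  # 126000000
```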

And here is the list of 250 twitter tools and their number of google search results : enjoy !
Sorry that I did not hyperlink them all, just too much work !

1 twittersearch 402000000
2 friendorfollow 208000000
3 bit.ly 126000000
4 seesmic 15900000
5 twitter karma 15000000
6 twittangle 13070000
7 ping.fm 11900000
8 cotweet 9520000
9 twittercounter 8080000
10 tinyurl 7760000
11 brightkite 7470000
12 timer 6780000
13 hootsuite 6050000
14 wefollow 5330000
15 twitthis 4800000
16 headup 4230000
17 tweetmix 4200000
18 toro for twitter 4050000
19 twitterfeed 4030000
20 diigo 3730000
21 loopt 3550000
22 twitter grader 2800000
23 twitpic 2680000
24 tweetdeck 2170000
25 tweetmeme 1800000
25 twitterrific 1750000
27 splitweet 1040000
28 quitter 1020000
29 cheaptweet 841000
30 strawpoil 767000
31 hellotxt 765000
32 twitalyzer 669000
33 snipurl 654000
34 twitterfriends 650000
35 magpie 588000
36 twiggit 581000
37 hashtags 539000
38 twitwall 539000
39 tweepsearch 537000
40 emailtwitter 522000
41 Doesfollow 517000
42 twhirl 490000
43 twitgraph 476000
44 rememberthemilk 461000
45 twitxr 457000
46 hoopla 451000
47 tweetcloud 436000
48 retweetist 425000
49 destroytwitter 404000
50 twibs 397000
51 glue 393000
52 tweetree 375000
53 twellow 353000
54 twitt twoo 349000
55 spaz 326000
56 tweet2tweet 320000
57 twitterberry 291000
58 tinytwitter 290000
59 summize 284000
60 snitter 278000
61 twitterfon 271000
62 twitscoop 267000
63 retweetrank 261000
64 twitterfall 259000
65 favrd 256000
66 microplaza 253000
67 outwit 250000
68 digsby 247000
69 twitterreply 244000
70 twist 236000
71 flaptor twitter search 227000
72 twitter search firefox 222000
73 jott 219000
74 twitbin 218000
75 twittelator 211000
76 twitterkeys 189000
77 twitterlocal 188000
78 twitterbadge 179000
79 twitterholic 178000
80 powertwitter 176000
81 twibble 174000
82 twinbox 166000
83 tweetvisor 163000
84 easytweets 159000
85 tweetrank 157000
86 bubbletweet 154000
87 backtweets 150000
88 huitter 148000
89 tweetstats 145000
90 Itweet 142000
91 tweetbeep 142000
92 twitterfox 142000
93 tweetlinks 135000
94 slandr 131000
95 twitterless 126000
96 vakow 124000
97 twittervision 122000
98 twitdir 118000
99 twitzer 116000
100 twtpoll 114000
101 twitterfone 113000
102 twitter2go 111000
103 twittermail 110000
104 tweetvolume 108000
105 twitdom 103000
106 mrtweet 96300
107 twittytunes 93200
108 tweetlater 92900
109 peoplebrowsr 90100
110 twinfluence 88100
111 twitternotes 87500
112 twideoo 87000
113 mr milestone 86100
114 cursebird 86000
115 WP twitter tools 85700
116 tweetgrid 82600
117 tweetr 82000
118 twoogle 81000
119 twitterbar 80700
120 snaptweet 72700
121 tweetscan 71300
122 twitteroo 68700
123 hahlo 68500
124 tweetburner 68200
125 twuffer 66800
126 twittercal 65100
127 twittonary 65100
128 twitter updater 63400
129 tweetchat 62900
130 twitter100 62800
131 tweetake 60800
132 socialtoo 60600
133 nearbytweets 56900
134 monitter 56600
135 tweepler 55900
136 twtvite 55200
137 twilert 55100
138 tapulous 52300
139 tweetwire 50900
140 feedalizr 50700
141 secrettweet 49800
142 twitterhawk 48400
143 twitturly 48400
144 Xpenser 48400
145 grouptweet 48300
146 tweetcube 48000
147 tweet this 47900
148 tweetwheel 46700
149 linkbunch 46400
150 twiddict 46200
151 twittertise 44900
152 tweetsum 43500
153 twitstat 43200
154 followcost 43100
155 twitter sharts 41900
156 tweetrush 40300
157 untweeps 38700
158 twtqpon 37900
159 tweetsuite 35600
160 citytweets 35500
161 twistori 35100
162 twitpay 34800
163 twitterpatterns 33200
164 tweepular 32300
165 gps twit 31800
166 twonvert 30900
167 Matt 30500
168 livetwitting 30400
169 twitseeker 30000
170 twittergallery 30000
171 twitoria 29100
172 quotably 28400
173 mymilemarker 28100
174 tweetchannel 27800
175 twistory 27800
176 tweet pro 27400
177 twitzu 27000
178 justtweetit 26500
179 twitterIM 26100
180 gridjit 25700
181 twittercamp 25100
182 twubble 24800
183 socialwhois 24700
184 twittereyes 23500
185 twtrfrnd 23400
186 twittad 23200
187 twittersnooze 21900
188 brabblr 21400
189 twalala 20800
190 whostalkin 19700
191 twitsay 19500
192 twittearth 19400
193 istwitterdown 19300
194 twtcard 19200
195 pockettweets 19000
196 toptweet 18900
197 twerpscan 18600
198 nozbe 17500
199 twemes 17000
200 autopostr 16300
201 twixxer 16200
202 twitterdigest 16100
203 whoshouldifollow 15900
204 twithire 15800
205 madtwitter 15300
206 tweetwhatyouspend 15000
207 xefer 14500
208 twittertroll 13800
209 twitterlights 13000
210 twitority 12200
211 twitterfriends network browser 12100
212 feedtweeter 10100
213 tweetwasters 9870
214 tweetie for iphone 9680
215 twitrans 9520
216 twiffid 9200
217 mytweeple 9130
218 twitresponse 9110
219 itweet2 9020
220 microrevie 8970
221 twitterratio 8960
222 Nest.Unclutterer 8730
223 twply 7730
224 itwtr 6730
225 twitspy 6320
226 postica 6260
227 twitrand 5660
228 twinkle 5520
229 dreamtweet 4870
230 whatsyourtweetworth 4190
231 mycleenr 4090
232 WP twitip Id 3780
233 twitslikeme 3760
234 alphatwitter 3510
235 plodt 3180
236 readmytweets 3150
237 twittords 3020
238 vacatweet 3010
239 twenglish 2850
240 acamin 2420
241 tweetpad 2330
242 twitexplorer 2120
243 twi8r 1880
244 twitgeistr 1880
245 whofollowswhom 1660
246 tweeterology 1410
247 twitblocker 1150
248 twitalks 1040
249 twitterscan 39

 
Oops ! either I come one too short, or I made some mistake in my numbering …

And here are the URLs of the other twitter tools lists I did not use :

http://www.sociableblog.com/2009/03/18/100-twitter-tools-to-help-you-achieve-all-your-goals/
http://net.tutsplus.com/articles/10-awesome-ways-to-integrate-twitter-with-your-website/
http://www.folkd.com/go/top+10+twitter+tools

Top 10 Most Useful Practical Twitter Tools for The Twitter Professionals


http://www.1stwebdesigner.com/development/27-twitter-tools-to-help-you-find-and-manage-followers/
http://www.quickonlinetips.com/archives/2007/04/10-best-twitter-tools-for-wordpress-blogs/
http://www.seoptimise.com/blog/2009/10/30-twitter-tools-for-business.html
http://tendou86.blogspot.com/2009/01/top-10-twitter-tools.html
http://www.hellogiri.com/top-10-most-useful-twitter-tools-list-for-pc-mobiles-and-blogs/

Top 25 twitter tools for WordPress


http://www.newmediabytes.com/2008/01/18/best-twitter-tools-resources-and-clients-guide/
http://savethemedia.com/2009/02/17/top-twitter-tools-for-journalists/

Top 100 Most Influential Twitter Tools


http://www.google.be/search?hl=nl&client=firefox-a&rls=com.ubuntu:nl:official&q=TOP+10+LIST+OF+twitter+tools&start=30&sa=N
http://www.c4lpt.co.uk/recommended/
http://www.squidoo.com/twitterapps?utm_campaign=direct-discovery&utm_medium=sidebar&utm_source=pkmcr
http://pelfusion.com/tools/30-twitter-tools-for-managing-followers/

http://www.seodubai.org/2009/01/16/list-of-twitter-tools-that-you-must-have/
http://brendanhughes.ie/2009/06/21/top-10-twitter-tools-for-business/
http://www.smbceo.com/2009/03/25/top-27-twitter-applications/
http://www.smmguru.com/2008/10/22/the-master-list-of-twitter-tools-and-apps

Top 10 Twitter Tools For Musicians


http://www.socialmediatoday.com/SMC/80437
http://www.blogcatalog.com/topic/list+of+twitter+tools/
http://www.scgpr.com/wordpress/?p=492
http://www.socialmedialists.com/wiki/index.php?title=Twitter_Tools
http://www.twitadder.info/
http://www.thedailyanchor.com/2009/02/17/85-twitter-tools/
http://www.techtreak.com/downloads/10-awesome-twitter-tools-as-wordpress-plugins/
http://steve-wakefield.com/2009/10/my-top-10-twitter-tools-and-then-some/
http://www.thinktechno.com/2009/05/31/top-10-twitter-tools/
http://www.brandsamongmany.com/2009/03/09/the-ultimate-list-of-twitter-tools/
http://www.webuildyourblog.com/1289/increase-twitter-top-10-twitter-tools/
http://www.networkworld.com/slideshows/2008/060208-top-twitter-tools.html
http://www.warriorforum.com/blogs/dsmpublishing/8167-top-10-twitter-tools-everyone-should-own-their-online-business.html
http://www.girlopinion.com/2009/06/07/top-10-twitter-tools/
http://www.twitip.com/10-more-must-have-twitter-tools/
http://gnoted.com/100-twitter-tools-ultimate-power-collection/
http://freelancefolder.com/15-useful-twitter-tools-for-web-workers/

Twitter Tools & Resources For Jumpstarting Your Twitter Experience


http://www.placona.co.uk/blog/post.cfm/my-top-favourite-twitter-tools

If you enjoyed this post, then you might also be interested in the following :
top 10 lists of twitter tools
A bunch of tools for twitter
A second bunch of tools for twitter
Micro Email = twitmail

 

Posted by: zyxo | December 6, 2009

Top 10 lists of twitter tools

A Twitter profile
Image via Wikipedia

Twitter started in March of 2006 as a very simple service to connect people by sending short messages of max. 140 characters.
Who could have imagined at that time not only that twitter would become so popular, but that, thanks to their API, the number of twitter tools and services would explode the way it did ?

On the internet we find a wealth of lists of twitter tools (I wrote two of them myself). As the evolution rocks off the charts and I wanted to assemble a new list, I figured it would be interesting to make a meta-list : a list of lists of twitter tools.

We can find all sorts of lists of twitter tools. I figured the lists I wanted to list had to have something in common, so the list would make some sense.

It has become a list of top-10 lists :

1 BashBosh : Top 10 Tools for Twitter Freaks
2 Techcruising : Top 10 Twitter tools for a power user
3 Rssapplied : Top Ten Twitter Tools
4 Dailyseoblog : 10 twitter tools to effectively manage your followers
5 The twitter toolbox : Top 10 Tools For Your WordPress Blog
7 Itpro : Top 10 Twitter tools for business
8 Dooleyonline : My Top 10 Free Twitter Tools (and 3 Honorable Mentions)
9 Tutsplus : 10 Awesome Ways to Integrate Twitter With Your Website
10 Atniz : Top 10 Twitter Tool
11 Quickonlinetips : 10 Best Twitter Tools, Plugins, Widgets for WordPress Blogs
12 Tendou86 : Top 10 Twitter Tools
13 Hellogiri : Top 10 Most useful Twitter Tools list for PC, mobiles and blogs
14 Top10 Twitter Tools : Twitter Tools Top 10
15 Brendanhughes : Top 10 Twitter Tools for Business
16 Hypebot : Top 10 Twitter Tools For Musicians
17 Techtreak : 10 Awesome Twitter Tools as WordPress Plugins
18 Steve-wakefield : My Top 10 Twitter Tools… and then some!
19 Thinktechno : Top 10 Twitter Tools
20 Webuildyourblog : Increase your Twitter following with these top 10 Twitter Tools
21 Warriorforum : Top 10 Twitter Tools That Everyone Should Own For Their Online Business
22 Girlopinion : Top 10 Twitter Tools
23 Twitip : 10 MORE Must Have Twitter Tools


If you enjoyed this post, then you might also be interested in the following :
A bunch of tools for twitter
A second bunch of tools for twitter
Micro Email = twitmail

 

Posted by: zyxo | November 30, 2009

Link list for november 2009

Enjoy browsing :

Douglas Hofstadters: musing on the singularity
a clever way of searching
how to really browse without a trace
you should follow me on twitter
The new era of inbound marketing
the twitter song
Top 10 Most useful Web Developers tools for Firefox
elevators to space
Sharing small snippets of information about your daily life could be generated automatically
who will edit your life ?
A Fractal Perspective on Enterprise 2.0 Adoption
10 things about google that you might not know
test your science knowledge with science cheerleaders (fun)
you think your child is smart ?
what is the meaning of “organism” ?
how ants make their nest
periodic table of marketing elements
Bill Bryson’s Notes from a Large Hadron Collider
The Über-Connected Organization: A Mandate for 2010
Is neighbor’s Wi-Fi signal free for me to use?
dark chocolate helps ease emotional stress
Why doesn’t linux need defragmenting ?
Are solar cells warming up the earth ?
bounce rates
graphedge
six insane laws we will need in the future
explore your twitter friends and hashtags with mentionmaps
Top Ten list of excuses not to engage in co-creation
how to achieve something
how heavy is the internet ?
Intel wants a chip implant in your brain
in the brain, se7en is a magic number
We perform best when no one tells us what to do

Enjoyed this post ? Then you might be interested in the following :
link list for october 2009
link list for september 2009
link list for august 2009
link list for june 2009
link list for may 2009

Posted by: zyxo | November 23, 2009

Thoughts on Traffic Jams

Traffic Jam in Delhi
Image via Wikipedia

I am sure everybody knows the feeling when you get stuck in a traffic jam. No need to say this is becoming a huge problem.
Why are there traffic jams ? Is it possible to prevent them ?

What is a traffic jam ?
Very simply put : you experience a traffic jam, when there is no space in front of you to move on. We all love an empty road ahead. But you do not really need an empty road in front of you. When the driver in front of you nicely drives on, he is constantly making the necessary space so that you can move on too.
So there are two factors : i) there is a car in front of you and ii) it is not moving.
(“Ants have no traffic jams !” Are they more intelligent ?)

How much space do you need ?
This is not so simple. It depends on your speed. You want enough space to have the time to stop when the one in front of you stops. Hence you only move on when there is more space in front of you than the minimum you feel safe with. What you really want is not space, but time. A good -conservative- rule of thumb is 4 seconds, or 2 crocodiles (just say : “one crocodile, two crocodiles”).

To put it the opposite way : When is there no traffic jam ?
First everybody must be moving, and second there has to be enough time between the cars.

How to prevent traffic jams?
Since there are two factors in play : space and speed (space/time) we can play with both.

i) The first is space : it is obvious that lowering the number of cars on a given road at a given time will be a good thing, making more room per car. So you need to prevent (some) people from taking their car, for example by enhancing public transportation or by making it more expensive to drive a car (taxes).

ii) The second one is less intuitive : a general remedy to traffic jams is limiting the speed. Why ?
My first reaction is : this makes no sense at all ! Whether at high speed or low speed, if you always keep 4 seconds between two cars, then either way a car passes every 4 seconds. So at a lower speed the road cannot “transport” more cars per time-unit.
However, there is another consequence of driving slower : the space you need in front of you diminishes : 4 seconds at 70 km/hour means that you need 77.8 meters, but at 120 km/hour you need 133.3 meters. So the effect of speed limitation is that the road can contain a lot more cars : 12.8 per kilometer at 70 km/hour, compared to only 7.5 per kilometer at 120 km/hour.
So either lowering the number of cars or limiting the speed leads to the same consequence : it prevents saturation of the roads. However, from the moment on that the road is saturated, the same traffic jam misery will start again.
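A quick back-of-the-envelope sketch of the figures above (the exact value at 70 km/hour is 12.86 cars per kilometer, truncated to 12.8 in the text) :

```python
# Check the figures above: distance covered in a 4-second gap,
# and how many cars fit on a kilometer of road at each speed.
GAP_S = 4

def gap_m(speed_kmh):
    """Metres travelled in 4 seconds at the given speed."""
    return speed_kmh / 3.6 * GAP_S

def cars_per_km(speed_kmh):
    return 1000 / gap_m(speed_kmh)

for v in (70, 120):
    print(f"{v} km/h: gap {gap_m(v):.1f} m, {cars_per_km(v):.1f} cars/km")
# Flow is the same at either speed: one car every 4 s = 900 cars per hour per lane.
```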

iii) A third solution would be to lower the distances between the cars without changing the speed. Sure, there would be a safety problem, unless everybody becomes an extremely alert driver (like the formula 1 people). A (future ?) solution is electronics. We can easily imagine a device with sensors to automatically keep a minimum distance. Instead of our automatic cruise control, we could switch to automatic distance control : this already exists ! I remember there has been an experiment like this with trucks, with a driver only in the first one, while the other trucks simply and automatically followed everything the first one did, just a few meters apart from one another. Here is a more recent article on a similar subject.
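As a toy illustration of such automatic distance control (my own sketch, not the actual truck experiment), here is a follower that adjusts its speed with a simple proportional rule until the time gap to the car in front settles at the 4-second target :

```python
# Toy model of "automatic distance control": the follower measures the
# time gap to the leader and corrects its own speed proportionally.
TARGET_GAP_S = 4.0
DT = 0.5          # simulation time step, seconds
GAIN = 0.5        # how aggressively the follower corrects its speed

def simulate(steps=400, leader_speed=20.0):
    """Leader cruises at 20 m/s; follower starts 200 m behind at 25 m/s."""
    leader_pos, follower_pos, follower_speed = 200.0, 0.0, 25.0
    for _ in range(steps):
        leader_pos += leader_speed * DT
        follower_pos += follower_speed * DT
        gap_s = (leader_pos - follower_pos) / max(follower_speed, 0.1)
        # speed up when the time gap is too large, brake when too small
        follower_speed += GAIN * (gap_s - TARGET_GAP_S) * DT
    return follower_speed, (leader_pos - follower_pos) / follower_speed

final_speed, final_gap_s = simulate()
```

A real system would of course use better sensors and a more sophisticated controller, but the principle is the same : measure the gap, correct the speed.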

And what about the “mystery of traffic jams” or “phantom traffic jams” ?
This is not really a mystery or a phantom, it’s just the result of a saturation of the road and the behaviour of the drivers.

Anyway : the best way not to get stuck in traffic jams is to stay at home !

Foltergeräte (torture devices)
Image via Wikipedia

Ben Goertzel tweeted the following 3 tweets today :

  • Option A: you are tortured (with no permanent damage) and then the memory of the torture is erased.
  • Option B: you are not tortured and then a false memory of torture is programmed into your brain.
  • Which do you choose, A or B?

No funny thoughts, rather one of those choices you really prefer never to have to make. But if YOU had to choose, which one would it be ? A or B ? Let me know please !

His first 9 responses were : A : 7, B : 3
My own response : B (no actual pain), but afterwards I would have myself hypnotized to remove the awful memories ! 🙂

Makes me wonder : after the fact, what is real ? The memories you have seem to be real, but if there is a way to put memories there without having experienced the real situation, then for you those memories correspond to the real situation.
I am sure there are ways to put memories in someone’s head ! A tough interrogation may result in the subject actually believing he was there, that he saw this or that, or that he actually did it ! (see these articles : (1) (2) (3))

Posted by: zyxo | November 1, 2009

Link list for october 2009

The Google Technology Stack
Make your web data disappear with vanish
Scientists develop nasal spray that improves memory
where are all the robots ?
computer program sketches faces of criminals
University of Southampton scientists develop computer telepathy(youtube)
Talk of Ben Goertzel on the singularity summit 2009
minimizing complexity in user interfaces
how to demo twitter
did dragons exist ?
did dragons exist (II)

New ‘consumer-intelligence’ technology will compile detailed profiles

the origin of new genes
The Number of Parallel Universes
The Past 5,000 Years Mark a New Epoch in Human Evolution
evolution in a bottle
nearby future view of real artificial intelligence
is the Higgs boson sabotaging its own discovery ?
if Matrix was programmed on windows XP
head or tails : not 50-50
Neuroimaging Of Brain Shows Who Spoke To A Person And What Was Said
The weirdest clouds you’ll ever see !
IBM’s twitter strategy
a robot skiing downhill
5 mind-blowing webstats you should know
what will the web look like in 5 years ?
muscle-based PC-interface

Enjoyed this post ? Then you might be interested in the following :
link list for september 2009
link list for august 2009
link list for june 2009
link list for may 2009

Posted by: zyxo | October 30, 2009

Where is your soul ?

Where is your soul located ?

(my working synonyms of soul : self, consciousness, spirit, identity).

First, and obvious answer : in your head.
According to Douglas Hofstadter in “I Am a Strange Loop”, this is not entirely true.

Explanation :
1. what is my soul ? A whole bunch of patterns in my brain (linked, hierarchical “thoughts”, patterns representing concepts). One of these patterns is special, because it groups everything that relates to “me”.
2. not every brain pattern that relates to me is in my own head. A whole lot is in the heads of my friends and my family, although not as vast as the pattern in my own head.
If the sum of everything that relates to me is my soul, then I am distributed over the heads of a lot of people.

Does this sound a bit crazy ?
After all, I have only one mind, and everything about me that is in the mind of somebody else is not “me” but what that other person thinks and knows about me.
So that is what I thought before I read the book.

But let us do a hypothetical experiment. (Douglas Hofstadter describes some experiments like that in his book, but this one here is my own).

Imagine one brain to start with, with twice the number of neurons of a normal brain.
Imagine we can manipulate physically each neuron as we like.
Imagine we take at random every second neuron and put it in a second (empty) head. When we finish, half of the neurons will be where they originally were, namely in the first head. The other half will be in the second head.
Imagine we left the original neuron-neuron connections intact, meaning that we replaced every “broken” connection by an artificial equivalent wireless connection.

The result :
Physically (or rather “locally”) we have two brains, each in its own head. Let us call them Adam and Eve.
Functionally, they are still the same original superbrain, because all neurons and connections are unchanged. In fact, we now have one brain with two bodies. What would this be like ? I assume that brain will control the two bodies, just like you and I control our two hands. Consequently there will be only one “me” (named AdamEve).

Now assume that some of the wireless connections are broken, or of lousy capacity, so that only part of the info is passed on from Adam’s neurons to Eve’s neurons and vice versa.
This means that all thoughts, concepts etc. formed only by Adam’s neurons will be stronger and clearer in Adam’s head than in Eve’s, and vice versa.
Result : the “shared” identity AdamEve will be weaker. At the same time two separate identities will probably develop : Adam and Eve.

Now suppose all wireless connections are replaced by words, sounds, expressions, gestures, emails, writings or whatever people in our real world use to communicate.
Result : the shared identity is very, very weak whereas the separated identities are very strong.
We all know such shared identities : a married couple, a football team, an army, a religion

But this is not what Hofstadter writes ! Instead of talking about shared identities, he speaks of pieces of identities that are scattered over the minds of many people. Or, if we only consider two people : the two separated identities live in the two heads.

Let us recapitulate :
if there is no connection, there are two completely separated identities.
If the “between-people” connection is as strong as the “intra-people” connection (as in my split-brain thought experiment), then according to Hofstadter we have two separate identities, living equally strongly in both heads. According to me, we have only one shared identity and no separate identities.
If the “between-people” connection is weaker than the “intra-people” connection, then according to Hofstadter we have two separated identities, each living in two heads, but the one living in its own head is stronger than the one living in the other’s head. According to me, we have two separated identities plus one weaker, shared identity.

Enjoyed this post ? Then you might be interested in the following :
– Web 5.0: The telepathic web
– Robotic insects or cyber-insects ?
Psychons : elementary particles of the mind
– Human brain copy protection by AnyMind Inc.
– Humans 2.0

Posted by: zyxo | October 26, 2009

Making hidden patterns visible

Data mining and other forms of analytics have one primary goal : making invisible patterns visible. The information is in the data, but it is invisible to you if you are not a “Homo analyticus”.
Some examples :

If this seems somewhat mysterious to you, there is a simple way to make it all visible. Although it is not really data mining, it is a somewhat funny way of showing what it means to see “hidden patterns made visible”.

Posted by: zyxo | October 19, 2009

Human evolution : amazingly fast !

Who said that humans stopped evolving because their technical means took over from natural selection ?

blue eyes

Apparently this is not true !

It seems that, on the contrary, since humans started spreading over the entire world some 50,000 years ago, their evolution has shifted to a higher gear.

Some examples of evolution that took place the last 50,000 years :

  • human skull structures of various ethnic groups evolved in different directions (remember, we all came out of Africa, looking more or less alike)
  • people living in the Tibetan mountains have a special gene which increases the oxygen level in the blood by 10%
  • Scandinavian people have blue eyes : no blue human eye existed before the last 10,000 years
  • sub-Saharan Africans have already developed 25 genes protecting them against malaria, a disease that is itself only 35,000 years old
  • a gene that enables people to digest lactose (a milk sugar) is 8,000 years old and only came into existence after people began to keep cows
  • and many others

Why this speeding up ?

As I wrote earlier, in order to have evolution you need (i) diversity, (ii) (re)production and (iii) selection by the environment.
All three have been present during the last millennia, so there is no reason why the human race should not still be evolving.

But there is something remarkable in our recent history : we came out of Africa and conquered the whole planet which means that :

  • the number of people grew considerably, and consequently so did our diversity
  • we started to live in environments very different not only from the African plains but also from each other, from the ice sheets of Greenland to the Sahara desert to modern Manhattan

So it is no wonder that this incredible shift in environments and density caused an incredible speeding up of our evolution. At least 7% of our genome has mutated recently (in the last 40,000 years).

Did you enjoy this post ? Then you might be interested in the following :
top-10 lists on evolution
The pope believes in evolution
Human evolution : the future of men
Evolution towards Intelligent Design
The end of evolution

Posted by: zyxo | October 7, 2009

Web 5.0 : computer telepathy ?

“Telepathy on the Horizon: New Interface Allows Brain-to-Brain Communication”



Is that so ?

I thought not.

First of all, if you did not read the article or watch the video yet, now is a good time to do so : (article, video)

What did they do ?

They connected brain A (a person who was thinking ‘lift left arm, lift right arm …’ to represent zeros and ones) to an EEG transmitter and then to a PC (PC1). PC1 was connected via the internet to another PC (PC2), which interpreted the transmitted brain patterns as ‘on’ or ‘off’ signals and used them to flash a light. The second person (brain B) saw the light; he was also connected to an EEG transmitter and then to a PC (PC3), which interpreted brain B's patterns to reproduce the original zeros and ones.

OK, good technology, but definitely not telepathy.

Because telepathy is “transferring knowledge (understood information) from one person's brain to another person's brain without using the normal means” (gestures, speech, writing … to send, and our five senses to receive). In the experiment the second person was not even aware of the information. He only saw the light flashing.
In his setup, Dr. Christopher James at the University of Southampton used only one direction of communication : “exporting” a meaningful pattern from the brain. He did this twice, once on each side of the communication.
These one-directional brain-computer interfaces have been around for several years now.

Real telepathy.

For real telepathy you should also be able to do it the other way around : put information back into someone’s brain without using this person’s senses. And that’s the tricky part. I am not aware of any experiment that managed to do such a thing.

Enjoyed this post ? Then you might be interested in the following :
– Web 5.0: The telepathic web
– Robotic insects or cyber-insects ?
– Self reassembling Robot
– Human brain copy protection by AnyMind Inc.
– Humans 2.0

Posted by: zyxo | October 2, 2009

Link list for September 2009

Here is my link list for September 2009.
Enjoy reading (but don't forget to take a look at my own writings too :-))

Evolution of Darwin’s “on the origin of species”
Eleven carbon removal projects to stop global warming
Sustainable fertilizer : urine and wood ash, to grow big tomatoes.
Cities are organized like human brains
10 Awesome Websites That Help You Discover the Best Web Apps
the computer/lawyer
Men Losing Their Minds Over Women
Human brain could be replicated in 10 years
Memories exist, even when forgotten
The Hottest Tweets on Twaxed.com
Viral video about social media revolution
did you know 4.0
Wall Street’s Math Wizards Forgot a Few Variables
Scoring with Social Media: 6 Tips for Using Analytics
The most important writing lesson :”Nobody wants to read your shit”
cracking the brain’s numerical code
Schrodinger’s cat experiment proposed
First online under water observatory
Website analysis : internal search site analysis
computer ants to fight computer worms
information is beautiful

Posted by: zyxo | September 29, 2009

Data Mining : What is a good lift ?

… ever modeled the lift of a targeting model ?

In a previous post, “Data mining for marketing campaigns : interpretation of lift”, I discussed the factors that influence the lift of a targeting model. Apart from the quality of the model, the lift is theoretically also influenced by
– the natural return = normal percentage of buyers among your customers during a specific period
– the size of your selection in % of the customer base
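As a reminder of the quantity being discussed (a generic sketch with made-up numbers, not figures from my models) : lift is simply the response rate in your selection divided by the natural return over the whole customer base.

```python
def lift(responders_in_selection, selection_size,
         responders_total, customer_base):
    """Lift of a targeting model : response rate in the selection
    divided by the natural return over the full customer base."""
    selection_rate = responders_in_selection / selection_size
    natural_return = responders_total / customer_base
    return selection_rate / natural_return

# A 10,000-customer base with a 2% natural return; the top-10%
# selection captures 60 of the 200 responders.
print(lift(60, 1_000, 200, 10_000))  # → 3.0
```

A lift of 3 means the targeted group buys three times as often as a random selection of the same size would.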

As a reaction to my post, Tim Manss, in his post I'll show you mine if you show me yours…, proposed to exchange lift figures in order to have something of a benchmark for checking the quality of targeting models.
It is indeed not easy to get these figures, because everybody wants to keep his or her secrets … well, secret.

So I decided to give away at least some info about the lift of my targeting models by calculating a model predicting their lift.

Here is what I did :

I took the lift figures of my models (a few dozen of them) together with the natural return and 4 different selection sizes : 10%, 5%, 1% and 0.5%.
And with this simple dataset I calculated a linear regression (I actually used the logarithms of these data).

What came out of it ?

– There was of course a lot of noise : R-squared = 0.45, which means that more than half of the variance is unexplained. It also means that different targets have different predictability.
– the natural return showed no statistically significant effect
– so the only relevant predictor is the selection size.

Here is the equation and the corresponding chart (lift on the vertical axis, selection size on the horizontal axis) :

ln(lift) = 3.06291 – 0.4829 * ln(selection_size)
Lift (vertical) as a function of the selection size (horizontal)
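The fit itself is easy to reproduce. Here is a sketch with numpy, using hypothetical (selection size, lift) pairs as stand-ins for the real campaign figures behind the post, plus a prediction from the fitted equation above :

```python
import numpy as np

# Hypothetical (selection size in %, lift) pairs -- stand-ins for
# the real model figures, for illustration only.
selection_size = np.array([10.0, 5.0, 1.0, 0.5, 10.0, 5.0, 1.0, 0.5])
lift = np.array([3.2, 4.6, 10.5, 15.0, 2.5, 3.8, 8.9, 12.1])

# Fit ln(lift) = a + b * ln(selection_size) by ordinary least squares.
X = np.column_stack([np.ones_like(selection_size), np.log(selection_size)])
a, b = np.linalg.lstsq(X, np.log(lift), rcond=None)[0]

# Smaller selections give higher lift, so the slope b comes out negative.
# Predict the lift of a 2% selection with the equation from the post :
predicted = np.exp(3.06291 - 0.4829 * np.log(2.0))
print(round(predicted, 1))
```

With the post's coefficients, a 2% selection is predicted to have a lift of roughly 15.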

So, I showed mine… what about yours ? 🙂

Other posts you might enjoy reading :
data mining with decision trees : what they never tell you
The top-10 data mining mistakes
Good enough / data quality
Data mining for marketing campaigns : interpretation of lift
Are you a good data miner ?

Two interesting articles of Gregory Piatetsky-Shapiro (KDnuggets) on lift modeling :
Measuring lift quality in database marketing
Estimating campaign benefits and modeling lift

Posted by: zyxo | September 19, 2009

Are you a good data miner ?

Tough question. What is a good data miner ?

One way of finding out is to look at the job descriptions, for example this one : Credit Suisse Data Miner Job Description

M. George distinguished five areas of expertise necessary to be a good data miner :

  1. techniques : to be able to do it
  2. analytics : to be able to decide what and how to do it
  3. business : to understand your customers
  4. communication : make your findings clear to others
  5. project management : manage everything and everyone from start to end

But all that still remains a bit abstract.
In what follows I will try to be more concrete.


Let us start with the data.

You have to be a bit of a detective just to find your data : find the people who know where the data is, find out how you can access it, find out who can give you access rights, and find out the key variables needed to join the various tables into one flat file …

Then you have to be a programmer to put all that info to use : SQL, SAS, BI tools, R, whatever it takes, not only to get your raw data but also to make it usable : what to do with missing values ? which derived variables will you calculate ? etc.
A lot of technical skills needed.

But there is not only the data, there is also the problem to solve. So you need to be an analyst.

As an analyst you have to make decisions about doing the things right and doing the right things :

  • take a step backwards, know where to start, where to stop
  • question everything : always ask yourself where you are wrong, not good enough, too complicated, not efficient enough, …
  • question everything : when they ask for numbers, ask them to explain their problem and how these numbers will solve it. Propose better, cheaper, nicer solutions …

And now comes the fun part : you have to be a number cruncher
You love data, charts, statistics (not the theory, but what you can do with it). You love to explain to people why something happens, to show them relationships between numbers, the conclusions that you derive from your numbers …
You know the data mining techniques and the statistical techniques : what you can and cannot do with them, their advantages and drawbacks, how to interpret the results, and how to present them in an understandable way (remember : the others are stupid and lazy, so you have to make it simple and easy).

Unfortunately there is also the business (profits, costs, ROI …)
They expect you to deliver usable results in a short time. An accountant must deliver numbers that are correct, a data miner is lucky : nothing has to be absolutely correct. When it is good enough, deliver ! (Think “Microsoft software quality” !).
They sometimes say : a data mining model is never finished, the data miner just stops working on it. This is very true, so keep it in mind and know when to stop and deliver !

Of course every data mining project is, well … a project. So you have to be a project manager too.
As a project is by definition something with a start and an end, you should have somewhere a description (accepted by all involved parties) of WHEN THE PROJECT CAN BE CONSIDERED FINISHED. This description is the only thing you need, because it has to contain all the conditions that have to be fulfilled (goals, deliverables, quality metrics …).

What helps you deliver more quickly is to stick to the following rule : do the same thing twice, but never a third time. This means that for anything you will have to do more than twice, you should find a way to get it done automatically : write a program, download a program, write an Excel macro, anything.
This means you also have to be a bit of a software engineer !
This automation/industrialisation holds for anything : data extraction, modelling, model result reporting, monitoring of your model quality, monitoring of the data quality, etc.

And last but not least : you have to be a learner.
Never think you know it all : always look for new ways, read articles, go to symposia, find out how others do it, and look for ways to deliver as much quantity and quality as possible without working too much 🙂

Other posts you might enjoy reading :
Oversampling or undersampling ?
data mining with decision trees : what they never tell you
The top-10 data mining mistakes
Good enough / data quality
Data mining for marketing campaigns : interpretation of lift

Posted by: zyxo | September 6, 2009

Data mining : use a gel to obtain ROC and LIFT

… or do they talk about another ROC and another LIFT ?

(image : roclift)

Other posts you might enjoy reading :
Data mining for marketing campaigns : interpretation of lift
Howmany inputs do data miners need ?
Oversampling or undersampling ?
data mining with decision trees : what they never tell you
The top-10 data mining mistakes

