Kingsley Tagbo wrote:
Naive Bayes results are easier to understand than Neural Networks for example, because the algorithm results are based on statistics and mathematics and can be explained using probability calculations, while Neural Networks on the other hand cannot.
This comment is really pretty silly. Neural networks are based just as much on statistics and mathematics as any other technique including naive Bayesian methods or logistic regression.
It isn't even true that the mathematics is all that much more complicated.
What IS true is that if you use too many hidden nodes and over-train your network, you get something that you can't explain very well by simple inspection of the network weights. The same thing happens in real data-mining problems with almost any linear combination technique or even with decision trees. Ultimately, the system makes its decisions in some context, and without that context the decision-making machinery makes no sense.
On the other hand, it isn't all that hard to use a decision machine of some kind to find specific examples and then to reverse engineer why a particular decision is made. If you have a model with a gazillion inputs and internal states, you will have many examples on the borderline and you won't be able to point out exactly what pushed it over the edge, but you still will be miles ahead of just looking at internal weights since you will have the context in which a decision is made.
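To make the reverse-engineering idea concrete, here is a rough sketch of one way to do it, not the only way. It treats the model as a black box and, for a single example, resets one input at a time to a neutral value to see how far the score moves. The names `explain_decision` and `toy_score`, the inputs `a`, `b`, `c`, and the baseline values are all made up for illustration.

```python
def explain_decision(score, example, baseline):
    """For each input, report how far the score moves when that one
    input is reset to its baseline value and the rest are left alone."""
    full_score = score(example)
    effects = {}
    for name in example:
        probe = dict(example)          # copy so the original example is untouched
        probe[name] = baseline[name]   # neutralize a single input
        effects[name] = full_score - score(probe)
    return effects


# Hypothetical stand-in for an opaque model: we only ever call it.
def toy_score(x):
    return x["a"] * x["b"] + x["c"]


example = {"a": 2, "b": 3, "c": 1}
baseline = {"a": 0, "b": 0, "c": 0}   # assumed neutral values, invented here
effects = explain_decision(toy_score, example, baseline)

# Largest effects first: these inputs did the most to push this decision.
for name, delta in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(name, delta)
```

Note that this never looks at the model's internals. It only needs the example and some idea of what a neutral input looks like, which is exactly the kind of context that the weights alone don't give you.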
The simplest example that shows why context is important is a simple two-variable classifier. Let's take the following fictitious classifier:
0.02 * x1 + 1041 * x2 - 21 > 0
This classifier is completely opaque as it stands. We can't tell what x1 and x2 mean, and we can't tell whether 0.02 is a large weight or 1041 is a small one. For the first, we need contextual information such as the meaning of the inputs. For the second, we need even more contextual information, such as the distribution of the inputs. With this much information we can at least guess which of the inputs is more significant to the problem at hand. If we also have 20 examples each of correctly and incorrectly classified cases, then we can really begin to draw conclusions.
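Here is a small sketch of the point about distributions. The weights and offset come from the classifier above, but the sample values for x1 and x2 are invented purely for illustration. The idea is to multiply each weight by the spread of its input to see how much that input typically moves the score.

```python
from statistics import pstdev

weights = {"x1": 0.02, "x2": 1041.0}   # from the fictitious classifier
offset = -21.0

# Hypothetical samples, invented for illustration only.
samples = {
    "x1": [500, 1000, 1500, 2000, 2500],
    "x2": [0.001, 0.002, 0.003, 0.004, 0.005],
}

# Typical swing in the score caused by each input: |weight| * std deviation.
swing = {
    name: abs(weights[name]) * pstdev(values)
    for name, values in samples.items()
}

for name, s in swing.items():
    print(f"{name}: weight={weights[name]}, typical swing in score={s:.2f}")
```

With these made-up distributions, x1 moves the score far more than x2 does, even though its weight looks tiny next to 1041. Different distributions could just as easily reverse that, which is why the raw weights tell you so little by themselves.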