Data Mining


Data Mining Source Code Newsletter

Announcing The Data Mining Source Code Newsletter!

Subscribe By Email | Subscribe By RSS Feed

What Is Naive Bayes Algorithm in SQL Server 2005 (Yukon) Analysis Services?

Last post 12-23-2004, 11:35 by Kingsley Tagbo. 2 replies.
  •  12-20-2004, 20:07 3628

    What Is Naive Bayes Algorithm in SQL Server 2005 (Yukon) Analysis Services?

     Question:


    What Is Naive Bayes Algorithm in SQL Server 2005 (Yukon) Analysis Services?

     Answer:


    Naive Bayes is a statistical algorithm that estimates the probability of each class value during classification and prediction.

    The Naive Bayes implementation in SQL Server 2005 (Yukon) assumes that the input attributes are independent of each other, given the value of the class.

    The algorithm classifies or predicts the value of a class relatively quickly, because it works only from the probabilities of each distinct value in each attribute.
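
    As a rough illustration of the probability calculation described above, here is a minimal Python sketch of a Naive Bayes classifier for categorical attributes. It is not the SQL Server 2005 implementation; the function names, the add-one (Laplace) smoothing, and the tiny weather-style data set are all assumptions made up for this example.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] -> count
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, used for add-one smoothing.
    distinct = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            distinct[i].add(value)
    return class_counts, value_counts, distinct


def predict_proba(model, row):
    """Return normalized class probabilities for one case."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_class in class_counts.items():
        score = n_class / total  # prior probability of the class
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Independence assumption: multiply one factor per attribute.
            # Add-one smoothing (an assumption here) avoids zero factors.
            score *= (count + 1) / (n_class + len(distinct[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical training data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]

model = train_naive_bayes(rows, labels)
probs = predict_proba(model, ("rainy", "cool"))
print(probs)
```

    Because every factor is a simple count ratio, the result for any single case can be checked by hand, which is the kind of explanation the post refers to.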

    Naive Bayes results are easier to understand than Neural Networks for example, because the algorithm results are based on statistics and mathematics and can be explained using probability calculations, while Neural Networks on the other hand cannot.

     Resource(s):

    1. http://www.kdkeys.net/ShowPost.aspx?PostID=2084

    2. http://blog.visual-basic-data-mining.net/archive/2004/03/21/169.aspx




  •  12-23-2004, 9:50 3654 in reply to 3628

    Re: What Is Naive Bayes Algorithm in SQL Server 2005 (Yukon) Analysis Services?

    Kingsley Tagbo wrote:
    Naive Bayes results are easier to understand than Neural Networks for example, because the algorithm results are based on statistics and mathematics and can be explained using probability calculations, while Neural Networks on the other hand cannot.

    This comment is really pretty silly.  Neural networks are based just as much on statistics and mathematics as any other technique including naive Bayesian methods or logistic regression.

    It isn't even true that the mathematics is all that much more complicated.

    What IS true is that if you use too many hidden nodes and over-train your network, you get something that you can't explain very well by simple inspection of the network weights.  The same thing happens in real data-mining problems with almost any linear combination technique or even with decision trees.  Ultimately, decisions get made by the system in some context, and without that context the decision-making machinery makes no sense.

    On the other hand, it isn't all that hard to use a decision machine of some kind to find specific examples and then to reverse engineer why a particular decision was made.  If you have a model with a gazillion inputs and internal states, you will have many examples on the borderline and you won't be able to point out exactly what pushed them over the edge, but you will still be miles ahead of just looking at internal weights, since you will have the context in which a decision is made.

    The simplest example that shows why context is important is a simple two-variable classifier.  Let's take the following fictitious classifier:

    0.02 * x1 + 1041 * x2 - 21 > 0

    This classifier is completely opaque as it stands.  We can't tell what x1 and x2 mean, and we can't tell if 0.02 is a large weight or if 1041 is a small weight.  In the first instance, we need contextual information such as the meaning of the inputs.  In the second instance, we need even more contextual information such as the distribution of the inputs.  With this much information we can at least guess which of the inputs is more significant to the problem at hand.  If we also have, say, 20 examples that were correctly classified and 20 that were incorrectly classified, then we can really begin to draw conclusions.
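
    To make the point about input distributions concrete, here is a small Python sketch built on the fictitious classifier above. The sample values for x1 and x2 are invented purely for illustration, as are the helper names; the idea is only that multiplying each weight by the spread of its input gives a rough measure of how much that input can move the decision, and that breaking one example into its terms shows what pushed it over the threshold.

```python
import statistics

# Weights and offset taken from the fictitious classifier in the post.
WEIGHTS = (0.02, 1041.0)
OFFSET = -21.0


def classify(x1, x2):
    """Evaluate 0.02 * x1 + 1041 * x2 - 21 > 0."""
    return WEIGHTS[0] * x1 + WEIGHTS[1] * x2 + OFFSET > 0


def contributions(x1, x2):
    """Return the individual terms of the decision for one example."""
    return {"x1": WEIGHTS[0] * x1, "x2": WEIGHTS[1] * x2, "offset": OFFSET}


def scaled_weights(samples):
    """Multiply each weight by the standard deviation of its input.

    This is one simple way (an assumption, not the only one) to compare
    weights whose inputs live on very different scales.
    """
    columns = list(zip(*samples))
    return tuple(w * statistics.pstdev(col) for w, col in zip(WEIGHTS, columns))


# Hypothetical inputs: x1 in the hundreds to thousands, x2 around 0.01.
samples = [(500.0, 0.010), (1500.0, 0.012), (1000.0, 0.008), (2000.0, 0.010)]

scaled = scaled_weights(samples)
print("weight * spread per input:", scaled)

for x1, x2 in samples:
    print((x1, x2), classify(x1, x2), contributions(x1, x2))
```

    With these made-up samples, the input with the tiny weight of 0.02 turns out to have the larger effect on the decision, which is exactly what cannot be seen from the raw weights alone.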

  •  12-23-2004, 11:35 3660 in reply to 3654

    Re: What Is Naive Bayes Algorithm in SQL Server 2005 (Yukon) Analysis Services?

    You are right, of course! Neural networks are based on mathematics as well! I should update my comment to say that the output from a neural network classification exercise could be harder to explain than the output from a Naive Bayes classification exercise.

    I do have some information on Neural Networks at http://www.kdkeys.net/ShowForum.aspx?ForumID=1022

    The explanation of Sigmoid Functions does have some math in it and when I explain how the backpropagation algorithm works, I will include more math as well.

    Thanks for pointing out this issue to us!

