Data Mining




EUCLIDEAN DISTANCE IN K-MEANS CLUSTERING : DATA MINING TUTORIAL

Last post 02-02-2005, 19:00 by Kingsley Tagbo. 0 replies.

    The Euclidean distance between two points (objects or items) X and Y in a dataset is given by Equation 1A below.
    Equation 1A

    $$\text{EUCLIDEAN DISTANCE}(X,Y) = \left( |X_1-Y_1|^2 + |X_2-Y_2|^2 + \cdots + |X_{N-1}-Y_{N-1}|^2 + |X_N-Y_N|^2 \right)^{1/2}$$

    where |Z| denotes the absolute value of Z, X is the first data point, Y is the second data point, $X_i$ and $Y_i$ are the values of the i-th attribute of X and Y, and N is the number of characteristics (attributes in data mining terminology, or fields in database terminology). EUCLIDEAN DISTANCE(X,Y) is the distance between data point X and data point Y computed with this formula.
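A minimal Python sketch of Equation 1A follows, assuming each data point is given as a sequence of N numeric attribute values listed in the same order. The function name `euclidean_distance` is hypothetical. Because squaring a real number already yields a non-negative value, the absolute value in Equation 1A can be dropped in code without changing the result.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Return the Euclidean distance between points x and y (Equation 1A).

    Both points must list their N attribute values in the same order.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of attributes")
    # Sum the squared difference X_i - Y_i over all N attributes.
    # abs() is unnecessary: the square of a real number is never negative.
    total = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    # Raising the sum to the power 1/2 is the same as taking its square root.
    return math.sqrt(total)
```

For example, `euclidean_distance([0, 0], [3, 4])` returns `5.0`. On Python 3.8 and later, the standard library function `math.dist(x, y)` computes the same quantity.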

    Equation 1A gives the Euclidean distance between two rows of data (two points, items, or objects in a dataset, a database, or space), where each data point has N attributes or fields. An attribute or field is a characteristic of the item; for example, a data point could describe a person in a database, with the attributes Age, Height, Weight, and Income.
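When data points come from database rows, attribute values must be compared field by field in one fixed order so that $X_i$ and $Y_i$ always refer to the same field. The sketch below assumes each row is a Python dictionary keyed by field name; `ATTRIBUTES`, `to_vector`, and `record_distance` are hypothetical names chosen for illustration, and `math.dist` requires Python 3.8 or later.

```python
import math

# Hypothetical attribute order. Every row is converted using this same
# order so that position i holds the same field for both data points.
ATTRIBUTES = ("Age", "Height", "Weight", "Income")


def to_vector(record: dict, attributes=ATTRIBUTES) -> list:
    """Turn a row (field name -> value) into an ordered list of floats."""
    return [float(record[name]) for name in attributes]


def record_distance(a: dict, b: dict, attributes=ATTRIBUTES) -> float:
    """Euclidean distance (Equation 1A) between two rows over the given fields."""
    return math.dist(to_vector(a, attributes), to_vector(b, attributes))
```

Because `to_vector` always reads fields in `ATTRIBUTES` order, two dictionaries that list their keys in different orders still produce comparable vectors; passing a shorter tuple such as `("Age", "Height", "Weight")` restricts the distance to those fields.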

    The first data point in Equation 1A above is represented by X and the other datapoint by Y.

    Example 1A:
    Find the Euclidean distance between two data points named John and Henry in a dataset of people, where each person is described by 3 attributes or fields: Age, Height, and Weight. The data points are:

    John
    Age = 20, Height = 170, Weight = 80

    Henry
    Age = 30, Height = 160, Weight = 120


    $$\text{EUCLIDEAN DISTANCE}(\text{John},\text{Henry}) = \left( |X_1-Y_1|^2 + |X_2-Y_2|^2 + \cdots + |X_{N-1}-Y_{N-1}|^2 + |X_N-Y_N|^2 \right)^{1/2}$$

    Here N, the number of attributes, is 3 (Age, Height, and Weight), X is the first data point (John), and Y is the second data point (Henry), so

    $$\text{EUCLIDEAN DISTANCE}(\text{John},\text{Henry}) = \left( |X_1-Y_1|^2 + |X_2-Y_2|^2 + |X_3-Y_3|^2 \right)^{1/2}$$

    If $X_1, Y_1$ are Age, $X_2, Y_2$ are Height, and $X_3, Y_3$ are Weight, then

    $$\text{EUCLIDEAN DISTANCE}(\text{John},\text{Henry}) = \left( |20-30|^2 + |170-160|^2 + |80-120|^2 \right)^{1/2} = (100 + 100 + 1600)^{1/2} = \sqrt{1800} \approx 42.43$$
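The arithmetic of Example 1A can be checked with a few lines of standard-library Python; the variable names below are illustrative only, and the tuples list the attributes in the order Age, Height, Weight.

```python
import math

# Example 1A data points, attributes ordered as (Age, Height, Weight).
john = (20, 170, 80)
henry = (30, 160, 120)

# Squared difference for each attribute.
squared_diffs = [(x - y) ** 2 for x, y in zip(john, henry)]
# Square root of the summed squares (the 1/2 power in Equation 1A).
distance = math.sqrt(sum(squared_diffs))

print(squared_diffs)        # [100, 100, 1600]
print(round(distance, 2))   # 42.43
```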

