EUCLIDEAN DISTANCE IN K-MEANS CLUSTERING : DATA MINING TUTORIAL
02-02-2005, 19:00
Kingsley Tagbo
Joined on 10-19-2002
Saint Louis, Missouri
Diamond Member
The Euclidean distance between two points (objects or items) X and Y in a dataset is defined by Equation 1A below.
Equation 1A
$$\text{EUCLIDEAN DISTANCE}(X,Y) = \left( |X_1-Y_1|^2 + |X_2-Y_2|^2 + \dots + |X_{N-1}-Y_{N-1}|^2 + |X_N-Y_N|^2 \right)^{1/2}$$
where |Z| represents the absolute value of Z, X is the first data point, Y is the second data point, N is the number of characteristics (attributes in data mining terminology, fields in database terminology), and EUCLIDEAN DISTANCE(X,Y) is the Euclidean distance between data point X and data point Y.
Equation 1A defines the Euclidean distance between two rows of data (two points, items or objects in a dataset, in a database or in space), where each data point has N attributes or fields. An attribute or field is a characteristic of the item; for example, a data point could describe a person in a database, with the attributes Age, Height, Weight and Income.
The first data point in Equation 1A above is represented by X and the other datapoint by Y.
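Equation 1A translates directly into a few lines of code. The following is a minimal Python sketch, not part of the original tutorial; the function name `euclidean_distance` and the decision to reject points with differing attribute counts are assumptions made for illustration.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two data points given as equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("both data points must have the same number of attributes")
    # Sum the squared attribute-wise differences, then take the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Two hypothetical points with N = 2 attributes (a 3-4-5 right triangle).
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```

Because each difference is squared, the absolute value in Equation 1A does not change the result for real-valued attributes, so the sketch omits it.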
Example 1A: Find the Euclidean distance between two data points named John and Henry in a dataset of people, where each person is defined by 3 attributes or fields: Age, Height and Weight. The data points are defined as:
John Age = 20, Height = 170, Weight = 80
Henry Age = 30, Height = 160, Weight = 120
$$\text{EUCLIDEAN DISTANCE}(\text{John}, \text{Henry}) = \left( |X_1-Y_1|^2 + |X_2-Y_2|^2 + \dots + |X_{N-1}-Y_{N-1}|^2 + |X_N-Y_N|^2 \right)^{1/2}$$
Given that N represents the number of attributes which is 3 (Age, Height and Weight) and that X represents the first datapoint John and Y represents the second datapoint Henry, then
$$\text{EUCLIDEAN DISTANCE}(\text{John}, \text{Henry}) = \left( |X_1-Y_1|^2 + |X_2-Y_2|^2 + |X_3-Y_3|^2 \right)^{1/2}$$
If X1,Y1 = Age, X2,Y2 = Height and X3,Y3 = Weight, then
$$\text{EUCLIDEAN DISTANCE}(\text{John}, \text{Henry}) = \left( |20-30|^2 + |170-160|^2 + |80-120|^2 \right)^{1/2}$$
$$= (100 + 100 + 1600)^{1/2} = 1800^{1/2} \approx 42.43$$
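The arithmetic in Example 1A can be checked with a short Python sketch. The dictionary layout and variable names are illustrative assumptions, not part of the original tutorial; `math.dist` is the standard-library Euclidean distance function available from Python 3.8. The squared differences sum to 1800, whose square root rounds to 42.43.

```python
import math

# Data points from Example 1A.
john = {"Age": 20, "Height": 170, "Weight": 80}
henry = {"Age": 30, "Height": 160, "Weight": 120}

attributes = ["Age", "Height", "Weight"]

# Squared difference for each attribute.
squared_diffs = {a: (john[a] - henry[a]) ** 2 for a in attributes}
print(squared_diffs)  # {'Age': 100, 'Height': 100, 'Weight': 1600}

# Square root of the summed squared differences.
distance = math.sqrt(sum(squared_diffs.values()))
print(round(distance, 2))  # 42.43

# Cross-check against the standard library (Python 3.8+).
library_distance = math.dist(
    [john[a] for a in attributes],
    [henry[a] for a in attributes],
)
assert math.isclose(distance, library_distance)
```

Note that the Weight difference contributes 1600 of the 1800 total, so the attribute with the largest difference dominates the distance.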