Data Mining

Download Free Data Mining Source Code In C/C++, C#, Visual Basic, Visual Basic.NET, Java,
and other programming languages

Data Mining Source Code Newsletter

Announcing The Data Mining Source Code Newsletter!

K-MEANS MULTI-THREADED CLUSTERING ALGORITHM WITH SOURCE CODE BY JCONWELL

Last post 09-30-2005, 16:03 by jconwell. 0 replies.
  •  09-30-2005, 16:03 6051

     I've taken the K-Means clustering code and changed it around a bit to optimize it.  Changes I made include:

    • The clustering calculations now run on multiple threads.  Multi-threading makes the clustering run anywhere from 40% to 60% faster, depending on how many vectors you're clustering and how many dimensions each vector contains.  I've found that one thread per processor (two per physical processor on hyper-threaded processors) was optimal.  Basically, I divide the vectors by the number of threads, then kick off each thread to figure out which cluster each of its vectors belongs to.  When a thread is finished, it returns an array of cluster indexes, which is used to put each vector in the correct new cluster.
    • I also changed the multi-dimensional arrays to jagged arrays.  This helps with performance and cleans up the code a bit, because you don’t have to create a new double[] and copy the values from the main vector array.  You can just pass the nth vector in the jagged array.  Also, the CLR has optimizations built in for plain single-dimensional arrays, but not for multi-dimensional arrays, so jagged arrays (regular arrays of regular arrays) can take advantage of these optimizations.
    • I've also made it easier for the unit test class to create different numbers of vectors with different dimensions.
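
    For readers without the download, the first two changes can be illustrated with a rough C# sketch. This is not the original source: the class and method names (ParallelAssignSketch, AssignClusters, NearestCentroid) are hypothetical, squared Euclidean distance is assumed as the metric, and each thread writes its cluster indexes into its own slice of one shared array rather than returning a separate array as the post describes.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of the assignment step only; not the original code.
public static class ParallelAssignSketch
{
    // Squared Euclidean distance between two vectors of equal length
    // (the distance metric is an assumption).
    static double SquaredDistance(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int d = 0; d < a.Length; d++)
        {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centroid closest to the given vector.
    static int NearestCentroid(double[] vector, double[][] centroids)
    {
        int best = 0;
        double bestDist = double.MaxValue;
        for (int c = 0; c < centroids.Length; c++)
        {
            double dist = SquaredDistance(vector, centroids[c]);
            if (dist < bestDist)
            {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Splits the vectors into contiguous slices, one per thread, and returns
    // the cluster index chosen for every vector. threadCount must be >= 1.
    public static int[] AssignClusters(double[][] vectors, double[][] centroids, int threadCount)
    {
        int[] assignments = new int[vectors.Length];
        Thread[] workers = new Thread[threadCount];
        int sliceSize = (vectors.Length + threadCount - 1) / threadCount;

        for (int t = 0; t < threadCount; t++)
        {
            // Declared inside the loop so each thread captures its own bounds.
            int start = Math.Min(t * sliceSize, vectors.Length);
            int end = Math.Min(start + sliceSize, vectors.Length);

            workers[t] = new Thread(() =>
            {
                // With a jagged array, vectors[i] is already a double[];
                // nothing has to be copied out of a double[,] first.
                for (int i = start; i < end; i++)
                    assignments[i] = NearestCentroid(vectors[i], centroids);
            });
            workers[t].Start();
        }

        // Wait for every slice before the results are read.
        foreach (Thread worker in workers)
            worker.Join();

        return assignments;
    }
}
```

    Passing Environment.ProcessorCount as threadCount gives one thread per logical processor, which matches the thread count the post found optimal. Each thread writes only to its own index range, so the shared array needs no locking.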

    John (Turbo)
