June 2008 - Posts

30 June 2008
Multi-label Classification

Hi,

I am developing an application for multi-label classification. Please help me by suggesting good algorithms for it, and provide source code if possible.

My email ID: krutikvyas@yahoo.com

27 June 2008
Is this the correct way to do z-scoring?

Please let me know if this is the right way to do z-scoring:

 ITEM_PROFIT / ( @GLOBAL_MAX(ITEM_PROFIT) -@GLOBAL_MIN(ITEM_PROFIT))


But how can I arrange the Set Globals node so that it runs automatically each time I apply the above transformation in the Derive node?

I am doing this to prepare data for clustering. Another question: some item profits are negative (-0.5, for example, compared with an average profit of +$5). Is this a problem?

 

Many thanks.
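
For comparison, here is a minimal sketch of the two usual rescalings, written in Python rather than CLEM (the function names and the sample profits are made up for illustration). A z-score subtracts the mean and divides by the standard deviation. The expression in the post divides by the range and subtracts nothing, so it is closer to min-max scaling than to z-scoring. Negative profits are not a problem for either transformation.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: every z-score is defined here as 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Rescale to the 0..1 range: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical item profits, including a negative one.
profits = [-0.5, 5.0, 4.0, 6.5, 10.0]
print(z_scores(profits))
print(min_max(profits))
```

Both functions need the global statistics before any row can be transformed, which is why the globals have to be computed in a first pass over the data.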

 

26 June 2008
huge database

Hi,

If I have a (purchasing transaction) table with more than 100 million rows and 50 columns, how am I supposed to explore the data?

At the moment I sample about 500K rows from the beginning of the table; otherwise I can't run anything like GRI or K-Means, because I don't know how long that would take.

Any ideas?

Thanks,
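
Taking the first 500K rows can give a biased picture if the table is ordered by date or customer. One alternative is a uniform random sample drawn in a single pass. Below is a rough Python sketch of reservoir sampling (Algorithm R); the function name, the seed parameter, and the toy input are assumptions for illustration, and nothing here is specific to Clementine.

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Return k rows chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Row i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample

# Toy stand-in for a large table: row numbers 0..999999.
picked = reservoir_sample(range(1_000_000), 5, seed=42)
print(picked)
```

Only k rows are held in memory at any time, so the same loop works on a file or database cursor that is far too large to load.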

26 June 2008
How to detect DUPLICATED records

Urgent question seeking help

How to detect DUPLICATED records and output them?

How can I find out whether a variable is entirely $null$? (It is recognized as typeless, so I can't use the Data Audit node.)
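
Outside Clementine, both checks are short. The following Python sketch is only an illustration under assumptions: records are tuples, null is represented by None, and the function and field names are invented.

```python
from collections import Counter

def duplicated_records(rows):
    """Return every row that occurs more than once, in original order."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] > 1]

def fully_null_fields(names, rows, is_null=lambda v: v is None):
    """Return the names of fields whose value is null in every row."""
    return [name for idx, name in enumerate(names)
            if all(is_null(row[idx]) for row in rows)]

# Hypothetical records: (id, code, note); note is always null.
names = ["id", "code", "note"]
rows = [(1, "a", None), (2, "b", None), (1, "a", None)]
print(duplicated_records(rows))
print(fully_null_fields(names, rows))
```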

26 June 2008
How can we perform spatial data mining in Clementine 11.1?

How can we perform spatial data mining in Clementine 11.1?

Also, please help me understand the use of dimension files. How many dimensions can Clementine support?

Thank you!

25 June 2008
How to enumerate all fields (and detect if they contain negative numeric information)

Hi,

Will I be able to detect and output all the fields that contain a negative value (<0)?

 

For example,


A  B   C    D  E
1  1   txt  7  -1
7  -1  txt  0  5
5  2   txt  5  7

Can I use an expression to report that B and E were detected?

Does that involve something like @field+1 or @Next(field)? Where can I find documentation about this?

 Any ideas appreciated.

 

Regards,

 Chris
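
To make the expected result concrete, here is a small Python sketch that scans every field of the example table and reports the ones holding at least one negative number. It is not a CLEM expression; the function name and the in-memory table layout are assumptions.

```python
def fields_with_negatives(header, rows):
    """Return the names of fields that contain at least one negative number."""
    flagged = []
    for idx, name in enumerate(header):
        for row in rows:
            value = row[idx]
            # Skip text fields; only numeric values can be negative.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and value < 0:
                flagged.append(name)
                break
    return flagged

header = ["A", "B", "C", "D", "E"]
rows = [
    (1, 1, "txt", 7, -1),
    (7, -1, "txt", 0, 5),
    (5, 2, "txt", 5, 7),
]
print(fields_with_negatives(header, rows))  # ['B', 'E']
```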

24 June 2008
Clementine v10 Error - X4001

Hi

While performing memory-intensive operations in Clementine v10 (e.g. merging millions of records from multiple sources, or displaying a 1000-record table from a large data file), an error occurs with the message "X4001". Unusually for Clementine, there is no further information about the error, although it seems to be memory related.

The log file entry for the error is as follows:

 2008/06/24 15:39:17 [2716-2716]: X4001: XMemory

I tried increasing the memory usage multiplier in the options.cfg file to 80% and then 100% of physical memory, and adding extra RAM, but the issue persists. The stand-alone computer should be able to handle it: it's an Intel Pentium 4 CPU at 3.06GHz with 3GB of RAM.

Can anyone help with a resolution please?

Ronan

24 June 2008
Require good case studies for cluster analysis.

Hi,Smile

 I am so much relived after seeing such a strong and knowledgfull community of data miners!!!!

am a beginer level data miner working in India.

Iam not able to interpret the use of CARMA algorithm.If any one can post relevant case studies with the explanation of data set it will be of great help and also please help me in getting some good case studies for cluster analysis.

If possible post the stream :)

Thanks in advance.

22 June 2008
how to transform transaction-per-line data to feed into Apriori

Hi Dear All,

I want to use Clementine to find association rules in data such as the retail market basket data from a retail store. However, the data is in a variable-length file in which each line contains all the items of one transaction, separated by spaces.

http://fimi.cs.helsinki.fi/data/

How can I use Clementine (or Excel) to prepare it for the Apriori node, which supports either one item plus a TransactionID per line, or a truth table?

 

Any ideas appreciated.

Thanks,

 

Chris
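
One option is to reshape the file with a small script before importing it. The Python sketch below shows both target layouts under assumptions: transaction IDs are simply line numbers (the file itself carries none), items are whitespace-separated tokens, and the function names are invented.

```python
def to_transaction_pairs(lines):
    """One (TransactionID, item) pair per item; IDs are assigned per non-empty line."""
    pairs = []
    tid = 0
    for line in lines:
        items = line.split()
        if not items:
            continue  # skip blank lines
        tid += 1
        pairs.extend((tid, item) for item in items)
    return pairs

def to_truth_table(lines):
    """One row per transaction, one True/False column per distinct item."""
    baskets = [set(line.split()) for line in lines if line.split()]
    columns = sorted(set().union(*baskets)) if baskets else []
    table = [[item in basket for item in columns] for basket in baskets]
    return columns, table

# Toy input in the same shape as the file: one basket per line.
lines = ["1 2 3", "2 4", "", "1 4"]
print(to_transaction_pairs(lines))
print(to_truth_table(lines))
```

The pair layout stays compact for sparse baskets, while the truth table grows by one column per distinct item, so it is only practical when the item catalogue is small.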

20 June 2008
HLP: Visual Basic Data Mining with Decision Trees Source Code
Hello,
I'm new here. I was searching for a C4.5 or ID3 implementation in C#, and after a bit of searching around I found out that the visual basic data mining net is not available. If anyone has downloaded the code before or has a link to it, could you point me in the right direction?
Thank you.
18 June 2008
Filling values

Hi

I am using Clementine 11.1. The problem is filling in values from the previous records.

X    by filler node  result
200  200             200
0    200             200
0    0               200
0    0               200
0    0               200
300  300             300
250  250             250
0    250             250
0    0               250
0    0               250
0    0               250
0    0               250
0 0 250

The original X field contains 0 values, which have to be replaced by the previous filled value.

If I use the Filler node, only one record gets filled; the filled value is not carried forward to the next record when X = 0.

So I am not getting the values shown in the result column.

Can anyone help me with this issue?

Thanks
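
The behaviour in the result column is a forward fill: each 0 takes the most recent value that has already been filled, not the raw value of the previous record. A minimal Python sketch of that logic follows (the function name and the choice of 0 as the "missing" marker are assumptions taken from the example, and this is not a Clementine stream).

```python
def fill_from_previous(values, missing=0):
    """Replace each missing value with the last value seen so far."""
    filled = []
    last = None  # most recent filled value
    for v in values:
        if v == missing and last is not None:
            filled.append(last)
        else:
            filled.append(v)
            last = v
    return filled

x = [200, 0, 0, 0, 0, 300, 250, 0, 0, 0, 0, 0]
print(fill_from_previous(x))
# [200, 200, 200, 200, 200, 300, 250, 250, 250, 250, 250, 250]
```

The key point is that the loop remembers its own output, which is why a one-step look-back at the original field fills only the first 0 of each run.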

10 June 2008
I need Apriori algorithm, FP-Tree algorithm, Patricia tree algorithm and CFP-Tree algorithm source code in C#

hi,

I need Apriori algorithm, FP-Tree algorithm, Patricia tree algorithm and CFP-Tree algorithm source code in C#

Could anyone please send it to me? It's urgent.

e-mail: min9709@gmail.com

09 June 2008
URGENT HELP PLEASE

Hi all,

I am doing research in data mining.

I am new to this field and want to know what software I need in order to compare classification algorithms:

Naive Bayes, CBA, C4.5, boosted, bagged, SJEP, NEP, GNEP

because I work on jumping emerging patterns.

How and where can I get these algorithms and apply them to different datasets?

I want to know in detail the practical procedure for working with these algorithms.

God bless you all for your help.

07 June 2008
C# simple Apriori code

This is some simple C# code. Please take a look at it, and if you have any comments or updates, send them back to me.

 

using System;
using System.Collections.Generic;
using System.Text;
using System.Collections;
using System.IO;


class Program
{
    static void Main(string[] args)
    {
        string file = @"big_test.csv";
        string sup = "6";
       
        if (args.Length > 0)
        {
            file = args[0];

        }
        if (args.Length == 2)
        {
            sup = args[1];

        }


        double support = double.Parse(sup);

        CSVReader cr = new CSVReader();
        ItemSet data = cr.Read(file);


       

        Program p = new Program();
        ItemSet a = p.apriori(data, support);
        for (int i = 0; i < a.Count; i++)
        {
            ItemSet cur = (ItemSet)a.arr[i];
            for (int j = 0; j < cur.Count; j++)
            {
                ItemSet now = (ItemSet)cur.arr[j];
                for(int k=0;k<now.Count;k++)
                {

                    Console.WriteLine("ID : " + ((DataItem)now.arr[k]).Id + ",the value is :(" + ((DataItem)now.arr[k]).ItemName + ")  ");

                }
                Console.WriteLine("  Number of appearances:" + now.ICount);
            }

        }

        Console.Read();
    }

    private void RuleG(ItemSet a)
    {
    }

    private ItemSet FindOneColSet(ItemSet data, double support)
    {
        ItemSet cur = null;
        ItemSet result = new ItemSet();

        ItemSet set = null;
        ItemSet newset = null;
        DataItem cd = null;
        DataItem td = null;
        bool flag = true;
       
        for (int i = 0; i < data.Count; i++)
        {
            cur = (ItemSet)data.arr[i];
            for (int j = 0; j < cur.Count; j++)
            {
                cd = (DataItem)cur.arr[j];
                for (int n = 0; n < result.Count; n++)
                {
                   
                    set = (ItemSet)result.arr[n];
                    td = (DataItem)set.arr[0];
                    if (cd.Id == td.Id)
                    {
                        set.ICount++;
                        flag = false;
                        break;

                    }
                    flag = true;
                }
                if (flag)
                {
                    newset = new ItemSet();
                    newset.Add(cd);
                    result.Add(newset);
                    newset.ICount = 1;
                }
            }

 

        }
        ItemSet finalResult = new ItemSet();
        for (int i = 0; i < result.Count; i++)
        {
            ItemSet con = (ItemSet)result.arr[i];
            if (con.ICount >= support)
            {

                finalResult.Add(con);
            }


        }
        //finalResult.Sort();   
        return finalResult;


    }


    private ItemSet apriori(ItemSet data, double support)
    {

        ItemSet result = new ItemSet();
        ItemSet li = new ItemSet();
        ItemSet conList = new ItemSet();
        ItemSet subConList = new ItemSet();
        ItemSet subDataList = new ItemSet();
        ItemSet CurList = null;
        ItemSet subList = null;
        int k = 2;
        li.Add(new ItemSet());
        li.Add(this.FindOneColSet(data, support));

        while (((ItemSet)li.arr[k - 1]).Count != 0)
        {
            Console.WriteLine(k - 1);
            conList = AprioriGenerate((ItemSet)li.arr[k - 1], k - 1, support);
            for (int i = 0; i < data.Count; i++)
            {
                subDataList = SubSet((ItemSet)data.arr[i], k);
                for (int j = 0; j < subDataList.Count; j++)
                {
                    subList = (ItemSet)subDataList.arr[j];
                    for (int n = 0; n < conList.Count; n++)
                    {
                        ((ItemSet)subDataList.arr[j]).Sort();
                        ((ItemSet)conList.arr[n]).Sort();
                        CurList = (ItemSet)conList.arr[n];
                        if (subList.Equals(CurList))
                        {
                            ((ItemSet)conList.arr[n]).ICount++;

                        }
                    }

                }

            }

            li.Add(new ItemSet());
            for (int i = 0; i < conList.Count; i++)
            {
                ItemSet con = (ItemSet)conList.arr[i];
                if (con.ICount >= support)
                {

                    ((ItemSet)li.arr[k]).Add(con);
                }


            }

            k++;
        }
        //for (int j = 0; j < li.Count; j++)
        //{
        //    for (int h = 0; h < li.Count; h++)
        //    {
        //        if (((ItemSet)li.arr[j]).Equals((ItemSet)li.arr[h]))
        //        {
        //            li.arr.RemoveAt(j);
        //            li.Count = li.arr.Count;
        //        }
        //    }
        //}
        for (int i = 0; i < li.Count; i++)
        {
            result.Add((ItemSet)li.arr[i]);
        }
        return result;

 

    }

    private ItemSet AprioriGenerate(ItemSet li, int k, double support)
    {

        ItemSet curList = null;
        ItemSet durList = null;
        ItemSet candi = null;
        ItemSet result = new ItemSet();
        for (int i = 0; i < li.Count; i++)
        {
            for (int j = 0; j < li.Count; j++)
            {
                bool flag = true;
                curList = (ItemSet)li.arr[i];
                durList = (ItemSet)li.arr[j];
                for (int n = 2; n < k; n++)
                {

                    if (((DataItem)curList.arr[n - 2]).Id == ((DataItem)durList.arr[n - 2]).Id)
                    {

                        flag = true;

                    }
                    else
                    {
                       
                        flag = false;
                        break;


                    }


                }

                if (flag && ((DataItem)curList.arr[k - 1]).Id < ((DataItem)durList.arr[k - 1]).Id)
                {

                    flag = true;
                }
                else
                {
                    flag = false;
                }
                if (flag)
                {
                    candi = new ItemSet();


                    for (int m = 0; m < k; m++)
                    {
                        candi.Add((DataItem)durList.arr[m]);

                    }
                    candi.Add((DataItem)curList.arr[k - 1]);

 

 

                    if (HasInFrequentSubset(candi, li, k))
                    {
                        candi.Clear();

                    }
                    else
                    {
                        result.Add(candi);
                    }
                }

            }
        }
        return result;

    }

 

    private bool HasInFrequentSubset(ItemSet candidate, ItemSet li, int k)
    {
        ItemSet subSet = SubSet(candidate, k);
        ItemSet curList = null;
        ItemSet liCurList = null;

        for (int i = 0; i < subSet.Count; i++)
        {
            curList = (ItemSet)subSet.arr[i];
            for (int j = 0; j < li.Count; j++)
            {

                liCurList = (ItemSet)li.arr[j];
                if (liCurList.Equals(curList))
                {
                    return false;

                }

            }
        }
        return true; ;
    }
    // Enumerate all non-empty subsets of a set.
    private ItemSet SubSet(ItemSet set)
    {
        ItemSet subSet = new ItemSet();

        ItemSet itemSet = new ItemSet();
        // A set of n items has 2^n subsets; each value of i encodes one of them.
        int num = 1 << set.Count;

        int bit;
        int mask = 0; ;
        for (int i = 0; i < num; i++)
        {
            itemSet = new ItemSet();
            for (int j = 0; j < set.Count; j++)
            {
                // Test bit j of i: if it is set, item j belongs to this subset.
                mask = 1 << j;
                bit = i & mask;
                if (bit > 0)
                {

                    itemSet.Add((ItemSet)set.arr[j]);

                }
            }
            if (itemSet.Count > 0)
            {
                subSet.Add(itemSet);
            }


        }

 

        return subSet;
    }

 

    // Enumerate all subsets of a set that contain exactly t items.
    private ItemSet SubSet(ItemSet set, int t)
    {
        ItemSet subSet = new ItemSet();

        ItemSet itemSet = new ItemSet();
        // A set of n items has 2^n subsets; each value of i encodes one of them.
        int num = 1 << set.Count;

        int bit;
        int mask = 0; ;
        for (int i = 0; i < num; i++)
        {
            itemSet = new ItemSet();
            for (int j = 0; j < set.Count; j++)
            {
                // Test bit j of i: if it is set, item j belongs to this subset.
                mask = 1 << j;
                bit = i & mask;
                if (bit > 0)
                {

                    itemSet.Add((DataItem)set.arr[j]);

                }
            }
            if (itemSet.Count == t)
            {
                subSet.Add(itemSet);
            }


        }

 

        return subSet;
    }
}

public class DataItem
{
    public int Id;
    public string ItemName;
    public void Add(string item,int id)
    {
        ItemName=item;
        Id=id;
    }
}

public class ItemSet
{
    public int Count=0;
    public int ICount=0;
    public ArrayList arr = new ArrayList();
    public void Add(ItemSet input)
    {
        arr.Add(input);
        Count++;
    }
    public void Add(string input)
    {
        arr.Add(input);
        Count++;
    }
    public void Add(DataItem input)
    {
        arr.Add(input);
        Count++;
    }
    public void Sort()
    {
        DataItem temp = null;
        for (int i = 0; i < this.Count-1; i++)
        {
            for (int j = i+1; j < this.Count; j++)
            {
                if (((DataItem)this.arr[i]).Id > ((DataItem)this.arr[j]).Id)
                {
                    temp = (DataItem)this.arr[i];
                    this.arr[i] = this.arr[j];
                    this.arr[j] = temp;
                }
            }
        }
    }
    public bool Equals(ItemSet input)
    {
        if ((input.arr == null) || !(input.arr.GetType().Equals(this.arr.GetType())))
        {
            return false;
        }
        else if (input.arr.Count != this.arr.Count)
        {
            return false;
        }
        else
        {
            for (int i = 0; i < arr.Count; i++)
            {
                if (((DataItem)arr[i]).ItemName != ((DataItem)input.arr[i]).ItemName)
                    return false;
            }
            return true;
        }
    }
    public void Clear()
    {
        arr.Clear();
        Count = 0;
        ICount = 0;
    }
}

public class CSVReader
{
    public ItemSet Read(string file)
    {
        StreamReader csvfile = new StreamReader(file);
        ItemSet General = new ItemSet();
        ItemSet items =new ItemSet();
        DataItem set = new DataItem();
        string Line="";
        string temp = "";
        int start = 0;
        int id = 0;
        int tcn=0;
        Line = csvfile.ReadLine();
       
        while (!csvfile.EndOfStream)
        {
           
            Line = csvfile.ReadLine();
            tcn++;
            items = new ItemSet();

            while (Line.IndexOf(",") != -1)
            {
                set = new DataItem();
                temp = Line.Substring(0, Line.IndexOf(","));
               
                Line = Line.Substring(Line.IndexOf(",") + 1);
                set.Add(temp,id);
                items.Add(set);
                id++;
                start = Line.IndexOf(",");
            }
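            // The text after the last comma is the final field. It needs its own
            // DataItem; reusing the previous one would overwrite it and add it twice.
            set = new DataItem();
            temp = Line;
            set.Add(temp,id);
            items.Add(set);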
            id = 0;
            temp = "";
            start = 0;
            General.Add(items);
        }
        //General.Add(items);
        Console.WriteLine(General.Count);
        return General;
    }
}

06 June 2008
Problem in Calculation

Hi

Can anyone help with the calculation below? I am using Clementine 11.1.

The following calculation has to be done for Cumulative Cap (the expected results are shown).

For example, for the 2nd record the Cumulative Cap value of 7.5 is obtained by adding

the 2nd record of Cap + the first record of Cumulative Cap,

i.e. 3.75 + 3.75.

When ID changes to a new value, a fresh summation should start.

Sl No  ID  Cap   Cumulative Cap
1      1   3.75  3.75
2      1   3.75  7.5             "=3.75+3.75"
3      1   4     11.5            "=4+7.5"
4      1   5     16.5            "=5+11.5"
5      1   3.75  20.25           "=3.75+16.5"
6      1   3.75  24              "=3.75+20.25"
7      1   6     30              "=6+24"
8      1   3.75  33.75           "=3.75+30"
9      2   3.75  3.75
10     2   4     7.75            "=3.75+4"
11     2   5.25  13              "=5.25+7.75"
12     2   3.75  16.75           "=3.75+13"
13     2   6     22.75           "=6+16.75"
14     2   4     26.75           "=4+22.75"
15     2   3.75  30.5            "=3.75+26.75"
16     2   4     34.5            "=4+30.5"

Thanks
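
What the post describes is a running total that restarts whenever ID changes. As an illustration only (Python, not a Clementine stream; the function name is invented and the records are assumed to be already sorted by ID), the logic can be sketched as:

```python
def cumulative_by_id(records):
    """Running sum of Cap that resets each time the ID changes.

    records: iterable of (id, cap) pairs, assumed sorted by id.
    """
    totals = []
    running = 0.0
    prev_id = None
    for rec_id, cap in records:
        if rec_id != prev_id:
            running = 0.0      # new ID: start a fresh summation
            prev_id = rec_id
        running += cap
        totals.append(running)
    return totals

caps_id1 = [3.75, 3.75, 4, 5, 3.75, 3.75, 6, 3.75]
caps_id2 = [3.75, 4, 5.25, 3.75, 6, 4, 3.75, 4]
records = [(1, c) for c in caps_id1] + [(2, c) for c in caps_id2]
print(cumulative_by_id(records))
```

With the 16 records from the post, the output ends at 33.75 for ID 1 and 34.5 for ID 2.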

05 June 2008
Implementation for CLARANS

Hi, I am currently working on CLARANS and fuzzy CLARANS.

If anyone has implemented these algorithms in C, C++ or MATLAB, please mail them to vinaymajety@yahoo.com.

I will be grateful for the same.

Thanking you.

Yours sincerely,

Vinay.
 

04 June 2008
SVM(Support Vector Machine) with clementine

Hello,
I would like to ask whether you know how to try the SVM algorithm with Clementine.
If you have a tutorial which explains the method, I would be grateful if you could send it to me.

Thanks a lot.

03 June 2008
Required source code of Frequent Episode Mining

Hello,

I am working on my project and I want to use the frequent episode mining technique. If anyone has source code for this technique in any programming language, please send it to me. I will be thankful to you.

(Please send the code to: n_4_naveedali@hotmail.com)

Regards,

Naveed.