Data Mining

Download Free Data Mining Source Code In C/C++, C#, Visual Basic, Visual Basic.NET, Java,
and other programming languages

Data Mining Source Code Newsletter

Announcing The Data Mining Source Code Newsletter!

Subscribe By Email | Subscribe By RSS Feed

Unknown data value in symbolic field

Last post 07-28-2008, 19:30 by taka. 3 replies.
  •  07-10-2008, 6:19 8120

    Unknown data value in symbolic field


    Hi all,

    I am trying to run a Sequence node over about 20,000,000 rows of transaction-format data. The ID field (Customer ID) is a unique string, so I had to set its type to "typeless" (which is also what the Type node assigns after it reads the data). The time field and content field are fine, and their values appear in the Type node. However, I receive this error:

    Unknown data value in symbolic field
    Check the type for this field is correctly instantiated?


    If I reduce the data by sampling to, say, 100,000 rows, it works, possibly because the max set size is not exceeded and the field is therefore recognized as a set.

    I had to increase Max Set Size to a very large figure and set Customer ID to a set with <read+>, and that seems to work.

    Why can the CARMA node (reading a typeless Transaction ID) work without this problem?


    Many Thanks,

  •  07-10-2008, 10:40 8124 in reply to 8120

    Re: Unknown data value in symbolic field

    It looks like I have null values in the CustomerID field, which caused the error.
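
A quick way to confirm a diagnosis like this outside Clementine is to scan the exported data for missing IDs. The following is only a minimal sketch: the CSV layout, the column names `CustomerID`, `Time` and `Content`, the helper `find_blank_ids`, and the list of null markers are all assumptions made for illustration, not anything taken from the stream discussed here.

```python
import csv
import io

def find_blank_ids(rows, id_field="CustomerID", null_markers=("", "NULL", "$null$")):
    """Return 1-based data row numbers whose ID is missing or looks like a null.

    The markers are assumptions; adjust them to however nulls appear in your export.
    """
    bad = []
    for row_number, row in enumerate(rows, start=1):
        value = row.get(id_field)
        if value is None or value.strip() in null_markers:
            bad.append(row_number)
    return bad

# Hypothetical sample in transaction format: one item per row.
sample = io.StringIO(
    "CustomerID,Time,Content\n"
    "C001,1,bread\n"
    ",2,milk\n"
    "C002,3,eggs\n"
    "NULL,4,beer\n"
)
blank_rows = find_blank_ids(csv.DictReader(sample))
print(blank_rows)
```

Because the reader is consumed row by row, the same function can be pointed at a large file opened with `open(path, newline="")` without loading it into memory.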

     

    Can anyone please recommend a best-practice value for the Sequence node's "max sequence in memory" option, and how many rows should be fed into the Clementine server?

    I chose 100,000,000 as the max sequence value with 20,000,000 rows; however, my client became very slow and I had to cancel the run.

     

  •  07-24-2008, 12:29 8174 in reply to 8124

    Re: Unknown data value in symbolic field

    Actually, the error occurs because the content fields exceed the default max set size.
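
To see in advance which fields would run into a set-size limit, one option is to count distinct values per field before building the stream. This is a rough sketch under stated assumptions: `fields_over_set_limit` is a hypothetical helper, the field names are made up, and `max_set_size` must be set to whatever limit your Type node actually uses (no default is assumed here).

```python
from collections import defaultdict

def fields_over_set_limit(rows, fields, max_set_size):
    """Return {field: distinct_count} for fields with more distinct values than max_set_size."""
    seen = defaultdict(set)
    for row in rows:
        for field in fields:
            seen[field].add(row[field])
    return {f: len(seen[f]) for f in fields if len(seen[f]) > max_set_size}

# Hypothetical rows; in practice these would come from csv.DictReader or a database cursor.
rows = [
    {"CustomerID": "C001", "Content": "bread"},
    {"CustomerID": "C002", "Content": "milk"},
    {"CustomerID": "C003", "Content": "bread"},
    {"CustomerID": "C004", "Content": "eggs"},
]
over = fields_over_set_limit(rows, ["CustomerID", "Content"], max_set_size=3)
print(over)
```

Note that this keeps every distinct value in memory, so for a field with millions of unique IDs it is better run on a sample or replaced with a `COUNT(DISTINCT ...)` query in the source database.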

     

    Again, can anyone tell me how to run association rules in practice over hundreds of millions of records?

  •  07-28-2008, 19:30 8182 in reply to 8174

    Re: Unknown data value in symbolic field


    Try using Apriori. I have run it over sales transaction data of 170,000,000 records.

    If you want to use the Sequence node, pay attention to the expert options and reduce the minimum rule support and minimum rule confidence values.
    I do not know your data structure, so I cannot say what should be done before running the Sequence node.

    In my experience, Apriori and CARMA work better on large data.


    Taka
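
For readers unfamiliar with what Apriori does with transaction-format data, here is a small, generic sketch of the level-wise idea: group rows into baskets, count single items, then build larger candidates only from itemsets that already meet the minimum support. It is a textbook-style illustration and not Clementine's implementation; the function names, the `(transaction_id, item)` row layout and the sample data are assumptions, and everything is held in memory, so it will not scale to the row counts discussed in this thread.

```python
from collections import defaultdict
from itertools import combinations

def to_baskets(rows):
    """Group (transaction_id, item) rows into one set of items per transaction."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)
    return list(baskets.values())

def apriori(baskets, min_support):
    """Return {frozenset(items): count} for itemsets found in at least min_support baskets."""
    counts = defaultdict(int)
    for basket in baskets:
        for item in basket:
            counts[frozenset([item])] += 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates, and prune any
        # candidate that has an infrequent (k-1)-item subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count how many baskets contain each surviving candidate.
        counts = defaultdict(int)
        for basket in baskets:
            for candidate in candidates:
                if candidate <= basket:
                    counts[candidate] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical transaction-format rows: one item per row.
rows = [
    ("T1", "bread"), ("T1", "milk"),
    ("T2", "bread"), ("T2", "milk"), ("T2", "eggs"),
    ("T3", "bread"), ("T3", "eggs"),
    ("T4", "milk"),
]
frequent = apriori(to_baskets(rows), min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Here `min_support` is an absolute basket count rather than a percentage. Lowering it admits more candidate itemsets at each level, which increases both memory use and run time.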
