Data Mining

Download Free Data Mining Source Code In C/C++, C#, Visual Basic, Visual Basic.NET, Java,
and other programming languages

Data Mining Source Code Newsletter

Announcing The Data Mining Source Code Newsletter!


Is this the correct way to do z-scoring?

Last post 07-08-2008, 16:50 by TimManns. 4 replies.
  •  06-27-2008, 3:04 8078

    Is this the correct way to do z-scoring?


    Please let me know whether this is the right way to do z-scoring:

     ITEM_PROFIT / (@GLOBAL_MAX(ITEM_PROFIT) - @GLOBAL_MIN(ITEM_PROFIT))


    But how can I arrange the Set Globals node so that it runs automatically each time I do the above transformation in the Derive node?

    I am doing this to prepare data for clustering. Another question: some item profits are negative (for example -0.5, compared with an average profit of +$5). Is this a problem?


    Many thanks.
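For reference: the expression above divides ITEM_PROFIT by the range without subtracting the mean or the minimum, so it is neither a z-score nor a min-max rescaling. A z-score is (x - mean) / standard deviation; min-max scaling is (x - min) / (max - min). The short Python sketch below (plain Python, not CLEM) contrasts the two on a hypothetical list of profits; the function names and sample values are illustrative only. It also shows that negative inputs are not a problem in themselves: below-average values simply get negative z-scores, and min-max scaling still maps every value into [0, 1].

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardise to mean 0 and standard deviation 1: (x - mean) / sd."""
    mu = mean(values)
    sd = pstdev(values)  # population SD; statistics.stdev gives the sample SD
    if sd == 0:
        raise ValueError("all values are identical; z-score is undefined")
    return [(x - mu) / sd for x in values]

def min_max(values):
    """Rescale into the range [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; range is zero")
    return [(x - lo) / (hi - lo) for x in values]

# Hypothetical ITEM_PROFIT values (mean 5.0), including one negative profit.
profits = [-0.5, 2.0, 5.0, 8.0, 10.5]
print([round(z, 3) for z in z_scores(profits)])  # the -0.5 profit gets a negative z-score
print(min_max(profits))                          # smallest maps to 0.0, largest to 1.0
```

Which scaling suits clustering depends on the algorithm and the data; the point here is only that the two formulas differ from the one in the question.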


  •  06-27-2008, 3:22 8079 in reply to 8078

    Re: Is this the correct way to do z-scoring?

    Is there a better way to remove only the single record with the maximum value in a Select node? Currently I discard records where

     ITEM_PROFIT = @MAX(ITEM_PROFIT)

    and this removes multiple rows (if larger values are found).

    I also don't know how to run the Set Globals node automatically so that it produces @GLOBAL_MAX for the varying sample inputs in my stream.
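The behaviour described above is what you would expect if @MAX is evaluated as a running maximum over the records read so far (an assumption here, inferred from the behaviour the post describes): every record that sets a new high satisfies the condition, not just the overall maximum. The hypothetical Python sketch below contrasts that with removing exactly one record at the overall maximum, located by position. It illustrates the logic only and is not Clementine code.

```python
def running_max_flags(values):
    """Flag each record equal to the maximum seen *so far* (a running max)."""
    flags, current = [], float("-inf")
    for v in values:
        current = max(current, v)
        flags.append(v == current)
    return flags

def drop_single_max(values):
    """Remove exactly one record holding the overall maximum (the first such record)."""
    idx = max(range(len(values)), key=values.__getitem__)  # max() keeps the first of any ties
    return values[:idx] + values[idx + 1:]

# Hypothetical profits, with the maximum appearing twice.
profits = [3.0, 1.0, 7.0, 2.0, 9.0, 9.0]
print(running_max_flags(profits))  # several records are flagged, not just one
print(drop_single_max(profits))    # only the first 9.0 is removed
```

The key difference is comparing against the overall maximum, computed once over all records, rather than a value that changes as records are read.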

  •  06-30-2008, 15:07 8088 in reply to 8079

    Re: Is this the correct way to do z-scoring?

    Attachment: hunterdong2.zip

    Hello,

    A lot of the questions you have been asking in the forum are covered in training courses (and partly in the User Guide). I'd recommend attending a 2- or 3-day course on data manipulation, because most of the work in data mining involves transforming the data.

    To do this particular type of analysis, you need to use an Aggregate node and a Merge node to join the results back to the original data source. See the attached zip file for an example. This type of analysis will be pushed back to the database as SQL, even on a large database.
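As a rough illustration of the Aggregate + Merge pattern (not the contents of the attached stream, which isn't reproduced here), the SQL below computes the global minimum and maximum in a subquery, joins that single row back onto every record, and derives a rescaled value, all in one statement the database can execute. It runs through Python's built-in sqlite3 module; the `items` table, its columns and the sample values are hypothetical. Min-max scaling is used because the question's expression relies on the max and min; a z-score would take the mean and standard deviation from the same aggregate step.

```python
import sqlite3

# Hypothetical source table standing in for the original data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_profit REAL)")
conn.executemany(
    "INSERT INTO items (item_id, item_profit) VALUES (?, ?)",
    [(1, -0.5), (2, 2.0), (3, 5.0), (4, 8.0), (5, 10.5)],
)

# Aggregate: one row holding the global min and max.
# Merge: join that row back onto every original record.
# Derive: rescale each profit using the joined-in statistics.
# (If max equals min, SQLite returns NULL for the division rather than raising.)
query = """
SELECT i.item_id,
       i.item_profit,
       (i.item_profit - a.min_profit) / (a.max_profit - a.min_profit) AS scaled
FROM items AS i
CROSS JOIN (
    SELECT MIN(item_profit) AS min_profit, MAX(item_profit) AS max_profit
    FROM items
) AS a
ORDER BY i.item_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```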

    Cheers

    Tim

  •  07-03-2008, 19:45 8101 in reply to 8078

    Re: Is this the correct way to do z-scoring?

    hunterdong:



    But how can I arrange the Set Globals node so that it runs automatically each time I do the above transformation in the Derive node?


    Perhaps you could use a script containing a command that executes the Set Globals node before executing any other executable node that involves the Derive node.

  •  07-08-2008, 16:50 8109 in reply to 8101

    Re: Is this the correct way to do z-scoring?

    Using the Set Globals node will not push back to a database as SQL, so it is not an option if you want efficient processing. It also requires executing the stream multiple times. The result will be static, so if your data changes, the 'maximum' value will not change unless you re-run the Set Globals node (sometimes this is a good thing, but that's for another post :) ). The Set Globals method of acquiring maximum and minimum values (or means, etc.) is fine for small datasets, but I recommend using the Aggregate node where possible.

    The example I provided in my previous post will work for large-scale databases (especially where the aggregated key field is indexed), is dynamic, and requires just one stream execution. I use this method daily on many millions of rows, for example to rank data based on specific date ranges.
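To illustrate the keyed, dynamic variant of the same idea, the hypothetical sketch below aggregates the min and max per period (standing in for a date range), indexes that key, joins the aggregate back on the key, and rescales within each period. Because the statistics are recomputed each time the query runs, adding a row changes the result without a separate step, unlike a stored Set Globals value. The `sales` table, the `period` key and the data are assumptions, and whether an index actually helps depends on the database and its query planner.

```python
import sqlite3

# Hypothetical table: one row per sale, keyed by a period (a stand-in for a date range).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, period TEXT, profit REAL)")
# Index the aggregation/join key.
conn.execute("CREATE INDEX idx_sales_period ON sales (period)")
conn.executemany(
    "INSERT INTO sales (sale_id, period, profit) VALUES (?, ?, ?)",
    [(1, "2008-06", 4.0), (2, "2008-06", 8.0), (3, "2008-06", 6.0),
     (4, "2008-07", 5.0), (5, "2008-07", 2.5), (6, "2008-07", 0.0)],
)

# Aggregate per period, join back on the key, then rescale within each period.
# (If a period's max equals its min, SQLite returns NULL for the division.)
QUERY = """
SELECT s.sale_id,
       (s.profit - a.min_profit) / (a.max_profit - a.min_profit) AS scaled
FROM sales AS s
JOIN (
    SELECT period, MIN(profit) AS min_profit, MAX(profit) AS max_profit
    FROM sales
    GROUP BY period
) AS a ON a.period = s.period
ORDER BY s.sale_id
"""

def scaled_by_period(connection):
    """Run the query and return {sale_id: scaled profit}."""
    return dict(connection.execute(QUERY).fetchall())

before = scaled_by_period(conn)
# New data arrives: the July maximum changes, and simply re-running the query picks it up.
conn.execute("INSERT INTO sales (sale_id, period, profit) VALUES (7, '2008-07', 10.0)")
after = scaled_by_period(conn)
print(before)
print(after)
```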

    Cheers

    Tim
