Data Mining

Download Free Data Mining Source Code In C/C++, C#, Visual Basic, Visual Basic.NET, Java,
and other programming languages

Announcing The Data Mining Source Code Newsletter!


New to Clementine

Last post 01-26-2010, 21:07 by TimManns. 8 replies.
  •  01-06-2010, 23:58 9537

    New to Clementine

    I have a question; please don't think it silly.....

    I took the Intro To PASW (Clementine 12.0) course, which didn't go into much detail about building models. In fact, by the third day of the class, we were only a little more than halfway through the training materials.

    Is there a source for learning to build models using Clementine other than taking a course from SPSS?

    Does anyone have a model/stream they've created that also includes a description of the process they used to build it; for instance, why they used this node or that node?

    I will take the modeling course, but I'm in a position where I have to build a model within two weeks. I feel like the water bird trying to swallow a frog while the frog has its hands around my throat.

    I'm looking for an example. If you can share, it would be gratefully appreciated.

    Thanks

  •  01-08-2010, 7:36 9543 in reply to 9537

    Re: New to Clementine

    My Friend,

    It sounds like you’re a bit overwhelmed now.

    Here is my advice, for what it’s worth:

    1.       Check your data for the standard things, i.e. missing values, bad data, and other problems with the data. Doing this first is much easier than finding out later that your model is broken because of problems in the data.

    2.       What do you want to find out from your data? What are you trying to model? Ask yourself these questions. Don’t worry about the modeling procedure yet; first you have to define the variable or information you want to model. Write the model down on paper to see whether it makes sense. Example: I want to know why some individuals buy certain products after receiving a direct mail piece and why some do not. My target variable here is Yes = 1, they purchased; No = 0, they did not. If you can think this information through, that’s half the battle.

    3.       It sounds like you may not have a lot of modeling in your background (don’t take that as a negative), so I would start with a decision tree model. They’re easy to understand compared to some of the other models in PASW. If you understand regression, then start there.

    4.       Define the model.

    5.       Last, start reading blogs; e.g. Tim Manns has a very good blog. Don’t worry if the information is over your head, just start reading. It doesn’t come by osmosis.
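Steps 1 and 2 can be illustrated outside Clementine with a short Python sketch (in Clementine itself you would do this with nodes rather than code). It assumes records arrive as a list of dicts and treats None or an empty string as missing; the field names `age`, `purchased` and `response` are hypothetical. Note that a record whose purchase outcome is itself missing gets a None target instead of being silently counted as a non-buyer, which is exactly the kind of problem step 1 is meant to catch.

```python
from collections import Counter

def missing_value_report(records, fields):
    """Count missing values (None or empty string) for each field."""
    counts = Counter()
    for rec in records:
        for field in fields:
            if rec.get(field) in (None, ""):
                counts[field] += 1
    return {field: counts[field] for field in fields}

def derive_target(records, source="purchased", target="response"):
    """Add a 1/0 target flag (Yes = 1, No = 0); anything else becomes None for review."""
    mapping = {"Yes": 1, "No": 0}
    for rec in records:
        rec[target] = mapping.get(rec.get(source))
    return records

# Hypothetical direct-mail customers; id 3 has no recorded outcome.
customers = [
    {"id": 1, "age": 34, "purchased": "Yes"},
    {"id": 2, "age": None, "purchased": "No"},
    {"id": 3, "age": 51, "purchased": ""},
]
print(missing_value_report(customers, ["age", "purchased"]))
derive_target(customers)
print([c["response"] for c in customers])
```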

         Triener

        A raccoon in the wilderness of data.

  •  01-08-2010, 22:11 9544 in reply to 9543

    Re: New to Clementine

    Thanks for your response; I do appreciate it. I will find Tim's blog. I am a database architect, and the modeling I do is for the design of databases and warehouses.

  •  01-10-2010, 15:32 9552 in reply to 9544

    Re: New to Clementine

    Welcome!

    I'd recommend looking at the user guide and following the 'drug1n' example, or just opening the 'drug1n' example stream found in the demos directory (when you install Clementine, a demos directory is created by default, but it can be excluded in the install options).

    It shows a very simple example of building a predictive model to estimate which drug to give heart patients. 

    As mentioned in the previous post, it is very important to consider data transformations such as handling missing/null values, checking the distribution of values, and building aggregations and sums that can impart useful information to any predictive model. The better you prepare the data, the easier and more successful the results are likely to be. Look for ways to clean the data and add value rather than tweaking model options (which are less likely to make a big difference).
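As a rough illustration of the "aggregations and sums" point, the Python sketch below rolls transaction-level rows up to one row per customer, which is usually the level a predictive model scores at. The row layout `(customer_id, amount)` is a hypothetical example, and null amounts are counted rather than silently dropped so they stay visible during data preparation.

```python
from collections import defaultdict

def aggregate_by_customer(rows):
    """Return {customer_id: {"n_txn", "total", "n_missing", "mean"}} from (id, amount) rows."""
    acc = defaultdict(lambda: {"n_txn": 0, "total": 0.0, "n_missing": 0})
    for cust, amount in rows:
        a = acc[cust]
        if amount is None:
            a["n_missing"] += 1  # track nulls explicitly instead of ignoring them
            continue
        a["n_txn"] += 1
        a["total"] += amount
    summary = {}
    for cust, a in acc.items():
        mean = a["total"] / a["n_txn"] if a["n_txn"] else 0.0
        summary[cust] = {**a, "mean": mean}
    return summary

rows = [("c1", 120.0), ("c1", 80.0), ("c2", None), ("c2", 40.0)]
summary = aggregate_by_customer(rows)
print(summary["c1"])
print(summary["c2"])
```

Derived fields such as the per-customer total, mean and count of missing values can then be fed to the model alongside the raw attributes.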

    Cheers

    Tim
    http://timmanns.blogspot.com/

  •  01-10-2010, 17:04 9553 in reply to 9552

    Re: New to Clementine

    Thanks Tim.

    I am going through the Applications Examples/Demo tutorial now.

    I value and appreciate your advice.

  •  01-11-2010, 15:43 9554 in reply to 9553

    Re: New to Clementine

    Hi Tim,

    I appreciate your blog, as I have found it of great assistance. I am new to data mining too and have ordered "Data Mining Techniques" and "Data Preparation for Data Mining" from Amazon. I have done the Introduction to Clementine and Data Mining course and the online tutorials.

    However, I must admit that I am still not confident about data preparation. I understand that you can use mathematical functions to normalise numeric values, but what about sets (categorical fields)?

    I work in the airline industry, and our customers belong to certain tiers. If I look at the responders to a previous campaign, 70% belong to tier A (the entry-level tier), about 15% to B, 10% to C and 5% to D.

    I have two questions:

    1. If only, say, 20% of customers responded to a campaign, should the training and test sets both have a 20/80 split of 1s and 0s indicating response?

    2. How do I normalise a set like the one above, where a member can have 4 possible values and they are not uniformly distributed?

    I plan to do the Data Modelling course with SPSS within the next 2 months.

  •  01-11-2010, 17:25 9555 in reply to 9554

    Re: New to Clementine

    A few comments / questions:

    a) Do you always know beforehand which tier a customer is in?
     -> If you do know the customer tier before the campaign response, then I'd suggest splitting your customer base by tier and building a model for each tier. Then you simply use an 'if-then-else' Derive node to pick the score from the model that applies to that customer. Customers in different tiers often behave qualitatively differently (different expectations for customer service, the impact of flight delays on business travellers, etc.), so there are plenty of reasons to consider this.
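The per-tier approach can be sketched in Python as below. The scoring functions are hypothetical stand-ins for models that would really be trained on each tier's data (the field `flights_last_year` and all coefficients are made up), and `score_customer` plays the role of the if-then-else Derive node by choosing the model that matches the customer's tier.

```python
from collections import defaultdict

def split_by_tier(customers):
    """Group customer records by tier so each tier can get its own model."""
    groups = defaultdict(list)
    for c in customers:
        groups[c["tier"]].append(c)
    return dict(groups)

# Hypothetical stand-ins for models trained separately on each tier's data.
TIER_MODELS = {
    "A": lambda c: min(1.0, 0.05 + 0.01 * c["flights_last_year"]),
    "B": lambda c: min(1.0, 0.10 + 0.02 * c["flights_last_year"]),
    "C": lambda c: min(1.0, 0.15 + 0.02 * c["flights_last_year"]),
    "D": lambda c: min(1.0, 0.20 + 0.03 * c["flights_last_year"]),
}

def score_customer(customer):
    """If-then-else selection: apply the model that matches the customer's tier."""
    tier = customer["tier"]
    if tier not in TIER_MODELS:
        raise ValueError(f"no model trained for tier {tier!r}")
    return TIER_MODELS[tier](customer)

customers = [
    {"id": 1, "tier": "A", "flights_last_year": 2},
    {"id": 2, "tier": "B", "flights_last_year": 5},
    {"id": 3, "tier": "A", "flights_last_year": 10},
]
groups = split_by_tier(customers)  # training data for each tier's model
scores = {c["id"]: score_customer(c) for c in customers}
print(scores)
```

Raising an error for an unknown tier, rather than falling back to some default model, makes it obvious when a customer arrives from a tier no model was built for.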

    b) Maybe don't use 1s and 0s as categories; instead consider using 1.0 and 0.0 and make sure the field appears as a 'Range' in the Type node before modelling. You then don't have to worry about balancing and can simply use the decimal score output to rank the customers. If you want a categorical output (a yes/no type of thing), then, because your data naturally has a 20/80 split, cut the score so that anything below 0.25 is false (no response) and anything above 0.25 is true (response).
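A minimal Python sketch of point b), assuming the model has already produced a continuous 0.0 to 1.0 score per customer (the customer ids and scores are made up). Ranking uses the raw score; the yes/no flag applies the 0.25 cut. The post doesn't say which side a score of exactly 0.25 falls on, so treating it as "yes" here is an arbitrary choice.

```python
def rank_customers(scores):
    """Sort (customer_id, score) pairs from most to least likely to respond."""
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

def to_flag(score, cutoff=0.25):
    """Cut a continuous 0.0-1.0 score into a yes/no prediction (score == cutoff counts as yes)."""
    return score >= cutoff

scores = [("c1", 0.12), ("c2", 0.40), ("c3", 0.27)]
ranked = rank_customers(scores)
flags = {cid: to_flag(s) for cid, s in scores}
print(ranked)
print(flags)
```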

    c) You don't *have* to sample (aka balance) the data away from its natural occurrence when building your models, but you will need enough records/customers from each tier. It is common to do this simply because most data mining tools use a 50/50 split for categorical outputs by default. See my previous post on this topic: http://timmanns.blogspot.com/2009/11/building-neural-networks-on-unbalanced.html, and also read Abbott Analytics. Dean has made some great posts on this topic too: http://abbottanalytics.blogspot.com/2009/11/stratified-sampling-vs-posterior.html. When you test and validate your models, *always* do so against data with the natural occurrence (not sampled), because that is what the data will look like when you run the campaign.
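The train/test advice in point c) can be sketched in Python as follows, assuming records are dicts with a 1/0 `response` field (a hypothetical name). A stratified split keeps the natural 20/80 response mix in both sets; balancing, shown here as random undersampling of the majority class (one of several ways to balance), is applied to the training set only, and the test set is left at the natural mix for validation.

```python
import random

def stratified_split(records, target, test_fraction=0.3, seed=42):
    """Split so that train and test each keep the natural ratio of target values."""
    rng = random.Random(seed)
    by_class = {}
    for r in records:
        by_class.setdefault(r[target], []).append(r)
    train, test = [], []
    for class_records in by_class.values():
        shuffled = class_records[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test

def undersample(records, target, seed=42):
    """Balance to 50/50 by randomly dropping records of the majority class."""
    rng = random.Random(seed)
    pos = [r for r in records if r[target] == 1]
    neg = [r for r in records if r[target] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced

# Synthetic data with a natural 20% response rate.
data = [{"id": i, "response": 1 if i < 200 else 0} for i in range(1000)]
train, test = stratified_split(data, "response")
balanced_train = undersample(train, "response")  # build the model on this
# Validate on `test`, which still has the natural 20/80 mix.
print(sum(r["response"] for r in test) / len(test))
print(sum(r["response"] for r in balanced_train) / len(balanced_train))
```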

    Hope that helps

    Tim

  •  01-26-2010, 3:14 9589 in reply to 9555

    Re: New to Clementine

    Hi, I am new to Clementine, too!

    I really need to find an expert on this platform who can tell me whether it can do the job I want, because at the moment I don't have time to research and do everything from scratch. I found out that Clementine is a very good tool for solving data mining problems, and my problem is the following.

    I have a dissertation project for my Bachelor's degree on multiple-level association rule mining, as described in the IEEE paper "Mining Multiple-Level Association Rules in Large Databases" by Jiawei Han and Yongjian Fu, and I would like to find out whether Clementine can help me with the procedure, because I did not see any multiple-level algorithm implemented in it, just the basic Apriori. Is there any way to implement something in Clementine that solves the problem, or must I implement a new algorithm in a language like Java?

    At the moment, I have only managed to connect to the SQL server holding my database, just to play with the platform, but the whole object-oriented logic of the platform confuses me, so I can't reach a conclusion. The basic concept of the multiple-level algorithm is the following:

    It takes as input:

    1) The transaction itemsets, for example in the form (transid, item1, item2, item3, ..., itemN)

    2) An XML tree with all the possible categories that items can belong to.

    The output should be the association rules found at every level of the tree (or across levels, as an extension of this logic) by applying Apriori.

    I hope there is someone with experience on the subject who can help and guide me through the procedure despite any obscurities in my description! Thanks in advance.
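To make the input/output description above concrete, here is a simplified Python sketch: each transaction is rewritten in terms of its categories at a given level of the taxonomy, and plain Apriori is then run level by level. It is not the exact algorithm from the Han and Fu paper; it omits the paper's optimizations (such as skipping descendants of infrequent higher-level items). The taxonomy is assumed to be already parsed from the XML tree into a hypothetical child-to-parent dict, and the items, support thresholds and confidence cut-off are made-up examples. The per-level support values reflect the paper's idea of allowing lower thresholds at deeper levels.

```python
from itertools import combinations

# Hypothetical taxonomy as child -> parent (would be parsed from the XML tree).
# Items with no parent are top-level categories (level 1).
PARENT = {
    "2% milk": "milk", "skim milk": "milk",
    "white bread": "bread", "wheat bread": "bread",
}

def ancestor_at_level(item, level):
    """Return the item's ancestor at `level` (1 = top), or None if the item is too shallow."""
    path = [item]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    path.reverse()  # now ordered top-level category ... item
    return path[level - 1] if level <= len(path) else None

def generalize(transactions, level):
    """Rewrite each transaction in terms of its level-`level` categories."""
    result = []
    for t in transactions:
        items = {ancestor_at_level(i, level) for i in t}
        items.discard(None)
        result.append(frozenset(items))
    return result

def apriori(transactions, min_support):
    """Plain Apriori: return {frozenset itemset: support count} for all frequent itemsets."""
    needed = min_support * len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= needed}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: size-k unions of frequent (k-1)-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= needed}
        result.update(frequent)
        k += 1
    return result

def rules(frequent, min_conf):
    """Generate (lhs, rhs, confidence) rules from the frequent itemsets."""
    out = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = count / frequent[lhs]  # subsets of frequent sets are frequent
                if conf >= min_conf:
                    out.append((set(lhs), set(itemset - lhs), conf))
    return out

transactions = [
    {"2% milk", "wheat bread"},
    {"skim milk", "white bread"},
    {"2% milk", "white bread"},
    {"skim milk"},
]
min_support = {1: 0.5, 2: 0.25}  # lower threshold at the deeper level
for level in (1, 2):
    freq = apriori(generalize(transactions, level), min_support[level])
    for lhs, rhs, conf in rules(freq, min_conf=0.7):
        print(level, sorted(lhs), "->", sorted(rhs), round(conf, 2))
```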

  •  01-26-2010, 21:07 9594 in reply to 9589

    Re: New to Clementine

    Clementine provides an open mechanism for adding any custom executable. It was named CEMI (Clementine External Module Interface), but I haven't used it in a few years. CEMI was recently updated (in version 12, I think).

    For example you can write a new Apriori algorithm in C or Java (compile it into an executable) and use CEMI to add it to Clementine.  Then you can still do all the database access, data manipulation and analysis in Clementine.  The new executable will appear as a node in Clementine (same as the default nodes) and the data in Clementine will get passed to the executable as it is defined in the CEMI configuration.
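As a rough, hypothetical illustration of the kind of standalone program that could sit behind such a node, the Python sketch below reads transactions from a CSV file and writes per-item support counts (the first pass of Apriori) to another CSV file. The command-line convention (input path, output path) and the `transid,item` column layout are assumptions; how CEMI actually passes data to and from the executable is defined in the CEMI configuration and is not shown here. Python is used only for illustration; the post suggests a compiled C or Java program.

```python
import csv
import sys
from collections import Counter

def count_item_support(lines):
    """Count how many distinct transactions contain each item.

    `lines` is any iterable of CSV text lines with a header row: transid,item.
    """
    seen = set()
    counts = Counter()
    for row in csv.DictReader(lines):
        key = (row["transid"], row["item"])
        if key not in seen:  # ignore repeated rows for the same item in one transaction
            seen.add(key)
            counts[row["item"]] += 1
    return counts

def main(argv):
    # Hypothetical convention: argv[1] = input CSV, argv[2] = output CSV.
    in_path, out_path = argv[1], argv[2]
    with open(in_path, newline="") as f:
        counts = count_item_support(f)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["item", "support_count"])
        for item, n in counts.most_common():
            writer.writerow([item, n])
    return 0

if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.stderr.write("usage: item_support.py INPUT.csv OUTPUT.csv\n")
    else:
        sys.exit(main(sys.argv))
```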

    I have an old example in this post:
    http://www.kdkeys.net/forums/post/4551.aspx
    and also here:
    http://www.kdkeys.net/forums/post/6509.aspx

    Cheers

    Tim
