Data Mining

Download Free Data Mining Source Code In C/C++, C#, Visual Basic, Visual Basic.NET, Java,
and other programming languages


Announcing The Data Mining Source Code Newsletter!

Subscribe By Email | Subscribe By RSS Feed

questionnaire data

Last post 02-09-2010, 1:15 by hunterdong. 2 replies.
  •  02-01-2010, 9:23 9642

    questionnaire data

    Hi dear all,

     I have got questionnaire data (SPSS format) with over 2,000 columns. A few questions:

     1. Is it usual to have thousands of variables? (In the questionnaire, some questions are multiple choice with a few hundred options. I don't know how they carried out the survey! It must have been a very long page in IE...)

        My answer currently is Yes.

     2. How can I store this in a database (e.g. a standard SQL Server table supports only 1,024 columns)?

       The SPSS file looks like:

       UID, BoughtA, BoughtB, BoughtC, BoughtD........HateWalmart, HateBestBuy,HateTesco

       ResponderA,1,0,0,1................1,1,1

       ResponderB,0,0,0,0...............1,0,1

     

         My plan is to transform it to a single table:

        UID, Question, Answer

        ResponderA,Bought,A

        ResponderA,Bought,D

        ResponderA,Hate,Walmart

        .......

        ResponderB,Hate, BestBuy

    Lots of the values are negative (didn't buy, didn't go to). Does this need any special consideration?
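A minimal sketch of the wide-to-long transformation described above, assuming the SPSS file has first been exported to CSV. The `QUESTION_PREFIXES` tuple, the helper names, and the shortened sample rows are hypothetical and only illustrate the idea; the real file would need its own column-name mapping. Only ticked options (value 1) become rows, which is one way to handle the many negative values.

```python
import csv
import io

# Hypothetical question prefixes; the real file has its own variable names.
QUESTION_PREFIXES = ("Bought", "Hate")


def split_column(name):
    """Split a wide column name such as 'HateWalmart' into ('Hate', 'Walmart')."""
    for prefix in QUESTION_PREFIXES:
        if name.startswith(prefix) and len(name) > len(prefix):
            return prefix, name[len(prefix):]
    raise ValueError(f"no known question prefix in column {name!r}")


def wide_to_long(rows, id_column="UID"):
    """Yield (uid, question, answer) for every option a responder ticked."""
    for row in rows:
        uid = row[id_column]
        for column, value in row.items():
            if column == id_column:
                continue
            # Skip the zeros: unticked options produce no row at all.
            if value.strip() == "1":
                question, answer = split_column(column)
                yield uid, question, answer


# Toy extract shaped like the example in the post (not real survey data).
SAMPLE = """UID,BoughtA,BoughtB,BoughtC,BoughtD,HateWalmart,HateBestBuy,HateTesco
ResponderA,1,0,0,1,1,1,1
ResponderB,0,0,0,0,1,0,1
"""

long_rows = list(wide_to_long(csv.DictReader(io.StringIO(SAMPLE))))
for uid, question, answer in long_rows:
    print(f"{uid},{question},{answer}")
```

One caveat with dropping the zeros: a responder who ticked nothing disappears from the long table entirely, so it is probably worth keeping a separate list of responders per survey if percentages are needed later.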

    3. Is it possible to find/write a de-restructure node to consolidate the data to fewer columns?

        My guess is: possibly not, if the questionnaire has multiple choice questions.

    Thanks guys!

  •  02-07-2010, 21:45 9682 in reply to 9642

    Re: questionnaire data

    Hi,

    Re: questions 1 & 2:
     - Why do you want to change the format of the data? It looks like a good format to me. There is one record per customer and lots of variables. It is easy to run PCA, a neural net, or a decision tree to pick useful variables. Personally I would prefer the existing format.

    Re: question 3: A decision tree could be used to prune the variables and pick only the predictive ones, but you didn't describe what you are trying to accomplish. Are you predicting customer churn, future spend, or product demand? Is there an output variable?

    Cheers

    Tim

  •  02-09-2010, 1:15 9689 in reply to 9682

    Re: questionnaire data

    Thanks Tim.

     It is only because the questionnaire is periodic (one file per month), so we want to store it in a database. But databases don't support more than about 1,000 columns, so if I want to use a database rather than SPSS files, I guess I have to either split each question into a separate database table, or de-transpose the data, or code the multiple choices into one database field (10 choices as something like 1010000001). In that last case I am afraid Clementine would not be able to cope with such tight coding.
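For the third option, here is a rough sketch of packing one multiple choice question into a single text field and unpacking it again. The option names and the `encode`/`decode` helpers are made up for illustration; the only thing taken from the post is the idea of a 10-character string such as 1010000001.

```python
# Hypothetical option list for one multiple choice question (order matters).
OPTIONS = [f"Option{i}" for i in range(1, 11)]


def encode(selected, options):
    """Pack the selected options into a '0'/'1' string, one character per option."""
    unknown = set(selected) - set(options)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return "".join("1" if option in selected else "0" for option in options)


def decode(code, options):
    """Unpack a '0'/'1' string back into the list of selected options."""
    if len(code) != len(options) or set(code) - {"0", "1"}:
        raise ValueError(f"malformed code {code!r}")
    return [option for option, bit in zip(options, code) if bit == "1"]


code = encode({"Option1", "Option3", "Option10"}, OPTIONS)
print(code)                   # 1010000001
print(decode(code, OPTIONS))  # ['Option1', 'Option3', 'Option10']
```

The packed field keeps the table narrow, but counting how many people chose one option then means picking a character out of every string, so the de-transposed layout is likely easier to report on.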

    Primarily we want to force Clementine to be a statistical reporting tool (it doesn't appear to be one?) and ask it to generate things like: 5 people chose Walmart, 5 people chose BestBuy, 10 people bought nappies, 20 people showed interest in beers, YoY increase is 100%.
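If the data ends up in a long (UID, Question, Answer) table with a month column added, this kind of report can come straight from the database. Below is a small sketch using SQLite from the Python standard library as a stand-in for whatever database is actually used; the `answers` table, its column names, and the sample rows are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers (survey_month TEXT, uid TEXT, question TEXT, answer TEXT)"
)

# Invented rows: one responder hated Walmart in 2009-01, two did in 2010-01.
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?, ?)",
    [
        ("2009-01", "R1", "Hate", "Walmart"),
        ("2010-01", "R1", "Hate", "Walmart"),
        ("2010-01", "R2", "Hate", "Walmart"),
        ("2010-01", "R2", "Bought", "Nappies"),
    ],
)


def counts(conn, month):
    """Return {(question, answer): number of distinct responders} for one month."""
    cursor = conn.execute(
        "SELECT question, answer, COUNT(DISTINCT uid) FROM answers "
        "WHERE survey_month = ? GROUP BY question, answer",
        (month,),
    )
    return {(question, answer): n for question, answer, n in cursor}


def yoy_increase(current, previous):
    """Percent change per (question, answer); None when there is no prior count."""
    result = {}
    for key, n in current.items():
        prev = previous.get(key)
        result[key] = None if not prev else 100.0 * (n - prev) / prev
    return result


now = counts(conn, "2010-01")
before = counts(conn, "2009-01")
growth = yoy_increase(now, before)

for (question, answer), n in sorted(now.items()):
    print(question, answer, n, growth[(question, answer)])
```

Because each monthly file only adds rows, the 1,000-column limit never comes into play, and new answer options need no schema change.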
