-
29 February 2008
Apriori algorithm code written completely in Matlab
-
Hi everyone,
Could you please send me, if possible, code for the Apriori algorithm written completely in Matlab? If anybody has the code, please send it to me via my email belal_master2004@yahoo.com.
Kind Regards,
Bilal
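For readers looking for a starting point: a minimal sketch of the classic Apriori level-wise search (join, prune, count). It is written in Python rather than Matlab, assumes `min_support` is an absolute transaction count, and uses made-up basket data for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset.

    min_support is an absolute count (number of transactions).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    candidates.add(union)
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

With the five hypothetical baskets used in the test and `min_support=3`, the frequent itemsets are the four single items bread, milk, diapers and beer plus four pairs such as {diapers, beer}; no triple reaches the threshold.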
-
28 February 2008
Hi, I need help with the K-means algorithm
-
Hi, I am Sandeep, a grad student.
I am doing a project on text categorization, so I need to implement k-means on my data set. My question is:
Is the k-means algorithm a good choice for text categorization?
If so, could anyone please give me the k-means algorithm in VB.NET?
If this algorithm is not a good choice, can you suggest another algorithm?
Thank you
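A minimal sketch of k-means (Lloyd's algorithm) applied to documents represented as relative term-frequency vectors. It is Python, not VB.NET, and it uses a deterministic farthest-first initialisation instead of random seeds so that runs are reproducible; the tokenisation (lowercase, split on whitespace) and the sample documents are assumptions for illustration. Note that k-means is unsupervised: it groups similar documents but does not use predefined category labels.

```python
def tf_vectors(docs):
    """Relative term-frequency vectors over a shared, sorted vocabulary."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({w for words in tokenised for w in words})
    return [[words.count(w) / len(words) for w in vocab] for words in tokenised]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Return (centroids, labels) for k clusters."""
    # Farthest-first initialisation: start from the first point, then keep
    # adding the point farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

On four short hypothetical documents, two about stocks and two about football, `kmeans(tf_vectors(docs), k=2)` puts each pair in its own cluster.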
-
26 February 2008
Expression - Finding a Space
-
I would like to use a Derive node to find the position of the space in a field. Could anyone help with the expression I need to use? The formula in Microsoft Access is InStr([postcode]," ") and in Excel it is =SEARCH(" ",A1,1).
Any help would be greatly appreciated.
Kind Regards
Carly
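This does not show the Clementine expression itself; it is a small Python sketch of what the Access formula computes, for checking results: the 1-based position of the first space, or 0 when there is none. The helper names `instr` and `outward_code` are hypothetical; the second one shows a typical use, taking the part of a UK postcode before the space (the outward code).

```python
def instr(text, sub):
    """1-based position of the first occurrence of sub in text, 0 if absent
    (the same convention as Access InStr)."""
    return text.find(sub) + 1  # str.find returns -1 when sub is missing


def outward_code(postcode):
    """Part of the postcode before the first space (whole string if no space)."""
    pos = instr(postcode, " ")
    return postcode[:pos - 1] if pos else postcode
```

For example, `instr("SW1A 1AA", " ")` gives 5 and `outward_code("SW1A 1AA")` gives `"SW1A"`. Unlike this sketch, Excel's SEARCH returns an error rather than 0 when the space is missing.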
-
12 February 2008
SetToFlag node problem
-
I've connected a SetToFlag node to a source node (SPSS data file), but when I try to execute, I get an error saying that "there are no executable nodes." Do I need to do something to my source node to make it executable, or do I need a downstream output node from the SetToFlag node?
Thanks!
-
06 February 2008
Please help me get C++ source code for data mining for a general store
-
Hello,
Please help me get the source code for data mining in C++ for a general store.
It is urgent.
Or please help in any related way if possible.
-
06 February 2008
Decision tree algorithm using Gini indices
-
Hi all, can anyone help me with Visual Basic source code for a decision tree algorithm in data mining using Gini indices?
It's very urgent.
I need it for my database classification project.
I will be very thankful if you help me with the code.
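A minimal sketch of a CART-style decision tree that chooses binary splits by Gini impurity, $1 - \sum_i p_i^2$ over the class proportions $p_i$. It is Python rather than Visual Basic, handles numeric attributes only (split `x[f] <= t`), and uses a depth limit as its only stopping rule besides pure nodes; these are simplifying assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """(feature, threshold, impurity) of the split 'x[feature] <= threshold'
    with the lowest weighted child Gini; feature is None if no split helps."""
    n = len(rows)
    best = (None, None, gini(labels))
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Leaf = majority class label; internal node = (feature, threshold, left, right)."""
    feature, threshold, _score = best_split(rows, labels)
    if feature is None or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]
    left = [(r, lab) for r, lab in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[feature] > threshold]
    return (feature, threshold,
            build_tree([r for r, _ in left], [lab for _, lab in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [lab for _, lab in right], depth + 1, max_depth))


def predict(tree, x):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree
```

A perfectly separable split scores 0; for labels `["no", "no", "yes", "yes"]` at x = 1, 2, 3, 4 the sketch picks the threshold 2.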
-
06 February 2008
What do you think about my clustering program?!
-
It reads the first two columns of an Excel 2003 worksheet (Excel 2007 is not supported): the first column must contain the X values (abscissas) and the second the Y values (ordinates). After the program calculates the various distance matrices, click Save and open the saved file with MVSP, which you also need to install (a trial version is available online). You can find the instructions under About, Help.
Please tell me what you think, so you can help me make it better. Thanks!
Nicola.
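The core calculation described here, a distance matrix over the X/Y points, can be sketched in a few lines of Python. This assumes the two columns have already been read from the worksheet into lists; which distance measures the program actually offers is not stated, so Euclidean and Manhattan (city-block) are illustrative choices, and the export format for MVSP is not shown.

```python
import math


def distance_matrix(xs, ys, metric="euclidean"):
    """Square, symmetric matrix of distances between points (xs[i], ys[i]).

    metric: "euclidean" (straight-line) or "manhattan" (city-block).
    """
    points = list(zip(xs, ys))

    def d(p, q):
        dx, dy = p[0] - q[0], p[1] - q[1]
        if metric == "euclidean":
            return math.hypot(dx, dy)
        if metric == "manhattan":
            return abs(dx) + abs(dy)
        raise ValueError(f"unknown metric: {metric}")

    return [[d(p, q) for q in points] for p in points]
```

For the points (0, 0) and (3, 4), the Euclidean distance is 5 and the Manhattan distance is 7; the diagonal is always 0.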
-
05 February 2008
Anyone tried SLRM? Can't make it work, maybe I don't understand it...
-
So, after reading the help in Clementine and seeing the demo, I concluded that for SLRM you must have:
1) A "PRODUCT_OFFERED" string field containing the different product offers.
2) An "ACCEPTED" flag field with T/F values, stating whether the client accepted the offer.
So you should now have a dataset with, for example, 5 different and balanced values for PRODUCT_OFFERED (say 10-30% each), and within each category, T/F values for "ACCEPTED" balanced according to reality or whatever balancing method you would like to use.
Check pm_selflearn and the SLRM there: the model states that it has almost perfect precision for all categories (it reminds me of that old, utopian drug_a-drug_b_drug_x decision tree). But after seeing the results... I don't really get it! I ran the model on the third dataset (pm_customer_train3.sav) and checked the results: it's completely wrong. It has 3 hits out of 33 records, and all three are in a single category, "Mortage".
I don't get it. The demo even states that if the confidence is 0.95, you should have a 95% chance of a hit. There are seven records with a confidence of 0.957 and the same prediction, Savings. None of them is a hit, not even one.
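The claim being tested is that records scored with confidence around 0.95 should be hits about 95% of the time. One way to check that outside Clementine, assuming the scored records can be exported together with the actual outcome, is to bucket them by confidence and compare each bucket's mean confidence with its observed hit rate. A minimal Python sketch; the `(predicted, actual, confidence)` record layout and the bucket edges are assumptions.

```python
def calibration_table(records, edges):
    """Compare stated confidence with observed hit rate, bucket by bucket.

    records: (predicted, actual, confidence) tuples.
    edges:   ascending bucket boundaries, e.g. [0.0, 0.5, 0.9, 1.0];
             the last bucket includes its upper edge.
    Returns (low, high, count, mean_confidence, hit_rate) per non-empty bucket.
    """
    records = list(records)
    table = []
    for low, high in zip(edges, edges[1:]):
        is_last = high == edges[-1]
        bucket = [r for r in records if low <= r[2] < high or (is_last and r[2] == high)]
        if not bucket:
            continue
        hits = sum(1 for predicted, actual, _ in bucket if predicted == actual)
        mean_conf = sum(conf for _, _, conf in bucket) / len(bucket)
        table.append((low, high, len(bucket), mean_conf, hits / len(bucket)))
    return table
```

A well-calibrated model shows hit rates close to the mean confidence in every bucket; seven records at 0.957 with no hits would show up as a bucket with mean confidence near 0.96 and a hit rate of 0.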
Then I was going to try this on my own data, but I realized I didn't have the T/F "ACCEPTED" field, because I was working on the open market. So I checked the demo database again and saw that every single record had "T" for accepted. I thought SLRM would still work and decide, even if everyone is a buyer, which product each customer will buy. I built my training database with 500,000 records and 750 fields, executed the stream, and got an error when the records finished loading into memory (or wherever they are loaded). I don't remember the error...
I decided to try a smaller sample, 50,000 records and 650 fields, and got a model, but the graphs show 0 precision. I then tried to view the results in a table, but I couldn't: error: SLRM ERROR: Internal error. "(PMML not valid)". I got the same error with the demo stream when I hadn't renamed some fields containing non-simple characters. My field names are longer than 8 characters, and I understand some programs can't work with fields like that. If that is the cause of the error... have I just built a model that I now can't execute because Clementine didn't tell me BEFORE generating it that I had fields with more than 8 characters?
Am I getting something wrong? Or is this new SLRM feature (Clem 11.1) not quite right yet?
Please help! I'm eager to find out whether I'm completely ignorant or SPSS has released a very, very erroneous model...
Thanks for reading,
A
-
04 February 2008
Database access extremely slow?
-
After doing some tests it seems like Clementine's database access is extremely slow. Has anyone experienced the same? Here is the test I conducted:
Stream setup:
import node -> filter node -> select node -> screen output
The select node selects about 300,000 of the 20,000,000 rows in the dataset by an ID number (integer). The stream is executed on a client; everything else is on the server (SQL Server database and raw data in text file format). I've created an index on the ID number in the database.
1) Import node = plain text file input node: execution time 103 seconds
2) Import node = database input node: execution time 280 seconds
When executing 2), all nodes except the output node turn purple, showing that this part is performed within the database.
Now, in a 3rd test, I read the data from the same SQLServer database via a direct SQL statement in WinSQL (a database access utility), execution time: just 42 seconds.
With such slow execution, I find database connections in Clementine not very useful. Is there a way to speed it up? Do you have any experience with this?
Regards, Ken
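One thing worth confirming in a test like this is that the ID selection really runs inside the database and uses the index. The sketch below uses Python's built-in SQLite as a stand-in for SQL Server (in SQL Server the equivalent check is the query's execution plan); the table `data`, the index `idx_data_id` and the generated rows are hypothetical.

```python
import sqlite3


def build_demo_db(n_rows=100_000):
    """In-memory stand-in table; each id value repeats n_rows / 1000 times."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (row_id INTEGER, id INTEGER, value REAL)")
    con.executemany(
        "INSERT INTO data VALUES (?, ?, ?)",
        ((i, i % 1000, i * 0.5) for i in range(n_rows)),
    )
    con.execute("CREATE INDEX idx_data_id ON data (id)")
    return con


def select_by_ids(con, ids):
    """Run the ID selection inside the database; return (rows, query plan)."""
    ids = list(ids)
    placeholders = ",".join("?" for _ in ids)
    sql = f"SELECT row_id, id, value FROM data WHERE id IN ({placeholders})"
    # EXPLAIN QUERY PLAN says whether SQLite searches via the index or
    # scans the whole table; the readable detail is the last column.
    plan = [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql, ids)]
    rows = con.execute(sql, ids).fetchall()
    return rows, plan
```

If the plan mentions the index, the selection itself is cheap, and any remaining gap against a hand-written query points at how the rows are fetched and transferred rather than at the filtering.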
-
04 February 2008
LDBSCAN
-
I'm looking for an implementation of the LDBSCAN algorithm:
http://dollar.biz.uiowa.edu/~street/LDBSCAN.pdf
Has anyone implemented it, and is the implementation available?
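As we understand the linked paper, LDBSCAN builds on two quantities from the Local Outlier Factor (LOF) method: the local reachability density (LRD) and the LOF itself. The Python sketch below computes only those building blocks, not LDBSCAN's cluster-expansion step. It assumes distinct points and takes exactly k neighbours per point (ties at the k-distance are not expanded as in the full LOF definition).

```python
import math


def dist(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def lrd_and_lof(points, k):
    """Return (lrd, lof) lists, one value per point."""
    n = len(points)
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist(points[i], points[j]))
        neighbours.append(order[:k])
        k_distance.append(dist(points[i], points[order[k - 1]]))

    def reach_dist(i, j):
        # Reachability distance of i from j: max(k-distance(j), d(i, j)).
        return max(k_distance[j], dist(points[i], points[j]))

    # LRD: inverse of the mean reachability distance to the k neighbours.
    lrd = [len(neighbours[i]) / sum(reach_dist(i, j) for j in neighbours[i])
           for i in range(n)]
    # LOF: mean neighbour LRD relative to the point's own LRD (about 1 inside clusters).
    lof = [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
           for i in range(n)]
    return lrd, lof
```

For a unit square of four points plus a distant point at (10, 10) with k = 2, the square's points get LOF 1 and the distant point gets an LOF far above 1.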
-
04 February 2008
Help with function estimation / 2 datasets
-
I have a function estimation problem at hand for the experimental research that I'm doing. Here is what I wish to do:
I have a 500-example (instance) dataset from the literature, with 5 attributes and a numerical label (which I can use in a discretized version if need be). I also have a second dataset from my own experiments: 20 examples with the same attributes and numerical label. I would like to find the patterns in the original dataset (function estimation of some kind) and then check how well they apply to my experimental data. I have a good idea of how to do this up to this point, though I would appreciate suggestions for learning algorithms. After doing this, I want to:
Modify the original model to better account for my experimental data. Can you give me a few suggestions/methods for approaching this? I was thinking of using a combined dataset with different instance weights (higher for my experiment points). Which learning algorithms would be suitable for this? Any alternative approaches?
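The combined-dataset idea can be illustrated with weighted least squares, which minimises $\sum_i w_i (y_i - \beta_0 - \beta^\top x_i)^2$ so that heavily weighted points pull the fit towards them. A linear model is only one possible learner, chosen here because the weighting is easy to see; the weights and data in the example are hypothetical. The sketch is plain Python and solves the normal equations $(A^\top W A)\beta = A^\top W y$ by Gaussian elimination.

```python
def weighted_least_squares(X, y, w):
    """Fit y ~ b0 + b . x by minimising sum_i w_i * (y_i - b0 - b . x_i)^2."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(A[0])
    # Normal equations M beta = v with M = A^T W A and v = A^T W y.
    M = [[sum(wi * a[r] * a[c] for a, wi in zip(A, w)) for c in range(p)] for r in range(p)]
    v = [sum(wi * a[r] * yi for a, yi, wi in zip(A, y, w)) for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        v[col], v[pivot] = v[pivot], v[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= factor * M[col][c]
            v[r] -= factor * v[col]
    # Back substitution.
    beta = [0.0] * p
    for r in reversed(range(p)):
        beta[r] = (v[r] - sum(M[r][c] * beta[c] for c in range(r + 1, p))) / M[r][r]
    return beta


def predict(beta, x):
    """Evaluate the fitted linear model at attribute vector x."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
```

With literature points (0, 0), (1, 1), (2, 2) at weight 1 and one hypothetical experiment point (1, 3) at weight 100, the prediction at x = 1 is about 2.94; with equal weights it would be 1.5. The same weighting idea carries over to any learner that accepts per-instance weights.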
-
03 February 2008
Looking for Amazon website data mining whitepapers
-
Hello
Does anyone know of whitepapers, articles or books that discuss the data mining capabilities used on the Amazon website?
Thanks a lot.
-
02 February 2008
A DATA MINING APPROACH FOR TRANSACTION LEVEL DATABASE INTRUSION DETECTION
-
Hi,
I need help with this project. The algorithm is GSP. Please post source code implementing the GSP (Generalized Sequential Pattern mining) algorithm in Java, and give some suggestions for finding malicious transactions using dependency rules.
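A simplified sketch of GSP's level-wise search, in Python rather than Java. It assumes every element of a sequence is a single item and ignores GSP's time constraints (max-gap, window), which is why pruning may drop any one item; full GSP also supports itemsets per element. Support is the number of data sequences that contain the candidate in order, gaps allowed. The operation names in the example are hypothetical.

```python
def is_subsequence(candidate, sequence):
    """True if candidate's items occur in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in candidate)  # 'in' consumes the iterator


def gsp(sequences, min_support):
    """Return {tuple_pattern: support} for all frequent sequential patterns."""

    def support(c):
        return sum(1 for s in sequences if is_subsequence(c, s))

    items = sorted({i for s in sequences for i in s})
    counts = {(i,): support((i,)) for i in items}
    frequent = [c for c, n in counts.items() if n >= min_support]
    result = {c: counts[c] for c in frequent}
    while frequent:
        freq_set = set(frequent)
        candidates = set()
        for a in frequent:
            for b in frequent:
                # Join: a without its first item must equal b without its last item.
                if a[1:] == b[:-1]:
                    cand = a + (b[-1],)
                    # Prune: every pattern with one item dropped must be frequent.
                    if all(cand[:i] + cand[i + 1:] in freq_set for i in range(len(cand))):
                        candidates.add(cand)
        frequent = []
        for c in sorted(candidates):
            n = support(c)
            if n >= min_support:
                frequent.append(c)
                result[c] = n
    return result
```

On three hypothetical session traces with `min_support=2`, patterns such as (login, read, logout) are found, while (read, write) occurs in only one trace and is dropped.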