-
31 May 2007
Hierarchical Clustering
-
Hello,
Can anyone please give me a link to hierarchical clustering code (single link or complete link) in Java or C++?
DonCarlos
-
27 May 2007
Implementation based project using WEKA
-
Hello all,
I have an implementation-based project using WEKA on a manufacturing company's data (inventory, sales, and accounting data). The available time is very short, so I need help with the following:
- Sample similar projects
- Sample project report
Note: I have to implement two classification algorithms (Bayesian and rule-based), two clustering algorithms, and one association algorithm.
Thank you all.
-
22 May 2007
Prior probability in C5.0
-
Hello everybody, I'm working with C5.0 in See5 to assign the current class to sets of objects. For the training samples I know many features as well as both the current and the former class. For the rest of the data I don't know the current class, only the features and the former class.
Do you know how to use the former class in the prediction? Most likely the current class is the same as the former class, but I don't know how to implement this. Any ideas?
Thanks
-
20 May 2007
Am I under the "Curse of Dimensionality"?
-
My Clementine is running a process right now. I've been working 12 hours straight, with only a few breaks to eat and go to the bathroom.
My objective is to assign a probability to each of 1,750,000 records: the probability of acquiring a product next month. Each month, only 5,000 of those records acquire the product. I have 150 usable fields with many different kinds of distributions, storage classes, types, and so on. I even have one set field with 100 different values.
One of the things I've done is combine 4 months of history, which gives about 500 fields (there is no relevant history for some). I derived new fields (a very, very long task): for example, the mean of the 4 months, the delta between the mean of the first 3 months and the last month, and set fields with, say, 8 values describing the historical behaviour of flag fields (such as 1 - 0 - 0 - 1 or 0 - 1 - 1 - 0). That left me with about 250 fields. Using feature selection I could screen out half of those 250 (keeping importance: Important) and ended up with 125 fields for a neural network.
With the neural network I get 88% accuracy on the 5,000 buyers plus a sample of 20,000 non-buyers, but when I test the model on the whole database it is useless: I get many, many non-buyers with a near-to-1 value in the probability field (softmax method).
I really don't know what to do! I can't even look for correlations between fields because there are so many. It's very stressful work: imagine how someone would end up after looking at a graph for every pair of fields, with 125 fields. And believe me, I've tried, but I've got nothing.
I thought of using PCA Factor, but using it without any data preparation gives no improvement in the modelling. And normalizing every single variable to a scale from 0 to 1 would drive me mad: remember, I have them in all flavours. I can't even bin, because I don't get stable categories and I can't trust that I'll get the same categories for another period of months.
I've already described the magnitude of my database. Am I doing something wrong? What would you do in my place? Even if I managed to reduce the number of fields to something reasonable for a single person (me) to explore, is it really possible to generate a good model in the context I described?
I'll repeat it: I have data from every month. About 1,750,000 records. About 5,000 of those buy product "A" (for example, an insurance policy) every month. I have to assign each record a probability of buying the product in the next month, a probability good enough that if I say these 100,000 clients have a 50% chance of buying it, about 50,000 actually buy.
If you watch Lost and remember Hurley needing someone to tell him he was cursed, well, I'm feeling just like him. I can't deal with this.
Thank you very much.
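
One possible reading of the problem above is that the network was trained on a sample that is about 20% buyers (5,000 out of 25,000), while the full database is about 0.3% buyers, so its raw scores would overstate the buying probability everywhere. The following is only a minimal sketch of the standard prior correction for undersampled non-buyers. It assumes the non-buyers were sampled at random and that the network's score is a reasonable probability on the training sample; the function name and the example scores are hypothetical, not anything taken from Clementine.

```python
def correct_for_undersampling(p_sample: float, keep_rate: float) -> float:
    """Map a score learned on an undersampled set back to the full population.

    keep_rate is the fraction of non-buyers that was kept in the training
    sample. Undersampling divides the odds of "non-buyer" by keep_rate, so
    the population odds of "buyer" are keep_rate times the sample odds.
    """
    return keep_rate * p_sample / (keep_rate * p_sample - p_sample + 1.0)


# Figures taken from the post above.
total_records = 1_750_000
buyers = 5_000
sampled_non_buyers = 20_000

# Fraction of all non-buyers that ended up in the training sample.
keep_rate = sampled_non_buyers / (total_records - buyers)

# Hypothetical raw scores, only to show the size of the adjustment.
for p in (0.2, 0.5, 0.9, 0.99):
    adjusted = correct_for_undersampling(p, keep_rate)
    print(f"sample score {p:.2f} -> population estimate {adjusted:.4f}")
```

Under these assumptions a raw score of 0.2 (the buyer rate in the training sample) maps back to the monthly buyer rate of the whole database, which is a quick sanity check on the formula. It does not address whether the 125 fields carry enough signal in the first place.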
-
14 May 2007
Difference between ID3 and C4.5?
-
Hi, mates,
Can you tell me what the difference is between the ID3 and C4.5 algorithms? Useful links are welcome, too.
Thanks, Nebojsha
-
12 May 2007
NN industrial process
-
Hi
The name's Rob. I'm currently an undergraduate at university, looking at using a NN to predict an industrial process output. I was wondering if anyone could help me or guide me in the right direction?
The problem I have is that the process output is measured at irregular intervals, while the process inputs such as temperature and pressure are measured every minute. Because these input variables greatly affect the process output, I was wondering if anyone has come across this problem and how they dealt with it?
I have thought about using an hourly average of each input variable prior to the output. I have also considered using the actual value of each input at the time of the output.
Thank you in advance for any help received.
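
The two ideas mentioned above (an hourly average of each input before the output, and the input value at the time of the output) could both be computed per output measurement and offered to the network as separate features. This is only a rough sketch using the Python standard library; the function name, the timestamps, and the temperature values are made up for illustration.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta
from statistics import mean


def window_features(input_times, input_values, output_time,
                    window=timedelta(hours=1)):
    """Return (mean over the window ending at output_time,
    latest value at or before output_time), or None if the window is empty.

    input_times must be sorted in ascending order.
    """
    lo = bisect_left(input_times, output_time - window)
    hi = bisect_right(input_times, output_time)
    if lo == hi:
        return None
    window_vals = input_values[lo:hi]
    return mean(window_vals), window_vals[-1]


# Hypothetical input: one temperature reading per minute for three hours.
start = datetime(2007, 5, 12, 8, 0)
times = [start + timedelta(minutes=i) for i in range(180)]
temperature = [200.0 + 0.1 * i for i in range(180)]

# Hypothetical irregular timestamps at which the output was measured.
output_times = [datetime(2007, 5, 12, 9, 7), datetime(2007, 5, 12, 10, 42)]

# One feature row per output measurement.
rows = [window_features(times, temperature, t) for t in output_times]
for t, row in zip(output_times, rows):
    print(t, row)
```

The window length is a parameter here because one hour is only the poster's first guess; if the process has a known lag, a shifted or longer window may suit it better.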
-
06 May 2007
2 questions about Neural Networks.
-
Hi, I have two questions regarding Neural Networks.
1) Suppose I generate a Neural Network model today, and in a couple of months I get new data and want to train my generated model with the new records. Do I have to start over? Do I have to append the records and make one big pass with the node, generating a new network but with more data?
The Neural Network node help says that if the "Continue training existing model" option is selected, the previously generated model should re-train itself. But I don't know which model it refers to. Should I place the previously generated model just before the Neural Network node?
2) I have one flag output. Using the "Difference Method" to calculate confidence, can I manually calculate a "probability" of being true or false? Suppose I get "TRUE" with 0.122 confidence: (X - 0.5) * 2 = 0.122, so X = 0.561. Could I say the probability of that record being TRUE is 56% and the probability of being FALSE is 44%? What is the difference between this manual method and the SoftMax method?
Thanks, I hope someone can reply!!!
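
The arithmetic in question 2 can be written out as a small helper. This is only a sketch that inverts the formula exactly as given in the post, confidence = (X - 0.5) * 2; it assumes that formula is what the "Difference Method" computes for a flag output, and it does not settle whether X can be read as a calibrated probability. The function name is hypothetical.

```python
def difference_confidence_to_score(confidence: float,
                                   predicted_true: bool) -> float:
    """Invert confidence = (x - 0.5) * 2 for the predicted value and
    return the result expressed as a score for TRUE."""
    x = confidence / 2.0 + 0.5
    # If the prediction was FALSE, x is the score for FALSE, so flip it.
    return x if predicted_true else 1.0 - x


# The example from the post: predicted "TRUE" with confidence 0.122.
score_true = difference_confidence_to_score(0.122, predicted_true=True)
print(round(score_true, 3), round(1.0 - score_true, 3))
```

For the example in the post this reproduces X = 0.561 for TRUE, leaving 0.439 for FALSE.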
-
06 May 2007
Apriori Algorithm
-
Hi,
I need Apriori algorithm source code in VB.NET.
Could anyone please send it to me? It's urgent.
E-mail: kubrasen84@gmail.com
-
06 May 2007
Apriori algorithm code in VB.NET
-
Hi,
I need Apriori algorithm code in VB.NET.
Could anyone please send it to me? It's urgent.
My e-mail: kubrasen84@gmail.com
-
04 May 2007
Mining "during" temporal relationships (help wanted)
-
I am about to begin research on algorithms for mining "during" temporal relationships, but I still don't know much about this area. Please give me the names of research papers that will help me learn more about it. As I want to apply the algorithm to some practical financial operations, I would also like to know about applications of mining such contain/during relationships. Any other suggestions would be welcome too. Thank you very much.
-
04 May 2007
Who can tell me how to see the source code of Weka?
-
Weka is said to be open-source software whose source code can be viewed by the user. How can I see it? Please give me some instructions. Thank you very much!