-
30 January 2008
Data mining queries
-
Hi, I am doing a project on data mining. Please give me some information on how to start this project and which algorithm I should use. If possible, please give me some source code in C++. I am planning to use Oracle as the backend and C++ in the VC++ environment. Please help me out.
My email ID is: namrata_buddhadev@yahoo.co.in
-
30 January 2008
Mining Related Queries from Search Engine Query Logs
-
Hello, I'm Soledad. I'm trying to find related queries.
The related queries are based on a log of queries previously issued by human users, and can be discovered with an association rule mining model (I'm using Weka right now).
I want to extract patterns among related queries, for example:
- a person who searches for "pizza" also searches for "movie rental"
- a person who searches for "restaurant" also searches for "cinema"
- a person who searches for "maps" also searches for "travel"
Users can use the suggested related queries to refine or redirect their search.
Any suggestions, or related work I should look into?
Thanks. Sole
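As a rough illustration of what the association-rule step computes, here is a minimal C++ sketch that counts support and confidence for pairs of queries. It assumes each user session has already been reduced to the set of distinct queries that user issued (how sessions are delimited is left open), and it only considers one-query-to-one-query rules, unlike a full Apriori run in Weka. `minePairRules` and `QueryRule` are hypothetical names.

```cpp
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A rule "antecedent -> consequent" with its support (fraction of sessions
// containing both queries) and confidence (P(consequent | antecedent)).
struct QueryRule {
    std::string antecedent;
    std::string consequent;
    double support;
    double confidence;
};

// Mine pairwise rules from user sessions; each session is the set of distinct
// queries issued by one user (an assumption about the preprocessing).
std::vector<QueryRule> minePairRules(const std::vector<std::set<std::string>>& sessions,
                                     double minSupport, double minConfidence) {
    std::map<std::string, int> singleCount;                          // sessions containing a query
    std::map<std::pair<std::string, std::string>, int> pairCount;    // sessions containing both (a < b)
    for (const auto& s : sessions) {
        for (auto a = s.begin(); a != s.end(); ++a) {
            ++singleCount[*a];
            for (auto b = std::next(a); b != s.end(); ++b) ++pairCount[{*a, *b}];
        }
    }
    std::vector<QueryRule> rules;
    const double n = static_cast<double>(sessions.size());
    for (const auto& [ab, count] : pairCount) {
        const double support = count / n;
        if (support < minSupport) continue;
        // Emit both directions; confidence differs because the denominators differ.
        const double confAB = static_cast<double>(count) / singleCount[ab.first];
        const double confBA = static_cast<double>(count) / singleCount[ab.second];
        if (confAB >= minConfidence) rules.push_back({ab.first, ab.second, support, confAB});
        if (confBA >= minConfidence) rules.push_back({ab.second, ab.first, support, confBA});
    }
    return rules;
}
```

Both directions are checked because confidence(A -> B) = support(A, B) / support(A) generally differs from confidence(B -> A), which is exactly what makes "pizza -> movie rental" and "movie rental -> pizza" distinct suggestions.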
-
27 January 2008
Algorithm for fuzzy match of login id / username
-
Hi,
I just joined and have started working on a project that I suspect may already have been done. I have a DB that stores information about users: login ID, first name, last name, employee ID, email, etc. I've been asked to devise an algorithm for fuzzy matching so that whenever we import a new user, we can compare the login ID against these data elements to see whether it is the same person. For example: jdoe has an 80% probability of matching an entry with first name John and last name Doe.
So we would have a set of rules and pattern matching based on 5 or 6 data elements.
Does anyone know whether this has been done, and of any references or open-source code that could help?
Thanks!
Jim
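One simple way to approach the jdoe example is to generate the login IDs a person would get under a few common naming conventions and score the new login against each with normalised edit distance. The sketch below assumes those conventions (`jdoe`, `johndoe`, `john.doe`, `doej`); the function names are hypothetical, and the score is a similarity in [0, 1], not a calibrated probability. Turning it into something like the 80% in the question would still require weighting it against the other data elements using known matches.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Classic dynamic-programming edit distance (insertions, deletions, substitutions).
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, sub});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

std::string toLower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Best similarity between a login ID and login IDs generated from the name.
// The pattern list is an assumption; replace it with your actual naming rules.
double loginNameSimilarity(const std::string& login, const std::string& first,
                           const std::string& last) {
    const std::string l = toLower(login), f = toLower(first), s = toLower(last);
    if (l.empty() || f.empty() || s.empty()) return 0.0;
    const std::vector<std::string> candidates = {
        f.substr(0, 1) + s,   // jdoe
        f + s,                // johndoe
        f + "." + s,          // john.doe
        s + f.substr(0, 1),   // doej
    };
    double best = 0.0;
    for (const auto& c : candidates) {
        const double maxLen = static_cast<double>(std::max(l.size(), c.size()));
        best = std::max(best, 1.0 - levenshtein(l, c) / maxLen);
    }
    return best;
}
```

For example, `loginNameSimilarity("jdoe2", "John", "Doe")` is 1 - 1/5 = 0.8, since `jdoe2` is one insertion away from `jdoe`.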
-
24 January 2008
C# code for clustering methods
-
Hi, I'm Nicola, an Italian university student. I'm doing a project for university and I need source code for nearest neighbour, farthest neighbour and UPGMA clustering in C# (if you don't have this code in C#, you can send it to me in C). Can anyone help me?
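Nearest neighbour, farthest neighbour and UPGMA are single-, complete- and average-linkage agglomerative clustering; they differ only in how the distance from a newly merged cluster to every other cluster is recomputed. A minimal sketch in C++ (rather than the requested C#, though the translation is mechanical) follows. It assumes the input is a full symmetric distance matrix and uses the naive O(n^3) search, which is fine for small data sets; `agglomerate`, `Linkage` and `Merge` are hypothetical names.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

enum class Linkage { Single, Complete, Average };  // nearest, farthest neighbour, UPGMA

struct Merge {
    int a, b;        // ids of the two clusters merged (the merged cluster keeps id a)
    double height;   // distance at which they were merged
};

// Naive agglomerative clustering on a symmetric distance matrix.
// Cluster i starts as {i}; after each merge, distances to the new cluster are
// updated according to the chosen linkage.
std::vector<Merge> agglomerate(std::vector<std::vector<double>> d, Linkage link) {
    const int n = static_cast<int>(d.size());
    std::vector<int> size(n, 1);
    std::vector<bool> active(n, true);
    std::vector<Merge> merges;
    for (int step = 0; step + 1 < n; ++step) {
        // Find the closest pair of active clusters.
        int bi = -1, bj = -1;
        double best = std::numeric_limits<double>::infinity();
        for (int i = 0; i < n; ++i) {
            if (!active[i]) continue;
            for (int j = i + 1; j < n; ++j)
                if (active[j] && d[i][j] < best) { best = d[i][j]; bi = i; bj = j; }
        }
        merges.push_back({bi, bj, best});
        // Distance from every other active cluster k to the merged cluster bi.
        for (int k = 0; k < n; ++k) {
            if (!active[k] || k == bi || k == bj) continue;
            const double dik = d[bi][k], djk = d[bj][k];
            double nd = 0.0;
            switch (link) {
                case Linkage::Single:   nd = std::min(dik, djk); break;
                case Linkage::Complete: nd = std::max(dik, djk); break;
                case Linkage::Average:  // UPGMA: average weighted by cluster sizes
                    nd = (size[bi] * dik + size[bj] * djk) / (size[bi] + size[bj]);
                    break;
            }
            d[bi][k] = d[k][bi] = nd;
        }
        size[bi] += size[bj];
        active[bj] = false;
    }
    return merges;
}
```

With points at 0, 1, 3 and 7 on a line, the merge heights come out as 1, 2, 4 (single linkage), 1, 3, 7 (complete linkage) and 1, 2.5, 17/3 (UPGMA).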
-
23 January 2008
$CC value in generated C5.0 models
-
A value called $CC is assigned to each prediction made by a C5.0 model. Can someone help me understand exactly what this value means and how it is calculated?
-
23 January 2008
Huge telco communication network/link/association analysis - how have you done it?
-
Hello,
I'm currently starting the design and preliminary data-manipulation steps of a project in which we examine 'social networks' among our mobile phone customers. By identifying the services and behaviour of groups of customers who frequently communicate with each other, we can observe influential trends and the adoption of new technologies. Churn/attrition of one member of a social group may also influence other members to leave. Such analysis can also help identify preferences and suggest suitable freebies and retention offers for customers.
I'm making good progress, but wonder whether I am missing anything, so I'm keen to hear from anyone who has conducted a similar analysis. It is a very computationally expensive form of data analysis, and I am interested to know how other data miners have tackled this problem. For each customer I aim to identify the most frequently associated (called) customers and build a usage profile for each (by joining with a previously built behavioural, usage-based segmentation).
Any tips or feedback would be appreciated.
Tim
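The per-customer "most common called customers" step boils down to counting calls per (caller, callee) pair and keeping the top k for each caller. A minimal in-memory C++ sketch follows, assuming call detail records have already been reduced to caller/callee pairs; the `Call` struct and `topContacts` are hypothetical names. At full telco volumes this counting may need to run in the database or in partitioned batches instead, with an in-memory version like this mainly useful for prototyping on a sample.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// One call detail record reduced to the two parties; real records also carry
// duration, timestamp, cell, etc.
struct Call {
    std::string caller;
    std::string callee;
};

// For each caller, return up to k callees ordered by call count (ties broken
// alphabetically so the output is deterministic).
std::unordered_map<std::string, std::vector<std::pair<std::string, int>>>
topContacts(const std::vector<Call>& calls, std::size_t k) {
    std::unordered_map<std::string, std::unordered_map<std::string, int>> counts;
    for (const auto& c : calls) ++counts[c.caller][c.callee];

    std::unordered_map<std::string, std::vector<std::pair<std::string, int>>> result;
    for (const auto& [caller, perCallee] : counts) {
        std::vector<std::pair<std::string, int>> v(perCallee.begin(), perCallee.end());
        std::sort(v.begin(), v.end(), [](const auto& x, const auto& y) {
            return x.second != y.second ? x.second > y.second : x.first < y.first;
        });
        if (v.size() > k) v.resize(k);  // keep only the k most frequent contacts
        result[caller] = std::move(v);
    }
    return result;
}
```

Counting directed calls keeps "who calls whom" separate; if the social-group analysis should treat a relationship as mutual, one option is to add each record under both parties before counting.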
-
22 January 2008
fyi - Concatenating a value to a null always results in null
-
Something that caught me out the other day. It is the correct and logical outcome, but it can be unexpected.
If you concatenate any number of strings together and one of the string values is null, the result of the concatenation will also be null. Clementine behaves this way, as do standard SQL and some programming languages (Oracle is an exception: its || operator treats a null operand as an empty string).
If you want the result of a concatenation to be the combination of the non-null values, first use the Filler node to replace nulls with an empty string ("").
Cheers
Tim
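The null-propagating behaviour and the Filler-node workaround can be mimicked in C++ with `std::optional`, where `std::nullopt` stands in for a null field. This is only an analogy for illustration, not how Clementine implements it; `concatNullable` and `fillNulls` are hypothetical names.

```cpp
#include <optional>
#include <string>
#include <vector>

using Field = std::optional<std::string>;  // std::nullopt stands in for a null value

// Null-propagating concatenation: any null input makes the whole result null.
Field concatNullable(const std::vector<Field>& parts) {
    std::string out;
    for (const auto& p : parts) {
        if (!p) return std::nullopt;
        out += *p;
    }
    return out;
}

// The Filler-node workaround: replace each null with an empty string first.
std::vector<Field> fillNulls(std::vector<Field> parts) {
    for (auto& p : parts)
        if (!p) p = std::string();
    return parts;
}
```

Concatenating `"ab"`, null and `"cd"` directly gives null, while filling first gives `"abcd"`.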
-
22 January 2008
Imbalance in churn analysis
-
Hi all! I'm working on an academic thesis about churn analysis in retail banking. The goal is to compare the performance of three different forecasting models (logistic regression, classification trees and ANN).
My dataset has 112,454 cases, but only 2,106 of them (about 1.9%) are churners. How do you handle this imbalance? I really don't know how to build a good training set and validation set from this data. Oversampling? Undersampling? At what rate?
I'm using SPSS 16, but I also have Clementine.
Thank you!!
Daniele
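A common pattern is to split first, stratified by class, and rebalance only the training set, leaving the validation set with the natural churn rate so that performance estimates reflect the real population. The C++ sketch below does random undersampling of non-churners; the training fraction and the number of non-churners kept per churner are assumptions to tune, not recommended values, and `stratifiedUndersample` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

struct Split {
    std::vector<std::size_t> train;       // row indices, rebalanced
    std::vector<std::size_t> validation;  // row indices, original class mix
};

// Stratified split followed by random undersampling of the majority class in
// the training part only. labels[i] is true for churners.
Split stratifiedUndersample(const std::vector<bool>& labels, double trainFraction,
                            double negPerPos, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<std::size_t> pos, neg;
    for (std::size_t i = 0; i < labels.size(); ++i) (labels[i] ? pos : neg).push_back(i);
    std::shuffle(pos.begin(), pos.end(), rng);
    std::shuffle(neg.begin(), neg.end(), rng);

    Split s;
    const auto nPosTrain = static_cast<std::size_t>(pos.size() * trainFraction);
    const auto nNegTrain = static_cast<std::size_t>(neg.size() * trainFraction);
    // Validation keeps the natural churn rate so performance estimates stay honest.
    s.validation.insert(s.validation.end(), pos.begin() + nPosTrain, pos.end());
    s.validation.insert(s.validation.end(), neg.begin() + nNegTrain, neg.end());
    // Training keeps every training churner but only negPerPos non-churners per churner.
    const auto nNegKeep =
        std::min(nNegTrain, static_cast<std::size_t>(nPosTrain * negPerPos));
    s.train.insert(s.train.end(), pos.begin(), pos.begin() + nPosTrain);
    s.train.insert(s.train.end(), neg.begin(), neg.begin() + nNegKeep);
    return s;
}
```

With the thesis figures (112,454 rows, 2,106 churners), a training fraction of 0.7 and 3 non-churners per churner, the training set holds 1,474 churners and 4,422 non-churners, and the validation set keeps the remaining 632 churners and 33,105 non-churners.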
-
09 January 2008
Download source code
-
Hi
I would like to ask how I can download source code from this site. I need source code for algorithms that mine frequent sequences. It's very urgent.
Thanks to all
-
08 January 2008
Outlier Detection in Streaming Data
-
Dear All,
I am reading some papers about anomaly/outlier detection in streaming data.
I want to see how these different algorithms work; then maybe I will be able to understand them better.
I want to use clustering or NN. Any suggestions, please? I need some code in VB or C++. If I get one working program, I will be able to try all the other algorithms.
Thanks in advance.
MANZOOR.
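As a starting point before the published stream algorithms, a simple nearest-neighbour-style baseline is the distance-based outlier definition applied to a sliding window: a new point is an outlier if too few recent points lie within some radius of it. This C++ sketch does a naive O(window) check per point and is not an implementation of any specific paper; the window size, radius and neighbour count are assumptions to tune, and the class name is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <deque>
#include <vector>

using Point = std::vector<double>;

double euclidean(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Flags an arriving point if fewer than minNeighbours of the last `window`
// points lie within `radius` of it.
class SlidingWindowOutlierDetector {
public:
    SlidingWindowOutlierDetector(std::size_t window, double radius, std::size_t minNeighbours)
        : window_(window), radius_(radius), minNeighbours_(minNeighbours) {}

    // Returns true if p looks like an outlier relative to the current window,
    // then adds p to the window (evicting the oldest point when full).
    bool observe(const Point& p) {
        std::size_t neighbours = 0;
        for (const auto& q : buffer_)
            if (euclidean(p, q) <= radius_) ++neighbours;
        // Do not judge until the window has filled, otherwise early points all look isolated.
        const bool outlier = buffer_.size() == window_ && neighbours < minNeighbours_;
        buffer_.push_back(p);
        if (buffer_.size() > window_) buffer_.pop_front();
        return outlier;
    }

private:
    std::size_t window_;
    double radius_;
    std::size_t minNeighbours_;
    std::deque<Point> buffer_;
};
```

Because the window slides, a flagged point still enters the buffer, so a genuine shift in the stream stops being flagged once enough new points accumulate around it.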
-
02 January 2008
How does the clustering feature (CF) tree work? (part of the BIRCH algorithm)
-
Hi,
Can anyone explain the insertion algorithm of the CF tree?
None of the documents I've found explain it well.
Thanks
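In the original BIRCH paper, inserting a point proceeds roughly as follows: (1) starting at the root, descend by choosing the child entry whose centroid is closest to the point; (2) at the leaf, find the closest entry and absorb the point if the merged entry still satisfies the threshold T, otherwise add a new entry; (3) if the leaf now holds more than L entries, split it, seeding the split with the farthest pair of entries and redistributing the rest by closeness, and propagate the split upward if a parent overflows; (4) update the CF of every entry on the path. The C++ sketch below covers only the CF arithmetic and step 2, using the radius as the threshold test (the paper allows radius or diameter); the names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Clustering feature of a subcluster: N points, their linear sum LS and the
// sum of their squared norms SS. CFs are additive, so merging is cheap.
struct CF {
    std::size_t n = 0;
    std::vector<double> ls;
    double ss = 0.0;

    explicit CF(std::size_t dim) : ls(dim, 0.0) {}

    static CF fromPoint(const std::vector<double>& x) {
        CF c(x.size());
        c.n = 1;
        c.ls = x;
        for (double v : x) c.ss += v * v;
        return c;
    }
    void add(const CF& o) {
        n += o.n;
        for (std::size_t i = 0; i < ls.size(); ++i) ls[i] += o.ls[i];
        ss += o.ss;
    }
    std::vector<double> centroid() const {
        std::vector<double> c(ls.size());
        for (std::size_t i = 0; i < ls.size(); ++i) c[i] = ls[i] / n;
        return c;
    }
    // Root-mean-square distance of members from the centroid: R^2 = SS/N - |LS/N|^2.
    double radius() const {
        double c2 = 0.0;
        for (double v : centroid()) c2 += v * v;
        return std::sqrt(std::max(0.0, ss / n - c2));
    }
};

// Leaf step: absorb x into the closest entry if the merged radius stays within
// T, otherwise start a new entry. Returns true when the leaf holds more than
// L entries and must be split (splitting and parent updates are not shown).
bool insertIntoLeaf(std::vector<CF>& leaf, const std::vector<double>& x, double T, std::size_t L) {
    const CF point = CF::fromPoint(x);
    std::size_t best = leaf.size();
    double bestDist = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < leaf.size(); ++i) {
        const auto c = leaf[i].centroid();
        double d = 0.0;
        for (std::size_t j = 0; j < x.size(); ++j) d += (c[j] - x[j]) * (c[j] - x[j]);
        if (d < bestDist) { bestDist = d; best = i; }
    }
    if (best < leaf.size()) {
        CF merged = leaf[best];
        merged.add(point);
        if (merged.radius() <= T) { leaf[best] = merged; return false; }
    }
    leaf.push_back(point);
    return leaf.size() > L;
}
```

The additivity of (N, LS, SS) is what makes step 4 cheap: each ancestor entry on the path just adds the new point's CF, without revisiting any stored points.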