DataMining on Chat Logs using the NaiveBayes example

Dear Mr. Tagbo,
First, I would like to say that I appreciate your knowledge of data mining and programming languages. You really inspire me. Congratulations on putting up this site. I am new to data mining and am trying to do a mock project: mining the chat logs of a website that offers online courses.
The aim is to derive a pattern between the students' grades and (1) the faculty member/moderator in that particular chat session, (2) the number of participants, (3) the time of the chat session, and (4) any other valid variables.
How could I modify the Naive Bayes algorithm example to do this project? It would serve as a base for developing more complicated patterns later.
Any help would be appreciated.

 : germic     Reply  

Replies (11)

KINGSLEY TAGBO

I have a tool that I developed for Visual Basic data mining with Naive Bayes. It includes the complete source code for a Visual Basic 6 implementation of the Naive Bayes algorithm. It mines data in a Microsoft Access or SQL Server database. I am working on a C# Naive Bayes implementation, but that is not ready.

It looks like the tool would be useful for data mining chat logs using Naive Bayes. The steps would be to categorize the data very clearly, as in the example that I gave you. Look at the Microsoft Access database included with the Visual Basic 6 tool, create a table containing sample data, and train the algorithm on it. Then create another table with unknown student grades and use the training rules to predict the likely grades of those students.

Thanks
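
The train-then-predict steps described above can be sketched in a few lines of Python. This is not the Visual Basic 6 tool's code, only a minimal illustration of a categorical Naive Bayes with add-one smoothing. The column names (`faculty`, `slot`, `strength`, `grade`) and all sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    values_seen = defaultdict(set)       # attribute -> distinct values in training data
    for row in rows:
        label = row[target]
        class_counts[label] += 1
        for attr, value in row.items():
            if attr == target:
                continue
            value_counts[(attr, label)][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def predict(model, row):
    """Return normalized class probabilities for one unlabeled row."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for attr, value in row.items():
            k = len(values_seen[attr])
            # Add-one (Laplace) smoothing so unseen values do not zero the product.
            score *= (value_counts[(attr, label)][value] + 1) / (n + k)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical training table: one row per student per chat session.
rows = [
    {"faculty": "F1", "slot": "SLOT1", "strength": "50", "grade": "A"},
    {"faculty": "F1", "slot": "SLOT2", "strength": "50", "grade": "A"},
    {"faculty": "F1", "slot": "SLOT1", "strength": "100", "grade": "B"},
    {"faculty": "F2", "slot": "SLOT3", "strength": "100", "grade": "C"},
    {"faculty": "F2", "slot": "SLOT4", "strength": "75", "grade": "C"},
    {"faculty": "F2", "slot": "SLOT1", "strength": "25", "grade": "B"},
]

model = train(rows, target="grade")
# A row from the "unknown grades" table.
posterior = predict(model, {"faculty": "F1", "slot": "SLOT1", "strength": "50"})
best = max(posterior, key=posterior.get)
print(best, posterior)
```

In a real project the training rows would come from the database table rather than a literal list, and the unknown-grade table would be scored row by row in the same way.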


Dear Mr. Tagbo,

Thanks again. I tried the VB software, but there was an error with the Rich Text Box and Flex Grid controls, so I added them, and everything seems to work fine until the last screen, where nothing is displayed and clicking Run or Save gives an object reference error (No: 424 94 - Invalid use of Null) on MSHFLEX.RecordSet.

(1) How do I correct this error?
(2) To connect to your Testing_database, I use the connection string option, create a new DSN, and then reference it. This works, but is it the correct way?
(3) It would be helpful if you could tell me how to modify the shopping cart code to do this data mining on chat logs, so that my XML could read something like this: 1JOHN,SLOT1,50A ... where JOHN is a faculty member, SLOT1 is the time (4 slots in total), and 50 is the strength (either 25%, 50%, 75% or 100%).

Either an Apriori or a Naive Bayes algorithm is OK. Hope you can find the time to help me. Thanks a lot!

: germic    Reply

Hi Mr. Kingsley, what is the advantage of using Naive Bayes over the Apriori algorithm in this particular instance of finding patterns in the grades after mining the chat logs? Thanks!

: germic    Reply

KINGSLEY TAGBO

[quote user="germic"]Hi Mr. Kingsley, what is the advantage of using Naive Bayes over the Apriori algorithm in this particular instance of finding patterns in the grades after mining the chat logs? Thanks![/quote]
It looks to me like Naive Bayes will be a good candidate if what you want is to train your algorithm on the chat logs of existing teachers and students so that, given a new teacher's chat logs, you can predict what the students' grades are likely to be while they remain unknown.

With the Apriori algorithm, you are trying to find strong, credible associations. E.g., if you are tracking several measures of chat logs and would like to see which of these measures are strongly associated, you may discover that 3:00 PM is strongly associated with students starting a class.

I think that Naive Bayes is where you want to go.

Thanks
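
To make the contrast concrete, here is a rough sketch of the association side. It covers only the first two levels of the Apriori idea (frequent single items, then frequent pairs built from them) plus rule confidence, not the full algorithm. The session items such as `time=15:00` and `event=class_start` are made-up examples.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(sessions, min_support):
    """Return {pair: support} for item pairs present in at least min_support of sessions."""
    n = len(sessions)
    item_counts = Counter(item for s in sessions for item in set(s))
    # Apriori pruning: a pair can only be frequent if both of its items are frequent.
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter()
    for s in sessions:
        kept = sorted(set(s) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}

def confidence(sessions, antecedent, consequent):
    """Fraction of sessions containing the antecedent that also contain the consequent."""
    with_antecedent = [s for s in sessions if antecedent in s]
    if not with_antecedent:
        return 0.0
    return sum(consequent in s for s in with_antecedent) / len(with_antecedent)

# Hypothetical chat-log measures, one set of items per session.
sessions = [
    {"time=15:00", "event=class_start", "faculty=F1"},
    {"time=15:00", "event=class_start", "faculty=F2"},
    {"time=15:00", "event=class_start", "faculty=F1"},
    {"time=20:00", "event=free_chat", "faculty=F1"},
]

pairs = frequent_pairs(sessions, min_support=0.5)
rule_conf = confidence(sessions, "time=15:00", "event=class_start")
print(pairs)
print(rule_conf)
```

Note that nothing here predicts a grade. The output is a list of co-occurring items, which is why Naive Bayes is the better fit when a specific outcome column has to be predicted.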


KINGSLEY TAGBO

Firstly, please post the exact error message(s) that you receive.

Secondly, please make sure that you are running Visual Basic 6 with ADO 2.6. Download MDAC 2.6 from the Microsoft web site if you do not have it installed. I have executed the algorithm many times, so I know that it runs okay and that you are likely using libraries that are out of date.

Thirdly, what type of database is your data in (MS Access, SQL Server, Oracle, Sybase, etc.), and what version of the database are you using (SQL Server 2000, MS Access 97, MS Access 2000, etc.)?

Finally, can you post the structure of the table that houses your data, so that I can see all the fields, their names, and the type of data (nominal or discrete)?

Make sure you read about the Naive Bayes algorithm at http://www.kdkeys.net/datamining-on-chat-logs-using-the-naivebayes-example/#link-7004 and http://www.kdkeys.net/datamining-on-chat-logs-using-the-naivebayes-example/#link-7005

Thanks


Hi, I figured out the problem and now it works fine. Thanks a lot.

: germic    Reply

KINGSLEY TAGBO

[quote user="germic"]Hi, I figured out the problem and now it works fine. Thanks a lot.[/quote]
Hi, can you please tell us what you did to resolve the problem, so that others with the same problem can benefit from your solution? Thanks


When you open the source code, VB 6 gives an error that it cannot load the Rich Text Box and MSHierarchical Flex Grid controls.
So I added those components (the .ocx files are present in windows\system32 or winnt\system32) and then pasted them onto the respective screens:
the Rich Text Box on the welcome screen and the MSHFlxGd on the last screen, where the report is generated.
If you get a license error while pasting the controls, download the VB6CLI patch from the Microsoft site (http://www.kdkeys.net/datamining-on-chat-logs-using-the-naivebayes-example/#link-7007) and then things work fine.

: germic    Reply

KINGSLEY TAGBO
[quote user="germic"]

When you open the source code, VB 6 gives an error that it cannot load the Rich Text Box and MSHierarchical Flex Grid controls.
So I added those components (the .ocx files are present in windows\system32 or winnt\system32) and then pasted them onto the respective screens:
the Rich Text Box on the welcome screen and the MSHFlxGd on the last screen, where the report is generated.
If you get a license error while pasting the controls, download the VB6CLI patch from the Microsoft site (http://www.kdkeys.net/datamining-on-chat-logs-using-the-naivebayes-example/#link-7007) and then things work fine.

[/quote]

Thanks a lot, we appreciate the information.

KINGSLEY TAGBO

Dear Germic,

Working with the Naive Bayes algorithm is relatively simple. However, I will need you to clarify your question a bit. What are you going to classify or predict in the chat logs?

For example, one can use Naive Bayes to predict or classify whether a game will take place based on weather conditions. The temperature, humidity, and wind can be the variables mined and classified as a 'Yes' or 'No' decision on playing an outdoor game. We will also have to categorize the data better, e.g.:

Temperature (Mild, Cold or Hot)
Humidity (Low, Normal, High)
Wind (Low or High)
PlayOutside? (YES or NO)

with PlayOutside being the decision that has to be made based on the temperature, humidity, and wind.

The questions for you are:

What are the categories of student grades (e.g. A, B, C, ...)?
Who are the faculty moderators (e.g. John, Jones, Anna)?
What is the range of the number of participants, or is it always a fixed value that doesn't vary much (8 to 30 participants vs. only 25 or 30 participants)?
What are the possible values of the chat time (any time between 8:00 AM and 11:00 PM, or only 3:00 PM or 6:00 PM)?

I guess that student grades are the outcome you are trying to analyze, but you have to make it clear.

Thanks,
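
For readers who want the weather example in symbols: writing T, H, and W for the temperature, humidity, and wind categories, Naive Bayes scores each decision c as shown below and picks the larger score. This is the standard textbook form, with every probability estimated as a relative frequency in the training table. The symbol names are chosen here for illustration.

```latex
P(\text{PlayOutside}=c \mid T, H, W) \;\propto\; P(c)\, P(T \mid c)\, P(H \mid c)\, P(W \mid c),
\qquad c \in \{\text{YES}, \text{NO}\}

\hat{c} \;=\; \arg\max_{c \in \{\text{YES},\, \text{NO}\}} \; P(c)\, P(T \mid c)\, P(H \mid c)\, P(W \mid c)
```

The "naive" part is the assumption that T, H, and W are independent of each other once the decision c is known, which is what lets the joint probability factor into the three separate terms.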


Dear Mr. Tagbo,

Thanks again for your clear example.

(1) The categories of student grades would vary from 1 to 3 with a .25 interval (e.g. 1, 1.25, 1.50 ... 3,5), but for the sake of example we could assume A, B, C, D.
(2) The faculty moderators, as you said, could be assumed to be F1, F2, F3, F4, ... The faculty's resourcefulness could be the deciding factor here, so perhaps F1, F2, ... could stand for the knowledge of the faculty and be mapped to names later.
(3) The number of participants varies, from around 8 to 30.
(4) The time of chatting would be any time from 8:00 AM to 11:00 PM.

I would like to make an analysis something like: if a student participates in a chat session during time X, with faculty F1 and N students, then his likely grade would be B.

Hope this expectation is right! Thanks again for imparting your knowledge.
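
Before a categorical Naive Bayes can use these answers, the numeric values have to be turned into categories. The sketch below shows one possible way to do that. Every cut point in it is an assumption made for illustration: the grade boundaries, the idea that a lower number maps to an earlier letter, the capacity of 30 used for the strength percentage, and the four equal time slots starting at 8:00.

```python
def grade_to_letter(grade):
    """Map a numeric grade (1.0 to 3.0) to a letter. Cut points are assumed."""
    if grade <= 1.5:
        return "A"
    if grade <= 2.0:
        return "B"
    if grade <= 2.5:
        return "C"
    return "D"

def participants_to_strength(count, capacity=30):
    """Round attendance up to the next quarter of an assumed capacity: 25, 50, 75 or 100."""
    quarter = (count * 4 + capacity - 1) // capacity  # integer ceiling of count*4/capacity
    quarter = max(1, min(quarter, 4))
    return str(quarter * 25)

def time_to_slot(hhmm):
    """Map an 'HH:MM' chat time between 08:00 and 23:59 to one of four assumed slots."""
    hour = int(hhmm.split(":")[0])
    if not 8 <= hour <= 23:
        raise ValueError(f"chat time outside 8:00 AM to 11:00 PM: {hhmm}")
    index = min((hour - 8) // 4, 3)  # 8-11, 12-15, 16-19, 20-23
    return f"SLOT{index + 1}"

def discretize(session):
    """Turn one raw session record into the categorical row used for training."""
    return {
        "faculty": session["faculty"],
        "slot": time_to_slot(session["time"]),
        "strength": participants_to_strength(session["participants"]),
        "grade": grade_to_letter(session["grade"]),
    }

# Hypothetical raw record for one student in one chat session.
example = discretize({"faculty": "F1", "time": "15:30", "participants": 8, "grade": 1.25})
print(example)
```

The right boundaries depend on how the grades and attendance are actually distributed, so it is worth looking at the data before fixing them.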

: germic    Reply

