Data Mining

Real World ASP.NET Search Engine For Text Mining, Text Indexing and Text Searching - Part 1

Posted 04-14-2005, 16:36 by Kingsley Tagbo.

    1. Introduction

    This article explains how a Search Engine can search a Website for words and phrases that are exact or likely matches for all or some of the words a user is searching for. It also explains how a Text Search Engine is incorporated into a Website to search Web pages such as Weblogs, Forums and Picture Galleries. The Text Search Engine I am referring to is the CommunityServer Search Engine. It powers Text searches on a number of high-traffic Websites built on CommunityServer, including the popular ASP.NET Forums Website with almost 1 million posts, MSDN Blogs and Xbox Forums. The Search Engine was created with C#, ASP.NET and SQL Server. CommunityServer is also used by Fortune 100 companies, small start-up businesses and schools.

    CommunityServer's Search Engine does not use Web crawling, which Google, MSN, Yahoo and other Search Engines use to spider content hosted on external websites and databases. It assumes that the content of the Web pages is already available in a database, so it does not need to spider a Web page to access it. The tasks common to Search Engines include:

    · Spidering a Website

    · Indexing static or dynamic text in Web pages

    · Searching the indexed text

    CommunityServer's ASP.NET Search Engine performs only the last two: it indexes text saved in a database and searches the indexed text. It does not spider Web pages or index text taken directly from them.
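    Because the text is already in the database, indexing begins with a post's stored body rather than a fetched Web page. The sketch below is a minimal illustration of that first step, splitting stored text into normalized words and counting them. The Tokenize helper, the stop-word list and the letters-or-digits word rule are hypothetical assumptions, not CommunityServer's actual tokenizer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical stop words: very common words that are usually not worth indexing.
var stopWords = new HashSet<string> { "a", "an", "and", "the", "of", "to", "in" };

// Hypothetical helper: split a post body (already stored in the database) into
// lowercase words and count how often each word occurs.
// Assumes a "word" is a run of letters or digits.
Dictionary<string, int> Tokenize(string text)
{
    var wordCounts = new Dictionary<string, int>();
    var current = new StringBuilder();

    // Record the word collected so far, unless it is empty or a stop word.
    void Flush()
    {
        if (current.Length == 0) return;
        string token = current.ToString();
        current.Clear();
        if (stopWords.Contains(token)) return;
        wordCounts[token] = wordCounts.TryGetValue(token, out int n) ? n + 1 : 1;
    }

    foreach (char c in text)
    {
        if (char.IsLetterOrDigit(c)) current.Append(char.ToLowerInvariant(c));
        else Flush();
    }
    Flush();   // the last word may not be followed by punctuation
    return wordCounts;
}

var counts = Tokenize("The Search Engine indexes text; the search engine searches text.");
foreach (var (word, count) in counts.OrderBy(p => p.Key, StringComparer.Ordinal))
    Console.WriteLine($"{word}: {count}");
// prints engine: 2, indexes: 1, search: 2, searches: 1, text: 2 (one per line)
```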

    2. About CommunityServer's Search Engine

    The Search Engine used in this article was developed by TelligentSystems for a popular ASP.NET application known as CommunityServer. Its source code can be downloaded from CommunityServer.Org as part of the CommunityServer 1.0 or 1.1 source code download, under an open source license that allows you to modify the code as long as you give credit to the application provider, TelligentSystems. CommunityServer's Search Engine was created using C# and SQL Server.

    CommunityServer's Search Engine indexes and searches the text of three types of Web pages: Weblogs, Forums and Photo Galleries.

    The text rendered on Weblogs, Forums and Photo Galleries is stored in the CS_Posts table in the CommunityServer database.

    3. Why A Real World ASP.NET Search Engine Article?

    Learning how a real world ASP.NET Search Engine works is important because:

    • Searches by users on individual Websites form a sizeable part of the overall number of searches performed by users all over the world.
    • Text searching has become an important feature of most Websites. Many sites thrive on the traffic sent to them by the big three Search Engines (Google, MSN and Yahoo). On arriving at a Website, a user expects its local Search Engine to take them to the pages that best match their topics of interest.
    • Local searches on a Website have to be very quick. The days of long-running search queries are over. If a Website cannot quickly and reliably retrieve a match for the user's search, the user typically hops to a competitor. A search for the word 'dotnetnuke' on the CommunityServer Forums at www.asp.net returned 1105 matches in less than 1 second, a very good benchmark considering that the site has more than 500,000 pages or posts.
    • Given the reasons above, you might expect a plethora of readily usable real world information on ‘How To Create And Add Your Own Search Engine To A Website’, but this is not the case. You will find some good information on ‘how to search text on Web pages saved as files on a hard drive’, but not much on how to search text on Web pages saved as text in a database.
    • A lot of Web pages, both dynamic pages and so-called static pages, are rendered from text stored in a database. SQL Server has emerged as a common and scalable database platform for business and personal data management, and this article explains how you can integrate SQL Server, or any other relational database, with ASP.NET, a popular web development framework.
    • There will be instances when you have to write your own Search Engine for mining text in formats and media that are not readily accessible to other Search Engines, and you may need a custom searching solution for other reasons as well. This article offers you a template for creating a flexible real world Search Engine, and understanding its architecture could help you design a Search Engine for your Website's custom searching needs.

    4. Text Searching and Text Indexing

    Text indexing and text searching in CommunityServer's ASP.NET Search Engine are handled mainly by the Search class.

    The Search class has two major methods. The first indexes the text of Web pages (posts):

    public override void IndexPosts(int setSize, int settingsID)

    The second searches text that has already been indexed and saved to the database:

    protected override SearchResultSet GetSearchResults(SearchQuery query, SearchTerms terms)
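    One common way to implement this split between indexing and searching is an inverted index: indexing maps each word to the posts that contain it, and searching looks the query words up in that map instead of scanning every post. The minimal in-memory sketch below assumes hypothetical post IDs and bodies and a simple whitespace tokenizer; CommunityServer stores its index in SQL Server, which this sketch does not model. Intersecting the per-word post sets matches all of the words, and uniting them matches some of the words.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical posts, standing in for rows already stored in the database.
var posts = new Dictionary<int, string>
{
    [1] = "Installing DotNetNuke on IIS",
    [2] = "DotNetNuke skins and modules",
    [3] = "Writing ASP.NET modules in C#",
};

// Indexing: map each lowercase word to the set of post IDs that contain it.
var index = new Dictionary<string, HashSet<int>>();
foreach (var (postId, body) in posts)
{
    foreach (var term in body.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        if (!index.TryGetValue(term, out var ids))
            index[term] = ids = new HashSet<int>();
        ids.Add(postId);
    }
}

// Searching: intersect the per-word sets to match all words,
// or unite them to match any of the words.
List<int> Search(string query, bool matchAll)
{
    var sets = query.ToLowerInvariant()
        .Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .Select(t => index.TryGetValue(t, out var ids) ? ids : new HashSet<int>())
        .ToList();
    if (sets.Count == 0) return new List<int>();

    var result = new HashSet<int>(sets[0]);
    foreach (var s in sets.Skip(1))
    {
        if (matchAll) result.IntersectWith(s);
        else result.UnionWith(s);
    }
    return result.OrderBy(id => id).ToList();
}

Console.WriteLine(string.Join(",", Search("dotnetnuke modules", matchAll: true)));  // 2
Console.WriteLine(string.Join(",", Search("dotnetnuke modules", matchAll: false))); // 1,2,3
```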

    Both methods are members of the Search class and of the classes that derive from it, including WeblogSearch, ForumsSearch and GallerySearch.

    The WeblogSearch class implements these methods from the base Search class:

    protected override ArrayList GetPermissionFilterList()

    protected override SearchResultSet GetSearchResults(SearchQuery query, SearchTerms terms)

    public override void IndexPosts(int setSize, int settingsID)

    protected override double CalculateWordRank(Word word, Post post, int totalBodyWords)
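    The article does not say how CalculateWordRank scores a word. As a hypothetical illustration of why totalBodyWords is a useful input, the sketch below uses a simple normalized term frequency (a word's occurrences divided by the total number of body words) and takes plain counts instead of the real Word and Post objects. The formula is an assumption, not CommunityServer's.

```csharp
using System;

// Hypothetical, simplified signature: the real method takes Word and Post objects.
// Rank = occurrences of the word in the body / total words in the body
// (a normalized term frequency). Illustrative only, not CommunityServer's formula.
double CalculateWordRank(int occurrencesInBody, int totalBodyWords)
{
    if (totalBodyWords <= 0) return 0.0;   // an empty body contributes nothing
    return (double)occurrencesInBody / totalBodyWords;
}

// The same three mentions weigh more in a short post than in a long one.
double shortPost = CalculateWordRank(occurrencesInBody: 3, totalBodyWords: 100);
double longPost = CalculateWordRank(occurrencesInBody: 3, totalBodyWords: 1000);
Console.WriteLine(shortPost > longPost);   // True
```

    Normalizing by body length keeps a long post that mentions a word in passing from outranking a short post that is mostly about that word.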

    The IndexPosts method indexes the text of a Website's Weblog posts (each Website has an associated settingsID) and saves the index to the database, where it is searched by the queries that the GetSearchResults method runs.
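    The article does not explain the setSize parameter; its name suggests the number of posts indexed per call. Under that assumption, the sketch below indexes posts in batches: each call picks at most setSize not-yet-indexed posts for one settingsID and marks them as indexed. The post list and the set of indexed IDs are hypothetical in-memory stand-ins for database tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical posts: (PostID, SettingsID). SettingsID identifies the Website a post belongs to.
var posts = new List<(int PostID, int SettingsID)>
{
    (1, 1000), (2, 1000), (3, 2000), (4, 1000), (5, 1000), (6, 1000),
};
var indexed = new HashSet<int>();   // hypothetical record of posts already indexed

// Sketch under the batching assumption: index at most setSize not-yet-indexed posts
// of one Website per call, so one run never has to process the whole table.
// Returns the IDs of the posts processed in this call.
List<int> IndexPosts(int setSize, int settingsID)
{
    var batch = posts
        .Where(p => p.SettingsID == settingsID && !indexed.Contains(p.PostID))
        .OrderBy(p => p.PostID)
        .Take(setSize)
        .Select(p => p.PostID)
        .ToList();
    foreach (int id in batch)
    {
        // A real indexer would tokenize the post here and save its words to the database.
        indexed.Add(id);
    }
    return batch;
}

Console.WriteLine(string.Join(",", IndexPosts(setSize: 2, settingsID: 1000))); // 1,2
Console.WriteLine(string.Join(",", IndexPosts(setSize: 2, settingsID: 1000))); // 4,5
Console.WriteLine(string.Join(",", IndexPosts(setSize: 2, settingsID: 1000))); // 6
```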

    To search a Weblog, the GetSearchResults method executes the two statements below:

    // Get the Weblog data provider
    WeblogDataProvider wdp = WeblogDataProvider.Instance();

    // Run the query against the indexed text saved in the database
    SearchResultSet results = wdp.GetSearchResults(query, terms);

    5. Summary

    This article explains how text is indexed for searching in CommunityServer. It introduces text indexing and text searching, compares the Search Engine under review with the big three Search Engines (Google, MSN Search and Yahoo Search), and explains why text indexing and Site Search Engines are important.

    A detailed explanation of how text from Forums, Weblogs and Photo Galleries is indexed and searched for matches to some or all of the words in a user's search query is available in the next article at http://www.kdkeys.net/forums/4569/ShowPost.aspx.

    6. About The Author

    Kingsley Tagbo is a freelance writer and consultant.

    You can reach him via his blogs at http://www.kdkeys.net/blogs/kingsleytagbo or http://www.kdkeys.net/blogs/kingsley.tagbo


