CS572 Term Project, Fall 2005

Intelligent WWW Spider
Kung-En Dean Lin
dean@cs.iastate.edu


Description

Web sites are a huge source of information. Because the Web keeps growing, finding useful information is a difficult job. In this project, I will compare different algorithms for searching for information. First, I will implement basic search algorithms such as BFS and DFS, which I call unintelligent spiders because they simply visit every node they reach without any guidance. Then I will implement an intelligent spider that uses word similarity, drawn from a thesaurus, so that the heuristic function can steer the search toward the pages we want. Finally, I will present results comparing the two approaches.
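
As a rough illustration of the unintelligent spider described above, here is a minimal BFS sketch. It is not the project code: the `PAGES` dictionary is a hypothetical in-memory stand-in for real web pages, and the definitions of "created" (pages added to the frontier) and "visited" (pages taken off the frontier and inspected) are assumptions made for this example.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "home": ("Welcome to the university", ["about", "academics"]),
    "about": ("History and mission", ["home"]),
    "academics": ("Colleges and departments", ["grad", "about"]),
    "grad": ("Graduate Program admissions", []),
}

def bfs_spider(start, target, pages):
    """Breadth-first crawl; returns (goal_url, created, visited)."""
    frontier = deque([start])
    seen = {start}
    created, visited = 1, 0  # the start page counts as one created node
    while frontier:
        url = frontier.popleft()  # oldest page first: no heuristic involved
        visited += 1
        text, links = pages[url]
        if target.lower() in text.lower():
            return url, created, visited
        for link in links:
            if link not in seen and link in pages:
                seen.add(link)
                frontier.append(link)
                created += 1
    return None, created, visited

print(bfs_spider("home", "Graduate Program", PAGES))
```

Replacing `popleft()` with `pop()` would turn the same loop into a DFS; in both cases the order of visits ignores the search string entirely.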


Features

This project extends Lab 1 of ComS 572. In Lab 1, a heuristic function alone decides which links to visit, but the value of that heuristic function depends on a string pattern that is fixed in the program. Lab 1 also works only on an intranet. In this project, I implement the spider on the Internet. The program searches the Internet for a goal page that contains exactly the string the user gives. First, the user supplies a starting web site and a string to search for. The program then queries the Merriam-Webster Online dictionary for thesaurus entries (synonyms) of the words in the string. Finally, the program uses Best First Search to decide which page to visit first.
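
The steps above can be sketched as a best-first search over a priority queue. This is only an illustration under stated assumptions, not the project code: `PAGES` is a hypothetical in-memory link graph, `SYNONYMS` is a hand-written table standing in for the Merriam-Webster lookup, and the `score` heuristic (counting anchor-text words that match a search word or one of its synonyms) is one plausible choice rather than the heuristic the program actually uses.

```python
import heapq

# Hypothetical in-memory "web": URL -> (page text, [(anchor text, target URL), ...]).
PAGES = {
    "home": ("Welcome", [("Athletics news", "sports"),
                         ("Postgraduate curriculum", "academics")]),
    "sports": ("Game schedule", [("Tickets", "tickets")]),
    "tickets": ("Buy tickets", []),
    "academics": ("Degrees offered", [("Graduate Program", "grad")]),
    "grad": ("Graduate Program admissions", []),
}

# Hypothetical synonym table standing in for the thesaurus query.
SYNONYMS = {"graduate": {"postgraduate"}, "program": {"curriculum", "plan"}}

def score(anchor, target, synonyms):
    """Count anchor words that match a target word or one of its synonyms."""
    wanted = set()
    for word in target.lower().split():
        wanted.add(word)
        wanted |= synonyms.get(word, set())
    return sum(1 for w in anchor.lower().split() if w in wanted)

def best_first_spider(start, target, pages, synonyms):
    """Best-first crawl; returns (goal_url, created, visited)."""
    frontier = [(0, 0, start)]  # (negated score, insertion order, url)
    seen = {start}
    created, visited, order = 1, 0, 1
    while frontier:
        _, _, url = heapq.heappop(frontier)  # highest-scoring link first
        visited += 1
        text, links = pages[url]
        if target.lower() in text.lower():
            return url, created, visited
        for anchor, link in links:
            if link not in seen and link in pages:
                seen.add(link)
                priority = -score(anchor, target, synonyms)
                heapq.heappush(frontier, (priority, order, link))
                order += 1
                created += 1
    return None, created, visited

print(best_first_spider("home", "Graduate Program", PAGES, SYNONYMS))
print(best_first_spider("home", "Graduate Program", PAGES, {}))
```

Passing an empty synonym table reduces the heuristic to exact word matching, which makes it easy to compare runs with and without the thesaurus on the same toy graph.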


Presentation

The presentation is scheduled for Dec. 7, 2005. The presentation file will be available soon.


Performance

Because the possible search strings vary widely, the program does not consider all possibilities. The performance review therefore covers only certain strings.

Search String: "Graduate Program"

Starting Website: http://www.iastate.edu

                   Created Nodes   Visited Nodes
Using thesaurus    115             9
No thesaurus       126             9

Code

You can download the code here.


Last updated on Dec. 3, 2005