CS572 Term Project, Fall 2005
Intelligent WWW Spider
Kung-En Dean Lin
dean@cs.iastate.edu
Description
The Web is a huge source of information. Because it keeps growing, finding useful information is a difficult job. In this project, I compare different algorithms for searching for information. First, I implement basic search algorithms such as BFS and DFS, which I call unintelligent spiders because they simply visit every node they can reach. On the other hand, I implement an intelligent spider that uses word similarity, drawn from a thesaurus, so that its heuristic function can steer the search toward the pages we want. Finally, I present results that compare the two approaches.
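To make the "unintelligent spider" idea concrete, here is a minimal sketch of BFS and DFS over a small in-memory link graph. The graph, the page names, and the function name `uninformed_search` are assumptions made for illustration; they are not taken from the project code, and a real spider would fetch pages over HTTP instead.

```python
from collections import deque

# Hypothetical toy link graph standing in for real web pages (an assumption).
LINKS = {
    "home": ["about", "academics"],
    "about": ["history"],
    "academics": ["graduate", "undergraduate"],
    "history": [],
    "graduate": [],
    "undergraduate": [],
}

def uninformed_search(start, goal, depth_first=False):
    """Return (visit order, number of created nodes) for BFS or DFS."""
    frontier = deque([start])
    created = {start}   # every page that was ever placed on the frontier
    visited = []        # pages actually taken off the frontier, in order
    while frontier:
        # DFS takes the newest page (stack); BFS takes the oldest (queue).
        page = frontier.pop() if depth_first else frontier.popleft()
        visited.append(page)
        if page == goal:
            break
        for link in LINKS[page]:
            if link not in created:
                created.add(link)
                frontier.append(link)
    return visited, len(created)

bfs_order, bfs_created = uninformed_search("home", "graduate")
dfs_order, dfs_created = uninformed_search("home", "graduate", depth_first=True)
```

Neither strategy looks at page content when choosing the next link, so the visit order depends only on the shape of the graph and the order in which links appear.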
Features
The project extends Lab 1 of ComS 572. In Lab 1, a heuristic function alone decided which links to visit. However, the value of that heuristic function depended on a string pattern that was fixed in the program. Lab 1 also worked on an intranet only. In this project, I implement the spider on the Internet. The program searches the Internet for a goal page that contains exactly the string we give. First, the user gives a starting web site and a string to search for. The program then queries the Merriam-Webster Online dictionary to find thesaurus entries for the words in the string. Finally, the program uses Best First Search to decide which page to visit first.
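The steps above might be sketched roughly as follows. This is only an illustration under stated assumptions: the synonym table is hard-coded in place of a Merriam-Webster Online query, the pages are a hypothetical in-memory site, and the scoring rule (counting anchor-text words that match a search term or synonym) is one plausible heuristic, not necessarily the one the project uses.

```python
import heapq

# Hypothetical synonym table standing in for a thesaurus lookup (an assumption;
# the project queries Merriam-Webster Online instead).
SYNONYMS = {"graduate": {"postgraduate"}, "program": {"curriculum", "course"}}

# Hypothetical toy site: page -> (page text, [(anchor text, target page)]).
PAGES = {
    "home": ("Welcome", [("Campus news", "news"),
                         ("Postgraduate curriculum", "grad")]),
    "news": ("Latest headlines", [("Sports", "sports")]),
    "sports": ("Scores", []),
    "grad": ("Graduate Program admissions", []),
}

def expand_terms(query, use_thesaurus=True):
    """Split the query into words and optionally add their synonyms."""
    terms = set(query.lower().split())
    if use_thesaurus:
        for word in list(terms):
            terms |= SYNONYMS.get(word, set())
    return terms

def score(anchor_text, terms):
    """Count anchor-text words that match a search term or synonym."""
    return sum(1 for word in anchor_text.lower().split() if word in terms)

def best_first_search(start, query, use_thesaurus=True):
    """Return (visit order, number of created nodes)."""
    terms = expand_terms(query, use_thesaurus)
    # heapq is a min-heap, so scores are negated; the counter keeps
    # equally scored pages in first-in, first-out order.
    frontier = [(0, 0, start)]
    created, counter = {start}, 1
    visited = []
    while frontier:
        _, _, page = heapq.heappop(frontier)
        visited.append(page)
        text, links = PAGES[page]
        if query.lower() in text.lower():   # goal: page contains the exact string
            break
        for anchor, target in links:
            if target not in created:
                created.add(target)
                heapq.heappush(frontier, (-score(anchor, terms), counter, target))
                counter += 1
    return visited, len(created)

with_thesaurus = best_first_search("home", "Graduate Program")
without_thesaurus = best_first_search("home", "Graduate Program", use_thesaurus=False)
```

On this toy site, the thesaurus lets the link labeled "Postgraduate curriculum" outrank "Campus news", so the goal page is reached after visiting 2 pages and creating 3 nodes. Without the thesaurus both links score zero, and the search visits 3 pages and creates 4 nodes.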
The presentation is scheduled for Dec. 7, 2005. The presentation file will be available soon.
Because the possible matching strings vary, the program does not consider all possibilities. Therefore, the performance review covers only certain strings.
Searching String: "Graduate Program"
Starting Website: http://www.iastate.edu
              | Using thesaurus | No thesaurus |
Created Nodes | 115             | 126          |
Visited Nodes | 9               | 9            |
Code
You can download the code here.