Indexing with a web interface.

This program is derived from the doli project (sorry, it is in French).
The tarball contains a Perl program that builds the index database (an ASCII database). The format of the database is really simple:
[files:number of files to index]
number=file name
[words:number of words]
word=>document number,weight:document number,weight ...
[rejected:number of words rejected because too frequent]
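
To make the format concrete, here is a minimal Python sketch of a reader for it. It is not part of the tarball. It assumes that each "document number" refers to an entry of the [files] section, that weights are numeric, and that the separators are exactly the ones shown above. The function name parse_index and the sample data are made up for the example.

```python
def parse_index(text):
    """Parse the ASCII index into (files, words, rejected_count)."""
    files = {}      # document number -> file name
    words = {}      # word -> list of (document number, weight)
    rejected = 0    # count taken from the [rejected:N] header
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # A header such as [files:2] starts a new section.
        if line.startswith("[") and line.endswith("]"):
            name, _, count = line[1:-1].partition(":")
            section = name
            if name == "rejected":
                rejected = int(count)
            continue
        if section == "files":
            number, _, filename = line.partition("=")
            files[int(number)] = filename
        elif section == "words":
            word, _, postings = line.partition("=>")
            # Postings are separated by ':' and each one is "document,weight".
            words[word] = [
                (int(doc), float(weight))
                for doc, weight in (p.split(",") for p in postings.split(":"))
            ]
    return files, words, rejected


# Hypothetical sample data, only to exercise the parser.
sample = """[files:2]
1=./index.html
2=./doc/howto.html
[words:2]
linux=>1,3:2,5
kernel=>2,1
[rejected:1]
"""
files, words, rejected = parse_index(sample)
```

With this sample, words["linux"] holds one (document number, weight) pair per document, and files maps each document number back to its file name.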
Once the database has been created, it can be used by several different programs. For the web interface, we start by loading the result of the indexing into an SQL database (it may seem silly to build an ASCII database only to load it into an SQL database, but I want to use the ASCII database for other applications).

To use the indexing program, you need the following Perl modules:
  use File::Basename;  (this one may already be on your system)
  use HTML::Entities;  (you may need to download this one)

The files for the web interface are:
  class_bd.php3     A PHP class for accessing a MySQL database.
  parseindex.php3   A PHP script that inserts the ASCII database into the SQL database.
  search.php3       The script that queries the database.
  search.sql        The description of the database.

Principle of the web interface.

The database consists of two tables:
  1. url: the set of indexed URLs.
  2. word: each word, with the documents in which it appears and its weight in each document.
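
The two-table layout can be sketched as follows. This is only an illustration: the real schema is the one in search.sql, and the column names used here (id, url, word, url_id, weight) are assumptions. The sketch uses Python's built-in sqlite3 module with an in-memory database instead of MySQL so that it runs on its own. The sample rows are made up.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE url  (id INTEGER PRIMARY KEY, url TEXT NOT NULL);
    CREATE TABLE word (word TEXT NOT NULL,
                       url_id INTEGER NOT NULL,
                       weight REAL NOT NULL);
    CREATE INDEX word_idx ON word (word);
""")

# Hypothetical sample rows: two documents and three (word, document) pairs.
conn.executemany("INSERT INTO url VALUES (?, ?)",
                 [(1, "index.html"), (2, "doc/howto.html")])
conn.executemany("INSERT INTO word VALUES (?, ?, ?)",
                 [("linux", 1, 3), ("linux", 2, 5), ("kernel", 2, 1)])


def search(conn, term):
    """Return (url, weight) pairs for one word, heaviest documents first."""
    rows = conn.execute(
        "SELECT url.url, word.weight FROM word "
        "JOIN url ON url.id = word.url_id "
        "WHERE word.word = ? ORDER BY word.weight DESC, url.url",
        (term,),
    )
    return rows.fetchall()


results = search(conn, "linux")
```

A search for one word is then a single join between word and url, ordered by weight.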


Edit the file class_bd.php3 to suit your setup, then create the tables with the file search.sql.
Go to a directory that the web server can access, for example /home/httpd/html/ under RedHat.


You can index an external web site (for example the French HOWTOs on the freenix web site). First fetch all the files with the wget command (wget -m -np followed by the URL of the site). If you want to index the files on your local server, do the same (wget -m -np http://localhost/ ...). Once you have all the files, you can build the ASCII database.

Make the ASCII database.

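To build the ASCII database, you first need a file listing all the files you want to index (find . -name '*.html' > listfile.txt). Then build the ASCII database by running the Perl indexing program from the tarball on that list: perl ... listfile.txt > base.index (where ... stands for the indexing program).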
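
To give an idea of what this step produces, here is a toy Python sketch that turns a few documents into text in the index format described earlier. It is not the Perl program from the tarball, whose tokenization, weighting and rejection rule are not described here. The sketch assumes that the weight is the raw number of occurrences of the word in the document, and that a word is "too frequent" when it appears in more than half of the documents. Both rules, the function name build_index and the sample documents are assumptions.

```python
import re
from collections import Counter


def build_index(documents, max_doc_ratio=0.5):
    """documents: list of (file name, text). Returns the index as a string."""
    postings = {}  # word -> {document number: occurrence count}
    for number, (_, text) in enumerate(documents, start=1):
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        for word, count in counts.items():
            postings.setdefault(word, {})[number] = count

    # Assumed rule: reject words present in more than max_doc_ratio of the documents.
    limit = max_doc_ratio * len(documents)
    kept = {w: d for w, d in postings.items() if len(d) <= limit}
    rejected = len(postings) - len(kept)

    lines = [f"[files:{len(documents)}]"]
    lines += [f"{n}={name}" for n, (name, _) in enumerate(documents, start=1)]
    lines.append(f"[words:{len(kept)}]")
    for word in sorted(kept):
        entries = ":".join(
            f"{doc},{weight}" for doc, weight in sorted(kept[word].items())
        )
        lines.append(f"{word}=>{entries}")
    lines.append(f"[rejected:{rejected}]")
    return "\n".join(lines) + "\n"


# Hypothetical documents, given as plain text rather than HTML.
docs = [
    ("a.html", "linux kernel kernel"),
    ("b.html", "linux howto"),
    ("c.html", "linux"),
]
index_text = build_index(docs)
```

In this sample, "linux" appears in all three documents and is therefore counted as rejected, while "kernel" and "howto" are kept with their occurrence counts as weights.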
Put the files class_bd.php3 and parseindex.php3 in the same directory as base.index (the web server must be able to access these files).
Open parseindex.php3 in your favorite web browser.
Enter the name of the ASCII database (base.index) in the input field.
Afterwards, you can run a search with search.php3.
Sorry, I'm French; I hope my explanation helps you. Please report any problems.


Download the archive

charles vidal.