Indexing with a WEB interface.
This program is derived from the doli project
( http://www.linux-france.org/prj/vidalc/ )
( sorry, that page is in French )
The tarball contains index.pl, which builds the indexed database ( an ASCII database ).
The format of the database is really simple :
[files:number of files to index]
number=file name
...
[words:number of words]
word=>document number,weight:document number,weight ...
....
[rejected:number of words rejected because too frequent]
word
....
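To make the format above concrete, here is a minimal sketch of a reader for it, written in Python rather than Perl. It is not part of the tarball. The function name parse_index and the sample data are hypothetical, and it assumes that the pairs on a word line are separated by ':' and that weights are numeric.

```python
import re

# Matches a section header such as "[files:2]" or "[words:120]".
SECTION_RE = re.compile(r"^\[(files|words|rejected):(\d+)\]$")

# Hypothetical sample data, only for illustration.
SAMPLE = """\
[files:2]
1=./index.html
2=./howto.html
[words:2]
linux=>1,3:2,5
kernel=>2,1
[rejected:1]
the
"""


def parse_index(text):
    """Parse the ASCII database into (files, words, rejected)."""
    files = {}      # document number -> file name
    words = {}      # word -> list of (document number, weight)
    rejected = []   # words rejected because too frequent
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = SECTION_RE.match(line)
        if header:
            section = header.group(1)
            continue
        if section == "files":
            number, _, name = line.partition("=")
            files[int(number)] = name
        elif section == "words":
            word, _, postings = line.partition("=>")
            # Assumption: weights are numeric, so they are read as floats.
            words[word] = [
                (int(doc), float(weight))
                for doc, weight in (p.split(",") for p in postings.split(":") if p)
            ]
        elif section == "rejected":
            rejected.append(line)
    return files, words, rejected


files, words, rejected = parse_index(SAMPLE)
print(files)
print(words)
print(rejected)
```

The counts in the section headers are not checked here; a stricter reader could compare them with the number of lines actually read.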
Once the database is created, it can be used by several different programs.
For the WEB interface, we first load the result of the indexing into a SQL database
( it seems stupid to build an ASCII database only to load it into a SQL database, but I want to
use the ASCII database for other applications ).
To use index.pl, you need these Perl modules :
use File::Basename; # this one may already be on your system.
use HTML::Entities;
( you can find them on http://www.cpan.org/ )
The WEB interface is made of these files :
class_bd.php3 A PHP class to access a MySQL database.
parseindex.php3 A PHP script to load the ASCII database into the SQL database.
search.php3 The script to query the database.
search.sql The description of the database.
PRINCIPLE of the web interface.
The database consists of two tables :
- url the set of indexed URLs.
- word each word, with the documents where it appears and its weight in each document.
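As an illustration of how the two tables described above can answer a query, here is a minimal sketch in Python. It uses an in-memory SQLite database instead of MySQL. Only the table names url and word come from this document; the column names, the sample rows and the search function are hypothetical, and the real definitions are in search.sql.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL database.
conn = sqlite3.connect(":memory:")

# Hypothetical columns; the real schema is defined in search.sql.
conn.executescript("""
CREATE TABLE url (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL
);
CREATE TABLE word (
    word   TEXT NOT NULL,
    url_id INTEGER NOT NULL REFERENCES url(id),
    weight REAL NOT NULL
);
CREATE INDEX word_idx ON word (word);
""")

# Hypothetical sample rows, only for illustration.
conn.executemany(
    "INSERT INTO url (id, url) VALUES (?, ?)",
    [(1, "http://localhost/index.html"), (2, "http://localhost/howto.html")],
)
conn.executemany(
    "INSERT INTO word (word, url_id, weight) VALUES (?, ?, ?)",
    [("linux", 1, 3), ("linux", 2, 5), ("kernel", 2, 1)],
)


def search(conn, term):
    """Return (url, weight) rows for one word, heaviest document first."""
    return conn.execute(
        "SELECT url.url, word.weight "
        "FROM word JOIN url ON url.id = word.url_id "
        "WHERE word.word = ? "
        "ORDER BY word.weight DESC, url.url",
        (term,),
    ).fetchall()


for row in search(conn, "linux"):
    print(row)
```

Sorting by weight puts the documents where the word matters most at the top of the result list.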
INSTALLATION.
Modify the file class_bd.php3 :
- $DB_HOST='localhost'; the host running the database.
- $DB_USER='xxxxxx'; the name of the user.
- $DB_PASSWD='xxxxx'; the user's password.
- $DB_SID='xxxxx'; the name of the database.
Create the tables with the file search.sql.
Go to a directory that the WEB server can access, for example
( /home/httpd/html/ under RedHat ).
WHAT DO YOU WANT TO INDEX.
You can index an external web site ( for example the French HOWTOs on the freenix web site ).
First get all the files with the wget command :
( wget -m -np http://www.freenix.org/unix/linux/HOWTO/ )
If you want to index the files on your local server, do the same :
( wget -m -np http://localhost/ ... )
Once you have all the files, you can build the ASCII database.
MAKE THE ASCII DATABASE.
To build the ASCII database, you need a file listing all the files you want to index :
( find . -name '*.html' > listfile.txt )
Then build the ASCII database with the index.pl program :
perl index.pl listfile.txt > base.index
Put the files class_bd.php3 and parseindex.php3 in the directory containing base.index ( the web server
must be able to access these files ).
Open parseindex.php3 in your favorite web browser.
Type the name of the ASCII database ( base.index ) in the input field.
Wait....
Afterwards, you can run a search with search.php3.
Sorry, I'm French. I hope my explanation helps you.
Please report any problems.
Download
Download the archive
Charles Vidal.
vidalc@linux-france.org