PHP Classes

Htdig site indexing and searching interface: Interface with Ht:/Dig indexing and search engine.

Last updated: 2005-02-07
Rating: 54%
Unique user downloads: 6,272 (total)
Download rankings: 324 (all time), 1,209 (this week)
Version: htdiginterface 1.0.0
License: BSD License
Categories: Searching

Description

This class interfaces with the Ht:/Dig programs so that you can index and search Web pages from PHP. It features:

- Set up a suitable configuration file from a few user-defined parameters.
- Index Web pages to build the search databases.
- Search the indexed databases and capture the matches in a PHP data structure, ready to display the results in a PHP-generated page.

Author
Name: Manuel Lemos
Classes: 38 packages
Country: Portugal
Age: 46
All-time rank: 1
Week rank: 3 (1 in Portugal)

Details provided by the author  
/*
 * README
 *
 * Purpose:  Basic instructions to use this class.
 *
 * @(#) $Header: /home/mlemos/cvsroot/htdiginterface/README,v 1.1 2005/02/08 06:14:30 mlemos Exp $
 *
 */

PHP interface for Ht:/Dig versions 3.1.x or 3.2.x:

This class provides an interface to the Ht:/Dig package of programs to
simplify the process of configuration, indexing and searching a site.

Although Ht:/Dig can work with existing configuration files, this class
only works properly with a configuration file that the class itself
generated.

The class sets certain configuration directives so that Ht:/Dig uses
special result page template files. These templates are necessary for the
class to parse the search results and extract the information returned by
the htsearch program.
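To illustrate why special templates make parsing possible, here is a minimal PHP sketch. It assumes a hypothetical result template that renders each match as one line of the form MATCH|<percent>|<url>|<title>; the function parse_htsearch_output() and that line format are made up for illustration and are not taken from the templates or the class in this package.

```php
<?php
// Hypothetical output format: the sketch assumes the result template renders
// each match as "MATCH|<percent>|<url>|<title>" on its own line. The real
// templates shipped with this package may use a different layout.
function parse_htsearch_output(string $output): array
{
    $matches = [];
    foreach (preg_split('/\R/', $output) as $line) {
        // Skip header, footer and any other non-match lines.
        if (strncmp($line, 'MATCH|', 6) !== 0) {
            continue;
        }
        // Limit to 4 parts so a "|" inside the title stays in the title.
        $fields = explode('|', $line, 4);
        if (count($fields) !== 4) {
            continue; // malformed line
        }
        [, $percent, $url, $title] = $fields;
        $matches[] = [
            'percent' => (int) $percent,
            'url' => $url,
            'title' => $title,
        ];
    }
    return $matches;
}

// Usage with a made-up htsearch output sample.
$sample = "HEADER|2 matches\n"
    . "MATCH|95|http://www.example.com/a.html|Page A\n"
    . "MATCH|40|http://www.example.com/b.html|Page B | Archive\n";
$results = parse_htsearch_output($sample);
```

Putting the title last is a deliberate choice: it is the field most likely to contain the separator character, and the explode() limit keeps it intact.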

The special template files are included in this class package. The package
also includes example scripts that perform each of the steps to configure,
index and search a site with Ht:/Dig.

To make this class work properly, please follow these steps:

1. The htdig_setup_configuration.php example script demonstrates how to
set up the class so it can create a suitable configuration file for
Ht:/Dig.

You can tell it to supersede the default Ht:/Dig configuration file or
generate a new file in a different path.

You may generate as many different configuration files as you want, for
instance one configuration file for each site that you host on the same
server. In this case, you may want to specify a different directory for
each site's index database files.
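A minimal sketch of preparing one set of options per hosted site, each with its own database directory. The site names, URLs and the /var/lib/htdig base directory are hypothetical; database_dir, start_url and limit_urls_to are standard Ht:/Dig attributes, but which options you actually pass to GenerateConfiguration is up to you.

```php
<?php
// Hypothetical list of sites hosted on the same server; the names, URLs and
// the /var/lib/htdig base directory are made up for illustration.
$sites = [
    'site1' => 'http://www.site1.example/',
    'site2' => 'http://www.site2.example/',
];
$database_base = '/var/lib/htdig';

$site_options = [];
foreach ($sites as $name => $url) {
    // database_dir, start_url and limit_urls_to are standard Ht:/Dig
    // attributes; each site gets its own database directory.
    $site_options[$name] = [
        'database_dir' => $database_base . '/' . $name,
        'start_url' => $url,
        'limit_urls_to' => $url, // keep the crawler inside this site
    ];
}
```

Each entry of $site_options could then feed a separate configuration file, so the indexes of different sites never share database files.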

The script should call the GenerateConfiguration function to tell the
class to create the configuration file.

This function takes an array of values for any Ht:/Dig options that you
may want to set to customize the indexing and searching processes of your
site.

The GenerateConfiguration function merges your custom options with the
options that the class needs to set for the search results page parsing
to work properly. Those options set the file names of the results output
templates to: htdig_header.html, htdig_nomatch.html,
htdig_syntaxerror.html and htdig_template.html.

The GenerateConfiguration function also accepts a special option named
template_path to specify an alternative directory for the template files,
for instance if you want to put them in the same directory as your site
indexing and search page scripts.
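The merge described above can be sketched as follows. write_htdig_config() is a hypothetical helper, not the class API, and the mapping of the four template files onto the Ht:/Dig attributes search_results_header, nothing_found_file, syntax_error_file, template_map and template_name is an assumption about how such a class could wire them up. The key point is that the class-controlled options are merged last, so they override anything the user passes.

```php
<?php
// write_htdig_config() is a hypothetical helper, not the class API. It shows
// the idea of merging user options with options the class must control.
function write_htdig_config(string $path, array $user_options, string $template_path): bool
{
    $dir = rtrim($template_path, '/');
    // Assumed mapping of the four template files to Ht:/Dig attributes.
    // These are merged last, so they override user-supplied values.
    $required = [
        'search_results_header' => $dir . '/htdig_header.html',
        'nothing_found_file' => $dir . '/htdig_nomatch.html',
        'syntax_error_file' => $dir . '/htdig_syntaxerror.html',
        'template_map' => 'Parseable parseable ' . $dir . '/htdig_template.html',
        'template_name' => 'parseable',
    ];
    $options = array_merge($user_options, $required);
    $lines = [];
    foreach ($options as $name => $value) {
        $lines[] = $name . ': ' . $value; // Ht:/Dig "attribute: value" syntax
    }
    return file_put_contents($path, implode("\n", $lines) . "\n") !== false;
}

// Usage: the user's template_name is overridden by the required value.
$config_file = sys_get_temp_dir() . '/htdig_example.conf';
$ok = write_htdig_config($config_file, [
    'database_dir' => '/var/lib/htdig/site1',
    'start_url' => 'http://www.site1.example/',
    'template_name' => 'long',
], 'templates');
$written = file_get_contents($config_file);
```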

2. The next step after creating a suitable configuration file is to start
the process of crawling a site to build the index database files.

The htdig_build_databases.php example script demonstrates how to start a
crawling session. It calls the class function named Dig that wraps around
the htdig, htmerge and htfuzzy commands.

This function can be called as often as you want, possibly with different
configuration files to index different sites. You will probably want to
schedule it to run once a day, during low-traffic hours, for each of your
sites.

Scheduled crawling can be done with cron or an equivalent tool in your
operating system, using the PHP CGI or CLI version to run the crawler
script outside the Web server.
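A rough sketch of the kind of command sequence a Dig-style wrapper could run from a CLI script. dig_commands() and run_dig() are hypothetical, not the class API, and the flags are assumptions based on the Ht:/Dig 3.1.x documentation (-c selects the configuration file, -i starts a fresh index, -a builds alternate ".work" copies of the databases); check them against the Ht:/Dig version you have installed. The htfuzzy step the class also performs is left out.

```php
<?php
// Hypothetical sketch of the crawl step; dig_commands() and run_dig() are
// not the class API. Flags assumed from the Ht:/Dig 3.1.x documentation:
// -c selects the configuration file, -i starts a fresh index and -a builds
// alternate ".work" copies of the databases.
function dig_commands(string $config_file, string $bin_dir = '/usr/bin'): array
{
    $conf = escapeshellarg($config_file);
    return [
        $bin_dir . '/htdig -i -a -c ' . $conf,
        $bin_dir . '/htmerge -a -c ' . $conf,
        // htfuzzy, which the class also runs, is omitted here.
    ];
}

function run_dig(string $config_file): bool
{
    $output = [];
    foreach (dig_commands($config_file) as $command) {
        exec($command . ' 2>&1', $output, $status);
        if ($status !== 0) {
            return false; // stop at the first failing step
        }
    }
    return true;
}

// Build the command list without running it.
$commands = dig_commands('/etc/htdig/site1.conf');
```

A cron job would then invoke such a script with the PHP CLI binary, which keeps the long-running crawl out of Web requests.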

The Dig function calls the Ht:/Dig programs so that they create temporary
index database files during the indexing process. Only when the process
has ended are the final index database files replaced with the contents
of the temporary files.

This way you can run a crawling process at the same time the site is being
searched by your users using database files from the previous crawling
session.
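The "build aside, then swap" idea can be sketched with PHP's rename(). swap_work_files() is hypothetical, and the ".work" suffix is an assumption based on what htdig -a uses in 3.1.x; the class may replace the files differently. On POSIX systems, rename() within one filesystem replaces the target in a single step, so a search that opens the file sees either the old version or the new one.

```php
<?php
// Hypothetical sketch: replace each live database file with its freshly
// built ".work" copy. On POSIX systems, rename() within one filesystem
// swaps the file in a single step.
function swap_work_files(string $database_dir): int
{
    $swapped = 0;
    $work_files = glob(rtrim($database_dir, '/') . '/*.work') ?: [];
    foreach ($work_files as $work_file) {
        $live_file = substr($work_file, 0, -strlen('.work'));
        if (rename($work_file, $live_file)) {
            $swapped++;
        }
    }
    return $swapped;
}

// Usage in a scratch directory with a made-up database file name.
$dir = sys_get_temp_dir() . '/htdig_swap_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/db.docdb', 'old index');
file_put_contents($dir . '/db.docdb.work', 'new index');
$count = swap_work_files($dir);
```

Each file is swapped individually, so during the short loop a search could see a mix of old and new database files; the window is small compared with the length of a full crawl.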

3. Once your site has been indexed at least once, you can start using the
class to provide an interface for searching your site pages. Take a look
at the htdig_search.php script for an example site search page. You can
use this example script as a base for your customized site search page.

The example script presents a simple search form. When the form is
submitted, it calls the Search function and outputs the results split into
pages, with links to navigate between the pages of search results. The
number of results per page is configurable.
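Splitting results into pages comes down to a little arithmetic: the page count is the number of matches divided by the page size, rounded up, and each page is a slice of the match list. paginate() below is a hypothetical helper, not the example script's code. Ht:/Dig's htsearch also has its own matches_per_page attribute; whether the class relies on it is not stated here.

```php
<?php
// Hypothetical pagination helper; not taken from htdig_search.php.
function paginate(array $matches, int $page, int $per_page = 10): array
{
    // At least one page, even when there are no matches.
    $total_pages = max(1, (int) ceil(count($matches) / $per_page));
    // Clamp out-of-range page requests to the valid range.
    $page = min(max(1, $page), $total_pages);
    return [
        'page' => $page,
        'total_pages' => $total_pages,
        'items' => array_slice($matches, ($page - 1) * $per_page, $per_page),
    ];
}

// Usage: 23 matches at 10 per page give 3 pages; the last holds 3 items.
$matches = range(1, 23);
$last = paginate($matches, 3);
```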
Screenshots: htdig.gif
Files
File | Role | Description
templates/ | Folder | 4 files, listed below
configuration.php | Conf. | Common configuration settings
htdig.php | Class | Ht:/Dig interface class file.
htdig_build_databases.php | Example | Example script to build Ht:/Dig databases about the indexed pages.
htdig_search.php | Example | Example search page script.
htdig_setup_configuration.php | Example | Example script to setup a Ht:/Dig configuration file.
README | Doc. | Basic instructions to use the Ht:/Dig interface class

Files / templates
File | Role | Description
htdig_header.html | Data | Ht:/Dig search result header template file.
htdig_nomatch.html | Data | Ht:/Dig no match search result template file.
htdig_syntaxerror.html | Data | Ht:/Dig syntax error search template file.
htdig_template.html | Data | Ht:/Dig result template file.

Version control: 0%
Unique user downloads this week: 0
User Ratings (all time)
Utility: 70%
Consistency: 72%
Documentation: 56%
Examples: 63%
Tests: -
Videos: -
Overall: 54%
Rank: 1353

User Comments (1)
Excellent (kishore kumar, 5 years ago): 80%