A tutorial on installing and configuring the ht://Dig (htdig) search engine for your web site. htdig retrieves HTML documents over HTTP and gathers information from them that can later be used to search those documents.

The most recent exception to this was version 3.

ht://Dig — Internet search engine software

It uses catdoc to parse Word documents and ps2ascii to parse PostScript files. The default search results wrapper file contains the header and footer together in one file.

While there is theoretically nothing to stop you from indexing as much as you wish, practical considerations (e.g. ...) still apply. If you wish to keep secure and non-secure areas of your site separate, and to prevent unauthorized users from seeing documents from secure areas in their search results, that takes a bit more effort.

Thus, a search for a filename will match the link description, and the file will show up in search results.

Frequently Asked Questions

What happens is ht://Dig ... Of course, this will require more memory to read the larger file.


Assuming your configuration file is called cc. This database, together with information on the URL associated with each document, is created every time you request a re-indexing of the site, and is merged with the results of previous index runs to create the foundation for the search engine. This usually has to do with the default document size limit.

The search results will then give a list of URLs for all pages that match the search terms. The config input parameter doesn’t need to be hidden either, and you may want to define it as a pull-down list to select different databases, see question 4. Older versions of ht://Dig ... Amongst other things, you can modify the location of the search database, specify a list of URLs and extensions to be bypassed while indexing, enable or disable the fuzzy logic algorithms, limit the amount of content stored in the search database, and control the maximum amount of data read over an HTTP connection.
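As a rough illustration of the configuration options just listed, the sketch below renders a small ht://Dig-style configuration file from a Python dictionary. The attribute names (`database_dir`, `start_url`, `exclude_urls`, `bad_extensions`, `search_algorithm`, `max_head_length`, `max_doc_size`) are believed to match the 3.x releases, but the values, the paths and the `render_config` helper are hypothetical; check everything against the attribute reference for your version.

```python
# Hypothetical helper: renders "name: value" lines in the style of an
# ht://Dig configuration file. All values below are placeholders.
SETTINGS = {
    "database_dir": "/var/lib/htdig/db",         # where the search database lives (assumed path)
    "start_url": "http://www.example.com/",      # placeholder site
    "exclude_urls": "/cgi-bin/ .cgi",            # URL patterns to bypass while indexing
    "bad_extensions": ".gif .jpg .zip .tar.gz",  # file extensions to bypass
    "search_algorithm": "exact:1 endings:0.5",   # fuzzy algorithms and their weights
    "max_head_length": 10000,                    # bytes of content stored per document
    "max_doc_size": 200000,                      # max bytes read over an HTTP connection
}


def render_config(settings):
    """Return the settings as configuration file text, one attribute per line."""
    lines = [f"{name}: {value}" for name, value in settings.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_config(SETTINGS), end="")
```

Redirecting the output to a file gives a starting point that can then be edited by hand.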

An alternative is to use an external parser with the xpdf 0. Don’t go overboard, though, as you don’t want to overflow a 32-bit integer (about 2 billion), and you don’t want to allocate much more memory than you need to store the largest document.
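One way to pick a document size limit along these lines is to measure the largest file you expect to index. The sketch below assumes the documents are also available on local disk under a single directory, which may not hold for your site; `suggest_max_doc_size` and its `headroom_percent` parameter are hypothetical names, not part of ht://Dig.

```python
import os

INT32_MAX = 2**31 - 1  # largest signed 32-bit integer, about 2 billion


def suggest_max_doc_size(root, headroom_percent=10):
    """Return the size of the largest file under root plus some headroom,
    capped so the result never exceeds a signed 32-bit integer."""
    largest = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish or cannot be read
            largest = max(largest, size)
    padded = largest * (100 + headroom_percent) // 100
    return min(padded, INT32_MAX)


if __name__ == "__main__":
    print(suggest_max_doc_size("."))
```

The result stays close to the largest real document, so no more memory is reserved than is needed.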

You can’t, and you shouldn’t. This describes the setup for an Apache server.

htDig – Web Site Search

What’s the latest version of ht://Dig? Constructing a local search using ht://Dig ... Alternatively, create your own file and tell ht://Dig ... You need to find out why these documents were rejected. Other input parameters may similarly pose a problem. If you change the search. You can either mail the ht://Dig ... The scores calculated this way aren’t quite as good, but htsearch can process hits much faster when it doesn’t need to look up the db.


htdig(1) – Linux man page

Development is beginning on htdig4 as well as a few interim releases of htdig3. You would also need to configure the script to indicate where all of the document-to-text converters are installed.
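Before editing the script, it can help to check which converters are actually installed and where. The sketch below looks up catdoc and ps2ascii, plus pdftotext from the xpdf package, on the current `PATH`; the list of names and the `locate_converters` helper are assumptions for illustration, and the paths it prints still have to be entered into the script by hand.

```python
import shutil

# Converter programs to look for; adjust this list to match your setup.
CONVERTERS = ("catdoc", "ps2ascii", "pdftotext")


def locate_converters(names=CONVERTERS):
    """Map each converter name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_converters().items():
        print(f"{name}: {path or 'not found on PATH'}")
```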

However, some users still prefer to stick with acroread, as it works well for them, and is a little easier to set up if you’ve already installed Acrobat. First of all, htdig doesn’t look at directories itself. It causes htmerge to fail with a “Word sort failed” error.

As above, this usually has to do with the default document size. When you run htsearch with no customization on a large database and it gets a lot of hits, it tends to take a long time to process those hits.