After installing mnoGoSearch on FreeBSD yesterday and realising that the context it shows in search results only ever comes from the top of the page, it's time to try ht://Dig today.
This one is a little easier to set up than mnoGoSearch, because it doesn't use MySQL; it has its own built-in database. First, compile and install:
cd /usr/ports/textproc/htdig
make build install clean
Now you need to create the config file and customise it to fit your site:
cd /usr/local/etc/htdig
cp htdig.conf.sample htdig.conf
emacs htdig.conf
Change database_dir from (...)/database to (...)/database_foo, where foo is some identifier that will remind you what you are searching. I'm assuming you are at least considering installing several ht://Dig instances on your server, in which case you need a different database_dir for each one. If you only want one search engine, you don't need to change this.
Change start_url to point to the root of your website.
Change maintainer so you don't get the silly default e-mail address when you do log analysis.
Change exclude_urls to include everything you don't want to index. If you're indexing a Python Community Server, I recommend adding referers.py and format=rss, which keep referrer ranking pages and comment XML feeds out of the index.
Change bad_extensions. I added .xml, because I don't want rss.xml files showing up in the search results, and .fttb, so Radio templates don't show up either. Add anything here that exists on the server but would look silly in the results.
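Put together, the edited part of htdig.conf might look something like the fragment below. This is only a sketch: the example.com addresses are placeholders, (...) stands for whatever path prefix your sample file already has, and the /cgi-bin/ and .cgi entries are what I'd expect the sample to ship with, so check your own copy before pasting anything.

```
# htdig.conf fragment (illustrative values only)

# (...) is the prefix already present in the sample file; leave it alone
database_dir:   (...)/database_foo

# Placeholder site and address; use your own
start_url:      http://www.example.com/
maintainer:     webmaster@example.com

# Space-separated patterns; a URL containing any of them is skipped.
# /cgi-bin/ and .cgi are assumed to be the existing defaults.
exclude_urls:   /cgi-bin/ .cgi referers.py format=rss

# bad_extensions: keep the list from the sample and append .xml .fttb
```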
Now you can make the database directory and build the database:
cd /usr/local/share/htdig/
mkdir database_foo
rundig
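If you do end up with several instances, each with its own config file and database_dir, the build step has to be repeated for each one. Here is a minimal sketch of that loop. It assumes rundig accepts -c to select a config file (have a look at the rundig script on your system to confirm), the config file names in the example are made up, and the RUNDIG variable is my own addition so the command can be swapped out for a dry run.

```shell
#!/bin/sh
# Sketch: rebuild the database for several ht://Dig configs in turn.
# Assumption: rundig accepts "-c <configfile>"; verify against your installed script.
# RUNDIG is a hypothetical override, e.g. RUNDIG=echo for a dry run.

build_all() {
    for conf in "$@"; do
        if [ ! -r "$conf" ]; then
            # Warn on stderr and move on to the next config
            echo "skipping unreadable config: $conf" >&2
            continue
        fi
        echo "indexing with $conf"
        # Stop at the first failed build so the error isn't buried
        "${RUNDIG:-rundig}" -c "$conf" || return 1
    done
}

# Example (hypothetical file names):
# build_all /usr/local/etc/htdig/htdig_foo.conf /usr/local/etc/htdig/htdig_bar.conf
```

Each config still needs its database directory created first, as above.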
Now put the CGI in your webspace, make sure it's executable, and try it out:
cp /usr/local/share/apache/cgi-bin/htsearch /path/to/cgi-bin/htsearch.cgi
See the mnoGoSearch install notes (linked above) for details on how to get a .cgi file to run.
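The "make sure it's executable" part can be made explicit. A small sketch: install_cgi is a hypothetical helper name, and mode 755 is my assumption about what your web server needs (owner can write, everyone can read and execute).

```shell
#!/bin/sh
# Sketch: copy the htsearch binary into a cgi-bin directory and make it executable.
# install_cgi is a hypothetical helper, not part of ht://Dig.

install_cgi() {
    src=$1
    dest=$2
    cp "$src" "$dest" || return 1
    # rwx for the owner, r-x for everyone else, so the web server user can run it
    chmod 755 "$dest" || return 1
    # Succeed only if the copy really is executable
    [ -x "$dest" ]
}

# Example, using the paths from the step above:
# install_cgi /usr/local/share/apache/cgi-bin/htsearch /path/to/cgi-bin/htsearch.cgi
```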
Enjoy!
Did I get anything wrong? Drop me a line and I'll correct it.
Update: The version in the ports tree doesn't do phrase matching. Here's how to install from source:
log in as root
wget http://www.htdig.org/files/htdig-3.2.0b3.tar.gz
tar -vzxf htdig-3.2.0b3.tar.gz
cd htdig-3.2.0b3
./configure --prefix=/usr/local && gmake && gmake install && gmake clean
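edit /usr/local/conf/htdig.conf as before (note that this is a different location from the ports install, which uses /usr/local/etc/htdig/htdig.conf)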
rundig (with fingers crossed).
More detail here later.