Zen: tsearchd

tsearchd - full text search daemon for PostgreSQL

This is a tsearch2 daemon (for PostgreSQL 8.1.X).

tsearchd is a research product, developed by the authors of tsearch2 as an experiment with inverted indices. Don't use it in production!

The main idea is to create an inverted index for a large read-only archive while preserving full compatibility with tsearch2 (parsers, dictionaries, ranking), so that one can combine (UNION) the results of searching two archives, the read-only archive via tsearchd and the current one via tsearch2, and rank the combined results.
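A minimal Python sketch of the combining step, for illustration only. The function name, the sample ids and the sample rank values are hypothetical. It assumes both searches return (id, rank) pairs, that the ranks are comparable because both sides use the same tsearch2 ranking, and that document ids do not overlap between the two archives.

```python
import heapq

def combine_ranked(archive_hits, current_hits, limit=10):
    """Merge two lists of (doc_id, rank) pairs and keep the top `limit` by rank."""
    # Concatenating both result sets plays the role of UNION ALL.
    merged = list(archive_hits) + list(current_hits)
    # Highest rank first, truncated to `limit` (ORDER BY rank DESC LIMIT n).
    return heapq.nlargest(limit, merged, key=lambda hit: hit[1])

# Hypothetical hits: the archive is searched by tsearchd, the current table by tsearch2.
archive_hits = [(101, 0.30), (102, 0.75)]
current_hits = [(900001, 0.50), (900002, 0.10)]

top = combine_ranked(archive_hits, current_hits, limit=3)
print(top)
```

In SQL the same effect would come from a UNION of the two ranked selects followed by an ORDER BY on the rank and a LIMIT.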

The inverted index is created from a tsvector column and consists of keyword:posting list pairs: word:id1,…idN, where id1…idN are the primary keys identifying documents (they should be integers). Currently, we use Berkeley DB version 1.85 to store the inverted index. In the future, we'll implement the inverted index in PostgreSQL.
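The keyword:posting list layout can be sketched in Python as follows. This is an in-memory illustration, not tsearchd's Berkeley DB storage; the function names and sample rows are hypothetical, and each row stands in for a (tid, fts_index) pair reduced to its lexemes.

```python
from collections import defaultdict

def build_inverted_index(rows):
    """rows: iterable of (doc_id, lexemes). Returns {lexeme: sorted list of doc ids}."""
    postings = defaultdict(set)
    for doc_id, lexemes in rows:
        for lexeme in lexemes:
            postings[lexeme].add(doc_id)
    # Store each posting list sorted, as word:id1,...,idN.
    return {lexeme: sorted(ids) for lexeme, ids in postings.items()}

def search_all(index, words):
    """Ids of documents that contain every word in `words` (an AND query)."""
    lists = [set(index.get(word, ())) for word in words]
    if not lists:
        return []
    return sorted(set.intersection(*lists))

# Hypothetical documents, already reduced to lexemes.
rows = [
    (1, ["oil", "price"]),
    (2, ["gas", "price"]),
    (3, ["oil", "gas"]),
]
index = build_inverted_index(rows)
print(index["oil"])
print(search_all(index, ["oil", "gas"]))
```

A single-word query is one key lookup, and a multi-word AND query is an intersection of posting lists.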

Search itself is very fast. However, calculating the rank requires reading the tsvector of every row in the result, so it can slow down the select.
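The cost difference can be illustrated with a small Python sketch. The data structures and the scoring are hypothetical: the score is a plain occurrence count, not the tsearch2 rank formula. The point is only that the match step is a single posting-list lookup, while ranking touches the stored vector of every matching row, even when only a few results are kept.

```python
def ranked_search(index, vectors, word, limit=10):
    """index: {lexeme: [doc ids]}; vectors: {doc id: {lexeme: [positions]}}.

    Returns (top hits as (doc_id, score) pairs, number of vectors fetched).
    """
    fetched = 0
    scored = []
    for doc_id in index.get(word, []):   # fast part: one posting-list lookup
        vector = vectors[doc_id]         # slow part: one fetch per matching row
        fetched += 1
        # Toy score (an assumption, not tsearch2's rank): occurrences of the word.
        scored.append((doc_id, len(vector.get(word, []))))
    scored.sort(key=lambda hit: hit[1], reverse=True)
    return scored[:limit], fetched

# Hypothetical data: three documents match 'oil'.
index = {"oil": [1, 2, 3]}
vectors = {
    1: {"oil": [1, 5]},
    2: {"oil": [2]},
    3: {"oil": [1, 4, 9]},
}
hits, fetched = ranked_search(index, vectors, "oil", limit=2)
print(hits, fetched)
```

Here only two hits are returned, but all three vectors are read before the limit is applied.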

1. The tsearchd daemon should be started before creating the inverted index.

Usage: tsearchd [-D] [-a ADDRESS] [-p PORT] [-m MEMCACHE_MB] -d DIRECTORY

     ADDRESS     - ip address (127.0.0.1 by default)
     PORT        - listen port (PG_PORT+1 by default)
     -D          - use for debug (a lot of garbage)
     MEMCACHE_MB - buffer size in megabytes.
     DIRECTORY   - directory for inverted index with name tsearchd.idx

NOTICE:

Currently, -p PORT and -a ADDRESS shouldn't be specified: tsearch always uses localhost:PG_PORT+1.

2. tsearchd requires an additional key (a positive integer) to identify each document; it could be a primary key.

3. Operations

                            Table "public.txt"
  Column   |   Type   |                     Modifiers                     
-----------+----------+---------------------------------------------------
 body      | text     | 
 tid       | integer  | not null default nextval('txt_tid_seq'::regclass)
 fts_index | tsvector | 
Indexes:
    "tid_idx" UNIQUE, btree (tid)

  A) CREATE inverted index

     select tdbulkindex ('select tid, fts_index from txt'); 
     select tdflush();

  B) SEARCH

     Returns the ids of the documents matching the query.
  
     select count(*) from tdsearch(to_tsquery('oil'));

  C) RANKED SEARCH

     select txt.tid, rank(fts_index,to_tsquery('oil')) as rank
     from txt, tdsearch(to_tsquery('oil')) as idx where tid=idx 
     order by rank desc limit 10;

INSTALLATION

   make && make install

NOTE: If you don't want to mix tsearchd with an already installed tsearch2, you can use tsearch2-cur.sql without installing (make install) into the system directory.

You can compile tsearchd from anywhere if you already have PostgreSQL 8.1.X installed:

 USE_PGXS=1 make

Then you can run the demo in the demo subdirectory.
