Full-Text Search in PostgreSQL: A Gentle Introduction
Prev	Fast Backward		Fast Forward	Next

Appendix A. FTS Complete Tutorial

This tutorial is about how to setup typical FTS application using PostgreSQL.

We create our configuration pg, which will be default for locale ru_RU.UTF-8. To be safe, we do this in transaction.

begin;
CREATE FULLTEXT CONFIGURATION public.pg  LOCALE 'ru_RU.UTF-8'  LIKE english WITH MAP AS DEFAULT;

We'll use postgresql specific dictionary using synonym template dictionary and store it under PG_ROOT/share/dicts_data directory. The dictionary looks like:

postgres    pg
pgsql       pg
postgresql  pg

CREATE FULLTEXT DICTIONARY pg_dict OPTION  'pg_dict.txt' LIKE synonym;

CREATE FULLTEXT DICTIONARY en_ispell 
OPTION 'DictFile="english-utf8.dict",
        AffFile="english-utf8.aff",
        StopFile="english-utf8.stop"'
LIKE ispell_template;

Use the same stop-word list for snowball stemmeren_stem, which is available on default.

ALTER FULLTEXT DICTIONARY en_stem SET OPTION 'english-utf8.stop';

Modify mappings for Latin words for configuration 'pg'

ALTER FULLTEXT MAPPING ON pg FOR lword,lhword,lpart_hword
                             WITH pg_dict,en_ispell,en_stem;

We won't index/search some tokens

DROP FULLTEXT MAPPING ON pg FOR email, url, sfloat, uri, float;

Now, we could test our configuration.

select * from ts_debug('public.pg', '
PostgreSQL, the highly scalable, SQL compliant, open source object-relational
database management system, is now undergoing beta testing of the next
version of our software: PostgreSQL 8.2.
');


end;

We have a table pgweb, which contains 11239 documents from PostgreSQL web site. Only relevant columns are shown.

=# \d pgweb
           Table "public.pgweb"
  Column   |       Type        | Modifiers
-----------+-------------------+-----------
 tid       | integer           | not null
 path      | character varying | not null
 body      | character varying |
 title     | character varying |
 dlm       | integer           |

First we should take care about default FTS configuration - we want our public.pg to be default. To do so, we need to redefine search_path, since we already have predefined default full-text configuration (for ru_RU.UTF-8 locale) in pg_catalog.

=# \dF
pg_catalog | russian_utf8          | ru_RU.UTF-8     | Y
public     | pg                    | ru_RU.UTF-8     | Y

=# show tsearch_conf_name;
    tsearch_conf_name
-------------------------
 pg_catalog.russian_utf8

=# SET search_path=public, pg_catalog;

=# show tsearch_conf_name;
 tsearch_conf_name
-------------------
 public.pg

The very simple full-text search without ranking is already available here. Select top 10 fresh documents (dlm is last-modified date in seconds since 1970), which contains query create table.

=# select title from pgweb where textcat(title,body)  @@  
          plainto_tsquery('create table') order by dlm desc limit 10;

We can create index to speedup search.

=# create index pgweb_idx on pgweb using gin(textcat(title,body));

For clarity, we omitted here coalesce function to prevent unwanted effect of NULL concatenation.

To implement FTS with ranking support we need tsvector column to store preprocessed document, which is a concatenation of title and body. We assign different labels to them to preserve information about origin of every word.

=# alter table pgweb add column fts_index tsvector;
=# update pgweb set fts_index = 
    setweight( to_tsvector( coalesce (title,'')), 'A' ) || ' ' ||
    setweight( to_tsvector(coalesce (body,'')),'D');

Then we create GIN index to speedup search.

=# create index fts_idx on pgweb using gin(fts_index);

After vacuuming, we are ready to perform full-text search.

=# select rank_cd(fts_index, q)as rank, title from pgweb, 
   plainto_tsquery('create table') q  
   where q @@ fts_index order by rank desc limit 10;

Prev	Home	Next
Appendixes	Up	FTS Parser Example