| Full-Text Search in PostgreSQL: A Gentle Introduction | ||||
|---|---|---|---|---|
| Prev | Fast Backward | Fast Forward | Next | |
This tutorial is about how to setup typical FTS application using PostgreSQL.
We create our configuration pg, which will be default for locale ru_RU.UTF-8. To be safe, we do this in transaction.
begin; CREATE FULLTEXT CONFIGURATION public.pg LOCALE 'ru_RU.UTF-8' LIKE english WITH MAP AS DEFAULT;
We'll use postgresql specific dictionary using synonym template dictionary and store it under PG_ROOT/share/dicts_data directory. The dictionary looks like:
postgres pg pgsql pg postgresql pg
CREATE FULLTEXT DICTIONARY pg_dict OPTION 'pg_dict.txt' LIKE synonym;
Register ispell dictionary en_ispell using ispell_template template.
CREATE FULLTEXT DICTIONARY en_ispell
OPTION 'DictFile="english-utf8.dict",
AffFile="english-utf8.aff",
StopFile="english-utf8.stop"'
LIKE ispell_template;Use the same stop-word list for snowball stemmeren_stem, which is available on default.
ALTER FULLTEXT DICTIONARY en_stem SET OPTION 'english-utf8.stop';
Modify mappings for Latin words for configuration 'pg'
ALTER FULLTEXT MAPPING ON pg FOR lword,lhword,lpart_hword
WITH pg_dict,en_ispell,en_stem;We won't index/search some tokens
DROP FULLTEXT MAPPING ON pg FOR email, url, sfloat, uri, float;
Now, we could test our configuration.
select * from ts_debug('public.pg', '
PostgreSQL, the highly scalable, SQL compliant, open source object-relational
database management system, is now undergoing beta testing of the next
version of our software: PostgreSQL 8.2.
');
end;
We have a table pgweb, which contains 11239 documents from PostgreSQL web site. Only relevant columns are shown.
=# \d pgweb
Table "public.pgweb"
Column | Type | Modifiers
-----------+-------------------+-----------
tid | integer | not null
path | character varying | not null
body | character varying |
title | character varying |
dlm | integer |First we should take care about default FTS configuration - we want our public.pg to be default. To do so, we need to redefine search_path, since we already have predefined default full-text configuration (for ru_RU.UTF-8 locale) in pg_catalog.
=# \dF
pg_catalog | russian_utf8 | ru_RU.UTF-8 | Y
public | pg | ru_RU.UTF-8 | Y
=# show tsearch_conf_name;
tsearch_conf_name
-------------------------
pg_catalog.russian_utf8
=# SET search_path=public, pg_catalog;
=# show tsearch_conf_name;
tsearch_conf_name
-------------------
public.pg
The very simple full-text search without ranking is already available here. Select top 10 fresh documents (dlm is last-modified date in seconds since 1970), which contains query create table.
=# select title from pgweb where textcat(title,body) @@
plainto_tsquery('create table') order by dlm desc limit 10;We can create index to speedup search.
=# create index pgweb_idx on pgweb using gin(textcat(title,body));
For clarity, we omitted here coalesce function to prevent
unwanted effect of NULL concatenation.
To implement FTS with ranking support we need tsvector column to store preprocessed document, which is a concatenation of title and body. We assign different labels to them to preserve information about origin of every word.
=# alter table pgweb add column fts_index tsvector;
=# update pgweb set fts_index =
setweight( to_tsvector( coalesce (title,'')), 'A' ) || ' ' ||
setweight( to_tsvector(coalesce (body,'')),'D');Then we create GIN index to speedup search.
=# create index fts_idx on pgweb using gin(fts_index);
After vacuuming, we are ready to perform full-text search.
=# select rank_cd(fts_index, q)as rank, title from pgweb,
plainto_tsquery('create table') q
where q @@ fts_index order by rank desc limit 10;