This tutorial is about how to setup typical FTS application using PostgreSQL.
We create our configuration pg, which will be default for locale ru_RU.UTF-8. To be safe, we do this in transaction.
begin; CREATE FULLTEXT CONFIGURATION public.pg LOCALE 'ru_RU.UTF-8' LIKE english WITH MAP AS DEFAULT;
We'll use postgresql specific dictionary using synonym template dictionary and store it under PG_ROOT/share/dicts_data directory. The dictionary looks like:
postgres pg pgsql pg postgresql pg
CREATE FULLTEXT DICTIONARY pg_dict OPTION 'pg_dict.txt' LIKE synonym;
Register ispell dictionary en_ispell using ispell_template template.
CREATE FULLTEXT DICTIONARY en_ispell OPTION 'DictFile="english-utf8.dict", AffFile="english-utf8.aff", StopFile="english-utf8.stop"' LIKE ispell_template;
Use the same stop-word list for snowball stemmeren_stem, which is available on default.
ALTER FULLTEXT DICTIONARY en_stem SET OPTION 'english-utf8.stop';
Modify mappings for Latin words for configuration 'pg'
ALTER FULLTEXT MAPPING ON pg FOR lword,lhword,lpart_hword WITH pg_dict,en_ispell,en_stem;
We won't index/search some tokens
DROP FULLTEXT MAPPING ON pg FOR email, url, sfloat, uri, float;
Now, we could test our configuration.
select * from ts_debug('public.pg', ' PostgreSQL, the highly scalable, SQL compliant, open source object-relational database management system, is now undergoing beta testing of the next version of our software: PostgreSQL 8.2. '); end;
We have a table pgweb, which contains 11239 documents from PostgreSQL web site. Only relevant columns are shown.
=# \d pgweb Table "public.pgweb" Column | Type | Modifiers -----------+-------------------+----------- tid | integer | not null path | character varying | not null body | character varying | title | character varying | dlm | integer |
First we should take care about default FTS configuration - we want our public.pg to be default. To do so, we need to redefine search_path, since we already have predefined default full-text configuration (for ru_RU.UTF-8 locale) in pg_catalog.
=# \dF pg_catalog | russian_utf8 | ru_RU.UTF-8 | Y public | pg | ru_RU.UTF-8 | Y =# show tsearch_conf_name; tsearch_conf_name ------------------------- pg_catalog.russian_utf8 =# SET search_path=public, pg_catalog; =# show tsearch_conf_name; tsearch_conf_name ------------------- public.pg
The very simple full-text search without ranking is already available here. Select top 10 fresh documents (dlm is last-modified date in seconds since 1970), which contains query create table.
=# select title from pgweb where textcat(title,body) @@ plainto_tsquery('create table') order by dlm desc limit 10;
We can create index to speedup search.
=# create index pgweb_idx on pgweb using gin(textcat(title,body));
For clarity, we omitted here
coalesce function to prevent
unwanted effect of NULL concatenation.
To implement FTS with ranking support we need tsvector column to store preprocessed document, which is a concatenation of title and body. We assign different labels to them to preserve information about origin of every word.
=# alter table pgweb add column fts_index tsvector; =# update pgweb set fts_index = setweight( to_tsvector( coalesce (title,'')), 'A' ) || ' ' || setweight( to_tsvector(coalesce (body,'')),'D');
Then we create GIN index to speedup search.
=# create index fts_idx on pgweb using gin(fts_index);
After vacuuming, we are ready to perform full-text search.
=# select rank_cd(fts_index, q)as rank, title from pgweb, plainto_tsquery('create table') q where q @@ fts_index order by rank desc limit 10;