2009-03-13

New cool text search features for 8.4+

  • We added support for filtering dictionaries. Contrary to the standard behaviour, where the first dictionary that recognizes a token ends the chain, the output of a filtering dictionary is always passed to the next dictionary (if any). This allows useful lexeme preprocessing, for example removing accents, without causing any issues in the ts_headline() function.
  • unaccent dictionary/function. It uses a suffix tree for performance (about 25 times faster than the variant using the built-in translate() function).
  • Added prefix search support to the synonym dictionary. An asterisk '*' at the end of a definition word indicates that the word is a prefix, and the to_tsquery() function will transform that definition into the prefix search format. Note that the asterisk is ignored in to_tsvector().
> cat $SHAREDIR/tsearch_data/synonym_sample.syn
postgres        pgsql
postgresql      pgsql
postgre         pgsql
gogle           googl
indices         index*

=# create text search dictionary syn( template=synonym,synonyms='synonym_sample');
=# select ts_lexize('syn','indices');
 ts_lexize
-----------
 {index}
(1 row)
=# create text search configuration tst ( copy=simple);
=# alter text search configuration tst alter mapping for asciiword with syn;
=# select to_tsquery('tst','indices');
 to_tsquery
------------
 'index':*
(1 row)
=# select 'indexes are very useful'::tsvector @@ to_tsquery('tst','indices');
 ?column?
----------
 t
(1 row)

=# select to_tsvector('tst','indices');
 to_tsvector
-------------
 'index':1
(1 row)
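
The filtering behaviour and the unaccent dictionary from the first two items can be illustrated in the same way. The statements below are a minimal sketch modelled on the example in the unaccent module documentation, not a tested recipe for every release. They assume a UTF8 database and a server where the module is installed with create extension (PostgreSQL 9.1 or later; older releases load the contrib SQL script instead). The configuration name fr is chosen only for illustration.

```sql
-- Assumption: unaccent is available as an extension (PostgreSQL 9.1+).
create extension unaccent;

-- The unaccent dictionary on its own: accents are stripped, case is kept.
select ts_lexize('unaccent', 'Hôtel');

-- 'fr' is a hypothetical configuration name, copied from the built-in french one.
create text search configuration fr ( copy = french );

-- unaccent goes first; as a filtering dictionary it hands its output
-- to french_stem instead of ending the dictionary chain.
alter text search configuration fr
    alter mapping for hword, hword_part, word
    with unaccent, french_stem;

-- Accented input is indexed under the unaccented stem.
select to_tsvector('fr', 'Hôtels de la Mer');

-- An unaccented query matches accented text.
select to_tsvector('fr', 'Hôtel de la Mer') @@ to_tsquery('fr', 'Hotels');

-- ts_headline() highlights the original, still accented, word.
select ts_headline('fr', 'Hôtel de la Mer', to_tsquery('fr', 'Hotels'));
```

For these calls the unaccent documentation reports {Hotel} from ts_lexize(), 'hotel':1 'mer':4 from to_tsvector(), t for the match, and <b>Hôtel</b> de la Mer from ts_headline(). The order of dictionaries in the mapping matters: if french_stem came first, it would accept the token itself and unaccent would never see it.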