Status: Committed to CVS HEAD
Funded by Georgia Public Library Service and LibLime, Inc.

A thesaurus is a collection of words that includes information about the relationships between words and phrases, i.e., broader terms (BT), narrower terms (NT), preferred terms, non-preferred terms (NPT), related terms, etc.
Tsearch2's thesaurus dictionary is an extension of the synonym dictionary that supports phrases. Basically, a thesaurus dictionary replaces all non-preferred terms with one preferred term and, optionally, preserves the original terms for indexing. Preserving NPTs makes it possible to use relationships (BT, NT) at query time. The thesaurus is applied during indexing, so any change to the thesaurus requires reindexing. (Don't confuse this with query rewriting, which is applied at query time; its rules can be changed online without reindexing.)
A thesaurus is a plain text file in the following format:

input word(s) : indexed word(s)
...

The colon (:) is used as the delimiter.
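To make the file format concrete, here is a minimal Python sketch of a parser for it. It is not tsearch2's own loader: `parse_thesaurus` is a hypothetical name, and skipping blank lines and `#` comment lines, as well as leaving words unnormalized (in tsearch2 the words are normalized by a subdictionary), are assumptions of this sketch.

```python
def parse_thesaurus(text):
    """Parse 'input word(s) : indexed word(s)' lines into a dict.

    Hypothetical helper: keys are tuples of input words, values are tuples
    of indexed words. Blank lines and '#' comment lines are skipped (an
    assumption of this sketch); words are not normalized.
    """
    rules = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if ":" not in line:
            raise ValueError(f"line {lineno}: missing ':' delimiter")
        # Split on the first colon only: left side = input phrase,
        # right side = what gets indexed.
        lhs, rhs = line.split(":", 1)
        inputs = tuple(lhs.split())
        outputs = tuple(rhs.split())
        if not inputs or not outputs:
            raise ValueError(f"line {lineno}: empty side of rule")
        rules[inputs] = outputs
    return rules


sample = """\
one : 1
two : 2
one two : 12
"""

rules = parse_thesaurus(sample)
print(rules)
```

Representing each phrase as a tuple keeps multi-word entries such as `one two` distinct from the single-word entries `one` and `two`.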
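Note: the thesaurus dictionary looks for the longest match! For example, if the thesaurus contains rules for both "one" and "one two", the phrase "one two" is replaced according to the "one two" rule.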
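The longest-match rule can be illustrated with a small greedy matcher. This is a hypothetical Python sketch, not tsearch2's C implementation: at each position it tries the longest known phrase first, falls back to shorter ones, and leaves unmatched words unchanged. The rules are the same as in the tz_simple example.

```python
def apply_thesaurus(tokens, rules):
    """Replace phrases in `tokens` using greedy longest-match lookup.

    `rules` maps tuples of input words to tuples of indexed words
    (hypothetical representation chosen for this sketch).
    """
    max_len = max((len(k) for k in rules), default=0)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in rules:
                out.extend(rules[phrase])
                i += n
                break
        else:
            # No rule matched at this position: keep the word as-is.
            out.append(tokens[i])
            i += 1
    return out


rules = {("one",): ("1",), ("two",): ("2",), ("one", "two"): ("12",)}
print(apply_thesaurus(["one", "two", "day"], rules))  # ['12', 'day']
print(apply_thesaurus(["one", "day"], rules))         # ['1', 'day']
```

Because "one two" is tried before "one", the input "one two" yields a single `12` rather than `1` followed by `2`.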
tsearch2 comes with a thesaurus template, thesaurus_template, which can be used to define a new thesaurus dictionary:
INSERT INTO pg_ts_dict
       (SELECT 'tz_simple', dict_init,
               'DictFile="/path/to/tz_simple.txt",'
               'Dictionary="en_stem"',
               dict_lexize
        FROM pg_ts_dict
        WHERE dict_name = 'thesaurus_template');
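Here:
  tz_simple - the name of the new thesaurus dictionary;
  DictFile="/path/to/tz_simple.txt" - the location of the thesaurus file;
  Dictionary="en_stem" - the dictionary (the Snowball English stemmer) used to normalize the words in the thesaurus.
The dict_init and dict_lexize functions are copied from thesaurus_template. The two string constants are written on separate lines because PostgreSQL concatenates adjacent string constants only when they are separated by whitespace that includes a newline.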
Now, it's possible to use tz_simple in pg_ts_cfgmap, for example:
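update pg_ts_cfgmap set dict_name='{tz_simple,en_stem}'
  where ts_name = 'default_russian'
    and tok_alias in ('lhword', 'lword', 'lpart_hword');

With this mapping, tokens of types lhword, lword and lpart_hword in the default_russian configuration are offered to tz_simple first; words that tz_simple does not recognize are passed on to en_stem.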
tz_simple:

one : 1
two : 2
one two : 12
To see how the thesaurus works, use the to_tsvector, to_tsquery or plainto_tsquery functions:
=# select plainto_tsquery('default_russian',' one day is oneday');
    plainto_tsquery
------------------------
 '1' & 'day' & 'oneday'

=# select plainto_tsquery('default_russian','one two day is oneday');
     plainto_tsquery
-------------------------
 '12' & 'day' & 'oneday'
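The two queries can be mimicked end to end with a toy Python model: words are matched against the tz_simple rules first, and anything the thesaurus does not recognize falls through to a stand-in for en_stem, mirroring the '{tz_simple,en_stem}' dictionary list. The regex tokenizer, the one-word stop-word list standing in for en_stem (which really performs Snowball stemming with its own stop-word list), and the function names are all assumptions of this sketch; only the resulting strings are meant to match the psql output.

```python
import re

# The tz_simple rules: tuple of input words -> tuple of indexed words.
THESAURUS = {("one",): ("1",), ("two",): ("2",), ("one", "two"): ("12",)}

# Hypothetical stand-in for en_stem: drops a tiny stop-word list and
# otherwise returns the word unchanged (real stemming is not modelled).
STOP_WORDS = {"is"}


def fallback(word):
    """Second dictionary in the chain; None means 'stop word, discard'."""
    return None if word in STOP_WORDS else word


def toy_plainto_tsquery(text):
    """Very rough model of plainto_tsquery with {tz_simple,en_stem}."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    max_len = max(len(k) for k in THESAURUS)
    lexemes = []
    i = 0
    while i < len(tokens):
        # Thesaurus first, longest phrase first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in THESAURUS:
                lexemes.extend(THESAURUS[phrase])
                i += n
                break
        else:
            # Not recognized by the thesaurus: pass to the next dictionary.
            lexeme = fallback(tokens[i])
            if lexeme is not None:
                lexemes.append(lexeme)
            i += 1
    # plainto_tsquery ANDs the resulting lexemes together.
    return " & ".join(f"'{lx}'" for lx in lexemes)


print(toy_plainto_tsquery(" one day is oneday"))
print(toy_plainto_tsquery("one two day is oneday"))
```

Note that "oneday" is a single token, so the thesaurus rule for "one" does not apply to it.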