Tsearch2 with UTF-8
- Get the myspell dictionary from the OpenOffice site (el_GR.zip)
Hmm, this dictionary is just a list of words without ispell tags!
OK, in the meantime I will use the available ispell dictionaries;
dict.el.utf8.tar.gz is what I used.
So, skip the conversion to ispell format.
- Convert to ispell format.
Copy the myspell archive el_GR.zip to the my2ispell subdirectory of the tsearch2 distribution.
gmake ZIPFILE=el_GR LANGUAGE=el_GR
- Convert dictionary files to UTF-8.
iconv -t utf-8 -f iso-8859-7 el_GR.aff > el_GR_utf8.aff
iconv -t utf-8 -f iso-8859-7 el_GR.dict > el_GR_utf8.dict
- Move them to some directory (/tmp/greek.utf8)
( mkdir -p /tmp/greek.utf8 && mv el_GR_utf8.aff el_GR_utf8.dict /tmp/greek.utf8 )
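The conversion step can also be wrapped in a small helper that checks the result. This is only a sketch: convert_greek is a hypothetical name, not part of tsearch2, and it assumes the source files really are ISO-8859-7 and that iconv is available.

```shell
#!/bin/sh
# convert_greek SRC DST (hypothetical helper, not shipped with tsearch2)
# Converts SRC from ISO-8859-7 to UTF-8, writes DST, and then confirms
# that DST decodes cleanly as UTF-8.
convert_greek() {
    src=$1
    dst=$2
    # Same conversion as the iconv commands in the step above.
    iconv -f iso-8859-7 -t utf-8 "$src" > "$dst" || return 1
    # Round trip through iconv: this fails if DST contains invalid UTF-8.
    iconv -f utf-8 -t utf-8 "$dst" > /dev/null
}
```

For example, "convert_greek el_GR.aff el_GR_utf8.aff && convert_greek el_GR.dict el_GR_utf8.dict" stops at the first file that fails to convert.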
- Configure postgres and the test database (make sure you have installed
PostgreSQL 8.2dev and tsearch2 from the distribution).
initdb -D /usr/local/pgsql-dev/data.el_utf8 --locale=el_GR.utf8
pg_ctl -D /usr/local/pgsql-dev/data.el_utf8 start
createdb test
psql test < /usr/local/pgsql-dev/share/contrib/tsearch2.sql
- Load tsearch2 configuration (tscfg-el.utf8.sql).
psql test < tscfg-el.utf8.sql
That's all! Now we can test the installation.
Please note that tsearch2 is
configured (tscfg-el.utf8.sql) for testing purposes only!
test=# set client_encoding='ISO-8859-7';
SET
test=# select dict_name,dict_initoption from pg_ts_dict where dict_name = 'el_ispell';
el_ispell | DictFile="tmp/greek.utf8/el_GR_utf8.dic",AffFile="tmp/greek.utf8/el.u8.aff"
test=# select * from pg_ts_cfgmap where ts_name='utf8';
ts_name | tok_alias | dict_name
---------+--------------+-------------
utf8 | nlhword | {el_ispell}
utf8 | nlword | {el_ispell}
utf8 | nlpart_hword | {el_ispell}
(3 rows)
test=# select lexize('el_ispell','αγαπηθείτε');
lexize
-----------
{αγαπηθώ}
(1 row)
test=# select lexize('el_ispell','τελευταίως');
lexize
--------------
{τελευταίος}
(1 row)