unaccent

This module provides the unaccent text search dictionary and the unaccent() function, both of which remove accents from input text.

The unaccent dictionary is a filtering dictionary: its output is always passed to the next dictionary (if any), whereas a standard dictionary ends the lookup once it recognizes a token. It currently supports the most important accents from European languages. To change the set of accents, edit the accents.src file, which should be UTF-8 encoded.

Compatibility: PostgreSQL version 8.4+

Installation:

  cd unaccent && make && make install
  psql DB_NAME < unaccent.sql

Examples:

1. The input contains no accents, so the unaccent dictionary does nothing and returns NULL. (The lexeme 'Hotels' is passed unchanged to the next dictionary, if any.)

=# select ts_lexize('unaccent','Hotels') is NULL;
 ?column? 
----------
 t
(1 row)

2. The unaccent dictionary removes the accent and returns 'Hotel'. (The lexeme 'Hotel' is passed to the next dictionary, if any.)

=# select ts_lexize('unaccent','Hôtel') is NULL;
 ?column? 
----------
 f
(1 row)

3. Simple configuration for the French language

CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
ALTER TEXT SEARCH CONFIGURATION fr
	ALTER MAPPING FOR hword, hword_part, word
	WITH unaccent, french_stem;

=# select to_tsvector('fr','Hôtels de la Mer');
    to_tsvector    
-------------------
 'hotel':1 'mer':4
(1 row)

'Hôtels' --(unaccent)--> 'Hotels' --(french_stem)--> 'hotel'


=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
 ?column? 
----------
 t
(1 row)

=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
      ts_headline       
------------------------
  <b>Hôtel</b> de la Mer
(1 row)

Functions:

	text unaccent(text) - removes accents from the input text

=# select unaccent('Hôtels');
 unaccent 
----------
 Hotels
(1 row)
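
As a minimal sketch of how the unaccent() function might be used outside full text search, the query below compares strings while ignoring both accents and case. The inline VALUES list and the column name "name" are hypothetical sample data, not part of the module; the sketch assumes the module is installed in the current database.

```sql
-- Hypothetical sample rows; replace the VALUES list with a real table.
SELECT name
FROM (VALUES ('Hôtel'), ('Hotel'), ('Motel')) AS t(name)
-- Strip accents and fold case on both sides before comparing.
WHERE lower(unaccent(name)) = lower(unaccent('hôtel'));
```

With the accent rules shown in the examples above, this should match 'Hôtel' and 'Hotel' but not 'Motel'.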