Full-Text Search in PostgreSQL: A Gentle Introduction
Chapter 2. FTS Operators and Functions
The function ts_debug allows easy testing of your full-text configuration.

ts_debug( [cfgname | oid ], document TEXT) RETURNS SETOF tsdebug

It displays information about every token of document as produced by the parser and processed by the dictionaries defined in the configuration specified by cfgname or oid.

The tsdebug type is defined as
CREATE TYPE tsdebug AS (
"Alias" text,
"Description" text,
"Token" text,
"Dicts list" text[],
"Lexized token" text
);

To demonstrate how ts_debug works, we first create the public.english configuration and an ispell dictionary for the English language. You may skip this step and play with the standard english configuration.
CREATE FULLTEXT CONFIGURATION public.english LIKE pg_catalog.english WITH MAP AS DEFAULT;
CREATE FULLTEXT DICTIONARY en_ispell
OPTION 'DictFile="/usr/local/share/dicts/ispell/english-utf8.dict",
AffFile="/usr/local/share/dicts/ispell/english-utf8.aff",
StopFile="/usr/local/share/dicts/english.stop"'
LIKE ispell_template;
ALTER FULLTEXT MAPPING ON public.english FOR lword WITH en_ispell,en_stem;

=# select * from ts_debug('public.english','The Brightest supernovaes');
Alias | Description | Token | Dicts list | Lexized token
-------+---------------+-------------+---------------------------------------+---------------------------------
lword | Latin word | The | {public.en_ispell,pg_catalog.en_stem} | public.en_ispell: {}
blank | Space symbols | | |
lword | Latin word | Brightest | {public.en_ispell,pg_catalog.en_stem} | public.en_ispell: {bright}
blank | Space symbols | | |
lword | Latin word | supernovaes | {public.en_ispell,pg_catalog.en_stem} | pg_catalog.en_stem: {supernova}
(5 rows)

In this example, the word Brightest was recognized by the parser as a Latin word (alias lword) and passed through the dictionaries public.en_ispell and pg_catalog.en_stem. It was recognized by public.en_ispell, which reduced it to its normal form, bright. The word supernovaes is unknown to the public.en_ispell dictionary, so it was passed to the next dictionary and, fortunately, was recognized there. In fact, pg_catalog.en_stem is a stemming dictionary and recognizes everything, which is why it is placed at the end of the dictionary stack.

The word The was recognized by the public.en_ispell dictionary as a stop word (Section 1.3.6) and will not be indexed.
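The following small Python model reproduces the rule just described. It is a sketch under stated assumptions, not PostgreSQL's implementation. The parser is a toy regular expression that knows only Latin words and spaces. toy_ispell and toy_stem are hypothetical stand-ins for public.en_ispell and pg_catalog.en_stem, with a hard-coded word list and a crude suffix rule. Each token is offered to the dictionaries mapped to its type, in order. A dictionary that does not know the word returns None and passes it on. The first dictionary that returns a list wins, and an empty list marks a stop word.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# A dictionary returns None ("unknown, try the next one"), [] (stop word),
# or a list of normalized lexemes.
Dictionary = Callable[[str], Optional[list[str]]]


@dataclass
class TsDebugRow:
    # Mirrors the columns of the tsdebug composite type.
    alias: str
    description: str
    token: str
    dicts_list: list[str]
    lexized_token: str


def toy_ispell(token: str) -> Optional[list[str]]:
    # Hypothetical stand-in for public.en_ispell: tiny hard-coded vocabulary.
    word = token.lower()
    if word in {"the"}:
        return []  # stop word: recognized, but produces no lexemes
    return {"brightest": ["bright"]}.get(word)  # None if unknown


def toy_stem(token: str) -> Optional[list[str]]:
    # Hypothetical stand-in for pg_catalog.en_stem: like a stemmer, it accepts
    # every word (never returns None); the suffix rule is deliberately crude.
    word = token.lower()
    if word.endswith("es"):
        return [word[:-2]]
    if word.endswith("s"):
        return [word[:-1]]
    return [word]


# Token type -> ordered dictionary stack, like the ALTER FULLTEXT MAPPING above.
MAPPING: dict[str, list[tuple[str, Dictionary]]] = {
    "lword": [("public.en_ispell", toy_ispell), ("pg_catalog.en_stem", toy_stem)],
}


def toy_parse(document: str) -> list[tuple[str, str, str]]:
    # Toy parser: emits (alias, description, token) for words and spaces only.
    tokens = []
    for match in re.finditer(r"[A-Za-z]+|\s+", document):
        text = match.group()
        if text.isspace():
            tokens.append(("blank", "Space symbols", text))
        else:
            tokens.append(("lword", "Latin word", text))
    return tokens


def toy_ts_debug(document: str) -> list[TsDebugRow]:
    rows = []
    for alias, description, token in toy_parse(document):
        chain = MAPPING.get(alias, [])  # unmapped types (blank) get no dictionaries
        lexized = ""
        for name, dictionary in chain:
            lexemes = dictionary(token)
            if lexemes is not None:  # first dictionary that recognizes the token wins
                lexized = f"{name}: {{{','.join(lexemes)}}}"
                break
        rows.append(TsDebugRow(alias, description, token, [n for n, _ in chain], lexized))
    return rows


for row in toy_ts_debug("The Brightest supernovaes"):
    print(row.alias, repr(row.token), row.lexized_token)
```

Running it yields the same five rows as the ts_debug output above. Because toy_stem never declines a word, any dictionary placed after it would never be consulted. This is the same reason the real stemmer sits at the end of the stack.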
You can always specify explicitly which columns you want to see:
=# select "Alias", "Token", "Lexized token"
from ts_debug('public.english','The Brightest supernovaes');
Alias | Token | Lexized token
-------+-------------+---------------------------------
lword | The | public.en_ispell: {}
blank | |
lword | Brightest | public.en_ispell: {bright}
blank | |
lword | supernovaes | pg_catalog.en_stem: {supernova}
(5 rows)
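If a client program consumes the Lexized token column instead of a person reading it, you may want to split each value into the dictionary name and its lexemes. The helper parse_lexized below is hypothetical. It assumes exactly the dictionary: {lexeme,...} layout shown in the output above and does not handle quoted or escaped array elements, so treat it as a sketch.

```python
import re
from typing import Optional

# Hypothetical helper. Assumes the "<dictionary>: {<lexeme>,...}" layout seen
# in the ts_debug output above; quoted or escaped array elements are not handled.
LEXIZED_RE = re.compile(r"^\s*(?P<dict>[^:]+?)\s*:\s*\{(?P<lexemes>[^}]*)\}\s*$")


def parse_lexized(value: str) -> Optional[tuple[str, list[str]]]:
    """Split 'public.en_ispell: {bright}' into ('public.en_ispell', ['bright'])."""
    if not value.strip():
        return None  # blank tokens have no dictionary and no lexemes
    match = LEXIZED_RE.match(value)
    if match is None:
        raise ValueError(f"unexpected Lexized token format: {value!r}")
    raw = match.group("lexemes").strip()
    lexemes = [lex.strip() for lex in raw.split(",")] if raw else []
    return match.group("dict"), lexemes


# (Alias, Token, Lexized token) rows copied from the query output above.
rows = [
    ("lword", "The", "public.en_ispell: {}"),
    ("blank", " ", ""),
    ("lword", "Brightest", "public.en_ispell: {bright}"),
    ("blank", " ", ""),
    ("lword", "supernovaes", "pg_catalog.en_stem: {supernova}"),
]

for alias, token, lexized in rows:
    parsed = parse_lexized(lexized)
    if parsed is None:
        continue  # skip blanks
    dictionary, lexemes = parsed
    # An empty lexeme list means the dictionary treated the token as a stop word.
    status = "stop word" if not lexemes else ", ".join(lexemes)
    print(f"{token}: {dictionary} -> {status}")
```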