| Full-Text Search in PostgreSQL: A Gentle Introduction | ||||
|---|---|---|---|---|
| Prev | Fast Backward | Chapter 2. FTS Operators and Functions | Fast Forward | Next |
CREATE FUNCTION parse( parser, document TEXT ) RETURNS SETOF tokenout
Parses the given document and returns a series of records, one for each token produced by parsing. Each record includes a tokid giving its type and a token which gives its content.
postgres=# select * from parse('default','123 - a number');
tokid | token
-------+--------
22 | 123
12 |
12 | -
1 | a
12 |
1 | numberCREATE FUNCTION token_type( parser ) RETURNS SETOF tokentype
Returns a table which defines and describes each kind of token the parser may produce as output. For each token type the table gives the tokid which the parser will label each token of that type, the alias which names the token type, and a short description for the user to read.
postgres=# select * from token_type('default');
tokid | alias | description
-------+--------------+-----------------------------------
1 | lword | Latin word
2 | nlword | Non-latin word
3 | word | Word
4 | email | Email
5 | url | URL
6 | host | Host
7 | sfloat | Scientific notation
8 | version | VERSION
9 | part_hword | Part of hyphenated word
10 | nlpart_hword | Non-latin part of hyphenated word
11 | lpart_hword | Latin part of hyphenated word
12 | blank | Space symbols
13 | tag | HTML Tag
14 | protocol | Protocol head
15 | hword | Hyphenated word
16 | lhword | Latin hyphenated word
17 | nlhword | Non-latin hyphenated word
18 | uri | URI
19 | file | File or path name
20 | float | Decimal notation
21 | int | Signed integer
22 | uint | Unsigned integer
23 | entity | HTML Entity