This is a completely rewritten parser for tsearch2 with full UTF8 support. The parser uses a finite-state automaton technique and is expected to be flexible and compatible with the old tsearch2 parser (with some errors fixed).
A list of current issues in the parser (available from CVS HEAD):
Multiple consecutive slashes ('////'): broken
test=# select * from parse('~//downloads////qq');
 tokid |   token
-------+------------
    12 | ~
    12 | /
    19 | /downloads
    12 | /
    12 | /
    12 | /
    19 | /qq
(7 rows)
'_' is treated as a space symbol
test=# select * from parse('a_b_c');
 tokid | token
-------+-------
     1 | a
    12 | _
     1 | b
    12 | _
     1 | c
XHTML tag: broken (FIXED)
test=# select * from parse('<br/>');
 tokid | token
-------+-------
    12 | <
     1 | br
    12 | />
word…: broken (FIXED)
test=# select * from parse('etc...');
 tokid | token
-------+-------
    19 | etc..
    12 | .
~ in path: broken (FIXED)
test=# select * from parse('~/downloads/Harry_Potter.avi');
 tokid |            token
-------+-----------------------------
    12 | ~
    19 | /downloads/Harry_Potter.avi
version: broken (FIXED)
test=# select * from parse('-1.2.3');
 tokid | token
-------+-------
    20 | -1.2
    12 | .
    22 | 3
but see below:
test=# select * from parse('version-1.2.3');
 tokid |     token
-------+---------------
    15 | version-1.2.3
    11 | version
    12 | -
     8 | 1.2.3
Backslash(\) handling: broken (BRR)
select * from parse('a \ b ');
 tokid | token
-------+-------
     1 | a
    12 |
     1 | b
    12 |
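To illustrate the finite-state idea and the '_' case listed above, here is a minimal toy sketch in Python. It is not the tsearch2 parser and does not reflect its actual state table. The names `toy_parse` and `classify` are hypothetical, and only two token classes are modelled. The ids 1 (word) and 12 (space symbol) are taken from the tokid values in the outputs on this page. Emitting every space symbol as a separate token is a simplifying assumption.

```python
# Toy finite-state tokenizer: an illustration only, NOT the tsearch2 code.
# WORD and BLANK reuse the tokid values 1 and 12 seen in the outputs above;
# every function name here is hypothetical.

WORD, BLANK = 1, 12


def classify(ch):
    """Map a character to an automaton state.

    Letters continue a word; everything else, including '_',
    is treated as a space symbol.
    """
    return WORD if ch.isalpha() else BLANK


def toy_parse(text):
    """Return a list of (tokid, token) pairs for `text`.

    Consecutive letters are merged into one word token. Each space
    symbol is emitted as its own token (a simplifying assumption).
    """
    tokens = []
    state = None  # current automaton state; None means "at start"
    start = 0     # index where the current token began
    for i, ch in enumerate(text):
        nxt = classify(ch)
        # Leaving the current state, or seeing another space symbol,
        # closes the token collected so far.
        if state is not None and (nxt != state or nxt == BLANK):
            tokens.append((state, text[start:i]))
            start = i
        state = nxt
    if state is not None:
        tokens.append((state, text[start:]))
    return tokens


# Print the tokens in a layout similar to the psql output above.
for tokid, token in toy_parse("a_b_c"):
    print(f"{tokid:>6} | {token}")
```

For 'a_b_c' this sketch yields the same five (tokid, token) pairs as the parser output shown for that input. It makes no attempt to handle paths, versions, tags, or the other token types that appear on this page.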