tsearch2UTF8Test

Testing new tsearch2 parser with full UTF8 support

This is a completely rewritten parser for tsearch2 with full UTF8 support. The parser uses a finite-state automaton technique and is expected to be flexible and compatible with the old tsearch2 parser (with some errors fixed).
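To make the finite-state idea concrete, here is a minimal toy sketch in Python. It is not the tsearch2 parser and does not reproduce its token types: the state names (`WORD`, `NUMBER`, `BLANK`) and the functions `classify` and `tokenize` are hypothetical, chosen only for illustration. The machine keeps one current state, switches state when the character class changes, and emits a token on every switch.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical states for illustration; not tsearch2 token types.
    WORD = auto()
    NUMBER = auto()
    BLANK = auto()


def classify(ch):
    """Map one character to the state that consumes it."""
    if ch.isalpha():   # Unicode-aware, so non-ASCII letters count as letters
        return State.WORD
    if ch.isdigit():
        return State.NUMBER
    return State.BLANK  # everything else, including '_' and '/'


def tokenize(text):
    """Split text into (state name, token) pairs using a tiny state machine."""
    tokens = []
    state = None  # no state before the first character
    start = 0     # index where the current token begins
    for i, ch in enumerate(text):
        nxt = classify(ch)
        # A change of state ends the current token and starts a new one.
        if state is not None and nxt is not state:
            tokens.append((state.name, text[start:i]))
            start = i
        state = nxt
    if state is not None:
        # Flush the token that was still open at end of input.
        tokens.append((state.name, text[start:]))
    return tokens
```

With this toy machine, `tokenize('a_b_c')` returns `[('WORD', 'a'), ('BLANK', '_'), ('WORD', 'b'), ('BLANK', '_'), ('WORD', 'c')]`, because `'_'` is not a letter and therefore falls into the blank class. A real parser would need many more states (for paths, versions, tags and so on), which is where issues like the ones listed on this page tend to come from.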

A list of current issues in the parser (available from CVS HEAD).

Multiple consecutive slashes ('////'): broken

test=# select * from parse('~//downloads////qq');
 tokid |   token    
-------+------------
    12 | ~
    12 | /
    19 | /downloads
    12 | /
    12 | /
    12 | /
    19 | /qq
(7 rows)

We treat '_' as a space symbol

test=# select * from parse('a_b_c');
 tokid | token 
-------+-------
     1 | a
    12 | _
     1 | b
    12 | _
     1 | c

XHTML tag: broken (FIXED)

test=# select * from parse('<br/>');
 tokid | token 
-------+-------
    12 | <
     1 | br
    12 | />

word...: broken (FIXED)

test=# select * from parse('etc...');
 tokid | token 
-------+-------
    19 | etc..
    12 | .

~ in path: broken (FIXED)

test=# select * from parse('~/downloads/Harry_Potter.avi');
 tokid |            token            
-------+-----------------------------
    12 | ~
    19 | /downloads/Harry_Potter.avi

version: broken (FIXED)

test=# select * from parse('-1.2.3');
 tokid | token 
-------+-------
    20 | -1.2
    12 | .
    22 | 3

but see below:

test=# select * from parse('version-1.2.3');
 tokid |     token     
-------+---------------
    15 | version-1.2.3
    11 | version
    12 | -
     8 | 1.2.3

Backslash (\) handling: broken (BRR)

test=# select * from parse('a \ b ');
 tokid | token 
-------+-------
     1 | a
    12 |   
     1 | b
    12 |