2009-07-27

Rb-tree and GIN again !

Yesterday I again was beaten by GIN problem (PostgreSQL 8.4) with sorted data, see my post about rbtree. I tried to index following table, which represents interlinks between wikipedia articles.

wplinks=# \d links2
     Table "public.links2"
 Column |   Type    | Modifiers
--------+-----------+-----------
 id     | integer   |
 idout  | integer[] |
 idin   | integer[] |

To my surprise timings for indexes creation were very different - ~ 6x !

wplinks=# vacuum analyze links2;
VACUUM
Time: 65080.606 ms
wplinks=# create index idout_idx on links2 using gin(idout);
CREATE INDEX
Time: 6033572.537 ms
wplinks=# create index idin_idx on links2 using  gin(idin);
vacuum analyze;
CREATE INDEX
Time: 35946958.226 ms

Today, I tried rbtree version of GIN and got very nice result - 18x better creation time for idin and almost 4x better than old idout.

wplinks=# create index idout_rbtree_idx on links2 using  gin(idout);
CREATE INDEX
Time: 1560819.955 ms
wplinks=# create index idin_rbtree_idx on links2 using  gin(idin);
CREATE INDEX
Time: 1994567.418 ms

Notice, that 8.4 already have Tom's hack to protect GIN against skewed data. Certainly, we need rbtree in PostgreSQL !