Gevel

Gevel

Gevel contrib module provides several functions useful for analyzing GIN and GiST indexes.

Online version of this document is available http://www.sai.msu.su/~megera/wiki/Gevel

Caution: This module was designed for developers of GiST/GIN based indices !

Authors

  • Oleg Bartunov <oleg@sai.msu.su>, Moscow, Moscow University, Russia
  • Teodor Sigaev <teodor@sigaev.ru>, Moscow, Moscow University,Russia

License

Stable version, included into PostgreSQL distribution, released under BSD license. Development version, available from this site, released under the GNU General Public License, version 2 (June 1991)

Downloads

Old version of gevel is available from http://www.sai.msu.su/~megera/postgres/gist/gevel/

Development version is available from CVS:

cvs -d :pserver:anoncvs@sigaev.ru:/root login 
(just enter an empty password)
cvs -d :pserver:anoncvs@sigaev.ru:/root co gevel

Installation

% cd PGSQLSRC/contrib
% tar xzvf gevel.tar.gz
% make
% make install
% make installcheck
% psql regression < gevel.sql

Examples

All examples below require installing contrib/rtree_gist module and already installed gevel module.

cd contrib/rtree_gist
make
make install
make installcheck
psql regression < gevel.sql

Now you have regression database with two tables and indices:

regression=# \pset tuples_only 1
Showing only tuples.
regression=#  \d
          List of relations
 Schema |  Name   | Type  |  Owner   
--------+---------+-------+----------
 public | boxtmp  | table | postgres
 public | polytmp | table | postgres
(2 rows)
regression=# \di
             List of relations
 Schema | Name | Type  |  Owner   |  Table  
--------+------+-------+----------+---------
 public | bix  | index | postgres | boxtmp
 public | pix  | index | postgres | polytmp
(2 rows)

  • gist_stat(INDEXNAME) - show some statistics about GiST tree
regression=# select gist_stat('pix');
 Number of levels:          2
Number of pages:           30
Number of leaf pages:      29
Number of tuples:          3129
Number of leaf tuples:     3100
Total size of tuples:      137676 bytes
Total size of leaf tuples: 136400 bytes
Total size of index:       245760 bytes
  • gist_tree(INDEXNAME,MAXLEVEL) - show GiST tree up to MAXLEVEL
regression=# select gist_tree('pix',0);
 0(l:0) blk: 0 numTuple: 29 free: 6888b(15.63%)

Designation (from left to right):

    • 0 - page number
    • (l:0) - tree level
    • blk: 0 - block number
    • numTuple: 29 - the number of tuples
    • free: 6888b - free space in bytes
    • (15.63%) - occupied space in percents
  • gist_tree(INDEXNAME) - show full GiST tree
regression=# select gist_tree('pix');
 0(l:0) blk: 0 numTuple: 29 free: 6888b(15.63%)
    1(l:1) blk: 13 numTuple: 180 free: 244b(97.01%)
    2(l:1) blk: 11 numTuple: 175 free: 464b(94.32%)
    3(l:1) blk: 2 numTuple: 101 free: 3720b(54.43%)
    4(l:1) blk: 17 numTuple: 111 free: 3280b(59.82%)
    5(l:1) blk: 18 numTuple: 101 free: 3720b(54.43%)
    6(l:1) blk: 10 numTuple: 98 free: 3852b(52.82%)
    7(l:1) blk: 19 numTuple: 111 free: 3280b(59.82%)
    8(l:1) blk: 9 numTuple: 97 free: 3896b(52.28%)
    9(l:1) blk: 20 numTuple: 104 free: 3588b(56.05%)
    10(l:1) blk: 14 numTuple: 96 free: 3940b(51.74%)
    11(l:1) blk: 21 numTuple: 106 free: 3500b(57.13%)
    12(l:1) blk: 7 numTuple: 103 free: 3632b(55.51%)
    13(l:1) blk: 1 numTuple: 101 free: 3720b(54.43%)
    14(l:1) blk: 16 numTuple: 97 free: 3896b(52.28%)
    15(l:1) blk: 24 numTuple: 103 free: 3632b(55.51%)
    16(l:1) blk: 4 numTuple: 98 free: 3852b(52.82%)
    17(l:1) blk: 25 numTuple: 98 free: 3852b(52.82%)
    18(l:1) blk: 3 numTuple: 97 free: 3896b(52.28%)
    19(l:1) blk: 26 numTuple: 96 free: 3940b(51.74%)
    20(l:1) blk: 6 numTuple: 103 free: 3632b(55.51%)
    21(l:1) blk: 8 numTuple: 162 free: 1036b(87.31%)
    22(l:1) blk: 23 numTuple: 94 free: 4028b(50.66%)
    23(l:1) blk: 12 numTuple: 82 free: 4556b(44.19%)
    24(l:1) blk: 27 numTuple: 105 free: 3544b(56.59%)
    25(l:1) blk: 5 numTuple: 90 free: 4204b(48.51%)
    26(l:1) blk: 28 numTuple: 100 free: 3764b(53.90%)
    27(l:1) blk: 22 numTuple: 101 free: 3720b(54.43%)
    28(l:1) blk: 15 numTuple: 95 free: 3984b(51.20%)
    29(l:1) blk: 29 numTuple: 95 free: 3984b(51.20%)
  • gist_print(INDEXNAME) - prints objects stored in GiST tree, works only if objects in index have textual representation (type_out functions should be implemented for given object type). It's known to work with R-tree GiST based index (contrib/rtree_gist). Note, in example below, objects are of type box.
# \di 
             List of relations
 Schema | Name | Type  |  Owner   |  Table  
--------+------+-------+----------+---------
 public | bix  | index | postgres | boxtmp
 public | pix  | index | postgres | polytmp
(2 rows)
# \d pix
Index "public.pix"
 Column | Type 
--------+------
 p      | box
gist, for table "public.polytmp"

# select * from gist_print('pix') as t(level int, a box) where level =1;

for 8.1 and later this query should looks like

# select * from gist_print('pix') as t(level int, valid bool, a box) where level =1;
 level | valid |              a
-------+-------+-----------------------------
     1 | t     | (37357,50073),(34242,357)
     1 | t     | (43499,49770),(40358,43)
     1 | t     | (31193,24679),(25047,12410)
     1 | t     | (31018,12142),(25083,6)
     1 | t     | (49944,25174),(43471,12802)
     1 | t     | (12577,49757),(6302,37534)
     1 | t     | (12528,37333),(6171,24861)
     1 | t     | (50027,49751),(46817,25462)
     1 | t     | (46870,49912),(43664,25722)
     1 | t     | (24855,25574),(12447,19263)
     1 | t     | (25054,19126),(12403,12796)
     1 | t     | (32737,49923),(31178,1038)  
     1 | t     | (3184,24465),(15,81)
     1 | t     | (24951,49983),(12740,44000)
     1 | t     | (24919,43956),(12617,37901)
     1 | t     | (40387,49852),(37338,25217)
     1 | t     | (40325,24963),(37375,491)  
     1 | t     | (24919,12698),(12654,6518)
     1 | t     | (25002,6338),(12350,51)   
     1 | t     | (49985,12554),(43447,222)  
     1 | t     | (25003,37769),(12552,25573)
     1 | t     | (34270,49382),(32763,594)  
     1 | t     | (6205,50012),(3,37527)
     1 | t     | (6163,37358),(120,25034)  
     1 | t     | (12343,24542),(9295,294)
     1 | t     | (9308,24151),(6234,620)
     1 | t     | (6230,24629),(3169,108)
     1 | t     | (31179,50040),(28113,25556)
     1 | t     | (28048,49694),(25000,25000)
(29 rows)

  • Using Gevel module it's possible to visualize index tree, see for example Rtree Index.
  • gin_stat(INDEXNAME) prints statistics about each index entry in GIN index (PostgreSQL 8.2+)

For each index entry 'key' it returns set of records (key, nrows), where 'nrows' is the estimated number of rows from tables, returned by select. For example, in text search 'nrows' is the number of documents, which contains word 'key'. This statistics may be used ONLY as estimation of true counts, since it doesn't takes into account the visibility information (from heap) and uses approximate counts in case of long posting lists.

Vacuum table after table modification to obtain more accurate statistics.

Top-5 most frequent words (from 463873) :

=# SELECT * FROM gin_stat('gin_idx') as t(word text, ndoc int)  order by ndoc desc limit 5;
  word  |  ndoc
--------+--------
 page   | 340858
 figur  | 240366
 use    | 148022
 model  | 134442
 result | 129010
(5 rows)

Time: 520.714 ms

Compare exact stats, obtained using ts_stat() function, with estimated counts:

=# select a.word, b.ndoc as exact, a.estimation as estimation, 
round ( (a.estimation-b.ndoc)*100.0/a.estimation,2)||'%' as error from
(SELECT * FROM gin_stat('gin_x_idx') as t(word text, estimation int)
 order by estimation desc limit 5 ) as a, stat b where a.word = b.word;
  word  | exact  | estimation | error 
--------+--------+------------+-------
 page   | 340430 |     340858 | 0.13%
 figur  | 240104 |     240366 | 0.11%
 use    | 147132 |     148022 | 0.60%
 model  | 133444 |     134442 | 0.74%
 result | 128977 |     129010 | 0.03%
(5 rows)

Time: 550.562 ms

Example for integer arrays:

# SELECT * FROM gin_stat('gin_idx') as t(value int, nrow int) where nrow > 250;
  value | nrow
 -------+------
     31 |  254
     47 |  251
     52 |  257
     59 |  259
 (4 rows)