GinTestUpdate

Last edit

Changed:

< update tt set a=array( select(1000*random())::integer+generate_series( 0,200) )

to

> update tt set a=array( select(1000*random())::integer+generate_series( 0,200+id) )


Testing GIN update performance

For integer arrays GIN is 10x slower than GiST on update. It's about 10x faster than GiST on select. Updating of 100 rows in 100,000 takes < 0.5s for GIN index, which is quite reasonable speed.

Test environment

Linux 2.6.19.1, Slackware, x86, Intel(R) Pentium(R) 4 CPU 2.40GHz, 2Gb RAM.

PostgreSQL 8.2.4

  • shared_buffers = 256MB
  • work_mem = 8MB
  • maintenance_work_mem = 64MB
  • effective_cache_size = 256MB

Test table tt(id integer, a int[]) with GIN index created

test=# \d tt 
       Table "public.tt"
 Column |   Type    | Modifiers 
--------+-----------+-----------
 id     | integer   | 
 a      | integer[] | 
Indexes:
    "gin_idx" gin (a)

For creating GiST index

create index gist_idx on tt using gist(a gist__intbig_ops);

Load data

Example of populating a table - 10000 rows, with integer arrays cardinality 500 and maximal array length of 100.

sh ./test-id.sh  -l 100 -n 10000 -v 500 | psql test -c "copy tt from stdin with delimiter as '|'"

Example of update SQL - update first 300 rows

update tt set a=array( select(1000*random())::integer+generate_series( 0,200+id) ) 
where id < 300;

Results

Update array column by random array of length 200, vary #rows ( All timings in ms !)

Low cardinality (500)

-l 100 -n 10000 -v 500

Create index:

  • GIN 2804.709
  • GiST 1458.526
  • #updates, GIN, GiST
  • 10 - 28.916, 21.273
  • 100 - 453.513, 28.504
  • 1000 - 2824.042, 159.474
  • 1000 - 27909.337, 2092.837

-l 100 -n 100000 -v 500

Create index:

  • GIN 7669.875
  • GiST 1458.526
  • 10 - 94.637, 54.870
  • 100 - 473.569, 69.470
  • 1000 - 2407.273, 243.742
  • 10000 - 30856.287, 2155.403

Big cardinality (5000)

-l 100 -n 10000 -v 5000

Create index:

  • GIN 2303.915
  • GiST 1167.952
  • #updates, GIN, GiST
  • 10 - 40.031, 17.197
  • 100 - 194.322, 26.901
  • 1000 - 2265.735, 172.825
  • 1000 - 32416.623, 2054.121

-l 100 -n 100000 -v 5000

Create index (real):

  • GIN 9746.955
  • GiST 13600.256
  • 10 - 158.897, 47.338
  • 100 - 250.516, 65.378
  • 1000 - 2871.439, 425.907
  • 10000 - 31537.869, 2538.468

Very Big cardinality (100000)

Simulate typical news archive - 100,000 documents with length < 500 words and dictionary size 100,000 words.

-l 500 -n 100000 -v 100000

Create index:

  • GIN 146765.140
  • GiST 12721.421
  • #updates, GIN, GiST
  • 10 - 160.569, 72.390
  • 100 - 543.285, 83.026
  • 1000 - 3702.067, 284.039
  • 1000 - 32571.510, 2969.296

Search performance:

explain analyze select * from tt where a @> '{2}';

GiST:

 Bitmap Heap Scan on tt  (cost=29.27..405.87 rows=100 width=979) (actual time=68.714..6311.757 rows=678 loops=1)
   Filter: (a @> '{2}'::integer[])
   ->  Bitmap Index Scan on gist_idx  (cost=0.00..29.24 rows=100 width=0) (actual time=43.430..43.430 rows=10774 loops=1)
         Index Cond: (a @> '{2}'::integer[])
 Total runtime: 6312.807 ms

GIN:

 Bitmap Heap Scan on tt  (cost=150.50..527.11 rows=100 width=982) (actual time=0.610..11.277 rows=678 loops=1)
   Recheck Cond: (a @> '{2}'::integer[])
   ->  Bitmap Index Scan on gin_idx  (cost=0.00..150.48 rows=100 width=0) (actual time=0.433..0.433 rows=678 loops=1)
         Index Cond: (a @> '{2}'::integer[])
 Total runtime: 11.892 ms

GiST ia a way slower than GIN index, because GiST found 10774 documents where only 678 documents are right answer and remaining are false drops !

Support script

Скрипт test-id.sh for generating test load. It's not optimized and could be horribly slow !

#!/bin/sh

# default values
# array length
MAXLEN=4
# number of rows
MAXNUM=10
# cardinality
MAXVAL=100

while getopts l:n:v: opt
do
        case "$opt" in
                l) MAXLEN="$OPTARG";;
                n) MAXNUM="$OPTARG";;
                v) MAXVAL="$OPTARG";;
        esac
done

for ((i=0;i<$MAXNUM;i++)) do
 len=$(( $RANDOM % $MAXLEN))+1
 for ((j=0;j<$len;j++)) do
         if [ $j == '0' ]; then
                val=$(( $RANDOM % $MAXVAL))
         else
                val=$val,$(( $RANDOM % $MAXVAL))
         fi
 done
 echo $i\|{$val}
done