GinFastUpdate

GIN Fast Update

The idea of GIN fast update is to accumulate new index rows in separate location and build index at vacuum time. That should greatly improve insert time, but increase search time, since search should lookup both that location and regular inverted index.

New option for CREATE INDEX - fastupdate={[true|1|on]|false|0|off}

Testing environment - GIN index on int[]:

  • # uniq elements - 500,000
  • # array length - random value from range [50-100]
  • # number of rows - 1,000,000
postgres=# \d tt
       Table "public.tt"
 Column |   Type    | Modifiers
--------+-----------+-----------
 id     | integer   |
 a      | integer[] |
Indexes:
    "gin_idx" gin (a)

postgres=# \d tt2
       Table "public.tt2"
 Column |   Type    | Modifiers
--------+-----------+-----------
 id     | integer   |
 a      | integer[] |
Indexes:
    "gin_idx_nfu" gin (a) WITH (fastupdate=false)

One can consider this as emulation of blog with a rich dictionary. We emulate real-time environment using concurrent clients, which read and creates new documents. To improve performance 'vacuum analyze' runs after every 50 modifications (inserts/updates). Parent process forks clients, which run different queries (depending on given weights, for example, read/insert = 100/1), and collects stats from childs. Also, it periodically runs 'vacuum analyze' and prints global stats every 1000 operations.

Script runs forever, typical output looks like (times in sec):

select/insert/update = 1:0.01:0
Total   = 700139:counts: 693220 6919 0 139
TotalDay= 75180250
Elapse= 804.626334:Sum= 8096.54966000014:7736.28839500003 176.893917 0 183.367348

TotalDay is the estimation of total number of queries per day. counts: 634872 6254 0 126 shows counters for select/insert/update/vacuum operations. Elapse is the time (s) for running script, Sum is the sum of all individual timings for selects,inserts,updates and vaccuum.

Environment: My desktop machine with 4GB RAM, Core2Duo 2.66GHz, 2.6.24.3-smp, busy firefox (30+ tabs), icewm, mplayer running, PostgreSQL CVS head + GIN fastupdate v. 0.13.

Summary:

select/insert/update = 1:0.01:0
Total   = 700139:counts: 693220 6919 0 139
TotalDay= 75180250
Elapse= 804.626334:Sum= 8096.54966000014:7736.28839500003 176.893917 0 183.367348

Total   = 700133:counts: 693500 6633 0 133
TotalDay= 80674016
Elapse= 749.826202:Sum= 7592.16917599985:7280.20674199988 168.201272 0 143.761162
------------------------------------------------------------------------
select/insert/update = 1:0.1:0
Total   = 701276:counts: 637520 63756 0 1276
TotalDay= 32229391
Elapse= 1879.968669:Sum= 15753.7955909996:12785.8333579997 1459.52234 0 1508.439893

Total   = 701273:counts: 637660 63613 0 1273
TotalDay= 32900288
Elapse= 1841.624821:Sum= 14951.7257170003:11652.6948879999 1838.66890699999 0 1460.361922
------------------------------------------------------------------------
select/insert/update = 1:1:0
Total   = 707177:counts: 348360 358817 0 7177
TotalDay= 9119596
Elapse= 6699.868084:Sum= 13777.8184119997:5163.04862900011 2082.48855400007 0 6532.281229

Total   = 707171:counts: 348620 358551 0 7171
TotalDay= 9095799
Elapse= 6717.339423:Sum= 14320.8554989999:4774.93323400009 3002.11292499999 0 6543.80934000001
-------------------------------------------------------------------------
select/insert/update = 1:10:0
Total   = 712989:counts: 63540 649449 0 12989
TotalDay= 4087770
Elapse= 15069.889573:Sum= 18328.8505179987:2181.982464 1139.24994300097 0 15007.618111

Total   = 712970:counts: 64460 648510 0 12970
TotalDay= 4095267
Elapse= 15041.900094:Sum= 19889.9708929997:2507.75610700001 2404.95518500038 0 14977.2596010001