GiST Concurrency Recovery

GiST Concurrency & Recovery support in PostgreSQL

UPDATES !

We have submitted all changes to CVS HEAD for upcoming 8.1 release. GiST is now concurrent and have recovery support !

The gist of our work is that all GiST based extensions, such as tsearch2, ltree, intarray, PosGIS automatically inherit concurrency and recovery support after upgrading of PostgreSQL software and thereby provides applications an industrial strength. During our work and extensible testing we found and fixed some ancient bugs (not only in GiST code) with one of them was very critical.

Many thanks to Paul Ramsey from Refractions.net and PosGIS community for financial support of this challengeable project.


The GiST provides a possibility to create custom data types with indexed access methods and extensible set of queries for specific domain experts, not a database one.

Concurrency control ensures that individual users see consistent states of the database even if there are many interleaved operations. Recovery ensures that the database is fault tolerant, i.e., the database state is not corrupted as the result of a software, system, or media failure. These functionalities in PostgreSQL allow application to be developed without explicit concern for concurrency and fault tolerant.

However, current GiST implementation (as of 8.0.3) still doesn't support concurrent access, transactional isolation or recovery, which PostgreSQL provides for B-tree access method.

For example, PostgreSQL PITR feature allows online backup but not for GiST based indices, because GiST isn't wal-enabled ! You have to recreate them manually, which, of course, doesn't suited for 24x7 enterprise databases. Moreover, GiST doesn't supports recovery after crash (power outage, for example), i.e. index will be broken if there were update operations when crashing occured.

GiST doesn't supports concurrency and this means update operation should get exclusive lock on whole index preventing other process from accessing that table to maintain data consistency, which adversely affects performance. ( Standard behaviour in PostgreSQL (MVCC) is that reading never blocks writing and writing never blocks reading.)

# select amname,amconcurrent from pg_am;
 amname | amconcurrent 
--------+--------------
 rtree  | f
 btree  | t
 hash   | t
 gist   | f
(4 rows)

It's possible to set amconcurrent for gist and run several concurrent updates (inserts,deletes,updates) to get unpredictable and wrong results !

We recognize that current GiST doesn't conforms to enterprize level and our people starting experience difficulties using our extensions in production environment. So, we start working on GiST concurrency and recovery for 8.1 release. Our plan is to implement basic support prior feature freeze (July 1) and start testing, fixing bugs in beta period.

  1. Recovery (online backup and crash recovery) - submitted to CVS
  2. Vacuum (full) now supports space reusing for GiST indices - submitted to CVS
  3. Concurrency - submitted to CVS
  4. Regression tests

Keep in mind, this work is very complex and touched a lot of postgresql internals. Testing will require a lot of time and we need support from community to keep our attention on GiST.

References:

Appendix 1:

Test example of concurrent inserting (4 processes, 300,000 rows) into b-tree index and gist in PostgreSQL 8.0.3.

Information about locks could be obtained using following query:

test=# select  
        l.mode as lock_mode,
        l.granted as lock_granted,
        c.relname as locked_relation,
        c.relnamespace as locked_relnamespace,
        c.reltype as locked_reltype 
        from pg_stat_activity s,
        pg_locks l,
        pg_class c
where             
        l.pid = s.procpid
and                      
        l.relation = c.oid
order by age(s.query_start) desc;

B-tree behaves as expected with nice performance:

    lock_mode     | lock_granted | locked_relation  | locked_relnamespace | locked_reltype 
------------------+--------------+------------------+---------------------+----------------
 RowExclusiveLock | t            | idx              |                2200 |              0      
 RowExclusiveLock | t            | wow              |                2200 |          38457     
 RowExclusiveLock | t            | wow              |                2200 |          38457     
 RowExclusiveLock | t            | idx              |                2200 |              0      
 RowExclusiveLock | t            | wow              |                2200 |          38457     
 RowExclusiveLock | t            | wow              |                2200 |          38457    

real    1m54.260s
user    0m33.281s
sys     0m2.844s

GiST works as in serializable mode - only one process could update index at the same time, which leads to poor performance. Performance degradation (in compare to B-tree) should be noticeable for more concurrent process on multiprocessor servers.

      lock_mode      | lock_granted | locked_relation  | locked_relnamespace | locked_reltype 
---------------------+--------------+------------------+---------------------+----------------
 RowExclusiveLock    | t            | wow              |                2200 |          38460
 AccessExclusiveLock | f            | idx              |                2200 |              0
 RowExclusiveLock    | t            | wow              |                2200 |          38460
 AccessExclusiveLock | f            | idx              |                2200 |              0
 RowExclusiveLock    | t            | wow              |                2200 |          38460
 AccessExclusiveLock | t            | idx              |                2200 |              0
 RowExclusiveLock    | t            | wow              |                2200 |          38460

real    3m21.431s
user    0m38.993s
sys     0m2.954s