2007-06-11

PostgreSQL MVCC

Oiginally, multilple-versions in Postgres was because of Time Travel and to avoid having WAL. Time Travel was removed in 1997 (check archives), pg had only committed log (pg_log) with 2bits per transactions for statuses(in-progress, committed,abort) and no WAL. Its no-overwrite storage 'remembers' all versions (tuples) of table rows (it served the same purpose as WAL, Stonebraker 1990), so there was no need for undo changes of aborted transactions (just mark them aborted in pg_log), while redo after crash recovery was instant, since previous versions are already available (mark all transactions in-progress as aborted). To collect garbage (dead tuples) there was special process - vacuum. Oracle that time already was mvcc and had WAL.

Vadim Mikheev (http://www.pgsql.ru/db/mw/msg.html?mid=1248761) proposed MVCC using existing multi-versionning technique.

In original postgres there was dedicated vacuum process...
Vacuuming without human administration is possible but
in any case commit in non-overwriting system requires
~2 data block writes (first - to write changes, second - to
write updated xmin/xmax statuses). In WAL systems only
1 data block write required...

Ok, we have to decide two issues about what would we like
to use in future:

1. type of storage manager/transaction system -

   WAL or non-overwriting.

2. type of concurrency/consistency control -

   Locking or multi-versions.

These are quite different issues!

Oracle is WAL and multi-version system!

We could implement multi-version control now and switch
to WAL latter...

If we decide that locking is ok for concurrency/consistency
then it's better to switch to WAL before implementing LLL.

I personally very like multi-versions...

Comments/votes..?