GiST internals

GiST internals

GiST internal interface

GiST interface to database is conformed to PostgreSQL index interface agreements. Source files are located under src/backend/access/gist and header files - under src/include/access in PostgreSQL source tree. Basic algorithms implemented in GiST are described here. Application level GiST interface is described in introduction and GiST Programming Tutorial (in Russian).

gistbuild
Creates index.
Parameters: table and index descriptors
gistinsert
Insert new value to the index.
Parameters: index descriptor, value, pointer to table record
gistbulkdelete
Dead tuples cleanup. Tree-walking and calling callback function to recognize does tuple is dead or live.
Parameters: index descriptor, callback function, callback state
gistvacuumcleanup
Removes empty pages and restore broken (after crash) tuples. Also rebuilds internal keys if needed (index is locked when rebuilding).
Parameters: index descriptor.
gistbeginscan, gistendscan, gistrescan
These are base methods for search in index - create scan, finish scan, repeat scan.
Parameters:
  • gistbeginscan - index descriptor, search keys (may be omitted)
  • gistendscan, gistrescan - scan
gistmarkpos
Save scan state.
Parameters: scan
gistrestrpos
Restore scan state.
Parameters: scan
gistgettuple, gistgetmulti
Returns one or several pointers to records in table, according search key, specified in gistbeginscan.
Parameters: scan
gist_redo
Restore one WAL record.
Parameters: LSN (LogSequenceNumber), WAL record
gist_desc
Describes WAL record (debug only)
gist_xlog_startup
Prepare to recover using WAL
gist_xlog_cleanup
Finalize recovering.

NOTES:

  • Functions gistmarkpos, gistrestrpos and gistrescan are required by PostgreSQL optimizer and executor and may be not needed for other database engine.
  • GiST has no direct access to tables, only via pointers ( the number of page and the number of record on the page) to records. So, callback function in gistbulkdelete uses pointer to record and this function should recognize if pointer is still valid (points to the record) or not.

GiST requirements to memory manager

Memory manager should provide:

  • malloc/realloc/free functions
  • create,switch, clean and destroy memory context, used by alloc. functions. At the same time there are could be several memory contexts.

GiST requirements to lock manager

  • share/exclusive locks at the page level. Deadlock detector NOT needed.
  • share/exclusive locks at the index level. Deadlock detector NOT needed.

GiST requirements to page manager

  • Ability to easy find page by its number/index
  • When creating new pages it's very desirable to use pages marked as free by gistvacuumcleanup

GiST requirements to WAL

  • LSN (Log Sequence Number) - monotonically increased (on the whole period of database life) sequence number (PostgreSQL uses 64-bit integer), that identifies log record.
  • Function which writes WAL record and returns its LSN

NOTE: It's possible to implement GiST concurrency without WAL. In that case we need a sequencer which generates monotonically increased integers.

What we need to know

  • Agreements on passing keys (by value, by pointers, length, memory usage,…)
  • It'd be nice to know specification (names, descriptions) of WAL, lock and page manager interfaces.
  • Specifications of functions (if any ) for converting value → tuple, tuple → page.