Zen: TODO

PostgreSQL TODO

Time estimates are approximate and assume the two of us working together

Short list of todo items we plan for PostgreSQL 8.5

Developers: Oleg Bartunov, Teodor Sigaev

Looking for support !!!

Dictionaries (ispell) caching - 1 month. Currently, we have to initialize dictionaries for every connection, which can take a long time and consume a lot of memory. Ideally, dictionaries should be initialized once and be shareable. The main complexity of the project lies in the need to flatten structures in memory and to support transactions (a dictionary can be altered with the ALTER command).
Index Organized Tables (IOT) - 3-4 months. Better performance, since data and index (on the primary key) are stored together: there is no separate index file and, hence, no additional IO.
WIP, see http://www.sai.msu.su/~megera/wiki/2009-08-12: Phrase Search - 1-2 months. This includes developing a proximity operator and an algebra for search queries that use the proximity operator.
Fallback search - 2-3 weeks. If a query fails (returns no results), rewrite it using query term frequencies and rerun it.
Better support of hunspell - 3 weeks. Currently, we support only extended ASCII (8-bit) characters, so languages like Arabic are out of luck (http://sourceforge.net/docman/display_doc.php?docid=29374&group_id=143754).
DONE: contrib/btree_gin, GIN indexing for scalar data types - 1 month. Initially, GIN was designed for indexing non-scalar data types (arrays and text search, for example), but support for scalar data types is useful in a multicolumn GIN index. For example, a GIN index on (timestamp, tsvector) could be very useful for typical text search queries.
Improve selectivity of hstore index
DONE: Fix bugs in the current support of geo-ops (polygon intersection is based on the bounding box!) and implement better algorithms.
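
The fallback search item above can be read in more than one way. Here is a minimal Python sketch of one reading, assuming the rewrite drops the rarest query term (lowest document frequency) one at a time until the AND query returns something. The in-memory inverted index, the function names, and the drop-rarest-first policy are all illustrative assumptions, not the planned PostgreSQL implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercase term to the set of document ids containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def and_search(index: Dict[str, Set[int]], terms: Iterable[str]) -> Set[int]:
    """Return ids of documents that contain every term (empty set for no terms)."""
    result = None
    for term in terms:
        postings = index.get(term, set())
        result = set(postings) if result is None else result & postings
    return result if result is not None else set()


def fallback_search(index: Dict[str, Set[int]],
                    terms: List[str]) -> Tuple[List[str], Set[int]]:
    """Run an AND query; while it is empty, drop the rarest term and rerun.

    Returns the terms actually used and the matching document ids.
    """
    remaining = list(terms)
    while remaining:
        hits = and_search(index, remaining)
        if hits:
            return remaining, hits
        # Assumed policy: the term with the lowest document frequency goes first.
        rarest = min(remaining, key=lambda t: len(index.get(t, ())))
        remaining.remove(rarest)
    return [], set()


# Hypothetical toy corpus, used only to exercise the sketch.
DOCS = {
    1: "postgresql full text search",
    2: "gin index for text search",
    3: "gist index",
}
INDEX = build_index(DOCS)
```

With this toy corpus, fallback_search(INDEX, ["text", "search", "oracle"]) finds nothing for the full query, drops "oracle" (document frequency 0), and reruns with the remaining two terms. Dropping the most frequent term instead would be an equally plausible policy.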

See other non-completed items below.

Short list of todo items we plan for PostgreSQL 8.4+

Developers: Oleg Bartunov, Teodor Sigaev

Partially sponsored by EnterpriseDB and JFG Networks.

DONE: Wildcard search (a*b) using a permuted word GIN index (at the expense of index size). See http://archives.postgresql.org/pgsql-patches/2008-04/msg00129.php
DONE: GIN fast online update - 2 weeks (committed after about a year!)
DONE: GIN multicolumn support - 2 weeks
DONE: GIN support beyond equality search: prefix search - 4 weeks, see http://archives.postgresql.org/pgsql-patches/2008-04/msg00129.php
DONE: Index support for wildcard search in LIKE operator
ltree, pg_trgm - UTF8 support - 1 week. UPDATE: ltree is now multibyte safe. Thanks, Yahoo China!
GiSTarray - intarray for all built-in data types - 1 week
filtering dictionary support for text search - 3 days
DONE: prefix search support for tsearch ('abc:*'::tsquery). See http://archives.postgresql.org/pgsql-patches/2008-04/msg00129.php
DONE: Fast approximate statistics of GIN index - Gevel
HTML parser for text search - 2 weeks (configurable lexeme weights for bold text, headings, etc.)
Increase the number of weights from 4 to 16 and compress tsvector to compensate for its growth - ?research
Teach dictionaries to return a lexeme weight (for the lexeme itself, not for an entry) - ?research (use case: a noun can be more important than an adjective)
DONE: remove ugly @@@ operator for text search (depends on indexing infrastructure) - 2 days (only for our work)
Add a GiST bulk indexing interface. This should produce a better-quality tree and be faster (no picksplit) - ?research.
Better picksplit algorithm for intarray, ltree, text search - currently we use Hamming distance, which is not good because it produces only Nbits+1 distinct distances, while Tanimoto produces about 4Nbits^2/pi^2 (Farey sequence). Other interesting measures are Jaccard's similarity and the (asymmetric) Tversky index.
WIP, see knngist: Change the GiST interface to support tree traversal - enables KNN-search, SP-GiST, Patricia, digital trees, etc. - 2 months
Optimizing a>b OR a=b (and more complex cases) - 2 days (requires convincing -hackers)
XML GIN support - ???
q3c - scalable search on spherical data (Sergei Koposov)
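
The picksplit item above argues that Hamming distance yields too few distinct values compared with Tanimoto. The Python sketch below illustrates that by brute force over all pairs of small bit signatures; the function names and the choice to treat two empty signatures as distance 0 are assumptions made for illustration only, and nothing here is taken from the GiST code.

```python
from fractions import Fraction
from typing import Tuple


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two signatures differ."""
    return popcount(a ^ b)


def tanimoto_distance(a: int, b: int) -> Fraction:
    """1 - |a AND b| / |a OR b|, kept exact with Fraction.

    Two empty signatures are treated as identical (distance 0); this is an
    assumption, since the ratio is undefined in that case.
    """
    union = popcount(a | b)
    if union == 0:
        return Fraction(0)
    return 1 - Fraction(popcount(a & b), union)


def distinct_distances(nbits: int) -> Tuple[int, int]:
    """Count distinct Hamming and Tanimoto distances over all signature pairs."""
    values = range(1 << nbits)
    ham = {hamming_distance(a, b) for a in values for b in values}
    tan = {tanimoto_distance(a, b) for a in values for b in values}
    return len(ham), len(tan)
```

For 8-bit signatures, distinct_distances(8) reports 9 distinct Hamming values (Nbits+1) against 23 distinct Tanimoto values, the number of reduced fractions in [0, 1] with denominator at most 8. The brute-force loop is only practical for small Nbits.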

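The wildcard search item in the list above mentions a permuted word index. One common scheme of that kind stores every rotation of each word plus an end marker, so that a pattern such as a*b becomes a plain prefix lookup. The Python sketch below shows that idea under stated assumptions: the "$" marker, the dictionary-based index, and all function names are illustrative and are not taken from the actual GIN-based implementation.

```python
from typing import Dict, Iterable, List, Set


def rotations(word: str) -> List[str]:
    """All rotations of word + '$'; '$' marks the end of the word."""
    marked = word + "$"
    return [marked[i:] + marked[:i] for i in range(len(marked))]


def build_permuted_index(words: Iterable[str]) -> Dict[str, Set[str]]:
    """Map every rotation to the words that produce it.

    Each word contributes len(word) + 1 keys, which is where the extra
    index size comes from.
    """
    index: Dict[str, Set[str]] = {}
    for word in words:
        for rotation in rotations(word):
            index.setdefault(rotation, set()).add(word)
    return index


def wildcard_search(index: Dict[str, Set[str]], pattern: str) -> Set[str]:
    """Match a pattern containing exactly one '*', such as 'a*b'.

    'head*tail' is rotated to the prefix 'tail$head'. A real index would
    answer the prefix query with a range scan; this sketch scans all keys.
    """
    head, tail = pattern.split("*")  # ValueError unless exactly one '*'
    prefix = tail + "$" + head
    return {word
            for key, words in index.items()
            if key.startswith(prefix)
            for word in words}


# Hypothetical vocabulary, used only to exercise the sketch.
WORDS = ["postgresql", "postgres", "postal", "mysql", "gin"]
PERMUTED = build_permuted_index(WORDS)
```

Here wildcard_search(PERMUTED, "p*l") looks for rotations starting with "l$p", which selects the words that begin with "p" and end with "l". Prefix-only and suffix-only patterns ("post*", "*sql") fall out of the same lookup.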