PostgreSQL TODO
Time estimates are approximate and are given for the two of us.
Short list of todo items we plan for PostgreSQL 8.5
Developers: Oleg Bartunov, Teodor Sigaev
We are looking for support!
- Recognize "names" in the tsearch parser (allow stopwords to be indexed when they are part of "names").
- Dictionary (ispell) caching - 1 month. Currently, dictionaries have to be initialized for every connection, which can take a long time and consume a lot of memory. Ideally, dictionaries should be initialized once and shared. The main complexity of the project lies in flattening the structures in memory and in transaction support (a dictionary can be altered with the ALTER command).
- Index Organized Tables (IOT) - 3-4 months. Better performance, since data and the index (on the primary key) are stored together: there is no separate index file and, hence, no additional I/O.
- WIP, see http://www.sai.msu.su/~megera/wiki/2009-08-12: Phrase Search - 1-2 months. This includes developing a proximity operator and an algebra for search queries that use the proximity operator.
- Fallback search - 2-3 weeks. If a query fails (returns no results), rewrite it using query term frequencies and rerun it.
- Better support of hunspell - 3 weeks. Currently, we support only extended ASCII (8-bit) characters, so languages like Arabic are out of luck (http://sourceforge.net/docman/display_doc.php?docid=29374&group_id=143754).
- DONE: contrib/btree_gin, GIN indexing for scalar data types - 1 month. Initially, GIN was designed for indexing non-scalar data types (arrays and text search, for example), but support for scalar data types is useful in multicolumn GIN indexes. For example, a GIN index on (timestamp, tsvector) could be very useful for typical text search queries.
- Improve selectivity of hstore index
- DONE: Fix bugs in the current support of geo-ops (polygon intersection is based on the bounding box!) and implement better algorithms.
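The fallback search item above could work roughly as follows. This is a minimal sketch in Python, not the planned implementation: the toy inverted index, the helper names (build_index, search, fallback_search) and the rewrite policy (drop the term with the lowest document frequency and retry) are all assumptions made for illustration.

```python
def build_index(docs):
    """Build a toy inverted index: term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, terms):
    """Return ids of documents that contain every term (AND semantics)."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def fallback_search(index, terms):
    """Run an AND query; while it returns nothing, drop one term and rerun."""
    terms = list(terms)
    while terms:
        hits = search(index, terms)
        if hits:
            return hits, terms
        # Assumed policy: the term with the lowest document frequency is the
        # most restrictive one, so it is dropped first.
        rarest = min(terms, key=lambda t: len(index.get(t, ())))
        terms.remove(rarest)
    return set(), []


# Hypothetical sample data.
docs = {
    1: "postgres full text search",
    2: "gin index for text search",
    3: "gist index",
}
index = build_index(docs)
hits, used_terms = fallback_search(index, ["text", "search", "xml"])
```

Here the original query has no results because no document contains "xml", so the query is rerun with the remaining terms. Other policies (for example, replacing AND with OR) would fit the same loop.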
See other non-completed items below.
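To make the proximity operator from the phrase search item more concrete, here is a small Python sketch that works on word positions, loosely in the way a tsvector stores them. The function names (positions, follows, phrase) and the exact-distance semantics are assumptions for illustration, not the operator's final definition.

```python
def positions(text):
    """Map each word to its 1-based positions in the text."""
    pos = {}
    for i, word in enumerate(text.lower().split(), start=1):
        pos.setdefault(word, []).append(i)
    return pos


def follows(pos, a, b, distance=1):
    """True if b occurs exactly `distance` positions after a."""
    b_positions = set(pos.get(b, ()))
    return any(p + distance in b_positions for p in pos.get(a, ()))


def phrase(pos, words):
    """True if the words occur consecutively, in the given order."""
    if not words:
        return False
    # Keep only the start positions that can still be extended to a full match.
    starts = set(pos.get(words[0], ()))
    for offset, word in enumerate(words[1:], start=1):
        later = set(pos.get(word, ()))
        starts = {s for s in starts if s + offset in later}
    return bool(starts)


doc = positions("fast gin index update for gin")
adjacent = follows(doc, "gin", "index")           # "gin" directly before "index"
two_apart = follows(doc, "gin", "update", 2)      # one word in between
is_phrase = phrase(doc, ["gin", "index", "update"])
```

A phrase is then just a chain of distance-1 matches, which is one reason a query algebra over the proximity operator is needed.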
Short list of todo items we plan for PostgreSQL 8.4+
Developers: Oleg Bartunov, Teodor Sigaev
Partially sponsored by EnterpriseDB and JFG Networks.
- DONE: Wildcard search (a*b) using a permuted-word GIN index (at the expense of index size). See http://archives.postgresql.org/pgsql-patches/2008-04/msg00129.php
- DONE: GIN fast online update - 2 weeks (committed after about a year!)
- DONE: GIN multicolumn support - 2 weeks
- DONE: GIN support for more than equality search (GIN prefix search) - 4 weeks, see http://archives.postgresql.org/pgsql-patches/2008-04/msg00129.php
- DONE: Index support for wildcard search in LIKE operator
- DONE: ltree, pg_trgm - UTF8 support - 1 week. UPDATE: ltree is now multibyte safe. Thanks, Yahoo China!
- GiSTarray - intarray for all built-in data types - 1 week
- DONE: filtering dictionary support for text search - 3 days
- DONE: prefix search support for tsearch ('abc:*'::tsquery). See http://archives.postgresql.org/pgsql-patches/2008-04/msg00129.php
- DONE: Fast approximate statistics of GIN index - Gevel
- HTML parser for text search - 2 weeks (configurable lexeme entry weights for bold text, headings, etc.)
- Increase the number of weights from 4 to 16 and compress tsvector to compensate for the resulting growth - ?research
- Teach dictionaries to return a lexeme weight (for the lexeme itself, not for an entry) - ?research (use case: a noun can be more important than an adjective)
- DONE: remove ugly @@@ operator for text search (depends on indexing infrastructure) - 2 days (only for our work)
- Add a GiST bulk indexing interface. This should produce a better-quality tree and be faster (no picksplit) - Aleksandr Korotkov, GSoC
- Better picksplit algorithm for intarray, ltree, text search. Currently we use Hamming distance, which is not good because it produces only Nbits+1 distinct distances, while Tanimoto produces about 4Nbits^2/pi^2 (Farey sequence). Other interesting measures are Jaccard's similarity and the (asymmetric) Tversky index.
- COMMITTED for 9.1, see knngist: Change the GiST interface to support tree traversal, enabling KNN search
- WIP for 9.2: SP-GiST, Patricia, digital trees, etc - 2 months
- Optimize a>b OR a=b (and more complex cases) - 2 days (requires convincing -hackers)
- XML GIN support - ???
- q3c - scalable search on spherical data (Sergei Koposov)
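The picksplit item above argues that Hamming distance is too coarse for bit signatures. The following Python sketch counts, by brute force, how many distinct values each distance can take over all pairs of small signatures. It is only an illustration: the function names are made up here, and defining the Tanimoto distance between two empty signatures as 0 is an assumption.

```python
from fractions import Fraction
from itertools import product


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def hamming(a, b):
    """Number of bit positions in which the two signatures differ."""
    return popcount(a ^ b)


def tanimoto(a, b):
    """1 - |a AND b| / |a OR b|, kept as an exact fraction."""
    union = popcount(a | b)
    if union == 0:
        return Fraction(0)  # assumption: two empty signatures are identical
    return 1 - Fraction(popcount(a & b), union)


def distinct_distances(nbits):
    """Count distinct Hamming and Tanimoto values over all signature pairs."""
    hamming_values, tanimoto_values = set(), set()
    for a, b in product(range(1 << nbits), repeat=2):
        hamming_values.add(hamming(a, b))
        tanimoto_values.add(tanimoto(a, b))
    return len(hamming_values), len(tanimoto_values)


counts = distinct_distances(6)  # (7, 13) for 6-bit signatures
```

For 6-bit signatures Hamming distance yields 7 (Nbits+1) distinct values, while Tanimoto yields 13, one for each fraction in the Farey sequence of order 6. The gap grows quickly with the signature length, which gives picksplit more room to tell candidate splits apart.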
See also our old todo list
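The permuted-word index behind the wildcard search item (a*b) can be sketched in a few lines of Python. This illustrates the general technique only, under stated assumptions: the names (rotations, build_permuted_index, wildcard_search) are hypothetical, the pattern contains exactly one "*", and a plain dictionary scan stands in for the prefix lookup that a real index such as GIN would provide.

```python
def rotations(word):
    """All rotations of word + '$', where '$' marks the end of the word."""
    marked = word + "$"
    return [marked[i:] + marked[:i] for i in range(len(marked))]


def build_permuted_index(words):
    """Map every rotation to the words it came from."""
    index = {}
    for word in words:
        for key in rotations(word):
            index.setdefault(key, set()).add(word)
    return index


def wildcard_search(index, pattern):
    """Match a pattern with exactly one '*', such as 'a*b'."""
    head, tail = pattern.split("*")
    # Rotating the pattern so that '*' is last turns it into a prefix query:
    # 'a*b' becomes the prefix 'b$a'.
    prefix = tail + "$" + head
    # Linear scan for brevity; a real index would do a prefix search instead.
    return {w for key, ws in index.items() if key.startswith(prefix) for w in ws}


# Hypothetical sample vocabulary.
index = build_permuted_index(["postgres", "postgis", "progress", "pgsql"])
matches = wildcard_search(index, "post*s")
```

Each word of length n contributes n+1 keys, which is the size cost mentioned in the item.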