FTS Todo

Features:

  • Prefix search
  • Substring search - could be implemented with a special dictionary; needs a special ranking function.
  • Exact search - search for the exact form of a word, without stemming. It could be implemented by teaching all dictionaries to return the original word along with its normalized form, but it would be better to make this possible at a global level.
  • Phrase search
  • N-gram search - probably a special variant of substring search. Should be useful for Asian languages.
  • HTML parser - recognize HTML tags
  • Parser configuration - join characters (blank, apostrophe, dot, underscore, hyphen), option for splitting
  • translate(word, dic1, dic2) - translation dictionary that returns the word as normalized by dic1, together with its translation from dic2.
  • Probabilistic stemmer - a language-independent stemmer based on Markov chain probabilities.
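
To illustrate the substring and N-gram items above, here is a minimal Python sketch of one possible approach: index every word under its character trigrams, intersect the posting sets for the query's trigrams, then recheck the candidates. It is not how any dictionary or index is actually implemented; the names ngrams, build_ngram_index and substring_search are hypothetical, and the in-memory dict merely stands in for a real index.

```python
from collections import defaultdict


def ngrams(word, n=3):
    """Return the set of character n-grams of a word (hypothetical helper)."""
    word = word.lower()
    if len(word) < n:
        # Words shorter than n are indexed under themselves.
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def build_ngram_index(words, n=3):
    """Map each n-gram to the set of words containing it."""
    index = defaultdict(set)
    for w in words:
        for g in ngrams(w, n):
            index[g].add(w)
    return index


def substring_search(index, query, n=3):
    """Return the indexed words that contain query as a substring."""
    query = query.lower()
    if len(query) < n:
        # Too short to yield a full n-gram: fall back to scanning every word.
        candidates = set().union(*index.values()) if index else set()
    else:
        candidates = None
        for g in ngrams(query, n):
            posting = index.get(g, set())
            candidates = posting if candidates is None else candidates & posting
    # Sharing all n-grams is necessary but not sufficient, so recheck.
    return sorted(w for w in candidates if query in w.lower())
```

For example, with the words "database", "databases" and "tab" indexed, a query for "tabas" returns only the first two: "tab" shares one trigram with the query but not the others. The final recheck is the part that a real implementation would pair with a special ranking function.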

Performance:

  • Store positional information in the GIN index to avoid heap lookups for ranking
  • Smarter algorithm for GIN_FUZZY_LIMIT
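
The idea behind keeping positions in the index can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not GIN's storage format: tokens are split on whitespace, there is no stemming, and build_positional_index and phrase_search are hypothetical names. The point it shows is that once each posting carries word positions, a phrase match can be decided from the index alone, without going back to the stored documents.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map token -> {doc_id: [1-based positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split(), start=1):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return ids of documents containing the tokens of phrase consecutively."""
    tokens = phrase.lower().split()
    if not tokens:
        return []
    postings = [index.get(t, {}) for t in tokens]
    # Only documents containing every token can match.
    common = set(postings[0])
    for p in postings[1:]:
        common &= set(p)
    hits = []
    for doc_id in sorted(common):
        # Candidate start positions: where the first token occurs.
        starts = set(postings[0][doc_id])
        for offset, p in enumerate(postings[1:], start=1):
            # Token number `offset` must appear exactly `offset` places later.
            starts &= {pos - offset for pos in p[doc_id]}
        if starts:
            hits.append(doc_id)
    return hits
```

The same position lists could feed a proximity-based ranking function, which is where avoiding the extra document fetch would pay off.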