Обсуждение: how to transform list to table and evaluate an expression on it?

Поиск
Список
Период
Сортировка

how to transform list to table and evaluate an expression on it?

От
Tomas Vondra
Дата:
Hello,

in several apps running over postgres 8.0/8.1 I've used following
"full-text" search engine - imagine there are two tables:

1) Documents - with "docId", "title", "abstract" and "body" columns
2) Index - with "docId", "word" and "score"

It's very easy to build the Index from the Documents table - for each
document (row in the Documents table) a list of words is built, and the
number of occurences is used as the score (there are different
possibilities, of course).

When searching for a string, at first I search for documents with at
least one of the words in the string. So for example when searching for
"apples oranges", at first I do something like

SELECT docId, word, score FROM Index where word IN ('apples','oranges');

Then I fetch all these results to PHP, and in PHP, I transform this
"list" of words into two-dimensional associative array with "document
ID" as a key at first level and words at second level.

$results = array($docId1 => array(        word1 => score,        word2 => score,        ...                  )$docId2
=>array(        word1 => score,        word2 => score,        ...                  )       ...);
 

And this array is processed (in PHP) to get the final score - each OR
connective is translated as MAX() and each AND connective is translated
as MIN() function (on each element of the array).

For example the "apples AND oranges" would be translated as
   MIN(score for apples, score for oranges)

Then I have a list of (docId,score) pairs, so I can fetch details for
the documents I want (the ones with the highest score, for example).

QUESTIONS & PROBLEMS
--------------------

What I really don't like on the "algorithm" is the need to process the
data outside of the database - fetch them into the PHP, process them and
query the database again to get the final result. It's not fast, it's
not shared for different parts of the projects written in different
languages, etc.

I'd like to transform all this processing into PL/pgSQL (or maybe
PL/Perl or something else) function, but I don't know

1) How to create a "table" in the form
  documtent_id  | word_1 | word_2 | ... | word_n -------------------------------------------------       1345     |
11  |   12   | ... |   2       1202     |    1   |    0   | ... |  12         .           .        .      .      .
  .           .        .      .      .       1129     |   20   |    1   | ... |   0
 
  from the query
  SELECT docId, word, score FROM Index WHERE word IN (word_1,..,word_n)

2) How to evaluate the function derived from the 'search string' on this  table, but this probably will work as an
EXECUTEstatement or  something like that.
 

Maybe there's a completely different solution to all of this, which I
can't see.

Thanks for your advices
Tomas


Re: how to transform list to table and evaluate an

От
Frank Bax
Дата:
Today's your lucky day (I think), because I was looking for (and used) the 
aggregate function mentioned below just before reading your question.


At 11:03 AM 1/7/06, Tomas Vondra wrote:
>1) How to create a "table" in the form
>
>    documtent_id  | word_1 | word_2 | ... | word_n
>   -------------------------------------------------
>         1345     |   11   |   12   | ... |   2
>         1202     |    1   |    0   | ... |  12
>           .           .        .      .      .
>           .           .        .      .      .
>         1129     |   20   |    1   | ... |   0
>
>    from the query
>
>    SELECT docId, word, score FROM Index WHERE word IN (word_1,..,word_n)


From section 31.10 of the 8.0 docs came this function... CREATE AGGREGATE array_accum (     sfunc = array_append,
basetype= anyelement,     stype = anyarray,     initcond = '{}' );
 

This query will return one row per docld.  It's not exactly the format you 
asked for, but perhaps it's a start ...

select docld,array_accum(word),array_accum(score) from index where word in 
('apples','orange') group by docld;

Then your could write two functions (beyond my capabilities):
minarray( int[] ) so you could select minarray( array_accum(score) ) /* 
apples AND orange */
maxarray( int[] ) so you could select maxarray( array_accum(score) ) /* 
apples OR orange */



>2) How to evaluate the function derived from the 'search string' on this
>    table, but this probably will work as an EXECUTE statement or
>    something like that.



Not sure what you're asking here....