fulltext searching via a custom index type

Поиск
Список
Период
Сортировка
От Eric Ridge
Тема fulltext searching via a custom index type
Дата
Msg-id 61356C28-37DD-11D8-A406-000A95BB5944@tcdi.com
обсуждение исходный текст
Ответы Re: fulltext searching via a custom index type  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
(I started with this addressed to the -general list, but it just 
doesn't seem like a general usage topic.  If -hackers is the wrong 
place, please point me in the right direction).

I've been working on a custom index type (access method) that allows 
postgres to do fulltext searching (including phrase and proximity) that 
I believe is more flexible and natural than "tsearch2" (no offense to 
the tsearch2 guys... it's been a source of inspiration).  I also plan 
on providing "term browsing" and hit-position information.

I'm using Xapian (www.xapian.org) as the backend fulltext engine, and 
I'm trying to learn more of the innards of postgres as I go.  I've only 
spent about 18-24 hours on this, so it's nowhere near complete, and I 
really hesitate to even mention it in a public forum.  But if in the 
end it doesn't suck it might make a great contrib/ package.  *shrug*  
who knows?

In a nutshell, I've implemented all the necessary access method 
functions to teach postgres about a new index type named "xapian", and 
so far, everything is really great.  It works just like any other index 
type would:
create table test (stuff varchar(255), more_stuff text);create index idxstuff on test using xapian (stuff);create index
idxmore_stuffon test using xapian (more_stuff);
 
insert into test (stuff, more_stuff) values ('this is stuff', 'this is 
more stuff');insert into test (stuff, more_stuff) values ('my name is eric ridge', 
'i like to drink beer');
select * from test where stuff => 'stuff' or more_stuff => '"drink 
beer"';         stuff         |      more_stuff
-----------------------+---------------------- this is stuff         | this is more stuff my name is eric ridge | i
liketo drink beer
 
(2 rows)

All this aside, I've got some questions related to indexes and query 
plans.

1) Is it possible for an access method to receive some kind of "DROP 
INDEX" notification?  As far as I can tell, the answer is "no", but it 
can't hurt to ask.

2) When does the query planner decide that it needs to "Filter" 
results, and is there any way to turn this off or otherwise fool the 
query planner into NOT doing Filters (even if only for certain 
operators)?

For example, this query:select * from test where stuff => 'stuff' AND NOT more_stuff => 
'"drink beer"';
has this plan:Index Scan using idxstuff on test  (cost=0.00..-0.98 rows=250 
width=177)           Index Cond: ((stuff)::text => 'stuff'::text)           Filter: (NOT (more_stuff => '"drink
beer"'::text))

In this case, postgres is forced to re-parse the contents of the 
"more_stuff" field (via the defined procedure for the => operator) for 
every row returned by the previous index scan, just so it can determine 
if the field contains the phrase 'drink beer' or not.  Since so much 
overhead is involved here, it would be cool if postgres could somehow 
do another index scan.

Maybe there's some way for the operator function to know not only the 
Datum value, but the actual field (and ItemPointer)?

3) How does one get the $PGDATA directory? getenv() doesn't seem ideal 
since PGDATA can be specified as a command-line argument.

4) Can a function be registered as part of a transaction, pre-commit -- 
so the function can have an opportunity to abort the transaction.  I've 
seen RegisterEOXactCallback, but it's not quite what I'm looking for.

5) are there discussions in the archives of this list (or other pgsql- 
lists) that discuss fulltext searching that y'all think are worth 
reading?

thanks in advance for your time and input!

eric



В списке pgsql-hackers по дате отправления:

Предыдущее
От: markw@osdl.org
Дата:
Сообщение: DBT-2 PostgreSQL on STP w/ LVM2
Следующее
От: Michael Meskes
Дата:
Сообщение: Docu question