Re: Postgres + Xapian (was Re: fulltext searching via a custom index type )

From Eric Ridge
Subject Re: Postgres + Xapian (was Re: fulltext searching via a custom index type )
Msg-id 4CF83855-3F98-11D8-ADB4-000A95BB5944@tcdi.com
In reply to Re: Postgres + Xapian (was Re: fulltext searching via a custom index type )  (Alvaro Herrera <alvherre@dcc.uchile.cl>)
Responses Re: Postgres + Xapian (was Re: fulltext searching via a custom index type )  (Eric Ridge <ebr@tcdi.com>)
List pgsql-hackers
On Jan 2, 2004, at 4:54 PM, Alvaro Herrera wrote:
> I think your approach is too ugly.  You will have tons of problems the
> minute you start thinking about concurrency (unless you want to allow
> only a single user accessing the index)

It might be ugly, but it's very fast.  Surprisingly fast, actually.

Concerning concurrency, Xapian internally supports multiple readers and 
only 1 concurrent writer.  So the locking requirements should be far 
less complex than a true concurrent solution.  Now, I'm not arguing 
that this is ideal, but if Xapian is a search engine you're interested in, 
then you've already made up your mind that you're willing to deal with 
1 writer at a time.

However, Xapian does have built-in support for searching multiple 
databases at once.  One thought I've had is to simply create a new 
1-document database on every INSERT/UPDATE beyond the initial CREATE 
INDEX.  Then whenever you do an index scan, tell Xapian to use all the 
little databases that exist in the index.  This would give some degree of 
concurrency.  Then on VACUUM (or VACUUM FULL), all these little databases 
could be merged back into the main index.
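
The scheme in the paragraph above can be sketched roughly as follows.  
This is a pure-Python toy, not Xapian or Postgres code: TinyIndex, 
DeltaIndexedTable, and vacuum() are hypothetical names made up for 
illustration, and a real version would use Xapian databases in place of 
the in-memory dictionaries.

```python
from collections import defaultdict


class TinyIndex:
    """A toy inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        # .get() avoids creating empty entries for unknown terms.
        return set(self.postings.get(term.lower(), ()))


class DeltaIndexedTable:
    """Main index plus one small index per insert, merged on vacuum()."""

    def __init__(self):
        self.main = TinyIndex()
        self.deltas = []

    def insert(self, doc_id, text):
        # Each write gets its own 1-document index, so writers do not
        # compete for the main index's single writer slot.
        delta = TinyIndex()
        delta.add(doc_id, text)
        self.deltas.append(delta)

    def search(self, term):
        # An index scan consults the main index and every delta.
        hits = self.main.search(term)
        for delta in self.deltas:
            hits |= delta.search(term)
        return hits

    def vacuum(self):
        # Fold every delta back into the main index, then discard them.
        for delta in self.deltas:
            for term, ids in delta.postings.items():
                self.main.postings[term] |= ids
        self.deltas.clear()


table = DeltaIndexedTable()
table.insert(1, "postgres full text search")
table.insert(2, "xapian search engine")
before = table.search("search")   # served from the two deltas
table.vacuum()
after = table.search("search")    # served from the merged main index
```

The point of the sketch is only that search results are the same before 
and after the merge; scan cost grows with the number of small databases 
until the next VACUUM.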

> and recovery (unless you want to force users to REINDEX when the 
> system crashes).

I don't yet understand how the WAL stuff works.  I haven't looked at 
the APIs yet, but if something you can record is "write these bytes to 
this BlockNumber at this offset", or if you can say, "index Tuple X 
from Relation Y", then it seems like recovery is still possible.

If ya can't do any of that, then I need to go look at WAL further.
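
To illustrate the first kind of record ("write these bytes to this 
BlockNumber at this offset"), here is a minimal sketch of replaying such 
records after a crash.  It is not the Postgres WAL API: RedoRecord, 
replay(), and the 8192-byte page size are assumptions chosen for the 
example.

```python
from dataclasses import dataclass

PAGE_SIZE = 8192  # assumed page size for this sketch


@dataclass(frozen=True)
class RedoRecord:
    """Hypothetical physical log record: bytes for one block at one offset."""
    block_number: int
    offset: int
    data: bytes


def replay(records, pages):
    """Apply records in log order to a dict of block_number -> bytearray."""
    for rec in records:
        if rec.offset < 0 or rec.offset + len(rec.data) > PAGE_SIZE:
            raise ValueError("record does not fit in a page")
        # Pages that were never written start out zero-filled.
        page = pages.setdefault(rec.block_number, bytearray(PAGE_SIZE))
        page[rec.offset:rec.offset + len(rec.data)] = rec.data
    return pages


log = [
    RedoRecord(block_number=0, offset=0, data=b"meta"),
    RedoRecord(block_number=3, offset=16, data=b"term:foo"),
    RedoRecord(block_number=3, offset=16, data=b"term:bar"),  # later write wins
]
pages = replay(log, {})
```

Because each record overwrites a fixed byte range, replaying the same log 
a second time leaves the pages unchanged, which is what makes this style 
of record convenient for recovery.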

> I think one way of attacking the problem would be using the existing
> nbtree by allowing it to store the five btrees.  First read the README
> in the nbtree dir, and then poke at the metapage's only structure.  You
> will see that it has a BlockNumber to the root page of the index.

Right, I had gotten this far in my investigation already.  The daunting 
thing about trying to use the nbtree code is the code itself.  It's 
very complex.  Plus, I just don't know how well the rest of Xapian 
would respond to all of a sudden having a concurrent backend.  It's 
likely that it would make no difference, but it's just an unknown to me 
at this time.

> Try modifying that to make it have a BlockNumber to every index's root 
> page.
> You will have to provide ways to access each root page and maybe other
> nonstandard things (such as telling the root split operation what root
> page are you going to split), but you will get recovery and concurrency
> (at least to a point) for free.

And I'm not convinced that recovery and concurrency would be "for free" 
in this case either.  The need to keep essentially 5 different trees in 
sync greatly complicates the concurrency issue, I would think.
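
For concreteness, the metapage change suggested in the quoted text (one 
root BlockNumber per tree instead of a single one) might look something 
like the sketch below.  The layout, the magic value, and the helper names 
are all hypothetical; this is not the actual nbtree metapage format.

```python
import struct

NUM_TREES = 5  # the five btrees from the quoted suggestion
# Hypothetical layout: magic, version, then one 32-bit root block per tree.
META_FORMAT = "<II" + "I" * NUM_TREES
META_MAGIC = 0x58415049  # arbitrary value chosen for this sketch


def pack_metapage(roots, version=1):
    """Serialize the root block numbers of all trees into metapage bytes."""
    if len(roots) != NUM_TREES:
        raise ValueError("expected one root block number per tree")
    return struct.pack(META_FORMAT, META_MAGIC, version, *roots)


def root_of(meta_bytes, tree_no):
    """Return the root block number of tree `tree_no`."""
    magic, _version, *roots = struct.unpack(META_FORMAT, meta_bytes)
    if magic != META_MAGIC:
        raise ValueError("not a metapage")
    return roots[tree_no]


def set_root(meta_bytes, tree_no, new_root):
    """Return new metapage bytes after the root of tree `tree_no` moves."""
    magic, version, *roots = struct.unpack(META_FORMAT, meta_bytes)
    roots[tree_no] = new_root
    return struct.pack(META_FORMAT, magic, version, *roots)


meta = pack_metapage([1, 2, 3, 4, 5])
meta = set_root(meta, 2, 42)  # e.g. tree 2's root split and moved to block 42
```

Note that a root split must say which tree it belongs to (the tree_no 
argument here), and all five roots share one metapage, so updates to it 
are a single point of contention between the trees.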

thanks for your time!

eric


