Re: [HACKERS] how to deal with sparse/to-be populated tables

From Alfred Perlstein
Subject Re: [HACKERS] how to deal with sparse/to-be populated tables
Date
Msg-id 20000203213232.Y25520@fw.wintelcom.net
In reply to Re: [HACKERS] how to deal with sparse/to-be populated tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] how to deal with sparse/to-be populated tables  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
* Tom Lane <tgl@sss.pgh.pa.us> [000203 20:58] wrote:
> Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> > Hmm. Doesn't PostgreSQL have a big list of error codes? I don't think
> > it does, I've never seen one. There should be a way to get error
> > codes without comparing strings. Should this be on the TODO?
> 
> It doesn't, there should, and it already is ;-)
> 
> In the meantime, looking at the error message string is Alfred's
> only option for distinguishing duplicate-record from other errors,
> I'm afraid.
> 
> A partial answer to his performance concern is to use a rule
> (or possibly a trigger) on the database side to reinterpret
> "insert into table X" as "either insert or update in table Y,
> depending on whether the key is already there".  This wouldn't
> buy anything in terms of database cycles, but it would avoid two
> rounds of client-to-backend communication and query parsing.
> 
> I've never done that myself, but perhaps someone else on the
> list has a working example.
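
(To make the rule flavor concrete, it might look roughly like the
following; the table and column names are invented, and note that
the existence test in the rule is not atomic with the insert:)

    CREATE RULE counters_merge AS
        ON INSERT TO counters
        WHERE EXISTS (SELECT 1 FROM counters WHERE id = new.id)
        DO INSTEAD
            -- fold the incoming row into the existing one
            UPDATE counters SET hits = hits + new.hits
            WHERE id = new.id;

When no existing row matches the condition, the original INSERT
runs as usual.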

Actually we have some plpgsql code lying around that does this.
The issue isn't ease of implementation, but the speed of the
implementation.  Even parsing the error return isn't as fast as a
single insert_new|update_existing_with_args op would be.
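
A minimal sketch of that sort of function, with invented table and
column names (it relies on plpgsql setting FOUND after an UPDATE):

    CREATE FUNCTION merge_hits(int4, int4) RETURNS int4 AS '
    BEGIN
        -- try the update first; FOUND tells us whether a row matched
        UPDATE counters SET hits = hits + $2 WHERE id = $1;
        IF NOT FOUND THEN
            -- no such key yet, so insert it
            INSERT INTO counters (id, hits) VALUES ($1, $2);
        END IF;
        RETURN $2;
    END;
    ' LANGUAGE 'plpgsql';

Note the insert path still pays for the uniqueness check, and two
concurrent callers can both reach the INSERT, so it has the same
problems described below.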

One of the more frustrating aspects is that we could use the field
that we merge rows on as a primary index; this would allow us to
do an insert, or an update when the insert fails...

however... if we fail to locate the row on the initial query (to
see if it exists), we pay a large penalty because the insert must
then be validated as unique.  This effectively doubles the search.
The same problem hits "update, or insert on fail": basically a
double scan is required.
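
Spelled out with invented names, the dance looks like this, and
whichever branch we take the key gets looked up twice:

    SELECT 1 FROM counters WHERE id = 42;   -- lookup #1: is it there?
    -- if a row came back:
    UPDATE counters SET hits = hits + 1 WHERE id = 42;  -- lookup #2
    -- otherwise:
    INSERT INTO counters (id, hits) VALUES (42, 1);     -- lookup #2,
                                                        -- the unique check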

(yes, I just thought about only indexing, trying the update first,
and doing an insert only on failure; however we really can't
determine whether the initial update failed because no record
matched (ok), or because of some other error (ouch))

That's why we can't use this field as a primary index, even though
it is supposed to be unique.

Basically the database seems to force a _double_ lookup; the only
way I see around this is to switch over to a bulk copy, getting
around the double lookup.  However, this will only work in our
special case, where there is only a single reader/writer updating
the table at any time; otherwise we need special locking to avoid
races.
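
For the record, the bulk path would be something like the following
(the scratch table and file path are made up); it is only safe here
because nothing else writes to the table between the two statements:

    CREATE TEMP TABLE scratch (id int4, hits int4);
    COPY scratch FROM '/tmp/batch.dat';
    -- fold in the rows that already exist
    UPDATE counters SET hits = counters.hits + scratch.hits
        FROM scratch
        WHERE counters.id = scratch.id;
    -- then add the genuinely new keys
    INSERT INTO counters (id, hits)
        SELECT id, hits FROM scratch
        WHERE NOT EXISTS
            (SELECT 1 FROM counters WHERE counters.id = scratch.id);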

Even if this isn't a TODO item, if there's a wish list out there
it'd be nice to see this feature request listed.

I think once the dust settles over here and the need to scale goes
from very scalable to insanely scalable I'm going to have an even
greater interest in learning postgresql internals. :)

thanks,
-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]

