Re: Postgres for a "data warehouse", 5-10 TB
From | Robert Klemme
Subject | Re: Postgres for a "data warehouse", 5-10 TB
Date |
Msg-id | j4ltdo$r65$1@dough.gmane.org
In reply to | Re: Postgres for a "data warehouse", 5-10 TB (Andy Colson <andy@squeakycode.net>)
Responses | Re: Postgres for a "data warehouse", 5-10 TB (Marti Raudsepp <marti@juffo.org>)
| Re: Postgres for a "data warehouse", 5-10 TB (Gianni Ciolli <gianni.ciolli@2ndquadrant.it>)
List | pgsql-performance
On 12.09.2011 19:22, Andy Colson wrote:
> On 9/12/2011 12:15 PM, Robert Klemme wrote:
>> On 11.09.2011 19:02, Marti Raudsepp wrote:
>>> On Sun, Sep 11, 2011 at 17:23, Andy Colson <andy@squeakycode.net> wrote:
>>>> On 09/11/2011 08:59 AM, Igor Chudov wrote:
>>>>> By the way, does that INSERT UPDATE functionality or something like
>>>>> this exist in Postgres?
>>>> You have two options:
>>>> 1) write a function like:
>>>> create function doinsert(_id integer, _value text) returns void as
>>>> 2) use two sql statements:
>>>
>>> Unfortunately both of these options have caveats. Depending on your
>>> I/O speed, you might need to use multiple loader threads to saturate
>>> the write bandwidth.
>>>
>>> However, neither option is safe from race conditions. If you need to
>>> load data from multiple threads at the same time, they won't see each
>>> other's inserts (until commit) and thus cause unique violations. If
>>> you could somehow partition their operation by some key, so threads
>>> are guaranteed not to conflict with each other, then that would be
>>> perfect. The 2nd option given by Andy is probably faster.
>>>
>>> You *could* code a race-condition-safe function, but that would be a
>>> no-go on a data warehouse, since each call needs a separate
>>> subtransaction which involves allocating a transaction ID.
>>
>> Wouldn't it be sufficient to reverse order for race condition safety?
>> Pseudo code:
>>
>> begin
>>   insert ...
>> catch
>>   update ...
>>   if not found error
>> end
>>
>> Speed is another matter though...
>
> No, I don't think so, if you had two loaders, both would start a
> transaction, then neither could see what the other was doing.

It depends. But the point is that not both INSERTs can succeed. The one which fails will attempt the UPDATE and - depending on isolation level and DB implementation - will be blocked or fail. In the case of PG this particular example will work:

1. TX 1 inserts the new PK row
2. TX 2 tries to insert the same PK row => blocks
3. TX 1 commits
4. TX 2 fails with a PK violation
5. TX 2 does the update (if the error is caught)

> There are transaction isolation levels, but they are like playing with
> fire. (in my opinion).

You make them sound like witchcraft. But they are clearly defined - even standardized. Granted, different RDBMS might implement them in different ways - here's PG's view of TX isolation:

http://www.postgresql.org/docs/8.4/interactive/transaction-iso.html

In my opinion anybody working with RDBMS should make himself familiar with this concept - at least know about it - because it is one of the fundamental features of RDBMS and certainly needs consideration in applications with highly concurrent DB activity.

Kind regards

robert
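Robert's insert-first pseudo code can be fleshed out as a PL/pgSQL function. A minimal sketch, assuming a table `t (id integer PRIMARY KEY, value text)` - the table, column, and function names are illustrative, not from the thread:

```sql
-- Assumes: CREATE TABLE t (id integer PRIMARY KEY, value text);
CREATE FUNCTION doinsert(_id integer, _value text) RETURNS void AS
$$
BEGIN
    INSERT INTO t (id, value) VALUES (_id, _value);
EXCEPTION WHEN unique_violation THEN
    -- Another transaction inserted this id first (possibly blocking us
    -- until it committed); fall back to an UPDATE.
    UPDATE t SET value = _value WHERE id = _id;
    IF NOT FOUND THEN
        -- Row disappeared between the failed INSERT and the UPDATE,
        -- e.g. a concurrent DELETE; surface the error as in the pseudo code.
        RAISE EXCEPTION 'upsert of id % failed', _id;
    END IF;
END;
$$ LANGUAGE plpgsql;
```

Note that the `EXCEPTION` clause is exactly the per-call subtransaction Marti warns about: each invocation that enters the block allocates a transaction ID, which is why he considers this pattern a no-go for bulk loading into a data warehouse.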
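For readers following the isolation-level pointer above: the level is an ordinary session/transaction setting, so it is easy to inspect and experiment with. A minimal sketch using standard PostgreSQL statements:

```sql
-- Show the isolation level the current session runs under
SHOW transaction_isolation;

-- Run a single transaction under a stricter level
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... concurrency-sensitive work here ...
COMMIT;
```

Whether the blocked INSERT in the step sequence above fails or succeeds after TX 1 commits depends on this setting, which is the "consideration in applications with highly concurrent DB activity" the post refers to.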