Re: Performance and WAL on big inserts/updates

From	Marty Scholes
Subject	Re: Performance and WAL on big inserts/updates
Date
Msg-id	405120D6.3080501@outputservices.com
Whole thread	Raw
In response to	Performance and WAL on big inserts/updates  (Marty Scholes <marty@outputservices.com>)
Responses	Re: Performance and WAL on big inserts/updates  (Sailesh Krishnamurthy <sailesh@cs.berkeley.edu>)
List	pgsql-hackers
> A major reason for this is that logical logs make recovery contingent
> on being able to execute the "parsed statements". This execution
> might, however, not be possible if the system is itself not in a
> consistent state .. as is normally the case during recovery.

I am not sure I follow you here.  The log should (IMHO) be able to record 
either kind of data: physical pages (what happens now), or the SQL 
statement itself, when the statement is small but generates a large 
number of changes.

If the DB state cannot be put back to a consistent state prior to a SQL 
statement in the log, then NO amount of logging will help.  The idea is 
that the state can be put back to what it was prior to a particular log 
entry, be it raw datafile blocks or a SQL statement.
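The replay model being described can be sketched in a few lines. This is an illustrative toy only, not PostgreSQL's actual WAL format; the record layout and names are invented, and callables stand in for re-executable SQL statements:

```python
# Toy sketch of a log mixing physical records (page images) with logical
# records (re-executable "parsed statements"). Recovery replays the log
# in order, restoring state after each entry, whatever its kind.

pages = {}   # page id -> bytes, standing in for the datafiles
log = []     # the write-ahead log, a sequence of records

def log_physical(page_id, image):
    log.append(("physical", page_id, image))

def log_logical(stmt):
    log.append(("logical", stmt))

def recover():
    """Replay every log record in order against the datafiles."""
    for rec in log:
        if rec[0] == "physical":
            _, page_id, image = rec
            pages[page_id] = image
        else:
            rec[1](pages)   # re-execute the "parsed statement"

# One physical record, then one logical "mass update" touching every page.
log_physical(1, b"row-v1")
log_logical(lambda p: p.update({k: v + b"-updated" for k, v in p.items()}))

pages.clear()   # simulate a crash that loses the in-memory state
recover()
print(pages)    # {1: b'row-v1-updated'}
```

The point of the sketch is only that replay works record by record: a logical record is as replayable as a page image, provided the state preceding it was restored first.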
> What if, for instance, it's the catalog tables that were hosed when
> the system went down ? It may be difficult to execute the parsed
> statements without the catalogs.

See above.  If this cannot be resolved prior to re-executing a statement 
in the log, then the problem is beyond ANY subsequent logging.
> Having said that, page-oriented undo logging can be a pain when B-tree
> pages split. For higher concurrency, ARIES uses logical undo
> logging. In this case, the logs are akin to your "parsed statement"
> idea.

Yes, my experience exactly.  Maybe we are the only company on the planet 
that experiences this sort of thing.  Maybe not.
> In any case, the only place that parsed statements are useful, imo are
> with searched updates that cause a large number of records to change
> and with "insert into from select" statements.

Yes.  Correlated UPDATEs, INSERT INTO statements with subselects, and 
mass DELETEs on heavily indexed tables.  Index creation...  The list 
goes on and on.  I have experienced and lived all of it on a daily basis 
with Oracle.  And I despise it.
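For concreteness, these are the statement shapes in question. The schema below is hypothetical (sqlite3 is used only because it is self-contained); what matters is that each statement is tiny, yet touches every row and every index entry, which is exactly what makes physical logging so expensive:

```python
import sqlite3

# Hypothetical schema, just to show the statement shapes discussed above.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE addr (id INTEGER PRIMARY KEY, zip TEXT);
    CREATE TABLE fix  (id INTEGER PRIMARY KEY, zip TEXT);
    CREATE INDEX addr_zip ON addr (zip);
""")
con.executemany("INSERT INTO addr VALUES (?, ?)", [(i, "00000") for i in range(1000)])
con.executemany("INSERT INTO fix  VALUES (?, ?)", [(i, "99999") for i in range(1000)])

# Correlated UPDATE: one statement, every row (and index entry) changes.
con.execute("UPDATE addr SET zip = (SELECT zip FROM fix WHERE fix.id = addr.id)")

# INSERT INTO ... SELECT: one statement, a thousand new rows.
con.execute("CREATE TABLE addr_copy (id INTEGER PRIMARY KEY, zip TEXT)")
con.execute("INSERT INTO addr_copy SELECT id, zip FROM addr")

# Mass DELETE on an indexed table: one statement, a thousand deletions.
con.execute("DELETE FROM addr WHERE zip = '99999'")

print(con.execute("SELECT COUNT(*) FROM addr").fetchone()[0])       # 0
print(con.execute("SELECT COUNT(*) FROM addr_copy").fetchone()[0])  # 1000
```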

The difference is, of course, I can't even have this kind of discussion 
with Oracle, but I can here.  ;-)
> Then, there is also the case that this, the "parsed statements"
> approach, is not a general solution. How would you handle the "update
> current of cursor" scenarios ? In this case, there is some application
> logic that determines the precise records that change and how they
> change.
>
> Ergo, it is my claim that while logical redo logging does have some
> benefits, it is not a viable general solution.

Agreed, this is not a general solution.  What it is, however, is a 
tremendous improvement over the current situation for transactions that 
do massive changes to heavily indexed datasets.

I am working on an application right now that will require current 
postal information on EVERY address in the U.S. -- street name, street 
address, directional, subunit, 5 digit zip, 3 digit zip, city, state, 
delivery point barcode, carrier route, latitude, longitude, etc.  Most 
of these fields will need to be indexed, because they will be searched 
in real time via a web application several thousand times per day.

To keep the addresses current, we will be updating them all (150+ 
million) on a programmed basis, so we will go through and update several 
million addresses EVERY DAY, while needing to ensure that the address 
updates happen atomically so that they don't disrupt web activity.
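The atomicity requirement above amounts to running each batch in one transaction, so concurrent web reads see either the old addresses or the new ones, never a half-applied batch. A minimal sketch, with a hypothetical table and sqlite3 standing in for the real database:

```python
import sqlite3

# Hypothetical address table; the batch update runs inside one
# transaction so a failure mid-batch leaves no partial changes behind.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE addr (id INTEGER PRIMARY KEY, zip TEXT)")
con.executemany("INSERT INTO addr VALUES (?, ?)", [(i, "00000") for i in range(100)])
con.commit()

try:
    with con:  # BEGIN ... COMMIT; rolls back automatically on error
        con.execute("UPDATE addr SET zip = '99999'")
        raise RuntimeError("simulated crash mid-batch")
except RuntimeError:
    pass

# The failed batch left nothing behind: every row still has the old zip.
print(con.execute("SELECT COUNT(*) FROM addr WHERE zip = '00000'").fetchone()[0])  # 100
```

The WAL discussion in this thread is about making exactly this kind of transaction affordable when "every row" means 150+ million heavily indexed records rather than 100.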

Maybe this is not a "traditional" RDBMS app, but I am not in the mood to 
write my own storage infrastructure for it.

Then again, maybe I don't know what I am talking about...

Marty


Sailesh Krishnamurthy wrote:
>>>>>>"Marty" == Marty Scholes <marty@outputservices.com> writes:
>>>>>
> 
>     Marty> Why have I not seen this in any database?
>     Marty> There must be a reason.
> 
> For ARIES-style systems, logging parsed statements (commonly called
> "logical" logging) is not preferred compared to logging data items
> ("physical" or "physiological" logging). 
> 
> A major reason for this is that logical logs make recovery contingent
> on being able to execute the "parsed statements". This execution
> might, however, not be possible if the system is itself not in a
> consistent state .. as is normally the case during recovery. 
> 
> What if, for instance, it's the catalog tables that were hosed when
> the system went down ? It may be difficult to execute the parsed
> statements without the catalogs. 
> 
> For this reason, a major goal of ARIES was to have each and every data
> object (tables/indexes) individually recoverable. So ARIES follows
> page-oriented redo logging.
> 
> Having said that, page-oriented undo logging can be a pain when B-tree
> pages split. For higher concurrency, ARIES uses logical undo
> logging. In this case, the logs are akin to your "parsed statement"
> idea.
> 
> In any case, the only place that parsed statements are useful, imo are
> with searched updates that cause a large number of records to change
> and with "insert into from select" statements.
> 
> Then, there is also the case that this, the "parsed statements"
> approach, is not a general solution. How would you handle the "update
> current of cursor" scenarios ? In this case, there is some application
> logic that determines the precise records that change and how they
> change. 
> 
> Ergo, it is my claim that while logical redo logging does have some
> benefits, it is not a viable general solution.
> 



