Re: [PERFORM] DELETE vs TRUNCATE explanation
From | Jeff Janes
Subject | Re: [PERFORM] DELETE vs TRUNCATE explanation
Date |
Msg-id | CAMkU=1yLXvODRZZ_=fgrEeJfk2tvZPTTD-8n8BwrAhNz_WBT0A@mail.gmail.com
In reply to | Re: [PERFORM] DELETE vs TRUNCATE explanation (Jeff Janes <jeff.janes@gmail.com>)
Responses | Re: [PERFORM] DELETE vs TRUNCATE explanation (Tom Lane <tgl@sss.pgh.pa.us>)
List | pgsql-hackers
On Thu, Jul 12, 2012 at 9:55 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I've moved this thread from performance to hackers.
>
> The topic was poor performance when truncating lots of small tables
> repeatedly on test environments with fsync=off.
>
> On Thu, Jul 12, 2012 at 6:00 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>
>> I think the problem is in the Fsync Absorption queue.  Every truncate
>> adds a FORGET_RELATION_FSYNC to the queue, and processing each one of
>> those leads to sequentially scanning the checkpointer's pending ops hash
>> table, which is quite large.  It is almost entirely full of other
>> requests which have already been canceled, but it still has to dig
>> through them all.  So this is essentially an N^2 operation.
...
>
>> I'm not sure why we don't just delete the entry instead of marking it
>> as cancelled.  It looks like the only problem is that you can't delete
>> an entry other than the one just returned by hash_seq_search.  Which
>> would be fine, as that is the entry that we would want to delete;
>> except that mdsync might have a different hash_seq_search open, and so
>> it wouldn't be safe to delete.

The attached patch addresses this problem by deleting the entry when it is safe to do so, and flagging it as canceled otherwise.

I thought of using has_seq_scans to determine when it is safe, but dynahash.c does not make that function public, and I was afraid it might be too slow anyway.  So instead I used a static variable, plus the knowledge that the only time there are two scans on the table is when mdsync starts one and then calls RememberFsyncRequest indirectly.  There is one other place that does a seq scan, but there is no way for control to pass from that loop to reach RememberFsyncRequest.

I've added code to disclaim the scan if mdsync errors out.  I don't think that this should be a problem, because at that point the scan object is never going to be used again, so if its internal state gets screwed up it shouldn't matter.
However, I wonder if it should also call hash_seq_term; otherwise the pending ops table will be permanently prevented from expanding (this is a pre-existing condition, not something introduced by my patch).  Since I don't know what can make mdsync error out without being catastrophic, I don't know how to test this.

One concern is that if the ops table ever does become bloated, it can never recover while under load.  The bloated table will cause mdsync to take a long time to run, and as long as mdsync is in the call stack the anti-bloat feature is defeated--so once we cross that tipping point we cannot get back.  I don't see that occurring in the current use case, however.  With my current benchmark, the anti-bloat is effective enough that mdsync never takes very long to execute, so a virtuous circle exists.

As an aside, the comments in dynahash.c seem to suggest that one can always delete the entry returned by hash_seq_search, regardless of the existence of other sequential searches.  I'm pretty sure that this is not true.  Also, shouldn't this contract about when one is allowed to delete entries be in the hsearch.h file, rather than the dynahash.c file?

Also, I still wonder if it is worth memorizing fsyncs (under fsync=off) that may or may not ever take place.  Is there any guarantee that we can make by doing so, that couldn't be made otherwise?

Cheers,

Jeff
Attachments