Обсуждение: slow dropping of tables, DropRelFileNodeBuffers, tas

Поиск
Список
Период
Сортировка

slow dropping of tables, DropRelFileNodeBuffers, tas

От
Sergey Koposov
Дата:
Hello,

I was running some tests on PG9.2beta where I'm creating and dropping 
large number of tables (~ 20000).

And I noticed that table dropping was extremely slow -- e.g. like half a 
second per table.

I tried to move the table dropping into bigger transactions (100 per one 
transaction) (increasing in the same time max_locks_per_trans to 128).
And still, the commits took ~ 50-60 seconds.

I tried to oprofile it, and I saw that 99% is spend on 
DropRelFileNodeBuffers, and when compiled with symbols (CFLAGS=-g)
I saw that actually most of the time is spend in tas() function, see 
below:

PU: Intel Architectural Perfmon, speed 1862 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        linenr info                 symbol name
-------------------------------------------------------------------------------
831665   86.6756  s_lock.h:204                tas  831665   100.000  s_lock.h:204                tas [self]
-------------------------------------------------------------------------------
122573   12.7745  bufmgr.c:2048               DropRelFileNodeBuffers  122573   100.000  bufmgr.c:2048
DropRelFileNodeBuffers[self]
 
-------------------------------------------------------------------------------
849       0.0885  xlog.c:697                  XLogInsert  849      100.000  xlog.c:697                  XLogInsert
[self]
-------------------------------------------------------------------------------
269       0.0280  catcache.c:444              CatalogCacheIdInvalidate  269      100.000  catcache.c:444
CatalogCacheIdInvalidate[self]
 
-------------------------------------------------------------------------------
232       0.0242  catcache.c:1787             PrepareToInvalidateCacheTuple  232      100.000  catcache.c:1787
  PrepareToInvalidateCacheTuple [self]
 
-------------------------------------------------------------------------------
202       0.0211  dynahash.c:807              hash_search_with_hash_value  202      100.000  dynahash.c:807
hash_search_with_hash_value [self]
 
-------------------------------------------------------------------------------
199       0.0207  nbtsearch.c:344             _bt_compare  199      100.000  nbtsearch.c:344             _bt_compare
[self]
-------------------------------------------------------------------------------
198       0.0206  list.c:506                  list_member_oid  198      100.000  list.c:506
list_member_oid[self]
 


Is the current behaviour expected ?
Because I saw the comment around droprelfilenodebuffers, saying
"XXX currently it sequentially searches the buffer pool, should be
changed to more clever ways of searching.  However, this 
routine is used only in code paths that aren't very 
performance-critical, and we shouldn't slow down the hot paths to make it 
faster ".

Maybe it is stupid, but I also wonder whether the root cause for what I'm 
seeing can be also responsible for the problems I recently  reported 
about the scaling and locking
http://archives.postgresql.org/pgsql-hackers/2012-05/msg01118.php

Some additional info:
The database is accessed in a single thread, shared_buffers are 10G. 
The tables themselves are empty essentially. the cpu if it matters is 
4 times Xeon E7- 4807 (Westmere architecture).
I did a vacuum full of everything just in case and it didn't help
And another maybe important factor is that I noticed is that
pg_catalog.pg_attribute is quite large (238 meg) (because of the large 
number of tables times columns).

I also stopped PG with gdb a few times and it was always at this backtrace:

(gdb) bt
#0  tas (lock=0x7fa4e3007538 "\001") at ../../../../src/include/storage/s_lock.h:218
#1  0x00000000006e6956 in DropRelFileNodeBuffers (rnode=..., forkNum=VISIBILITYMAP_FORKNUM, firstDelBlock=0) at
bufmgr.c:2062
#2  0x000000000070c014 in smgrdounlink (reln=0x1618210, forknum=VISIBILITYMAP_FORKNUM, isRedo=0 '\000') at smgr.c:354
#3  0x000000000051ecf6 in smgrDoPendingDeletes (isCommit=1 '\001') at storage.c:364
#4  0x00000000004a7b33 in CommitTransaction () at xact.c:1925
#5  0x00000000004a8479 in CommitTransactionCommand () at xact.c:2524
#6  0x0000000000710b3f in finish_xact_command () at postgres.c:2419
#7  0x000000000070ff4e in exec_execute_message (portal_name=0x1608990 "", max_rows=1) at postgres.c:1956
#8  0x0000000000712988 in PostgresMain (argc=2, argv=0x1548568, username=0x1548390 
with only forknum changing to MAIN_FORKNUM or FSM_FORKNUM

Thanks in advance,    Sergey


*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Heikki Linnakangas
Дата:
On 30.05.2012 03:40, Sergey Koposov wrote:
> I was running some tests on PG9.2beta where I'm creating and dropping
> large number of tables (~ 20000).
>
> And I noticed that table dropping was extremely slow -- e.g. like half a
> second per table.
>
> ...
>
> I also stopped PG with gdb a few times and it was always at this backtrace:
>
> (gdb) bt
> #0 tas (lock=0x7fa4e3007538 "\001") at
> ../../../../src/include/storage/s_lock.h:218
> #1 0x00000000006e6956 in DropRelFileNodeBuffers (rnode=...,
> forkNum=VISIBILITYMAP_FORKNUM, firstDelBlock=0) at bufmgr.c:2062
> #2 0x000000000070c014 in smgrdounlink (reln=0x1618210,
> forknum=VISIBILITYMAP_FORKNUM, isRedo=0 '\000') at smgr.c:354
> #3 0x000000000051ecf6 in smgrDoPendingDeletes (isCommit=1 '\001') at
> storage.c:364

Hmm, we do this in smgrDoPendingDeletes:

for (i = 0; i <= MAX_FORKNUM; i++)
{smgrdounlink(srel, i, false);
}

So we drop the buffers for each relation fork separately, which means 
that we scan the buffer pool four times. Relation forks in 8.4 
introduced that issue, and 9.1 made it worse by adding another fork for 
unlogged tables. With some refactoring, we could scan the buffer pool 
just once. That would help a lot.

Also, I wonder if DropRelFileNodeBuffers() could scan the pool without 
grabbing the spinlocks on every buffer? It could do an unlocked test 
first, and only grab the spinlock on buffers that need to be dropped.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Robert Haas
Дата:
On Wed, May 30, 2012 at 7:10 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> So we drop the buffers for each relation fork separately, which means that
> we scan the buffer pool four times. Relation forks in 8.4 introduced that
> issue, and 9.1 made it worse by adding another fork for unlogged tables.
> With some refactoring, we could scan the buffer pool just once. That would
> help a lot.

+1.

> Also, I wonder if DropRelFileNodeBuffers() could scan the pool without
> grabbing the spinlocks on every buffer? It could do an unlocked test first,
> and only grab the spinlock on buffers that need to be dropped.

I think it would be possible for the unlocked test to indicate that
the buffer should be dropped when it really ought not to be, because
someone else might be in the middle of changing the buffer tag, and
that's not atomic.  So you'd have to recheck after taking the
spinlock.  However, I don't think it's possible for the unlocked test
to report a false negative, because we've already taken
AccessExclusiveLock on the relation, which had better be enough to
guarantee that nobody's pulling in any more buffers from that relation
(if it doesn't guarantee that, the current code is already broken).
Acquiring a heavyweight lock also interposes a full memory barrier,
which should eliminate any risks due to memory-ordering effects.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Jeff Janes
Дата:
On Wed, May 30, 2012 at 4:10 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 30.05.2012 03:40, Sergey Koposov wrote:
>>
>> I was running some tests on PG9.2beta where I'm creating and dropping
>> large number of tables (~ 20000).
>>
>> And I noticed that table dropping was extremely slow -- e.g. like half a
>> second per table.
>>
>> ...
>>
>>
>> I also stopped PG with gdb a few times and it was always at this
>> backtrace:
>>
>> (gdb) bt
>> #0 tas (lock=0x7fa4e3007538 "\001") at
>> ../../../../src/include/storage/s_lock.h:218
>> #1 0x00000000006e6956 in DropRelFileNodeBuffers (rnode=...,
>> forkNum=VISIBILITYMAP_FORKNUM, firstDelBlock=0) at bufmgr.c:2062
>> #2 0x000000000070c014 in smgrdounlink (reln=0x1618210,
>> forknum=VISIBILITYMAP_FORKNUM, isRedo=0 '\000') at smgr.c:354
>> #3 0x000000000051ecf6 in smgrDoPendingDeletes (isCommit=1 '\001') at
>> storage.c:364
>
>
> Hmm, we do this in smgrDoPendingDeletes:
>
> for (i = 0; i <= MAX_FORKNUM; i++)
> {
>        smgrdounlink(srel, i, false);
> }
>
> So we drop the buffers for each relation fork separately, which means that
> we scan the buffer pool four times. Relation forks in 8.4 introduced that
> issue, and 9.1 made it worse by adding another fork for unlogged tables.
> With some refactoring, we could scan the buffer pool just once. That would
> help a lot.

If someone drops many tables in the same transaction, could it be made
to stuff them into a hash table and then drop all of them with one
cycle around the buffer pool?  Or is the use case for that too narrow
a use case to be worthwhile?

Cheers,

Jeff


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Ants Aasma
Дата:
On Wed, May 30, 2012 at 2:10 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Also, I wonder if DropRelFileNodeBuffers() could scan the pool without
> grabbing the spinlocks on every buffer? It could do an unlocked test first,
> and only grab the spinlock on buffers that need to be dropped.

The scanning of buffers seemed awfully slow to me. Doing the math it
ends up being somewhere around 120ns per buffer, about consistent with
a full cache miss. It looks like the spinlock tas implementation (lock
xchgb) is preventing prefetching. This suspicion is corroborated by
the following comment in Linux kernels per_cpu.h:

/** xchg is implemented using cmpxchg without a lock prefix. xchg is* expensive due to the implied lock prefix.  The
processorcannot prefetch* cachelines if xchg is used.*/ 

I'm not sure, but doing an unlocked test first might also be useful in
triggering the prefetchers. The CPU should be doing a lot better than
the current ~4.3GB/s when scanning buffer descriptors.

Of course not scanning at all or doing less scans at the expense of
more work in the inner loop would be even better.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Simon Riggs
Дата:
On 30 May 2012 12:10, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

> Hmm, we do this in smgrDoPendingDeletes:
>
> for (i = 0; i <= MAX_FORKNUM; i++)
> {
>        smgrdounlink(srel, i, false);
> }
>
> So we drop the buffers for each relation fork separately, which means that
> we scan the buffer pool four times. Relation forks in 8.4 introduced that
> issue, and 9.1 made it worse by adding another fork for unlogged tables.
> With some refactoring, we could scan the buffer pool just once. That would
> help a lot.

That struck me as a safe and easy optimisation. This was a problem I'd
been trying to optimise for 9.2, so I've written a patch that appears
simple and clean enough to be applied directly.

> Also, I wonder if DropRelFileNodeBuffers() could scan the pool without
> grabbing the spinlocks on every buffer? It could do an unlocked test first,
> and only grab the spinlock on buffers that need to be dropped.

Sounds less good and we'd need reasonable proof it actually did
anything useful without being dangerous.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Sergey Koposov
Дата:
On Thu, 31 May 2012, Simon Riggs wrote:

>
> That struck me as a safe and easy optimisation. This was a problem I'd
> been trying to optimise for 9.2, so I've written a patch that appears
> simple and clean enough to be applied directly.

Thanks! The patch indeed improved the timings, 
The dropping of 100 tables in a single commit before the patch took ~ 50 
seconds, now it takes ~ 5 sec (it would be nice to reduce it further 
though, because the dropping of 10000 tables still takes ~10 min).

Cheers,    S

*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Jeff Janes
Дата:
On Thu, May 31, 2012 at 11:09 AM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:
> On Thu, 31 May 2012, Simon Riggs wrote:
>
>>
>> That struck me as a safe and easy optimisation. This was a problem I'd
>> been trying to optimise for 9.2, so I've written a patch that appears
>> simple and clean enough to be applied directly.
>
>
> Thanks! The patch indeed improved the timings, The dropping of 100 tables in
> a single commit before the patch took ~ 50 seconds, now it takes ~ 5 sec (it
> would be nice to reduce it further though, because the dropping of 10000
> tables still takes ~10 min).

I'm surprised it helped that much.  I thought the most it could
theoretically could help would be a factor of 4.

I tried the initially unlocked test, and for me it cut the time by a
factor of 3.  But I only have a 1GB shared_buffers at the max, I would
expect it help more at larger sizes because there is a constant
overhead not related to scanning the shared buffers which gets diluted
out the larger shared_buffers is.

I added to that a drop-all very similar to what Simon posted and got
another factor of 3.

But, if you can do this during a maintenance window, then just
restarting with a much smaller shared_buffers should give you a much
larger speed up than either or both of these.  If I can extrapolate up
to 10G from my current curve, setting it to 8MB instead would give a
speed up of nearly 400 fold.

Cheers,

Jeff


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Simon Riggs
Дата:
On 31 May 2012 19:09, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:
> On Thu, 31 May 2012, Simon Riggs wrote:
>
>>
>> That struck me as a safe and easy optimisation. This was a problem I'd
>> been trying to optimise for 9.2, so I've written a patch that appears
>> simple and clean enough to be applied directly.
>
>
> Thanks! The patch indeed improved the timings, The dropping of 100 tables in
> a single commit before the patch took ~ 50 seconds, now it takes ~ 5 sec

Thanks for the timing.

>(it
> would be nice to reduce it further though, because the dropping of 10000
> tables still takes ~10 min).

Why do you have 10,000 tables and why is it important to drop them so quickly?

If its that important, why not run the drop in parallel sessions?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Sergey Koposov
Дата:
On Fri, 1 Jun 2012, Simon Riggs wrote:

>
> Why do you have 10,000 tables and why is it important to drop them so quickly?

10000 tables are there, because that's the number of partitions. And I'm 
dropping them at the moment, because I'm doing testing. So it won't be
really crucial for production. But I still thought it was worth reporting. 
Especially when the table dropping took .5 a sec.

The problem is that when I set up the shared_buffers say to 48G, the 
timings of the tables rise significantly again.

> If its that important, why not run the drop in parallel sessions?

Yes, before there was a strong reason to do that, now the timings are more 
manageable, but maybe I'll implement that.

Cheers,    S

*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Simon Riggs
Дата:
On 1 June 2012 12:34, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:
> On Fri, 1 Jun 2012, Simon Riggs wrote:
>
>>
>> Why do you have 10,000 tables and why is it important to drop them so
>> quickly?
>
>
> 10000 tables are there, because that's the number of partitions. And I'm
> dropping them at the moment, because I'm doing testing. So it won't be
> really crucial for production. But I still thought it was worth reporting.
> Especially when the table dropping took .5 a sec.

Ah, partitions. That explains the long drop time.

Hopefully people don't need to do that too frequently.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Jeff Janes
Дата:
On Thu, May 31, 2012 at 5:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 30 May 2012 12:10, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>
>> Hmm, we do this in smgrDoPendingDeletes:
>>
>> for (i = 0; i <= MAX_FORKNUM; i++)
>> {
>>        smgrdounlink(srel, i, false);
>> }
>>
>> So we drop the buffers for each relation fork separately, which means that
>> we scan the buffer pool four times. Relation forks in 8.4 introduced that
>> issue, and 9.1 made it worse by adding another fork for unlogged tables.
>> With some refactoring, we could scan the buffer pool just once. That would
>> help a lot.
>
> That struck me as a safe and easy optimisation. This was a problem I'd
> been trying to optimise for 9.2, so I've written a patch that appears
> simple and clean enough to be applied directly.

By directly do you mean before the fork/commit fest begins?

>
>> Also, I wonder if DropRelFileNodeBuffers() could scan the pool without
>> grabbing the spinlocks on every buffer? It could do an unlocked test first,
>> and only grab the spinlock on buffers that need to be dropped.
>
> Sounds less good and we'd need reasonable proof it actually did
> anything useful without being dangerous.

Doing an initial unlocked test speeds things up another 2.69 fold (on
top of 3.55 for your patch) for me, with 1GB of shared buffers.  That
seems like it should be worthwhile.

How do we go about getting reasonable proof that it is safe?

Thanks,

Jeff

Вложения

Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Simon Riggs
Дата:
On 3 June 2012 19:07, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, May 31, 2012 at 5:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 30 May 2012 12:10, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>
>>> Hmm, we do this in smgrDoPendingDeletes:
>>>
>>> for (i = 0; i <= MAX_FORKNUM; i++)
>>> {
>>>        smgrdounlink(srel, i, false);
>>> }
>>>
>>> So we drop the buffers for each relation fork separately, which means that
>>> we scan the buffer pool four times. Relation forks in 8.4 introduced that
>>> issue, and 9.1 made it worse by adding another fork for unlogged tables.
>>> With some refactoring, we could scan the buffer pool just once. That would
>>> help a lot.
>>
>> That struck me as a safe and easy optimisation. This was a problem I'd
>> been trying to optimise for 9.2, so I've written a patch that appears
>> simple and clean enough to be applied directly.
>
> By directly do you mean before the fork/commit fest begins?
>
>>
>>> Also, I wonder if DropRelFileNodeBuffers() could scan the pool without
>>> grabbing the spinlocks on every buffer? It could do an unlocked test first,
>>> and only grab the spinlock on buffers that need to be dropped.
>>
>> Sounds less good and we'd need reasonable proof it actually did
>> anything useful without being dangerous.
>
> Doing an initial unlocked test speeds things up another 2.69 fold (on
> top of 3.55 for your patch) for me, with 1GB of shared buffers.  That
> seems like it should be worthwhile.
>
> How do we go about getting reasonable proof that it is safe?

That's enough for me.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Sergey Koposov
Дата:
On Tue, 5 Jun 2012, Simon Riggs wrote:

>>> Sounds less good and we'd need reasonable proof it actually did
>>> anything useful without being dangerous.
>>
>> Doing an initial unlocked test speeds things up another 2.69 fold (on
>> top of 3.55 for your patch) for me, with 1GB of shared buffers.  That
>> seems like it should be worthwhile.
>>
>> How do we go about getting reasonable proof that it is safe?
>
> That's enough for me.

So is it planned to apply that patch for 9.2 ?

Thanks,    S

*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/

Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Tom Lane
Дата:
Sergey Koposov <koposov@ast.cam.ac.uk> writes:
> On Tue, 5 Jun 2012, Simon Riggs wrote:
>>>> Sounds less good and we'd need reasonable proof it actually did
>>>> anything useful without being dangerous.

>>> Doing an initial unlocked test speeds things up another 2.69 fold (on
>>> top of 3.55 for your patch) for me, with 1GB of shared buffers. �That
>>> seems like it should be worthwhile.

>>> How do we go about getting reasonable proof that it is safe?

>> That's enough for me.

Say what?  That's a performance result and proves not a damn thing about
safety.
        regards, tom lane


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Simon Riggs
Дата:
On 7 June 2012 14:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sergey Koposov <koposov@ast.cam.ac.uk> writes:
>> On Tue, 5 Jun 2012, Simon Riggs wrote:
>>>>> Sounds less good and we'd need reasonable proof it actually did
>>>>> anything useful without being dangerous.
>
>>>> Doing an initial unlocked test speeds things up another 2.69 fold (on
>>>> top of 3.55 for your patch) for me, with 1GB of shared buffers.  That
>>>> seems like it should be worthwhile.
>
>>>> How do we go about getting reasonable proof that it is safe?
>
>>> That's enough for me.
>
> Say what?  That's a performance result and proves not a damn thing about
> safety.

Of course not.

Based on the rationale explained in the code comments in the patch, it
seems like a reasonable thing to me now.

The argument was that since we hold AccessExclusiveLock on the
relation, no other agent can be reading in new parts of the table into
new buffers, so the only change to a buffer would be away from the
dropping relation, in which case we wouldn't care. Which seems correct
to me.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Tom Lane
Дата:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 7 June 2012 14:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Say what? �That's a performance result and proves not a damn thing about
>> safety.

> Of course not.

> Based on the rationale explained in the code comments in the patch, it
> seems like a reasonable thing to me now.

> The argument was that since we hold AccessExclusiveLock on the
> relation, no other agent can be reading in new parts of the table into
> new buffers, so the only change to a buffer would be away from the
> dropping relation, in which case we wouldn't care. Which seems correct
> to me.

Oh, I must be confused about which patch we are talking about --- I
thought this was in reference to some of the WIP ideas that were being
thrown about with respect to using lock-free access primitives.  Which
patch are you proposing for commit now, exactly?
        regards, tom lane


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Simon Riggs
Дата:
On 7 June 2012 17:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 7 June 2012 14:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Say what?  That's a performance result and proves not a damn thing about
>>> safety.
>
>> Of course not.
>
>> Based on the rationale explained in the code comments in the patch, it
>> seems like a reasonable thing to me now.
>
>> The argument was that since we hold AccessExclusiveLock on the
>> relation, no other agent can be reading in new parts of the table into
>> new buffers, so the only change to a buffer would be away from the
>> dropping relation, in which case we wouldn't care. Which seems correct
>> to me.
>
> Oh, I must be confused about which patch we are talking about --- I
> thought this was in reference to some of the WIP ideas that were being
> thrown about with respect to using lock-free access primitives.  Which
> patch are you proposing for commit now, exactly?

Both of these, as attached up thread.

Simon's patch - dropallforks.v1.patch
Jeff's patch - DropRelFileNodeBuffers_unlock_v1.patch
(needs a little tidyup)

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Tom Lane
Дата:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 7 June 2012 17:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Oh, I must be confused about which patch we are talking about --- I
>> thought this was in reference to some of the WIP ideas that were being
>> thrown about with respect to using lock-free access primitives. �Which
>> patch are you proposing for commit now, exactly?

> Both of these, as attached up thread.

> Simon's patch - dropallforks.v1.patch
> Jeff's patch - DropRelFileNodeBuffers_unlock_v1.patch
> (needs a little tidyup)

OK, will take a look.
        regards, tom lane


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
>> On 30 May 2012 12:10, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> Also, I wonder if DropRelFileNodeBuffers() could scan the pool without
>>> grabbing the spinlocks on every buffer? It could do an unlocked test first,
>>> and only grab the spinlock on buffers that need to be dropped.

>> Sounds less good and we'd need reasonable proof it actually did
>> anything useful without being dangerous.

> Doing an initial unlocked test speeds things up another 2.69 fold (on
> top of 3.55 for your patch) for me, with 1GB of shared buffers.  That
> seems like it should be worthwhile.

With shared_buffers set to 1GB, I see about a 2X reduction in the total
time to drop a simple table, iecreate table zit(f1 text primary key);drop table zit;
(This table definition is chosen to ensure there's an index and a toast
table involved, so several scans of the buffer pool are needed.)  The
DROP goes from about 40ms to about 20ms on a fairly recent Xeon desktop.
So I'm convinced this is a win.

I extended the patch to also cover DropDatabaseBuffers,
FlushRelationBuffers, and FlushDatabaseBuffers, which have got the exact
same type of full-pool scan loop, and committed it.
        regards, tom lane


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Tom Lane
Дата:
I wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Both of these, as attached up thread.
>> Simon's patch - dropallforks.v1.patch
>> Jeff's patch - DropRelFileNodeBuffers_unlock_v1.patch
>> (needs a little tidyup)

> OK, will take a look.

I didn't like dropallforks.v1.patch at all as presented, for several
reasons:

* Introducing an AllForks notation that only works in some contexts is
a foot-gun of large caliber.  This concern is not academic: you broke
dropping of temp relations entirely, in the patch as presented, because
for temp rels DropRelFileNodeBuffers would hand off to
DropRelFileNodeLocalBuffers and the latter had not been taught about
AllForks.

* Since we have found out in this thread that the inner loop of
DropRelFileNodeBuffers is performance-critical for the cases we're
dealing with, it seems inappropriate to me to make its tests more
complex.  We want simpler, and we can have simpler given that the
relation-drop case cares neither about fork nor block number.

* The patch modified behavior of XLogDropRelation, which has not been
shown to be performance-critical, and probably can't be because the
hashtable it searches should never be all that large.  It certainly
doesn't matter to the speed of normal execution.

I thought it would be a lot safer and probably a little bit quicker
if we just split DropRelFileNodeBuffers into two routines, one for
the specific-fork case and one for the all-forks case; and then the
same for its main caller smgrdounlink.  So I modified the patch along
those lines and committed it.

As committed, the smgrdounlinkfork case is actually dead code; it's
never called from anywhere.  I left it in place just in case we want
it someday.
        regards, tom lane


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Simon Riggs
Дата:
On 7 June 2012 22:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I thought it would be a lot safer and probably a little bit quicker
> if we just split DropRelFileNodeBuffers into two routines, one for
> the specific-fork case and one for the all-forks case; and then the
> same for its main caller smgrdounlink.  So I modified the patch along
> those lines and committed it.
>
> As committed, the smgrdounlinkfork case is actually dead code; it's
> never called from anywhere.  I left it in place just in case we want
> it someday.

That's fine. The first version of the patch did it exactly that way.

I tried to double guess objections and so recoded it the way submitted.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: slow dropping of tables, DropRelFileNodeBuffers, tas

От
Sergey Koposov
Дата:
On Thu, 7 Jun 2012, Tom Lane wrote:
> I extended the patch to also cover DropDatabaseBuffers,
> FlushRelationBuffers, and FlushDatabaseBuffers, which have got the exact
> same type of full-pool scan loop, and committed it.

Thanks everybody for the patches/commits.
    Sergey

*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/