Re: Speed up Clog Access by increasing CLOG buffers

From: Jeff Janes
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date:
Msg-id: CAMkU=1yLzEBi3w-zsAMzyYvDs-FM1p_AiUu9=0d67u0fULWgqw@mail.gmail.com
In response to: Re: Speed up Clog Access by increasing CLOG buffers  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Speed up Clog Access by increasing CLOG buffers  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Fri, Sep 11, 2015 at 8:01 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Sep 11, 2015 at 9:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Sep 11, 2015 at 10:31 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > Could you perhaps try to create a testcase where xids are accessed that
> > > are so far apart on average that they're unlikely to be in memory? And
> > > then test that across a number of client counts?
> > >
> >
> > Now about the test: create a table with a large number of rows (say 11617457;
> > I tried to create a larger one, but it was taking too much time (more than a day))
> > and give each row a different transaction id.  Now each transaction should
> > update rows that are at least 1048576 (the number of transactions whose status can
> > be held in 32 CLOG buffers) apart; that way, ideally each update would access a
> > CLOG page that is not in memory.  However, since the value to update is selected
> > randomly, only about every 100th access ends up being a disk access.
>
> What about just running a regular pgbench test, but hacking the
> XID-assignment code so that we increment the XID counter by 100 each
> time instead of 1?
>

If I am not wrong, we need a difference of 1048576 transactions between
records to make each CLOG access a disk access, so if we increment the
XID counter by 100 each time, then probably only every 10000th (or multiple
of 10000th) transaction would go for disk access.

The number 1048576 is derived by the calculation below:
#define CLOG_XACTS_PER_BYTE 4
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)

Transaction-id difference required for each access to go to disk:
CLOG_XACTS_PER_PAGE * num_clog_buffers.


That guarantees that every xid occupies its own 32-contiguous-pages chunk of clog.  

But clog pages are not pulled in and out in 32-page chunks; they are pulled in one-page chunks.  So you would only need a difference of 32,768 to get every real transaction to live on its own clog page, which means every lookup of a different real transaction would have to do a page replacement.  (I think your references to disk access here are misleading.  Isn't the issue here the contention on the lock that controls the page replacement, not the actual IO?)

I've attached a patch that allows you to set the GUC "JJ_xid", which makes it burn the given number of xids every time a new one is asked for.  (The patch introduces lots of other stuff as well, but I didn't feel like ripping the irrelevant parts out; if you don't change any of the other GUCs it introduces from their defaults, they shouldn't cause you trouble.)  I think there are other tools around that do the same thing, but this is the one I know about.  It is easy to drive the system into wraparound shutdown with this, so lowering autovacuum_vacuum_cost_delay is a good idea.

Actually, I haven't attached it, because then the commitfest app would list it as the patch needing review; instead I've put it here: https://drive.google.com/file/d/0Bzqrh1SO9FcERV9EUThtT3pacmM/view?usp=sharing

I think making every 100th transaction-status access a disk access
is sufficient to prove that there is no regression with the patch for the
scenario Andres asked about, or do you think it is not?

Now, another possibility here could be to try commenting out the fsync
in the CLOG path to see how much it impacts the performance of this test,
and then of the pgbench test.  I am not sure there will be any impact, because
even if every 100th transaction goes for disk access, that is still less than
the WAL fsync which we have to perform for each transaction.

You mentioned that your clog is not on SSD, but surely at this scale of hardware, the HDD the clog is on has a BBU in front of it, no?

But I thought Andres' concern was not about fsync, but about the fact that the SLRU does linear scans (repeatedly) of the buffers while holding the control lock?  At some point, scanning more and more buffers under the lock is going to cause more contention than scanning fewer buffers and just evicting a page will.
 
Cheers,

Jeff
