Discussion: Early hint bit setting

Early hint bit setting

From
Ants Aasma
Date:
I was thinking about the earliest point at which we could set hint
bits. This would be just after the commit has been made visible. When
the transaction completes and the commit confirmation is sent to the
client, the backend will usually go to sleep on the network socket,
waiting for further commands. Because most clients wait for the
commit confirmation before proceeding, we have at least one network
RTT before this backend is expected to respond again.

The idea is to keep a small backend local ring buffer of pages that
have been modified. When a transaction has just committed, we do a
non-blocking read on the socket. When nothing is available we take the
opportunity to go and set hint bits in the recently modified buffers.
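
A minimal sketch of such a backend-local ring buffer, assuming a fixed
capacity and a simplified page identifier. The names PageRef, HintRing,
HINT_RING_SIZE and the hint_ring_* functions are hypothetical and are
not existing PostgreSQL symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HINT_RING_SIZE 8   /* hypothetical capacity; must be > 0 */

/* Hypothetical identifier for a modified page (relation + block number). */
typedef struct PageRef {
    uint32_t rel;
    uint32_t block;
} PageRef;

/* Fixed-size ring: when it is full, the oldest entry is overwritten. */
typedef struct HintRing {
    PageRef entries[HINT_RING_SIZE];
    size_t  head;    /* index of the oldest entry */
    size_t  count;   /* number of valid entries */
} HintRing;

static void hint_ring_init(HintRing *ring)
{
    ring->head = 0;
    ring->count = 0;
}

/* Remember a page touched by the current transaction. */
static void hint_ring_push(HintRing *ring, PageRef page)
{
    assert(ring->count <= HINT_RING_SIZE);

    /* Skip the push if this page is already the most recent entry. */
    if (ring->count > 0) {
        size_t last = (ring->head + ring->count - 1) % HINT_RING_SIZE;
        if (ring->entries[last].rel == page.rel &&
            ring->entries[last].block == page.block)
            return;
    }

    size_t slot = (ring->head + ring->count) % HINT_RING_SIZE;
    ring->entries[slot] = page;
    if (ring->count < HINT_RING_SIZE)
        ring->count++;
    else
        ring->head = (ring->head + 1) % HINT_RING_SIZE;  /* oldest dropped */
}

/* Take the oldest remembered page; returns false when the ring is empty. */
static bool hint_ring_pop(HintRing *ring, PageRef *out)
{
    if (ring->count == 0)
        return false;
    *out = ring->entries[ring->head];
    ring->head = (ring->head + 1) % HINT_RING_SIZE;
    ring->count--;
    return true;
}
```

Overwriting the oldest entry when the ring is full is acceptable here
because hinting is only an optimisation: a dropped entry just means the
page gets hinted later by the usual path.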

Hurting latency for single-threaded workloads using lots of
transactions is bad. It follows that it would be a bad idea to do
anything that could take a long time while waiting for the next
command. Because early hinting is a performance optimisation we can
safely skip it if it becomes bothersome. Anything that causes IO can
take too long. So we only set the hint bits when the page is still in
shared buffers to avoid reading in the page. Furthermore, we only hint
the tuples that the recently completed transaction modified to avoid
IO from CLOG (we could hint other tuples if their xid happens to be in
the SLRU, but it probably won't be very useful).
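
The two restrictions above can be sketched as follows. This is a
simplified model, not PostgreSQL code: SimpleTuple, hint_own_tuples and
the page_is_cached flag are hypothetical, and a boolean stands in for
the real "xmin committed" bit in the tuple header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;

/* Hypothetical, simplified tuple header: the inserting xid plus a flag
 * standing in for the "xmin committed" hint bit. */
typedef struct SimpleTuple {
    Xid  xmin;
    bool xmin_committed_hint;
} SimpleTuple;

/* Hint only the tuples written by the transaction that just committed,
 * and only if the page is still cached.  Returns the number of tuples
 * that were newly hinted. */
static size_t hint_own_tuples(SimpleTuple *tuples, size_t ntuples,
                              Xid committed_xid, bool page_is_cached)
{
    size_t hinted = 0;

    assert(tuples != NULL || ntuples == 0);

    if (!page_is_cached)
        return 0;   /* never read a page back in just to hint it */

    for (size_t i = 0; i < ntuples; i++) {
        /* Our own xid is known to be committed, so no CLOG lookup. */
        if (tuples[i].xmin == committed_xid &&
            !tuples[i].xmin_committed_hint) {
            tuples[i].xmin_committed_hint = true;
            hinted++;
        }
    }
    return hinted;
}
```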

Hint bits are set sooner or later. Setting them earlier is a
throughput win for any workload because we avoid generating extra
load. We avoid doing any IO and we might save some, so for IO this is a
pure win. The hinting CPU work needs to be done sooner or later, so
that's a tie, except for extremely bursty write heavy loads with lots
of transactions. Memory loads could in principle hurt other backends.
Refilling the whole last level cache of modern processors takes a few
hundred microseconds at peak speed. If the WAL is on fast storage
(BBWC, SSD) there's a pretty good chance that the page being hinted is
still in the cpu cache, avoiding the memory bandwidth overhead.

Abstraction-wise, I think we need a mechanism for running very short
maintenance jobs from backends that are waiting for new commands.
SocketBackend could check whether any job is queued and, if so, call
pq_getbyte_if_available to see whether a command has already arrived
before proceeding to run the job.
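
A rough sketch of such a mechanism, under stated assumptions: it uses
POSIX poll() with a zero timeout as a stand-in for the non-blocking
socket check (it does not call pq_getbyte_if_available), and
MaintenanceStep, input_pending, run_idle_maintenance and
count_down_step are hypothetical names. Each step is meant to be one
very small unit of work, and the loop re-checks the socket before every
step so that an arriving command is not delayed by more than one step.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

/* A hypothetical maintenance job: performs one small unit of work and
 * returns true if more work remains. */
typedef bool (*MaintenanceStep)(void *state);

/* Non-blocking check: is there client input waiting on fd? */
static bool input_pending(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, 0);   /* timeout 0: return immediately */

    /* Errors and hangups also count as "pending" so that the caller
     * falls back to its normal read path instead of doing more work. */
    return rc != 0;
}

/* Run maintenance steps until input arrives, the job reports that it is
 * finished, or max_steps is reached.  Returns the steps executed. */
static int run_idle_maintenance(int fd, MaintenanceStep step, void *state,
                                int max_steps)
{
    int done = 0;

    assert(step != NULL);

    while (done < max_steps && !input_pending(fd)) {
        done++;
        if (!step(state))
            break;
    }
    return done;
}

/* Example step: works through a counter of pending pages. */
static bool count_down_step(void *state)
{
    int *remaining = state;

    if (*remaining > 0)
        (*remaining)--;
    return *remaining > 0;
}
```

The max_steps cap is one possible knob for bounding the worst-case
latency added to the next command.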

Setting hint bits early would help workloads with small synchronously
writing transactions. Async commits could also benefit from proactive
hint bit setting, but this would require some global cooperation and
isn't as clear of a win. One idea would be to copy the local ring
buffer entries to a global one tagged with the LSN when the
transaction has been made visible. When someone flushes xlog, they
also check if it enables some background hinting and set the
corresponding flag for any backend with spare cycles to pick up.
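
One way to picture the LSN check, as a sketch only: entries in the
global buffer carry the LSN of their commit record, and a WAL flush up
to some position makes every entry at or below that position eligible
for hinting. Lsn, PendingHint and hintable_prefix are hypothetical
names, and the sketch assumes entries are appended in commit order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* hypothetical stand-in for a WAL position */

typedef struct PendingHint {
    uint32_t page;        /* hypothetical page identifier */
    Lsn      commit_lsn;  /* WAL position of the commit record */
} PendingHint;

/* Count how many leading entries became safe to hint now that WAL is
 * flushed up to flushed_lsn.  Entries are assumed to be appended in
 * commit order, so commit_lsn is non-decreasing. */
static size_t hintable_prefix(const PendingHint *queue, size_t n,
                              Lsn flushed_lsn)
{
    size_t i = 0;

    while (i < n && queue[i].commit_lsn <= flushed_lsn) {
        assert(i == 0 || queue[i - 1].commit_lsn <= queue[i].commit_lsn);
        i++;
    }
    return i;
}
```

Whoever flushes xlog could run this check and raise the flag when the
result is non-zero.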

Comments?

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


Re: Early hint bit setting

From
Merlin Moncure
Date:
On Wed, May 30, 2012 at 4:42 PM, Ants Aasma <ants@cybertec.at> wrote:
> I was thinking about what is the earliest time where we could set hint
> bits. This would be just after the commit has been made visible. When
> the transaction completes and commit confirmation is sent to the
> client the backend will usually go to sleep waiting on the network
> socket waiting for further commands. Because most clients wait for the
> commit confirmation before proceeding this means that we have at least
> one network RTT before this backend is expected to respond again.
>
> The idea is to keep a small backend local ring buffer of pages that
> have been modified. When a transaction has just committed, we do a
> non-blocking read on the socket. When nothing is available we take the
> opportunity to go and set hint bits in the recently modified buffers.
>
> Hurting latency for single-threaded workloads using lots of
> transactions is bad. It follows that it would be a bad idea to do
> anything that could take a long time while waiting for the next
> command. Because early hinting is a performance optimisation we can
> safely skip it if it becomes bothersome. Anything that causes IO can
> take too long. So we only set the hint bits when the page is still in
> shared buffers to avoid reading in the page. Furthermore, we only hint
> the tuples that the recently completed transaction modified to avoid
> IO from CLOG (we could hint other tuples if their xid happens to be in
> the SLRU, but it probably won't be very useful).
>
> Hint bits are set sooner or later. Setting them earlier is a
> throughput win for any workload because we avoid generating extra
> load. We avoid doing any IO and we might save some so for IO this is a
> pure win. The hinting CPU work needs to be done sooner or later, so
> that's a tie, except for extremely bursty write heavy loads with lots
> of transactions. Memory loads could in principle hurt other backends.
> Refilling the whole last level cache of modern processors takes a few
> hundred microseconds at peak speed. If the WAL is on fast storage
> (BBWC, SSD) there's a pretty good chance that the page being hinted is
> still in the cpu cache, avoiding the memory bandwidth overhead.
>
> Abstraction wise, I think we need to set up a mechanism to run very
> short maintenance jobs from backends waiting for new commands.
> SocketBackend could check if there's anything to do, and call
> pq_getbyte_if_available if there is anything to do before proceeding
> to do it.
>
> Setting hint bits early would help workloads with small synchronously
> writing transactions. Async commits could also benefit from proactive
> hint bit setting, but this would require some global cooperation and
> isn't as clear of a win. One idea would be to copy the local ring
> buffer entries to a global one tagged with the LSN when the
> transaction has been made visible. When someone flushes xlog, they
> also check if it enables some background hinting and set the
> corresponding flag for any backend with spare cycles to pick up.
>
> Comments?

I think this is a really neat idea, and could solve a lot of problems.
Since you don't have to do any clog checks (you know when you commit)
-- i think it's a win all around -- so much so that it might be worth
seeing the worst case latency hit if you force one page out always
before doing the socket check.  Hm, could you shave cpu cycles by just
storing the specific offsets of the hint bit bytes you want to set, or
is that too hacky?

merlin


Re: Early hint bit setting

From
Ants Aasma
Date:
On Thu, May 31, 2012 at 1:01 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> I think this is a really neat idea, and could solve a lot of problems.
>  Since you don't have to do any clog checks (you know when you commit)
> -- i think it's a win all around -- so much so that it might be worth
> seeing the worst case latency hit if you force one page out always
> before doing the socket check.  Hm, could you shave cpu cycles by just
> storing the specific offsets of the hint bit bytes you want to set, or
> is that too hacky?

Maybe even do both. By default store tuple offsets, but when the last
item was from the same page convert it to a page hinting request. I
have a specific near-realtime datawarehouse workload in mind where
bulk load is being constantly performed by smallish transactions. By
having page granularity in the buffer almost all pages could be hinted
before hitting the disk. The latency vs throughput tradeoff could
possibly be per backend tunable.
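
A small sketch of that coalescing rule, assuming a simple append-only
queue: a request starts out as a single tuple offset, and a second
request for the same page as the last entry upgrades that entry to a
whole-page request. HintRequest, HintQueue, MAX_REQUESTS and
hint_queue_add are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REQUESTS 16   /* hypothetical queue capacity */

typedef struct HintRequest {
    uint32_t page;
    uint16_t offset;      /* meaningful only when whole_page is false */
    bool     whole_page;
} HintRequest;

typedef struct HintQueue {
    HintRequest items[MAX_REQUESTS];
    size_t      count;
} HintQueue;

/* Queue a hint request for one tuple.  Returns false if the queue is
 * full and the request was dropped. */
static bool hint_queue_add(HintQueue *q, uint32_t page, uint16_t offset)
{
    assert(q->count <= MAX_REQUESTS);

    /* Same page as the last item: convert it to a page-level request. */
    if (q->count > 0 && q->items[q->count - 1].page == page) {
        q->items[q->count - 1].whole_page = true;
        return true;
    }

    if (q->count == MAX_REQUESTS)
        return false;   /* hinting is optional, so dropping is safe */

    q->items[q->count++] = (HintRequest){
        .page = page, .offset = offset, .whole_page = false
    };
    return true;
}
```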

Ants Aasma


Re: Early hint bit setting

From
Jim Nasby
Date:
On 5/30/12 4:42 PM, Ants Aasma wrote:
> I was thinking about what is the earliest time where we could set hint
> bits. This would be just after the commit has been made visible.

Except that's only true when there are no other transactions running. That's
been one of the big sticking points about trying to proactively set hint
bits; in a real system you're not going to gain very much unless you wait a
while before setting them.
 

An interesting option might be to keep the first XID that dirtied a page
and loop through all pages in the background looking for pages where
first_dirty_xid is < the oldest running XID. Those pages would have hint
bits that could be set. While scanning the page you would want to set
first_dirty_xid to the oldest XID that could not be hinted.
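
A simplified sketch of that background scan for a single page. It makes
two loud assumptions: XIDs are compared as plain integers (wraparound
is ignored), and every XID older than the oldest running one is treated
as hintable without distinguishing committed from aborted. SketchTuple,
SketchPage, INVALID_XID and hint_page are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;

#define INVALID_XID     0   /* hypothetical "no unhinted XID" marker */
#define PAGE_MAX_TUPLES 8   /* hypothetical page capacity */

typedef struct SketchTuple {
    Xid  xmin;
    bool hinted;
} SketchTuple;

typedef struct SketchPage {
    Xid         first_dirty_xid;   /* oldest XID not yet hinted */
    size_t      ntuples;
    SketchTuple tuples[PAGE_MAX_TUPLES];
} SketchPage;

/* Hint every tuple whose xmin is older than oldest_running, then reset
 * first_dirty_xid to the oldest XID that could not be hinted.  Returns
 * the number of tuples newly hinted. */
static size_t hint_page(SketchPage *page, Xid oldest_running)
{
    size_t hinted = 0;
    Xid    oldest_unhinted = INVALID_XID;

    assert(page->ntuples <= PAGE_MAX_TUPLES);

    /* Nothing on this page can be hinted yet: skip the scan. */
    if (page->first_dirty_xid == INVALID_XID ||
        page->first_dirty_xid >= oldest_running)
        return 0;

    for (size_t i = 0; i < page->ntuples; i++) {
        SketchTuple *t = &page->tuples[i];

        if (t->hinted)
            continue;
        if (t->xmin < oldest_running) {
            t->hinted = true;
            hinted++;
        } else if (oldest_unhinted == INVALID_XID ||
                   t->xmin < oldest_unhinted) {
            oldest_unhinted = t->xmin;
        }
    }

    page->first_dirty_xid = oldest_unhinted;
    return hinted;
}
```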
 

This is a modification of the idea to set hint bits when a page is on its
way out of the buffer; the advantage here is that it would also handle
pages that are too hot to leave the buffer.
 
-- 
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


Re: Early hint bit setting

From
Merlin Moncure
Date:
On Wed, Jun 6, 2012 at 5:41 PM, Jim Nasby <jim@nasby.net> wrote:
> On 5/30/12 4:42 PM, Ants Aasma wrote:
>>
>> I was thinking about what is the earliest time where we could set hint
>> bits. This would be just after the commit has been made visible.
>
>
> Except that's only true when there are no other transactions running. That's
> been one of the big sticking points about trying to proactively set hint
> bits; in a real system you're not going to gain very much unless you wait a
> while before setting them.

are you sure?   the relevant code to set hint bit during tuple scan
looks like this:
    else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple)))
    {
        if (HeapTupleHeaderGetCmin(tuple) >= snapshot->curcid)
            return false;    /* inserted after scan started */

        if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid */
            return true;

        if (tuple->t_infomask & HEAP_IS_LOCKED)        /* not deleter */
            return true;

        Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));

        if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
        {
            /* deleting subtransaction must have aborted */
            SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
                        InvalidTransactionId);
            return true;
        }

        if (HeapTupleHeaderGetCmax(tuple) >= snapshot->curcid)
            return true;    /* deleted after scan started */
        else
            return false;    /* deleted before scan started */
    }
    else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
        return false;
    else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
        SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
                    HeapTupleHeaderGetXmin(tuple));
    else
    {
        /* it must have aborted or crashed */
        SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
                    InvalidTransactionId);
        return false;
    }
 

The backend that commits the transaction knows that the transaction is
committed and that it's not in progress (at least from itself).   Why
do you have to wait for other transactions in progress to finish?
Setting the xmin committed bit doesn't keep you from checking the xmax
based rules.

merlin


Re: Early hint bit setting

From
Robert Haas
Date:
On Wed, Jun 6, 2012 at 6:41 PM, Jim Nasby <jim@nasby.net> wrote:
> Except that's only true when there are no other transactions running. That's
> been one of the big sticking points about trying to proactively set hint
> bits; in a real system you're not going to gain very much unless you wait a
> while before setting them.

No, the committed hint bit just means that the transaction is
committed.  You don't have to wait for it to be all-visible.

I think my biggest concern about this is that it inevitably relies on
some assumption about how much latency there will be before the client
sends the next request.  That strikes me as impossible to tune.  On
system A, connected to the Internet via an overloaded 56k modem link,
you can get away with doing a huge amount of fiddling around while
waiting for the next request.  But on system B, which uses 10GE or
Infiniband or local sockets, the acceptable latency will be much less.
Even given identical hardware, scheduler behavior may matter quite a
lot - rumor has it that FreeBSD's scheduling latency may be
significantly less than on Linux, although I have not verified it and
rumor may lie.  But the point is that whether or not this works out to
a win on any given system seems like it will depend on an awful lot of
stuff that we can't know or control.

I would be more inclined to look at trying to make this happen in a
background process, although that's not without its own challenges.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company