Обсуждение: Performance bug in prepared statement binding in 9.2?

Поиск
Список
Период
Сортировка

Performance bug in prepared statement binding in 9.2?

От
Josh Berkus
Дата:
Folks,

I'm seeing what may be a major performance bug in BIND in 9.2.4.

We have a client who has an application which uses
Tomcat+Hibernate+JDBC.  They are in the process of upgrading this
application from 8.4.17 to 9.2.4.  As part of this, they have been doing
performance testing, and 9.2 is coming out MUCH worse than 8.4.  The
problem appears to be bind/plan time.

Their application does not use prepared queries usefully, doing
parse,bind,execute on every query cycle.

Here's timings overall for 29 test cycles (cycle 1 has been omitted).
As you can see, parse+execute times are pretty much constant, as are
application think times, but bind times vary quite a lot.  In 8.4, the
29 cycles are constantly 4.5min to 5.75min long.  In 9.2, which is the
chart below, they are all over the place.

Definitions:
cycle: test cycle #, arbitrary.  Each cycle does the same amount of
"work" measured in rows of data.
non_bind_time: time spent in PARSE and EXECUTE.
bind_time: time spent in BIND
app_time: time spent outside Postgres
all times are in minutes:

 cycle | non_bind_time | bind_time | app_time | total_time
-------+---------------+-----------+----------+------------
     2 |          0.79 |      0.62 |     3.19 |       4.60
     3 |          0.77 |      0.87 |     3.13 |       4.77
     4 |          0.76 |      1.10 |     3.16 |       5.02
     5 |          0.76 |      1.26 |     3.08 |       5.10
     6 |          0.72 |      1.40 |     3.08 |       5.20
     7 |          0.72 |      1.51 |     3.05 |       5.28
     8 |          0.70 |      1.60 |     3.07 |       5.37
     9 |          0.73 |      1.72 |     3.05 |       5.50
    10 |          0.71 |      1.84 |     3.05 |       5.60
    11 |          0.70 |      1.96 |     3.07 |       5.73
    12 |          0.74 |      2.11 |     3.08 |       5.93
    13 |          0.74 |      3.58 |     3.08 |       7.40
    14 |          0.73 |      2.41 |     3.08 |       6.22
    15 |          0.75 |      4.15 |     3.08 |       7.98
    16 |          0.74 |      2.69 |     3.09 |       6.52
    17 |          0.76 |      4.68 |     3.09 |       8.53
    18 |          0.74 |      2.99 |     3.09 |       6.82
    19 |          0.77 |      5.24 |     3.11 |       9.12
    20 |          0.75 |      3.29 |     3.08 |       7.12
    21 |          0.78 |      5.90 |     3.14 |       9.82
    22 |          0.78 |      3.57 |     3.12 |       7.47
    23 |          0.76 |      6.17 |     3.10 |      10.03
    24 |          0.77 |      6.61 |     3.10 |      10.48
    25 |          0.77 |      3.97 |     3.11 |       7.85
    26 |          0.77 |      5.24 |     3.12 |       9.13
    27 |          0.76 |      7.15 |     3.12 |      11.03
    28 |          0.76 |      4.37 |     3.10 |       8.23
    29 |          0.78 |      4.48 |     3.12 |       8.38
    30 |          0.76 |      7.73 |     3.11 |      11.60

I pulled out some of the queries with the greatest variance in bind
time.  Unexpectedly, they are not particularly complex.  Here's the
anonymized plan for a query which in the logs took 80ms to bind:

http://explain.depesz.com/s/YSj

Nested Loop  (cost=8.280..26.740 rows=1 width=289)
  ->  Nested Loop  (cost=8.280..18.450 rows=1 width=248)
        ->  Hash Join  (cost=8.280..10.170 rows=1 width=140)
                Hash Cond: (foxtrot2kilo_oscar.quebec_seven =
kilo_juliet1kilo_oscar.sierra_quebec)
              ->  Seq Scan on foxtrot november  (cost=0.000..1.640
rows=64 width=25)
              ->  Hash  (cost=8.270..8.270 rows=1 width=115)
                    ->  Index Scan using quebec_six on victor_india
sierra_oscar  (cost=0.000..8.270 rows=1 width=115)
                            Index Cond: (quebec_seven = 10079::bigint)
        ->  Index Scan using alpha on seven_tango lima
(cost=0.000..8.270 rows=1 width=108)
                Index Cond: ((xray = 10079::bigint) AND (golf =
10002::bigint))
  ->  Index Scan using six on india victor_romeo  (cost=0.000..8.280
rows=1 width=41)
          Index Cond: (quebec_seven = seven_victor0kilo_oscar.delta)

As you can see, it's not particularly complex; it only joins 4 tables,
and it has 2 parameters.  This database does have some horrible ugly
queries with up to 500 parameters, but inexplicably those don't take a
particularly long time to bind.

Note that I have not been able to reproduce this long bind time
interactively, but it's 100% reproducable in the test.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Performance bug in prepared statement binding in 9.2?

От
Josh Berkus
Дата:
On 05/29/2013 05:14 PM, Josh Berkus wrote:
> Here's timings overall for 29 test cycles (cycle 1 has been omitted).
> As you can see, parse+execute times are pretty much constant, as are
> application think times, but bind times vary quite a lot.  In 8.4, the
> 29 cycles are constantly 4.5min to 5.75min long.  In 9.2, which is the
> chart below, they are all over the place.

To be clear, the TOTAL times for 8.4 are 4.5 to 5.75 minutes long.  Bind
times in 8.4 are a more-or-less constant 0.75 minutes.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Performance bug in prepared statement binding in 9.2?

От
Amit Kapila
Дата:
On Thursday, May 30, 2013 5:45 AM Josh Berkus wrote:
> Folks,
>
> I'm seeing what may be a major performance bug in BIND in 9.2.4.
>
> We have a client who has an application which uses
> Tomcat+Hibernate+JDBC.  They are in the process of upgrading this
> application from 8.4.17 to 9.2.4.  As part of this, they have been
> doing
> performance testing, and 9.2 is coming out MUCH worse than 8.4.  The
> problem appears to be bind/plan time.
>
> Their application does not use prepared queries usefully, doing
> parse,bind,execute on every query cycle.
>
> Here's timings overall for 29 test cycles (cycle 1 has been omitted).
> As you can see, parse+execute times are pretty much constant, as are
> application think times, but bind times vary quite a lot.  In 8.4, the
> 29 cycles are constantly 4.5min to 5.75min long.  In 9.2, which is the
> chart below, they are all over the place.

I think it might be because of choosing custom plan option due to which it might be generating new plan during
exec_bind_message().
exec_bind_message()->GetCachedPlan()->choose_custom_plan(). If it chooses custom plan, then it will regenerate the plan
whichcan cause extra cost 
observed in test.
Though there is calculation that it should not choose custom plan always, but still I guess the variation observed in
thetest can be due to this reason. 

To test if this is the cause, we might hack the code such that it always chooses generic plan, so that it doesn't need
togenerate plan again. 

With Regards,
Amit Kapila.



Re: Performance bug in prepared statement binding in 9.2?

От
Josh Berkus
Дата:
Amit,

> I think it might be because of choosing custom plan option due to which it might be generating new plan during
exec_bind_message().
> exec_bind_message()->GetCachedPlan()->choose_custom_plan(). If it chooses custom plan, then it will regenerate the
planwhich can cause extra cost 
> observed in test.
> Though there is calculation that it should not choose custom plan always, but still I guess the variation observed in
thetest can be due to this reason. 

This is why I'm asking them to run tests on 9.1.  If 9.1 doesn't exhibit
this behavior, then customplan is liable to be at fault.

HOWEVER, that doesn't explain why creating a plan for a query during
application operation would take 80ms, but only 1.2ms when I do it
interactively.

FYI, per questions from IRC: the times for each "cycle" in my data are
cumulative minutes.  Each cycle runs around 500,000 queries, so that's
the aggregate across all queries.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Performance bug in prepared statement binding in 9.2?

От
Josh Berkus
Дата:
> This is why I'm asking them to run tests on 9.1.  If 9.1 doesn't exhibit
> this behavior, then customplan is liable to be at fault.

9.1 shows the same performance as 9.2.  So it's not the custom plan thing.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Performance bug in prepared statement binding in 9.2?

От
Amit Kapila
Дата:
On Thursday, May 30, 2013 11:36 PM Josh Berkus wrote:
> Amit,
>
> > I think it might be because of choosing custom plan option due to
> which it might be generating new plan during exec_bind_message().
> > exec_bind_message()->GetCachedPlan()->choose_custom_plan(). If it
> chooses custom plan, then it will regenerate the plan which can cause
> extra cost
> > observed in test.
> > Though there is calculation that it should not choose custom plan
> always, but still I guess the variation observed in the test can be due
> to this reason.
>
> This is why I'm asking them to run tests on 9.1.  If 9.1 doesn't
> exhibit
> this behavior, then customplan is liable to be at fault.
>
> HOWEVER, that doesn't explain why creating a plan for a query during
> application operation would take 80ms, but only 1.2ms when I do it
> interactively.

When you say interactively, does it mean that you are using psql to test the same?

> FYI, per questions from IRC: the times for each "cycle" in my data are
> cumulative minutes.  Each cycle runs around 500,000 queries, so that's
> the aggregate across all queries.

Today I tried to see the changes between 8.4 and 9.1 for bind path in server. Following is summary of whatever I could
seethe differences 

1. 4 new parameters are added to ParamListInfo, for which palloc is done in exec_bind_message
2. changed function for converting client to server encoding, but it seems for bind path, it will still follow same
pathas for 8.4 
2. small change in RevalidateCachedPlan() for new hook added in ParamListInfo
3. standard_ExecutorStart(), changes to setup After Statement Trigger context
4. InitPlan has some changes for FOR UPDATE/FOR SELECT statements and junk filter case (update/delete statements)

From the changes, it doesn't seem that any of such changes can cause the problem you have seen.

Do you think it can be due to
a. JDBC - communication, encoding or some other changes
b. can we assume that plans generated for all statements are same, if not it might have some cost for query plan
initialization(InitPlan) but again it should not be that big cost. 

How do measure  individual bind time cost (is the cost of only server side or it includes client bind or ..)?

With Regards,
Amit Kapila.



Re: Performance bug in prepared statement binding in 9.2?

От
Josh Berkus
Дата:
> When you say interactively, does it mean that you are using psql to test the same?

Yes.  Of course, on the psql command line, there's no separate BIND
step, just PREPARE and EXECUTE.

> From the changes, it doesn't seem that any of such changes can cause the problem you have seen.

No, but clearly in this case something is broken.

> Do you think it can be due to
> a. JDBC - communication, encoding or some other changes
> b. can we assume that plans generated for all statements are same, if not it might have some cost for query plan
initialization(InitPlan) but again it should not be that big cost. 

I don't think we can assume that, no.

> How do measure  individual bind time cost (is the cost of only server side or it includes client bind or ..)?

From the PostgreSQL activity log.  BIND gets logged separately.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Performance bug in prepared statement binding in 9.2?

От
Josh Berkus
Дата:
Amit, All:

So we just retested this on 9.3b2.  The performance is the same as 9.1
and 9.2; that is, progressively worse as the test cycles go on, and
unacceptably slow compared to 8.4.

Some issue introduced in 9.1 is causing BINDs to get progressively
slower as the PARSEs BINDs get run repeatedly.  Per earlier on this
thread, that can bloat to 200X time required for a BIND, and it's
definitely PostgreSQL-side.

I'm trying to produce a test case which doesn't involve the user's
application.  However, hints on other things to analyze would be keen.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Performance bug in prepared statement binding in 9.2?

От
Jeff Janes
Дата:
On Thu, Aug 1, 2013 at 10:58 AM, Josh Berkus <josh@agliodbs.com> wrote:
> Amit, All:
>
> So we just retested this on 9.3b2.  The performance is the same as 9.1
> and 9.2; that is, progressively worse as the test cycles go on, and
> unacceptably slow compared to 8.4.
>
> Some issue introduced in 9.1 is causing BINDs to get progressively
> slower as the PARSEs BINDs get run repeatedly.  Per earlier on this
> thread, that can bloat to 200X time required for a BIND, and it's
> definitely PostgreSQL-side.
>
> I'm trying to produce a test case which doesn't involve the user's
> application.  However, hints on other things to analyze would be keen.

Does it seem to be all CPU time (it is hard to imagine what else it
would be, but...)

Could you use oprofile or perf or gprof to get a profile of the
backend during a run?  That should quickly narrow it down to which C
function has the problem.

Did you test 9.0 as well?

If the connection is dropped and re-established between "cycles" does
the problem still show up?

Cheers,

Jeff


Re: Performance bug in prepared statement binding in 9.2?

От
Andrew Dunstan
Дата:
On 08/01/2013 03:20 PM, Jeff Janes wrote:
> On Thu, Aug 1, 2013 at 10:58 AM, Josh Berkus <josh@agliodbs.com> wrote:
>> Amit, All:
>>
>> So we just retested this on 9.3b2.  The performance is the same as 9.1
>> and 9.2; that is, progressively worse as the test cycles go on, and
>> unacceptably slow compared to 8.4.
>>
>> Some issue introduced in 9.1 is causing BINDs to get progressively
>> slower as the PARSEs BINDs get run repeatedly.  Per earlier on this
>> thread, that can bloat to 200X time required for a BIND, and it's
>> definitely PostgreSQL-side.
>>
>> I'm trying to produce a test case which doesn't involve the user's
>> application.  However, hints on other things to analyze would be keen.
> Does it seem to be all CPU time (it is hard to imagine what else it
> would be, but...)
>
> Could you use oprofile or perf or gprof to get a profile of the
> backend during a run?  That should quickly narrow it down to which C
> function has the problem.
>
> Did you test 9.0 as well?


This has been tested back to 9.0. What we have found is that the problem
disappears if the database has come in via dump/restore, but is present
if it is the result of pg_upgrade. There are some long-running
transactions also running alongside this - we are currently planning a
test where those are not present. We're also looking at constructing a
self-contained test case.

Here is some perf output from the bad case:

    +  14.67%  postgres           [.] heap_hot_search_buffer
    +  11.45%  postgres           [.] LWLockAcquire
    +   8.39%  postgres           [.] LWLockRelease
    +   6.60%  postgres           [.] _bt_checkkeys
    +   6.39%  postgres           [.] PinBuffer
    +   5.96%  postgres           [.] hash_search_with_hash_value
    +   5.43%  postgres           [.] hash_any
    +   5.14%  postgres           [.] UnpinBuffer
    +   3.43%  postgres           [.] ReadBuffer_common
    +   2.34%  postgres           [.] index_fetch_heap
    +   2.04%  postgres           [.] heap_page_prune_opt
    +   2.00%  libc-2.15.so       [.] 0x8041b
    +   1.94%  postgres           [.] _bt_next
    +   1.83%  postgres           [.] btgettuple
    +   1.76%  postgres           [.] index_getnext_tid
    +   1.70%  postgres           [.] LockBuffer
    +   1.54%  postgres           [.] ReadBufferExtended
    +   1.25%  postgres           [.] FunctionCall2Coll
    +   1.14%  postgres           [.] HeapTupleSatisfiesNow
    +   1.09%  postgres           [.] ReleaseAndReadBuffer
    +   0.94%  postgres           [.] ResourceOwnerForgetBuffer
    +   0.81%  postgres           [.] _bt_saveitem
    +   0.80%  postgres           [.] _bt_readpage
    +   0.79%  [kernel.kallsyms]  [k] 0xffffffff81170861
    +   0.64%  postgres           [.] CheckForSerializableConflictOut
    +   0.60%  postgres           [.] ResourceOwnerEnlargeBuffers
    +   0.59%  postgres           [.] BufTableLookup

and here is the good case:

    +   9.54%  libc-2.15.so       [.] 0x15eb1f
    +   7.31%  [kernel.kallsyms]  [k] 0xffffffff8117924b
    +   5.65%  postgres           [.] AllocSetAlloc
    +   3.57%  postgres           [.] SearchCatCache
    +   2.67%  postgres           [.] hash_search_with_hash_value
    +   1.69%  postgres           [.] base_yyparse
    +   1.49%  libc-2.15.so       [.] vfprintf
    +   1.34%  postgres           [.] MemoryContextAllocZeroAligned
    +   1.34%  postgres           [.] XLogInsert
    +   1.24%  postgres           [.] copyObject
    +   1.10%  postgres           [.] palloc
    +   1.09%  postgres           [.] _bt_compare
    +   1.04%  postgres           [.] core_yylex
    +   0.96%  postgres           [.] _bt_checkkeys
    +   0.95%  postgres           [.] expression_tree_walker
    +   0.88%  postgres           [.] ScanKeywordLookup
    +   0.87%  postgres           [.] pg_encoding_mbcliplen
    +   0.86%  postgres           [.] LWLockAcquire
    +   0.72%  postgres           [.] nocachegetattr
    +   0.67%  postgres           [.] FunctionCall2Coll
    +   0.63%  postgres           [.] fmgr_info_cxt_security
    +   0.62%  postgres           [.] hash_any
    +   0.62%  postgres           [.] ExecInitExpr
    +   0.58%  postgres           [.] hash_uint32
    +   0.55%  postgres           [.] PostgresMain
    +   0.55%  postgres           [.] LWLockRelease
    +   0.54%  postgres           [.] lappend
    +   0.52%  postgres           [.] slot_deform_tuple
    +   0.50%  postgres           [.] PinBuffer
    +   0.48%  postgres           [.] AllocSetFree
    +   0.46%  postgres           [.] check_stack_depth
    +   0.44%  postgres           [.] DirectFunctionCall1Coll
    +   0.43%  postgres           [.] ExecScanHashBucket
    +   0.36%  postgres           [.] deconstruct_array
    +   0.36%  postgres           [.] CatalogCacheComputeHashValue
    +   0.35%  postgres           [.] pfree
    +   0.33%  libc-2.15.so       [.] _IO_default_xsputn
    +   0.32%  libc-2.15.so       [.] malloc
    +   0.32%  postgres           [.] TupleDescInitEntry
    +   0.30%  postgres           [.] new_tail_cell.isra.2
    +   0.30%  libm-2.15.so       [.] 0x5898
    +   0.30%  postgres           [.] LockAcquireExtended
    +   0.30%  postgres           [.] _bt_first
    +   0.29%  postgres           [.] add_paths_to_joinrel
    +   0.28%  postgres           [.] MemoryContextCreate
    +   0.28%  postgres           [.] appendBinaryStringInfo
    +   0.28%  postgres           [.] MemoryContextStrdup
    +   0.27%  postgres           [.] heap_hot_search_buffer
    +   0.27%  postgres           [.] GetSnapshotData
    +   0.26%  postgres           [.] hash_search
    +   0.26%  postgres           [.] heap_getsysattr
    +   0.26%  [vdso]             [.] 0x7fff681ff70c
    +   0.25%  postgres           [.] compare_scalars
    +   0.25%  postgres           [.] pg_verify_mbstr_len



cheers

andrew


Re: Performance bug in prepared statement binding in 9.2?

От
Bruce Momjian
Дата:
On Mon, Sep  9, 2013 at 08:38:09PM -0400, Andrew Dunstan wrote:
>
> On 08/01/2013 03:20 PM, Jeff Janes wrote:
> >On Thu, Aug 1, 2013 at 10:58 AM, Josh Berkus <josh@agliodbs.com> wrote:
> >>Amit, All:
> >>
> >>So we just retested this on 9.3b2.  The performance is the same as 9.1
> >>and 9.2; that is, progressively worse as the test cycles go on, and
> >>unacceptably slow compared to 8.4.
> >>
> >>Some issue introduced in 9.1 is causing BINDs to get progressively
> >>slower as the PARSEs BINDs get run repeatedly.  Per earlier on this
> >>thread, that can bloat to 200X time required for a BIND, and it's
> >>definitely PostgreSQL-side.
> >>
> >>I'm trying to produce a test case which doesn't involve the user's
> >>application.  However, hints on other things to analyze would be keen.
> >Does it seem to be all CPU time (it is hard to imagine what else it
> >would be, but...)
> >
> >Could you use oprofile or perf or gprof to get a profile of the
> >backend during a run?  That should quickly narrow it down to which C
> >function has the problem.
> >
> >Did you test 9.0 as well?
>
>
> This has been tested back to 9.0. What we have found is that the
> problem disappears if the database has come in via dump/restore, but
> is present if it is the result of pg_upgrade. There are some
> long-running transactions also running alongside this - we are
> currently planning a test where those are not present. We're also
> looking at constructing a self-contained test case.
>
> Here is some perf output from the bad case:
>
>    +  14.67%  postgres           [.] heap_hot_search_buffer

OK, certainly looks like a HOT chain issue.  I think there are two
possibilities:

1)  the heap or index file is different from a dump/restore vs.
    pg_upgrade
2)  some other files is missing or changed between the two

My guess is that the dump/restore removes all the HOT chains as it just
dumps the most recent value of the chain.  Could it be HOT chain
overhead that you are seeing, rather than a pg_upgrade issue/bug?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Performance bug in prepared statement binding in 9.2?

От
Mel Llaguno
Дата:
Let me clarify further - when we reconstruct our schema (without the upgrade step) via a sql script, the problem still
persists.Restoring an upgraded DB which contains existing data into exactly the same instance will correct the
behavior. 

--------

Mel Llaguno | Principal Engineer (Performance/Deployment)
Coverity | 800 6th Avenue S.W. | Suite 410 | Calgary, AB | Canada | T2P 3G3
mllaguno@coverity.com

________________________________________
From: pgsql-performance-owner@postgresql.org
[pgsql-performance-owner@postgresql.org] on behalf of Andrew Dunstan
[andrew@dunslane.net]
Sent: Monday, September 09, 2013 6:38 PM
To: Jeff Janes
Cc: Josh Berkus; Amit Kapila; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance bug in prepared statement binding in
9.2?

On 08/01/2013 03:20 PM, Jeff Janes wrote:
> On Thu, Aug 1, 2013 at 10:58 AM, Josh Berkus <josh@agliodbs.com> wrote:
>> Amit, All:
>>
>> So we just retested this on 9.3b2.  The performance is the same as 9.1
>> and 9.2; that is, progressively worse as the test cycles go on, and
>> unacceptably slow compared to 8.4.
>>
>> Some issue introduced in 9.1 is causing BINDs to get progressively
>> slower as the PARSEs BINDs get run repeatedly.  Per earlier on this
>> thread, that can bloat to 200X time required for a BIND, and it's
>> definitely PostgreSQL-side.
>>
>> I'm trying to produce a test case which doesn't involve the user's
>> application.  However, hints on other things to analyze would be keen.
> Does it seem to be all CPU time (it is hard to imagine what else it
> would be, but...)
>
> Could you use oprofile or perf or gprof to get a profile of the
> backend during a run?  That should quickly narrow it down to which C
> function has the problem.
>
> Did you test 9.0 as well?


This has been tested back to 9.0. What we have found is that the problem
disappears if the database has come in via dump/restore, but is present
if it is the result of pg_upgrade. There are some long-running
transactions also running alongside this - we are currently planning a
test where those are not present. We're also looking at constructing a
self-contained test case.

Here is some perf output from the bad case:

    +  14.67%  postgres           [.] heap_hot_search_buffer
    +  11.45%  postgres           [.] LWLockAcquire
    +   8.39%  postgres           [.] LWLockRelease
    +   6.60%  postgres           [.] _bt_checkkeys
    +   6.39%  postgres           [.] PinBuffer
    +   5.96%  postgres           [.] hash_search_with_hash_value
    +   5.43%  postgres           [.] hash_any
    +   5.14%  postgres           [.] UnpinBuffer
    +   3.43%  postgres           [.] ReadBuffer_common
    +   2.34%  postgres           [.] index_fetch_heap
    +   2.04%  postgres           [.] heap_page_prune_opt
    +   2.00%  libc-2.15.so       [.] 0x8041b
    +   1.94%  postgres           [.] _bt_next
    +   1.83%  postgres           [.] btgettuple
    +   1.76%  postgres           [.] index_getnext_tid
    +   1.70%  postgres           [.] LockBuffer
    +   1.54%  postgres           [.] ReadBufferExtended
    +   1.25%  postgres           [.] FunctionCall2Coll
    +   1.14%  postgres           [.] HeapTupleSatisfiesNow
    +   1.09%  postgres           [.] ReleaseAndReadBuffer
    +   0.94%  postgres           [.] ResourceOwnerForgetBuffer
    +   0.81%  postgres           [.] _bt_saveitem
    +   0.80%  postgres           [.] _bt_readpage
    +   0.79%  [kernel.kallsyms]  [k] 0xffffffff81170861
    +   0.64%  postgres           [.] CheckForSerializableConflictOut
    +   0.60%  postgres           [.] ResourceOwnerEnlargeBuffers
    +   0.59%  postgres           [.] BufTableLookup

and here is the good case:

    +   9.54%  libc-2.15.so       [.] 0x15eb1f
    +   7.31%  [kernel.kallsyms]  [k] 0xffffffff8117924b
    +   5.65%  postgres           [.] AllocSetAlloc
    +   3.57%  postgres           [.] SearchCatCache
    +   2.67%  postgres           [.] hash_search_with_hash_value
    +   1.69%  postgres           [.] base_yyparse
    +   1.49%  libc-2.15.so       [.] vfprintf
    +   1.34%  postgres           [.] MemoryContextAllocZeroAligned
    +   1.34%  postgres           [.] XLogInsert
    +   1.24%  postgres           [.] copyObject
    +   1.10%  postgres           [.] palloc
    +   1.09%  postgres           [.] _bt_compare
    +   1.04%  postgres           [.] core_yylex
    +   0.96%  postgres           [.] _bt_checkkeys
    +   0.95%  postgres           [.] expression_tree_walker
    +   0.88%  postgres           [.] ScanKeywordLookup
    +   0.87%  postgres           [.] pg_encoding_mbcliplen
    +   0.86%  postgres           [.] LWLockAcquire
    +   0.72%  postgres           [.] nocachegetattr
    +   0.67%  postgres           [.] FunctionCall2Coll
    +   0.63%  postgres           [.] fmgr_info_cxt_security
    +   0.62%  postgres           [.] hash_any
    +   0.62%  postgres           [.] ExecInitExpr
    +   0.58%  postgres           [.] hash_uint32
    +   0.55%  postgres           [.] PostgresMain
    +   0.55%  postgres           [.] LWLockRelease
    +   0.54%  postgres           [.] lappend
    +   0.52%  postgres           [.] slot_deform_tuple
    +   0.50%  postgres           [.] PinBuffer
    +   0.48%  postgres           [.] AllocSetFree
    +   0.46%  postgres           [.] check_stack_depth
    +   0.44%  postgres           [.] DirectFunctionCall1Coll
    +   0.43%  postgres           [.] ExecScanHashBucket
    +   0.36%  postgres           [.] deconstruct_array
    +   0.36%  postgres           [.] CatalogCacheComputeHashValue
    +   0.35%  postgres           [.] pfree
    +   0.33%  libc-2.15.so       [.] _IO_default_xsputn
    +   0.32%  libc-2.15.so       [.] malloc
    +   0.32%  postgres           [.] TupleDescInitEntry
    +   0.30%  postgres           [.] new_tail_cell.isra.2
    +   0.30%  libm-2.15.so       [.] 0x5898
    +   0.30%  postgres           [.] LockAcquireExtended
    +   0.30%  postgres           [.] _bt_first
    +   0.29%  postgres           [.] add_paths_to_joinrel
    +   0.28%  postgres           [.] MemoryContextCreate
    +   0.28%  postgres           [.] appendBinaryStringInfo
    +   0.28%  postgres           [.] MemoryContextStrdup
    +   0.27%  postgres           [.] heap_hot_search_buffer
    +   0.27%  postgres           [.] GetSnapshotData
    +   0.26%  postgres           [.] hash_search
    +   0.26%  postgres           [.] heap_getsysattr
    +   0.26%  [vdso]             [.] 0x7fff681ff70c
    +   0.25%  postgres           [.] compare_scalars
    +   0.25%  postgres           [.] pg_verify_mbstr_len



cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance






Re: Performance bug in prepared statement binding in 9.2?

От
Bruce Momjian
Дата:
On Tue, Sep 10, 2013 at 01:36:27AM +0000, Mel Llaguno wrote:
> Let me clarify further - when we reconstruct our schema (without the
> upgrade step) via a sql script, the problem still persists. Restoring
> an upgraded DB which contains existing data into exactly the same
> instance will correct the behavior.

I do not understand what you are saying above.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Performance bug in prepared statement binding in 9.2?

От
Mel Llaguno
Дата:
We're currently using an embedded PG 8.4.17 for our application. Our PG 9.x tests consists of one of the following :

- Take a 8.4.17 DB which contains only our application schema and required seed data and use pg_upgrade to create a 9.x
databasedirectory, run the analyze_new_cluster.sh script and fire up the application. Under this set of conditions, the
bindconnection issue is present during our test. 

- Start with a fresh PG 9.x DB (using use create_db) and use psql to recreate our application schema and required seed
data.When the application is started and our test executed, the bind connection issue is still present. 

In both of the above cases, a base application schema is used.

If I upgrade an 8.4.17 DB that contains additional application data (generated by interaction with our application) to
9.x,the bind connection issue is no longer present. Restoring this upgraded 9.x DB into any PG instance in the
previouslydescribed scenarios also seems to fix the bind connection issue. 

Please let me know if this clarifies my earlier post.

Thanks, M.

Mel Llaguno | Principal Engineer (Performance/Deployment)
Coverity | 800 6th Avenue S.W. | Suite 410 | Calgary, AB | Canada | T2P 3G3
mllaguno@coverity.com
________________________________________
From: Bruce Momjian [bruce@momjian.us]
Sent: Monday, September 09, 2013 7:45 PM
To: Mel Llaguno
Cc: Andrew Dunstan; Jeff Janes; Josh Berkus; Amit Kapila;
pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance bug in prepared statement binding in
9.2?

On Tue, Sep 10, 2013 at 01:36:27AM +0000, Mel Llaguno wrote:
> Let me clarify further - when we reconstruct our schema (without the
> upgrade step) via a sql script, the problem still persists. Restoring
> an upgraded DB which contains existing data into exactly the same
> instance will correct the behavior.

I do not understand what you are saying above.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +






Re: Performance bug in prepared statement binding in 9.2?

От
Bruce Momjian
Дата:
On Tue, Sep 10, 2013 at 02:06:08AM +0000, Mel Llaguno wrote:
> We're currently using an embedded PG 8.4.17 for our application. Our
> PG 9.x tests consists of one of the following :
>
> - Take a 8.4.17 DB which contains only our application schema and
> required seed data and use pg_upgrade to create a 9.x database
> directory, run the analyze_new_cluster.sh script and fire up the
> application. Under this set of conditions, the bind connection issue
> is present during our test.
>
> - Start with a fresh PG 9.x DB (using use create_db) and use psql
> to recreate our application schema and required seed data. When the
> application is started and our test executed, the bind connection
> issue is still present.
>
> In both of the above cases, a base application schema is used.
>
> If I upgrade an 8.4.17 DB that contains additional application data
> (generated by interaction with our application) to 9.x, the bind
> connection issue is no longer present. Restoring this upgraded 9.x DB
> into any PG instance in the previously described scenarios also seems
> to fix the bind connection issue.
>
> Please let me know if this clarifies my earlier post.

Yes, that is clear.  So it is the seed data that is causing the problem?
That is the only different I see from #2 and #3.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Performance bug in prepared statement binding in 9.2?

От
Mel Llaguno
Дата:
Bruce,

First of all, I'd like to thank you for taking some interest in this issue. We'd love to migrate to the latest PG
version,but this issue is currently preventing us from doing so. 

Regardless of the DB used (base application schema _or_ restored DB with additional app data + base application
schema),seed information is present in all tests. I guess my question is this - why would having existing data change
thebind behavior at all? Is it possible that the way indexes are created has changed between 8.4 -> 9.x?  

Thanks, M.

Mel Llaguno | Principal Engineer (Performance/Deployment)
Coverity | 800 6th Avenue S.W. | Suite 410 | Calgary, AB | Canada | T2P 3G3
mllaguno@coverity.com
________________________________________
From: Bruce Momjian
[bruce@momjian.us]
Sent: Monday, September 09, 2013 8:16 PM
To: Mel
Llaguno
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM]
Performance bug in prepared statement binding in 9.2?

On Tue, Sep 10, 2013
at 02:06:08AM +0000, Mel Llaguno wrote:
> We're currently using an embedded
PG 8.4.17 for our application. Our
> PG 9.x tests consists of one of the
following :
>
> - Take a 8.4.17 DB which contains only our application
schema and
> required seed data and use pg_upgrade to create a 9.x
database
> directory, run the analyze_new_cluster.sh script and fire up
the
> application. Under this set of conditions, the bind connection issue
>
is present during our test.
>
> - Start with a fresh PG 9.x DB (using use
create_db) and use psql
> to recreate our application schema and required
seed data. When the
> application is started and our test executed, the bind
connection
> issue is still present.
>
> In both of the above cases, a base
application schema is used.
>
> If I upgrade an 8.4.17 DB that contains
additional application data
> (generated by interaction with our
application) to 9.x, the bind
> connection issue is no longer present.
Restoring this upgraded 9.x DB
> into any PG instance in the previously
described scenarios also seems
> to fix the bind connection issue.
>
>
Please let me know if this clarifies my earlier post.

Yes, that is clear.
So it is the seed data that is causing the problem?
That is the only
different I see from #2 and #3.

--
  Bruce Momjian  <bruce@momjian.us>
http://momjian.us
  EnterpriseDB
http://enterprisedb.com

  + It's impossible for everything to be true. +






Re: Performance bug in prepared statement binding in 9.2?

От
Bruce Momjian
Дата:
On Tue, Sep 10, 2013 at 05:43:16AM +0000, Mel Llaguno wrote:
> Bruce,
>
> First of all, I'd like to thank you for taking some interest in this
> issue. We'd love to migrate to the latest PG version, but this issue
> is currently preventing us from doing so.
>
> Regardless of the DB used (base application schema _or_ restored DB
> with additional app data + base application schema), seed information
> is present in all tests. I guess my question is this - why would
> having existing data change the bind behavior at all? Is it possible
> that the way indexes are created has changed between 8.4 -> 9.x?

I don't know as you have not shown us exactly what is slower between
versions --- you only said the bug appears or not in certain
circumstances.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Performance bug in prepared statement binding in 9.2?

От
Andres Freund
Дата:
On 2013-09-09 20:38:09 -0400, Andrew Dunstan wrote:
>
> On 08/01/2013 03:20 PM, Jeff Janes wrote:
> >On Thu, Aug 1, 2013 at 10:58 AM, Josh Berkus <josh@agliodbs.com> wrote:
> >>Amit, All:
> >>
> >>So we just retested this on 9.3b2.  The performance is the same as 9.1
> >>and 9.2; that is, progressively worse as the test cycles go on, and
> >>unacceptably slow compared to 8.4.
> >>
> >>Some issue introduced in 9.1 is causing BINDs to get progressively
> >>slower as the PARSEs BINDs get run repeatedly.  Per earlier on this
> >>thread, that can bloat to 200X time required for a BIND, and it's
> >>definitely PostgreSQL-side.
> >>
> >>I'm trying to produce a test case which doesn't involve the user's
> >>application.  However, hints on other things to analyze would be keen.
> >Does it seem to be all CPU time (it is hard to imagine what else it
> >would be, but...)
> >
> >Could you use oprofile or perf or gprof to get a profile of the
> >backend during a run?  That should quickly narrow it down to which C
> >function has the problem.
> >
> >Did you test 9.0 as well?
>
>
> This has been tested back to 9.0. What we have found is that the problem
> disappears if the database has come in via dump/restore, but is present if
> it is the result of pg_upgrade. There are some long-running transactions
> also running alongside this - we are currently planning a test where those
> are not present. We're also looking at constructing a self-contained test
> case.
>
> Here is some perf output from the bad case:
>
>    +  14.67%  postgres           [.] heap_hot_search_buffer
>    +  11.45%  postgres           [.] LWLockAcquire
>    +   8.39%  postgres           [.] LWLockRelease
>    +   6.60%  postgres           [.] _bt_checkkeys
>    +   6.39%  postgres           [.] PinBuffer
>    +   5.96%  postgres           [.] hash_search_with_hash_value
>    +   5.43%  postgres           [.] hash_any
>    +   5.14%  postgres           [.] UnpinBuffer
>    +   3.43%  postgres           [.] ReadBuffer_common
>    +   2.34%  postgres           [.] index_fetch_heap
>    +   2.04%  postgres           [.] heap_page_prune_opt
>    +   2.00%  libc-2.15.so       [.] 0x8041b
>    +   1.94%  postgres           [.] _bt_next
>    +   1.83%  postgres           [.] btgettuple
>    +   1.76%  postgres           [.] index_getnext_tid
>    +   1.70%  postgres           [.] LockBuffer
>    +   1.54%  postgres           [.] ReadBufferExtended
>    +   1.25%  postgres           [.] FunctionCall2Coll
>    +   1.14%  postgres           [.] HeapTupleSatisfiesNow
>    +   1.09%  postgres           [.] ReleaseAndReadBuffer
>    +   0.94%  postgres           [.] ResourceOwnerForgetBuffer
>    +   0.81%  postgres           [.] _bt_saveitem
>    +   0.80%  postgres           [.] _bt_readpage
>    +   0.79%  [kernel.kallsyms]  [k] 0xffffffff81170861
>    +   0.64%  postgres           [.] CheckForSerializableConflictOut
>    +   0.60%  postgres           [.] ResourceOwnerEnlargeBuffers
>    +   0.59%  postgres           [.] BufTableLookup

After a second look at this, I very tentatively guess that you'll see
get_actual_variable_range() as the entry point here. Which would explain
why you're seing this during PARSE.

But there still is the question why we never actually seem to prune...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Performance bug in prepared statement binding in 9.2?

От
Andres Freund
Дата:
Hi,

On 2013-09-09 20:38:09 -0400, Andrew Dunstan wrote:
>
> On 08/01/2013 03:20 PM, Jeff Janes wrote:
> >On Thu, Aug 1, 2013 at 10:58 AM, Josh Berkus <josh@agliodbs.com> wrote:
> >>Amit, All:
> >>
> >>So we just retested this on 9.3b2.  The performance is the same as 9.1
> >>and 9.2; that is, progressively worse as the test cycles go on, and
> >>unacceptably slow compared to 8.4.
> >>
> >>Some issue introduced in 9.1 is causing BINDs to get progressively
> >>slower as the PARSEs BINDs get run repeatedly.  Per earlier on this
> >>thread, that can bloat to 200X time required for a BIND, and it's
> >>definitely PostgreSQL-side.
> >>
> >>I'm trying to produce a test case which doesn't involve the user's
> >>application.  However, hints on other things to analyze would be keen.
> >Does it seem to be all CPU time (it is hard to imagine what else it
> >would be, but...)
> >
> >Could you use oprofile or perf or gprof to get a profile of the
> >backend during a run?  That should quickly narrow it down to which C
> >function has the problem.
> >
> >Did you test 9.0 as well?
>
>
> This has been tested back to 9.0. What we have found is that the problem
> disappears if the database has come in via dump/restore, but is present if
> it is the result of pg_upgrade. There are some long-running transactions
> also running alongside this - we are currently planning a test where those
> are not present. We're also looking at constructing a self-contained test
> case.
>
> Here is some perf output from the bad case:
>
>    +  14.67%  postgres           [.] heap_hot_search_buffer
>    +  11.45%  postgres           [.] LWLockAcquire
>    +   8.39%  postgres           [.] LWLockRelease
>    +   6.60%  postgres           [.] _bt_checkkeys
>    +   6.39%  postgres           [.] PinBuffer
>    +   5.96%  postgres           [.] hash_search_with_hash_value
>    +   5.43%  postgres           [.] hash_any
>    +   5.14%  postgres           [.] UnpinBuffer
>    +   3.43%  postgres           [.] ReadBuffer_common
>    +   2.34%  postgres           [.] index_fetch_heap
>    +   2.04%  postgres           [.] heap_page_prune_opt

A backtrace for this would be useful. Alternatively you could recompile
postgres using -fno-omit-frame-pointer in CFLAGS and use perf record -g.

Any chance you have older prepared xacts, older sessions or something
like that around? I'd expect heap_prune* to be present in workloads that
spend significant time in heap_hot_search_buffer...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Performance bug in prepared statement binding in 9.2?

От
Andrew Dunstan
Дата:
On 09/10/2013 08:20 AM, Andres Freund wrote:

> A backtrace for this would be useful. Alternatively you could recompile
> postgres using -fno-omit-frame-pointer in CFLAGS and use perf record -g.

It's using a custom build, so this should be doable.

>
> Any chance you have older prepared xacts, older sessions or something
> like that around? I'd expect heap_prune* to be present in workloads that
> spend significant time in heap_hot_search_buffer...


Not sure about prepared transactions. There are certainly probably old
prepared statements around, and long running transactions alongside this
one.

cheers

andrew


Re: Performance bug in prepared statement binding in 9.2?

От
Andres Freund
Дата:
On 2013-09-10 08:45:33 -0400, Andrew Dunstan wrote:
>
> On 09/10/2013 08:20 AM, Andres Freund wrote:
>
> >A backtrace for this would be useful. Alternatively you could recompile
> >postgres using -fno-omit-frame-pointer in CFLAGS and use perf record -g.
>
> It's using a custom build, so this should be doable.

Great.

> >Any chance you have older prepared xacts, older sessions or something
> >like that around? I'd expect heap_prune* to be present in workloads that
> >spend significant time in heap_hot_search_buffer...
>
>
> Not sure about prepared transactions. There are certainly probably old
> prepared statements around, and long running transactions alongside this
> one.

Ok, long running transactions will do the trick. I quicky checked and
doing an index lookup for min/max histogram lookups was added *after*
8.4 which would explain why you're not seing the issue there
(c.f. 40608e7f949fb7e4025c0ddd5be01939adc79eec).

It getting slower and slower during a testrun would be explained by the
additional tuple versions amassing which cannot be marked dead because
of older transactions around. I guess those are also part of the test?

If I interpret things correctly you're using serializable? I guess there
is no chance to use repeatable read instead?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Performance bug in prepared statement binding in 9.2?

От
Andres Freund
Дата:
On 2013-09-10 15:21:33 +0200, Andres Freund wrote:
> If I interpret things correctly you're using serializable? I guess there
> is no chance to use repeatable read instead?

Err, that wouldn't help much. Read committed. That lets PGXACT->xmin advance
these days and thus might help to reduce the impact of the longrunning
transactions.
Otherwise you will have to shorten those...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Performance bug in prepared statement binding in 9.2?

От
Andrew Dunstan
Дата:
On 09/10/2013 09:23 AM, Andres Freund wrote:
> On 2013-09-10 15:21:33 +0200, Andres Freund wrote:
>> If I interpret things correctly you're using serializable? I guess there
>> is no chance to use repeatable read instead?
> Err, that wouldn't help much. Read committed. That lets PGXACT->xmin advance
> these days and thus might help to reduce the impact of the longrunning
> transactions.
> Otherwise you will have to shorten those...
>


Yeah, we're looking at eliminating them.

cheers

andrew



Re: Performance bug in prepared statement binding in 9.2?

От
Josh Berkus
Дата:
All,

We've confirmed that this issue is caused by having long-running idle
transactions on the server.  When we disabled their queueing system
(which prodiced hour-long idle txns), the progressive slowness went away.

Why that should affect 9.X far more strongly than 8.4, I'm not sure
about.  Does that mean that 8.4 was unsafe, or that this is something
which *could* be fixed in later versions?

I'm also confused as to why this would affect BIND time rather than
EXECUTE time.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Performance bug in prepared statement binding in 9.2?

От
Andrew Dunstan
Дата:
On 09/11/2013 02:35 PM, Josh Berkus wrote:
> All,
>
> We've confirmed that this issue is caused by having long-running idle
> transactions on the server.  When we disabled their queueing system
> (which prodiced hour-long idle txns), the progressive slowness went away.
>
> Why that should affect 9.X far more strongly than 8.4, I'm not sure
> about.  Does that mean that 8.4 was unsafe, or that this is something
> which *could* be fixed in later versions?
>
> I'm also confused as to why this would affect BIND time rather than
> EXECUTE time.
>


One thing that this made me wonder is why we don't have
transaction_timeout, or maybe transaction_idle_timeout.

cheers

andrew


Re: Performance bug in prepared statement binding in 9.2?

От
Andres Freund
Дата:
On 2013-09-11 15:06:23 -0400, Andrew Dunstan wrote:
>
> On 09/11/2013 02:35 PM, Josh Berkus wrote:
> >All,
> >
> >We've confirmed that this issue is caused by having long-running idle
> >transactions on the server.  When we disabled their queueing system
> >(which prodiced hour-long idle txns), the progressive slowness went away.
> >
> >Why that should affect 9.X far more strongly than 8.4, I'm not sure
> >about.  Does that mean that 8.4 was unsafe, or that this is something
> >which *could* be fixed in later versions?
> >
> >I'm also confused as to why this would affect BIND time rather than
> >EXECUTE time.
> >
>
>
> One thing that this made me wonder is why we don't have transaction_timeout,
> or maybe transaction_idle_timeout.

Because it's harder than it sounds, at least if you want to support
idle-in-transactions. Note that we do not support pg_cancel_backend()
for those yet...

Also, I think it might lead to papering over actual issues with
applications leaving transactions open. I don't really see a valid
reason for an application needing cancelling of long idle transactions.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Performance bug in prepared statement binding in 9.2?

От
Andres Freund
Дата:
On 2013-09-11 11:35:45 -0700, Josh Berkus wrote:
> All,
>
> We've confirmed that this issue is caused by having long-running idle
> transactions on the server.  When we disabled their queueing system
> (which prodiced hour-long idle txns), the progressive slowness went away.
>
> Why that should affect 9.X far more strongly than 8.4, I'm not sure
> about.  Does that mean that 8.4 was unsafe, or that this is something
> which *could* be fixed in later versions?

The explanation is in
http://archives.postgresql.org/message-id/20130910132133.GJ1024477%40alap2.anarazel.de

The referenced commit introduced a planner feature. Funnily you seem to
have been the trigger for it's introduction ;)

> I'm also confused as to why this would affect BIND time rather than
> EXECUTE time.

Because we're doing the histogram checks during planning and not during
execution.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Performance bug in prepared statement binding in 9.2?

От
Josh Berkus
Дата:
> The explanation is in
> http://archives.postgresql.org/message-id/20130910132133.GJ1024477%40alap2.anarazel.de
>
> The referenced commit introduced a planner feature. Funnily you seem to
> have been the trigger for it's introduction ;)

Oh, crap, the "off the end of the index" optimization?

It's the story of our lives: we can't optimize anything without
deoptimizing something else.  Dammit.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Performance bug in prepared statement binding in 9.2?

От
Jeff Janes
Дата:
On Wed, Sep 11, 2013 at 2:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-09-11 15:06:23 -0400, Andrew Dunstan wrote:
>
> One thing that this made me wonder is why we don't have transaction_timeout,
> or maybe transaction_idle_timeout.

Because it's harder than it sounds, at least if you want to support
idle-in-transactions. Note that we do not support pg_cancel_backend()
for those yet...

So we are left with pg_terminate_backend in a cron job.  That mostly seems to work, because usually apps that abandon connections in the idle-in-transaction state will never check back on them anyway, but cancel would be nicer.
 

Also, I think it might lead to papering over actual issues with
applications leaving transactions open. I don't really see a valid
reason for an application needing cancelling of long idle transactions.

Some of us make a living, at least partially, by papering over issues with 3rd party applications that we have no control over!

Cheers,

Jeff

Re: Performance bug in prepared statement binding in 9.2?

От
Tom Lane
Дата:
Josh Berkus <josh@agliodbs.com> writes:
>> The explanation is in
>> http://archives.postgresql.org/message-id/20130910132133.GJ1024477%40alap2.anarazel.de
>>
>> The referenced commit introduced a planner feature. Funnily you seem to
>> have been the trigger for it's introduction ;)

> Oh, crap, the "off the end of the index" optimization?

> It's the story of our lives: we can't optimize anything without
> deoptimizing something else.  Dammit.

I wonder if we could ameliorate this problem by making
get_actual_variable_range() use SnapshotDirty rather than either
SnapshotNow (as it does in released versions) or the active snapshot (as
it does in HEAD).  We discussed that idea in the SnapshotNow removal
thread, see eg
http://www.postgresql.org/message-id/CA+TgmoZ_q2KMkxZAoRxRHB7k1tOmjVjQgYt2JuA7=U7QZoLxNw@mail.gmail.com
In that thread I claimed that a current MVCC snapshot was the most
appropriate thing, which it probably is; but the argument for it isn't so
strong that I think we should be willing to spend unbounded effort to get
that version of the column min/max rather than some other approximation.
The best objection to it that I can see is Robert's security concern about
leakage of uncommitted values --- but I don't think that holds a huge
amount of water either.  We already try to limit the visibility of the
regular elements of the histogram, why are these not-yet-committed values
significantly more of an issue?

            regards, tom lane


Re: Performance bug in prepared statement binding in 9.2?

От
Kevin Grittner
Дата:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I wonder if we could ameliorate this problem by making
> get_actual_variable_range() use SnapshotDirty

> In that thread I claimed that a current MVCC snapshot was the
> most appropriate thing, which it probably is;

If it reads from the end of the index, won't it actually be reading
starting with the value we will obtain using SnapshotDirty, and
since the transaction is not yet committed, won't we be making the
trip to the heap for each index tuple we read from that point?  Why
doesn't that make SnapshotDirty more appropriate for purposes of
cost estimation?

> but the argument for it isn't so strong that I think we should be
> willing to spend unbounded effort to get that version of the
> column min/max rather than some other approximation.

Since the whole point is to try to get the fastest runtime which
produces correct results, I agree even if the other value is in
some way more "correct".

> The best objection to it that I can see is Robert's security
> concern about leakage of uncommitted values --- but I don't think
> that holds a huge amount of water either.
It doesn't look like Robert did either, if you read the whole
message.  In fact, he also questioned why index tuples which would
need to be read if we process from that end of the index don't
matter for purposes of cost estimation.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Performance bug in prepared statement binding in 9.2?

От
Tom Lane
Дата:
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I wonder if we could ameliorate this problem by making
>> get_actual_variable_range() use SnapshotDirty

>> In that thread I claimed that a current MVCC snapshot was the
>> most appropriate thing, which it probably is;

> If it reads from the end of the index, won't it actually be reading
> starting with the value we will obtain using SnapshotDirty, and
> since the transaction is not yet committed, won't we be making the
> trip to the heap for each index tuple we read from that point?� Why
> doesn't that make SnapshotDirty more appropriate for purposes of
> cost estimation?

This code isn't used (directly) for cost estimation.  It's used for
selectivity estimation, that is counting how many valid rows the query
will fetch when it executes.  We can't know that with certainty, of
course, so the argument boils down to how good an approximation we're
getting and how much it costs to get it.

The original coding here was SnapshotNow, which could be defended on the
grounds that it was the newest available data and thus the best possible
approximation if the query gets executed with a snapshot taken later.
That's still what's being used in the field, and of course it's got the
problem we're on about that uncommitted rows cause us to move on to the
next index entry (and thrash the ProcArrayLock for each uncommitted row
we hit :-().  In HEAD we replaced that with GetActiveSnapshot, which is
a better approximation if you assume the query will be executed with this
same snapshot, and otherwise slightly worse (because older).  But it's
just the same as far as the cost imposed by uncommitted rows goes.

My proposal to use SnapshotDirty instead is based on the thought that
it will give the same answers as SnapshotNow for definitely-live or
definitely-dead rows, while not imposing such large costs for in-doubt
rows: it will accept the first in-doubt row, and thus prevent us from
moving on to the next index entry.  The reported problems seem to involve
hundreds or thousands of in-doubt rows at the extreme of the index,
so the cost will go down proportionally to whatever that count is.
As far as the accuracy of the approximation is concerned, it's a win
if you imagine that pending inserts will have committed by the time the
query's execution snapshot is taken.  Otherwise not so much.

The other thing we might consider doing is using SnapshotAny, which
would amount to just taking the extremal index entry at face value.
This would be cheap all right (in fact, we might later be able to optimize
it to not even visit the heap).  However, I don't like the implications
for the accuracy of the approximation.  It seems quite likely that an
erroneous transaction could insert a wildly out-of-range index entry
and then roll back --- or even if it committed, somebody might soon come
along and clean up the bogus entry in a separate transaction.  If we use
SnapshotAny then we'd believe that bogus entry until it got vacuumed, not
just till it got deleted.  This is a fairly scary proposition, because
the wackier that extremal value is, the more impact it will have on the
selectivity estimates.

If it's demonstrated that SnapshotDirty doesn't reduce the estimation
costs enough to solve the performance problem the complainants are facing,
I'd be willing to consider using SnapshotAny here.  But I don't want to
take that step until it's proven that the more conservative approach
doesn't solve the performance problem.

There's an abbreviated version of this argument in the comments in
my proposed patch at
http://www.postgresql.org/message-id/11927.1384199294@sss.pgh.pa.us
What I'm hoping will happen next is that the complainants will hot-patch
that and see if it fixes their problems.  We can't really determine
what to do without that information.

> It doesn't look like Robert did either, if you read the whole
> message.� In fact, he also questioned why index tuples which would
> need to be read if we process from that end of the index don't
> matter for purposes of cost estimation.

The key point here is that we are not looking to estimate the cost of
getting the extremal value.  This code gets called if we are interested
in a probe value falling within the endmost histogram bin; that's not
necessarily, or even probably, the extremal value.  We're just trying to
find out if the bin endpoint has moved a lot since the last ANALYZE.

Another way of explaining this is that what Robert was really suggesting
is that we ought to account for the cost of fetching rows that are then
determined not to be visible to the query.  That's true, but it would be
a conceptual error to factor that in by supposing that they're visible.
The planner doesn't, at present, account for this cost explicitly, but I
believe it's at least somewhat handled by the fact that we charge for page
accesses on the basis of selectivity fraction times physical table size.
A table containing a lot of uncommitted/dead tuples will get charged more
I/O than a less bloated table.  Of course, this approach can't account for
special cases like "there's a whole lot of uncommitted tuples right at the
end of the index range, and not so many elsewhere".  That's below the
granularity of what we can sensibly model at the moment, I'm afraid.

            regards, tom lane


Re: Performance bug in prepared statement binding in 9.2?

От
Josh Berkus
Дата:
Tom,

> There's an abbreviated version of this argument in the comments in
> my proposed patch at
> http://www.postgresql.org/message-id/11927.1384199294@sss.pgh.pa.us
> What I'm hoping will happen next is that the complainants will hot-patch
> that and see if it fixes their problems.  We can't really determine
> what to do without that information.

Unfortunately, the original reporter of this issue will not be available
for testing for 2-3 weeks, and I haven't been able to devise a synthetic
test which clearly shows the issue.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Performance bug in prepared statement binding in 9.2?

От
Tom Lane
Дата:
Josh Berkus <josh@agliodbs.com> writes:
>> What I'm hoping will happen next is that the complainants will hot-patch
>> that and see if it fixes their problems.  We can't really determine
>> what to do without that information.

> Unfortunately, the original reporter of this issue will not be available
> for testing for 2-3 weeks, and I haven't been able to devise a synthetic
> test which clearly shows the issue.

Well, we had a synthetic test from the other complainant.  What's at
stake now is whether this is a good enough fix for real-world cases.
I'm willing to wait ...

            regards, tom lane


Re: Performance bug in prepared statement binding in 9.2?

От
Tom Lane
Дата:
Josh Berkus <josh@agliodbs.com> writes:
> Tom,
>> There's an abbreviated version of this argument in the comments in
>> my proposed patch at
>> http://www.postgresql.org/message-id/11927.1384199294@sss.pgh.pa.us
>> What I'm hoping will happen next is that the complainants will hot-patch
>> that and see if it fixes their problems.  We can't really determine
>> what to do without that information.

> Unfortunately, the original reporter of this issue will not be available
> for testing for 2-3 weeks, and I haven't been able to devise a synthetic
> test which clearly shows the issue.

Ping?  I've been waiting on committing that patch pending some real-world
testing.  It'd be nice to resolve this question before we ship 9.3.3,
which I'm supposing will be sometime in January ...

            regards, tom lane


Re: Performance bug in prepared statement binding in 9.2?

От
Josh Berkus
Дата:
On 12/31/2013 09:55 AM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> Tom,
>>> There's an abbreviated version of this argument in the comments in
>>> my proposed patch at
>>> http://www.postgresql.org/message-id/11927.1384199294@sss.pgh.pa.us
>>> What I'm hoping will happen next is that the complainants will hot-patch
>>> that and see if it fixes their problems.  We can't really determine
>>> what to do without that information.
>
>> Unfortunately, the original reporter of this issue will not be available
>> for testing for 2-3 weeks, and I haven't been able to devise a synthetic
>> test which clearly shows the issue.
>
> Ping?  I've been waiting on committing that patch pending some real-world
> testing.  It'd be nice to resolve this question before we ship 9.3.3,
> which I'm supposing will be sometime in January ...

Did this patch every make it in?  Or did it hang waiting for verification?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Performance bug in prepared statement binding in 9.2?

От
Jeff Janes
Дата:
On Mon, Nov 10, 2014 at 10:48 AM, Josh Berkus <josh@agliodbs.com> wrote:
On 12/31/2013 09:55 AM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> Tom,
>>> There's an abbreviated version of this argument in the comments in
>>> my proposed patch at
>>> http://www.postgresql.org/message-id/11927.1384199294@sss.pgh.pa.us
>>> What I'm hoping will happen next is that the complainants will hot-patch
>>> that and see if it fixes their problems.  We can't really determine
>>> what to do without that information.
>
>> Unfortunately, the original reporter of this issue will not be available
>> for testing for 2-3 weeks, and I haven't been able to devise a synthetic
>> test which clearly shows the issue.
>
> Ping?  I've been waiting on committing that patch pending some real-world
> testing.  It'd be nice to resolve this question before we ship 9.3.3,
> which I'm supposing will be sometime in January ...

Did this patch every make it in?  Or did it hang waiting for verification?

It made it in:

commit 4162a55c77cbb54acb4ac442ef3565b813b9d07a
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Tue Feb 25 16:04:09 2014 -0500

Cheers,

Jeff

Re: Performance bug in prepared statement binding in 9.2?

От
Josh Berkus
Дата:
On 11/10/2014 10:59 AM, Jeff Janes wrote:
> On Mon, Nov 10, 2014 at 10:48 AM, Josh Berkus <josh@agliodbs.com> wrote:
>
>> On 12/31/2013 09:55 AM, Tom Lane wrote:
>>> Josh Berkus <josh@agliodbs.com> writes:
>>>> Tom,
>>>>> There's an abbreviated version of this argument in the comments in
>>>>> my proposed patch at
>>>>> http://www.postgresql.org/message-id/11927.1384199294@sss.pgh.pa.us
>>>>> What I'm hoping will happen next is that the complainants will
>> hot-patch
>>>>> that and see if it fixes their problems.  We can't really determine
>>>>> what to do without that information.
>>>
>>>> Unfortunately, the original reporter of this issue will not be available
>>>> for testing for 2-3 weeks, and I haven't been able to devise a synthetic
>>>> test which clearly shows the issue.
>>>
>>> Ping?  I've been waiting on committing that patch pending some real-world
>>> testing.  It'd be nice to resolve this question before we ship 9.3.3,
>>> which I'm supposing will be sometime in January ...
>>
>> Did this patch every make it in?  Or did it hang waiting for verification?
>>
>
> It made it in:
>
> commit 4162a55c77cbb54acb4ac442ef3565b813b9d07a
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Tue Feb 25 16:04:09 2014 -0500

Thanks, then the problem I'm seeing now is something else.


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Performance bug in prepared statement binding in 9.2?

От
Andres Freund
Дата:
On 2014-11-10 10:48:24 -0800, Josh Berkus wrote:
> On 12/31/2013 09:55 AM, Tom Lane wrote:
> > Josh Berkus <josh@agliodbs.com> writes:
> >> Tom,
> >>> There's an abbreviated version of this argument in the comments in
> >>> my proposed patch at
> >>> http://www.postgresql.org/message-id/11927.1384199294@sss.pgh.pa.us
> >>> What I'm hoping will happen next is that the complainants will hot-patch
> >>> that and see if it fixes their problems.  We can't really determine
> >>> what to do without that information.
> >
> >> Unfortunately, the original reporter of this issue will not be available
> >> for testing for 2-3 weeks, and I haven't been able to devise a synthetic
> >> test which clearly shows the issue.
> >
> > Ping?  I've been waiting on committing that patch pending some real-world
> > testing.  It'd be nice to resolve this question before we ship 9.3.3,
> > which I'm supposing will be sometime in January ...
>
> Did this patch every make it in?  Or did it hang waiting for
> verification?

src/tools/git_changelog is your friend.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL9_4_BR [fccebe421] 2014-02-25 16:04:06 -0500
Branch: REL9_3_STABLE Release: REL9_3_4 [4162a55c7] 2014-02-25 16:04:09 -0500
Branch: REL9_2_STABLE Release: REL9_2_8 [00283cae1] 2014-02-25 16:04:12 -0500
Branch: REL9_1_STABLE Release: REL9_1_13 [3e2db4c80] 2014-02-25 16:04:16 -0500
Branch: REL9_0_STABLE Release: REL9_0_17 [1e0fb6a2c] 2014-02-25 16:04:20 -0500

    Use SnapshotDirty rather than an active snapshot to probe index endpoints.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Performance bug in prepared statement binding in 9.2?

От
Tom Lane
Дата:
Josh Berkus <josh@agliodbs.com> writes:
> On 11/10/2014 10:59 AM, Jeff Janes wrote:
>> On Mon, Nov 10, 2014 at 10:48 AM, Josh Berkus <josh@agliodbs.com> wrote:
>>> Did this patch every make it in?  Or did it hang waiting for verification?

>> It made it in:
>> commit 4162a55c77cbb54acb4ac442ef3565b813b9d07a
>> Author: Tom Lane <tgl@sss.pgh.pa.us>
>> Date:   Tue Feb 25 16:04:09 2014 -0500

> Thanks, then the problem I'm seeing now is something else.

Notice that only went in this past Feb., so you need to check you're
dealing with a fairly recent minor release before you dismiss it.

            regards, tom lane


Re: Performance bug in prepared statement binding in 9.2?

От
Josh Berkus
Дата:
On 11/10/2014 11:11 AM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> On 11/10/2014 10:59 AM, Jeff Janes wrote:
>>> On Mon, Nov 10, 2014 at 10:48 AM, Josh Berkus <josh@agliodbs.com> wrote:
>>>> Did this patch every make it in?  Or did it hang waiting for verification?
>
>>> It made it in:
>>> commit 4162a55c77cbb54acb4ac442ef3565b813b9d07a
>>> Author: Tom Lane <tgl@sss.pgh.pa.us>
>>> Date:   Tue Feb 25 16:04:09 2014 -0500
>
>> Thanks, then the problem I'm seeing now is something else.
>
> Notice that only went in this past Feb., so you need to check you're
> dealing with a fairly recent minor release before you dismiss it.

It's 9.3.5

The new issue will get its own thread.


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Performance bug in prepared statement binding in 9.2?

От
Jeff Janes
Дата:
On Mon, Nov 10, 2014 at 11:04 AM, Josh Berkus <josh@agliodbs.com> wrote:
On 11/10/2014 10:59 AM, Jeff Janes wrote:
> On Mon, Nov 10, 2014 at 10:48 AM, Josh Berkus <josh@agliodbs.com> wrote:
>

>> Did this patch every make it in?  Or did it hang waiting for verification?
>>
>
> It made it in:
>
> commit 4162a55c77cbb54acb4ac442ef3565b813b9d07a
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Tue Feb 25 16:04:09 2014 -0500

Thanks, then the problem I'm seeing now is something else.

The related problem where the "end" rows are actually needed (e.g. ORDER BY...LIMIT) has not been fixed. 

My idea to fix that was to check if the row's creation-transaction was in the MVCC snapshot (which just uses local memory) before checking if that creation-transaction had committed (which uses shared memory).  But I didn't really have the confidence to push that given the fragility of that part of the code and my lack of experience with it.  See "In progress INSERT wrecks plans on table" thread.

Simon also had some patches to still do the shared memory look up but make them faster by caching where in the list it would be likely to find the match, based on where it found the last match.



Cheers,

Jeff

Re: Performance bug in prepared statement binding in 9.2?

От
Josh Berkus
Дата:
On 11/10/2014 12:13 PM, Jeff Janes wrote:

> The related problem where the "end" rows are actually needed (e.g. ORDER
> BY...LIMIT) has not been fixed.
>
> My idea to fix that was to check if the row's creation-transaction was in
> the MVCC snapshot (which just uses local memory) before checking if that
> creation-transaction had committed (which uses shared memory).  But I
> didn't really have the confidence to push that given the fragility of that
> part of the code and my lack of experience with it.  See "In progress
> INSERT wrecks plans on table" thread.

Oh!  I thought this issue had been fixed by Tom's patch as well.  It
could very well describe what I'm seeing (in the other thread), since
some of the waiting queries are INSERTs, and other queries do selects
against the same tables concurrently.

Although ... given that I'm seeing preposterously long BIND times (like
50 seconds), I don't think that's explained just by bad plans.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com