Re: User concurrency thresholding: where do I look?

Поиск
Список
Период
Сортировка
От Jignesh K. Shah
Тема Re: User concurrency thresholding: where do I look?
Дата
Msg-id 46A8BD62.2080904@sun.com
обсуждение исходный текст
Ответ на Re: User concurrency thresholding: where do I look?  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: User concurrency thresholding: where do I look?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: User concurrency thresholding: where do I look?  ("Simon Riggs" <simon@2ndquadrant.com>)
Список pgsql-performance
I will look for runs with longer samples..

Now the script could have mislabeled lock names.. Anyway digging into
the one that seems to increase over time... I did stack profiles on how
that increases... and here are some numbers..


For  600-850 Users: that potential mislabeled CheckPointStartLock  or
LockID==12 comes from various sources where the top source (while system
is still doing great) comes from:



              postgres`LWLockAcquire+0x1c8
              postgres`SimpleLruReadPage_ReadOnly+0xc
              postgres`TransactionIdGetStatus+0x14
              postgres`TransactionLogFetch+0x58
              postgres`TransactionIdDidCommit+0x4
              postgres`HeapTupleSatisfiesSnapshot+0x234
              postgres`heap_release_fetch+0x1a8
              postgres`index_getnext+0xf4
              postgres`IndexNext+0x7c
              postgres`ExecScan+0x8c
              postgres`ExecProcNode+0xb4
              postgres`ExecutePlan+0xdc
              postgres`ExecutorRun+0xb0
              postgres`PortalRunSelect+0x9c
              postgres`PortalRun+0x244
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
          8202100

              postgres`LWLockAcquire+0x1c8
              postgres`TransactionIdSetStatus+0x1c
              postgres`RecordTransactionCommit+0x2a8
              postgres`CommitTransaction+0xc8
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3d8
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
         30822100


However at 900 Users where the big drop in throughput occurs:
It gives a different top "consumer" of time:



              postgres`LWLockAcquire+0x1c8
              postgres`TransactionIdSetStatus+0x1c
              postgres`RecordTransactionCommit+0x2a8
              postgres`CommitTransaction+0xc8
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3d8
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
        406601300

              postgres`LWLockAcquire+0x1c8
              postgres`SimpleLruReadPage+0x1ac
              postgres`TransactionIdGetStatus+0x14
              postgres`TransactionLogFetch+0x58
              postgres`TransactionIdDidCommit+0x4
              postgres`HeapTupleSatisfiesUpdate+0x360
              postgres`heap_lock_tuple+0x27c
              postgres`ExecutePlan+0x33c
              postgres`ExecutorRun+0xb0
              postgres`PortalRunSelect+0x9c
              postgres`PortalRun+0x244
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
        444523100

              postgres`LWLockAcquire+0x1c8
              postgres`SimpleLruReadPage+0x1ac
              postgres`TransactionIdGetStatus+0x14
              postgres`TransactionLogFetch+0x58
              postgres`TransactionIdDidCommit+0x4
              postgres`HeapTupleSatisfiesSnapshot+0x234
              postgres`heap_release_fetch+0x1a8
              postgres`index_getnext+0xf4
              postgres`IndexNext+0x7c
              postgres`ExecScan+0x8c
              postgres`ExecProcNode+0xb4
              postgres`ExecutePlan+0xdc
              postgres`ExecutorRun+0xb0
              postgres`PortalRunSelect+0x9c
              postgres`PortalRun+0x244
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
       1661300000



Maybe you all will understand more than I do on what it does here..
Looks like IndexNext has a problem at high number of users to me.. but I
could be wrong..

-Jignesh




Tom Lane wrote:
> "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
>
>> The count is only for a 10-second snapshot.. Plus remember there are
>> about 1000 users running so the connection  being profiled only gets
>> 0.01 of the period on CPU..  And the count is for that CONNECTION only.
>>
>
> OK, that explains the low absolute levels of the counts, but if the
> counts are for a regular backend then there's still a lot of bogosity
> here.  Backends won't be taking the CheckpointLock at all, nor do they
> take CheckpointStartLock in exclusive mode.  The bgwriter would do that
> but it'd not be taking most of these other locks.  So I think the script
> is mislabeling the locks somehow.
>
> Also, elementary statistics should tell you that a sample taken as above
> is going to have enormous amounts of noise.  You should be sampling over
> a much longer period, say on the order of a minute of runtime, to have
> numbers that are trustworthy.
>
>             regards, tom lane
>

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Simon Riggs"
Дата:
Сообщение: Re: User concurrency thresholding: where do I look?
Следующее
От: Tom Lane
Дата:
Сообщение: Re: User concurrency thresholding: where do I look?