Re: Perplexing, regular decline in performance

Поиск
Список
Период
Сортировка
От Hugh Ranalli
Тема Re: Perplexing, regular decline in performance
Дата
Msg-id CAAhbUMPVgOyiTBDU-b9MaoixDEU3jZB4s63QrpefzGtLVbtKOw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Perplexing, regular decline in performance  (Justin Pryzby <pryzby@telsasoft.com>)
Ответы Re: Perplexing, regular decline in performance
Re: Perplexing, regular decline in performance
Список pgsql-performance
On Tue, 25 Jun 2019 at 12:23, Justin Pryzby <pryzby@telsasoft.com> wrote:
What kernel?  Version?  OS?
Ubuntu 18.04; current kernel is 4.15.0-51-generic4

If Linux, I wonder if transparent hugepages or KSM are enabled ?  It seems
possible that truncating the table is clearing enough RAM to mitigate the
issue, similar to restarting the DB.
tail /sys/kernel/mm/ksm/run /sys/kernel/mm/transparent_hugepage/khugepaged/defrag /sys/kernel/mm/transparent_hugepage/enabled /sys/kernel/mm/transparent_hugepage/defrag
https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com
==> /sys/kernel/mm/ksm/run <==
0
==> /sys/kernel/mm/transparent_hugepage/khugepaged/defrag <==
1
==> /sys/kernel/mm/transparent_hugepage/enabled <==
always [madvise] never
==> /sys/kernel/mm/transparent_hugepage/defrag <==
always defer defer+madvise [madvise] never
  
From my research in preparing for the upgrade, I understood transparent huge pages were a good thing, and should be enabled. Is this not correct?


11.2 would have parallel query, and enabled by default.  Are there other
settings you've changed (or not changed)?
https://wiki.postgresql.org/wiki/Server_Configuration

I've just posted the parameters we are changing from the default in a previous reply, so I won't repeat them here unless you want me to.
 
It's possible that the "administrative" queries are using up lots of your
shared_buffers, which are (also/more) needed by the customer-facing queries.  I
would install pg_buffercache to investigate.  Or, just pause the admin queries
and see if that the issue goes away during that interval ?

Pausing the admin queries isn't an option in our environment, especially as the issue reveals itself over the course of days, not minutes or hours. 
        ?column?        |  count  | count |  datname  |        coalesce         |     toast      |         dirtyfrac          |        avg        
------------------------+---------+-------+-----------+-------------------------+----------------+----------------------------+--------------------
 0.24904101286779650995 | 1044545 |     0 | mydb      | position                |                | 0.000000000000000000000000 | 4.8035517857057379
 0.16701241622795295199 |  700495 |     0 | mydb      | stat_position_click     |                | 0.000000000000000000000000 | 1.9870234619804567
 0.09935032779251879171 |  416702 |  6964 | mydb      | pg_toast_19788          | harvested_job  |     0.01671218280689797505 | 1.9346079452462431
 0.06979762146872315533 |  292750 |     0 | mydb      | url                     |                | 0.000000000000000000000000 | 4.9627873612297182
 0.03795774662998486745 |  159205 |     0 | mydb      | stat_sponsored_position |                | 0.000000000000000000000000 | 1.8412361420809648
 0.02923155381784048663 |  122605 |     0 | mydb      | pg_toast_20174          | page           | 0.000000000000000000000000 | 3.0259532645487541
 0.02755283459406156353 |  115564 |     0 | mydb      | location                |                | 0.000000000000000000000000 | 4.9953532241874632
 0.02015273698468076320 |   84526 |  1122 | mydb      | harvested_job           |                |     0.01327402219435439983 | 4.9922154130090150
 0.01913348905375406298 |   80251 |     0 | mydb      | pg_toast_20257          | position_index | 0.000000000000000000000000 | 4.9880001495308470

harvested_job is the rapidly growing "problematic" table I am talking about. page is the 355 GB table that gets referenced on the public searches. I'll google, but is there a place I should look to understand what I am seeing here? Also, Should pg_buffercache perhaps be run at the beginning and end of the week, to see if there is a significant difference?
 
Could you send query plan for the slow (customer-facing) queries?
https://wiki.postgresql.org/wiki/Slow_Query_Questions#EXPLAIN_.28ANALYZE.2C_BUFFERS.29.2C_not_just_EXPLAIN

I can, but can I ask why this would matter? I'm not looking to optimise the query (although I'm sure it could be; this is a legacy system with lots of barnacles). The problem is that the same query performs increasingly slowly over the course of a week, seemingly in sync with the rows with a large toast column added to one particular table (which, as I mentioned, isn't referenced by the query in question). Wouldn't the plan be the same at both the start of the week (when the problematic table is essentially empty) and at the end (when it is much larger)? 

Thanks!
Hugh 


В списке pgsql-performance по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Perplexing, regular decline in performance
Следующее
От: Hugh Ranalli
Дата:
Сообщение: Re: Perplexing, regular decline in performance