Discussion: Re: Queries containing ORDER BY and LIMIT started to work slowly


Re: Queries containing ORDER BY and LIMIT started to work slowly

From:
Jeff Janes
Date:


On Tue, Aug 29, 2023 at 1:47 PM Rondat Flyag <rondatflyag@yandex.ru> wrote:
I have a legacy system that uses `PostgreSQL 9.6` and `Ubuntu 16.04`. Everything was fine several days ago, even with standard PostgreSQL settings. I dumped a database with the compression option (maximum compression level -Z 9) in order to get a smaller file (`pg_dump --compress=9 database_name > database_name.sql`). After that I got a lot of problems.

You describe taking a dump of the database, but don't describe doing anything with it.  Did you replace your system with one restored from that dump?  If so, did you vacuum and analyze afterwards?
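
For example, connected to the restored database, something like:

    -- rebuild planner statistics (and visibility info), which a dump/restore does not carry over
    VACUUM ANALYZE;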

Cheers,

Jeff

Re: Queries containing ORDER BY and LIMIT started to work slowly

From:
Jeff Janes
Date:
On Tue, Aug 29, 2023 at 2:55 PM Rondat Flyag <rondatflyag@yandex.ru> wrote:
I took the dump just to store it on another storage (external HDD). I didn't do anything with it.

I don't see how that could cause the problem; it is probably just a coincidence.  Maybe taking the dump held a long-lived snapshot open, which caused some bloat.  But if that was enough to push your system over the edge, it was probably too close to the edge to start with.

Do you have a plan for the query from when it was fast?  If not, maybe you can force it back to the old plan by setting enable_seqscan=off, or perhaps enable_sort=off, to let you capture the old plan for comparison.
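
For example, something like this in one psql session would capture the alternative plan without disturbing other sessions (SET is session-local; substitute the real query):

    SET enable_seqscan = off;
    EXPLAIN (ANALYZE, BUFFERS) SELECT ...;   -- the slow ORDER BY ... LIMIT query goes here
    RESET enable_seqscan;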

The estimate for the seq scan of isbns_statistics is off by almost a factor of 2.  A seq scan with no filters, and which cannot stop early, should not be hard to estimate accurately, so this suggests autovacuum is not keeping up.  VACUUM ANALYZE all of the involved tables and see if that fixes things.
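
For example, something along these lines would show how far the stored estimate has drifted and when autovacuum last ran (table names as they appear in this thread; adjust to the real schema):

    -- compare the planner's stored row estimate with live tuple counts
    SELECT c.relname,
           c.reltuples::bigint AS planner_estimate,
           s.n_live_tup,
           s.n_dead_tup,
           s.last_autovacuum,
           s.last_autoanalyze
    FROM pg_class c
    JOIN pg_stat_user_tables s ON s.relid = c.oid
    WHERE c.relname IN ('books', 'asins', 'isbns_statistics');

    -- then refresh statistics on the involved tables
    VACUUM ANALYZE books;
    VACUUM ANALYZE asins;
    VACUUM ANALYZE isbns_statistics;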

Cheers,

Jeff

Re: Queries containing ORDER BY and LIMIT started to work slowly

From:
Rick Otten
Date:


On Tue, Aug 29, 2023 at 3:57 PM Rondat Flyag <rondatflyag@yandex.ru> wrote:
I took the dump just to store it on another storage (external HDD). I didn't do anything with it.
 
29.08.2023, 21:42, "Jeff Janes" <jeff.janes@gmail.com>:
[...]

Since this is a very old system and backups are fairly I/O intensive, is it possible you have a disk going bad?  Sometimes, after doing a bunch of I/O on an old disk, it will accelerate its decline.  You could be about to lose it altogether.
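
One quick way to check that hypothesis, assuming smartmontools is installed and the database volume is on /dev/sda (adjust the device name to your system):

    smartctl -H /dev/sda          # overall SMART health verdict
    smartctl -a /dev/sda          # full attributes; watch Reallocated_Sector_Ct and Current_Pending_Sector
    dmesg | grep -i 'i/o error'   # any recent kernel-level I/O errors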

Re: Queries containing ORDER BY and LIMIT started to work slowly

From:
David Rowley
Date:
On Thu, 31 Aug 2023 at 06:32, Rondat Flyag <rondatflyag@yandex.ru> wrote:
> I tried VACUUM ANALYZE for three tables, but without success. I also tried to set enable_seqscan=off and the query took even more time. If I set enable_sort=off then the query takes a lot of time and I cancel it.
>
> Please see the attached query plans.

It's a little hard to comment here, as I don't see what the plan was
back when you were happy with the performance. I also see the
queries you mentioned in the initial email don't match the plans:
there's no table called "isbns" in the query. I guess this is "asins"?

Likely you could get a faster plan if there was an index on
asins_statistics (asin_id).  That would allow a query plan that scans
the isbns_value_key index and performs a parameterised nested loop on
asins_statistics using the asins_statistics (asin_id) index.  Looking
at your schema, I don't see that index, so it's pretty hard to guess
why the plan used to be faster.  Even if the books/asins merge join
used to take place first, there'd have been no efficient way to join
to the asins_statistics table and preserve the Merge Join's order (I'm
assuming non-parameterized nested loops would be inefficient in this
case). Doing that would have also required the asins_statistics
(asin_id) index.  Are you sure that index wasn't dropped?
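
That's easy to check from the catalogs (table name as used in this thread):

    SELECT indexname, indexdef
    FROM pg_indexes
    WHERE tablename = 'asins_statistics';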

However, it's likely a waste of time to try to figure out what the
plan used to be; better to focus on trying to make it faster. I
suggest you create the asins_statistics (asin_id) index. That said, I
can't say with any confidence that the planner would opt to
use that index if it did exist. Lowering random_page_cost or
increasing effective_cache_size would increase the chances of that.
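
A sketch of that suggestion; the index name is made up here and the setting values are only illustrative, so try them per-session before changing postgresql.conf:

    -- CONCURRENTLY avoids blocking writes while the index builds
    -- (it cannot run inside a transaction block)
    CREATE INDEX CONCURRENTLY asins_statistics_asin_id_idx
        ON asins_statistics (asin_id);

    -- session-local planner nudges
    SET random_page_cost = 1.1;        -- plausible if the data is on SSD or mostly cached
    SET effective_cache_size = '4GB';  -- size to the RAM actually available for caching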

David



Re: Queries containing ORDER BY and LIMIT started to work slowly

From:
Jeff Janes
Date:
On Wed, Aug 30, 2023 at 1:31 PM Rondat Flyag <rondatflyag@yandex.ru> wrote:
Hi and thank you for the response.
 
I tried VACUUM ANALYZE for three tables, but without success. I also tried to set enable_seqscan=off and the query took even more time. If I set enable_sort=off then the query takes a lot of time and I cancel it.

Maybe you could restore (to a temp server, not the production) a physical backup taken from before the change happened, and get an old plan that way.  I'm guessing that somehow an index got dropped around the same time you took the dump.  That might be a lot of work, and maybe it would just be easier to optimize the current query while ignoring the past.  But you seem to be interested in a root-cause analysis, and I don't see any other way to do one of those.

The winning plan I would expect is something sort-free like:

Limit
  merge join
    index scan yielding books in asin order (already being done)
    nested loop
      index scan yielding asins in value order
      index scan probing asins_statistics driven by asins_statistics.asin_id = asins.id

Or possibly a second nested loop rather than the merge join just below the limit, but with the rest the same.

In addition to the "books" index already evident in your current plan, you would also need an index leading with asins_statistics.asin_id, and one leading with asins.value.  But if all those indexes exist, it is hard to see why setting enable_seqscan=off wouldn't have forced them to be used.
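
If it's unclear which of those indexes actually exist, a catalog query like this lists each index and its leading column (expression indexes drop out of the attribute join):

    SELECT t.relname AS table_name,
           i.relname AS index_name,
           a.attname AS leading_column
    FROM pg_index x
    JOIN pg_class t ON t.oid = x.indrelid
    JOIN pg_class i ON i.oid = x.indexrelid
    JOIN pg_attribute a ON a.attrelid = t.oid AND a.attnum = x.indkey[0]
    WHERE t.relname IN ('books', 'asins', 'asins_statistics');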

Cheers,

Jeff