Обсуждение: Performance of a nested loop, whose inner loop uses an index scan.

Поиск
Список
Период
Сортировка

Performance of a nested loop, whose inner loop uses an index scan.

От
negora
Дата:
Hello:

I've a question about the performance of a query plan that uses a nested
loop, and whose inner loop uses an index scan. Would you be so kind to
help me, please?

I'm using PostgreSQL 9.5.4 on Ubuntu 14.04 64-bit (kernel 4.8.2). I've 3
tables, which are "answers", "test_completions" and "courses". The first
one contains around ~30 million rows, whereas the others only have a few
thousands each one. The query that I'm performing is very simple,
although retrieves lots of rows:

    ---------------------
    SELECT answers.*
    FROM answers
    JOIN test_completions ON test_completions.test_completion_id =
answers.test_completion_id
    JOIN courses ON courses.course_id = test_completions.course_id
    WHERE courses.group_id = 2;
    ---------------------


This yields the following plan:

    ---------------------
    Nested Loop  (cost=245.92..383723.28 rows=7109606 width=38) (actual
time=1.091..2616.553 rows=8906075 loops=1)
        ->  Hash Join  (cost=245.36..539.81 rows=3081 width=8) (actual
time=1.077..6.087 rows=3123 loops=1)
                    Hash Cond: (test_completions.course_id =
courses.course_id)
                    ->  Seq Scan on test_completions  (cost=0.00..214.65
rows=13065 width=16) (actual time=0.005..1.051 rows=13065 loops=1)
                    ->  Hash  (cost=204.11..204.11 rows=3300 width=8)
(actual time=1.063..1.063 rows=3300 loops=1)
                                Buckets: 4096  Batches: 1  Memory Usage:
161kB
                                ->  Bitmap Heap Scan on courses
(cost=45.86..204.11 rows=3300 width=8) (actual time=0.186..0.777
rows=3300 loops=1)
                                            Recheck Cond: (group_id = 2)
                                            Heap Blocks: exact=117
                                            ->  Bitmap Index Scan on
fki_courses_group_id_fkey  (cost=0.00..45.03 rows=3300 width=0) (actual
time=0.172..0.172 rows=3300 loops=1)
                                                        Index Cond:
(group_id = 2)
       ### HERE ###
        ->  Index Scan using fki_answers_test_completion_id_fkey on
answers  (cost=0.56..96.90 rows=2747 width=38) (actual time=0.007..0.558
rows=2852 loops=3123)
       ### HERE ###
                    Index Cond: (test_completion_id =
test_completions.test_completion_id)
    Planning time: 0.523 ms
    Execution time: 2805.530 ms
    ---------------------

My doubt is about the inner loop of the nested loop, the one that I've
delimited with  ### HERE ### . This loop is the part that, obviously,
more time consumes. Because its run 3,123 times and requires lots of
accesses to multiple database pages. But, Is there anything that I can
do to reduce even more the time spent in this part? Apart of:

    * Clustering the "answers" table.
    * Upgrading PostgreSQL to version 9.6, to take advantage of the
index scans in parallel.
    * Upgrading the hardware.

Thank you!



Re: Performance of a nested loop, whose inner loop uses an index scan.

От
Matheus de Oliveira
Дата:

On Wed, Oct 19, 2016 at 8:54 AM, negora <public@negora.com> wrote:
    Nested Loop  (cost=245.92..383723.28 rows=7109606 width=38) (actual
time=1.091..2616.553 rows=8906075 loops=1)

I wonder about the use-case for this query, because it returns more than 8M rows, so 2.6 seconds that sounds that much for so many rows. Is it for an end user application? Isn't there any kind of pagination?


--
Matheus de Oliveira


Re: Performance of a nested loop, whose inner loop uses an index scan.

От
negora
Дата:
<p>Hi Matheus:<p>Thanks for your prompt answer. It's for a web application. This part of the application allows to
exportthe answers to a CSV file. So pagination isn't possible here. The user can choose among several filters. The
groupof the courses is one of them. She can combine as many filters as she wants. So the query that I presented in my
previousmessage was one of the "broadest" examples. But it's the one that I'm interested in.<br /><p>Really, <b>I'm
moreinterested in the relative time than in the absolute time.</b> Because I could create the file asynchronously, in
thebackground, so that the user downloaded it at a later time. That's not the problem. My doubt is if 2.8 seconds is
thebest that I can do. Is it an acceptable time?<p>Thank you! ;)<p><br /><div class="moz-cite-prefix">On 19/10/16
13:15,Matheus de Oliveira wrote:<br /></div><blockquote
cite="mid:CAJghg4Lbs-a-_xTmdTWcuQTiZCpv8hTjv2-kWc2_teyNAWiD8w@mail.gmail.com"type="cite"><div dir="ltr"><div
class="gmail_extra"><br/><div class="gmail_quote">On Wed, Oct 19, 2016 at 8:54 AM, negora <span dir="ltr"><<a
href="mailto:public@negora.com"moz-do-not-send="true" target="_blank">public@negora.com</a>></span> wrote:<br
/><blockquoteclass="gmail_quote" style="margin:0 0 0             .8ex;border-left:1px #ccc solid;padding-left:1ex"><div
class="a3saXjCH m157dc958f4b446ce" id=":1ai">     Nested Loop  (cost=245.92..383723.28 rows=7109606 width=38)
(actual<br/> time=1.091..2616.553 rows=8906075 loops=1)</div></blockquote></div><br /></div><div class="gmail_extra">I
wonderabout the use-case for this query, because it returns more than 8M rows, so 2.6 seconds that sounds that much for
somany rows. Is it for an end user application? Isn't there any kind of pagination?<br /></div><div
class="gmail_extra"><brclear="all" /><br /> -- <br /><div class="gmail_signature" data-smartmail="gmail_signature"><div
dir="ltr"><div>Matheusde Oliveira<br /><br /><br /></div></div></div></div></div></blockquote><br />