Re: Query optimization

Поиск
Список
Период
Сортировка
От Jorge Arevalo
Тема Re: Query optimization
Дата
Msg-id CAMhtMNMA2_usWDiDCo89qJfb4iW43rOYTLjdqh7qS60w9=2ZpQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Query optimization  (David Johnston <david.g.johnston@gmail.com>)
Ответы Re: Query optimization  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Query optimization  (David Johnston <david.g.johnston@gmail.com>)
Список pgsql-general


On Wed, Oct 29, 2014 at 8:05 PM, David Johnston <david.g.johnston@gmail.com> wrote:
On Wed, Oct 29, 2014 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jorge Arevalo <jorgearevalo@libregis.org> writes:

> This is the result of EXPLAIN ANALYZE

>                                                                    QUERY
> PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=67846.38..395773.45
> rows=8419127 width=88) (actual time=7122.704..22670.680 rows=8419127
> loops=1)
>    InitPlan 2 (returns $1)
>      ->  Result  (cost=67846.29..67846.29 rows=1 width=0) (actual
> time=7009.063..7009.065 rows=1 loops=1)
>            InitPlan 1 (returns $0)
>              ->  Seq Scan on table2 p  (cost=0.00..67846.29 rows=12689
> width=20) (actual time=14.971..5069.840 rows=2537787 loops=1)
>                    Filter: (f3 = field7)

Hm.  If I'm reading that right, you're building an array containing
2537787 entries, each of which is a composite datum containing two
columns of unmentioned datatypes.  I suspect a big chunk of your
runtime is going into manipulating that array -- PG is not terribly
efficient with big arrays containing variable-width values.


The seq scan over table2 is for finding entries in table2 (which contains 2537787) that matches a condition using a column from table1 (entries that match table1.field7 = table2.f3). But the array isn't going to contain all the entries, Just a few of them.

I think the time is being used in scanning table2 for all the rows of table1 (plus than 8 million).

 
I'm also a bit confused as to why the planner is saying that the (SELECT
ARRAY(...)) bit is an InitPlan and not a SubPlan.  That implies that
"field7" in the innermost WHERE clause is not a reference to table1 but a
reference to table2.  Is that really what you meant?  IOW, are you sure
this query is performing the right calculation in the first place?


I thought the InitPlan was in place because the planner choose to execute the correlated subquery as a standalone query since it realizes that it is going to have to end up processing the entire table anyway due to the lack of a filter on the outer query.  In effect executing "table1 JOIN (table2 subquery) ON (f3 = field7)"​.

David J.


Yes, for each row of table1, table2 is being scanned, to find all the entries that satisfy table1.field7 = table2.f3. Sounds that a really heavy task. I guess I should avoid it, right?

BTW, Tom, this is the query with all the parentheses/brackets

SELECT value1,value2,value3,value4, value5, hstore(ARRAY['field9', 'field10', 'field11', 'field12', 'field13', 'field14'], ARRAY[field9, field10, field11, field12, field13, field14]) as metadata, value7, (select array((select row(f1, f2) from table2 p where p.f3 = field7))) as values_array FROM table1
 
Oh, and sorry for the top posting!


--
Jorge Arevalo
Freelance developer

http://about.me/jorgeas80

В списке pgsql-general по дате отправления:

Предыдущее
От: David Johnston
Дата:
Сообщение: Re: Query optimization
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Query optimization