Re: Query optimization

Поиск
Список
Период
Сортировка
От David Johnston
Тема Re: Query optimization
Дата
Msg-id CAKFQuwZ+Rw+b=by+fOCCy-L0LqssvY9CvQdgeqm-5DFPtjrxmg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Query optimization  (Jorge Arevalo <jorgearevalo@libregis.org>)
Список pgsql-general
List preference is to inline post or, at worse, bottom post.  Please do not top post.

On Wed, Oct 29, 2014 at 11:06 AM, Jorge Arevalo <jorgearevalo@libregis.org> wrote:
Hello David, many thanks for your responses,

Sorry for not providing the content of the fill_table3_function, but it just executes 3 insert queries in 3 different tables. And I've checked the time consuming operation is in this query (by the way, there was a little mistake in the name of the fields of the inner select, I've corrected it)

SELECT value1,value2,value3,value4,
value5, hstore(ARRAY['field9', 'field10', 'field11', 'field12', 'field13', 'field14'], ARRAY[field9, field10, field11, field12, field13, field14) as metadata, value7, (select array((select row(f1, f2) from table2 p where p.f3 = field7))) as values_array FROM table1

This is the result of EXPLAIN ANALYZE

                                                                   QUERY PLAN                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using table1_pkey on table1  (cost=67846.38..395773.45 rows=8419127 width=88) (actual time=7122.704..22670.680 rows=8419127 loops=1)
   InitPlan 2 (returns $1)
     ->  Result  (cost=67846.29..67846.29 rows=1 width=0) (actual time=7009.063..7009.065 rows=1 loops=1)
           InitPlan 1 (returns $0)
             ->  Seq Scan on table2 p  (cost=0.00..67846.29 rows=12689 width=20) (actual time=14.971..5069.840 rows=2537787 loops=1)
                   Filter: (f3 = field7)

So, there's a sequential scan over table2 (in the query to get values_array), instead of a index scan. Could it be because the SELECT returns more than approximately 5-10% of all rows in the table? (I've heard that, under those conditions, a sequential scan is faster than index scan, because the amount of I/O operations required for each row)

​What concerns me here is that the rows estimate on that table2 scan is ~13k while the actual count is ~ 2.5​M; you need to run ANALYZE on both tables and see if your get similar results.  Though given the need for sequential scan regardless (see next comment) the estimate miss likely doesn't affect actual performance or the plan that is chosen.  But it is still worth looking into.
 

Anyway, if I understood well, I should try:

- Avoiding that inner query by using a JOIN instead

I don't know...looking at your explain (and some reasoning) it looks as if it is already doing that for you since there is only a single loop for the InitPlan 1.  This is a little beyond my comfort zone but you've now provided a decent amount of information for others to speculate...though it would help to enable various timings as well and try and run the full query (with the function) in a development environment so that the entire routine can be evaluated.​

 
- Return a composite type instead of an array


​Worth looking into but impossible to recommend without knowing what your make believe fields are and are used for.  More style than performance since I do not know the relative costs of building up an array and creating a composite.

 

В списке pgsql-general по дате отправления:

Предыдущее
От: Jorge Arevalo
Дата:
Сообщение: Re: Query optimization
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Query optimization