Re: extremly bad select performance on huge table

From: Björn Wittich
Subject: Re: extremly bad select performance on huge table
Date:
Msg-id: 54473B2E.2080007@gmx.de
In reply to: Re: extremly bad select performance on huge table (Igor Neyman <ineyman@perceptron.com>)
Responses: Re: extremly bad select performance on huge table (Igor Neyman <ineyman@perceptron.com>)
           Re: extremly bad select performance on huge table (Kevin Grittner <kgrittn@ymail.com>)
List: pgsql-performance
Hi Igor,

that was also my assumption, but unfortunately this isn't true.
I am using explain analyze.

Example which is fast: "explain analyze select value from smallertable inner join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn"

130 - 140 sec

Example which is slow: "explain analyze select value,c1 from smallertable inner join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn"

does not complete after several hours, although the c1 column should only
be relevant for retrieval.

Comparing the explain output of both statements gave me a hint:

adding the c1 column makes the query planner choose a sequential scan
on myhugetable as well as on smallertable. This is much slower.
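
Plain explain without analyze shows the chosen plan without executing the
query, so the comparison itself does not take hours; a sketch with the
names from the examples above:

    explain
    select value, c1
    from smallertable
    inner join myhugetable on smallertable.mycolumn = myhugetable.mycolumn;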

When I set enable_seqscan=false, the query planner shows the same query
plan for both statements, but the statement including the c1 column still
does not complete after several hours.
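
I set it only for the session, so other connections are unaffected; note
that enable_seqscan = off only discourages sequential scans, it does not
forbid them:

    set enable_seqscan = off;  -- adds a large cost penalty to seq scans in this session
    -- run the explain / explain analyze from above here
    reset enable_seqscan;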

How can this be explained?

I do not want the db server to prepare the whole query result at once;
my intention is that the asynchronous retrieval starts as fast as possible.
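
A cursor is one way to start streaming rows before the whole result is
materialized (the planner also favours fast-start plans for cursors, see
cursor_tuple_fraction); a sketch with the names from above:

    begin;
    declare c no scroll cursor for
        select value, c1
        from smallertable
        inner join myhugetable on smallertable.mycolumn = myhugetable.mycolumn;
    fetch 1000 from c;  -- the first rows arrive as soon as the plan produces them
    close c;
    commit;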

Thanks
Björn




>
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Björn Wittich
> Sent: Tuesday, October 21, 2014 3:32 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] extremly bad select performance on huge table
>
> Hi Tom and Igor,
>
> thanks for your help. With the reindex the select query running time was reduced from 5200 sec to 130 sec. Impressive!
>
> Even a join on this table is now fast.
>
> Unfortunately, there is now another problem: The table in my example has
> 500 columns which I want to retrieve with my join command.
>
> Example which is fast "select value from smallertable inner join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn"
>
> Example which is slow "select value,c1,c2,c3,...,c10 from smallertable inner join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn"
>
>
> Why is the number of columns to fetch so bad? Which action is done in
> the db system when querying this via pgAdmin? I think that there is no
> real retrieval involved, so why is the number of additional columns so
> bad for the join performance?
>
>> Björn Wittich <Bjoern_Wittich@gmx.de> writes:
>>> Here is the explain (analyze,buffers) select mycolumn from myhugetable
>>> "Index Only Scan using myprimkey on myhugetable  (cost=0.00..8224444.82
>>> rows=71768080 width=33) (actual time=16.722..2456300.778 rows=71825999
>>> loops=1)"
>>> "  Heap Fetches: 356861"
>>> "  Buffers: shared hit=71799472 read=613813"
>>> "Total runtime: 2503009.611 ms"
>> So that works out to about 4 msec per page fetched considering only I/O
>> costs, which is about as good as you're likely to get if the data is
>> sitting on spinning rust.
>>
>> You could potentially make it faster with a VACUUM (to mark all pages
>> all-visible and eliminate the "heap fetches" costs), or a REINDEX
>> (so that the index scan becomes more nearly sequential instead of random
>> access).  However, unless the data is nearly static those will just be
>> temporary fixes: the time will degrade again as you update the table.
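
(The commands suggested there would be along these lines; plain vacuum,
not vacuum full:

    vacuum myhugetable;        -- marks heap pages all-visible, eliminating the "heap fetches"
    reindex index myprimkey;   -- rebuilds the index so the index scan is nearly sequential

)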
>>
>>> Note: This select is just for testing. My final statement will be a join
>>> on this table via the "mycolumn" column.
>> In that case it's probably a waste of time to worry about the performance
>> of this query as such.  In the first place, a join is not likely to use
>> the index at all unless it's fetching a relatively small number of rows,
>> and in the second place it seems unlikely that the join query can use
>> an IndexOnlyScan on this index --- I imagine that the purpose of the join
>> will require fetching additional columns.
>>
>>             regards, tom lane
>>
>>
> Björn,
>
> I think the timing difference you see between the 2 queries is caused by delivering to the front-end (PgAdmin) and displaying all additional columns that you include in the second query (a much bigger amount of data to pass from the db to the client).
> Pretty sure, if you do explain analyze on both queries, you'll see the same timing, because it'll reflect only db time without what's spent on delivering data to the client.
>
> Regards,
> Igor Neyman
>
>
>


