Re: extremly bad select performance on huge table
| From | Björn Wittich |
|---|---|
| Subject | Re: extremly bad select performance on huge table |
| Msg-id | 54473B2E.2080007@gmx.de |
| In reply to | Re: extremly bad select performance on huge table (Igor Neyman <ineyman@perceptron.com>) |
| Responses | Re: extremly bad select performance on huge table (Igor Neyman <ineyman@perceptron.com>)<br>Re: extremly bad select performance on huge table (Kevin Grittner <kgrittn@ymail.com>) |
| List | pgsql-performance |
Hi Igor,

that was also my assumption, but unfortunately this isn't true. I am using explain analyze.

Example which is fast: "explain analyze select value from smallertable inner join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn" takes 130 - 140 sec.

Example which is slow: "explain analyze select value,c1 from smallertable inner join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn" does not complete after several hours, although the c1 column should only be relevant for retrieval.

Comparing the explain output of both statements gave me a hint: adding the c1 column makes the query planner choose a sequential scan on myhugetable as well as on smallertable, which is much slower. When I set enable_seqscan=false, the query planner shows the same plan for both statements, but the statement including the c1 column still does not complete after several hours. How can this be explained?

I do not want the db server to prepare the whole query result at once; my intention is that the asynchronous retrieval starts as fast as possible.

Thanks
Björn

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Björn Wittich
> Sent: Tuesday, October 21, 2014 3:32 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] extremly bad select performance on huge table
>
> Hi Tom and Igor,
>
> thanks for your help. With the reindex the select query running time was reduced from 5200 sec to 130 sec. Impressive!
>
> Even a join on this table is now fast.
>
> Unfortunately, there is now another problem: The table in my example has 500 columns which I want to retrieve with my join command.
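One way to get rows streaming to the client before the whole join is materialized is a cursor; the planner's willingness to pick a fast-start plan for it is also influenced by `cursor_tuple_fraction`. A minimal sketch, reusing the table and column names from the examples in this thread (no other assumptions about the schema):

```sql
-- Favor fast-start plans for cursor queries (default is 0.1, i.e. the
-- planner assumes 10% of the rows will be fetched).
SET cursor_tuple_fraction = 0.05;

BEGIN;
DECLARE join_cur NO SCROLL CURSOR FOR
    SELECT s.value, h.c1
    FROM smallertable s
    JOIN myhugetable h ON s.mycolumn = h.mycolumn;

-- The first rows can arrive before the whole join result is built.
FETCH 1000 FROM join_cur;
COMMIT;
```

A non-holdable cursor like this must run inside a transaction; client libraries expose the same mechanism (e.g. single-row or batched fetch modes) without explicit DECLARE/FETCH statements.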
> Example which is fast: "select value from smallertable inner join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn"
>
> Example which is slow: "select value,c1,c2,c3,...,c10 from smallertable inner join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn"
>
> Why is the number of columns to fetch so bad? Which action is done in the db system when querying this via pgadmin? I think that there is no real retrieval included, so why is the number of additional columns so bad for the join performance?
>
>> Björn Wittich <Bjoern_Wittich@gmx.de> writes:
>>> Here is the explain (analyze,buffers) select mycolumn from myhugetable
>>> "Index Only Scan using myprimkey on myhugetable (cost=0.00..8224444.82 rows=71768080 width=33) (actual time=16.722..2456300.778 rows=71825999 loops=1)"
>>> " Heap Fetches: 356861"
>>> " Buffers: shared hit=71799472 read=613813"
>>> "Total runtime: 2503009.611 ms"
>>
>> So that works out to about 4 msec per page fetched considering only I/O costs, which is about as good as you're likely to get if the data is sitting on spinning rust.
>>
>> You could potentially make it faster with a VACUUM (to mark all pages all-visible and eliminate the "heap fetches" costs), or a REINDEX (so that the index scan becomes more nearly sequential instead of random access). However, unless the data is nearly static those will just be temporary fixes: the time will degrade again as you update the table.
>>
>>> Note: This select is just for testing. My final statement will be a join on this table via the "mycolumn" column.
>>
>> In that case it's probably a waste of time to worry about the performance of this query as such. In the first place, a join is not likely to use the index at all unless it's fetching a relatively small number of rows, and in the second place it seems unlikely that the join query can use an IndexOnlyScan on this index --- I imagine that the purpose of the join will require fetching additional columns.
>>
>> regards, tom lane
>
> Björn,
>
> I think the timing difference you see between the two queries is caused by delivering to the front-end (PgAdmin) and displaying all the additional columns that you include in the second query (a much bigger amount of data to pass from the db to the client).
> Pretty sure, if you do explain analyze on both queries, you'll see the same timing, because it'll reflect only db time without what's spent on delivering data to the client.
>
> Regards,
> Igor Neyman
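The maintenance steps Tom suggests and Igor's server-side timing check can be written out directly. A sketch, assuming the object names quoted in the thread (`myhugetable`, the `myprimkey` index, and the join columns from the examples); note that a plain REINDEX takes a lock that blocks writes to the table for the duration:

```sql
-- Mark pages all-visible so an index-only scan can skip heap fetches.
VACUUM (ANALYZE) myhugetable;

-- Rebuild the index so index-order access is closer to sequential I/O;
-- blocks writes while it runs.
REINDEX INDEX myprimkey;

-- Compare the two queries on server-side cost alone: EXPLAIN ANALYZE
-- runs the query but discards the result, so client transfer time is
-- excluded from the reported runtime.
EXPLAIN (ANALYZE, BUFFERS)
SELECT s.value, h.c1
FROM smallertable s
JOIN myhugetable h ON s.mycolumn = h.mycolumn;
```

As the thread shows, the two explain-analyze timings can still differ when the plans differ (index scan vs. sequential scan), so comparing the plan nodes, not just the total runtime, is the useful part.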