Discussion: Large tables, ORDER BY and sequence/index scans
My problem is that retrieving sorted data from large tables is sometimes
very slow in PostgreSQL (8.4.1, FWIW).

I typically retrieve the data using cursors, to display them in a UI:

BEGIN;
DECLARE ... SELECT ... ORDER BY ...;
FETCH ...;
...

On a newly created table of about 10 million rows the FETCH command
takes about one minute by default, with an additional delay during any
subsequent COMMIT command. This is because PostgreSQL uses a sequential
scan on the table even when there is an index on the ORDER BY column.
When I force PostgreSQL to perform an index scan (e.g. by setting one of
the options enable_seqscan or enable_sort to off), the FETCH response is
immediate.

The PostgreSQL manual explains the motivation for sequential scans of
large tables, and I can understand it. Nevertheless, this behavior leads
to unacceptably poor performance in my particular case. It is important
to get the first resulting rows quickly, to display them to the user
without delay.

My questions are:

- What is your experience with using ORDER BY + indexes on large tables?

- Is there a way to convince PostgreSQL to use index scans automatically
  in cases where they are much more efficient? I tried using ANALYZE,
  VACUUM and SET STATISTICS, but without success.

- Is it a good idea to set enable_seqscan or enable_sort to "off"
  globally in my case? Or to set them to "off" just before working with
  large tables? My databases contain short and long tables, often
  connected through REFERENCES or joined into views, and many of the
  shorter tables serve as codebooks. Can setting one of these parameters
  to off have clearly negative impacts?

- Is there a recommended way to keep indexes in good shape so that the
  performance of initial row retrievals remains good? The large tables
  are typically append-only tables with a SERIAL primary key.

Thanks for any tips.
2010/1/5 Milan Zamazal <pdm@brailcom.org>
Without a WHERE clause postgres will almost always choose a sequential scan.
By using cursors you take some responsibility away from the planner.
It has no idea that you want first 100 rows quickly. It just tries to optimize the whole operation.
IMHO: globally, no; just before working with large tables, no; and yes, it can have clearly negative impacts.
Use partitioning.
If that's not possible, REINDEX periodically to avoid sub-optimal btree layout. But that's just a half-solution.
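For illustration, the REINDEX maintenance suggested here might look like the following (the table and index names are hypothetical, matching the bigtable/idxcol example used below):

```sql
-- Rebuild the btree to restore a dense, well-ordered layout after many
-- appends.  Note that REINDEX locks out writes to the table while it
-- runs, so schedule it for a quiet period.
REINDEX INDEX bigtable_idxcol_idx;

-- Or rebuild every index on the table at once:
REINDEX TABLE bigtable;
```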
Tips:

1. Get rid of cursors, unless you have a strong need for them (e.g. seeking back and forth and updating).
2. Switch to "chunked" processing, like this:

SELECT * FROM bigtable ORDER BY idxcol LIMIT 1000;
(process the records)
SELECT * FROM bigtable WHERE idxcol > [last idxcol from previous fetch] ORDER BY idxcol LIMIT 1000;
... and so on.
Regards,
Filip
--
Filip Rembiałkowski
JID,mailto:filip.rembialkowski@gmail.com
http://filip.rembialkowski.net/
Hello,

please, send explain result:

postgres=# explain analyze declare x cursor for select * from foo;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..34.00 rows=2400 width=4) (actual time=0.010..0.012 rows=2 loops=1)
 Total runtime: 0.033 ms
(2 rows)

regards
Pavel Stehule
>>>>> "FR" == Filip Rembiałkowski <plk.zuber@gmail.com> writes:

FR> 2010/1/5 Milan Zamazal <pdm@brailcom.org>

 >> - Is it a good idea to set enable_seqscan or enable_sort to "off"
 >> globally in my case? Or to set them to "off" just before working
 >> with large tables? My databases contain short and long tables,
 >> often connected through REFERENCES or joined into views and many
 >> of shorter tables serve as codebooks. Can setting one of the
 >> parameters to off have clearly negative impacts?

FR> IMHO, no, no and yes.

Why (especially the "yes" part)? Any details and/or pointers?

FR> 1. get rid of cursors, unless you have a strong need for them
FR> (e.g. seeking back and forth and updating).

Cursors are very convenient for me, because they allow easy browsing of
data in the user interface (fetching limited sets of rows while seeking
forward and backward) and they prevent possible seeking problems and
other troubles when concurrent updates happen.

FR> 2. switch to "chunked" processing, like this:
FR> SELECT * FROM bigtable ORDER BY idxcol LIMIT 1000;
FR> (process the records)
FR> SELECT * FROM bigtable WHERE idxcol > [last idxcol from previous fetch]
FR> ORDER BY idxcol LIMIT 1000;
FR> ... and so on.

Not counting the convenience of cursors, this wouldn't work as-is,
because the values in idxcol needn't be unique.

Thanks,
Milan Zamazal
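A common workaround for non-unique idxcol values (a sketch, not a suggestion from the thread; it assumes bigtable has the SERIAL primary key id mentioned in the original post) is to page on a composite key using a row-value comparison:

```sql
-- A composite index lets the chunked scan remain an index scan.
CREATE INDEX bigtable_idxcol_id_idx ON bigtable (idxcol, id);

-- First chunk:
SELECT * FROM bigtable ORDER BY idxcol, id LIMIT 1000;

-- Next chunk: the row-value comparison resumes strictly after the last
-- row seen, even when idxcol values repeat; id breaks the ties.
SELECT * FROM bigtable
WHERE (idxcol, id) > (:last_idxcol, :last_id)
ORDER BY idxcol, id
LIMIT 1000;
```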
>>>>> "PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

PS> please, send explain result

For ~ 10 million rows table:

explain declare c cursor for select * from foo2 order by value;
                               QUERY PLAN
----------------------------------------------------------------------
 Sort  (cost=1829429.20..1854429.20 rows=9999999 width=10)
   Sort Key: value
   ->  Seq Scan on foo2  (cost=0.00..154049.99 rows=9999999 width=10)
(3 rows)

For ~ 1 million rows table, containing the first million rows from foo2
(`value' column contains random integer data):

explain declare c cursor for select * from foo order by value;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Index Scan using foo_value_idx on foo  (cost=0.00..47604.02 rows=999999 width=10)
(1 row)

When seqscan is disabled for the 10 million rows table:

set enable_seqscan = off;
explain declare c cursor for select * from foo2 order by value;
                                        QUERY PLAN
-----------------------------------------------------------------------------------------
 Index Scan using foo2_value_idx on foo2  (cost=0.00..32220140.73 rows=9999999 width=10)
(1 row)

Regards,
Milan Zamazal
Milan Zamazal wrote:
> My problem is that retrieving sorted data from large tables is
> sometimes very slow in PostgreSQL (8.4.1, FWIW).
> [...]
> It is important to get first resulting rows quickly, to display them
> to the user without delay.

Did you try to reduce the cursor_tuple_fraction parameter?

Yours,
Laurenz Albe
2010/1/5 Milan Zamazal <pdm@brailcom.org>:
> Cursors are very convenient for me, because they allow easy browsing of
> data in the user interface (fetching limited sets of rows while seeking
> forward and backward) and they prevent possible seeking problems and
> other troubles when concurrent updates happen.

Sounds to me like a borked app design. Do you seriously need to walk the
user through a couple of million rows of data? I mean, databases are
designed to work on data and give you the result back. Use this
capability intelligently.

--
GJ
please EXPLAIN ANALYZE

Pavel

2010/1/5 Milan Zamazal <pdm@brailcom.org>:
> For ~ 10 million rows table:
> [...]
>>>>> "PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

PS> please EXPLAIN ANALYZE

I see, I'm sorry. Here are the results:

set enable_seqscan = on;
explain analyze declare c cursor for select * from foo2 order by value;
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1829429.20..1854429.20 rows=9999999 width=10) (actual time=43432.727..49303.902 rows=9999999 loops=1)
   Sort Key: value
   Sort Method:  external merge  Disk: 204208kB
   ->  Seq Scan on foo2  (cost=0.00..154049.99 rows=9999999 width=10) (actual time=0.058..1775.928 rows=9999999 loops=1)
 Total runtime: 53693.012 ms
(5 rows)

set enable_seqscan = off;
explain analyze declare c cursor for select * from foo2 order by value;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using foo2_value_idx on foo2  (cost=0.00..32220140.73 rows=9999999 width=10) (actual time=0.225..55055.583 rows=9999999 loops=1)
 Total runtime: 56394.283 ms
(2 rows)
>>>>> "AL" == Albe Laurenz <laurenz.albe@wien.gv.at> writes:

AL> Did you try to reduce the cursor_tuple_fraction parameter?

No, good idea, thanks. It helps. The question is whether it's a good
idea to reduce cursor_tuple_fraction universally, without knowing the
table size beforehand (and I'm not going to use SELECT COUNT(*), for
well-known reasons). But I can probably experiment a bit and see; it
looks promising.
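For reference, the per-session form of this suggestion could look like the following (the 0.01 value is an arbitrary illustration, not a recommendation from the thread):

```sql
-- cursor_tuple_fraction (default 0.1 in 8.4) tells the planner what
-- fraction of a cursor's rows will actually be fetched; lowering it
-- biases the planner toward fast-start plans such as index scans.
SET cursor_tuple_fraction = 0.01;

BEGIN;
DECLARE c CURSOR FOR SELECT * FROM foo2 ORDER BY value;
FETCH 100 FROM c;   -- the first rows should now arrive without a full sort
COMMIT;
```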
>>>>> "GJ" == Grzegorz Jaśkiewicz <gryzman@gmail.com> writes:

GJ> Do you seriously need to walk the user through a couple of million
GJ> rows of data?

Typically not, but the data can be of any size. Some tables may be large
and I'd like to understand what happens. It is a general data browser.
2010/1/5 Milan Zamazal <pdm@brailcom.org>:
> I see, I'm sorry. Here are the results:
> [...]

Do you have the original values of random_page_cost and seq_page_cost in
postgresql.conf? It is strange.

Pavel
>>>>> "PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

PS> Do you have the original values of random_page_cost and
PS> seq_page_cost in postgresql.conf?

Yes. To be sure, I uncommented the values in postgresql.conf

seq_page_cost = 1.0        # measured on an arbitrary scale
random_page_cost = 4.0     # same scale as above

and restarted PostgreSQL. The result looks basically the same:

explain analyze declare c cursor for select * from foo2 order by value;
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1829429.20..1854429.20 rows=9999999 width=10) (actual time=43709.313..49265.244 rows=9999999 loops=1)
   Sort Key: value
   Sort Method:  external merge  Disk: 204208kB
   ->  Seq Scan on foo2  (cost=0.00..154049.99 rows=9999999 width=10) (actual time=0.072..1760.585 rows=9999999 loops=1)
 Total runtime: 54399.967 ms
(5 rows)
2010/1/5 Milan Zamazal <pdm@brailcom.org>:
> Yes. To be sure, I uncommented the values in postgresql.conf
> [...]

And what is the value of effective_cache_size? What is the CREATE INDEX
statement for the index?

Pavel
>>>>> "PS" == Pavel Stehule <pavel.stehule@gmail.com> writes:

PS> And what is the value of effective_cache_size?

effective_cache_size = 128MB

PS> What is the CREATE INDEX statement for the index?

create index foo2_value_idx on foo2(value);
Milan Zamazal wrote:
> PS> And what is the value of effective_cache_size?
>
> effective_cache_size = 128MB

That's rather small, unless your system is very memory constrained.
Assuming Postgres is the primary disk I/O consumer on this system, take
a look at the 'cached' value in top (or similar) after it has been
running for a while; that's a good first-order estimate of
effective_cache_size. This is often half or more of your physical
memory.
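A sketch of the adjustment being suggested (the 4GB figure is for a hypothetical 8GB machine, not a value from the thread):

```sql
-- effective_cache_size allocates nothing; it only tells the planner how
-- much data is likely cached by Postgres and the OS combined, which
-- lowers the estimated cost of index page reads.
-- In postgresql.conf:
--   effective_cache_size = 4GB
-- or per session:
SET effective_cache_size = '4GB';
```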
>>>>> "JRP" == John R Pierce <pierce@hogranch.com> writes:

 >> effective_cache_size = 128MB

JRP> That's rather small, unless your system is very memory
JRP> constrained. [...] take a look at the 'cached' value in top
JRP> [...] that's a good first-order estimate of effective_cache_size.

Indeed, increasing effective_cache_size helps, thanks.

Thank you all for the tips; I hope I understand the planner behavior for
my queries better now.

Regards,
Milan Zamazal