Discussion: Slow query with a lot of data
Hi,

I run this query:

select max(a."user"), b.category, count(1)
from result a, domain_categories b
where a."domain" = b."domain"
group by b.category;

The table result contains all websites a user visited, and the table domain_categories contains all categories a domain is in. result has 20 million rows and domain_categories about 12 million. There are 500,000 different users. I have indexes on result.domain, domain_categories.domain, result.user and domain_categories.category. result is clustered on user, and domain_categories on domain.

explain analyze says (limited to one user with id 1337):

"HashAggregate  (cost=2441577.16..2441614.72 rows=2504 width=8) (actual time=94667.335..94671.508 rows=3361 loops=1)"
"  ->  Merge Join  (cost=2119158.02..2334105.00 rows=14329622 width=8) (actual time=63559.938..94621.557 rows=36308 loops=1)"
"        Merge Cond: (a.domain = b.domain)"
"        ->  Sort  (cost=395.52..405.49 rows=3985 width=8) (actual time=0.189..0.211 rows=19 loops=1)"
"              Sort Key: a.domain"
"              Sort Method:  quicksort  Memory: 27kB"
"              ->  Index Scan using result_user_idx on result a  (cost=0.00..157.21 rows=3985 width=8) (actual time=0.027..0.108 rows=61 loops=1)"
"                    Index Cond: ("user" = 1337)"
"        ->  Materialize  (cost=2118752.28..2270064.64 rows=12104989 width=8) (actual time=46460.599..82336.116 rows=12123161 loops=1)"
"              ->  Sort  (cost=2118752.28..2149014.75 rows=12104989 width=8) (actual time=46460.592..59595.851 rows=12104989 loops=1)"
"                    Sort Key: b.domain"
"                    Sort Method:  external sort  Disk: 283992kB"
"                    ->  Seq Scan on domain_categories b  (cost=0.00..198151.89 rows=12104989 width=8) (actual time=14.352..22572.869 rows=12104989 loops=1)"
"Total runtime: 94817.058 ms"

This is running on a pretty small server with 1 GB of RAM and a slow SATA HD. shared_buffers is 312 MB, max_fsm_pages = 153600; everything else is commented out. PostgreSQL 8.3.3, operating system Ubuntu 8.04.

It would be great if someone could help improve this query. This is for a research project at my university.
Thanks in advance, Moritz
On Mon, 18 Aug 2008, Moritz Onken wrote:
> I have indexes on result.domain, domain_categories.domain, result.user,
> domain_categories.category. Clustered result on user and domain_categories on domain.
>
> "        ->  Materialize  (cost=2118752.28..2270064.64 rows=12104989 width=8) (actual time=46460.599..82336.116 rows=12123161 loops=1)"
> "              ->  Sort  (cost=2118752.28..2149014.75 rows=12104989 width=8) (actual time=46460.592..59595.851 rows=12104989 loops=1)"
> "                    Sort Key: b.domain"
> "                    Sort Method:  external sort  Disk: 283992kB"
> "                    ->  Seq Scan on domain_categories b  (cost=0.00..198151.89 rows=12104989 width=8) (actual time=14.352..22572.869 rows=12104989 loops=1)"

This is weird, given you say you have clustered domain_categories on domain. Have you analysed? You should be able to run:

EXPLAIN SELECT * from domain_categories ORDER BY domain

and have it say "Index Scan" instead of "Seq Scan followed by a disc sort".

Matthew

--
Patron: "I am looking for a globe of the earth."
Librarian: "We have a table-top model over here."
Patron: "No, that's not good enough. Don't you have a life-size?"
Librarian: (pause) "Yes, but it's in use right now."
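[Editor's sketch of the maintenance steps Matthew is suggesting, using the index name that appears in the later plans (domain_categories_domain); 8.3 syntax, not taken verbatim from the thread:]

```sql
-- Re-cluster the table on the domain index, then refresh statistics;
-- CLUSTER does not run ANALYZE for you, so stale stats can linger.
CLUSTER domain_categories USING domain_categories_domain;
ANALYZE domain_categories;

-- With fresh statistics this should now show an Index Scan:
EXPLAIN SELECT * FROM domain_categories ORDER BY domain;
```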
Am 18.08.2008 um 16:30 schrieb Matthew Wakeling: > On Mon, 18 Aug 2008, Moritz Onken wrote: >> I have indexes on result.domain, domain_categories.domain, >> result.user, domain_categories.category. Clustered result on user >> and domain_categories on domain. > >> " -> Materialize (cost=2118752.28..2270064.64 >> rows=12104989 width=8) (actual time=46460.599..82336.116 >> rows=12123161 loops=1)" >> " -> Sort (cost=2118752.28..2149014.75 rows=12104989 >> width=8) (actual time=46460.592..59595.851 rows=12104989 loops=1)" >> " Sort Key: b.domain" >> " Sort Method: external sort Disk: 283992kB" >> " -> Seq Scan on domain_categories b >> (cost=0.00..198151.89 rows=12104989 width=8) (actual >> time=14.352..22572.869 rows=12104989 loops=1)" > > This is weird, given you say you have clustered domain_categories on > domain. Have you analysed? You should be able to run: > > EXPLAIN SELECT * from domain_categories ORDER BY domain > > and have it say "Index scan" instead of "Seq Scan followed by disc > sort)". > > Matthew > Thanks, the index was created but I forgot to run analyze again on that table. I had a little mistake in my previous sql query. The corrected version is this: explain analyze select a."user", b.category, count(1) from result a, domain_categories b where a."domain" = b."domain" and a."user" = 1337 group by a."user", b.category; (notice the additional group by column). 
explain analyze:

"HashAggregate  (cost=817397.78..817428.92 rows=2491 width=8) (actual time=42874.339..42878.419 rows=3361 loops=1)"
"  ->  Merge Join  (cost=748.47..674365.50 rows=19070970 width=8) (actual time=15702.449..42829.388 rows=36308 loops=1)"
"        Merge Cond: (b.domain = a.domain)"
"        ->  Index Scan using domain_categories_domain on domain_categories b  (cost=0.00..391453.79 rows=12105014 width=8) (actual time=39.018..30166.349 rows=12104989 loops=1)"
"        ->  Sort  (cost=395.52..405.49 rows=3985 width=8) (actual time=0.188..32.345 rows=36309 loops=1)"
"              Sort Key: a.domain"
"              Sort Method:  quicksort  Memory: 27kB"
"              ->  Index Scan using result_user_idx on result a  (cost=0.00..157.21 rows=3985 width=8) (actual time=0.021..0.101 rows=61 loops=1)"
"                    Index Cond: ("user" = 1337)"
"Total runtime: 42881.382 ms"

This is still very slow...
On Mon, 18 Aug 2008, Moritz Onken wrote: > "HashAggregate (cost=817397.78..817428.92 rows=2491 width=8) (actual time=42874.339..42878.419 rows=3361 loops=1)" > " -> Merge Join (cost=748.47..674365.50 rows=19070970 width=8) (actual > time=15702.449..42829.388 rows=36308 loops=1)" > " Merge Cond: (b.domain = a.domain)" > " -> Index Scan using domain_categories_domain on domain_categories b > (cost=0.00..391453.79 rows=12105014 width=8)(actual time=39.018..30166.349 > rows=12104989 loops=1)" > " -> Sort (cost=395.52..405.49 rows=3985 width=8) (actual > time=0.188..32.345 rows=36309 loops=1)" > " Sort Key: a.domain" > " Sort Method: quicksort Memory: 27kB" > " -> Index Scan using result_user_idx on result a > (cost=0.00..157.21 rows=3985 width=8) (actual time=0.021..0.101rows=61 > loops=1)" > " Index Cond: ("user" = 1337)" > "Total runtime: 42881.382 ms" > > This is still very slow... Well, you're getting the database to read the entire contents of the domain_categories table in order. That's 12 million rows - a fair amount of work. You may find that removing the "user = 1337" constraint doesn't make the query much slower - that's where you get a big win by clustering on domain. You might also want to cluster the results table on domain. If you want the results for just one user, it would be very helpful to have a user column on the domain_categories table, and an index on that column. However, that will slow down the query for all users a little. Matthew -- For every complex problem, there is a solution that is simple, neat, and wrong. -- H. L. Mencken
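[Editor's hedged sketch of Matthew's last suggestion; the column and index names are hypothetical, and as the follow-up notes, whether it fits depends on the schema:]

```sql
-- Hypothetical denormalization: carry the user on domain_categories
-- so a per-user lookup can be served by a single index.
ALTER TABLE domain_categories ADD COLUMN "user" integer;
CREATE INDEX domain_categories_user_idx ON domain_categories ("user");
```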
>> > > Well, you're getting the database to read the entire contents of the > domain_categories table in order. That's 12 million rows - a fair > amount of work. > > You may find that removing the "user = 1337" constraint doesn't make > the query much slower - that's where you get a big win by clustering > on domain. You might also want to cluster the results table on domain. Running the query for more than one user is indeed not much slower. That's what I need. I'm clustering the results table on domain right now. But why is this better than clustering it on "user"? > > > If you want the results for just one user, it would be very helpful > to have a user column on the domain_categories table, and an index > on that column. However, that will slow down the query for all users > a little. A row in domain_categories can belong to more than one user. But I don't need to run this query for only one user anyway. Thanks so far,
On Mon, 18 Aug 2008, Moritz Onken wrote: > Running the query for more than one user is indeed not much slower. That's > what I need. I'm clustering the results table on domain right now. But why is > this better than clustering it on "user"? The reason is the way that the merge join algorithm works. What it does is takes two tables, and sorts them both by the join fields. Then it can stream through both tables producing results as it goes. It's the best join algorithm, but it does require both tables to be sorted by the same thing, which is domain in this case. The aggregating on user happens after the join has been done, and the hash aggregate can accept the users in random order. If you look at your last EXPLAIN, see that it has to sort the result table on domain, although it can read the domain_categories in domain order due to the clustered index. "HashAggregate " -> Merge Join " Merge Cond: (b.domain = a.domain)" " -> Index Scan using domain_categories_domain on domain_categories b " -> Sort " Sort Key: a.domain" " Sort Method: quicksort Memory: 27kB" " -> Index Scan using result_user_idx on result a " Index Cond: ("user" = 1337)" Without the user restriction and re-clustering, this should become: "HashAggregate " -> Merge Join " Merge Cond: (b.domain = a.domain)" " -> Index Scan using domain_categories_domain on domain_categories b " -> Index Scan using result_domain on result a Matthew -- Vacuums are nothings. We only mention them to let them know we know they're there.
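[Editor's note: Matthew's description of the two operators can be sketched in a few lines of Python. This is an illustration of the algorithms only, not PostgreSQL's implementation, and the table contents are made up:]

```python
from collections import defaultdict

def merge_join(left, right):
    """Stream two inputs already sorted by join key, yielding matched pairs."""
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, lval = left[i]
        rkey = right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Emit this left row against the whole run of equal keys on the right.
            k = j
            while k < len(right) and right[k][0] == lkey:
                yield lval, right[k][1]
                k += 1
            i += 1

def hash_aggregate(pairs):
    """Group and count; happily accepts its input in any order."""
    counts = defaultdict(int)
    for user, category in pairs:
        counts[(user, category)] += 1
    return dict(counts)

# (domain, user) and (domain, category) rows, both sorted by domain --
# exactly the precondition the merge join relies on.
result = [(1, 'u1'), (2, 'u1'), (2, 'u2')]
cats = [(1, 'news'), (2, 'shop'), (2, 'sport')]
print(hash_aggregate(merge_join(result, cats)))
```

The point of the sketch: merge_join needs both inputs in domain order (hence the clustering advice), while hash_aggregate is indifferent to the order in which users arrive.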
Am 18.08.2008 um 18:05 schrieb Matthew Wakeling:
> The reason is the way that the merge join algorithm works. What it does is takes two tables, and sorts them both by the join fields. Then it can stream through both tables producing results as it goes. It's the best join algorithm, but it does require both tables to be sorted by the same thing, which is domain in this case. The aggregating on user happens after the join has been done, and the hash aggregate can accept the users in random order.
>
> If you look at your last EXPLAIN, see that it has to sort the result table on domain, although it can read the domain_categories in domain order due to the clustered index.

explain select
  a."user", b.category, sum(1.0/b.cat_count)::float
from result a, domain_categories b
where a."domain" = b."domain"
group by a."user", b.category;

"GroupAggregate  (cost=21400443313.69..22050401897.13 rows=35049240 width=12)"
"  ->  Sort  (cost=21400443313.69..21562757713.35 rows=64925759864 width=12)"
"        Sort Key: a."user", b.category"
"        ->  Merge Join  (cost=4000210.40..863834009.08 rows=64925759864 width=12)"
"              Merge Cond: (b.domain = a.domain)"
"              ->  Index Scan using domain_categories_domain on domain_categories b  (cost=0.00..391453.79 rows=12105014 width=12)"
"              ->  Materialize  (cost=3999931.73..4253766.93 rows=20306816 width=8)"
"                    ->  Sort  (cost=3999931.73..4050698.77 rows=20306816 width=8)"
"                          Sort Key: a.domain"
"                          ->  Seq Scan on result a  (cost=0.00..424609.16 rows=20306816 width=8)"

Both result and domain_categories are clustered on domain and analyzed. It took 50 minutes to run this query for 280 users ("and "user" IN ([280 ids])"); 78,000 rows were returned and stored in a table. Is this reasonable?
Why is it still sorting on domain? I thought the clustering should prevent the planner from doing this? moritz
On Tue, 19 Aug 2008, Moritz Onken wrote: > explain select > a."user", b.category, sum(1.0/b.cat_count)::float > from result a, domain_categories b > where a."domain" = b."domain" > group by a."user", b.category; > Both results and domain_categories are clustered on domain and analyzed. > Why is it still sorting on domain? I thought the clustering should prevent > the planner from doing this? As far as I can tell, it should. If it is clustered on an index on domain, and then analysed, it should no longer have to sort on domain. Could you post here the results of running: select * from pg_stats where attname = 'domain'; > It took 50 minutes to run this query for 280 users ("and "user" IN ([280 > ids])"), 78000 rows were returned and stored in a table. Is this reasonable? Sounds like an awfully long time to me. Also, I think restricting it to 280 users is probably not making it any faster. Matthew -- It's one of those irregular verbs - "I have an independent mind," "You are an eccentric," "He is round the twist." -- Bernard Woolly, Yes Prime Minister
>> > As far as I can tell, it should. If it is clustered on an index on > domain, and then analysed, it should no longer have to sort on domain. > > Could you post here the results of running: > > select * from pg_stats where attname = 'domain'; > schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation public | result | domain | 0 | 4 | 1642 | {3491378,3213829,3316634,3013831,3062500,3242775,3290846,3171997,3412018,3454092 } | {0.352333,0.021,0.01,0.00766667,0.00566667,0.00533333,0.00533333,0.005,0.00266667,0.00266667 } | {3001780,3031753,3075043,3129688,3176566,3230067,3286784,3341445,3386233,3444374,3491203 } | 1 No idea what that means :) >> > > Sounds like an awfully long time to me. Also, I think restricting it > to 280 users is probably not making it any faster. If I hadn't restricted it to 280 users it would have run ~350days... Thanks for your help! moritz
On Tue, 19 Aug 2008, Moritz Onken wrote: > tablename | attname | n_distinct | correlation > result | domain | 1642 | 1 Well, the important thing is the correlation, which is 1, indicating that Postgres knows that the table is clustered. So I have no idea why it is sorting the entire table. What happens when you run EXPLAIN SELECT * FROM result ORDER BY domain? >> Sounds like an awfully long time to me. Also, I think restricting it to 280 >> users is probably not making it any faster. > > If I hadn't restricted it to 280 users it would have run ~350days... What makes you say that? Perhaps you could post EXPLAINs of both of the queries. Matthew -- What goes up must come down. Ask any system administrator.
Am 19.08.2008 um 14:17 schrieb Matthew Wakeling: > On Tue, 19 Aug 2008, Moritz Onken wrote: >> tablename | attname | n_distinct | correlation >> result | domain | 1642 | 1 > > Well, the important thing is the correlation, which is 1, indicating > that Postgres knows that the table is clustered. So I have no idea > why it is sorting the entire table. > > What happens when you run EXPLAIN SELECT * FROM result ORDER BY > domain? > "Index Scan using result_domain_idx on result (cost=0.00..748720.72 rows=20306816 width=49)" ... as it should be. >>> Sounds like an awfully long time to me. Also, I think restricting >>> it to 280 users is probably not making it any faster. >> >> If I hadn't restricted it to 280 users it would have run ~350days... > > What makes you say that? Perhaps you could post EXPLAINs of both of > the queries. > > Matthew That was just a guess. The query needs to retrieve the data for about 50,000 users. But it should be fast if I don't retrieve the data for specific users but let in run through all rows. 
explain insert into setup1 (
select
  a."user", b.category, sum(1.0/b.cat_count)::float
from result a, domain_categories b
where a."domain" = b."domain"
  and b.depth < 4
  and a.results > 100
  and a."user" < 30000
group by a."user", b.category);

"GroupAggregate  (cost=11745105.66..12277396.81 rows=28704 width=12)"
"  ->  Sort  (cost=11745105.66..11878034.93 rows=53171707 width=12)"
"        Sort Key: a."user", b.category"
"        ->  Merge Join  (cost=149241.25..1287278.89 rows=53171707 width=12)"
"              Merge Cond: (b.domain = a.domain)"
"              ->  Index Scan using domain_categories_domain on domain_categories b  (cost=0.00..421716.32 rows=5112568 width=12)"
"                    Filter: (depth < 4)"
"              ->  Materialize  (cost=148954.16..149446.36 rows=39376 width=8)"
"                    ->  Sort  (cost=148954.16..149052.60 rows=39376 width=8)"
"                          Sort Key: a.domain"
"                          ->  Bitmap Heap Scan on result a  (cost=1249.93..145409.79 rows=39376 width=8)"
"                                Recheck Cond: ("user" < 30000)"
"                                Filter: (results > 100)"
"                                ->  Bitmap Index Scan on result_user_idx  (cost=0.00..1240.08 rows=66881 width=0)"
"                                      Index Cond: ("user" < 30000)"

This query limits the number of users to 215 and took about 50 minutes.

I could create two temp tables which have only those records which I need for this query. Would this be a good idea?

moritz
What is your work_mem set to? The default?
Try increasing it significantly if you have the RAM and seeing if that affects the explain plan. You may even want to set it to a number larger than the RAM you have just to see what happens. In all honesty, it may be faster to overflow to OS swap space than sort too many rows, but ONLY if it changes the plan to a significantly more efficient one.
Simply type
SET work_mem = '500MB';
before running your explain. Set it to even more RAM if you have the space for this experiment.
In my experience the performance of aggregates on large tables is significantly affected by work_mem, and the optimizer will choose poorly without enough of it. It will rule out plans that may be fast enough when overflowing to disk in preference to colossal sized sorts (which likely also overflow to disk but take hours or days).
Am 19.08.2008 um 16:49 schrieb Scott Carey: > What is your work_mem set to? The default? > > Try increasing it significantly if you have the RAM and seeing if > that affects the explain plan. You may even want to set it to a > number larger than the RAM you have just to see what happens. In > all honesty, it may be faster to overflow to OS swap space than sort > too many rows, but ONLY if it changes the plan to a significantly > more efficient one. > > Simply type > 'SET work_mem = '500MB'; > before running your explain. Set it to even more RAM if you have > the space for this experiment. > > In my experience the performance of aggregates on large tables is > significantly affected by work_mem and the optimizer will chosse > poorly without enough of it. It will rule out plans that may be > fast enough when overflowing to disk in preference to colossal sized > sorts (which likely also overflow to disk but take hours or days). Thanks for that advice but the explain is not different :-( moritz
Am 19.08.2008 um 17:23 schrieb Moritz Onken: > > Am 19.08.2008 um 16:49 schrieb Scott Carey: > >> What is your work_mem set to? The default? >> >> Try increasing it significantly if you have the RAM and seeing if >> that affects the explain plan. You may even want to set it to a >> number larger than the RAM you have just to see what happens. In >> all honesty, it may be faster to overflow to OS swap space than >> sort too many rows, but ONLY if it changes the plan to a >> significantly more efficient one. >> >> Simply type >> 'SET work_mem = '500MB'; >> before running your explain. Set it to even more RAM if you have >> the space for this experiment. >> >> In my experience the performance of aggregates on large tables is >> significantly affected by work_mem and the optimizer will chosse >> poorly without enough of it. It will rule out plans that may be >> fast enough when overflowing to disk in preference to colossal >> sized sorts (which likely also overflow to disk but take hours or >> days). > > Thanks for that advice but the explain is not different :-( > > moritz > > -- Hi, I started the query with work_mem set to 3000MB. The explain output didn't change but it runs now much faster (about 10 times). The swap isn't used. How can you explain that? moritz
Moritz Onken írta: > > Am 19.08.2008 um 17:23 schrieb Moritz Onken: > >> >> Am 19.08.2008 um 16:49 schrieb Scott Carey: >> >>> What is your work_mem set to? The default? >>> >>> Try increasing it significantly if you have the RAM and seeing if >>> that affects the explain plan. You may even want to set it to a >>> number larger than the RAM you have just to see what happens. In >>> all honesty, it may be faster to overflow to OS swap space than sort >>> too many rows, but ONLY if it changes the plan to a significantly >>> more efficient one. >>> >>> Simply type >>> 'SET work_mem = '500MB'; >>> before running your explain. Set it to even more RAM if you have >>> the space for this experiment. >>> >>> In my experience the performance of aggregates on large tables is >>> significantly affected by work_mem and the optimizer will chosse >>> poorly without enough of it. It will rule out plans that may be >>> fast enough when overflowing to disk in preference to colossal sized >>> sorts (which likely also overflow to disk but take hours or days). >> >> Thanks for that advice but the explain is not different :-( >> >> moritz >> >> -- > > Hi, > > I started the query with work_mem set to 3000MB. The explain output > didn't change but it runs now much faster (about 10 times). The swap > isn't used. How can you explain that? $ cat /proc/sys/vm/overcommit_memory 0 $ less linux/Documentation/filesystems/proc.txt ... overcommit_memory ----------------- Controls overcommit of system memory, possibly allowing processes to allocate (but not use) more memory than is actually available. 0 - Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default. 1 - Always overcommit. Appropriate for some scientific applications. 2 - Don't overcommit. 
The total address space commit for the system is not permitted to exceed swap plus a configurable percentage (default is 50) of physical RAM. Depending on the percentage you use, in most situations this means a process will not be killed while attempting to use already-allocated memory but will receive errors on memory allocation as appropriate. ... I guess you are running on 64-bit because "obvious overcommit" exceeds 3GB already. Or you're running 32-bit and overcommit_memory=1 on your system. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
More work_mem will make the sort fit more in memory and less on disk, even with the same query plan.
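[Editor's note: Scott's point can be illustrated with a toy external sort in Python. This is a hypothetical sketch, not PostgreSQL's tuplesort: the "plan" is the same either way, but a larger work buffer produces fewer sorted runs and therefore less merge work:]

```python
import heapq

def external_sort(rows, work_mem_rows):
    """Toy external sort: build sorted runs of at most work_mem_rows
    (each run stands in for a spill file on disk), then merge the runs.
    More memory means fewer runs, hence less merging."""
    runs = [sorted(rows[i:i + work_mem_rows])
            for i in range(0, len(rows), work_mem_rows)]
    return list(heapq.merge(*runs)), len(runs)

data = [5, 3, 8, 1, 9, 2, 7, 6, 4]
print(external_sort(data, 2)[1])    # many runs: like a small work_mem
print(external_sort(data, 100)[1])  # one run: the whole sort fits in memory
```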
On Wed, Aug 20, 2008 at 12:54 AM, Moritz Onken <onken@houseofdesign.de> wrote:
Am 19.08.2008 um 17:23 schrieb Moritz Onken:Hi,
Am 19.08.2008 um 16:49 schrieb Scott Carey:What is your work_mem set to? The default?
Try increasing it significantly if you have the RAM and seeing if that affects the explain plan. You may even want to set it to a number larger than the RAM you have just to see what happens. In all honesty, it may be faster to overflow to OS swap space than sort too many rows, but ONLY if it changes the plan to a significantly more efficient one.
Simply type
'SET work_mem = '500MB';
before running your explain. Set it to even more RAM if you have the space for this experiment.
In my experience the performance of aggregates on large tables is significantly affected by work_mem and the optimizer will chosse poorly without enough of it. It will rule out plans that may be fast enough when overflowing to disk in preference to colossal sized sorts (which likely also overflow to disk but take hours or days).
Thanks for that advice but the explain is not different :-(
moritz
--
I started the query with work_mem set to 3000MB. The explain output didn't change but it runs now much faster (about 10 times). The swap isn't used. How can you explain that?
moritz
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Ok, so the problem boils down to the sort at the end.
The query up through the merge join on domain is as fast as it's going to get. The sort at the end, however, should ideally not happen. There are not that many rows returned, and it should hash aggregate if it thinks there is enough space to do so.

The query planner is going to choose the sort-then-aggregate over the hash-agg if it estimates the total number of resulting rows to be large enough that the hash won't fit in work_mem. However, there seems to be another factor here based on this:

"GroupAggregate  (cost=11745105.66..12277396.81 rows=28704 width=12)"
"  ->  Sort  (cost=11745105.66..11878034.93 rows=53171707 width=12)"
"        Sort Key: a."user", b.category"
"        ->  Merge Join  (cost=149241.25..1287278.89 rows=53171707 width=12)"
"              Merge Cond: (b.domain = a.domain)"

The planner actually thinks there will only be 28704 rows returned of width 12, but it chooses to sort 53 million rows before aggregating. That's either a bug or there's something else wrong here. That is the wrong way to aggregate those results no matter how much work_mem you have, unless I'm completely missing something...

You can try rearranging the query just to see if you can work around this. What happens if you compare the explain on:

select
  a."user", b.category, sum(1.0/b.cat_count)::float
from result a, domain_categories b
where a."domain" = b."domain"
  and b.depth < 4
  and a.results > 100
  and a."user" < 30000
group by a."user", b.category

to

select
  c."user", c.category, sum(1.0/c.cat_count)::float
from (select a."user", b.category, b.cat_count
      from result a, domain_categories b
      where a."domain" = b."domain"
        and b.depth < 4
        and a.results > 100
        and a."user" < 30000) c
group by c."user", c.category

It shouldn't make a difference, but I've seen things like this help before, so it's worth a try. Make sure work_mem is reasonably sized for this test.

Another thing that won't be that fast, but may avoid the sort, is to select the subselection above into a temporary table, analyze it, and then do the outer select. Make sure your settings for temporary space (temp_buffers in 8.3) are large enough for the intermediate results (700MB should do it). It will most likely still be faster than sorting 50 million+ rows. There are lots of problems with this approach, but it may be worth the experiment.
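[Editor's sketch of the temp-table variant described above, using the table and column names from the thread; the temp_buffers figure is the one suggested, and the exact statements are a hedge, not tested against this schema:]

```sql
-- temp_buffers must be set before the first temp-table use in the session.
SET temp_buffers = '700MB';

CREATE TEMP TABLE result_subset AS
  SELECT a."user", b.category, b.cat_count
  FROM result a, domain_categories b
  WHERE a."domain" = b."domain"
    AND b.depth < 4
    AND a.results > 100
    AND a."user" < 30000;

ANALYZE result_subset;

SELECT "user", category, sum(1.0/cat_count)::float
FROM result_subset
GROUP BY "user", category;
```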
On Tue, Aug 19, 2008 at 5:47 AM, Moritz Onken <onken@houseofdesign.de> wrote:
On 19.08.2008, at 14:17, Matthew Wakeling wrote:
> On Tue, 19 Aug 2008, Moritz Onken wrote:
>> tablename | attname | n_distinct | correlation
>> result    | domain  | 1642       | 1
>
> Well, the important thing is the correlation, which is 1, indicating that Postgres knows that the table is clustered. So I have no idea why it is sorting the entire table.
>
> What happens when you run EXPLAIN SELECT * FROM result ORDER BY domain?

"Index Scan using result_domain_idx on result (cost=0.00..748720.72 rows=20306816 width=49)"

... as it should be.

>> That was just a guess. The query needs to retrieve the data for about 50,000 users. But it should be fast if I don't retrieve the data for specific users but let it run through all rows.
>
> Sounds like an awfully long time to me. Also, I think restricting it to 280 users is probably not making it any faster.
>
>> If I hadn't restricted it to 280 users it would have run ~350 days...
>
> What makes you say that? Perhaps you could post EXPLAINs of both of the queries.
>
> Matthew
explain insert into setup1 (select
a."user", b.category, sum(1.0/b.cat_count)::float
from result a, domain_categories b
where a."domain" = b."domain"
and b.depth < 4
and a.results > 100
and a."user" < 30000
group by a."user", b.category);
"GroupAggregate (cost=11745105.66..12277396.81 rows=28704 width=12)"
" -> Sort (cost=11745105.66..11878034.93 rows=53171707 width=12)"" -> Merge Join (cost=149241.25..1287278.89 rows=53171707 width=12)"
" Sort Key: a."user", b.category"" -> Index Scan using domain_categories_domain on domain_categories b (cost=0.00..421716.32 rows=5112568 width=12)"
" Merge Cond: (b.domain = a.domain)"
" Filter: (depth < 4)"
" -> Materialize (cost=148954.16..149446.36 rows=39376 width=8)"
" -> Sort (cost=148954.16..149052.60 rows=39376 width=8)"
" Sort Key: a.domain"
" -> Bitmap Heap Scan on result a (cost=1249.93..145409.79 rows=39376 width=8)"
" Recheck Cond: ("user" < 30000)"
" Filter: (results > 100)"
" -> Bitmap Index Scan on result_user_idx (cost=0.00..1240.08 rows=66881 width=0)"
" Index Cond: ("user" < 30000)"
This query limits the number of users to 215 and this query took about 50 minutes.
I could create two temp tables which contain only those records which I need for this query. Would this be a good idea?
moritz
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
"Scott Carey" <scott@richrelevance.com> writes: > The planner actually thinks there will only be 28704 rows returned of width > 12. But it chooses to sort 53 million rows before aggregating. Thats > either a bug or there's something else wrong here. That is the wrong way > to aggregate those results no matter how much work_mem you have unless I'm > completely missing something... That does look weird. What are the datatypes of the columns being grouped by? Maybe they're not hashable? Another forcing function that prevents use of HashAgg is DISTINCT aggregates, but you don't seem to have any in this query... regards, tom lane
On 20.08.2008, at 20:06, Scott Carey wrote:
> Ok, so the problem boils down to the sort at the end.
>
> The query up through the merge join on domain is as fast as it's going to get. The sort at the end, however, should not happen ideally. There are not that many rows returned, and it should hash_aggregate if it thinks there is enough space to do so.
>
> The query planner is going to choose the sort > agg over the hash-agg if it estimates the total number of resulting rows to be large enough so that the hash won't fit in work_mem. However, there seems to be another factor here based on this:
>
> "GroupAggregate (cost=11745105.66..12277396.81 rows=28704 width=12)"
> "  ->  Sort (cost=11745105.66..11878034.93 rows=53171707 width=12)"
> "        Sort Key: a."user", b.category"
> "        ->  Merge Join (cost=149241.25..1287278.89 rows=53171707 width=12)"
> "              Merge Cond: (b.domain = a.domain)"
>
> The planner actually thinks there will only be 28704 rows returned of width 12. But it chooses to sort 53 million rows before aggregating. That's either a bug or there's something else wrong here. That is the wrong way to aggregate those results no matter how much work_mem you have unless I'm completely missing something...
>
> You can try rearranging the query just to see if you can work around this. What happens if you compare the explain on:
>
> select
> a."user", b.category, sum(1.0/b.cat_count)::float
> from result a, domain_categories b
> where a."domain" = b."domain"
> and b.depth < 4
> and a.results > 100
> and a."user" < 30000
> group by a."user", b.category

"HashAggregate (cost=1685527.69..1686101.77 rows=28704 width=12)"
"  ->  Merge Join (cost=148702.25..1286739.89 rows=53171707 width=12)"
"        Merge Cond: (b.domain = a.domain)"
"        ->  Index Scan using domain_categories_domain on domain_categories b (cost=0.00..421716.32 rows=5112568 width=12)"
"              Filter: (depth < 4)"
"        ->  Sort (cost=148415.16..148513.60 rows=39376 width=8)"
"              Sort Key: a.domain"
"              ->  Bitmap Heap Scan on result a (cost=1249.93..145409.79 rows=39376 width=8)"
"                    Recheck Cond: ("user" < 30000)"
"                    Filter: (results > 100)"
"                    ->  Bitmap Index Scan on result_user_idx (cost=0.00..1240.08 rows=66881 width=0)"
"                          Index Cond: ("user" < 30000)"

> to
>
> select
> c."user", c.category, sum(1.0/c.cat_count)::float
> from (select a."user", b.category, b.cat_count
> from result a, domain_categories b
> where a."domain" = b."domain"
> and b.depth < 4
> and a.results > 100
> and a."user" < 30000 ) c
> group by c."user", c.category

"HashAggregate (cost=1685527.69..1686101.77 rows=28704 width=12)"
"  ->  Merge Join (cost=148702.25..1286739.89 rows=53171707 width=12)"
"        Merge Cond: (b.domain = a.domain)"
"        ->  Index Scan using domain_categories_domain on domain_categories b (cost=0.00..421716.32 rows=5112568 width=12)"
"              Filter: (depth < 4)"
"        ->  Sort (cost=148415.16..148513.60 rows=39376 width=8)"
"              Sort Key: a.domain"
"              ->  Bitmap Heap Scan on result a (cost=1249.93..145409.79 rows=39376 width=8)"
"                    Recheck Cond: ("user" < 30000)"
"                    Filter: (results > 100)"
"                    ->  Bitmap Index Scan on result_user_idx (cost=0.00..1240.08 rows=66881 width=0)"
"                          Index Cond: ("user" < 30000)"

> It shouldn't make a difference, but I've seen things like this help before so it's worth a try. Make sure work_mem is reasonably sized for this test.

It's exactly the same. work_mem was set to 3000MB.

> Another thing that won't be that fast, but may avoid the sort, is to select the subselection above into a temporary table, analyze it, and then do the outer select. Make sure your settings for temporary space (temp_buffers in 8.3) are large enough for the intermediate results (700MB should do it). That won't be that fast, but it will most likely be faster than sorting 50 million + rows. There are lots of problems with this approach but it may be worth the experiment.

I'll try this. Thanks so far!
On 20.08.2008, at 20:28, Tom Lane wrote:
> "Scott Carey" <scott@richrelevance.com> writes:
>> The planner actually thinks there will only be 28704 rows returned of width 12. But it chooses to sort 53 million rows before aggregating. That's either a bug or there's something else wrong here. That is the wrong way to aggregate those results no matter how much work_mem you have unless I'm completely missing something...
>
> That does look weird. What are the datatypes of the columns being grouped by? Maybe they're not hashable?
>
> Another forcing function that prevents use of HashAgg is DISTINCT aggregates, but you don't seem to have any in this query...
>
> regards, tom lane

The datatypes are both integers. There is no DISTINCT in this query. Thanks anyway!
On 21.08.2008, at 09:04, Moritz Onken wrote:
> On 20.08.2008, at 20:28, Tom Lane wrote:
>> "Scott Carey" <scott@richrelevance.com> writes:
>>> The planner actually thinks there will only be 28704 rows returned of width 12. But it chooses to sort 53 million rows before aggregating. That's either a bug or there's something else wrong here. That is the wrong way to aggregate those results no matter how much work_mem you have unless I'm completely missing something...
>>
>> That does look weird. What are the datatypes of the columns being grouped by? Maybe they're not hashable?
>>
>> Another forcing function that prevents use of HashAgg is DISTINCT aggregates, but you don't seem to have any in this query...
>>
>> regards, tom lane
>
> The datatypes are both integers. There is no DISTINCT in this query. Thanks anyway!

insert into setup1 (select
a."user", b.category, sum(1.0/b.cat_count)::float
from result a, domain_categories b
where a."domain" = b."domain"
and b.depth < 4
and a.results > 100
group by a."user", b.category);

This query inserted a total of 16,000,000 rows and, with work_mem set to 3000mb, took about 24 hours. Any more ideas to speed this up?
It looks to me like the work_mem did have an effect.
Your earlier queries had a sort followed by group aggregate at the top, and now it's a hash-aggregate. So the query plan DID change. That is likely where the first 10x performance gain came from.
The top of the plan was:
GroupAggregate (cost=11745105.66..12277396.81 rows=28704 width=12)"
-> Sort (cost=11745105.66..11878034.93 rows=53171707 width=12)"
-> Merge Join (cost=149241.25..1287278.89 rows=53171707 width=12)"
Merge Cond: (b.domain = a.domain)"
and now it is:
"HashAggregate (cost=1685527.69..1686101.77 rows=28704 width=12)"
-> Merge Join (cost=148702.25..1286739.89 rows=53171707 width=12)"
Merge Cond: (b.domain = a.domain)"
The HashAggregate replaced the Sort followed by GroupAggregate at about 1/10 the cost.
It probably only took the first couple hundred MB of work_mem to do this, or less given that you were at the default originally.
Note how the estimated cost on the latter is 1.6 million, and it is 11 million in the first one.
You won't get a large table aggregate significantly faster than this -- you're asking it to scan through 53 million records and aggregate. An explain analyze will be somewhat instructive in identifying whether it is more I/O-bound or CPU-bound overall, since we can compare the estimated cost with the actual times, but this probably won't get very far.
After that, inserting 16M rows requires rather different tuning and bottleneck identification.
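One common starting point for tuning a bulk insert of that size (a hedged sketch, not something prescribed in this thread; the index name is hypothetical) is to load into setup1 with no indexes in place and rebuild them afterwards:

```sql
-- If setup1 carries an index, dropping it before a 16M-row load and
-- recreating it afterwards is usually much cheaper than maintaining
-- it row by row during the insert.
DROP INDEX IF EXISTS setup1_user_idx;

INSERT INTO setup1
SELECT a."user", b.category, sum(1.0/b.cat_count)::float
FROM result a, domain_categories b
WHERE a."domain" = b."domain"
  AND b.depth < 4
  AND a.results > 100
GROUP BY a."user", b.category;

-- Rebuild the index in one pass and refresh statistics.
CREATE INDEX setup1_user_idx ON setup1 ("user");
ANALYZE setup1;
```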
On Thu, Aug 21, 2008 at 12:03 AM, Moritz Onken <onken@houseofdesign.de> wrote:
On 20.08.2008, at 20:06, Scott Carey wrote:
Ok, so the problem boils down to the sort at the end.
The query up through the merge join on domain is as fast as its going to get. The sort at the end however, should not happen ideally. There are not that many rows returned, and it should hash_aggregate if it thinks there is enough space to do so.
The query planner is going to choose the sort > agg over the hash-agg if it estimates the total number of resulting rows to be large enough so that the hash won't fit in work_mem. However, there seems to be another factor here based on this:
GroupAggregate (cost=11745105.66..12277396.81 rows=28704 width=12)"
" -> Sort (cost=11745105.66..11878034.93 rows=53171707 width=12)"
" Sort Key: a."user", b.category"
" -> Merge Join (cost=149241.25..1287278.89 rows=53171707 width=12)"
" Merge Cond: (b.domain = a.domain)"
The planner actually thinks there will only be 28704 rows returned of width 12. But it chooses to sort 53 million rows before aggregating. Thats either a bug or there's something else wrong here. That is the wrong way to aggregate those results no matter how much work_mem you have unless I'm completely missing something...
You can try rearranging the query just to see if you can work around this. What happens if you compare the explain on:
select
a."user", b.category, sum(1.0/b.cat_count)::float
from result a, domain_categories b
where a."domain" = b."domain"
and b.depth < 4
and a.results > 100
and a."user" < 30000
group by a."user", b.category
"HashAggregate (cost=1685527.69..1686101.77 rows=28704 width=12)"
"  ->  Merge Join (cost=148702.25..1286739.89 rows=53171707 width=12)"
"        Merge Cond: (b.domain = a.domain)"
"        ->  Index Scan using domain_categories_domain on domain_categories b (cost=0.00..421716.32 rows=5112568 width=12)"
"              Filter: (depth < 4)"
"        ->  Sort (cost=148415.16..148513.60 rows=39376 width=8)"
"              Sort Key: a.domain"
"              ->  Bitmap Heap Scan on result a (cost=1249.93..145409.79 rows=39376 width=8)"
"                    Recheck Cond: ("user" < 30000)"
"                    Filter: (results > 100)"
"                    ->  Bitmap Index Scan on result_user_idx (cost=0.00..1240.08 rows=66881 width=0)"
"                          Index Cond: ("user" < 30000)"
to
select
c."user", c.category, sum(1.0/c.cat_count)::float
from (select a."user", b.category, b.cat_count
from result a, domain_categories b
where a."domain" = b."domain"
and b.depth < 4
and a.results > 100
and a."user" < 30000 ) c
group by c."user", c.category
"HashAggregate (cost=1685527.69..1686101.77 rows=28704 width=12)"
"  ->  Merge Join (cost=148702.25..1286739.89 rows=53171707 width=12)"
"        Merge Cond: (b.domain = a.domain)"
"        ->  Index Scan using domain_categories_domain on domain_categories b (cost=0.00..421716.32 rows=5112568 width=12)"
"              Filter: (depth < 4)"
"        ->  Sort (cost=148415.16..148513.60 rows=39376 width=8)"
"              Sort Key: a.domain"
"              ->  Bitmap Heap Scan on result a (cost=1249.93..145409.79 rows=39376 width=8)"
"                    Recheck Cond: ("user" < 30000)"
"                    Filter: (results > 100)"
"                    ->  Bitmap Index Scan on result_user_idx (cost=0.00..1240.08 rows=66881 width=0)"
"                          Index Cond: ("user" < 30000)"
It's exactly the same. work_mem was set to 3000MB.
It shouldn't make a difference, but I've seen things like this help before so it's worth a try. Make sure work_mem is reasonably sized for this test.
I'll try this.
Another thing that won't be that fast, but may avoid the sort, is to select the subselection above into a temporary table, analyze it, and then do the outer select. Make sure your settings for temporary space (temp_buffers in 8.3) are large enough for the intermediate results (700MB should do it). That won't be that fast, but it will most likely be faster than sorting 50 million + rows. There are lots of problems with this approach but it may be worth the experiment.
Thanks so far!
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
On 21.08.2008, at 16:39, Scott Carey wrote:
> It looks to me like the work_mem did have an effect.
>
> Your earlier queries had a sort followed by group aggregate at the top, and now it's a hash-aggregate. So the query plan DID change. That is likely where the first 10x performance gain came from.

But it didn't change as I added the sub select.
Thank you guys very much. The speed is now ok and I hope I can finish this work soon.

But there is another problem. If I run this query without the limitation of the user id, postgres consumes about 150GB of disk space and dies with

ERROR: could not write block 25305351 of temporary file: No space left on device

After that the available disk space is back to normal.

Is this normal? The resulting table (setup1) is not bigger than 1.5 GB.

moritz
On Thu, Aug 21, 2008 at 11:07 AM, Moritz Onken <onken@houseofdesign.de> wrote:
> On 21.08.2008, at 16:39, Scott Carey wrote:
>> It looks to me like the work_mem did have an effect.
>>
>> Your earlier queries had a sort followed by group aggregate at the top, and now it's a hash-aggregate. So the query plan DID change. That is likely where the first 10x performance gain came from.
>
> But it didn't change as I added the sub select.
> Thank you guys very much. The speed is now ok and I hope I can finish this work soon.
>
> But there is another problem. If I run this query without the limitation of the user id, postgres consumes about 150GB of disk space and dies with
>
> ERROR: could not write block 25305351 of temporary file: No space left on device
>
> After that the available disk space is back to normal.
>
> Is this normal? The resulting table (setup1) is not bigger than 1.5 GB.

Maybe the result is too big. If you explain the query, you should get an estimate of rows returned. If this is the case, you need to rethink your query or do something like a cursor to browse the result.

merlin
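The cursor idea suggested above could be sketched as follows (an illustrative sketch; the cursor name and batch size are hypothetical, the query is the one from the thread). A cursor lets the client consume the result in batches instead of materializing the whole thing at once:

```sql
-- Cursors only live inside a transaction.
BEGIN;

DECLARE user_cats CURSOR FOR
SELECT a."user", b.category, sum(1.0/b.cat_count)::float
FROM result a, domain_categories b
WHERE a."domain" = b."domain"
  AND b.depth < 4
  AND a.results > 100
GROUP BY a."user", b.category;

-- Pull the rows in manageable chunks; repeat until no rows come back.
FETCH FORWARD 10000 FROM user_cats;

CLOSE user_cats;
COMMIT;
```

Note that the server may still need temporary space to compute the aggregate itself; the cursor mainly helps with browsing a result set too large to hold at once.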
On 21.08.2008, at 19:08, Merlin Moncure wrote:
> On Thu, Aug 21, 2008 at 11:07 AM, Moritz Onken <onken@houseofdesign.de> wrote:
>> But there is another problem. If I run this query without the limitation of the user id, postgres consumes about 150GB of disk space and dies with
>>
>> ERROR: could not write block 25305351 of temporary file: No space left on device
>>
>> After that the available disk space is back to normal.
>>
>> Is this normal? The resulting table (setup1) is not bigger than 1.5 GB.
>
> Maybe the result is too big. If you explain the query, you should get an estimate of rows returned. If this is the case, you need to rethink your query or do something like a cursor to browse the result.
>
> merlin

There will be a few million rows. But I don't understand why these rows bloat up so much. When the query is done the new table is about 1 GB in size. But while the query is running it uses >150GB of disk space.

moritz
On Fri, Aug 22, 2008 at 2:31 AM, Moritz Onken <onken@houseofdesign.de> wrote:
> There will be a few million rows. But I don't understand why these rows bloat up so much. When the query is done the new table is about 1 GB in size. But while the query is running it uses >150GB of disk space.

can we see explain?

merlin