RE: BUG #16031: Group by returns duplicate groups
От | David Raymond |
---|---|
Тема | RE: BUG #16031: Group by returns duplicate groups |
Дата | |
Msg-id | VI1PR07MB6029A84A61C5A88328B50AC3879D0@VI1PR07MB6029.eurprd07.prod.outlook.com обсуждение исходный текст |
Ответ на | Re: BUG #16031: Group by returns duplicate groups (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
Ответы |
Re: BUG #16031: Group by returns duplicate groups
(Andrew Gierth <andrew@tao11.riddles.org.uk>)
|
Список | pgsql-bugs |
psql output for those two queries, along with explain output, pasted below. testing=> set enable_indexscan = off; SET Time: 0.265 ms testing=> set enable_bitmapscan = off; SET Time: 0.236 ms testing=> select count(*) c0, testing-> count(*) filter (where bttextcmp(name,'DK') < 0) c1_lt, testing-> count(*) filter (where bttextcmp(name,'DK') = 0) c1_eq, testing-> count(*) filter (where bttextcmp(name,'DK') > 0) c1_gt, testing-> count(*) filter (where bttextcmp('DK',name) > 0) c2_lt, testing-> count(*) filter (where bttextcmp('DK',name) = 0) c2_eq, testing-> count(*) filter (where bttextcmp('DK',name) < 0) c2_gt testing-> from big_table; c0 | c1_lt | c1_eq | c1_gt | c2_lt | c2_eq | c2_gt -------------+------------+-------+------------+------------+-------+------------ 108,565,086 | 27,900,023 | 25 | 80,665,038 | 27,900,023 | 25 | 80,665,038 (1 row) Time: 311710.895 ms (05:11.711) testing=> explain (analyze, verbose, costs, buffers, timing, summary) select count(*) c0, testing-> count(*) filter (where bttextcmp(name,'DK') < 0) c1_lt, testing-> count(*) filter (where bttextcmp(name,'DK') = 0) c1_eq, testing-> count(*) filter (where bttextcmp(name,'DK') > 0) c1_gt, testing-> count(*) filter (where bttextcmp('DK',name) > 0) c2_lt, testing-> count(*) filter (where bttextcmp('DK',name) = 0) c2_eq, testing-> count(*) filter (where bttextcmp('DK',name) < 0) c2_gt testing-> from big_table; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Finalize Aggregate (cost=4119436.80..4119436.81 rows=1 width=56) (actual time=315815.338..315815.338 rows=1 loops=1) Output: count(*), count(*) FILTER (WHERE (bttextcmp((name)::text, 'DK'::text) < 0)), count(*) FILTER (WHERE (bttextcmp((name)::text,'DK'::text) = 0)), count(*) FILTER (WHERE (bttextcmp((name)::text, 'DK'::text) > 0)), count(*) FILTER(WHERE (bttextcmp('DK'::text, (name)::text) > 0)), count(*) FILTER (WHERE (bttextcmp('DK'::text, (name)::text) = 0)),count(*) FILTER (WHERE (bttextcmp('DK'::text, (name)::text) < 0)) Buffers: shared hit=258 read=1517140 -> Gather (cost=4119436.55..4119436.76 rows=2 width=56) (actual time=315814.594..315834.986 rows=3 loops=1) Output: (PARTIAL count(*)), (PARTIAL count(*) FILTER (WHERE (bttextcmp((name)::text, 'DK'::text) < 0))), (PARTIALcount(*) FILTER (WHERE (bttextcmp((name)::text, 'DK'::text) = 0))), (PARTIAL count(*) FILTER (WHERE (bttextcmp((name)::text,'DK'::text) > 0))), (PARTIAL count(*) FILTER (WHERE (bttextcmp('DK'::text, (name)::text) > 0))),(PARTIAL count(*) FILTER (WHERE (bttextcmp('DK'::text, (name)::text) = 0))), (PARTIAL count(*) FILTER (WHERE (bttextcmp('DK'::text,(name)::text) < 0))) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=258 read=1517140 -> Partial Aggregate (cost=4118436.55..4118436.56 rows=1 width=56) (actual time=315571.326..315571.326 rows=1loops=3) Output: PARTIAL count(*), PARTIAL count(*) FILTER (WHERE (bttextcmp((name)::text, 'DK'::text) < 0)), PARTIALcount(*) FILTER (WHERE (bttextcmp((name)::text, 'DK'::text) = 0)), PARTIAL count(*) FILTER (WHERE (bttextcmp((name)::text,'DK'::text) > 0)), PARTIAL count(*) FILTER (WHERE (bttextcmp('DK'::text, (name)::text) > 0)), PARTIALcount(*) FILTER (WHERE (bttextcmp('DK'::text, (name)::text) = 0)), PARTIAL count(*) FILTER (WHERE (bttextcmp('DK'::text,(name)::text) < 0)) Buffers: shared hit=258 read=1517140 Worker 0: actual time=315491.284..315491.284 rows=1 loops=1 Buffers: shared hit=80 read=324165 Worker 1: actual time=315411.542..315411.542 rows=1 loops=1 Buffers: shared hit=81 read=592546 -> Parallel Seq Scan on name_stuff.big_table (cost=0.00..1969752.53 rows=45235453 width=22) (actual time=8.834..245716.269rows=36188362 loops=3) Output: id_1, field_2, name, field_4, field_5, field_6, field_7, field_8, arr_field_1, arr_field_2,arr_field_3, arr_field_4, arr_field_5, arr_field_6, field_15, field_16, id_2 Buffers: shared hit=258 read=1517140 Worker 0: actual time=13.873..213354.668 rows=23076339 loops=1 Buffers: shared hit=80 read=324165 Worker 1: actual time=0.232..260652.945 rows=42461913 loops=1 Buffers: shared hit=81 read=592546 Planning Time: 1.746 ms Execution Time: 315835.098 ms (24 rows) Time: 315860.795 ms (05:15.861) testing=> with sd as (select name, row_number() over (order by name) rnum testing(> from big_table) testing-> select name from sd testing-> where rnum >= (select min(rnum) from sd where name='DK') testing-> and rnum <= (select max(rnum) from sd where name='DK') testing-> and name <> 'DK'; name ----------------------------------- Dk'bus Marine Dk's Auto's Dk's Bar & Grill Dk's Barbers & Stylist Dk's Beach Boutique Dk's Cabinets & Countertops Dk's Cleaning Service Dk's Clothing Dk's Communications Dk's Dancewear & Fitnesswear Dk's Dancewear Boutique Dk's Discount Dance & Fitnesswear DK's Drywall Service DK'S DUSTBUSTERS Dk's Family Five Star Trophies DK's Family Five Star Trophies Dk's Food Mart Dk'S Group Pte. Ltd. Dk's Hair Designs Dk's Hair Happenings Dk's Hair Supply Dk's Home Decor DK's Informática Dk's Janitorial DK's Liquors Dk's Market Dk's Moda Masculina Dk's Nails And Spa DK's Pawn Shop Dk's Pet Grooming DK's Quality Service DK's Restoration Dk's Sports Center Dk's Statuary Dk's Style Hut Dk's Temiskaming Shore Taxi Dk's Towing DK's Travel Dk'Style DK'Z Car Wash Dk- (41 rows) Time: 848889.096 ms (14:08.889) testing=> explain (analyze, verbose, costs, buffers, timing, summary) with sd as (select name, row_number() over (order byname) rnum testing(> from big_table) testing-> select name from sd testing-> where rnum >= (select min(rnum) from sd where name='DK') testing-> and rnum <= (select max(rnum) from sd where name='DK') testing-> and name <> 'DK'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------- CTE Scan on sd (cost=30560557.18..33546097.10 rows=540111 width=516) (actual time=794958.814..805397.008 rows=41 loops=1) Output: sd.name Filter: ((sd.rnum >= $1) AND (sd.rnum <= $2) AND ((sd.name)::text <> 'DK'::text)) Rows Removed by Filter: 108565045 Buffers: shared hit=386 read=1517012, temp read=3202642 written=2621799 CTE sd -> WindowAgg (cost=23772525.03..25672414.07 rows=108565088 width=30) (actual time=668354.380..753149.449 rows=108565086loops=1) Output: big_table.name, row_number() OVER (?) Buffers: shared hit=386 read=1517012, temp read=2035717 written=2038337 -> Sort (cost=23772525.03..24043937.75 rows=108565088 width=22) (actual time=668354.361..724059.209 rows=108565086loops=1) Output: big_table.name Sort Key: big_table.name Sort Method: external merge Disk: 3453888kB Buffers: shared hit=386 read=1517012, temp read=2035717 written=2038337 -> Seq Scan on name_stuff.big_table (cost=0.00..2603048.88 rows=108565088 width=22) (actual time=18.886..106731.751rows=108565086 loops=1) Output: big_table.name Buffers: shared hit=386 read=1517012 InitPlan 2 (returns $1) -> Aggregate (cost=2444071.54..2444071.55 rows=1 width=8) (actual time=111513.923..111513.923 rows=1 loops=1) Output: min(sd_1.rnum) Buffers: temp read=431506 written=583461 -> CTE Scan on sd sd_1 (cost=0.00..2442714.48 rows=542825 width=8) (actual time=28796.843..111513.916 rows=25loops=1) Output: sd_1.name, sd_1.rnum Filter: ((sd_1.name)::text = 'DK'::text) Rows Removed by Filter: 108565061 Buffers: temp read=431506 written=583461 InitPlan 3 (returns $2) -> Aggregate (cost=2444071.54..2444071.55 rows=1 width=8) (actual time=12135.464..12135.464 rows=1 loops=1) Output: max(sd_2.rnum) Buffers: temp read=583462 -> CTE Scan on sd sd_2 (cost=0.00..2442714.48 rows=542825 width=8) (actual time=2785.283..12135.457 rows=25loops=1) Output: sd_2.name, sd_2.rnum Filter: ((sd_2.name)::text = 'DK'::text) Rows Removed by Filter: 108565061 Buffers: temp read=583462 Planning Time: 0.159 ms Execution Time: 807095.184 ms (37 rows) Time: 807103.763 ms (13:27.104) testing=> -----Original Message----- From: Andrew Gierth <andrew@tao11.riddles.org.uk> Sent: Tuesday, October 1, 2019 1:41 AM To: David Raymond <David.Raymond@tomtom.com> Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org Subject: Re: BUG #16031: Group by returns duplicate groups >>>>> "David" == David Raymond <David.Raymond@tomtom.com> writes: David> Looking possibly like indexing is part of the issue at the David> moment. Your original EXPLAIN didn't show any index scans being used...? I can think of a possible explanation if there's some other value in the big table which, due to some collation bug, is not consistently being compared as < 'DK' or > 'DK'. Unfortunately, we have two code paths for comparison, and one of them can't easily be exercised directly from SQL, since it is only used for sorts (and therefore also index builds). Can you try these queries: set enable_indexscan=off; set enable_bitmapscan=off; select count(*) c0, count(*) filter (where bttextcmp(name,'DK') < 0) c1_lt, count(*) filter (where bttextcmp(name,'DK') = 0) c1_eq, count(*) filter (where bttextcmp(name,'DK') > 0) c1_gt, count(*) filter (where bttextcmp('DK',name) > 0) c2_lt, count(*) filter (where bttextcmp('DK',name) = 0) c2_eq, count(*) filter (where bttextcmp('DK',name) < 0) c2_gt from big_table; with sd as (select name, row_number() over (order by name) rnum from big_table) select name from sd where rnum >= (select min(rnum) from sd where name='DK') and rnum <= (select max(rnum) from sd where name='DK') and name <> 'DK'; -- Andrew (irc:RhodiumToad)
В списке pgsql-bugs по дате отправления: