RE: BUG #16031: Group by returns duplicate groups
От | David Raymond |
---|---|
Тема | RE: BUG #16031: Group by returns duplicate groups |
Дата | |
Msg-id | AM6PR07MB6022C91A384768FAB540B4C0879C0@AM6PR07MB6022.eurprd07.prod.outlook.com обсуждение исходный текст |
Ответ на | Re: BUG #16031: Group by returns duplicate groups (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
Ответы |
Re: BUG #16031: Group by returns duplicate groups
(Andrew Gierth <andrew@tao11.riddles.org.uk>)
Re: BUG #16031: Group by returns duplicate groups (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
Список | pgsql-bugs |
As an update, I've currently got a dump that consistently shows weirdness when loaded. It's just the "name" field, has 1.3million records, is 15 MB zipped, and has things garbled enough that I don't mind sending it. How small does it need to be before it's good to send to someone? Output after re-loading the table: testing=> \d+ weird_grouping Table "name_stuff.weird_grouping" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+------------------------+-----------+----------+---------+----------+--------------+------------- name | character varying(254) | | not null | | extended | | testing=> select count(*), count(distinct name) from weird_grouping; count | count -----------+----------- 1,297,265 | 1,176,103 (1 row) Time: 7616.186 ms (00:07.616) testing=> explain (analyze, verbose, costs, buffers, timing, summary) select count(*), count(distinct name) from weird_grouping; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=10187.10..10187.11 rows=1 width=16) (actual time=7805.463..7805.463 rows=1 loops=1) Output: count(*), count(DISTINCT name) Buffers: shared hit=2080 read=6236, temp read=5484 written=5506 -> Seq Scan on name_stuff.weird_grouping (cost=0.00..9563.40 rows=124740 width=516) (actual time=0.031..128.203 rows=1297265loops=1) Output: name Buffers: shared hit=2080 read=6236 Planning Time: 0.027 ms Execution Time: 7805.483 ms (8 rows) Time: 7805.822 ms (00:07.806) testing=> with foo as (select name from weird_grouping group by name) select name from foo group by name having count(*)> 1; name ------ (0 rows) Time: 1950.401 ms (00:01.950) testing=> explain (analyze, verbose, costs, buffers, timing, summary) with foo as (select name from weird_grouping groupby name) select name from foo group by name having count(*) > 1; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=9882.25..9884.75 rows=67 width=516) (actual time=1466.738..1466.738 rows=0 loops=1) Output: foo.name Group Key: foo.name Filter: (count(*) > 1) Rows Removed by Filter: 1176101 Buffers: shared hit=2144 read=6172, temp written=4533 CTE foo -> HashAggregate (cost=9875.25..9877.25 rows=200 width=516) (actual time=494.343..734.005 rows=1176101 loops=1) Output: weird_grouping.name Group Key: weird_grouping.name Buffers: shared hit=2144 read=6172 -> Seq Scan on name_stuff.weird_grouping (cost=0.00..9563.40 rows=124740 width=516) (actual time=0.283..120.898rows=1297265 loops=1) Output: weird_grouping.name Buffers: shared hit=2144 read=6172 -> CTE Scan on foo (cost=0.00..4.00 rows=200 width=516) (actual time=494.346..1035.185 rows=1176101 loops=1) Output: foo.name Buffers: shared hit=2144 read=6172, temp written=4533 Planning Time: 0.464 ms Execution Time: 2230.238 ms (19 rows) Time: 2231.291 ms (00:02.231) testing=> analyze verbose weird_grouping; INFO: analyzing "name_stuff.weird_grouping" INFO: "weird_grouping": scanned 8316 of 8316 pages, containing 1297265 live rows and 0 dead rows; 30000 rows in sample,1297265 estimated total rows ANALYZE Time: 474.963 ms testing=> select count(*), count(distinct name) from weird_grouping; count | count -----------+----------- 1,297,265 | 1,176,103 (1 row) Time: 7449.983 ms (00:07.450) testing=> explain (analyze, verbose, costs, buffers, timing, summary) select count(*), count(distinct name) from weird_grouping; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=27774.98..27774.99 rows=1 width=16) (actual time=7754.578..7754.578 rows=1 loops=1) Output: count(*), count(DISTINCT name) Buffers: shared hit=2240 read=6076, temp read=5484 written=5506 -> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.036..102.144 rows=1297265loops=1) Output: name Buffers: shared hit=2240 read=6076 Planning Time: 0.031 ms Execution Time: 7754.598 ms (8 rows) Time: 7754.902 ms (00:07.755) testing=> with foo as (select name from weird_grouping group by name) select name from foo group by name having count(*)> 1; name ------- DCT DELTA (2 rows) Time: 9561.382 ms (00:09.561) testing=> explain (analyze, verbose, costs, buffers, timing, summary) with foo as (select name from weird_grouping groupby name) select name from foo group by name having count(*) > 1; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=209783.95..209786.45 rows=67 width=516) (actual time=8591.210..8686.132 rows=2 loops=1) Output: foo.name Group Key: foo.name Filter: (count(*) > 1) Rows Removed by Filter: 1176099 Buffers: shared hit=2304 read=6012, temp read=7800 written=12363 CTE foo -> Group (cost=179613.72..186100.05 rows=947356 width=20) (actual time=6416.900..7842.634 rows=1176103 loops=1) Output: weird_grouping.name Group Key: weird_grouping.name Buffers: shared hit=2304 read=6012, temp read=7800 written=7830 -> Sort (cost=179613.72..182856.89 rows=1297265 width=20) (actual time=6416.899..7665.158 rows=1297265 loops=1) Output: weird_grouping.name Sort Key: weird_grouping.name Sort Method: external merge Disk: 39048kB Buffers: shared hit=2304 read=6012, temp read=7800 written=7830 -> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.294..137.486rows=1297265 loops=1) Output: weird_grouping.name Buffers: shared hit=2304 read=6012 -> CTE Scan on foo (cost=0.00..18947.12 rows=947356 width=516) (actual time=6416.902..8105.771 rows=1176103 loops=1) Output: foo.name Buffers: shared hit=2304 read=6012, temp read=7800 written=12363 Planning Time: 0.258 ms Execution Time: 10305.891 ms (24 rows) Time: 10306.990 ms (00:10.307) testing=> -----Original Message----- From: Andrew Gierth <andrew@tao11.riddles.org.uk> Sent: Tuesday, October 1, 2019 3:27 PM To: David Raymond <David.Raymond@tomtom.com> Cc: pgsql-bugs@lists.postgresql.org Subject: Re: BUG #16031: Group by returns duplicate groups >>>>> "David" == David Raymond <David.Raymond@tomtom.com> writes: David> I checked for other places where there was weirdness going on David> and there are more. You could try something like, select name, lname from (select name, lag(name) over (order by name) as lname from big_table) s where name < lname; That should show all cases where the sort order was inconsistent with the < operator (which obviously should never happen). However, there's not much more I can do to help with this, since I don't use Windows myself and have no useful access to any Windows system. You might try and cut down the data to the smallest set that shows inconsistencies using the above; particularly relevant is whether the problem only shows up for external merge sorts or whether it happens for in-memory sorts too. -- Andrew (irc:RhodiumToad)
В списке pgsql-bugs по дате отправления:
Предыдущее
От: "Daniel Verite"Дата:
Сообщение: Re: BUG #16034: `\set ECHO all` doesn't work for \e command
Следующее
От: PG Bug reporting formДата:
Сообщение: BUG #16035: STATEMENT_TIMEOUT not working when we have single quote usage inside CTE which is used in inner sql