Обсуждение: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field
BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 15922 Logged by: David Raymond Email address: david.raymond@tomtom.com PostgreSQL version: 11.4 Operating system: Windows 7 Description: Running version 11.4 on Windows 7, EnterpriseDB installer. I have a case where I'm doing a simple select from a table but I'm getting returned duplicates from its primary key field when I have two different exists statements in the where clause. I simplified the tables down to the bare bones, dumped them out with pg_dump and re-loaded them. Immediately after loading the data the query returns the correct, consistent answer. Then after running analyze it starts returning the duplicates. The query is this: select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); If either one of the 2 exists statements is not included then the results don't have duplicates, it's only when they are both included that there's a problem. The three tables involved have 2 million rows in them, so I can't include the full dump in the form here. Below is output from psql starting immediately after the fresh table load and before analyze is done. Shows the table layouts, the record counts, and the explain results for the same query before and after analyze is done. Please let me know what the next bits you need from me are. Thank you, -David Raymond <david.raymond@tomtom.com> mnr=> set search_path to bug_test; SET Time: 1.427 ms mnr=> \d+ List of relations Schema | Name | Type | Owner | Size | Description ----------+------+-------+-------+---------+------------- bug_test | n | table | mnr | 16 MB | bug_test | n2a | table | mnr | 89 MB | bug_test | n2h | table | mnr | 5800 kB | (3 rows) mnr=> \d+ n Table "bug_test.n" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+------+-----------+----------+---------+---------+--------------+------------- id | uuid | | not null | | plain | | Indexes: "n_pkey" PRIMARY KEY, btree (id) Referenced by: TABLE "n2a" CONSTRAINT "n2a_n_id_fkey" FOREIGN KEY (n_id) REFERENCES n(id) TABLE "n2h" CONSTRAINT "n2h_n_id_fkey" FOREIGN KEY (n_id) REFERENCES n(id) mnr=> \d+ n2a Table "bug_test.n2a" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+------+-----------+----------+---------+---------+--------------+------------- n_id | uuid | | not null | | plain | | a_id | uuid | | not null | | plain | | Indexes: "n2a_pkey" PRIMARY KEY, btree (n_id, a_id) "n2a_a_id_n_id_idx" btree (a_id, n_id) Foreign-key constraints: "n2a_n_id_fkey" FOREIGN KEY (n_id) REFERENCES n(id) mnr=> \d+ n2h Table "bug_test.n2h" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+------+-----------+----------+---------+---------+--------------+------------- n_id | uuid | | not null | | plain | | h_id | uuid | | not null | | plain | | Indexes: "n2h_pkey" PRIMARY KEY, btree (n_id, h_id) "n2h_h_id_n_id_idx" btree (h_id, n_id) Foreign-key constraints: "n2h_n_id_fkey" FOREIGN KEY (n_id) REFERENCES n(id) mnr=> select 'n' as table_name, count(*) as record_count from n union all select 'n2a', count(*) from n2a union all select 'n2h', count(*) from n2h; table_name | record_count ------------+-------------- n | 366,869 n2a | 1,546,626 n2h | 98,180 (3 rows) Time: 342.223 ms mnr=> select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); count | count -------+------- 6,531 | 6,531 (1 row) Time: 426.043 ms mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=10609.94..10609.95 rows=1 width=16) (actual time=300.783..300.783 rows=1 loops=1) Output: count(*), count(DISTINCT n.id) -> Nested Loop (cost=1950.10..10590.61 rows=3866 width=16) (actual time=30.074..295.967 rows=6531 loops=1) Output: n.id Inner Unique: true Join Filter: (n2h.n_id = n.id) -> Nested Loop (cost=1949.68..2044.12 rows=3866 width=32) (actual time=30.064..270.423 rows=6531 loops=1) Output: n2a.n_id, n2h.n_id Inner Unique: true -> HashAggregate (cost=1949.25..1951.25 rows=200 width=16) (actual time=29.997..46.915 rows=61325 loops=1) Output: n2h.n_id Group Key: n2h.n_id -> Seq Scan on bug_test.n2h (cost=0.00..1703.80 rows=98180 width=16) (actual time=0.023..6.345 rows=98180 loops=1) Output: n2h.n_id, n2h.h_id -> Index Only Scan using n2a_pkey on bug_test.n2a (cost=0.43..1.67 rows=39 width=16) (actual time=0.003..0.003 rows=0 loops=61325) Output: n2a.n_id, n2a.a_id Index Cond: ((n2a.n_id = n2h.n_id) AND (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid)) Heap Fetches: 6531 -> Index Only Scan using n_pkey on bug_test.n (cost=0.42..2.20 rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=6531) Output: n.id Index Cond: (n.id = n2a.n_id) Heap Fetches: 6531 Planning Time: 0.601 ms Execution Time: 301.614 ms (24 rows) Time: 302.919 ms mnr=> analyze n, n2a, n2h; ANALYZE Time: 252.862 ms mnr=> select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); count | count -------+------- 8,858 | 6,531 (1 row) Time: 170.372 ms mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=21318.14..21318.15 rows=1 width=16) (actual time=148.410..148.410 rows=1 loops=1) Output: count(*), count(DISTINCT n.id) -> Gather (cost=7592.81..21291.01 rows=5426 width=16) (actual time=65.294..146.754 rows=8858 loops=1) Output: n.id Workers Planned: 2 Workers Launched: 2 -> Nested Loop (cost=6592.81..19748.41 rows=2261 width=16) (actual time=21.688..48.404 rows=2953 loops=3) Output: n.id Inner Unique: true Worker 0: actual time=1.096..4.734 rows=24 loops=1 Worker 1: actual time=0.470..6.426 rows=225 loops=1 -> Parallel Hash Join (cost=6592.38..18216.45 rows=2263 width=32) (actual time=21.544..31.739 rows=2953 loops=3) Output: n2a.n_id, n2h.n_id Hash Cond: (n2a.n_id = n2h.n_id) Worker 0: actual time=0.828..3.801 rows=24 loops=1 Worker 1: actual time=0.333..4.385 rows=225 loops=1 -> Parallel Bitmap Heap Scan on bug_test.n2a (cost=1113.32..12667.03 rows=14457 width=16) (actual time=1.187..5.693 rows=10193 loops=3) Output: n2a.n_id, n2a.a_id Recheck Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid) Heap Blocks: exact=2173 Worker 0: actual time=0.196..2.202 rows=246 loops=1 Worker 1: actual time=0.105..2.785 rows=1394 loops=1 -> Bitmap Index Scan on n2a_a_id_n_id_idx (cost=0.00..1104.65 rows=34696 width=0) (actual time=2.861..2.861 rows=30578 loops=1) Index Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid) -> Parallel Hash (cost=4967.71..4967.71 rows=40908 width=16) (actual time=19.568..19.569 rows=32727 loops=3) Output: n2h.n_id Buckets: 131072 Batches: 1 Memory Usage: 5632kB Worker 0: actual time=0.077..0.077 rows=0 loops=1 Worker 1: actual time=0.050..0.050 rows=0 loops=1 -> Parallel Index Only Scan using n2h_pkey on bug_test.n2h (cost=0.42..4967.71 rows=40908 width=16) (actual time=0.024..32.132 rows=98180 loops=1) Output: n2h.n_id Heap Fetches: 98180 -> Index Only Scan using n_pkey on bug_test.n (cost=0.42..0.68 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=8858) Output: n.id Index Cond: (n.id = n2h.n_id) Heap Fetches: 8858 Worker 0: actual time=0.035..0.035 rows=1 loops=24 Worker 1: actual time=0.007..0.007 rows=1 loops=225 Planning Time: 0.755 ms Execution Time: 155.490 ms (40 rows) Time: 156.952 ms mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); count | count --------+-------- 30,578 | 30,578 (1 row) Time: 95.252 ms mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id); count | count --------+-------- 61,325 | 61,325 (1 row) Time: 574.642 ms mnr=> select version(); version ------------------------------------------------------------ PostgreSQL 11.4, compiled by Visual C++ build 1914, 64-bit (1 row) Time: 0.305 ms mnr=>
PG Bug reporting form <noreply@postgresql.org> writes: > I have a case where I'm doing a simple select from a table but I'm getting > returned duplicates from its primary key field when I have two different > exists statements in the where clause. Hm. Seems like pretty clearly a planner bug --- the second plan seems to be missing the de-duplication step that's done by the HashAggregate in the first plan. But I don't want to try to reproduce it with just the info in this report. Can you generate dummy or sanitized data to make a self-contained test case that reproduces the problem? It'd be useful to know also if the problem goes away if you disable parallel query (SET max_parallel_workers_per_gather = 0, for instance). Also, does the issue go away if you drop either or both of the foreign key constraints? regards, tom lane
Dropping the two foreign key constrains didn't seem to alter the plan. Doing the "SET max_parallel_workers_per_gather = 0;"thing did alter it and it's giving consistent results. I'll see if I can manufacture a smaller dataset that produces similar results. Since I cut it down to just UUIDs I don'tthink there'd be a problem with sending that other than the size of it all. mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); count | count -------+------- 8,858 | 6,531 (1 row) Time: 151.132 ms mnr=> set max_parallel_workers_per_gather = 0; SET Time: 0.234 ms mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); count | count -------+------- 6,531 | 6,531 (1 row) Time: 457.642 ms mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id)and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=24232.21..24232.22 rows=1 width=16) (actual time=499.999..499.999 rows=1 loops=1) Output: count(*), count(DISTINCT n.id) -> Hash Semi Join (cost=16556.91..24208.38 rows=4766 width=16) (actual time=98.205..495.173 rows=6531 loops=1) Output: n.id Hash Cond: (n.id = n2h.n_id) -> Hash Join (cost=13145.86..19761.59 rows=30750 width=32) (actual time=59.001..133.273 rows=30578 loops=1) Output: n.id, n2a.n_id Inner Unique: true Hash Cond: (n.id = n2a.n_id) -> Seq Scan on bug_test.n (cost=0.00..5652.69 rows=366869 width=16) (actual time=0.016..36.268 rows=366869loops=1) Output: n.id -> Hash (cost=12757.91..12757.91 rows=31036 width=16) (actual time=16.897..16.897 rows=30578 loops=1) Output: n2a.n_id Buckets: 32768 Batches: 1 Memory Usage: 1690kB -> Bitmap Heap Scan on bug_test.n2a (cost=996.96..12757.91 rows=31036 width=16) (actual time=2.533..10.997rows=30578 loops=1) Output: n2a.n_id Recheck Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid) Heap Blocks: exact=2814 -> Bitmap Index Scan on n2a_a_id_n_id_idx (cost=0.00..989.20 rows=31036 width=0) (actual time=2.190..2.190rows=30578 loops=1) Index Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid) -> Hash (cost=1703.80..1703.80 rows=98180 width=16) (actual time=37.623..37.623 rows=98180 loops=1) Output: n2h.n_id Buckets: 131072 Batches: 2 Memory Usage: 3325kB -> Seq Scan on bug_test.n2h (cost=0.00..1703.80 rows=98180 width=16) (actual time=0.019..15.656 rows=98180loops=1) Output: n2h.n_id Planning Time: 0.760 ms Execution Time: 500.745 ms (27 rows) Time: 502.287 ms mnr=> David Raymond | Associate Quality Analyst - MPU Addressing | TomTom | Lebanon, NH, United States e-mail: david.raymond@tomtom.com | office +1 603 306 8498 | www.tomtom.com -----Original Message----- From: Tom Lane <tgl@sss.pgh.pa.us> Sent: Tuesday, July 23, 2019 1:52 PM To: David Raymond <David.Raymond@tomtom.com> Cc: pgsql-bugs@lists.postgresql.org Subject: Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field PG Bug reporting form <noreply@postgresql.org> writes: > I have a case where I'm doing a simple select from a table but I'm getting > returned duplicates from its primary key field when I have two different > exists statements in the where clause. Hm. Seems like pretty clearly a planner bug --- the second plan seems to be missing the de-duplication step that's done by the HashAggregate in the first plan. But I don't want to try to reproduce it with just the info in this report. Can you generate dummy or sanitized data to make a self-contained test case that reproduces the problem? It'd be useful to know also if the problem goes away if you disable parallel query (SET max_parallel_workers_per_gather = 0, for instance). Also, does the issue go away if you drop either or both of the foreign key constraints? regards, tom lane
Update so far: I did manage to go and replace all the UUIDs with random ones and it's still doing it, so I do have a sanitizedversion now. No real luck with trimming down the record count though. When deleting too many records it would changethe query plan to something not broken. Even after replacing the UUIDs and not deleting anything I ran analyze andit came up clean, and I had to vacuum analyze for it to pick the broken plan again. (That example pasted below) The dumpfile is at least consistently doing the same thing where immediately after load the plan chosen gives a consistent answer,but once analyzed it gives the bad duplicates. As it stands the dump file is 130 MB (30MB zipped), is that too bigto send in to you? mnr=> analyze n, n2a, n2h; ANALYZE Time: 13152.868 ms (00:13.153) mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'); count | count -------+------- 6,531 | 6,531 (1 row) Time: 14685.529 ms (00:14.686) mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id)and exists (select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=57772.45..57772.46 rows=1 width=16) (actual time=877.457..877.457 rows=1 loops=1) Output: count(*), count(DISTINCT n.id) -> Gather (cost=6251.99..57749.48 rows=4595 width=16) (actual time=568.882..890.117 rows=6531 loops=1) Output: n.id Workers Planned: 2 Workers Launched: 2 -> Nested Loop (cost=5251.99..56289.98 rows=1915 width=16) (actual time=451.244..752.614 rows=2177 loops=3) Output: n.id Inner Unique: true Join Filter: (n2h.n_id = n2a.n_id) Worker 0: actual time=372.423..675.329 rows=2007 loops=1 Worker 1: actual time=415.902..719.048 rows=2483 loops=1 -> Parallel Hash Semi Join (cost=5251.43..15379.95 rows=23878 width=32) (actual time=67.824..159.168 rows=20442loops=3) Output: n.id, n2h.n_id Hash Cond: (n.id = n2h.n_id) Worker 0: actual time=0.718..86.042 rows=18097 loops=1 Worker 1: actual time=26.684..123.731 rows=21898 loops=1 -> Parallel Seq Scan on bug_test.n (cost=0.00..9461.62 rows=152862 width=16) (actual time=30.176..69.082rows=122290 loops=3) Output: n.id Worker 0: actual time=0.456..29.446 rows=107115 loops=1 Worker 1: actual time=25.198..58.556 rows=137270 loops=1 -> Parallel Hash (cost=4740.08..4740.08 rows=40908 width=16) (actual time=29.188..29.189 rows=32727loops=3) Output: n2h.n_id Buckets: 131072 Batches: 1 Memory Usage: 5632kB Worker 0: actual time=0.049..0.049 rows=0 loops=1 Worker 1: actual time=0.052..0.052 rows=0 loops=1 -> Parallel Seq Scan on bug_test.n2h (cost=0.00..4740.08 rows=40908 width=16) (actual time=22.989..61.789rows=98180 loops=1) Output: n2h.n_id -> Index Only Scan using n2a_pkey on bug_test.n2a (cost=0.55..1.70 rows=1 width=16) (actual time=0.027..0.027rows=0 loops=61325) Output: n2a.n_id, n2a.a_id Index Cond: ((n2a.n_id = n.id) AND (n2a.a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'::uuid)) Heap Fetches: 6531 Worker 0: actual time=0.031..0.031 rows=0 loops=18097 Worker 1: actual time=0.025..0.025 rows=0 loops=21898 Planning Time: 0.940 ms Execution Time: 896.125 ms (36 rows) Time: 898.028 ms mnr=> vacuum full freeze verbose analyze n, n2a, n2h; INFO: vacuuming "bug_test.n" INFO: "n": found 0 removable, 366869 nonremovable row versions in 7933 pages DETAIL: 0 dead row versions cannot be removed yet. CPU: user: 0.10 s, system: 0.12 s, elapsed: 0.71 s. INFO: analyzing "bug_test.n" INFO: "n": scanned 1984 of 1984 pages, containing 366869 live rows and 0 dead rows; 30000 rows in sample, 366869 estimatedtotal rows INFO: vacuuming "bug_test.n2a" INFO: "n2a": found 3093218 removable, 1546626 nonremovable row versions in 56861 pages DETAIL: 0 dead row versions cannot be removed yet. CPU: user: 1.01 s, system: 0.96 s, elapsed: 3.92 s. INFO: analyzing "bug_test.n2a" INFO: "n2a": scanned 11373 of 11373 pages, containing 1546626 live rows and 0 dead rows; 30000 rows in sample, 1546626 estimatedtotal rows INFO: vacuuming "bug_test.n2h" INFO: "n2h": found 96 removable, 98180 nonremovable row versions in 4331 pages DETAIL: 0 dead row versions cannot be removed yet. CPU: user: 0.06 s, system: 0.03 s, elapsed: 0.24 s. INFO: analyzing "bug_test.n2h" INFO: "n2h": scanned 722 of 722 pages, containing 98180 live rows and 0 dead rows; 30000 rows in sample, 98180 estimatedtotal rows VACUUM Time: 30115.734 ms (00:30.116) mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'); count | count -------+------- 8,858 | 6,531 (1 row) Time: 193.787 ms mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id)and exists (select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=22165.61..22165.62 rows=1 width=16) (actual time=209.476..209.477 rows=1 loops=1) Output: count(*), count(DISTINCT n.id) -> Gather (cost=8697.91..22141.26 rows=4871 width=16) (actual time=80.311..247.967 rows=8858 loops=1) Output: n.id Workers Planned: 2 Workers Launched: 2 -> Nested Loop (cost=7697.91..20654.16 rows=2030 width=16) (actual time=25.238..47.870 rows=2953 loops=3) Output: n.id Inner Unique: true Worker 0: actual time=0.135..0.135 rows=0 loops=1 Worker 1: actual time=0.174..0.174 rows=0 loops=1 -> Parallel Hash Join (cost=7697.49..19295.63 rows=2007 width=32) (actual time=25.229..31.540 rows=2953loops=3) Output: n2a.n_id, n2h.n_id Hash Cond: (n2a.n_id = n2h.n_id) Worker 0: actual time=0.135..0.135 rows=0 loops=1 Worker 1: actual time=0.173..0.173 rows=0 loops=1 -> Parallel Bitmap Heap Scan on bug_test.n2a (cost=1001.75..12536.94 rows=12975 width=16) (actualtime=2.951..6.906 rows=30578 loops=1) Output: n2a.n_id, n2a.a_id Recheck Cond: (n2a.a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'::uuid) Heap Blocks: exact=226 -> Bitmap Index Scan on n2a_a_id_n_id_idx (cost=0.00..993.97 rows=31139 width=0) (actual time=2.904..2.905rows=30578 loops=1) Index Cond: (n2a.a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'::uuid) -> Parallel Hash (cost=6184.38..6184.38 rows=40908 width=16) (actual time=23.653..23.654 rows=32727loops=3) Output: n2h.n_id Buckets: 131072 Batches: 1 Memory Usage: 5632kB Worker 0: actual time=0.041..0.041 rows=0 loops=1 Worker 1: actual time=0.040..0.040 rows=0 loops=1 -> Parallel Index Only Scan using n2h_pkey on bug_test.n2h (cost=0.42..6184.38 rows=40908 width=16)(actual time=0.022..49.775 rows=98180 loops=1) Output: n2h.n_id Heap Fetches: 98180 -> Index Only Scan using n_pkey on bug_test.n (cost=0.42..0.68 rows=1 width=16) (actual time=0.004..0.004rows=1 loops=8858) Output: n.id Index Cond: (n.id = n2h.n_id) Heap Fetches: 8858 Planning Time: 0.567 ms Execution Time: 255.986 ms (36 rows) Time: 257.446 ms mnr=> David Raymond | Associate Quality Analyst - MPU Addressing | TomTom | Lebanon, NH, United States e-mail: david.raymond@tomtom.com | office +1 603 306 8498 | www.tomtom.com -----Original Message----- From: David Raymond Sent: Tuesday, July 23, 2019 2:15 PM To: Tom Lane <tgl@sss.pgh.pa.us> Cc: pgsql-bugs@lists.postgresql.org Subject: RE: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field Dropping the two foreign key constrains didn't seem to alter the plan. Doing the "SET max_parallel_workers_per_gather = 0;"thing did alter it and it's giving consistent results. I'll see if I can manufacture a smaller dataset that produces similar results. Since I cut it down to just UUIDs I don'tthink there'd be a problem with sending that other than the size of it all. mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); count | count -------+------- 8,858 | 6,531 (1 row) Time: 151.132 ms mnr=> set max_parallel_workers_per_gather = 0; SET Time: 0.234 ms mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); count | count -------+------- 6,531 | 6,531 (1 row) Time: 457.642 ms mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id)and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=24232.21..24232.22 rows=1 width=16) (actual time=499.999..499.999 rows=1 loops=1) Output: count(*), count(DISTINCT n.id) -> Hash Semi Join (cost=16556.91..24208.38 rows=4766 width=16) (actual time=98.205..495.173 rows=6531 loops=1) Output: n.id Hash Cond: (n.id = n2h.n_id) -> Hash Join (cost=13145.86..19761.59 rows=30750 width=32) (actual time=59.001..133.273 rows=30578 loops=1) Output: n.id, n2a.n_id Inner Unique: true Hash Cond: (n.id = n2a.n_id) -> Seq Scan on bug_test.n (cost=0.00..5652.69 rows=366869 width=16) (actual time=0.016..36.268 rows=366869loops=1) Output: n.id -> Hash (cost=12757.91..12757.91 rows=31036 width=16) (actual time=16.897..16.897 rows=30578 loops=1) Output: n2a.n_id Buckets: 32768 Batches: 1 Memory Usage: 1690kB -> Bitmap Heap Scan on bug_test.n2a (cost=996.96..12757.91 rows=31036 width=16) (actual time=2.533..10.997rows=30578 loops=1) Output: n2a.n_id Recheck Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid) Heap Blocks: exact=2814 -> Bitmap Index Scan on n2a_a_id_n_id_idx (cost=0.00..989.20 rows=31036 width=0) (actual time=2.190..2.190rows=30578 loops=1) Index Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid) -> Hash (cost=1703.80..1703.80 rows=98180 width=16) (actual time=37.623..37.623 rows=98180 loops=1) Output: n2h.n_id Buckets: 131072 Batches: 2 Memory Usage: 3325kB -> Seq Scan on bug_test.n2h (cost=0.00..1703.80 rows=98180 width=16) (actual time=0.019..15.656 rows=98180loops=1) Output: n2h.n_id Planning Time: 0.760 ms Execution Time: 500.745 ms (27 rows) Time: 502.287 ms mnr=> David Raymond | Associate Quality Analyst - MPU Addressing | TomTom | Lebanon, NH, United States e-mail: david.raymond@tomtom.com | office +1 603 306 8498 | www.tomtom.com -----Original Message----- From: Tom Lane <tgl@sss.pgh.pa.us> Sent: Tuesday, July 23, 2019 1:52 PM To: David Raymond <David.Raymond@tomtom.com> Cc: pgsql-bugs@lists.postgresql.org Subject: Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field PG Bug reporting form <noreply@postgresql.org> writes: > I have a case where I'm doing a simple select from a table but I'm getting > returned duplicates from its primary key field when I have two different > exists statements in the where clause. Hm. Seems like pretty clearly a planner bug --- the second plan seems to be missing the de-duplication step that's done by the HashAggregate in the first plan. But I don't want to try to reproduce it with just the info in this report. Can you generate dummy or sanitized data to make a self-contained test case that reproduces the problem? It'd be useful to know also if the problem goes away if you disable parallel query (SET max_parallel_workers_per_gather = 0, for instance). Also, does the issue go away if you drop either or both of the foreign key constraints? regards, tom lane
David Raymond <David.Raymond@tomtom.com> writes: > Update so far: I did manage to go and replace all the UUIDs with random ones and it's still doing it, so I do have a sanitizedversion now. No real luck with trimming down the record count though. When deleting too many records it would changethe query plan to something not broken. Even after replacing the UUIDs and not deleting anything I ran analyze andit came up clean, and I had to vacuum analyze for it to pick the broken plan again. (That example pasted below) The dumpfile is at least consistently doing the same thing where immediately after load the plan chosen gives a consistent answer,but once analyzed it gives the bad duplicates. As it stands the dump file is 130 MB (30MB zipped), is that too bigto send in to you? Given that the problem seems to be specific to parallel query, likely the reason is that reducing the number of rows brings it below the threshold where the planner wants to use parallel query. So you could probably reduce the parallel-query cost parameters to get a failure with a smaller test case. However, if you don't feel like doing that, that's fine. Please *don't* send a 30MB message to the whole list, but you can send it to me privately. regards, tom lane
The cost tweaks let me slash the size right down to where the dump is 150 kB, which I've attached. Don't know if the mailinglist will strip it, but you're also directly in the To: list. After the load here're the basic commands I ran to show the issue: set search_path to bug_test; select 'n' as table_name, count(*) as record_count from n union all select 'n2a', count(*) from n2a union all select 'n2h',count(*) from n2h; select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2awhere n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'); analyze n, n2a, n2h; select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2awhere n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'); set min_parallel_table_scan_size to 0; set min_parallel_index_scan_size to 0; set parallel_setup_cost to 1; select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2awhere n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'); explain analyze verbose select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) andexists(select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'); mnr=> set search_path to bug_test; SET Time: 0.299 ms mnr=> select 'n' as table_name, count(*) as record_count from n union all select 'n2a', count(*) from n2a union all select'n2h', count(*) from n2h; table_name | record_count ------------+-------------- n | 413 n2a | 1,738 n2h | 109 (3 rows) Time: 5.241 ms mnr=> select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 fromn2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'); count | count -------+------- 10 | 10 (1 row) Time: 2.651 ms mnr=> analyze n, n2a, n2h; ANALYZE Time: 4.590 ms mnr=> select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 fromn2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'); count | count -------+------- 10 | 10 (1 row) Time: 1.909 ms mnr=> set min_parallel_table_scan_size to 0; SET Time: 0.181 ms mnr=> set min_parallel_index_scan_size to 0; SET Time: 0.131 ms mnr=> set parallel_setup_cost to 1; SET Time: 0.195 ms mnr=> select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 fromn2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'); count | count -------+------- 15 | 10 (1 row) Time: 154.525 ms mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id)and exists(select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=24.09..24.10 rows=1 width=16) (actual time=186.389..186.389 rows=1 loops=1) Output: count(*), count(DISTINCT n.id) -> Gather (cost=8.30..24.06 rows=6 width=16) (actual time=2.254..196.932 rows=15 loops=1) Output: n.id Workers Planned: 2 Workers Launched: 2 -> Nested Loop (cost=7.30..22.46 rows=2 width=16) (actual time=0.277..0.310 rows=5 loops=3) Output: n.id Inner Unique: true Worker 0: actual time=0.180..0.180 rows=0 loops=1 Worker 1: actual time=0.412..0.412 rows=0 loops=1 -> Parallel Hash Join (cost=7.03..20.33 rows=3 width=32) (actual time=0.268..0.284 rows=5 loops=3) Output: n2a.n_id, n2h.n_id Hash Cond: (n2a.n_id = n2h.n_id) Worker 0: actual time=0.179..0.179 rows=0 loops=1 Worker 1: actual time=0.412..0.413 rows=0 loops=1 -> Parallel Bitmap Heap Scan on bug_test.n2a (cost=4.59..17.80 rows=17 width=16) (actual time=0.037..0.061rows=40 loops=1) Output: n2a.n_id, n2a.a_id Recheck Cond: (n2a.a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'::uuid) Heap Blocks: exact=13 -> Bitmap Index Scan on n2a_a_id_n_id_idx (cost=0.00..4.58 rows=40 width=0) (actual time=0.025..0.025rows=40 loops=1) Index Cond: (n2a.a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'::uuid) -> Parallel Hash (cost=1.64..1.64 rows=64 width=16) (actual time=0.069..0.069 rows=36 loops=3) Output: n2h.n_id Buckets: 1024 Batches: 1 Memory Usage: 40kB Worker 0: actual time=0.052..0.053 rows=0 loops=1 Worker 1: actual time=0.092..0.092 rows=0 loops=1 -> Parallel Seq Scan on bug_test.n2h (cost=0.00..1.64 rows=64 width=16) (actual time=0.014..0.030rows=109 loops=1) Output: n2h.n_id -> Index Only Scan using n_pkey on bug_test.n (cost=0.27..0.71 rows=1 width=16) (actual time=0.004..0.004rows=1 loops=15) Output: n.id Index Cond: (n.id = n2h.n_id) Heap Fetches: 15 Planning Time: 1.201 ms Execution Time: 197.087 ms (35 rows) Time: 199.122 ms mnr=> David Raymond | Associate Quality Analyst - MPU Addressing | TomTom | Lebanon, NH, United States e-mail: david.raymond@tomtom.com | office +1 603 306 8498 | www.tomtom.com -----Original Message----- From: Tom Lane <tgl@sss.pgh.pa.us> Sent: Tuesday, July 23, 2019 5:50 PM To: David Raymond <David.Raymond@tomtom.com> Cc: pgsql-bugs@lists.postgresql.org Subject: Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field David Raymond <David.Raymond@tomtom.com> writes: > Update so far: I did manage to go and replace all the UUIDs with random ones and it's still doing it, so I do have a sanitizedversion now. No real luck with trimming down the record count though. When deleting too many records it would changethe query plan to something not broken. Even after replacing the UUIDs and not deleting anything I ran analyze andit came up clean, and I had to vacuum analyze for it to pick the broken plan again. (That example pasted below) The dumpfile is at least consistently doing the same thing where immediately after load the plan chosen gives a consistent answer,but once analyzed it gives the bad duplicates. As it stands the dump file is 130 MB (30MB zipped), is that too bigto send in to you? Given that the problem seems to be specific to parallel query, likely the reason is that reducing the number of rows brings it below the threshold where the planner wants to use parallel query. So you could probably reduce the parallel-query cost parameters to get a failure with a smaller test case. However, if you don't feel like doing that, that's fine. Please *don't* send a 30MB message to the whole list, but you can send it to me privately. regards, tom lane
Вложения
David Raymond <David.Raymond@tomtom.com> writes: > The cost tweaks let me slash the size right down to where the dump is 150 kB, which I've attached. Don't know if the mailinglist will strip it, but you're also directly in the To: list. Thanks for the test case! However ... I fail to reproduce any problem on HEAD or v11 branch tip. After perusing the commit log, I believe the reason why not is this post-11.4 bug fix: https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=14d8b539d which I'd forgotten about, but it seems to match your problem exactly. That makes this a duplicate of bug #15857. Don't know if you're in a position to build from source, but if so, that's just about a one-line fix. Otherwise, it looks like turning off enable_parallel_hash would be the minimal workaround until 11.5 comes out next month. regards, tom lane
Not in a position to re-compile, no, but good to know that it's been fixed. I turned off enable_parallel_hash and the originalqueries that led me to this are now returning much more sane results, so that works for me until 11.5 comes out. Thank you very much for the help. David Raymond | Associate Quality Analyst - MPU Addressing | TomTom | Lebanon, NH, United States e-mail: david.raymond@tomtom.com | office +1 603 306 8498 | www.tomtom.com -----Original Message----- From: Tom Lane <tgl@sss.pgh.pa.us> Sent: Wednesday, July 24, 2019 1:07 PM To: David Raymond <David.Raymond@tomtom.com> Cc: pgsql-bugs@lists.postgresql.org Subject: Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field David Raymond <David.Raymond@tomtom.com> writes: > The cost tweaks let me slash the size right down to where the dump is 150 kB, which I've attached. Don't know if the mailinglist will strip it, but you're also directly in the To: list. Thanks for the test case! However ... I fail to reproduce any problem on HEAD or v11 branch tip. After perusing the commit log, I believe the reason why not is this post-11.4 bug fix: https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=14d8b539d which I'd forgotten about, but it seems to match your problem exactly. That makes this a duplicate of bug #15857. Don't know if you're in a position to build from source, but if so, that's just about a one-line fix. Otherwise, it looks like turning off enable_parallel_hash would be the minimal workaround until 11.5 comes out next month. regards, tom lane