
BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field

From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15922
Logged by:          David Raymond
Email address:      david.raymond@tomtom.com
PostgreSQL version: 11.4
Operating system:   Windows 7
Description:

Running version 11.4 on Windows 7, EnterpriseDB installer.
I have a case where I'm doing a simple select from a table but I'm getting
returned duplicates from its primary key field when I have two different
exists statements in the where clause.

I simplified the tables down to the bare bones, dumped them out with pg_dump
and re-loaded them. Immediately after loading the data the query returns the
correct, consistent answer. Then after running analyze it starts returning
the duplicates.

The query is this:
select count(*), count(distinct id)
from n
where exists(select 1 from n2h where n_id = n.id)
  and exists(select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');

If either one of the two exists filters is left out, the results don't have
duplicates; it's only when both are included that there's a problem.

The three tables involved have about 2 million rows between them, so I can't
include the full dump in the form here. Below is psql output starting
immediately after the fresh table load and before analyze is run. It shows the
table layouts, the record counts, and the explain results for the same query
before and after analyze.

Please let me know what you need from me next.
Thank you,
-David Raymond <david.raymond@tomtom.com>

mnr=> set search_path to bug_test;
SET
Time: 1.427 ms
mnr=> \d+
                    List of relations
  Schema  | Name | Type  | Owner |  Size   | Description
----------+------+-------+-------+---------+-------------
 bug_test | n    | table | mnr   | 16 MB   |
 bug_test | n2a  | table | mnr   | 89 MB   |
 bug_test | n2h  | table | mnr   | 5800 kB |
(3 rows)

mnr=> \d+ n
                                  Table "bug_test.n"
 Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------+-----------+----------+---------+---------+--------------+-------------
 id     | uuid |           | not null |         | plain   |              |
Indexes:
    "n_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "n2a" CONSTRAINT "n2a_n_id_fkey" FOREIGN KEY (n_id) REFERENCES
n(id)
    TABLE "n2h" CONSTRAINT "n2h_n_id_fkey" FOREIGN KEY (n_id) REFERENCES
n(id)

mnr=> \d+ n2a
                                 Table "bug_test.n2a"
 Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------+-----------+----------+---------+---------+--------------+-------------
 n_id   | uuid |           | not null |         | plain   |              |
 a_id   | uuid |           | not null |         | plain   |              |
Indexes:
    "n2a_pkey" PRIMARY KEY, btree (n_id, a_id)
    "n2a_a_id_n_id_idx" btree (a_id, n_id)
Foreign-key constraints:
    "n2a_n_id_fkey" FOREIGN KEY (n_id) REFERENCES n(id)

mnr=> \d+ n2h
                                 Table "bug_test.n2h"
 Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------+-----------+----------+---------+---------+--------------+-------------
 n_id   | uuid |           | not null |         | plain   |              |
 h_id   | uuid |           | not null |         | plain   |              |
Indexes:
    "n2h_pkey" PRIMARY KEY, btree (n_id, h_id)
    "n2h_h_id_n_id_idx" btree (h_id, n_id)
Foreign-key constraints:
    "n2h_n_id_fkey" FOREIGN KEY (n_id) REFERENCES n(id)

mnr=> select 'n' as table_name, count(*) as record_count from n union all
select 'n2a', count(*) from n2a union all select 'n2h', count(*) from n2h;
 table_name | record_count
------------+--------------
 n          |      366,869
 n2a        |    1,546,626
 n2h        |       98,180
(3 rows)

Time: 342.223 ms
mnr=> select count(*), count(distinct id) from n where exists(select 1 from
n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and
a_id = '00005831-4900-1200-0000-0000773ae45f');
 count | count
-------+-------
 6,531 | 6,531
(1 row)

Time: 426.043 ms
mnr=> explain analyze verbose select count(*), count(distinct id) from n
where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from
n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10609.94..10609.95 rows=1 width=16) (actual
time=300.783..300.783 rows=1 loops=1)
   Output: count(*), count(DISTINCT n.id)
   ->  Nested Loop  (cost=1950.10..10590.61 rows=3866 width=16) (actual
time=30.074..295.967 rows=6531 loops=1)
         Output: n.id
         Inner Unique: true
         Join Filter: (n2h.n_id = n.id)
         ->  Nested Loop  (cost=1949.68..2044.12 rows=3866 width=32) (actual
time=30.064..270.423 rows=6531 loops=1)
               Output: n2a.n_id, n2h.n_id
               Inner Unique: true
               ->  HashAggregate  (cost=1949.25..1951.25 rows=200 width=16)
(actual time=29.997..46.915 rows=61325 loops=1)
                     Output: n2h.n_id
                     Group Key: n2h.n_id
                     ->  Seq Scan on bug_test.n2h  (cost=0.00..1703.80
rows=98180 width=16) (actual time=0.023..6.345 rows=98180 loops=1)
                           Output: n2h.n_id, n2h.h_id
               ->  Index Only Scan using n2a_pkey on bug_test.n2a
(cost=0.43..1.67 rows=39 width=16) (actual time=0.003..0.003 rows=0
loops=61325)
                     Output: n2a.n_id, n2a.a_id
                     Index Cond: ((n2a.n_id = n2h.n_id) AND (n2a.a_id =
'00005831-4900-1200-0000-0000773ae45f'::uuid))
                     Heap Fetches: 6531
         ->  Index Only Scan using n_pkey on bug_test.n  (cost=0.42..2.20
rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=6531)
               Output: n.id
               Index Cond: (n.id = n2a.n_id)
               Heap Fetches: 6531
 Planning Time: 0.601 ms
 Execution Time: 301.614 ms
(24 rows)

Time: 302.919 ms
mnr=> analyze n, n2a, n2h;
ANALYZE
Time: 252.862 ms
mnr=> select count(*), count(distinct id) from n where exists(select 1 from
n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and
a_id = '00005831-4900-1200-0000-0000773ae45f');
 count | count
-------+-------
 8,858 | 6,531
(1 row)

Time: 170.372 ms
mnr=> explain analyze verbose select count(*), count(distinct id) from n
where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from
n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');

      QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=21318.14..21318.15 rows=1 width=16) (actual
time=148.410..148.410 rows=1 loops=1)
   Output: count(*), count(DISTINCT n.id)
   ->  Gather  (cost=7592.81..21291.01 rows=5426 width=16) (actual
time=65.294..146.754 rows=8858 loops=1)
         Output: n.id
         Workers Planned: 2
         Workers Launched: 2
         ->  Nested Loop  (cost=6592.81..19748.41 rows=2261 width=16)
(actual time=21.688..48.404 rows=2953 loops=3)
               Output: n.id
               Inner Unique: true
               Worker 0: actual time=1.096..4.734 rows=24 loops=1
               Worker 1: actual time=0.470..6.426 rows=225 loops=1
               ->  Parallel Hash Join  (cost=6592.38..18216.45 rows=2263
width=32) (actual time=21.544..31.739 rows=2953 loops=3)
                     Output: n2a.n_id, n2h.n_id
                     Hash Cond: (n2a.n_id = n2h.n_id)
                     Worker 0: actual time=0.828..3.801 rows=24 loops=1
                     Worker 1: actual time=0.333..4.385 rows=225 loops=1
                     ->  Parallel Bitmap Heap Scan on bug_test.n2a
(cost=1113.32..12667.03 rows=14457 width=16) (actual time=1.187..5.693
rows=10193 loops=3)
                           Output: n2a.n_id, n2a.a_id
                           Recheck Cond: (n2a.a_id =
'00005831-4900-1200-0000-0000773ae45f'::uuid)
                           Heap Blocks: exact=2173
                           Worker 0: actual time=0.196..2.202 rows=246
loops=1
                           Worker 1: actual time=0.105..2.785 rows=1394
loops=1
                           ->  Bitmap Index Scan on n2a_a_id_n_id_idx
(cost=0.00..1104.65 rows=34696 width=0) (actual time=2.861..2.861 rows=30578
loops=1)
                                 Index Cond: (n2a.a_id =
'00005831-4900-1200-0000-0000773ae45f'::uuid)
                     ->  Parallel Hash  (cost=4967.71..4967.71 rows=40908
width=16) (actual time=19.568..19.569 rows=32727 loops=3)
                           Output: n2h.n_id
                           Buckets: 131072  Batches: 1  Memory Usage:
5632kB
                           Worker 0: actual time=0.077..0.077 rows=0
loops=1
                           Worker 1: actual time=0.050..0.050 rows=0
loops=1
                           ->  Parallel Index Only Scan using n2h_pkey on
bug_test.n2h  (cost=0.42..4967.71 rows=40908 width=16) (actual
time=0.024..32.132 rows=98180 loops=1)
                                 Output: n2h.n_id
                                 Heap Fetches: 98180
               ->  Index Only Scan using n_pkey on bug_test.n
(cost=0.42..0.68 rows=1 width=16) (actual time=0.004..0.004 rows=1
loops=8858)
                     Output: n.id
                     Index Cond: (n.id = n2h.n_id)
                     Heap Fetches: 8858
                     Worker 0: actual time=0.035..0.035 rows=1 loops=24
                     Worker 1: actual time=0.007..0.007 rows=1 loops=225
 Planning Time: 0.755 ms
 Execution Time: 155.490 ms
(40 rows)

Time: 156.952 ms
mnr=> select count(*), count(distinct id) from n where exists (select 1 from
n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
 count  | count
--------+--------
 30,578 | 30,578
(1 row)

Time: 95.252 ms
mnr=> select count(*), count(distinct id) from n where exists (select 1 from
n2h where n_id = n.id);
 count  | count
--------+--------
 61,325 | 61,325
(1 row)

Time: 574.642 ms
mnr=> select version();
                          version
------------------------------------------------------------
 PostgreSQL 11.4, compiled by Visual C++ build 1914, 64-bit
(1 row)

Time: 0.305 ms
mnr=>


PG Bug reporting form <noreply@postgresql.org> writes:
> I have a case where I'm doing a simple select from a table but I'm getting
> returned duplicates from its primary key field when I have two different
> exists statements in the where clause.

Hm.  Seems like pretty clearly a planner bug --- the second plan seems
to be missing the de-duplication step that's done by the HashAggregate
in the first plan.  But I don't want to try to reproduce it with just
the info in this report.  Can you generate dummy or sanitized data to
make a self-contained test case that reproduces the problem?

It'd be useful to know also if the problem goes away if you disable
parallel query (SET max_parallel_workers_per_gather = 0, for instance).

Also, does the issue go away if you drop either or both of the foreign
key constraints?

            regards, tom lane



Dropping the two foreign key constraints didn't seem to alter the plan. Doing
the "SET max_parallel_workers_per_gather = 0;" thing did alter it, and it's
giving consistent results.

I'll see if I can manufacture a smaller dataset that produces similar results.
Since I cut it down to just UUIDs, I don't think there'd be a problem with
sending it other than its size.


mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
 count | count
-------+-------
 8,858 | 6,531
(1 row)

Time: 151.132 ms
mnr=> set max_parallel_workers_per_gather = 0;
SET
Time: 0.234 ms
mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
 count | count
-------+-------
 6,531 | 6,531
(1 row)

Time: 457.642 ms
mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
                                                                         QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=24232.21..24232.22 rows=1 width=16) (actual time=499.999..499.999 rows=1 loops=1)
   Output: count(*), count(DISTINCT n.id)
   ->  Hash Semi Join  (cost=16556.91..24208.38 rows=4766 width=16) (actual time=98.205..495.173 rows=6531 loops=1)
         Output: n.id
         Hash Cond: (n.id = n2h.n_id)
         ->  Hash Join  (cost=13145.86..19761.59 rows=30750 width=32) (actual time=59.001..133.273 rows=30578 loops=1)
               Output: n.id, n2a.n_id
               Inner Unique: true
               Hash Cond: (n.id = n2a.n_id)
                ->  Seq Scan on bug_test.n  (cost=0.00..5652.69 rows=366869 width=16) (actual time=0.016..36.268 rows=366869 loops=1)
                     Output: n.id
               ->  Hash  (cost=12757.91..12757.91 rows=31036 width=16) (actual time=16.897..16.897 rows=30578 loops=1)
                     Output: n2a.n_id
                     Buckets: 32768  Batches: 1  Memory Usage: 1690kB
                      ->  Bitmap Heap Scan on bug_test.n2a  (cost=996.96..12757.91 rows=31036 width=16) (actual time=2.533..10.997 rows=30578 loops=1)
                           Output: n2a.n_id
                           Recheck Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid)
                           Heap Blocks: exact=2814
                            ->  Bitmap Index Scan on n2a_a_id_n_id_idx  (cost=0.00..989.20 rows=31036 width=0) (actual time=2.190..2.190 rows=30578 loops=1)
                                 Index Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid)
         ->  Hash  (cost=1703.80..1703.80 rows=98180 width=16) (actual time=37.623..37.623 rows=98180 loops=1)
               Output: n2h.n_id
               Buckets: 131072  Batches: 2  Memory Usage: 3325kB
                ->  Seq Scan on bug_test.n2h  (cost=0.00..1703.80 rows=98180 width=16) (actual time=0.019..15.656 rows=98180 loops=1)
                     Output: n2h.n_id
 Planning Time: 0.760 ms
 Execution Time: 500.745 ms
(27 rows)

Time: 502.287 ms
mnr=>




David Raymond | Associate Quality Analyst - MPU Addressing | TomTom | Lebanon, NH, United States
e-mail: david.raymond@tomtom.com  | office +1 603 306 8498 | www.tomtom.com

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us> 
Sent: Tuesday, July 23, 2019 1:52 PM
To: David Raymond <David.Raymond@tomtom.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field

PG Bug reporting form <noreply@postgresql.org> writes:
> I have a case where I'm doing a simple select from a table but I'm getting
> returned duplicates from its primary key field when I have two different
> exists statements in the where clause.

Hm.  Seems like pretty clearly a planner bug --- the second plan seems
to be missing the de-duplication step that's done by the HashAggregate
in the first plan.  But I don't want to try to reproduce it with just
the info in this report.  Can you generate dummy or sanitized data to
make a self-contained test case that reproduces the problem?

It'd be useful to know also if the problem goes away if you disable
parallel query (SET max_parallel_workers_per_gather = 0, for instance).

Also, does the issue go away if you drop either or both of the foreign
key constraints?

            regards, tom lane

Update so far: I did manage to replace all the UUIDs with random ones and it's
still doing it, so I do have a sanitized version now. No real luck with
trimming down the record count, though. When I deleted too many records it
would change the query plan to something that isn't broken. Even after
replacing the UUIDs and not deleting anything, I ran analyze and it came up
clean, and I had to vacuum analyze for it to pick the broken plan again. (That
example is pasted below.) The dump file is at least consistently doing the same
thing: immediately after load the chosen plan gives a consistent answer, but
once analyzed it gives the bad duplicates. As it stands the dump file is 130 MB
(30 MB zipped); is that too big to send to you?
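For reference, one way the UUID scrubbing could be done in place (only a sketch
of the general approach, not necessarily how it was done here; it assumes the
pgcrypto extension for gen_random_uuid() on 11, and uuid_map is a throwaway
name):

-- Build a mapping from the old ids to freshly generated random ones.
create extension if not exists pgcrypto;
create temporary table uuid_map as
    select id as old_id, gen_random_uuid() as new_id from n;
create unique index on uuid_map (old_id);

begin;
-- The FKs are not deferrable, so drop them around the rewrite and put them back.
alter table n2a drop constraint n2a_n_id_fkey;
alter table n2h drop constraint n2h_n_id_fkey;

update n   set id   = m.new_id from uuid_map m where n.id    = m.old_id;
update n2a set n_id = m.new_id from uuid_map m where n2a.n_id = m.old_id;
update n2h set n_id = m.new_id from uuid_map m where n2h.n_id = m.old_id;

alter table n2a add constraint n2a_n_id_fkey foreign key (n_id) references n(id);
alter table n2h add constraint n2h_n_id_fkey foreign key (n_id) references n(id);
commit;

(n2a.a_id and n2h.h_id can be scrambled the same way with their own mapping
tables.)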



mnr=> analyze n, n2a, n2h;
ANALYZE
Time: 13152.868 ms (00:13.153)
mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');
 count | count
-------+-------
 6,531 | 6,531
(1 row)

Time: 14685.529 ms (00:14.686)
mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');
                                                                        QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=57772.45..57772.46 rows=1 width=16) (actual time=877.457..877.457 rows=1 loops=1)
   Output: count(*), count(DISTINCT n.id)
   ->  Gather  (cost=6251.99..57749.48 rows=4595 width=16) (actual time=568.882..890.117 rows=6531 loops=1)
         Output: n.id
         Workers Planned: 2
         Workers Launched: 2
         ->  Nested Loop  (cost=5251.99..56289.98 rows=1915 width=16) (actual time=451.244..752.614 rows=2177 loops=3)
               Output: n.id
               Inner Unique: true
               Join Filter: (n2h.n_id = n2a.n_id)
               Worker 0: actual time=372.423..675.329 rows=2007 loops=1
               Worker 1: actual time=415.902..719.048 rows=2483 loops=1
                ->  Parallel Hash Semi Join  (cost=5251.43..15379.95 rows=23878 width=32) (actual time=67.824..159.168 rows=20442 loops=3)
                     Output: n.id, n2h.n_id
                     Hash Cond: (n.id = n2h.n_id)
                     Worker 0: actual time=0.718..86.042 rows=18097 loops=1
                     Worker 1: actual time=26.684..123.731 rows=21898 loops=1
                      ->  Parallel Seq Scan on bug_test.n  (cost=0.00..9461.62 rows=152862 width=16) (actual time=30.176..69.082 rows=122290 loops=3)
                           Output: n.id
                           Worker 0: actual time=0.456..29.446 rows=107115 loops=1
                           Worker 1: actual time=25.198..58.556 rows=137270 loops=1
                      ->  Parallel Hash  (cost=4740.08..4740.08 rows=40908 width=16) (actual time=29.188..29.189 rows=32727 loops=3)
                           Output: n2h.n_id
                           Buckets: 131072  Batches: 1  Memory Usage: 5632kB
                           Worker 0: actual time=0.049..0.049 rows=0 loops=1
                           Worker 1: actual time=0.052..0.052 rows=0 loops=1
                            ->  Parallel Seq Scan on bug_test.n2h  (cost=0.00..4740.08 rows=40908 width=16) (actual time=22.989..61.789 rows=98180 loops=1)
                                 Output: n2h.n_id
                ->  Index Only Scan using n2a_pkey on bug_test.n2a  (cost=0.55..1.70 rows=1 width=16) (actual time=0.027..0.027 rows=0 loops=61325)
                     Output: n2a.n_id, n2a.a_id
                     Index Cond: ((n2a.n_id = n.id) AND (n2a.a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'::uuid))
                     Heap Fetches: 6531
                     Worker 0: actual time=0.031..0.031 rows=0 loops=18097
                     Worker 1: actual time=0.025..0.025 rows=0 loops=21898
 Planning Time: 0.940 ms
 Execution Time: 896.125 ms
(36 rows)

Time: 898.028 ms
mnr=> vacuum full freeze verbose analyze n, n2a, n2h;
INFO:  vacuuming "bug_test.n"
INFO:  "n": found 0 removable, 366869 nonremovable row versions in 7933 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.10 s, system: 0.12 s, elapsed: 0.71 s.
INFO:  analyzing "bug_test.n"
INFO:  "n": scanned 1984 of 1984 pages, containing 366869 live rows and 0 dead rows; 30000 rows in sample, 366869
estimatedtotal rows
 
INFO:  vacuuming "bug_test.n2a"
INFO:  "n2a": found 3093218 removable, 1546626 nonremovable row versions in 56861 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 1.01 s, system: 0.96 s, elapsed: 3.92 s.
INFO:  analyzing "bug_test.n2a"
INFO:  "n2a": scanned 11373 of 11373 pages, containing 1546626 live rows and 0 dead rows; 30000 rows in sample, 1546626
estimatedtotal rows
 
INFO:  vacuuming "bug_test.n2h"
INFO:  "n2h": found 96 removable, 98180 nonremovable row versions in 4331 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.06 s, system: 0.03 s, elapsed: 0.24 s.
INFO:  analyzing "bug_test.n2h"
INFO:  "n2h": scanned 722 of 722 pages, containing 98180 live rows and 0 dead rows; 30000 rows in sample, 98180
estimatedtotal rows
 
VACUUM
Time: 30115.734 ms (00:30.116)
mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');
 count | count
-------+-------
 8,858 | 6,531
(1 row)

Time: 193.787 ms
mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');
                                                                                   QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=22165.61..22165.62 rows=1 width=16) (actual time=209.476..209.477 rows=1 loops=1)
   Output: count(*), count(DISTINCT n.id)
   ->  Gather  (cost=8697.91..22141.26 rows=4871 width=16) (actual time=80.311..247.967 rows=8858 loops=1)
         Output: n.id
         Workers Planned: 2
         Workers Launched: 2
         ->  Nested Loop  (cost=7697.91..20654.16 rows=2030 width=16) (actual time=25.238..47.870 rows=2953 loops=3)
               Output: n.id
               Inner Unique: true
               Worker 0: actual time=0.135..0.135 rows=0 loops=1
               Worker 1: actual time=0.174..0.174 rows=0 loops=1
                ->  Parallel Hash Join  (cost=7697.49..19295.63 rows=2007 width=32) (actual time=25.229..31.540 rows=2953 loops=3)
                     Output: n2a.n_id, n2h.n_id
                     Hash Cond: (n2a.n_id = n2h.n_id)
                     Worker 0: actual time=0.135..0.135 rows=0 loops=1
                     Worker 1: actual time=0.173..0.173 rows=0 loops=1
                      ->  Parallel Bitmap Heap Scan on bug_test.n2a  (cost=1001.75..12536.94 rows=12975 width=16) (actual time=2.951..6.906 rows=30578 loops=1)
                           Output: n2a.n_id, n2a.a_id
                           Recheck Cond: (n2a.a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'::uuid)
                           Heap Blocks: exact=226
                            ->  Bitmap Index Scan on n2a_a_id_n_id_idx  (cost=0.00..993.97 rows=31139 width=0) (actual time=2.904..2.905 rows=30578 loops=1)
                                 Index Cond: (n2a.a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'::uuid)
                      ->  Parallel Hash  (cost=6184.38..6184.38 rows=40908 width=16) (actual time=23.653..23.654 rows=32727 loops=3)
                           Output: n2h.n_id
                           Buckets: 131072  Batches: 1  Memory Usage: 5632kB
                           Worker 0: actual time=0.041..0.041 rows=0 loops=1
                           Worker 1: actual time=0.040..0.040 rows=0 loops=1
                            ->  Parallel Index Only Scan using n2h_pkey on bug_test.n2h  (cost=0.42..6184.38 rows=40908 width=16) (actual time=0.022..49.775 rows=98180 loops=1)
                                 Output: n2h.n_id
                                 Heap Fetches: 98180
                ->  Index Only Scan using n_pkey on bug_test.n  (cost=0.42..0.68 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=8858)
                     Output: n.id
                     Index Cond: (n.id = n2h.n_id)
                     Heap Fetches: 8858
 Planning Time: 0.567 ms
 Execution Time: 255.986 ms
(36 rows)

Time: 257.446 ms
mnr=>



David Raymond | Associate Quality Analyst - MPU Addressing | TomTom | Lebanon, NH, United States
e-mail: david.raymond@tomtom.com  | office +1 603 306 8498 | www.tomtom.com






-----Original Message-----
From: David Raymond 
Sent: Tuesday, July 23, 2019 2:15 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: pgsql-bugs@lists.postgresql.org
Subject: RE: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field

Dropping the two foreign key constraints didn't seem to alter the plan. Doing
the "SET max_parallel_workers_per_gather = 0;" thing did alter it, and it's
giving consistent results.

I'll see if I can manufacture a smaller dataset that produces similar results.
Since I cut it down to just UUIDs, I don't think there'd be a problem with
sending it other than its size.


mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
 count | count
-------+-------
 8,858 | 6,531
(1 row)

Time: 151.132 ms
mnr=> set max_parallel_workers_per_gather = 0;
SET
Time: 0.234 ms
mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
 count | count
-------+-------
 6,531 | 6,531
(1 row)

Time: 457.642 ms
mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
                                                                         QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=24232.21..24232.22 rows=1 width=16) (actual time=499.999..499.999 rows=1 loops=1)
   Output: count(*), count(DISTINCT n.id)
   ->  Hash Semi Join  (cost=16556.91..24208.38 rows=4766 width=16) (actual time=98.205..495.173 rows=6531 loops=1)
         Output: n.id
         Hash Cond: (n.id = n2h.n_id)
         ->  Hash Join  (cost=13145.86..19761.59 rows=30750 width=32) (actual time=59.001..133.273 rows=30578 loops=1)
               Output: n.id, n2a.n_id
               Inner Unique: true
               Hash Cond: (n.id = n2a.n_id)
                ->  Seq Scan on bug_test.n  (cost=0.00..5652.69 rows=366869 width=16) (actual time=0.016..36.268 rows=366869 loops=1)
                     Output: n.id
               ->  Hash  (cost=12757.91..12757.91 rows=31036 width=16) (actual time=16.897..16.897 rows=30578 loops=1)
                     Output: n2a.n_id
                     Buckets: 32768  Batches: 1  Memory Usage: 1690kB
                      ->  Bitmap Heap Scan on bug_test.n2a  (cost=996.96..12757.91 rows=31036 width=16) (actual time=2.533..10.997 rows=30578 loops=1)
                           Output: n2a.n_id
                           Recheck Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid)
                           Heap Blocks: exact=2814
                            ->  Bitmap Index Scan on n2a_a_id_n_id_idx  (cost=0.00..989.20 rows=31036 width=0) (actual time=2.190..2.190 rows=30578 loops=1)
                                 Index Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid)
         ->  Hash  (cost=1703.80..1703.80 rows=98180 width=16) (actual time=37.623..37.623 rows=98180 loops=1)
               Output: n2h.n_id
               Buckets: 131072  Batches: 2  Memory Usage: 3325kB
                ->  Seq Scan on bug_test.n2h  (cost=0.00..1703.80 rows=98180 width=16) (actual time=0.019..15.656 rows=98180 loops=1)
                     Output: n2h.n_id
 Planning Time: 0.760 ms
 Execution Time: 500.745 ms
(27 rows)

Time: 502.287 ms
mnr=>




David Raymond | Associate Quality Analyst - MPU Addressing | TomTom | Lebanon, NH, United States
e-mail: david.raymond@tomtom.com  | office +1 603 306 8498 | www.tomtom.com






-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us> 
Sent: Tuesday, July 23, 2019 1:52 PM
To: David Raymond <David.Raymond@tomtom.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field

PG Bug reporting form <noreply@postgresql.org> writes:
> I have a case where I'm doing a simple select from a table but I'm getting
> returned duplicates from its primary key field when I have two different
> exists statements in the where clause.

Hm.  Seems like pretty clearly a planner bug --- the second plan seems
to be missing the de-duplication step that's done by the HashAggregate
in the first plan.  But I don't want to try to reproduce it with just
the info in this report.  Can you generate dummy or sanitized data to
make a self-contained test case that reproduces the problem?

It'd be useful to know also if the problem goes away if you disable
parallel query (SET max_parallel_workers_per_gather = 0, for instance).

Also, does the issue go away if you drop either or both of the foreign
key constraints?

            regards, tom lane

David Raymond <David.Raymond@tomtom.com> writes:
> Update so far: I did manage to replace all the UUIDs with random ones and
> it's still doing it, so I do have a sanitized version now. No real luck with
> trimming down the record count, though. When I deleted too many records it
> would change the query plan to something that isn't broken. Even after
> replacing the UUIDs and not deleting anything, I ran analyze and it came up
> clean, and I had to vacuum analyze for it to pick the broken plan again.
> (That example is pasted below.) The dump file is at least consistently doing
> the same thing: immediately after load the chosen plan gives a consistent
> answer, but once analyzed it gives the bad duplicates. As it stands the dump
> file is 130 MB (30 MB zipped); is that too big to send to you?

Given that the problem seems to be specific to parallel query, likely
the reason is that reducing the number of rows brings it below the
threshold where the planner wants to use parallel query.  So you could
probably reduce the parallel-query cost parameters to get a failure
with a smaller test case.  However, if you don't feel like doing that,
that's fine.
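For instance, settings along these lines (illustrative values; these are the
kinds of knobs meant above, and in fact what the smaller test case below ends
up using) make the planner willing to choose parallel plans even on a tiny
dataset:

set min_parallel_table_scan_size = 0;
set min_parallel_index_scan_size = 0;
set parallel_setup_cost = 1;
-- parallel_tuple_cost can also be lowered to push the planner even harder
-- toward parallel plans.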

Please *don't* send a 30MB message to the whole list, but you can
send it to me privately.

            regards, tom lane



The cost tweaks let me slash the size right down to where the dump is 150 kB,
which I've attached. Don't know if the mailing list will strip it, but you're
also directly in the To: list.

After the load, here are the basic commands I ran to show the issue:


set search_path to bug_test;

select 'n' as table_name, count(*) as record_count from n union all select 'n2a', count(*) from n2a union all select 'n2h', count(*) from n2h;

select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');

analyze n, n2a, n2h;

select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');

set min_parallel_table_scan_size to 0;
set min_parallel_index_scan_size to 0;
set parallel_setup_cost to 1;

select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');

explain analyze verbose select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');



mnr=> set search_path to bug_test;
SET
Time: 0.299 ms
mnr=> select 'n' as table_name, count(*) as record_count from n union all select 'n2a', count(*) from n2a union all select 'n2h', count(*) from n2h;
 table_name | record_count
------------+--------------
 n          |          413
 n2a        |        1,738
 n2h        |          109
(3 rows)

Time: 5.241 ms
mnr=> select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');
 count | count
-------+-------
    10 |    10
(1 row)

Time: 2.651 ms
mnr=> analyze n, n2a, n2h;
ANALYZE
Time: 4.590 ms
mnr=> select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');
 count | count
-------+-------
    10 |    10
(1 row)

Time: 1.909 ms
mnr=> set min_parallel_table_scan_size to 0;
SET
Time: 0.181 ms
mnr=> set min_parallel_index_scan_size to 0;
SET
Time: 0.131 ms
mnr=> set parallel_setup_cost to 1;
SET
Time: 0.195 ms
mnr=> select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');
 count | count
-------+-------
    15 |    10
(1 row)

Time: 154.525 ms
mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');
                                                                     QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=24.09..24.10 rows=1 width=16) (actual time=186.389..186.389 rows=1 loops=1)
   Output: count(*), count(DISTINCT n.id)
   ->  Gather  (cost=8.30..24.06 rows=6 width=16) (actual time=2.254..196.932 rows=15 loops=1)
         Output: n.id
         Workers Planned: 2
         Workers Launched: 2
         ->  Nested Loop  (cost=7.30..22.46 rows=2 width=16) (actual time=0.277..0.310 rows=5 loops=3)
               Output: n.id
               Inner Unique: true
               Worker 0: actual time=0.180..0.180 rows=0 loops=1
               Worker 1: actual time=0.412..0.412 rows=0 loops=1
               ->  Parallel Hash Join  (cost=7.03..20.33 rows=3 width=32) (actual time=0.268..0.284 rows=5 loops=3)
                     Output: n2a.n_id, n2h.n_id
                     Hash Cond: (n2a.n_id = n2h.n_id)
                     Worker 0: actual time=0.179..0.179 rows=0 loops=1
                     Worker 1: actual time=0.412..0.413 rows=0 loops=1
                      ->  Parallel Bitmap Heap Scan on bug_test.n2a  (cost=4.59..17.80 rows=17 width=16) (actual time=0.037..0.061 rows=40 loops=1)
                           Output: n2a.n_id, n2a.a_id
                           Recheck Cond: (n2a.a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'::uuid)
                           Heap Blocks: exact=13
                            ->  Bitmap Index Scan on n2a_a_id_n_id_idx  (cost=0.00..4.58 rows=40 width=0) (actual time=0.025..0.025 rows=40 loops=1)
                                 Index Cond: (n2a.a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'::uuid)
                     ->  Parallel Hash  (cost=1.64..1.64 rows=64 width=16) (actual time=0.069..0.069 rows=36 loops=3)
                           Output: n2h.n_id
                           Buckets: 1024  Batches: 1  Memory Usage: 40kB
                           Worker 0: actual time=0.052..0.053 rows=0 loops=1
                           Worker 1: actual time=0.092..0.092 rows=0 loops=1
                            ->  Parallel Seq Scan on bug_test.n2h  (cost=0.00..1.64 rows=64 width=16) (actual time=0.014..0.030 rows=109 loops=1)
                                 Output: n2h.n_id
                ->  Index Only Scan using n_pkey on bug_test.n  (cost=0.27..0.71 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=15)
                     Output: n.id
                     Index Cond: (n.id = n2h.n_id)
                     Heap Fetches: 15
 Planning Time: 1.201 ms
 Execution Time: 197.087 ms
(35 rows)

Time: 199.122 ms
mnr=>



David Raymond | Associate Quality Analyst - MPU Addressing | TomTom | Lebanon, NH, United States
e-mail: david.raymond@tomtom.com  | office +1 603 306 8498 | www.tomtom.com






-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us> 
Sent: Tuesday, July 23, 2019 5:50 PM
To: David Raymond <David.Raymond@tomtom.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field

David Raymond <David.Raymond@tomtom.com> writes:
> Update so far: I did manage to replace all the UUIDs with random ones and
> it's still doing it, so I do have a sanitized version now. No real luck with
> trimming down the record count, though. When I deleted too many records it
> would change the query plan to something that isn't broken. Even after
> replacing the UUIDs and not deleting anything, I ran analyze and it came up
> clean, and I had to vacuum analyze for it to pick the broken plan again.
> (That example is pasted below.) The dump file is at least consistently doing
> the same thing: immediately after load the chosen plan gives a consistent
> answer, but once analyzed it gives the bad duplicates. As it stands the dump
> file is 130 MB (30 MB zipped); is that too big to send to you?

Given that the problem seems to be specific to parallel query, likely
the reason is that reducing the number of rows brings it below the
threshold where the planner wants to use parallel query.  So you could
probably reduce the parallel-query cost parameters to get a failure
with a smaller test case.  However, if you don't feel like doing that,
that's fine.

Please *don't* send a 30MB message to the whole list, but you can
send it to me privately.

            regards, tom lane



David Raymond <David.Raymond@tomtom.com> writes:
> The cost tweaks let me slash the size right down to where the dump is
> 150 kB, which I've attached. Don't know if the mailing list will strip it,
> but you're also directly in the To: list.

Thanks for the test case!

However ... I fail to reproduce any problem on HEAD or v11 branch tip.
After perusing the commit log, I believe the reason why not is this
post-11.4 bug fix:

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=14d8b539d

which I'd forgotten about, but it seems to match your problem exactly.
That makes this a duplicate of bug #15857.

Don't know if you're in a position to build from source, but if so,
that's just about a one-line fix.  Otherwise, it looks like turning
off enable_parallel_hash would be the minimal workaround until 11.5
comes out next month.
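Concretely, the minimal workaround looks like this (session-level here; it
could also be applied cluster-wide until 11.5 is installed):

-- per session
set enable_parallel_hash = off;

-- or for the whole cluster
alter system set enable_parallel_hash = off;
select pg_reload_conf();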

            regards, tom lane



Not in a position to re-compile, no, but good to know that it's been fixed. I
turned off enable_parallel_hash and the original queries that led me to this
are now returning much more sane results, so that works for me until 11.5
comes out.

Thank you very much for the help.


David Raymond | Associate Quality Analyst - MPU Addressing | TomTom | Lebanon, NH, United States
e-mail: david.raymond@tomtom.com  | office +1 603 306 8498 | www.tomtom.com



-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us> 
Sent: Wednesday, July 24, 2019 1:07 PM
To: David Raymond <David.Raymond@tomtom.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field

David Raymond <David.Raymond@tomtom.com> writes:
> The cost tweaks let me slash the size right down to where the dump is
> 150 kB, which I've attached. Don't know if the mailing list will strip it,
> but you're also directly in the To: list.

Thanks for the test case!

However ... I fail to reproduce any problem on HEAD or v11 branch tip.
After perusing the commit log, I believe the reason why not is this
post-11.4 bug fix:

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=14d8b539d

which I'd forgotten about, but it seems to match your problem exactly.
That makes this a duplicate of bug #15857.

Don't know if you're in a position to build from source, but if so,
that's just about a one-line fix.  Otherwise, it looks like turning
off enable_parallel_hash would be the minimal workaround until 11.5
comes out next month.

            regards, tom lane