Обсуждение: [HACKERS] Problem in Parallel Bitmap Heap Scan?
Hi, I noticed a failure in the inet.sql test while running the regression tests with parallelism cranked up, and can reproduce it interactively as follows. After an spgist index is created and the plan changes to the one shown below, the query returns no rows. regression=# set force_parallel_mode = regress; SET regression=# set max_parallel_workers_per_gather = 2; SET regression=# set parallel_tuple_cost = 0; SET regression=# set parallel_setup_cost = 0; SET regression=# set min_parallel_table_scan_size = 0; SET regression=# set min_parallel_index_scan_size = 0; SET regression=# set enable_seqscan = off; SET regression=# SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i; c | i --------------------+------------------10.0.0.0/8 | 9.1.2.3/810.0.0.0/8 | 10.1.2.3/810.0.0.0/32 |10.1.2.3/810.0.0.0/8 | 10.1.2.3/810.1.0.0/16 | 10.1.2.3/1610.1.2.0/24 | 10.1.2.3/2410.1.2.3/32 | 10.1.2.310.0.0.0/8 | 11.1.2.3/8192.168.1.0/24 | 192.168.1.226/24192.168.1.0/24 | 192.168.1.255/24192.168.1.0/24 | 192.168.1.0/25192.168.1.0/24 | 192.168.1.255/25192.168.1.0/26 | 192.168.1.22610.0.0.0/8 | 10::/8::ffff:1.2.3.4/128 | ::4.3.2.1/2410:23::f1/128 | 10:23::f1/6410:23::8000/113 | 10:23::ffff (17 rows) regression=# CREATE INDEX inet_idx3 ON inet_tbl using spgist (i); CREATE INDEX regression=# SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i;c | i ---+--- (0 rows) regression=# explain SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i; QUERY PLAN -----------------------------------------------------------------------------------------Gather Merge (cost=16.57..16.67rows=10 width=64) Workers Planned: 1 -> Sort (cost=16.56..16.58 rows=10 width=64) Sort Key:i -> Parallel Bitmap Heap Scan on inet_tbl (cost=12.26..16.39 rows=10 width=64) Recheck Cond: (i <> '192.168.1.0/24'::inet) -> Bitmap Index Scan on inet_idx3 (cost=0.00..12.26 rows=17 width=0) Index Cond: (i <> '192.168.1.0/24'::inet) (8 rows) -- Thomas Munro http://www.enterprisedb.com
On Tue, Mar 21, 2017 at 4:47 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > I noticed a failure in the inet.sql test while running the regression > tests with parallelism cranked up, and can reproduce it interactively > as follows. After an spgist index is created and the plan changes to > the one shown below, the query returns no rows. Thanks for reporting. Seems like we are getting issues related to TBM_ONE_PAGE and TBM_EMPTY. I think in this area we need more testing, reason these are not tested properly because these are not the natural case for parallel bitmap. I think in next few days I will test more such cases by forcing the parallel bitmap. Here is the patch to fix the issue in hand. I have also run the regress suit with force_parallel_mode=regress and all the test are passing. Results after fix. postgres=# explain analyze SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Gather Merge (cost=16.53..16.62 rows=9 width=64) (actual time=4.467..4.478 rows=16 loops=1) Workers Planned: 1 Workers Launched: 1 -> Sort (cost=16.52..16.54 rows=9 width=64) (actual time=0.090..0.093 rows=8 loops=2) Sort Key: i Sort Method: quicksort Memory: 25kB -> Parallel Bitmap Heap Scan on inet_tbl (cost=12.26..16.37 rows=9 width=64) (actual time=0.048..0.050 rows=8 loops=2) Recheck Cond: (i <> '192.168.1.0/24'::inet) Heap Blocks: exact=1 -> Bitmap Index Scan on inet_idx3 (cost=0.00..12.25 rows=16 width=0) (actual time=0.016..0.016 rows=16 loops=1) Index Cond: (i <> '192.168.1.0/24'::inet) Planning time: 0.149 ms Execution time: 5.143 ms (13 rows) postgres=# SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i; c | i --------------------+------------------ 10.0.0.0/8 | 9.1.2.3/8 10.0.0.0/8 | 10.1.2.3/8 10.0.0.0/32 | 10.1.2.3/8 10.0.0.0/8 | 10.1.2.3/8 10.1.0.0/16 | 10.1.2.3/16 10.1.2.0/24 | 10.1.2.3/24 10.1.2.3/32 | 10.1.2.3 10.0.0.0/8 | 11.1.2.3/8 192.168.1.0/24 | 192.168.1.226/24 192.168.1.0/24 | 192.168.1.255/24 192.168.1.0/24 | 192.168.1.0/25 192.168.1.0/24 | 192.168.1.255/25 192.168.1.0/26 | 192.168.1.226 ::ffff:1.2.3.4/128 | ::4.3.2.1/24 10:23::f1/128 | 10:23::f1/64 10:23::8000/113 | 10:23::ffff (16 rows) -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Wed, Mar 22, 2017 at 1:21 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote: > postgres=# SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr > ORDER BY i; > c | i > --------------------+------------------ > 10.0.0.0/8 | 9.1.2.3/8 > 10.0.0.0/8 | 10.1.2.3/8 > 10.0.0.0/32 | 10.1.2.3/8 > 10.0.0.0/8 | 10.1.2.3/8 > 10.1.0.0/16 | 10.1.2.3/16 > 10.1.2.0/24 | 10.1.2.3/24 > 10.1.2.3/32 | 10.1.2.3 > 10.0.0.0/8 | 11.1.2.3/8 > 192.168.1.0/24 | 192.168.1.226/24 > 192.168.1.0/24 | 192.168.1.255/24 > 192.168.1.0/24 | 192.168.1.0/25 > 192.168.1.0/24 | 192.168.1.255/25 > 192.168.1.0/26 | 192.168.1.226 > ::ffff:1.2.3.4/128 | ::4.3.2.1/24 > 10:23::f1/128 | 10:23::f1/64 > 10:23::8000/113 | 10:23::ffff > (16 rows) Isn't that one row short? What happened to this one? 10.0.0.0/8 | 10::/8 -- Thomas Munro http://www.enterprisedb.com
On Wed, Mar 22, 2017 at 5:38 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Isn't that one row short? What happened to this one? > > 10.0.0.0/8 | 10::/8 Actually, In my last test I did not connect to regression database, I have simply taken table and the few rows from inet.sql so it was only 16 rows even with seqscan. Here are the updated results when I connect to regression database and re-test. regression=# SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i; c | i --------------------+------------------10.0.0.0/8 | 9.1.2.3/810.0.0.0/8 | 10.1.2.3/810.0.0.0/32 |10.1.2.3/810.0.0.0/8 | 10.1.2.3/810.1.0.0/16 | 10.1.2.3/1610.1.2.0/24 | 10.1.2.3/2410.1.2.3/32 | 10.1.2.310.0.0.0/8 | 11.1.2.3/8192.168.1.0/24 | 192.168.1.226/24192.168.1.0/24 | 192.168.1.255/24192.168.1.0/24 | 192.168.1.0/25192.168.1.0/24 | 192.168.1.255/25192.168.1.0/26 | 192.168.1.22610.0.0.0/8 | 10::/8::ffff:1.2.3.4/128 | ::4.3.2.1/2410:23::f1/128 | 10:23::f1/6410:23::8000/113 | 10:23::ffff (17 rows) regression=# explain analyze SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i; QUERY PLAN -----------------------------------------------------------------------------------------------------------------------------------Gather Merge (cost=16.57..16.67 rows=10 width=64) (actual time=4.972..4.983 rows=17 loops=1) Workers Planned: 1 Workers Launched: 1 -> Sort (cost=16.56..16.58 rows=10 width=64)(actual time=0.107..0.110 rows=8 loops=2) Sort Key: i Sort Method: quicksort Memory: 26kB -> Parallel BitmapHeap Scan on inet_tbl (cost=12.26..16.39 rows=10 width=64) (actual time=0.051..0.053 rows=8 loops=2) Recheck Cond: (i <> '192.168.1.0/24'::inet) Heap Blocks: exact=1 -> Bitmap Index Scan on inet_idx3 (cost=0.00..12.26 rows=17 width=0) (actual time=0.016..0.016 rows=17 loops=1) Index Cond: (i <> '192.168.1.0/24'::inet)Planningtime: 0.113 msExecution time: 5.691 ms (13 rows) -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 21, 2017 at 5:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote: > On Tue, Mar 21, 2017 at 4:47 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> I noticed a failure in the inet.sql test while running the regression >> tests with parallelism cranked up, and can reproduce it interactively >> as follows. After an spgist index is created and the plan changes to >> the one shown below, the query returns no rows. > > Thanks for reporting. Seems like we are getting issues related to > TBM_ONE_PAGE and TBM_EMPTY. > > I think in this area we need more testing, reason these are not tested > properly because these are not the natural case for parallel bitmap. > I think in next few days I will test more such cases by forcing the > parallel bitmap. > Okay, is your testing complete? > Here is the patch to fix the issue in hand. I have also run the > regress suit with force_parallel_mode=regress and all the test are > passing. > Thomas, did you get chance to verify Dilip's latest patch? I have added this issue in PostgreSQL 10 Open Items list so that we don't loose track of this issue. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Mar 25, 2017 at 6:04 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Tue, Mar 21, 2017 at 5:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote: >> On Tue, Mar 21, 2017 at 4:47 PM, Thomas Munro >> <thomas.munro@enterprisedb.com> wrote: >>> I noticed a failure in the inet.sql test while running the regression >>> tests with parallelism cranked up, and can reproduce it interactively >>> as follows. After an spgist index is created and the plan changes to >>> the one shown below, the query returns no rows. >> >> Thanks for reporting. Seems like we are getting issues related to >> TBM_ONE_PAGE and TBM_EMPTY. >> >> I think in this area we need more testing, reason these are not tested >> properly because these are not the natural case for parallel bitmap. >> I think in next few days I will test more such cases by forcing the >> parallel bitmap. >> > > Okay, is your testing complete? > >> Here is the patch to fix the issue in hand. I have also run the >> regress suit with force_parallel_mode=regress and all the test are >> passing. >> > > Thomas, did you get chance to verify Dilip's latest patch? > > I have added this issue in PostgreSQL 10 Open Items list so that we > don't loose track of this issue. The result is correct with this patch. I ran make installcheck then the same steps as above and the query result was correct after creating the index. -- Thomas Munro http://www.enterprisedb.com
On Sat, Mar 25, 2017 at 2:25 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: >>> I think in this area we need more testing, reason these are not tested >>> properly because these are not the natural case for parallel bitmap. >>> I think in next few days I will test more such cases by forcing the >>> parallel bitmap. >>> >> >> Okay, is your testing complete? Yes, I have done more testing around this area with more cases, like one page with BitmapOr etc. Now it looks fine to me. >> >>> Here is the patch to fix the issue in hand. I have also run the >>> regress suit with force_parallel_mode=regress and all the test are >>> passing. >>> >> >> Thomas, did you get chance to verify Dilip's latest patch? >> >> I have added this issue in PostgreSQL 10 Open Items list so that we >> don't loose track of this issue. > > The result is correct with this patch. I ran make installcheck then > the same steps as above and the query result was correct after > creating the index. Thanks for confirming. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Sat, Mar 25, 2017 at 09:55:21PM +1300, Thomas Munro wrote: > On Sat, Mar 25, 2017 at 6:04 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Mar 21, 2017 at 5:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote: > >> On Tue, Mar 21, 2017 at 4:47 PM, Thomas Munro > >> <thomas.munro@enterprisedb.com> wrote: > >>> I noticed a failure in the inet.sql test while running the regression > >>> tests with parallelism cranked up, and can reproduce it interactively > >>> as follows. After an spgist index is created and the plan changes to > >>> the one shown below, the query returns no rows. > >> > >> Thanks for reporting. Seems like we are getting issues related to > >> TBM_ONE_PAGE and TBM_EMPTY. > >> > >> I think in this area we need more testing, reason these are not tested > >> properly because these are not the natural case for parallel bitmap. > >> I think in next few days I will test more such cases by forcing the > >> parallel bitmap. > >> > > > > Okay, is your testing complete? > > > >> Here is the patch to fix the issue in hand. I have also run the > >> regress suit with force_parallel_mode=regress and all the test are > >> passing. > >> > > > > Thomas, did you get chance to verify Dilip's latest patch? > > > > I have added this issue in PostgreSQL 10 Open Items list so that we > > don't loose track of this issue. > > The result is correct with this patch. I ran make installcheck then > the same steps as above and the query result was correct after > creating the index. [Action required within three days. This is a generic notification.] The above-described topic is currently a PostgreSQL 10 open item. Robert, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a v10 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within three calendar days of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping v10. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
On Sun, Apr 9, 2017 at 11:17 PM, Noah Misch <noah@leadboat.com> wrote: > The above-described topic is currently a PostgreSQL 10 open item. I have committed the patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company