Обсуждение: Unexplainable execution time difference between two testfunctions...one using IF (SELECT COUNT(*) FROM...) and the other using IFEXISTS (SELECT 1 FROM...)

Поиск
Список
Период
Сортировка
A description of what you are trying to achieve and what results you expect.:

My end goal was to test the execution time difference between using an IF(SELECT COUNT(*)...) and an IF EXISTS() when no indexes were used and when a string match was not found.  My expectation was that my 2 functions would behave fairly similarly, but they most certainly did not.  Here are the table, functions, test queries, and test query results I received, as well as comments as I present the pieces and talk about the results from my perspective.
This is the table and data that I used for my tests.  A table with 1 million sequenced records.  No indexing on any columns.  I ran ANALYZE on this table and a VACUUM on the entire database, just to be sure.
CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 999999) AS int_distinct, 'Test'::text || generate_series(0, 999999)::text AS text_distinct;
These are the 2 functions that I ran my final tests with.  My goal was to determine which function would perform the fastest and my expectation was that they would still be somewhat close in execution time comparison.
--Test Function #1
CREATE OR REPLACE FUNCTION zz_spx_ifcount_noidx(p_findme text)
 RETURNS text
 LANGUAGE 'plpgsql'
 STABLE
AS $$
 
BEGIN
 IF (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0 THEN
  RETURN 'Found';
 ELSE
  RETURN 'Not Found';
 END IF;
END;
$$;
--Test Function #2
CREATE OR REPLACE FUNCTION zz_spx_ifexists_noidx(p_findme text)
 RETURNS text
 LANGUAGE 'plpgsql'
 STABLE
AS $$
 
BEGIN
 IF EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) THEN
  RETURN 'Found';
 ELSE
  RETURN 'Not Found';
 END IF;
END;
$$;
The first thing I did was to run some baseline tests using the basic queries inside of the IF() checks found in each of the functions to see how the query planner handled them.  I ran the following two queries.
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001');
EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001');
The execution time results and query plans for these two were very similar, as expected.  In the results I can see that 2 workers were employed for each query plan.
--Results for the SELECT COUNT(*) query.
QUERY PLAN                                                                                                                             
----------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate  (cost=12661.42..12661.43 rows=1 width=8) (actual time=172.105..172.105 rows=1 loops=1)                             
  Buffers: shared read=1912                                                                                                            
  ->  Gather  (cost=12661.21..12661.42 rows=2 width=8) (actual time=172.020..172.099 rows=3 loops=1)                                   
  Workers Planned: 2                                                                                                             
  Workers Launched: 2                                                                                                            
  Buffers: shared read=1912                                                                                                      
  ->  Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual time=155.123..155.123 rows=1 loops=3)                  
     Buffers: shared read=5406                                                                                                
     ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=0) (actual time=155.103..155.103 rows=0 loops=3)
     Filter: (lower(text_distinct) = 'test5000001'::text)                                                               
     Rows Removed by Filter: 333333                                                                                     
     Buffers: shared read=5406                                                                                          
Planning time: 0.718 ms                                                                                                                
Execution time: 187.601 ms
--Results for the SELECT 1 query.
QUERY PLAN                                                                                                                 
----------------------------------------------------------------------------------------------------------------------------
Gather  (cost=1000.00..13156.00 rows=5000 width=4) (actual time=175.682..175.682 rows=0 loops=1)                           
  Workers Planned: 2                                                                                                       
  Workers Launched: 2                                                                                                      
  Buffers: shared read=2021                                                                                                
  ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=4) (actual time=159.769..159.769 rows=0 loops=3)
  Filter: (lower(text_distinct) = 'test5000001'::text)                                                               
  Rows Removed by Filter: 333333                                                                                     
  Buffers: shared read=5406                                                                                          
Planning time: 0.874 ms                                                                                                    
Execution time: 192.045 ms  
After running these baseline tests and viewing the fairly similar results, right or wrong, I expected my queries that tested the functions to behave similarly.  I started with the following query...
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM zz_spx_ifcount_noidx('Test5000001');
and I got the following "auto_explain" results...
2018-04-16 14:57:22.624 EDT [17812] LOG:  duration: 155.239 ms  plan:
 Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0
 Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual time=155.230..155.230 rows=1 loops=1)
   Buffers: shared read=1682
   ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=0) (actual time=155.222..155.222 rows=0 loops=1)
   Filter: (lower(text_distinct) = 'test5000001'::text)
   Rows Removed by Filter: 311170
   Buffers: shared read=1682
2018-04-16 14:57:22.624 EDT [9096] LOG:  duration: 154.603 ms  plan:
 Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0
 Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual time=154.576..154.576 rows=1 loops=1)
   Buffers: shared read=1682
   ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=0) (actual time=154.570..154.570 rows=0 loops=1)
   Filter: (lower(text_distinct) = 'test5000001'::text)
   Rows Removed by Filter: 311061
   Buffers: shared read=1682
2018-04-16 14:57:22.642 EDT [15132] LOG:  duration: 197.260 ms  plan:
 Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0
 Result  (cost=12661.43..12661.45 rows=1 width=1) (actual time=179.561..179.561 rows=1 loops=1)
   Buffers: shared read=2042
   InitPlan 1 (returns $1)
  ->  Finalize Aggregate  (cost=12661.42..12661.43 rows=1 width=8) (actual time=179.559..179.559 rows=1 loops=1)
     Buffers: shared read=2042
     ->  Gather  (cost=12661.21..12661.42 rows=2 width=8) (actual time=179.529..179.556 rows=3 loops=1)
     Workers Planned: 2
     Workers Launched: 2
     Buffers: shared read=2042
     ->  Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual time=162.831..162.831 rows=1 loops=3)
        Buffers: shared read=5406
        ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=0) (actual time=162.824..162.824 rows=0 loops=3)
        Filter: (lower(text_distinct) = 'test5000001'::text)
        Rows Removed by Filter: 333333
        Buffers: shared read=5406
2018-04-16 14:57:22.642 EDT [15132] CONTEXT:  SQL statement "SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0"
 PL/pgSQL function zz_spx_ifcount_noidx(text) line 4 at IF
2018-04-16 14:57:22.642 EDT [15132] LOG:  duration: 199.371 ms  plan:
 Query Text: explain (analyze, buffers) select * from zz_spx_ifcount_noidx('Test5000001')
 Function Scan on zz_spx_ifcount_noidx  (cost=0.25..0.26 rows=1 width=32) (actual time=199.370..199.370 rows=1 loops=1)
   Buffers: shared hit=218 read=5446
Here I could see that the 2 workers were getting employed again, which is great.  Just what I expected.  And the execution time was in the same ballpark as my first baseline test using just the query found inside of the IF() check.  199 milliseonds.  Okay.

I moved on to test the other function with the following query...
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM zz_spx_ifcount_noidx('Test5000001');
and I got the following "auto_explain" results...
2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 426.279 ms  plan:
 Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
 Result  (cost=4.08..4.09 rows=1 width=1) (actual time=426.274..426.274 rows=1 loops=1)
   Buffers: shared read=5406
   InitPlan 1 (returns $0)
  ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00 rows=5000 width=0) (actual time=426.273..426.273 rows=0 loops=1)
     Filter: (lower(text_distinct) = 'test5000001'::text)
     Rows Removed by Filter: 1000000
     Buffers: shared read=5406
2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
 PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 428.077 ms  plan:
 Query Text: explain (analyze, buffers) select * from zz_spx_ifexists_noidx('Test5000001')
 Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26 rows=1 width=32) (actual time=428.076..428.076 rows=1 loops=1)
   Buffers: shared hit=30 read=5438
Definitely not the execution time, or query plan, results I was expecting.  As we can see, no workers were employed here and my guess was that this was the reason or the large execution time difference between these 2 tests?  199 milliseconds versus 428 milliseconds, which is a big difference.  Why are workers not being employed here like they were when I tested the query found inside of the IF() check in a standalone manner?  But then I ran another test and the results made even less sense to me.

When I ran the above query the first 5 times after starting my Postgres service, I got the same results each time (around 428 milliseconds), but when running the query 6 or more times, the execution time jumps up to almost double that.  Here are the "auto_explain" results running this query a 6th time...
--"auto_explain" results after running the same query 6 or more times.
2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 761.847 ms  plan:
 Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
 Result  (cost=4.58..4.59 rows=1 width=1) (actual time=761.843..761.843 rows=1 loops=1)
   Buffers: shared hit=160 read=5246
   InitPlan 1 (returns $0)
  ->  Seq Scan on zz_noidx1  (cost=0.00..22906.00 rows=5000 width=0) (actual time=761.841..761.841 rows=0 loops=1)
     Filter: (lower(text_distinct) = lower($1))
     Rows Removed by Filter: 1000000
     Buffers: shared hit=160 read=5246
2018-04-16 15:01:51.635 EDT [12616] CONTEXT:  SQL statement "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
 PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 762.156 ms  plan:
 Query Text: explain (analyze, buffers) select * from zz_spx_ifexists_noidx('Test5000001')
 Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26 rows=1 width=32) (actual time=762.154..762.155 rows=1 loops=1)
   Buffers: shared hit=160 read=5246
As you can see, the execution time jumps up to about 762 milliseonds.  I can see in the sequence scan node that the LOWER() function shows up on the right side of the equal operator, whereas in the first 5 runs of this test query the plan did not show this.  Why is this?

I tried increasing the "work_mem" setting to 1GB to see if this made any difference, but the results were the same.

So those were the tests that I performed and the results I received, which left me with many questions.  If anyone is able to help me understand this behavior, I'd greatly appreciate it.  This is my first post to the email list, so I hope I did a good enough job providing all the information needed.

Thanks!
Ryan
PostgreSQL version number you are running:
PostgreSQL 10.2, compiled by Visual C++ build 1800, 64-bit
How you installed PostgreSQL:
Using the Enterprise DB installer.
I have also installed Enterprise DB's Postgres Enterprise Manager (PEM) 7.2.0 software and Enterprise DB's SQL Profiler PG10-7.2.0 software.  The PEM Agent service that gets installed is currently turned off.
Changes made to the settings in the postgresql.conf file:  see Server Configuration for a quick way to list them all.
name                               |current_setting                        |source              
-----------------------------------|---------------------------------------|---------------------
application_name                   |DBeaver 5.0.3 - Main                   |session             
auto_explain.log_analyze           |on                                     |configuration file  
auto_explain.log_buffers           |on                                     |configuration file  
auto_explain.log_min_duration      |0                                      |configuration file  
auto_explain.log_nested_statements |on                                     |configuration file  
auto_explain.log_triggers          |on                                     |configuration file  
client_encoding                    |UTF8                                   |client              
DateStyle                          |ISO, MDY                               |client              
default_text_search_config         |pg_catalog.english                     |configuration file  
dynamic_shared_memory_type         |windows                                |configuration file  
extra_float_digits                 |3                                      |session             
lc_messages                        |English_United States.1252             |configuration file  
lc_monetary                        |English_United States.1252             |configuration file  
lc_numeric                         |English_United States.1252             |configuration file  
lc_time                            |English_United States.1252             |configuration file  
listen_addresses                   |*                                      |configuration file  
log_destination                    |stderr                                 |configuration file  
log_timezone                       |US/Eastern                             |configuration file  
logging_collector                  |on                                     |configuration file  
max_connections                    |100                                    |configuration file  
max_stack_depth                    |2MB                                    |environment variable
port                               |5432                                   |configuration file  
shared_buffers                     |128MB                                  |configuration file  
shared_preload_libraries           |$libdir/sql-profiler.dll, auto_explain |configuration file  
ssl                                |on                                     |configuration file  
ssl_ca_file                        |root.crt                               |configuration file  
ssl_cert_file                      |server.crt                             |configuration file  
ssl_crl_file                       |root.crl                               |configuration file  
ssl_key_file                       |server.key                             |configuration file  
TimeZone                           |America/New_York                       |client               
Operating system and version:
Windows 10 Pro 64-bit, Version 1709 (Build 16299.309)
Hardware:
Processor - Intel Core i7-7820HQ @ 2.90GHz
RAM - 16GB
RAID? - No
Hard Drive - Samsung 512 GB SSD M.2 PCIe NVMe Opal2
What program you're using to connect to PostgreSQL:
DBeaver Community Edition v5.0.3
Is there anything relevant or unusual in the PostgreSQL server logs?:
Not that I noticed.
For questions about any kind of error:
N/A
What you were doing when the error happened / how to cause the error:
N/A
The EXACT TEXT of the error message you're getting, if there is one: (Copy and paste the message to the email, do not send a screenshot)
N/A
Hi

2018-04-16 22:42 GMT+02:00 Hackety Man <hacketyman@gmail.com>:
A description of what you are trying to achieve and what results you expect.:

My end goal was to test the execution time difference between using an IF(SELECT COUNT(*)...) and an IF EXISTS() when no indexes were used and when a string match was not found.  My expectation was that my 2 functions would behave fairly similarly, but they most certainly did not.  Here are the table, functions, test queries, and test query results I received, as well as comments as I present the pieces and talk about the results from my perspective.
This is the table and data that I used for my tests.  A table with 1 million sequenced records.  No indexing on any columns.  I ran ANALYZE on this table and a VACUUM on the entire database, just to be sure.
CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 999999) AS int_distinct, 'Test'::text || generate_series(0, 999999)::text AS text_distinct;
These are the 2 functions that I ran my final tests with.  My goal was to determine which function would perform the fastest and my expectation was that they would still be somewhat close in execution time comparison.
--Test Function #1
CREATE OR REPLACE FUNCTION zz_spx_ifcount_noidx(p_findme text)
 RETURNS text
 LANGUAGE 'plpgsql'
 STABLE
AS $$
 
BEGIN
 IF (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0 THEN
  RETURN 'Found';
 ELSE
  RETURN 'Not Found';
 END IF;
END;
$$;
--Test Function #2
CREATE OR REPLACE FUNCTION zz_spx_ifexists_noidx(p_findme text)
 RETURNS text
 LANGUAGE 'plpgsql'
 STABLE
AS $$
 
BEGIN
 IF EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) THEN
  RETURN 'Found';
 ELSE
  RETURN 'Not Found';
 END IF;
END;
$$;
The first thing I did was to run some baseline tests using the basic queries inside of the IF() checks found in each of the functions to see how the query planner handled them.  I ran the following two queries.
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001');
EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001');
The execution time results and query plans for these two were very similar, as expected.  In the results I can see that 2 workers were employed for each query plan.
--Results for the SELECT COUNT(*) query.
QUERY PLAN                                                                                                                             
----------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate  (cost=12661.42..12661.43 rows=1 width=8) (actual time=172.105..172.105 rows=1 loops=1)                             
  Buffers: shared read=1912                                                                                                            
  ->  Gather  (cost=12661.21..12661.42 rows=2 width=8) (actual time=172.020..172.099 rows=3 loops=1)                                   
  Workers Planned: 2                                                                                                             
  Workers Launched: 2                                                                                                            
  Buffers: shared read=1912                                                                                                      
  ->  Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual time=155.123..155.123 rows=1 loops=3)                  
     Buffers: shared read=5406                                                                                                
     ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=0) (actual time=155.103..155.103 rows=0 loops=3)
     Filter: (lower(text_distinct) = 'test5000001'::text)                                                               
     Rows Removed by Filter: 333333                                                                                     
     Buffers: shared read=5406                                                                                          
Planning time: 0.718 ms                                                                                                                
Execution time: 187.601 ms
--Results for the SELECT 1 query.
QUERY PLAN                                                                                                                 
----------------------------------------------------------------------------------------------------------------------------
Gather  (cost=1000.00..13156.00 rows=5000 width=4) (actual time=175.682..175.682 rows=0 loops=1)                           
  Workers Planned: 2                                                                                                       
  Workers Launched: 2                                                                                                      
  Buffers: shared read=2021                                                                                                
  ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=4) (actual time=159.769..159.769 rows=0 loops=3)
  Filter: (lower(text_distinct) = 'test5000001'::text)                                                               
  Rows Removed by Filter: 333333                                                                                     
  Buffers: shared read=5406                                                                                          
Planning time: 0.874 ms                                                                                                    
Execution time: 192.045 ms  
After running these baseline tests and viewing the fairly similar results, right or wrong, I expected my queries that tested the functions to behave similarly.  I started with the following query...
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM zz_spx_ifcount_noidx('Test5000001');
and I got the following "auto_explain" results...
2018-04-16 14:57:22.624 EDT [17812] LOG:  duration: 155.239 ms  plan:
 Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0
 Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual time=155.230..155.230 rows=1 loops=1)
   Buffers: shared read=1682
   ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=0) (actual time=155.222..155.222 rows=0 loops=1)
   Filter: (lower(text_distinct) = 'test5000001'::text)
   Rows Removed by Filter: 311170
   Buffers: shared read=1682
2018-04-16 14:57:22.624 EDT [9096] LOG:  duration: 154.603 ms  plan:
 Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0
 Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual time=154.576..154.576 rows=1 loops=1)
   Buffers: shared read=1682
   ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=0) (actual time=154.570..154.570 rows=0 loops=1)
   Filter: (lower(text_distinct) = 'test5000001'::text)
   Rows Removed by Filter: 311061
   Buffers: shared read=1682
2018-04-16 14:57:22.642 EDT [15132] LOG:  duration: 197.260 ms  plan:
 Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0
 Result  (cost=12661.43..12661.45 rows=1 width=1) (actual time=179.561..179.561 rows=1 loops=1)
   Buffers: shared read=2042
   InitPlan 1 (returns $1)
  ->  Finalize Aggregate  (cost=12661.42..12661.43 rows=1 width=8) (actual time=179.559..179.559 rows=1 loops=1)
     Buffers: shared read=2042
     ->  Gather  (cost=12661.21..12661.42 rows=2 width=8) (actual time=179.529..179.556 rows=3 loops=1)
     Workers Planned: 2
     Workers Launched: 2
     Buffers: shared read=2042
     ->  Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual time=162.831..162.831 rows=1 loops=3)
        Buffers: shared read=5406
        ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=0) (actual time=162.824..162.824 rows=0 loops=3)
        Filter: (lower(text_distinct) = 'test5000001'::text)
        Rows Removed by Filter: 333333
        Buffers: shared read=5406
2018-04-16 14:57:22.642 EDT [15132] CONTEXT:  SQL statement "SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0"
 PL/pgSQL function zz_spx_ifcount_noidx(text) line 4 at IF
2018-04-16 14:57:22.642 EDT [15132] LOG:  duration: 199.371 ms  plan:
 Query Text: explain (analyze, buffers) select * from zz_spx_ifcount_noidx('Test5000001')
 Function Scan on zz_spx_ifcount_noidx  (cost=0.25..0.26 rows=1 width=32) (actual time=199.370..199.370 rows=1 loops=1)
   Buffers: shared hit=218 read=5446
Here I could see that the 2 workers were getting employed again, which is great.  Just what I expected.  And the execution time was in the same ballpark as my first baseline test using just the query found inside of the IF() check.  199 milliseonds.  Okay.

I moved on to test the other function with the following query...
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM zz_spx_ifcount_noidx('Test5000001');
and I got the following "auto_explain" results...
2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 426.279 ms  plan:
 Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
 Result  (cost=4.08..4.09 rows=1 width=1) (actual time=426.274..426.274 rows=1 loops=1)
   Buffers: shared read=5406
   InitPlan 1 (returns $0)
  ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00 rows=5000 width=0) (actual time=426.273..426.273 rows=0 loops=1)
     Filter: (lower(text_distinct) = 'test5000001'::text)
     Rows Removed by Filter: 1000000
     Buffers: shared read=5406
2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
 PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 428.077 ms  plan:
 Query Text: explain (analyze, buffers) select * from zz_spx_ifexists_noidx('Test5000001')
 Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26 rows=1 width=32) (actual time=428.076..428.076 rows=1 loops=1)
   Buffers: shared hit=30 read=5438
Definitely not the execution time, or query plan, results I was expecting.  As we can see, no workers were employed here and my guess was that this was the reason or the large execution time difference between these 2 tests?  199 milliseconds versus 428 milliseconds, which is a big difference.  Why are workers not being employed here like they were when I tested the query found inside of the IF() check in a standalone manner?  But then I ran another test and the results made even less sense to me.

When I ran the above query the first 5 times after starting my Postgres service, I got the same results each time (around 428 milliseconds), but when running the query 6 or more times, the execution time jumps up to almost double that.  Here are the "auto_explain" results running this query a 6th time...
--"auto_explain" results after running the same query 6 or more times.
2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 761.847 ms  plan:
 Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
 Result  (cost=4.58..4.59 rows=1 width=1) (actual time=761.843..761.843 rows=1 loops=1)
   Buffers: shared hit=160 read=5246
   InitPlan 1 (returns $0)
  ->  Seq Scan on zz_noidx1  (cost=0.00..22906.00 rows=5000 width=0) (actual time=761.841..761.841 rows=0 loops=1)
     Filter: (lower(text_distinct) = lower($1))
     Rows Removed by Filter: 1000000
     Buffers: shared hit=160 read=5246
2018-04-16 15:01:51.635 EDT [12616] CONTEXT:  SQL statement "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
 PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 762.156 ms  plan:
 Query Text: explain (analyze, buffers) select * from zz_spx_ifexists_noidx('Test5000001')
 Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26 rows=1 width=32) (actual time=762.154..762.155 rows=1 loops=1)
   Buffers: shared hit=160 read=5246
As you can see, the execution time jumps up to about 762 milliseonds.  I can see in the sequence scan node that the LOWER() function shows up on the right side of the equal operator, whereas in the first 5 runs of this test query the plan did not show this.  Why is this?

I tried increasing the "work_mem" setting to 1GB to see if this made any difference, but the results were the same.

So those were the tests that I performed and the results I received, which left me with many questions.  If anyone is able to help me understand this behavior, I'd greatly appreciate it.  This is my first post to the email list, so I hope I did a good enough job providing all the information needed.

Thanks!
Ryan
PostgreSQL version number you are running:
PostgreSQL 10.2, compiled by Visual C++ build 1800, 64-bit
How you installed PostgreSQL:
Using the Enterprise DB installer.
I have also installed Enterprise DB's Postgres Enterprise Manager (PEM) 7.2.0 software and Enterprise DB's SQL Profiler PG10-7.2.0 software.  The PEM Agent service that gets installed is currently turned off.
Changes made to the settings in the postgresql.conf file:  see Server Configuration for a quick way to list them all.
name                               |current_setting                        |source              
-----------------------------------|---------------------------------------|---------------------
application_name                   |DBeaver 5.0.3 - Main                   |session             
auto_explain.log_analyze           |on                                     |configuration file  
auto_explain.log_buffers           |on                                     |configuration file  
auto_explain.log_min_duration      |0                                      |configuration file  
auto_explain.log_nested_statements |on                                     |configuration file  
auto_explain.log_triggers          |on                                     |configuration file  
client_encoding                    |UTF8                                   |client              
DateStyle                          |ISO, MDY                               |client              
default_text_search_config         |pg_catalog.english                     |configuration file  
dynamic_shared_memory_type         |windows                                |configuration file  
extra_float_digits                 |3                                      |session             
lc_messages                        |English_United States.1252             |configuration file  
lc_monetary                        |English_United States.1252             |configuration file  
lc_numeric                         |English_United States.1252             |configuration file  
lc_time                            |English_United States.1252             |configuration file  
listen_addresses                   |*                                      |configuration file  
log_destination                    |stderr                                 |configuration file  
log_timezone                       |US/Eastern                             |configuration file  
logging_collector                  |on                                     |configuration file  
max_connections                    |100                                    |configuration file  
max_stack_depth                    |2MB                                    |environment variable
port                               |5432                                   |configuration file  
shared_buffers                     |128MB                                  |configuration file  
shared_preload_libraries           |$libdir/sql-profiler.dll, auto_explain |configuration file  
ssl                                |on                                     |configuration file  
ssl_ca_file                        |root.crt                               |configuration file  
ssl_cert_file                      |server.crt                             |configuration file  
ssl_crl_file                       |root.crl                               |configuration file  
ssl_key_file                       |server.key                             |configuration file  
TimeZone                           |America/New_York                       |client               
Operating system and version:
Windows 10 Pro 64-bit, Version 1709 (Build 16299.309)
Hardware:
Processor - Intel Core i7-7820HQ @ 2.90GHz
RAM - 16GB
RAID? - No
Hard Drive - Samsung 512 GB SSD M.2 PCIe NVMe Opal2
What program you're using to connect to PostgreSQL:
DBeaver Community Edition v5.0.3
Is there anything relevant or unusual in the PostgreSQL server logs?:
Not that I noticed.
For questions about any kind of error:
N/A
What you were doing when the error happened / how to cause the error:
N/A
The EXACT TEXT of the error message you're getting, if there is one: (Copy and paste the message to the email, do not send a screenshot)
N/A

A support of parallel query execution is not complete -  it doesn't work in PostgreSQL 11 too. So although EXISTS variant can be faster (but can be - the worst case of EXISTS is same like COUNT), then due disabled parallel execution the COUNT(*) is faster now. It is unfortunate, because I believe so this issue will be fixed in few years.

Regards

Pavel

On 04/16/2018 10:42 PM, Hackety Man wrote:
> ...
>     The first thing I did was to run some baseline tests using the basic
>     queries inside of the IF() checks found in each of the functions to
>     see how the query planner handled them.  I ran the following two
>     queries.
> 
>         EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE
>         LOWER(text_distinct) = LOWER('Test5000001');
>         EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
>         LOWER(text_distinct) = LOWER('Test5000001');
> 

Those are not the interesting plans, though. The EXISTS only cares about 
the first row, so you should be looking at

     EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
     LOWER(text_distinct) = LOWER('Test5000001') LIMIT 1;

>     I moved on to test the other function with the following query...
> 
>         EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM
>         zz_spx_ifcount_noidx('Test5000001');
> 
>     and I got the following "auto_explain" results...
> 
>         2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 426.279 ms 
>         plan:
>           Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>         LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>           Result  (cost=4.08..4.09 rows=1 width=1) (actual
>         time=426.274..426.274 rows=1 loops=1)
>             Buffers: shared read=5406
>             InitPlan 1 (returns $0)
>            ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00 rows=5000
>         width=0) (actual time=426.273..426.273 rows=0 loops=1)
>               Filter: (lower(text_distinct) = 'test5000001'::text)
>               Rows Removed by Filter: 1000000
>               Buffers: shared read=5406
>         2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement
>         "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>         LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
>           PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
>         2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 428.077 ms 
>         plan:
>           Query Text: explain (analyze, buffers) select * from
>         zz_spx_ifexists_noidx('Test5000001')
>           Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
>         rows=1 width=32) (actual time=428.076..428.076 rows=1 loops=1)
>             Buffers: shared hit=30 read=5438
> 
>     Definitely not the execution time, or query plan, results I was
>     expecting.  As we can see, no workers were employed here and my
>     guess was that this was the reason or the large execution time
>     difference between these 2 tests?  199 milliseconds versus 428
>     milliseconds, which is a big difference.  Why are workers not being
>     employed here like they were when I tested the query found inside of
>     the IF() check in a standalone manner?  But then I ran another test
>     and the results made even less sense to me.
> 

The plan difference is due to not realizing the EXISTS essentially 
implies LIMIT 1. Secondly, it expects about 5000 rows matching the 
condition,  uniformly spread through the table. But it apparently takes 
much longer to find the first one, hence the increased duration.

How did you generate the data?

>     When I ran the above query the first 5 times after starting my
>     Postgres service, I got the same results each time (around 428
>     milliseconds), but when running the query 6 or more times, the
>     execution time jumps up to almost double that.  Here are the
>     "auto_explain" results running this query a 6th time...
> 

This is likely due to generating a generic plan after the fifth 
execution. There seems to be only small difference in costs, though.

>         --"auto_explain" results after running the same query 6 or more
>         times.
>         2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 761.847 ms 
>         plan:
>           Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>         LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>           Result  (cost=4.58..4.59 rows=1 width=1) (actual
>         time=761.843..761.843 rows=1 loops=1)
>             Buffers: shared hit=160 read=5246
>             InitPlan 1 (returns $0)
>            ->  Seq Scan on zz_noidx1  (cost=0.00..22906.00 rows=5000
>         width=0) (actual time=761.841..761.841 rows=0 loops=1)
>               Filter: (lower(text_distinct) = lower($1))
>               Rows Removed by Filter: 1000000
>               Buffers: shared hit=160 read=5246
>         2018-04-16 15:01:51.635 EDT [12616] CONTEXT:  SQL statement
>         "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>         LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
>           PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
>         2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 762.156 ms 
>         plan:
>           Query Text: explain (analyze, buffers) select * from
>         zz_spx_ifexists_noidx('Test5000001')
>           Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
>         rows=1 width=32) (actual time=762.154..762.155 rows=1 loops=1)
>             Buffers: shared hit=160 read=5246
> 
>     As you can see, the execution time jumps up to about 762
>     milliseonds.  I can see in the sequence scan node that the LOWER()
>     function shows up on the right side of the equal operator, whereas
>     in the first 5 runs of this test query the plan did not show this. 
>     Why is this?
> 

It doesn't really matter on which side it shows, it's more about a 
generic plan built without knowledge of the parameter value.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On 04/17/2018 07:17 AM, Pavel Stehule wrote:
> Hi
> 
> 2018-04-16 22:42 GMT+02:00 Hackety Man <hacketyman@gmail.com 
> <mailto:hacketyman@gmail.com>>:
> 
> ...
 >
> A support of parallel query execution is not complete -  it doesn't work 
> in PostgreSQL 11 too. So although EXISTS variant can be faster (but can 
> be - the worst case of EXISTS is same like COUNT), then due disabled 
> parallel execution the COUNT(*) is faster now. It is unfortunate, 
> because I believe so this issue will be fixed in few years.
> 

None of the issues seems to be particularly related to parallel query. 
It's much more likely a general issue with planning EXISTS / LIMIT and 
non-uniform data distribution.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




On Tue, Apr 17, 2018 at 6:49 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


On 04/16/2018 10:42 PM, Hackety Man wrote:
...
    The first thing I did was to run some baseline tests using the basic
    queries inside of the IF() checks found in each of the functions to
    see how the query planner handled them.  I ran the following two
    queries.

        EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE
        LOWER(text_distinct) = LOWER('Test5000001');
        EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
        LOWER(text_distinct) = LOWER('Test5000001');


Those are not the interesting plans, though. The EXISTS only cares about the first row, so you should be looking at

    EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
    LOWER(text_distinct) = LOWER('Test5000001') LIMIT 1;


Okay.  I tested this query and it did return an execution time on par with my tests of the "zz_spx_ifexists_noidx" function.

 


    I moved on to test the other function with the following query...

        EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM
        zz_spx_ifcount_noidx('Test5000001');

    and I got the following "auto_explain" results...

        2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 426.279 ms         plan:
          Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
        LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
          Result  (cost=4.08..4.09 rows=1 width=1) (actual
        time=426.274..426.274 rows=1 loops=1)
            Buffers: shared read=5406
            InitPlan 1 (returns $0)
           ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00 rows=5000
        width=0) (actual time=426.273..426.273 rows=0 loops=1)
              Filter: (lower(text_distinct) = 'test5000001'::text)
              Rows Removed by Filter: 1000000
              Buffers: shared read=5406
        2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement
        "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
        LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
          PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
        2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 428.077 ms         plan:
          Query Text: explain (analyze, buffers) select * from
        zz_spx_ifexists_noidx('Test5000001')
          Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
        rows=1 width=32) (actual time=428.076..428.076 rows=1 loops=1)
            Buffers: shared hit=30 read=5438

    Definitely not the execution time, or query plan, results I was
    expecting.  As we can see, no workers were employed here and my
    guess was that this was the reason or the large execution time
    difference between these 2 tests?  199 milliseconds versus 428
    milliseconds, which is a big difference.  Why are workers not being
    employed here like they were when I tested the query found inside of
    the IF() check in a standalone manner?  But then I ran another test
    and the results made even less sense to me.


The plan difference is due to not realizing the EXISTS essentially implies LIMIT 1. Secondly, it expects about 5000 rows matching the condition,  uniformly spread through the table. But it apparently takes much longer to find the first one, hence the increased duration.


Ah.  I did not know that.  So EXISTS inherently applies a LIMIT 1, even though it doesn't show in the query plan, correct?  Is it not possible for parallel scans to be implemented while applying an implicit, or explicit, LIMIT 1?

 

How did you generate the data?


I used generate_series() to create 1 million records in sequence at the same time that I created the table using the following script...
CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 999999) AS int_distinct, 'Test'::text || generate_series(0, 999999)::text AS text_distinct; 



    When I ran the above query the first 5 times after starting my
    Postgres service, I got the same results each time (around 428
    milliseconds), but when running the query 6 or more times, the
    execution time jumps up to almost double that.  Here are the
    "auto_explain" results running this query a 6th time...


This is likely due to generating a generic plan after the fifth execution. There seems to be only small difference in costs, though.


        --"auto_explain" results after running the same query 6 or more
        times.
        2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 761.847 ms         plan:
          Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
        LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
          Result  (cost=4.58..4.59 rows=1 width=1) (actual
        time=761.843..761.843 rows=1 loops=1)
            Buffers: shared hit=160 read=5246
            InitPlan 1 (returns $0)
           ->  Seq Scan on zz_noidx1  (cost=0.00..22906.00 rows=5000
        width=0) (actual time=761.841..761.841 rows=0 loops=1)
              Filter: (lower(text_distinct) = lower($1))
              Rows Removed by Filter: 1000000
              Buffers: shared hit=160 read=5246
        2018-04-16 15:01:51.635 EDT [12616] CONTEXT:  SQL statement
        "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
        LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
          PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
        2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 762.156 ms         plan:
          Query Text: explain (analyze, buffers) select * from
        zz_spx_ifexists_noidx('Test5000001')
          Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
        rows=1 width=32) (actual time=762.154..762.155 rows=1 loops=1)
            Buffers: shared hit=160 read=5246

    As you can see, the execution time jumps up to about 762
    milliseonds.  I can see in the sequence scan node that the LOWER()
    function shows up on the right side of the equal operator, whereas
    in the first 5 runs of this test query the plan did not show this.     Why is this?


It doesn't really matter on which side it shows, it's more about a generic plan built without knowledge of the parameter value.


Right.  I was more wondering why it switched over to a generic plan, as you've stated, like clockwork starting with the 6th execution run.

 


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Hi Pavel,

Thanks for sharing that information.  I was not aware that the parallel query functionality was not yet fully implemented.

Thanks,
Ryan

On Tue, Apr 17, 2018 at 1:17 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2018-04-16 22:42 GMT+02:00 Hackety Man <hacketyman@gmail.com>:
A description of what you are trying to achieve and what results you expect.:

My end goal was to test the execution time difference between using an IF(SELECT COUNT(*)...) and an IF EXISTS() when no indexes were used and when a string match was not found.  My expectation was that my 2 functions would behave fairly similarly, but they most certainly did not.  Here are the table, functions, test queries, and test query results I received, as well as comments as I present the pieces and talk about the results from my perspective.
This is the table and data that I used for my tests.  A table with 1 million sequenced records.  No indexing on any columns.  I ran ANALYZE on this table and a VACUUM on the entire database, just to be sure.
CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 999999) AS int_distinct, 'Test'::text || generate_series(0, 999999)::text AS text_distinct;
These are the 2 functions that I ran my final tests with.  My goal was to determine which function would perform the fastest and my expectation was that they would still be somewhat close in execution time comparison.
--Test Function #1
CREATE OR REPLACE FUNCTION zz_spx_ifcount_noidx(p_findme text)
 RETURNS text
 LANGUAGE 'plpgsql'
 STABLE
AS $$
 
BEGIN
 IF (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0 THEN
  RETURN 'Found';
 ELSE
  RETURN 'Not Found';
 END IF;
END;
$$;
--Test Function #2
CREATE OR REPLACE FUNCTION zz_spx_ifexists_noidx(p_findme text)
 RETURNS text
 LANGUAGE 'plpgsql'
 STABLE
AS $$
 
BEGIN
 IF EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) THEN
  RETURN 'Found';
 ELSE
  RETURN 'Not Found';
 END IF;
END;
$$;
The first thing I did was to run some baseline tests using the basic queries inside of the IF() checks found in each of the functions to see how the query planner handled them.  I ran the following two queries.
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001');
EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001');
The execution time results and query plans for these two were very similar, as expected.  In the results I can see that 2 workers were employed for each query plan.
--Results for the SELECT COUNT(*) query.
QUERY PLAN                                                                                                                             
----------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate  (cost=12661.42..12661.43 rows=1 width=8) (actual time=172.105..172.105 rows=1 loops=1)                             
  Buffers: shared read=1912                                                                                                            
  ->  Gather  (cost=12661.21..12661.42 rows=2 width=8) (actual time=172.020..172.099 rows=3 loops=1)                                   
  Workers Planned: 2                                                                                                             
  Workers Launched: 2                                                                                                            
  Buffers: shared read=1912                                                                                                      
  ->  Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual time=155.123..155.123 rows=1 loops=3)                  
     Buffers: shared read=5406                                                                                                
     ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=0) (actual time=155.103..155.103 rows=0 loops=3)
     Filter: (lower(text_distinct) = 'test5000001'::text)                                                               
     Rows Removed by Filter: 333333                                                                                     
     Buffers: shared read=5406                                                                                          
Planning time: 0.718 ms                                                                                                                
Execution time: 187.601 ms
--Results for the SELECT 1 query.
QUERY PLAN                                                                                                                 
----------------------------------------------------------------------------------------------------------------------------
Gather  (cost=1000.00..13156.00 rows=5000 width=4) (actual time=175.682..175.682 rows=0 loops=1)                           
  Workers Planned: 2                                                                                                       
  Workers Launched: 2                                                                                                      
  Buffers: shared read=2021                                                                                                
  ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=4) (actual time=159.769..159.769 rows=0 loops=3)
  Filter: (lower(text_distinct) = 'test5000001'::text)                                                               
  Rows Removed by Filter: 333333                                                                                     
  Buffers: shared read=5406                                                                                          
Planning time: 0.874 ms                                                                                                    
Execution time: 192.045 ms  
After running these baseline tests and viewing the fairly similar results, right or wrong, I expected my queries that tested the functions to behave similarly.  I started with the following query...
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM zz_spx_ifcount_noidx('Test5000001');
and I got the following "auto_explain" results...
2018-04-16 14:57:22.624 EDT [17812] LOG:  duration: 155.239 ms  plan:
 Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0
 Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual time=155.230..155.230 rows=1 loops=1)
   Buffers: shared read=1682
   ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=0) (actual time=155.222..155.222 rows=0 loops=1)
   Filter: (lower(text_distinct) = 'test5000001'::text)
   Rows Removed by Filter: 311170
   Buffers: shared read=1682
2018-04-16 14:57:22.624 EDT [9096] LOG:  duration: 154.603 ms  plan:
 Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0
 Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual time=154.576..154.576 rows=1 loops=1)
   Buffers: shared read=1682
   ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=0) (actual time=154.570..154.570 rows=0 loops=1)
   Filter: (lower(text_distinct) = 'test5000001'::text)
   Rows Removed by Filter: 311061
   Buffers: shared read=1682
2018-04-16 14:57:22.642 EDT [15132] LOG:  duration: 197.260 ms  plan:
 Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0
 Result  (cost=12661.43..12661.45 rows=1 width=1) (actual time=179.561..179.561 rows=1 loops=1)
   Buffers: shared read=2042
   InitPlan 1 (returns $1)
  ->  Finalize Aggregate  (cost=12661.42..12661.43 rows=1 width=8) (actual time=179.559..179.559 rows=1 loops=1)
     Buffers: shared read=2042
     ->  Gather  (cost=12661.21..12661.42 rows=2 width=8) (actual time=179.529..179.556 rows=3 loops=1)
     Workers Planned: 2
     Workers Launched: 2
     Buffers: shared read=2042
     ->  Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual time=162.831..162.831 rows=1 loops=3)
        Buffers: shared read=5406
        ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=0) (actual time=162.824..162.824 rows=0 loops=3)
        Filter: (lower(text_distinct) = 'test5000001'::text)
        Rows Removed by Filter: 333333
        Buffers: shared read=5406
2018-04-16 14:57:22.642 EDT [15132] CONTEXT:  SQL statement "SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0"
 PL/pgSQL function zz_spx_ifcount_noidx(text) line 4 at IF
2018-04-16 14:57:22.642 EDT [15132] LOG:  duration: 199.371 ms  plan:
 Query Text: explain (analyze, buffers) select * from zz_spx_ifcount_noidx('Test5000001')
 Function Scan on zz_spx_ifcount_noidx  (cost=0.25..0.26 rows=1 width=32) (actual time=199.370..199.370 rows=1 loops=1)
   Buffers: shared hit=218 read=5446
Here I could see that the 2 workers were getting employed again, which is great.  Just what I expected.  And the execution time was in the same ballpark as my first baseline test using just the query found inside of the IF() check.  199 milliseonds.  Okay.

I moved on to test the other function with the following query...
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM zz_spx_ifcount_noidx('Test5000001');
and I got the following "auto_explain" results...
2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 426.279 ms  plan:
 Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
 Result  (cost=4.08..4.09 rows=1 width=1) (actual time=426.274..426.274 rows=1 loops=1)
   Buffers: shared read=5406
   InitPlan 1 (returns $0)
  ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00 rows=5000 width=0) (actual time=426.273..426.273 rows=0 loops=1)
     Filter: (lower(text_distinct) = 'test5000001'::text)
     Rows Removed by Filter: 1000000
     Buffers: shared read=5406
2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
 PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 428.077 ms  plan:
 Query Text: explain (analyze, buffers) select * from zz_spx_ifexists_noidx('Test5000001')
 Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26 rows=1 width=32) (actual time=428.076..428.076 rows=1 loops=1)
   Buffers: shared hit=30 read=5438
Definitely not the execution time, or query plan, results I was expecting.  As we can see, no workers were employed here and my guess was that this was the reason or the large execution time difference between these 2 tests?  199 milliseconds versus 428 milliseconds, which is a big difference.  Why are workers not being employed here like they were when I tested the query found inside of the IF() check in a standalone manner?  But then I ran another test and the results made even less sense to me.

When I ran the above query the first 5 times after starting my Postgres service, I got the same results each time (around 428 milliseconds), but when running the query 6 or more times, the execution time jumps up to almost double that.  Here are the "auto_explain" results running this query a 6th time...
--"auto_explain" results after running the same query 6 or more times.
2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 761.847 ms  plan:
 Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
 Result  (cost=4.58..4.59 rows=1 width=1) (actual time=761.843..761.843 rows=1 loops=1)
   Buffers: shared hit=160 read=5246
   InitPlan 1 (returns $0)
  ->  Seq Scan on zz_noidx1  (cost=0.00..22906.00 rows=5000 width=0) (actual time=761.841..761.841 rows=0 loops=1)
     Filter: (lower(text_distinct) = lower($1))
     Rows Removed by Filter: 1000000
     Buffers: shared hit=160 read=5246
2018-04-16 15:01:51.635 EDT [12616] CONTEXT:  SQL statement "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
 PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 762.156 ms  plan:
 Query Text: explain (analyze, buffers) select * from zz_spx_ifexists_noidx('Test5000001')
 Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26 rows=1 width=32) (actual time=762.154..762.155 rows=1 loops=1)
   Buffers: shared hit=160 read=5246
As you can see, the execution time jumps up to about 762 milliseonds.  I can see in the sequence scan node that the LOWER() function shows up on the right side of the equal operator, whereas in the first 5 runs of this test query the plan did not show this.  Why is this?

I tried increasing the "work_mem" setting to 1GB to see if this made any difference, but the results were the same.

So those were the tests that I performed and the results I received, which left me with many questions.  If anyone is able to help me understand this behavior, I'd greatly appreciate it.  This is my first post to the email list, so I hope I did a good enough job providing all the information needed.

Thanks!
Ryan
PostgreSQL version number you are running:
PostgreSQL 10.2, compiled by Visual C++ build 1800, 64-bit
How you installed PostgreSQL:
Using the Enterprise DB installer.
I have also installed Enterprise DB's Postgres Enterprise Manager (PEM) 7.2.0 software and Enterprise DB's SQL Profiler PG10-7.2.0 software.  The PEM Agent service that gets installed is currently turned off.
Changes made to the settings in the postgresql.conf file:  see Server Configuration for a quick way to list them all.
name                               |current_setting                        |source              
-----------------------------------|---------------------------------------|---------------------
application_name                   |DBeaver 5.0.3 - Main                   |session             
auto_explain.log_analyze           |on                                     |configuration file  
auto_explain.log_buffers           |on                                     |configuration file  
auto_explain.log_min_duration      |0                                      |configuration file  
auto_explain.log_nested_statements |on                                     |configuration file  
auto_explain.log_triggers          |on                                     |configuration file  
client_encoding                    |UTF8                                   |client              
DateStyle                          |ISO, MDY                               |client              
default_text_search_config         |pg_catalog.english                     |configuration file  
dynamic_shared_memory_type         |windows                                |configuration file  
extra_float_digits                 |3                                      |session             
lc_messages                        |English_United States.1252             |configuration file  
lc_monetary                        |English_United States.1252             |configuration file  
lc_numeric                         |English_United States.1252             |configuration file  
lc_time                            |English_United States.1252             |configuration file  
listen_addresses                   |*                                      |configuration file  
log_destination                    |stderr                                 |configuration file  
log_timezone                       |US/Eastern                             |configuration file  
logging_collector                  |on                                     |configuration file  
max_connections                    |100                                    |configuration file  
max_stack_depth                    |2MB                                    |environment variable
port                               |5432                                   |configuration file  
shared_buffers                     |128MB                                  |configuration file  
shared_preload_libraries           |$libdir/sql-profiler.dll, auto_explain |configuration file  
ssl                                |on                                     |configuration file  
ssl_ca_file                        |root.crt                               |configuration file  
ssl_cert_file                      |server.crt                             |configuration file  
ssl_crl_file                       |root.crl                               |configuration file  
ssl_key_file                       |server.key                             |configuration file  
TimeZone                           |America/New_York                       |client               
Operating system and version:
Windows 10 Pro 64-bit, Version 1709 (Build 16299.309)
Hardware:
Processor - Intel Core i7-7820HQ @ 2.90GHz
RAM - 16GB
RAID? - No
Hard Drive - Samsung 512 GB SSD M.2 PCIe NVMe Opal2
What program you're using to connect to PostgreSQL:
DBeaver Community Edition v5.0.3
Is there anything relevant or unusual in the PostgreSQL server logs?:
Not that I noticed.
For questions about any kind of error:
N/A
What you were doing when the error happened / how to cause the error:
N/A
The EXACT TEXT of the error message you're getting, if there is one: (Copy and paste the message to the email, do not send a screenshot)
N/A

A support of parallel query execution is not complete -  it doesn't work in PostgreSQL 11 too. So although EXISTS variant can be faster (but can be - the worst case of EXISTS is same like COUNT), then due disabled parallel execution the COUNT(*) is faster now. It is unfortunate, because I believe so this issue will be fixed in few years.

Regards

Pavel


On 04/17/2018 04:01 PM, Hackety Man wrote:
> 
> 
> On Tue, Apr 17, 2018 at 6:49 AM, Tomas Vondra 
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
> 
> 
>     On 04/16/2018 10:42 PM, Hackety Man wrote:
> 
>         ...
>              The first thing I did was to run some baseline tests using
>         the basic
>              queries inside of the IF() checks found in each of the
>         functions to
>              see how the query planner handled them.  I ran the
>         following two
>              queries.
> 
>                  EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM
>         zz_noidx1 WHERE
>                  LOWER(text_distinct) = LOWER('Test5000001');
>                  EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
>                  LOWER(text_distinct) = LOWER('Test5000001');
> 
> 
>     Those are not the interesting plans, though. The EXISTS only cares
>     about the first row, so you should be looking at
> 
>          EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
>          LOWER(text_distinct) = LOWER('Test5000001') LIMIT 1;
> 
> 
> 
> Okay.  I tested this query and it did return an execution time on par 
> with my tests of the "zz_spx_ifexists_noidx" function.
> *
> *
> 
> 
> 
>              I moved on to test the other function with the following
>         query...
> 
>                  EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM
>                  zz_spx_ifcount_noidx('Test5000001');
> 
>              and I got the following "auto_explain" results...
> 
>                  2018-04-16 14:58:34.134 EDT [12616] LOG:  duration:
>         426.279 ms         plan:
>                    Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>                  LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>                    Result  (cost=4.08..4.09 rows=1 width=1) (actual
>                  time=426.274..426.274 rows=1 loops=1)
>                      Buffers: shared read=5406
>                      InitPlan 1 (returns $0)
>                     ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00
>         rows=5000
>                  width=0) (actual time=426.273..426.273 rows=0 loops=1)
>                        Filter: (lower(text_distinct) = 'test5000001'::text)
>                        Rows Removed by Filter: 1000000
>                        Buffers: shared read=5406
>                  2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement
>                  "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>                  LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
>                    PL/pgSQL function zz_spx_ifexists_noidx(text) line 4
>         at IF
>                  2018-04-16 14:58:34.134 EDT [12616] LOG:  duration:
>         428.077 ms         plan:
>                    Query Text: explain (analyze, buffers) select * from
>                  zz_spx_ifexists_noidx('Test5000001')
>                    Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
>                  rows=1 width=32) (actual time=428.076..428.076 rows=1
>         loops=1)
>                      Buffers: shared hit=30 read=5438
> 
>              Definitely not the execution time, or query plan, results I was
>              expecting.  As we can see, no workers were employed here and my
>              guess was that this was the reason or the large execution time
>              difference between these 2 tests?  199 milliseconds versus 428
>              milliseconds, which is a big difference.  Why are workers
>         not being
>              employed here like they were when I tested the query found
>         inside of
>              the IF() check in a standalone manner?  But then I ran
>         another test
>              and the results made even less sense to me.
> 
> 
>     The plan difference is due to not realizing the EXISTS essentially
>     implies LIMIT 1. Secondly, it expects about 5000 rows matching the
>     condition,  uniformly spread through the table. But it apparently
>     takes much longer to find the first one, hence the increased duration.
> 
> 
> 
> Ah.  I did not know that.  So EXISTS inherently applies a LIMIT 1, even 
> though it doesn't show in the query plan, correct? Is it not possible 
> for parallel scans to be implemented while applying an implicit, or 
> explicit, LIMIT 1?
> **//___^
> 

It doesn't add a limit node to the plan, but it behaves similarly to 
that. The database only needs to fetch the first row to answer the 
EXISTS predicate.

I don't think this is inherently incompatible with parallel plans, but 
the planner simply thinks it's going to bee very cheap - cheaper than 
setting up parallel workers etc. So it does not do that.

> 
>     How did you generate the data?
> 
> 
> 
> I used generate_series() to create 1 million records in sequence at the 
> same time that I created the table using the following script...
> 
>     CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 999999) AS
>     int_distinct, 'Test'::text || generate_series(0, 999999)::text AS
>     text_distinct;
> 

Which means that there are actually no matching rows for 'Test5000001'. 
So the database will scan the whole table anyway, in order to answer the 
EXISTS condition. The estimate of 5000 matching rows is a default value 
(0.5% out of 1M rows), because the value is entirely out of the data 
range covered by the histogram.

The easiest solution probably is adding an index on that column, which 
will make answering the EXISTS much faster (at least in this case).

> 
> 
>              When I ran the above query the first 5 times after starting my
>              Postgres service, I got the same results each time (around 428
>              milliseconds), but when running the query 6 or more times, the
>              execution time jumps up to almost double that.  Here are the
>              "auto_explain" results running this query a 6th time...
> 
> 
>     This is likely due to generating a generic plan after the fifth
>     execution. There seems to be only small difference in costs, though.
> 
> 
>                  --"auto_explain" results after running the same query 6
>         or more
>                  times.
>                  2018-04-16 15:01:51.635 EDT [12616] LOG:  duration:
>         761.847 ms         plan:
>                    Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>                  LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>                    Result  (cost=4.58..4.59 rows=1 width=1) (actual
>                  time=761.843..761.843 rows=1 loops=1)
>                      Buffers: shared hit=160 read=5246
>                      InitPlan 1 (returns $0)
>                     ->  Seq Scan on zz_noidx1  (cost=0.00..22906.00
>         rows=5000
>                  width=0) (actual time=761.841..761.841 rows=0 loops=1)
>                        Filter: (lower(text_distinct) = lower($1))
>                        Rows Removed by Filter: 1000000
>                        Buffers: shared hit=160 read=5246
>                  2018-04-16 15:01:51.635 EDT [12616] CONTEXT:  SQL statement
>                  "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>                  LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
>                    PL/pgSQL function zz_spx_ifexists_noidx(text) line 4
>         at IF
>                  2018-04-16 15:01:51.635 EDT [12616] LOG:  duration:
>         762.156 ms         plan:
>                    Query Text: explain (analyze, buffers) select * from
>                  zz_spx_ifexists_noidx('Test5000001')
>                    Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
>                  rows=1 width=32) (actual time=762.154..762.155 rows=1
>         loops=1)
>                      Buffers: shared hit=160 read=5246
> 
>              As you can see, the execution time jumps up to about 762
>              milliseonds.  I can see in the sequence scan node that the
>         LOWER()
>              function shows up on the right side of the equal operator,
>         whereas
>              in the first 5 runs of this test query the plan did not
>         show this.     Why is this?
> 
> 
>     It doesn't really matter on which side it shows, it's more about a
>     generic plan built without knowledge of the parameter value.
> 
> 
> 
> Right.  I was more wondering why it switched over to a generic plan, as 
> you've stated, like clockwork starting with the 6th execution run.
> 

That's a hard-coded value. The first 5 executions are re-planned using 
the actual parameter values, and then we try generating a generic plan 
and see if it's cheaper than the non-generic one. You can disable that, 
though.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




On Tue, Apr 17, 2018 at 10:23 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


On 04/17/2018 04:01 PM, Hackety Man wrote:


On Tue, Apr 17, 2018 at 6:49 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:



    On 04/16/2018 10:42 PM, Hackety Man wrote:

        ...
             The first thing I did was to run some baseline tests using
        the basic
             queries inside of the IF() checks found in each of the
        functions to
             see how the query planner handled them.  I ran the
        following two
             queries.

                 EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM
        zz_noidx1 WHERE
                 LOWER(text_distinct) = LOWER('Test5000001');
                 EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
                 LOWER(text_distinct) = LOWER('Test5000001');


    Those are not the interesting plans, though. The EXISTS only cares
    about the first row, so you should be looking at

         EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
         LOWER(text_distinct) = LOWER('Test5000001') LIMIT 1;



Okay.  I tested this query and it did return an execution time on par with my tests of the "zz_spx_ifexists_noidx" function.
*
*




             I moved on to test the other function with the following
        query...

                 EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM
                 zz_spx_ifcount_noidx('Test5000001');

             and I got the following "auto_explain" results...

                 2018-04-16 14:58:34.134 EDT [12616] LOG:  duration:
        426.279 ms         plan:
                   Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
                 LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
                   Result  (cost=4.08..4.09 rows=1 width=1) (actual
                 time=426.274..426.274 rows=1 loops=1)
                     Buffers: shared read=5406
                     InitPlan 1 (returns $0)
                    ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00
        rows=5000
                 width=0) (actual time=426.273..426.273 rows=0 loops=1)
                       Filter: (lower(text_distinct) = 'test5000001'::text)
                       Rows Removed by Filter: 1000000
                       Buffers: shared read=5406
                 2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement
                 "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
                 LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
                   PL/pgSQL function zz_spx_ifexists_noidx(text) line 4
        at IF
                 2018-04-16 14:58:34.134 EDT [12616] LOG:  duration:
        428.077 ms         plan:
                   Query Text: explain (analyze, buffers) select * from
                 zz_spx_ifexists_noidx('Test5000001')
                   Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
                 rows=1 width=32) (actual time=428.076..428.076 rows=1
        loops=1)
                     Buffers: shared hit=30 read=5438

             Definitely not the execution time, or query plan, results I was
             expecting.  As we can see, no workers were employed here and my
             guess was that this was the reason or the large execution time
             difference between these 2 tests?  199 milliseconds versus 428
             milliseconds, which is a big difference.  Why are workers
        not being
             employed here like they were when I tested the query found
        inside of
             the IF() check in a standalone manner?  But then I ran
        another test
             and the results made even less sense to me.


    The plan difference is due to not realizing the EXISTS essentially
    implies LIMIT 1. Secondly, it expects about 5000 rows matching the
    condition,  uniformly spread through the table. But it apparently
    takes much longer to find the first one, hence the increased duration.



Ah.  I did not know that.  So EXISTS inherently applies a LIMIT 1, even though it doesn't show in the query plan, correct? Is it not possible for parallel scans to be implemented while applying an implicit, or explicit, LIMIT 1?
**//___^


It doesn't add a limit node to the plan, but it behaves similarly to that. The database only needs to fetch the first row to answer the EXISTS predicate.

I don't think this is inherently incompatible with parallel plans, but the planner simply thinks it's going to bee very cheap - cheaper than setting up parallel workers etc. So it does not do that.


Understood.  Any chance of the planner possibly being enhanced in the future to come to a better conclusion as to whether, or not, a parallel scan implementation would be a better choice during EXISTS condition checks?  :-)

 



    How did you generate the data?



I used generate_series() to create 1 million records in sequence at the same time that I created the table using the following script...

    CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 999999) AS
    int_distinct, 'Test'::text || generate_series(0, 999999)::text AS
    text_distinct;


Which means that there are actually no matching rows for 'Test5000001'. So the database will scan the whole table anyway, in order to answer the EXISTS condition. The estimate of 5000 matching rows is a default value (0.5% out of 1M rows), because the value is entirely out of the data range covered by the histogram.

The easiest solution probably is adding an index on that column, which will make answering the EXISTS much faster (at least in this case).


Yes.  I did test that scenario, as well.  Adding an index does put the EXISTS condition check on par with the IF(SELECT COUNT(*) FROM...) condition check.  The one scenario where the EXISTS condition check dominated over the IF(SELECT COUNT(*) FROM...) condition check was when no index was used and a matching string *was* found, as opposed to this test particular test where we're looking for a string that will *not* be found.  I just wanted to test all possible scenarios.

 




             When I ran the above query the first 5 times after starting my
             Postgres service, I got the same results each time (around 428
             milliseconds), but when running the query 6 or more times, the
             execution time jumps up to almost double that.  Here are the
             "auto_explain" results running this query a 6th time...


    This is likely due to generating a generic plan after the fifth
    execution. There seems to be only small difference in costs, though.


                 --"auto_explain" results after running the same query 6
        or more
                 times.
                 2018-04-16 15:01:51.635 EDT [12616] LOG:  duration:
        761.847 ms         plan:
                   Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
                 LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
                   Result  (cost=4.58..4.59 rows=1 width=1) (actual
                 time=761.843..761.843 rows=1 loops=1)
                     Buffers: shared hit=160 read=5246
                     InitPlan 1 (returns $0)
                    ->  Seq Scan on zz_noidx1  (cost=0.00..22906.00
        rows=5000
                 width=0) (actual time=761.841..761.841 rows=0 loops=1)
                       Filter: (lower(text_distinct) = lower($1))
                       Rows Removed by Filter: 1000000
                       Buffers: shared hit=160 read=5246
                 2018-04-16 15:01:51.635 EDT [12616] CONTEXT:  SQL statement
                 "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
                 LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
                   PL/pgSQL function zz_spx_ifexists_noidx(text) line 4
        at IF
                 2018-04-16 15:01:51.635 EDT [12616] LOG:  duration:
        762.156 ms         plan:
                   Query Text: explain (analyze, buffers) select * from
                 zz_spx_ifexists_noidx('Test5000001')
                   Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
                 rows=1 width=32) (actual time=762.154..762.155 rows=1
        loops=1)
                     Buffers: shared hit=160 read=5246

             As you can see, the execution time jumps up to about 762
             milliseonds.  I can see in the sequence scan node that the
        LOWER()
             function shows up on the right side of the equal operator,
        whereas
             in the first 5 runs of this test query the plan did not
        show this.     Why is this?


    It doesn't really matter on which side it shows, it's more about a
    generic plan built without knowledge of the parameter value.



Right.  I was more wondering why it switched over to a generic plan, as you've stated, like clockwork starting with the 6th execution run.


That's a hard-coded value. The first 5 executions are re-planned using the actual parameter values, and then we try generating a generic plan and see if it's cheaper than the non-generic one. You can disable that, though.


So on that note, in the planner's eyes, starting with the 6th execution, it looks like the planner still thinks that the generic plan will perform better than the non-generic one, which is why it keeps using the generic plan from that point forward?

Similar to the parallel scans, any chance of the planner possibly being enhanced in the future to come to a better conclusion as to whether, or not, the generic plan will perform better than the non-generic plan?  :-)

 


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Thanks for all the help!  I really appreciate it!

Ryan 



2018-04-17 12:52 GMT+02:00 Tomas Vondra <tomas.vondra@2ndquadrant.com>:


On 04/17/2018 07:17 AM, Pavel Stehule wrote:
Hi

2018-04-16 22:42 GMT+02:00 Hackety Man <hacketyman@gmail.com <mailto:hacketyman@gmail.com>>:

...
>
A support of parallel query execution is not complete -  it doesn't work in PostgreSQL 11 too. So although EXISTS variant can be faster (but can be - the worst case of EXISTS is same like COUNT), then due disabled parallel execution the COUNT(*) is faster now. It is unfortunate, because I believe so this issue will be fixed in few years.


None of the issues seems to be particularly related to parallel query. It's much more likely a general issue with planning EXISTS / LIMIT and non-uniform data distribution.

I was wrong EXISTS are not supported. It looks like new dimension of performance issues related to parallelism. I understand so this example is worst case.

postgres=# EXPLAIN (ANALYZE, BUFFERS) select exists(SELECT * FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001'));
                                                      QUERY PLAN                                                     
----------------------------------------------------------------------------------------------------------------------
 Result  (cost=4.08..4.09 rows=1 width=1) (actual time=423.600..423.600 rows=1 loops=1)
   Buffers: shared hit=3296 read=2110
   InitPlan 1 (returns $0)
     ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00 rows=5000 width=0) (actual time=423.595..423.595 rows=0 loops=1)
           Filter: (lower(text_distinct) = 'test5000001'::text)
           Rows Removed by Filter: 1000000
           Buffers: shared hit=3296 read=2110
 Planning Time: 0.133 ms
 Execution Time: 423.633 ms

postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001');
                                                               QUERY PLAN                                                               
-----------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=12661.42..12661.43 rows=1 width=8) (actual time=246.662..246.662 rows=1 loops=1)
   Buffers: shared hit=817 read=549
   ->  Gather  (cost=12661.21..12661.42 rows=2 width=8) (actual time=246.642..246.656 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=817 read=549
         ->  Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual time=242.168..242.169 rows=1 loops=3)
               Buffers: shared hit=3360 read=2046
               ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=0) (actual time=242.165..242.165 rows=0 loops=3)
                     Filter: (lower(text_distinct) = 'test5000001'::text)
                     Rows Removed by Filter: 333333
                     Buffers: shared hit=3360 read=2046
 Planning Time: 0.222 ms
 Execution Time: 247.927 ms

The cost of EXISTS is too low to use parallelism, and value is found too late.

When I decrease startup cost to 0 of parallel exec I got similar plan, similar time

postgres=# EXPLAIN (ANALYZE, BUFFERS) select exists(SELECT * FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001'));
                                                             QUERY PLAN                                                             
-------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=2.43..2.44 rows=1 width=1) (actual time=246.398..246.402 rows=1 loops=1)
   Buffers: shared hit=885 read=489
   InitPlan 1 (returns $1)
     ->  Gather  (cost=0.00..12156.00 rows=5000 width=0) (actual time=246.393..246.393 rows=0 loops=1)
           Workers Planned: 2
           Workers Launched: 2
           Buffers: shared hit=885 read=489
           ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083 width=0) (actual time=241.067..241.067 rows=0 loops=3)
                 Filter: (lower(text_distinct) = 'test5000001'::text)
                 Rows Removed by Filter: 333333
                 Buffers: shared hit=3552 read=1854
 Planning Time: 0.138 ms
 Execution Time: 247.623 ms
(13 rows)

From this perspective it looks so cost of EXISTS(subselect) is maybe too low.

Regards

Pavel








regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services






Right.  I was more wondering why it switched over to a generic plan, as you've stated, like clockwork starting with the 6th execution run.


That's a hard-coded value. The first 5 executions are re-planned using the actual parameter values, and then we try generating a generic plan and see if it's cheaper than the non-generic one. You can disable that, though.


So on that note, in the planner's eyes, starting with the 6th execution, it looks like the planner still thinks that the generic plan will perform better than the non-generic one, which is why it keeps using the generic plan from that point forward?

Similar to the parallel scans, any chance of the planner possibly being enhanced in the future to come to a better conclusion as to whether, or not, the generic plan will perform better than the non-generic plan?  :-)

all is based on estimations, and when estimations are not correct, then .. The current solution is fart to perfect, but nobody goes with better ideas :( Statistic based planners is best available technology, unfortunately with lot of gaps.

There are not any statistic where any tuple is in database, so a precious estimation of EXISTS is hard (impossible). Similar issue is with LIMIT. It can be nice, but I don't expect any significant changes in this area - maybe some tuning step by step of some parameters.

Regards

Pavel
 

 


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Thanks for all the help!  I really appreciate it!

Ryan 


On 04/17/2018 05:43 PM, Hackety Man wrote:
> 
> 
> On Tue, Apr 17, 2018 at 10:23 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
> 
> 
>     On 04/17/2018 04:01 PM, Hackety Man wrote:
> 
>  ...
>         Right.  I was more wondering why it switched over to a generic
>         plan, as you've stated, like clockwork starting with the 6th
>         execution run.
> 
> 
>     That's a hard-coded value. The first 5 executions are re-planned
>     using the actual parameter values, and then we try generating a
>     generic plan and see if it's cheaper than the non-generic one. You
>     can disable that, though.
> 
> 
> 
> So on that note, in the planner's eyes, starting with the 6th execution,
> it looks like the planner still thinks that the generic plan will
> perform better than the non-generic one, which is why it keeps using the
> generic plan from that point forward?
> 

Yes. The point of prepared statements (which also applies to plpgsql, as
it uses prepared statements automatically) is to eliminate the planning
overhead. So we try planning it with actual parameter values for the
first 5 plans, and then compare it to the generic plan.

> Similar to the parallel scans, any chance of the planner possibly being
> enhanced in the future to come to a better conclusion as to whether, or
> not, the generic plan will perform better than the non-generic plan?  :-)

There's always hope, but it's hard to say if/when an enhancement will
happen, unfortunately.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On 04/17/2018 04:05 PM, Hackety Man wrote:
> Hi Pavel,
> 
> Thanks for sharing that information.  I was not aware that the parallel
> query functionality was not yet fully implemented.
> 

Nothing is ever "fully implemented". There are always gaps and possible
improvements ;-)

That being said, parallelism opens an entirely new dimension of possible
plans and planning issues.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Citext Performance

От
Deepak Somaiya
Дата:
Apology for sending you emails directly but I do see you guys responding on email related to performance so thought of copying you folks.

Folks,
 I read following (PostgreSQL: Documentation: 9.6: citext) and it does not hold true in my testing.. i.e citext is not performing better than lower.Am I missing something? help is appreciated.




"citext is not as efficient as text because the operator functions and the B-tree comparison functions must make copies of the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using lower to get case-insensitive matching."


Here is what I have done 

drop table test;
drop table testci;

CREATE TABLE test (
id INTEGER PRIMARY KEY,
name character varying(254)
);
CREATE TABLE testci (
id INTEGER PRIMARY KEY,
name citext

);

INSERT INTO test(id, name)
SELECT generate_series(1000001,2000000), (md5(random()::text));

INSERT INTO testci(id, name)
SELECT generate_series(1,1000000), (md5(random()::text));


Now, I have done sequential search

explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
- Plan: 
    Node Type: "Seq Scan"
    Parallel Aware: false
    Relation Name: "test"
    Alias: "test"
    Startup Cost: 0.00
    Total Cost: 23334.00
    Plan Rows: 5000
    Plan Width: 37
    Actual Startup Time: 0.016
    Actual Total Time: 680.199
    Actual Rows: 1
    Actual Loops: 1
    Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
    Rows Removed by Filter: 999999
  Planning Time: 0.045
  Triggers: 
  Execution Time: 680.213


explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad';
- Plan: 
    Node Type: "Seq Scan"
    Parallel Aware: false
    Relation Name: "testci"
    Alias: "testci"
    Startup Cost: 0.00
    Total Cost: 20834.00
    Plan Rows: 1
    Plan Width: 37
    Actual Startup Time: 0.017
    Actual Total Time: 1184.485
    Actual Rows: 1
    Actual Loops: 1
    Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
    Rows Removed by Filter: 999999
  Planning Time: 0.029
  Triggers: 
  Execution Time: 1184.496



You can see sequential searches with lower working twice as fast as citext.

Now I added index on citext and equivalent functional index (lower) on text.


CREATE INDEX textlowerindex ON test (lower(name));
create index textindex on test(name);


Index creation took longer with citext v/s creating lower functional index.


Now here comes execution with indexes

explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');

- Plan: 
    Node Type: "Bitmap Heap Scan"
    Parallel Aware: false
    Relation Name: "test"
    Alias: "test"
    Startup Cost: 187.18
    Total Cost: 7809.06
    Plan Rows: 5000
    Plan Width: 37
    Actual Startup Time: 0.020
    Actual Total Time: 0.020
    Actual Rows: 1
    Actual Loops: 1
    Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
    Rows Removed by Index Recheck: 0
    Exact Heap Blocks: 1
    Lossy Heap Blocks: 0
    Plans: 
      - Node Type: "Bitmap Index Scan"
        Parent Relationship: "Outer"
        Parallel Aware: false
        Index Name: "textlowerindex"
        Startup Cost: 0.00
        Total Cost: 185.93
        Plan Rows: 5000
        Plan Width: 0
        Actual Startup Time: 0.016
        Actual Total Time: 0.016
        Actual Rows: 1
        Actual Loops: 1
        Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
  Planning Time: 0.051
  Triggers: 
  Execution Time: 0.035




explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad'; 

- Plan: 
    Node Type: "Index Scan"
    Parallel Aware: false
    Scan Direction: "Forward"
    Index Name: "citextindex"
    Relation Name: "testci"
    Alias: "testci"
    Startup Cost: 0.42
    Total Cost: 8.44
    Plan Rows: 1
    Plan Width: 37
    Actual Startup Time: 0.049
    Actual Total Time: 0.050
    Actual Rows: 1
    Actual Loops: 1
    Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
    Rows Removed by Index Recheck: 0
  Planning Time: 0.051
  Triggers: 
  Execution Time: 0.064