Обсуждение: Adding nextval() to a select caused hang/very slow execution

Поиск
Список
Период
Сортировка

Adding nextval() to a select caused hang/very slow execution

От
Eric Raskin
Дата:
Hello (TL;DR):

Noob here, so please bear with me.  The SQL I'm presenting is part of a larger PL/PGSQL script that generates generic "counts" from tables in our database.   This is code converted from an Oracle database that we recently migrated from.

I have a strange situation where a base query completes in about 30 seconds but if I add a nextval() call to the select it never completes.  There are other processes running that are accessing the same sequence, but I thought that concurrency was not an issue for sequences (other than skipped values).  

We are running on Google Cloud SQL v12 (I believe it is currently 12.3).  We are configured with a failover replica.  The VM configured is 8 vCPUs and 16GB of RAM.  PgAdmin shows that our cache hit rates are around 99%, as does this SQL:

SELECT
   sum(heap_blks_read) as heap_read,
   sum(heap_blks_hit)  as heap_hit,
   sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM
   pg_statio_user_tables;

 heap_read  |   heap_hit   |         ratio
------------+--------------+------------------------
 1558247211 | 156357754256 | 0.99013242992145017164
(1 row)


We run autovacuum.  work_mem = 131072.  

The base SQL without the nextval() call and plan are at:  https://explain.depesz.com/s/T3Gn

While the performance is not the fastest, 30 seconds for the execution is acceptable in our application.  We only run this once a month.  I am not looking to optimize the query as it stands (yet).  The only change that causes it to be extremely slow or hang (can't tell which) is that I changed the select from:

select unnest(....

to

select nextval('sbowner.idgen'), unnest(....

Here are all the tables/views involved, as requested in the "Slow Query Questions" FAQ.  

I am aware that the structure of these views leaves A LOT to be desired, but I don't think they have a bearing on this issue, since the addition of nextval() is the problem.    We are going to restructure all of this and remove many layers eventually.  Before subjecting the reader to this long list of views, here's the theory.  We have customers and orders for various products.  Those products are grouped together into "lists" that can be selected.  Some products could be in more than one list.  At the same time, some products are "pre-release", so they are only reported internally and omitted from these counts.  Next, some of the orders are "autoship", meaning that the customer has subscribed to receive the product automatically.  Any customer with an autoship in the same category of product is to be omitted from these counts.  

As there are many of these "lists", there was a naming scheme created in the Oracle database we converted from.  Oracle allows synonyms so that we could create one view and rename it to match the naming scheme.  PostgreSQL does not allow that, so instead we had to create views on views to keep the names intact. 

The views/tables involved are:

                     Table "lruser.count_tempcols"
  Column   |            Type             | Collation | Nullable | Default
-----------+-----------------------------+-----------+----------+---------
 typecode  | character(1)                |           |          |
 disporder | smallint                    |           |          |
 mindate   | timestamp without time zone |           |          |
 maxdate   | timestamp without time zone |           |          |
 fmtdate   | character varying(10)       |           |          |

This table holds generated date ranges for the counts to be generated by the main query.  Records are inserted once at the start of execution of the script, followed by creating an index:

create index count_tempcols_ndx on count_tempcols(mindate, maxdate, fmtdate, disporder, typecode);
analyze count_tempcols;

It is actually created as a temporary table, but that makes it hard to present here. ;-)

Here is fortherb_indcounts and the entire view chain:

fortherb_indcounts - an extract of the data that we actually generate the counts from

                                 View "lruser.fortherb_indcounts"
  Column  |            Type             | Collation | Nullable | Default | Storage  | Description
----------+-----------------------------+-----------+----------+---------+----------+-------------
 id       | bigint                      |           |          |         | plain    |
 state    | character varying(2)        |           |          |         | extended |
 zip      | character varying(6)        |           |          |         | extended |
 rtype    | bpchar                      |           |          |         | extended |
 sexcode  | character(1)                |           |          |         | extended |
 origdate | timestamp without time zone |           |          |         | plain    |
 hotline  | timestamp without time zone |           |          |         | plain    |
 numpurch | bigint                      |           |          |         | plain    |
 scf      | text                        |           |          |         | extended |
 phone    | character varying(16)       |           |          |         | extended |
 paymeth  | character varying(4)        |           |          |         | extended |
 email    | character varying(40)       |           |          |         | extended |
 itemcode | character varying(10)       |           |          |         | extended |
View definition:
 SELECT c.id,
    c.state,
    c.zip,
    c.rtype,
    c.sexcode,
    c.origdate,
    date_trunc('day'::text, t.hotlinedate) AS hotline,
    c.numpurch,
    substr(c.zip::text, 1, 3) AS scf,
    c.phone,
    t.paymeth,
    c.email,
    t.itemcode
   FROM fortherb_ind c,
    "fortherb_ind$rent$tracking" t
  WHERE c.id = t.pasid;

This next table is a list of "rentable" transactions - those transactions that we want to actually count (omitting pre-release and autoships).

                              View "lruser.fortherb_ind$rent$tracking"
   Column    |            Type             | Collation | Nullable | Default | Storage  | Description
-------------+-----------------------------+-----------+----------+---------+----------+-------------
 pasid       | bigint                      |           |          |         | plain    |
 jobid       | bigint                      |           |          |         | plain    |
 itemcode    | character varying(10)       |           |          |         | extended |
 hotlinedate | timestamp without time zone |           |          |         | plain    |
 updatedate  | timestamp without time zone |           |          |         | plain    |
 rectype     | character(1)                |           |          |         | extended |
 autoship    | character(1)                |           |          |         | extended |
 subid       | character varying(20)       |           |          |         | extended |
 amount      | numeric(10,2)               |           |          |         | main     |
 sourcecode  | character varying(20)       |           |          |         | extended |
 ordernum    | character varying(20)       |           |          |         | extended |
 paymeth     | character varying(4)        |           |          |         | extended |
View definition:
 SELECT "fortherb$rent$i_tracking".pasid,
    "fortherb$rent$i_tracking".jobid,
    "fortherb$rent$i_tracking".itemcode,
    "fortherb$rent$i_tracking".hotlinedate,
    "fortherb$rent$i_tracking".updatedate,
    "fortherb$rent$i_tracking".rectype,
    "fortherb$rent$i_tracking".autoship,
    "fortherb$rent$i_tracking".subid,
    "fortherb$rent$i_tracking".amount,
    "fortherb$rent$i_tracking".sourcecode,
    "fortherb$rent$i_tracking".ordernum,
    "fortherb$rent$i_tracking".paymeth
   FROM "fortherb$rent$i_tracking";


This is the same table as the previous, just with a different name (Oracle synonym simulation):

                               View "lruser.fortherb$rent$i_tracking"
   Column    |            Type             | Collation | Nullable | Default | Storage  | Description
-------------+-----------------------------+-----------+----------+---------+----------+-------------
 pasid       | bigint                      |           |          |         | plain    |
 jobid       | bigint                      |           |          |         | plain    |
 itemcode    | character varying(10)       |           |          |         | extended |
 hotlinedate | timestamp without time zone |           |          |         | plain    |
 updatedate  | timestamp without time zone |           |          |         | plain    |
 rectype     | character(1)                |           |          |         | extended |
 autoship    | character(1)                |           |          |         | extended |
 subid       | character varying(20)       |           |          |         | extended |
 amount      | numeric(10,2)               |           |          |         | main     |
 sourcecode  | character varying(20)       |           |          |         | extended |
 ordernum    | character varying(20)       |           |          |         | extended |
 paymeth     | character varying(4)        |           |          |         | extended |
View definition:
 SELECT i.pasid,
    i.jobid,
    i.itemcode,
    i.hotlinedate,
    i.updatedate,
    i.rectype,
    i.autoship,
    i.subid,
    i.amount,
    i.sourcecode,
    i.ordernum,
    i.paymeth
   FROM glm.glmitems i
  WHERE (i.prodtable::text = ANY (ARRAY['fortherb'::character varying::text, 'fortherb2'::character varying::text])) AND NOT (EXISTS ( SELECT NULL::text AS text
           FROM glmprods
          WHERE glmprods.prerelease IS NOT NULL AND glmprods.prerelease::text <> ''::text AND glmprods.prodcode::text = i.itemcode::text)) AND (i.rectype = ANY (ARRAY['2'::bpchar, '3'::bpchar])) AND NOT (EXISTS ( SELECT NULL::text AS text
           FROM "fortherb$rent$i_track_as" a
          WHERE a.pasid = i.pasid));


This is a view of all transactions with the "list" they belong to appended, as well as a pre-release code.

                                         View "glm.glmitems"
   Column    |            Type             | Collation | Nullable | Default | Storage  | Description
-------------+-----------------------------+-----------+----------+---------+----------+-------------
 pasid       | bigint                      |           |          |         | plain    |
 jobid       | bigint                      |           |          |         | plain    |
 itemcode    | character varying(10)       |           |          |         | extended |
 hotlinedate | timestamp without time zone |           |          |         | plain    |
 updatedate  | timestamp without time zone |           |          |         | plain    |
 rectype     | character(1)                |           |          |         | extended |
 autoship    | character(1)                |           |          |         | extended |
 subid       | character varying(20)       |           |          |         | extended |
 amount      | numeric(10,2)               |           |          |         | main     |
 sourcecode  | character varying(20)       |           |          |         | extended |
 ordernum    | character varying(20)       |           |          |         | extended |
 paymeth     | character varying(4)        |           |          |         | extended |
 itemid      | bigint                      |           |          |         | plain    |
 prodtable   | character varying           |           |          |         | extended |
 category    | character varying(50)       |           |          |         | extended |
 subcategory | character varying(15)       |           |          |         | extended |
 prerelease  | character(1)                |           |          |         | extended |
View definition:
 SELECT t.pasid,
    t.jobid,
    t.itemcode,
    t.hotlinedate,
    t.updatedate,
    t.rectype,
    t.autoship,
    t.subid,
    t.amount,
    t.sourcecode,
    t.ordernum,
    t.paymeth,
    t.itemid,
        CASE
            WHEN t.hotlinedate >= p.changedate THEN p.prodtable
            ELSE p.prodtable_old
        END AS prodtable,
    p.category,
    p.subcategory,
    p.prerelease
   FROM "glm$tracking" t
     JOIN glm.glmproducts p ON t.itemcode::text = p.prodcode::text;


This is the base table of transactions:

                        Table "lruser.glm$tracking"
   Column    |            Type             | Collation | Nullable | Default
-------------+-----------------------------+-----------+----------+---------
 pasid       | bigint                      |           |          |
 jobid       | bigint                      |           |          |
 itemcode    | character varying(10)       |           |          |
 hotlinedate | timestamp without time zone |           |          |
 updatedate  | timestamp without time zone |           |          |
 rectype     | character(1)                |           |          |
 autoship    | character(1)                |           |          |
 subid       | character varying(20)       |           |          |
 amount      | numeric(10,2)               |           |          |
 sourcecode  | character varying(20)       |           |          |
 ordernum    | character varying(20)       |           |          |
 paymeth     | character varying(4)        |           |          |
 itemid      | bigint                      |           |          |
Indexes:
    "glm$tracking$countndx" btree (itemcode, pasid, rectype, hotlinedate)
    "glm$tracking$ndx" btree (itemcode, hotlinedate, rectype, pasid)
    "glm$tracking$prodndx" btree (itemcode, pasid, rectype, hotlinedate)
    "glm$tracking$rent$ndx" btree (pasid, hotlinedate, itemcode) INCLUDE (rectype)
Foreign-key constraints:
    "glm$autoship$fk" FOREIGN KEY (subid) REFERENCES "glm$autoship"(subid)
    "glm$cust$fk" FOREIGN KEY (pasid) REFERENCES glm(id)
    "glm$tracking$prod$fk" FOREIGN KEY (itemcode) REFERENCES glm.glmproducts(prodcode)
Triggers:
    "glm$tracking$itemid" BEFORE INSERT OR UPDATE ON "glm$tracking" FOR EACH ROW EXECUTE FUNCTION "trigger_fct_glm$tracking$itemid"()


This is an old version of our table of products sold - it should be replaced by our newer "glmproducts" table.

                                           Table "lruser.glmprods"  (OLD VERSION OF GLMPRODUCTS - SHOULD BE REMOVED!)
   Column   |         Type          | Collation | Nullable | Default | Storage  | Stats target | Description
------------+-----------------------+-----------+----------+---------+----------+--------------+-------------
 prodcode   | character varying(10) |           | not null |         | extended |              |
 prodtable  | character varying(30) |           |          |         | extended |              |
 prerelease | character(1)          |           |          |         | extended |              |
 category   | character varying(50) |           |          |         | extended |              |
Indexes:
    "glmprods$pk" PRIMARY KEY, btree (prodcode)
Access method: heap


This is a view that identifies customers with autoships and the category the product belongs to.  Note that subscriptions can end, hence the canceldate.  We only want to omit customers with active subscription in products of the same category.

                                  View "lruser.category_autoship"
   Column   |            Type             | Collation | Nullable | Default | Storage  | Description
------------+-----------------------------+-----------+----------+---------+----------+-------------
 category   | character varying(50)       |           |          |         | extended |
 prodtable  | character varying(30)       |           |          |         | extended |
 subid      | character varying(20)       |           |          |         | extended |
 pasid      | bigint                      |           |          |         | plain    |
 jobid      | bigint                      |           |          |         | plain    |
 updatedate | timestamp without time zone |           |          |         | plain    |
 startdate  | timestamp without time zone |           |          |         | plain    |
 canceldate | timestamp without time zone |           |          |         | plain    |
 itemcode   | character varying(10)       |           |          |         | extended |
View definition:
 SELECT p.category,
    p.prodtable,
    a.subid,
    a.pasid,
    a.jobid,
    a.updatedate,
    a.startdate,
    a.canceldate,
    a.itemcode
   FROM "glm$autoship" a,
    glmprods p
  WHERE a.itemcode::text = p.prodcode::text;


This is the base table of the actual autoship subscriptions.

                        Table "lruser.glm$autoship"
   Column   |            Type             | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+---------
 subid      | character varying(20)       |           | not null |
 pasid      | bigint                      |           |          |
 jobid      | bigint                      |           |          |
 updatedate | timestamp without time zone |           |          |
 startdate  | timestamp without time zone |           |          |
 canceldate | timestamp without time zone |           |          |
 itemcode   | character varying(10)       |           |          |
Indexes:
    "glm$autoship$pk" PRIMARY KEY, btree (subid)
    "glm$autoship$catndx" btree (pasid, itemcode, canceldate)
    "glm$autoship$prodndx" btree (itemcode, pasid, canceldate)
Foreign-key constraints:
    "glm$autoship$prodfk" FOREIGN KEY (itemcode) REFERENCES glm.glmproducts(prodcode)
Referenced by:
    TABLE ""glm$tracking"" CONSTRAINT "glm$autoship$fk" FOREIGN KEY (subid) REFERENCES "glm$autoship"(subid)

This is the new table of products.  

                           Table "glm.glmproducts"
    Column     |            Type             | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+---------
 id            | bigint                      |           |          |
 prodcode      | character varying(8)        |           | not null |
 prodtable     | character varying(20)       |           | not null |
 category      | character varying(50)       |           |          |
 prodtable_old | character varying(30)       |           |          |
 category_old  | character varying(50)       |           |          |
 prodname      | character varying(30)       |           |          |
 broker        | character varying(20)       |           |          |
 prerelease    | character(1)                |           |          |
 exclude       | character(1)                |           |          |
 changedate    | timestamp without time zone |           |          |
 subcategory   | character varying(15)       |           |          |
 changedate_ih | timestamp without time zone |           |          |
Indexes:
    "glmproducts$pk" PRIMARY KEY, btree (prodcode)
Referenced by:
    TABLE ""glm$autoship"" CONSTRAINT "glm$autoship$prodfk" FOREIGN KEY (itemcode) REFERENCES glm.glmproducts(prodcode)
    TABLE ""glm$tracking"" CONSTRAINT "glm$tracking$prod$fk" FOREIGN KEY (itemcode) REFERENCES glm.glmproducts(prodcode)
Triggers:
    "aur$glmproducts" AFTER UPDATE ON glm.glmproducts FOR EACH ROW EXECUTE FUNCTION glm."trigger_fct_aur$glmproducts"()
    "bdr$glmproducts" BEFORE DELETE ON glm.glmproducts FOR EACH ROW EXECUTE FUNCTION glm."trigger_fct_bdr$glmproducts"()
    "biur$glmproducts" BEFORE INSERT OR UPDATE ON glm.glmproducts FOR EACH ROW EXECUTE FUNCTION glm."trigger_fct_biur$glmproducts"()


I believe that's the entire list of tables/views involved.  Again, my apologies for the long post.  I presented all of this for completeness, although I don't believe it has anything to do with the actual problem. :-/

Thanks in advance for any advice!

   Eric Raskin


--

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Eric H. Raskin                                                                                                      914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc.                                                                     fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310                                                                                           eraskin@paslists.com

Mt Kisco, NY 10549                                                                                              http://www.paslists.com

RE: Adding nextval() to a select caused hang/very slow execution

От
Eric Raskin
Дата:
SORRY!  Here's a link that should show the plan:



--

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Eric H. Raskin                                                                                                      914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc.                                                                     fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310                                                                                           eraskin@paslists.com

Mt Kisco, NY 10549                                                                                              http://www.paslists.com

Re: Adding nextval() to a select caused hang/very slow execution

От
Tom Lane
Дата:
Eric Raskin <eraskin@paslists.com> writes:
> I have a strange situation where a base query completes in about 30 seconds
> but if I add a nextval() call to the select it never completes.  There are
> other processes running that are accessing the same sequence, but I thought
> that concurrency was not an issue for sequences (other than skipped
> values).

Shouldn't be, probably ... but did you check to see if the query is
blocked on a lock?  (See pg_stat_activity or pg_locks views.)

> The only change that
> causes it to be extremely slow or hang (can't tell which) is that I changed
> the select from:
> select unnest(....
> to
> select nextval('sbowner.idgen'), unnest(....

Without seeing the complete query it's hard to say much.  But if
this isn't the topmost select list, maybe what's happening is that
the presence of a volatile function in a sub-select is defeating
some key plan optimization.  Did you compare plain EXPLAIN (w/out
ANALYZE) output for the two cases, to see if the plan shape changes?

            regards, tom lane



Re: Adding nextval() to a select caused hang/very slow execution

От
Eric Raskin
Дата:
Thanks for the reply.  I see that the explain.depesz.com did not show you the query.  My apologies:

select unnest(array[273941676,273941677,273941678,273941679,273941680]) countrow_id, 
       disporder, fmtdate, typecode, 
       unnest(array[count_273941676,count_273941677,count_273941678,count_273941679,count_273941680]) countval
 from (select coalesce(count(distinct id_273941676),0) count_273941676, 
              coalesce(count(distinct id_273941677),0) count_273941677, 
              coalesce(count(distinct id_273941678),0) count_273941678, 
              coalesce(count(distinct id_273941679),0) count_273941679, 
              coalesce(count(distinct id_273941680),0) count_273941680, 
                         disporder, fmtdate, typecode 
        from (select case when sexcode = 'M' then id else null end id_273941676,
                     case when sexcode = 'F' then id else null end id_273941677, 
                     case when sexcode = 'A' then id else null end id_273941678, 
                     case when sexcode = 'C' then id else null end id_273941679, 
                     case when sexcode not in ('M','F','A','C') then id else null end id_273941680, 
                     hotline cnt_hotline 
               from lruser.fortherb_indcounts
               where ( (rtype = '2') 
                and ((sexcode = 'M') or (sexcode = 'F') or (sexcode = 'A') or (sexcode = 'C') or (sexcode not in ('M','F','A','C'))
               )
            )
      ) as x
right outer join count_tempcols t on (x.cnt_hotline between t.mindate and t.maxdate) group by disporder, fmtdate, typecode ) as y

I know it seems overly complicated, but it is auto-generated by our code.  The conditions and fields are variable based on what the user wants to generate.  

This is the topmost select.  The only difference that causes the hang is adding nextval('sbowner.idgen') to the start of the select right before the first unnest().

In the real application, this code feeds an insert statement with a trigger that accesses the sequence where we store the results of the query.  I "simplified" it and discovered that the nextval() was the difference that caused the performance hit.

   Eric


On Wed, Nov 4, 2020 at 1:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Eric Raskin <eraskin@paslists.com> writes:
> I have a strange situation where a base query completes in about 30 seconds
> but if I add a nextval() call to the select it never completes.  There are
> other processes running that are accessing the same sequence, but I thought
> that concurrency was not an issue for sequences (other than skipped
> values).

Shouldn't be, probably ... but did you check to see if the query is
blocked on a lock?  (See pg_stat_activity or pg_locks views.)

> The only change that
> causes it to be extremely slow or hang (can't tell which) is that I changed
> the select from:
> select unnest(....
> to
> select nextval('sbowner.idgen'), unnest(....

Without seeing the complete query it's hard to say much.  But if
this isn't the topmost select list, maybe what's happening is that
the presence of a volatile function in a sub-select is defeating
some key plan optimization.  Did you compare plain EXPLAIN (w/out
ANALYZE) output for the two cases, to see if the plan shape changes?

                        regards, tom lane


--

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Eric H. Raskin                                                                                                      914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc.                                                                     fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310                                                                                           eraskin@paslists.com

Mt Kisco, NY 10549                                                                                              http://www.paslists.com

Re: Adding nextval() to a select caused hang/very slow execution

От
Eric Raskin
Дата:
And, to follow up on your question, the plan shape DOES change when I add/remove the nextval() on a plain explain. 

Without nextval():  https://explain.depesz.com/s/SCdY





On Wed, Nov 4, 2020 at 1:22 PM Eric Raskin <eraskin@paslists.com> wrote:
Thanks for the reply.  I see that the explain.depesz.com did not show you the query.  My apologies:

select unnest(array[273941676,273941677,273941678,273941679,273941680]) countrow_id, 
       disporder, fmtdate, typecode, 
       unnest(array[count_273941676,count_273941677,count_273941678,count_273941679,count_273941680]) countval
 from (select coalesce(count(distinct id_273941676),0) count_273941676, 
              coalesce(count(distinct id_273941677),0) count_273941677, 
              coalesce(count(distinct id_273941678),0) count_273941678, 
              coalesce(count(distinct id_273941679),0) count_273941679, 
              coalesce(count(distinct id_273941680),0) count_273941680, 
                         disporder, fmtdate, typecode 
        from (select case when sexcode = 'M' then id else null end id_273941676,
                     case when sexcode = 'F' then id else null end id_273941677, 
                     case when sexcode = 'A' then id else null end id_273941678, 
                     case when sexcode = 'C' then id else null end id_273941679, 
                     case when sexcode not in ('M','F','A','C') then id else null end id_273941680, 
                     hotline cnt_hotline 
               from lruser.fortherb_indcounts
               where ( (rtype = '2') 
                and ((sexcode = 'M') or (sexcode = 'F') or (sexcode = 'A') or (sexcode = 'C') or (sexcode not in ('M','F','A','C'))
               )
            )
      ) as x
right outer join count_tempcols t on (x.cnt_hotline between t.mindate and t.maxdate) group by disporder, fmtdate, typecode ) as y

I know it seems overly complicated, but it is auto-generated by our code.  The conditions and fields are variable based on what the user wants to generate.  

This is the topmost select.  The only difference that causes the hang is adding nextval('sbowner.idgen') to the start of the select right before the first unnest().

In the real application, this code feeds an insert statement with a trigger that accesses the sequence where we store the results of the query.  I "simplified" it and discovered that the nextval() was the difference that caused the performance hit.

   Eric


On Wed, Nov 4, 2020 at 1:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Eric Raskin <eraskin@paslists.com> writes:
> I have a strange situation where a base query completes in about 30 seconds
> but if I add a nextval() call to the select it never completes.  There are
> other processes running that are accessing the same sequence, but I thought
> that concurrency was not an issue for sequences (other than skipped
> values).

Shouldn't be, probably ... but did you check to see if the query is
blocked on a lock?  (See pg_stat_activity or pg_locks views.)

> The only change that
> causes it to be extremely slow or hang (can't tell which) is that I changed
> the select from:
> select unnest(....
> to
> select nextval('sbowner.idgen'), unnest(....

Without seeing the complete query it's hard to say much.  But if
this isn't the topmost select list, maybe what's happening is that
the presence of a volatile function in a sub-select is defeating
some key plan optimization.  Did you compare plain EXPLAIN (w/out
ANALYZE) output for the two cases, to see if the plan shape changes?

                        regards, tom lane


--

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Eric H. Raskin                                                                                                      914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc.                                                                     fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310                                                                                           eraskin@paslists.com

Mt Kisco, NY 10549                                                                                              http://www.paslists.com



--

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Eric H. Raskin                                                                                                      914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc.                                                                     fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310                                                                                           eraskin@paslists.com

Mt Kisco, NY 10549                                                                                              http://www.paslists.com

Re: Adding nextval() to a select caused hang/very slow execution

От
Tom Lane
Дата:
Eric Raskin <eraskin@paslists.com> writes:
> And, to follow up on your question, the plan shape DOES change when I
> add/remove the nextval() on a plain explain.
> Without nextval():  https://explain.depesz.com/s/SCdY
> With nextval():  https://explain.depesz.com/s/oLPn

Ah, there's your problem, I think: the plan without nextval() is
parallelized while the plan with nextval() is not, because nextval() is
marked as parallel-unsafe.  It's not immediately clear why that would
result in more than about a 4X speed difference, given that the parallel
plan is using 4 workers.  But some of the rowcount estimates seem fairly
far off, so I'm betting that the planner is just accidentally lighting on
a decent plan when it's using parallelism while making some poor choices
when it isn't.

The reason for the original form of your problem is likely that we don't
use parallelism at all in non-SELECT queries, so you ended up with a bad
plan even though the nextval() was hidden in a trigger.

What you need to do is get the rowcount estimates nearer to reality
--- those places where you've got estimated rowcount 1 while reality
is tens or hundreds of thousands of rows are just disasters waiting
to bite.  I suspect most of the problem is join conditions like

Join Filter: (CASE WHEN (c.rtype = ANY ('{0,1,7,9}'::bpchar[])) THEN c.rtype ELSE x.rtype END = '2'::bpchar)

The planner just isn't going to have any credible idea how selective
that is.  I wonder to what extent you could fix this by storing
generated columns that represent the derived conditions you want to
filter on.

            regards, tom lane



Re: Adding nextval() to a select caused hang/very slow execution

От
Eric Raskin
Дата:
OK - I see.  And to add insult to injury, I tried creating a temporary table to store the intermediate results.  Then I was going to just do an insert... select... to insert the rows.   That would de-couple the nextval() from the query.

Strangely, the first query I tried it on worked great.  But, when I tried to add a second set of data with a similar query to the same temporary table, it slowed right down again.  And, of course, when I remove the insert, it's fine.  

And, of course, your explanation that inserts will not be parallelized must be the reason.  I will certainly re-vacuum the tables.  I wonder why auto-vacuum didn't collect better stats.  vacuum  analyze <table> is all I need, right?

As a last resort, what about a PL/PGSQL procedure loop on the query result?  Since the insert is very few rows relative to the work the select has to do, I could just turn the insert.. select.. into a for loop.  Then the select could be parallel?

What do you think?


On Wed, Nov 4, 2020 at 2:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Eric Raskin <eraskin@paslists.com> writes:
> And, to follow up on your question, the plan shape DOES change when I
> add/remove the nextval() on a plain explain.
> Without nextval():  https://explain.depesz.com/s/SCdY
> With nextval():  https://explain.depesz.com/s/oLPn

Ah, there's your problem, I think: the plan without nextval() is
parallelized while the plan with nextval() is not, because nextval() is
marked as parallel-unsafe.  It's not immediately clear why that would
result in more than about a 4X speed difference, given that the parallel
plan is using 4 workers.  But some of the rowcount estimates seem fairly
far off, so I'm betting that the planner is just accidentally lighting on
a decent plan when it's using parallelism while making some poor choices
when it isn't.

The reason for the original form of your problem is likely that we don't
use parallelism at all in non-SELECT queries, so you ended up with a bad
plan even though the nextval() was hidden in a trigger.

What you need to do is get the rowcount estimates nearer to reality
--- those places where you've got estimated rowcount 1 while reality
is tens or hundreds of thousands of rows are just disasters waiting
to bite.  I suspect most of the problem is join conditions like

Join Filter: (CASE WHEN (c.rtype = ANY ('{0,1,7,9}'::bpchar[])) THEN c.rtype ELSE x.rtype END = '2'::bpchar)

The planner just isn't going to have any credible idea how selective
that is.  I wonder to what extent you could fix this by storing
generated columns that represent the derived conditions you want to
filter on.

                        regards, tom lane


--

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Eric H. Raskin                                                                                                      914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc.                                                                     fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310                                                                                           eraskin@paslists.com

Mt Kisco, NY 10549                                                                                              http://www.paslists.com

Re: Adding nextval() to a select caused hang/very slow execution

От
Tom Lane
Дата:
Eric Raskin <eraskin@paslists.com> writes:
> And, of course, your explanation that inserts will not be parallelized must
> be the reason.  I will certainly re-vacuum the tables.  I wonder why
> auto-vacuum didn't collect better stats.  vacuum  analyze <table> is all I
> need, right?

Plain ANALYZE is enough to collect stats; but I doubt that'll improve
matters for you.  The problem is basically that the planner can't do
anything with a CASE construct, so you end up with default selectivity
estimates for anything involving a CASE, statistics or no statistics.
You need to try to reformulate the query with simpler join conditions.

> As a last resort, what about a PL/PGSQL procedure loop on the query
> result?  Since the insert is very few rows relative to the work the select
> has to do, I could just turn the insert.. select.. into a for loop.  Then
> the select could be parallel?

Maybe, but you're still skating on a cliff edge.  I think it's pure chance
that the parallelized query is working acceptably well; next month with
slightly different conditions, it might not.

            regards, tom lane



Re: Adding nextval() to a select caused hang/very slow execution

От
Eric Raskin
Дата:
OK -- got it.  Thanks very much for your help.  I'll see what I can do to denormalize the case statements into actual columns to support the queries.

On Wed, Nov 4, 2020 at 2:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Eric Raskin <eraskin@paslists.com> writes:
> And, of course, your explanation that inserts will not be parallelized must
> be the reason.  I will certainly re-vacuum the tables.  I wonder why
> auto-vacuum didn't collect better stats.  vacuum  analyze <table> is all I
> need, right?

Plain ANALYZE is enough to collect stats; but I doubt that'll improve
matters for you.  The problem is basically that the planner can't do
anything with a CASE construct, so you end up with default selectivity
estimates for anything involving a CASE, statistics or no statistics.
You need to try to reformulate the query with simpler join conditions.

> As a last resort, what about a PL/PGSQL procedure loop on the query
> result?  Since the insert is very few rows relative to the work the select
> has to do, I could just turn the insert.. select.. into a for loop.  Then
> the select could be parallel?

Maybe, but you're still skating on a cliff edge.  I think it's pure chance
that the parallelized query is working acceptably well; next month with
slightly different conditions, it might not.

                        regards, tom lane


--

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Eric H. Raskin                                                                                                      914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc.                                                                     fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310                                                                                           eraskin@paslists.com

Mt Kisco, NY 10549                                                                                              http://www.paslists.com

Re: Adding nextval() to a select caused hang/very slow execution

От
Tom Lane
Дата:
... btw, it occurs to me that at least as a stopgap,
"set enable_nestloop = off" would be worth trying.
The killer problem with rowcount-1 estimates is that they
encourage the planner to use nestloops when it shouldn't.

            regards, tom lane



Re: Adding nextval() to a select caused hang/very slow execution

От
Michael Lewis
Дата:
On Wed, Nov 4, 2020 at 12:12 PM Eric Raskin <eraskin@paslists.com> wrote:
OK - I see.  And to add insult to injury, I tried creating a temporary table to store the intermediate results.  Then I was going to just do an insert... select... to insert the rows.   That would de-couple the nextval() from the query.

Strangely, the first query I tried it on worked great.  But, when I tried to add a second set of data with a similar query to the same temporary table, it slowed right down again.  And, of course, when I remove the insert, it's fine. 

I am not entirely sure I am understanding your process properly, but just a note- If you are getting acceptable results creating the temp table, and the issue is just that you get very bad plans when using it in some query that follows, then it is worth noting that autovacuum does nothing on temp tables and for me it is nearly always worth the small cost to perform an analyze (at least on key fields) after creating a temp table, or rather after inserting/updating/deleting records in a significant way.

Re: Adding nextval() to a select caused hang/very slow execution

От
Eric Raskin
Дата:
So, things get even weirder.   When I execute each individual select statement I am generating from a psql prompt, they all finish very quickly.  

If I execute them inside a pl/pgsql block, the second one hangs.

Is there something about execution inside a pl/pgsql block that is different from the psql command line?


On Wed, Nov 4, 2020 at 3:20 PM Michael Lewis <mlewis@entrata.com> wrote:
On Wed, Nov 4, 2020 at 12:12 PM Eric Raskin <eraskin@paslists.com> wrote:
OK - I see.  And to add insult to injury, I tried creating a temporary table to store the intermediate results.  Then I was going to just do an insert... select... to insert the rows.   That would de-couple the nextval() from the query.

Strangely, the first query I tried it on worked great.  But, when I tried to add a second set of data with a similar query to the same temporary table, it slowed right down again.  And, of course, when I remove the insert, it's fine. 

I am not entirely sure I am understanding your process properly, but just a note- If you are getting acceptable results creating the temp table, and the issue is just that you get very bad plans when using it in some query that follows, then it is worth noting that autovacuum does nothing on temp tables and for me it is nearly always worth the small cost to perform an analyze (at least on key fields) after creating a temp table, or rather after inserting/updating/deleting records in a significant way.


--

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Eric H. Raskin                                                                                                      914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc.                                                                     fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310                                                                                           eraskin@paslists.com

Mt Kisco, NY 10549                                                                                              http://www.paslists.com

Re: Adding nextval() to a select caused hang/very slow execution

От
Tom Lane
Дата:
Eric Raskin <eraskin@paslists.com> writes:
> So, things get even weirder.   When I execute each individual select
> statement I am generating from a psql prompt, they all finish very
> quickly.
> If I execute them inside a pl/pgsql block, the second one hangs.
> Is there something about execution inside a pl/pgsql block that is
> different from the psql command line?

Generic vs specific plan, perhaps?  Are you passing any parameter
values in from plpgql variables?

IIRC, you could force the matter by using EXECUTE, though it's
somewhat more notationally tedious.  In late-model PG versions,
plan_cache_mode could help you too.

            regards, tom lane



Re: Adding nextval() to a select caused hang/very slow execution

От
Eric Raskin
Дата:
set enable_nestloop=off did the trick.  Execution time when down to seconds per query.

Thanks very much for your help.

On Wed, Nov 4, 2020 at 4:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Eric Raskin <eraskin@paslists.com> writes:
> So, things get even weirder.   When I execute each individual select
> statement I am generating from a psql prompt, they all finish very
> quickly.
> If I execute them inside a pl/pgsql block, the second one hangs.
> Is there something about execution inside a pl/pgsql block that is
> different from the psql command line?

Generic vs specific plan, perhaps?  Are you passing any parameter
values in from plpgql variables?

IIRC, you could force the matter by using EXECUTE, though it's
somewhat more notationally tedious.  In late-model PG versions,
plan_cache_mode could help you too.

                        regards, tom lane
--

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Eric H. Raskin                                                                                                      914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc.                                                                     fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310                                                                                           eraskin@paslists.com

Mt Kisco, NY 10549                                                                                              http://www.paslists.com