Thread: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Gunnlaugur Thor Briem
Date:
Hi, 

we are seeing cases of EXPLAIN INSERT INTO foo SELECT ... taking over an hour, with disk I/O utilization (percent of time device is busy) at 100% the whole time, although I/O bandwidth is not saturated. This is on PostgreSQL 9.1.13.

What could cause this? Note that there is no ANALYZE. Is it possible that the SELECT is actually executed, in planning the INSERT?

When executing the INSERT itself (not EXPLAIN) immediately afterwards, that logs a "temporary file" message, but the EXPLAIN invocation does not (though the disk I/O suggests that a large on-disk sort is occurring):

LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp6016.0", size 744103936
STATEMENT:  INSERT INTO f_foo SELECT
[...] 

During that actual execution, there's a lower disk I/O utilization (though a higher MB/s rate).

Charts of disk I/O utilization and rate are at http://postimg.org/image/628h6jmn3/ ... the solid 100% span is the EXPLAIN statement, ending at 6:13:30pm, followed by the INSERT statement ending at 6:32pm. Metrics are collected by New Relic; their definition of I/O utilization is at https://discuss.newrelic.com/t/disk-i-o-metrics/2900

Here's the EXPLAIN statement: 

LOG:  duration: 3928778.823 ms  statement: EXPLAIN INSERT INTO f_foo SELECT
       t_foo.fk_d1,
       t_foo.fk_d2,
       t_foo.fk_d3,
       t_foo.fk_d4,
       t_foo.fk_d5,
       t_foo.fk_d6,
       t_foo.value
FROM t_foo
WHERE NOT (EXISTS
             (SELECT *
              FROM f_foo
              WHERE f_foo.fk_d2 = t_foo.fk_d2
                AND f_foo.fk_d3 = t_foo.fk_d3
                AND f_foo.fk_d4 = t_foo.fk_d4
                AND f_foo.fk_d5 = t_foo.fk_d5
                AND f_foo.fk_d6 = t_foo.fk_d6
                AND f_foo.fk_d1 = t_foo.fk_d1))

(where t_foo is a temp table previously populated using COPY, and the NOT EXISTS subquery refers to the same table we are inserting into)

Here's the EXPLAIN output: 

Insert on f_foo  (cost=8098210.50..9354519.69 rows=1 width=16)
  ->  Merge Anti Join  (cost=8098210.50..9354519.69 rows=1 width=16)
        Merge Cond: ((t_foo.fk_d2 = public.f_foo.fk_d2) AND
                     (t_foo.fk_d3 = public.f_foo.fk_d3) AND
                     (t_foo.fk_d4 = public.f_foo.fk_d4) AND
                     (t_foo.fk_d5 = public.f_foo.fk_d5) AND
                     (t_foo.fk_d6 = public.f_foo.fk_d6) AND
                     (t_foo.fk_d1 = public.f_foo.fk_d1))
        ->  Sort  (cost=3981372.25..4052850.70 rows=28591380 width=16)
              Sort Key: t_foo.fk_d2, t_foo.fk_d3, t_foo.fk_d4, t_foo.fk_d5,
                        t_foo.fk_d6, t_foo.fk_d1
              ->  Seq Scan on t_foo  (cost=0.00..440461.80 rows=28591380
                                      width=16)
        ->  Sort  (cost=4116838.25..4188025.36 rows=28474842 width=12)
              Sort Key: public.f_foo.fk_d2, public.f_foo.fk_d3,
                        public.f_foo.fk_d4, public.f_foo.fk_d5,
                        public.f_foo.fk_d6, public.f_foo.fk_d1
              ->  Seq Scan on f_foo  (cost=0.00..591199.42 rows=28474842
                                      width=12)

The INSERT is indeed rather large (which is why we're issuing an EXPLAIN ahead of it to log the plan). So its long execution time is expected. But I want to understand why the EXPLAIN takes even longer.

The table looks like this: 

\d f_foo
Table "public.f_foo"
 Column |   Type   | Modifiers 
--------+----------+-----------
 fk_d1  | smallint | not null
 fk_d2  | smallint | not null
 fk_d3  | smallint | not null
 fk_d4  | smallint | not null
 fk_d5  | smallint | not null
 fk_d6  | smallint | not null
 value  | integer  | 
Indexes:
    "f_foo_pkey" PRIMARY KEY, btree (fk_d2, fk_d6, fk_d4, fk_d3, fk_d5, fk_d1) CLUSTER
    "ix_f_foo_d4" btree (fk_d4)
    "ix_f_foo_d3" btree (fk_d3)
    "ix_f_foo_d5" btree (fk_d5)
    "ix_f_foo_d6" btree (fk_d6)
Foreign-key constraints:
    "f_foo_d2_fkey" FOREIGN KEY (fk_d2) REFERENCES d2(id) DEFERRABLE
    "f_foo_d6_fkey" FOREIGN KEY (fk_d6) REFERENCES d6(id) DEFERRABLE
    "f_foo_d5_fkey" FOREIGN KEY (fk_d5) REFERENCES d5(id) DEFERRABLE
    "f_foo_d4_fkey" FOREIGN KEY (fk_d4) REFERENCES d4(id) DEFERRABLE
    "f_foo_d3_fkey" FOREIGN KEY (fk_d3) REFERENCES d3(id) DEFERRABLE

Conceivably relevant (though I don't know how): this database has a very large number of table objects (1.3 million rows in pg_class). But other EXPLAINs are not taking anywhere near this long in this DB; the heavy EXPLAIN is only seen on INSERT into this and a couple of other tables with tens of millions of rows.

Any ideas?

Thanks, best regards, 

- Gulli

Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Marc Mamin
Date:
>Hi,
>we are seeing cases of EXPLAIN INSERT INTO foo SELECT ... taking over an hour, with disk I/O utilization (percent of time device is busy) at 100% the whole time, although I/O bandwidth is not saturated. This is on PostgreSQL 9.1.13.
>What could cause this? Note that there is no ANALYZE. Is it possible that the SELECT is actually executed, in planning the INSERT?
>[...]
>Any ideas?
>Thanks, best regards,
>- Gulli

Hi,
I have no clue about the time required by EXPLAIN,
but some more information is probably needed to find an explanation:

- postgres version
- number of rows inserted by the query
- how clean is your catalog in regard to vacuum
   (can you run VACUUM FULL VERBOSE and ANALYZE on it, and then retry the EXPLAIN? see the sketch below)
- any other process that may interfere, e.g. by locking some catalog tables?
- statistics target?
- is your temp table analyzed?
- any index on it?

We have about 300,000 entries in our pg_class table, and I've never seen such an issue.
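For example, a quick sketch (untested, adjust as needed) to see which catalog tables are biggest and how many dead tuples they carry:

SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
       s.n_dead_tup
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_stat_sys_tables s ON s.relid = c.oid
WHERE n.nspname = 'pg_catalog' AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;

If pg_class or pg_attribute turns out much bigger than its live contents warrant, that points at catalog bloat.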

regards,
Marc Mamin

Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Gunnlaugur Thor Briem
Date:
Hi, thanks for your follow-up questions.

- postgres version is 9.1.13
- the number of rows (in this latest instance) is 28,474,842
- I've clustered and vacuum-full-ed and analyzed this table frequently, attempting to troubleshoot this. (Running vacuum full on the whole catalog seems a little excessive, and unlikely to help.)
- no other processes are likely to be interfering; nothing other than PostgreSQL runs on this machine (except for normal OS processes and New Relic server monitoring service); concurrent activity in PostgreSQL is low-level and unrelated, and this effect is observed systematically whenever this kind of operation is performed on this table
- no override for this table; the system default_statistics_target is 100 (the default)
- yes, an ANALYZE is performed on the temp table after the COPY and before the INSERT
- no index on the temp table (but I'm scanning the whole thing anyway). There are indexes on f_foo as detailed in my original post, and I expect the PK to make the WHERE NOT EXISTS filtering efficient (as it filters on exactly all columns of the PK) ... but even if it didn't, I would expect that to only slow down the actual insert execution, not the EXPLAIN.

Cheers,

Gulli




Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Vladimir Sitnikov
Date:
>What could cause this? Note that there is no ANALYZE.

Can you capture a pstack trace and/or perf report while the EXPLAIN hangs?
I think it should shed light on the activity of PostgreSQL.

Vladimir


Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Marc Mamin
Date:


>Hi, thanks for your follow-up questions.
>- postgres version is 9.1.13
>- the number of rows (in this latest instance) is 28,474,842
>- I've clustered and vacuum-full-ed and analyzed this table frequently, attempting to troubleshoot this. (Running vacuum full on the whole catalog seems a little excessive, and unlikely to help.)

Hi,

I mean the pg_* tables. When working with temp objects and a high number of classes, regular vacuum may not clean them efficiently.
It is not a bad idea to run a VACUUM FULL VERBOSE manually on the largest of those from time to time, to verify that they don't grow out of control.
This normally takes only a few seconds.
The verbose output of VACUUM FULL sometimes returns interesting information...
For the EXPLAIN performance, I guess that these are the most relevant ones:
 pg_statistic;
 pg_class;
 pg_attribute;
 pg_index;
 pg_constraint;
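
Something along these lines (only a sketch; note that VACUUM FULL takes an exclusive lock on each catalog while it runs, so pick a quiet moment):

VACUUM FULL VERBOSE ANALYZE pg_catalog.pg_statistic;
VACUUM FULL VERBOSE ANALYZE pg_catalog.pg_class;
VACUUM FULL VERBOSE ANALYZE pg_catalog.pg_attribute;
VACUUM FULL VERBOSE ANALYZE pg_catalog.pg_index;
VACUUM FULL VERBOSE ANALYZE pg_catalog.pg_constraint;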
 
regards,

Marc Mamin




Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Tomas Vondra
Date:
Hi,

On 5.3.2015 16:01, Gunnlaugur Thor Briem wrote:
> Hi, thanks for your follow-up questions.
>
> - postgres version is 9.1.13
> - the number of rows (in this latest instance) is 28,474,842
> - I've clustered and vacuum-full-ed and analyzed this table frequently,
> attempting to troubleshoot this. (Running vacuum full on the whole
> catalog seems a little excessive, and unlikely to help.)
> - no other processes are likely to be interfering; nothing other than
> PostgreSQL runs on this machine (except for normal OS processes and New
> Relic server monitoring service); concurrent activity in PostgreSQL is
> low-level and unrelated, and this effect is observed systematically
> whenever this kind of operation is performed on this table
> - no override for this table; the system default_statistics_target is
> 100 (the default)
> - yes, an ANALYZE is performed on the temp table after the COPY and
> before the INSERT
> - no index on the temp table (but I'm scanning the whole thing anyway).
> There are indexes on f_foo as detailed in my original post, and I expect
> the PK to make the WHERE NOT EXISTS filtering efficient (as it filters
> on exactly all columns of the PK) ... but even if it didn't, I would
> expect that to only slow down the actual insert execution, not the EXPLAIN.

The only thing I can think of is some sort of memory exhaustion,
resulting in swapping out large amounts of memory. That'd explain the
I/O load. Can you run something like vmstat to see if this really is swap?

The fact that the plain INSERT does not do that contradicts this theory,
though, as the planner should do the same work for both statements (plain
and EXPLAIN), or for neither.

Can you prepare a self-contained test case? I.e. a script that
demonstrates the issue? I tried to reproduce the issue using the
information provided so far, but unsuccessfully :-(

Even if you can only reproduce the problem on another machine (because the
data has to stay internal), doing so on a server with debug symbols and
seeing where most of the time is spent (e.g. using 'perf top') would be
useful.

regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 5.3.2015 16:01, Gunnlaugur Thor Briem wrote:
>> - postgres version is 9.1.13

> The only thing I can think of is some sort of memory exhaustion,
> resulting in swapping out large amounts of memory.

I'm wondering about the issue addressed by commit fccebe421 ("Use
SnapshotDirty rather than an active snapshot to probe index endpoints").
Now, that was allegedly fixed in 9.1.13 ... but if the OP were confused
and this server were running, say, 9.1.12, that could be a viable
explanation.  Another possibly viable explanation for seeing the issue
in 9.1.13 would be if I fat-fingered the back-patch somehow :-(.

In any case, I concur with your advice:

> Even if you could reproduce the problem on another machine (because of
> keeping the data internal) on a server with debug symbols and see where
> most of the time is spent (e.g. using 'perf top'), that'd be useful.

Without some more-precise idea of where the time is going, we're really
just guessing as to the cause.

            regards, tom lane


Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Tomas Vondra
Date:
On 6.3.2015 01:44, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 5.3.2015 16:01, Gunnlaugur Thor Briem wrote:
>>> - postgres version is 9.1.13
>
>> The only thing I can think of is some sort of memory exhaustion,
>> resulting in swapping out large amounts of memory.
>
> I'm wondering about the issue addressed by commit fccebe421 ("Use
> SnapshotDirty rather than an active snapshot to probe index
> endpoints"). Now, that was allegedly fixed in 9.1.13 ... but if the
> OP were confused and this server were running, say, 9.1.12, that
> could be a viable explanation. Another possibly viable explanation
> for seeing the issue in 9.1.13 would be if I fat-fingered the
> back-patch somehow :-(.

How would fccebe421 explain the large amount of random writes (~4MB/s
for more than an hour), reported in the initial post? And why would that
only affect the EXPLAIN and not the bare query?

I guess there might be two sessions, one keeping uncommitted changes
(thus invisible tuples), and the other one doing the explain. And the
actual query might be executed after the first session does a commit.

But the random writes don't really match in this scenario ...


regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Jeff Janes
Date:
On Thu, Mar 5, 2015 at 4:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 5.3.2015 16:01, Gunnlaugur Thor Briem wrote:
>>> - postgres version is 9.1.13
>
>> The only thing I can think of is some sort of memory exhaustion,
>> resulting in swapping out large amounts of memory.
>
> I'm wondering about the issue addressed by commit fccebe421 ("Use
> SnapshotDirty rather than an active snapshot to probe index endpoints").
> Now, that was allegedly fixed in 9.1.13 ... but if the OP were confused
> and this server were running, say, 9.1.12, that could be a viable
> explanation.  Another possibly viable explanation for seeing the issue
> in 9.1.13 would be if I fat-fingered the back-patch somehow :-(.


The back patch into 9.1.13 seems to have worked.

# set up a small indexed table
psql -c 'create table foo (x integer); create index on foo(x); insert into foo select * from generate_series(1,10000); analyze foo;'

# in the background, pile up uncommitted (soon-to-be-rolled-back) rows at the index endpoint
perl -le 'use DBI; my $dbh=DBI->connect("DBi:Pg:"); $dbh->begin_work(); foreach (1..1e6) {$dbh->do("insert into foo values ($_)") or die; }; $dbh->rollback()' &

# meanwhile, repeatedly plan (not execute) a join that needs the endpoint estimate
while true; do
  pgbench -T5 -c4 -j4 -n -f <(echo "explain select count(*) from foo a join foo b using (x);")
done

On 9.1.12 this slows down dramatically; on 9.1.13 it does not.

Cheers,

Jeff

Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 6.3.2015 01:44, Tom Lane wrote:
>> I'm wondering about the issue addressed by commit fccebe421 ("Use
>> SnapshotDirty rather than an active snapshot to probe index
>> endpoints").

> How would fccebe421 explain the large amount of random writes (~4MB/s
> for more than an hour), reported in the initial post? And why would that
> only affect the EXPLAIN and not the bare query?

Hard to say.  There's probably additional factors involved, and this
may ultimately turn out to be unrelated.  But it seemed like it might
be an issue, particularly since the plan had a mergejoin with lots of
clauses ...

            regards, tom lane


Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Kevin Grittner
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On 6.3.2015 01:44, Tom Lane wrote:
>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>> On 5.3.2015 16:01, Gunnlaugur Thor Briem wrote:
>>>> - postgres version is 9.1.13
>>>
>>> The only thing I can think of is some sort of memory exhaustion,
>>> resulting in swapping out large amounts of memory.
>>
>> I'm wondering about the issue addressed by commit fccebe421 ("Use
>> SnapshotDirty rather than an active snapshot to probe index
>> endpoints"). Now, that was allegedly fixed in 9.1.13 ... but if the
>> OP were confused and this server were running, say, 9.1.12, that
>> could be a viable explanation. Another possibly viable explanation
>> for seeing the issue in 9.1.13 would be if I fat-fingered the
>> back-patch somehow :-(.
>
> How would fccebe421 explain the large amount of random writes (~4MB/s
> for more than an hour), reported in the initial post? And why would that
> only affect the EXPLAIN and not the bare query?
>
> I guess there might be two sessions, one keeping uncommitted changes
> (thus invisible tuples), and the other one doing the explain. And the
> actual query might be executed after the first session does a commit.
>
> But the random writes don't really match in this scenario ...

Sure they do -- both the index and heap pages may be rewritten with
hints that the rows are dead.  In this email:

http://www.postgresql.org/message-id/19685.1384307200@sss.pgh.pa.us

... Tom said:

| The other thing we might consider doing is using SnapshotAny, which
| would amount to just taking the extremal index entry at face value.
| This would be cheap all right (in fact, we might later be able to optimize
| it to not even visit the heap).  However, I don't like the implications
| for the accuracy of the approximation.  It seems quite likely that an
| erroneous transaction could insert a wildly out-of-range index entry
| and then roll back --- or even if it committed, somebody might soon come
| along and clean up the bogus entry in a separate transaction.  If we use
| SnapshotAny then we'd believe that bogus entry until it got vacuumed, not
| just till it got deleted.  This is a fairly scary proposition, because
| the wackier that extremal value is, the more impact it will have on the
| selectivity estimates.
|
| If it's demonstrated that SnapshotDirty doesn't reduce the estimation
| costs enough to solve the performance problem the complainants are facing,
| I'd be willing to consider using SnapshotAny here.  But I don't want to
| take that step until it's proven that the more conservative approach
| doesn't solve the performance problem.

I wonder whether this isn't an example of a case where SnapshotAny
would do better.  I have definitely seen cases where BIND time on a
bloated database has exceeded all other PostgreSQL time combined --
in this planner probe of the index.  I haven't seen two hours on a
single EXPLAIN, but I have seen it be several times the execution
time.

Regarding the concerns about the accuracy of the estimate and the
effects on the planner that an incorrect selectivity estimate could
have -- clearly scanning a large number of index tuples and chasing
them to the heap just to find that the tuples are not visible is
about as expensive as if they were visible *for the index scan step
itself*.  I wonder whether there would be any way to allow the
index scan cost to be based on the work it has to do while somehow
generating a reasonable number for the rows it will find to be
visible, so that estimation of other nodes can be based on that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> How would fccebe421 explain the large amount of random writes (~4MB/s
>> for more than an hour), reported in the initial post? And why would that
>> only affect the EXPLAIN and not the bare query?
>> But the random writes don't really match in this scenario ...

> Sure they do -- both the index and heap pages may be rewritten with
> hints that the rows are dead.

Hm ... yeah, this idea could match the symptoms: if the database has built
up a large debt of dead-but-unhinted rows, then the first pass through the
index would cause a write storm but the next would not, which would
explain why doing EXPLAIN immediately followed by the real query would put
all the hint-update burden on the EXPLAIN even though the planning phase
of the real query would inspect the same index entries.

But if that's the situation, those hint-updating writes would get done
sooner or later --- probably sooner, by actual execution of the query
itself.  So I'm not convinced that moving to SnapshotAny would fix
anything much, only change where the problem manifests.

Also, it's less than clear why only this particular query is showing
any stress.  Dead rows should be a hazard for anything, especially if
there are enough of them to require hours to re-hint.  And why wouldn't
autovacuum get to them first?

            regards, tom lane


Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Jeff Janes
Date:
On Fri, Mar 6, 2015 at 5:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>> How would fccebe421 explain the large amount of random writes (~4MB/s
>>> for more than an hour), reported in the initial post? And why would that
>>> only affect the EXPLAIN and not the bare query?
>>> But the random writes don't really match in this scenario ...
>
>> Sure they do -- both the index and heap pages may be rewritten with
>> hints that the rows are dead.
>
> Hm ... yeah, this idea could match the symptoms: if the database has built
> up a large debt of dead-but-unhinted rows, then the first pass through the
> index would cause a write storm but the next would not, which would
> explain why doing EXPLAIN immediately followed by the real query would put
> all the hint-update burden on the EXPLAIN even though the planning phase
> of the real query would inspect the same index entries.
>
> But if that's the situation, those hint-updating writes would get done
> sooner or later --- probably sooner, by actual execution of the query
> itself.  So I'm not convinced that moving to SnapshotAny would fix
> anything much, only change where the problem manifests.

But the actual query is using a seq scan, and so it would hint the table in efficient sequential order, rather than hinting the table haphazardly in index order like probing the endpoint does.


 
> Also, it's less than clear why only this particular query is showing
> any stress.  Dead rows should be a hazard for anything, especially if
> there are enough of them to require hours to re-hint.  And why wouldn't
> autovacuum get to them first?

Say the timing of this query is such that 10% of the parent table turns over between invocations of this query, and that this 10% is all at the end of some index but random over the table heap.  If autovac kicks in at 20% turnover, then half the time autovac does get to them first, and half the time it doesn't.  It would be interesting to know if this query is bad every time it is planned, or just sometimes.
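
If that's what's happening, one way to tilt the race in autovacuum's favor (just a sketch; the thresholds are guesses to tune) would be per-table autovacuum settings:

ALTER TABLE f_foo SET (
    autovacuum_vacuum_scale_factor = 0.02,  -- default 0.2, i.e. 20% turnover
    autovacuum_vacuum_threshold = 1000
);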
 
Cheers,

Jeff


Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Tomas Vondra
Date:
On 7.3.2015 03:26, Jeff Janes wrote:
> On Fri, Mar 6, 2015 at 5:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> But the actual query is using a seq scan, and so it would hint the
> table in efficient sequential order, rather than hinting the table
> haphazardly in index order like probing the endpoint does.

I think this has nothing to do with the plan itself, but with the
estimation in the optimizer - that looks up the range from the index in
some cases, and that may generate random I/O to the table.

>
>> Also, it's less than clear why only this particular query is
>> showing any stress. Dead rows should be a hazard for anything,
>> especially if there are enough of them to require hours to re-hint.
>> And why wouldn't autovacuum get to them first?
>
>
> Say the timing of this query is such that 10% of the parent turns
> over between invocations of this query, and that this 10% is all at
> the end of some index but random over the table heap. If autovac
> kicks in at 20% turn over, then half the time autovac does get to
> them first, and half the time it doesn't. It would be interesting to
> know if this query is bad every time it is planner, or just
> sometimes.

Yeah, this might be the reason. Another possibility is that this is part
of some large batch, and autovacuum simply did not have a chance to do the
work.

regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Gunnlaugur Thor Briem
Date:
On Sat, Mar 7, 2015 at 3:44 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> Another possibility is that this is part of some large batch, and
> autovacuum simply did not have a chance to do the work.

Yes, I think that's it: I've just realized that immediately prior to the INSERT, in the same transaction, an unfiltered DELETE has been issued; i.e. the whole table is being rewritten. Then the INSERT is issued ... with a WHERE clause on non-existence in the (now empty) table.

In that case of course the WHERE clause is unnecessary, as it will always evaluate as true (and we've locked the whole table for writes). Looks like it is a lot worse than unnecessary, though, if it triggers this performance snafu in EXPLAIN INSERT.
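
Schematically, the batch ends up doing something like this (a simplified sketch of the sequence, not our exact SQL):

BEGIN;
DELETE FROM f_foo;   -- unfiltered: every one of the ~28M rows becomes a dead tuple
-- t_foo has already been populated via COPY and ANALYZEd
EXPLAIN INSERT INTO f_foo
SELECT t.fk_d1, t.fk_d2, t.fk_d3, t.fk_d4, t.fk_d5, t.fk_d6, t.value
FROM t_foo t
WHERE NOT EXISTS (SELECT 1 FROM f_foo f
                  WHERE f.fk_d1 = t.fk_d1 AND f.fk_d2 = t.fk_d2
                    AND f.fk_d3 = t.fk_d3 AND f.fk_d4 = t.fk_d4
                    AND f.fk_d5 = t.fk_d5 AND f.fk_d6 = t.fk_d6);
-- the NOT EXISTS is now always true, but even planning it has to look at
-- an f_foo that consists entirely of dead rows
-- ... then the actual INSERT, then COMMIT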

This seems very likely to be the explanation here. So our workaround will be to simply omit the WHERE clause in those cases where the full DELETE has been issued. (And then vacuum afterwards.)

(Even better, just make the new table not temporary, and have it replace the former table altogether. But that's for later; requires some broader changes in our application.)

I'll report back if I *do* see the problem come up again despite this change.

Thanks all for your help figuring this out!

Best regards,

Gulli

Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Tom Lane
Date:
Gunnlaugur Thor Briem <gunnlaugur@gmail.com> writes:
> Yes, I think that's it: I've just realized that immediately prior to the
> INSERT, in the same transaction, an unfiltered DELETE has been issued; i.e.
> the whole table is being rewritten. Then the INSERT is issued ... with a
> WHERE clause on non-existence in the (now empty) table.

> In that case of course the WHERE clause is unnecessary, as it will always
> evaluate as true (and we've locked the whole table for writes). Looks like
> it is a lot worse than unnecessary, though, if it triggers this performance
> snafu in EXPLAIN INSERT.

Ah-hah.  So what's happening is that the planner is doing an indexscan
over the entire table of now-dead rows, looking vainly for an undeleted
maximal row.  Ouch.

I wonder how hard it would be to make the indexscan give up after hitting
N consecutive dead rows, for some suitable N, maybe ~1000.  From the
planner's viewpoint it'd be easy enough to fall back to using whatever
it had in the histogram after all.  But that's all happening down inside
index_getnext, and I'm hesitant to stick some kind of wart into that
machinery for this purpose.

            regards, tom lane


Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Jeff Janes
Date:
On Sat, Mar 7, 2015 at 7:44 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On 7.3.2015 03:26, Jeff Janes wrote:
>> On Fri, Mar 6, 2015 at 5:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> But the actual query is using a seq scan, and so it would hint the
>> table in efficient sequential order, rather than hinting the table
>> haphazardly in index order like probing the endpoint does.
>
> I think this has nothing to do with the plan itself, but with the
> estimation in the optimizer - that looks up the range from the index in
> some cases, and that may generate random I/O to the table.

Right.  Tom was saying that the work needs to be done anyway, but it is just that some ways of doing the work are far more efficient than others.  It just happens that the executed plan in this case would do it more efficiently (but in general you aren't going to get any less efficient than having the planner do it in index order).

In other similar cases I've looked at (for a different reason), the executor wouldn't do that work at all, because the plan it actually chooses only touches a handful of rows.  So it is planning a merge join, only to realize how ridiculous one would be and so not use one.  But it still pays the price.  In that case, the thing that would do the needful, in lieu of the planner, would be a vacuum process.  Which is optimal both because it is in the background, and because it is optimized for efficient IO.

Cheers,

Jeff

Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Tomas Vondra
Date:
On 11.3.2015 18:30, Jeff Janes wrote:
> On Sat, Mar 7, 2015 at 7:44 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>     On 7.3.2015 03:26, Jeff Janes wrote:
>     > On Fri, Mar 6, 2015 at 5:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>     >
>     > But the actual query is using a seq scan, and so it would hint the
>     > table in efficient sequential order, rather than hinting the table
>     > haphazardly in index order like probing the endpoint does.
>
>     I think this has nothing to do with the plan itself, but with the
>     estimation in optimizer - that looks up the range from the index in some
>     cases, and that may generate random I/O to the table.
>
>
> Right.  Tom was saying that the work needs to be done anyway, but it is
> just that some ways of doing the work are far more efficient than
> others.  It just happens that the executed plan in this case would do it
> more efficiently, (but in general you aren't going to get any less
> efficient than having the planner do it in index order).

Oh! Now I see what you meant. I parsed it as if you were suggesting that
the theory does not match the symptoms because the plan contains a
sequential scan yet there's a lot of random I/O. But now I see that's not
what you claimed, so sorry for the noise.



--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Jim Nasby
Date:
On 3/11/15 11:15 AM, Tom Lane wrote:
> Gunnlaugur Thor Briem <gunnlaugur@gmail.com> writes:
>> Yes, I think that's it: I've just realized that immediately prior to the
>> INSERT, in the same transaction, an unfiltered DELETE has been issued; i.e.
>> the whole table is being rewritten. Then the INSERT is issued ... with a
>> WHERE clause on non-existence in the (now empty) table.
>
>> In that case of course the WHERE clause is unnecessary, as it will always
>> evaluate as true (and we've locked the whole table for writes). Looks like
>> it is a lot worse than unnecessary, though, if it triggers this performance
>> snafu in EXPLAIN INSERT.
>
> Ah-hah.  So what's happening is that the planner is doing an indexscan
> over the entire table of now-dead rows, looking vainly for an undeleted
> maximal row.  Ouch.
>
> I wonder how hard it would be to make the indexscan give up after hitting
> N consecutive dead rows, for some suitable N, maybe ~1000.  From the
> planner's viewpoint it'd be easy enough to fall back to using whatever
> it had in the histogram after all.  But that's all happening down inside
> index_getnext, and I'm hesitant to stick some kind of wart into that
> machinery for this purpose.

ISTM what we really want here is a time-based behavior, not number of
rows. Given that, could we do the index probe in a subtransaction, set
an alarm for X ms, and simply abort the subtransaction if the alarm fires?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Jim Nasby
Date:
On 3/11/15 10:54 AM, Gunnlaugur Thor Briem wrote:
> (Even better, just make the new table not temporary, and have it replace
> the former table altogether. But that's for later; requires some broader
> changes in our application.)

The other thing you should consider is using TRUNCATE instead of an
un-filtered DELETE. It will both be much faster to perform and won't
leave any dead rows behind.
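
Roughly like this (sketch only, assuming the same load pattern described upthread):

BEGIN;
TRUNCATE f_foo;        -- fast, and leaves no dead rows behind
INSERT INTO f_foo
SELECT fk_d1, fk_d2, fk_d3, fk_d4, fk_d5, fk_d6, value
FROM t_foo;            -- no NOT EXISTS needed against a now-empty table
COMMIT;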
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

From
Gunnlaugur Thor Briem
Date:
On Mon, Mar 16, 2015 at 7:24 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> The other thing you should consider is using TRUNCATE instead of an un-filtered DELETE. It will both be much faster to perform and won't leave any dead rows behind.

Yep, but it does take an ACCESS EXCLUSIVE lock. We want the old table contents to be readable to other sessions while the new table contents are being populated (which can take quite a while), hence we don't use TRUNCATE.

Best of both worlds is to just populate a new table, flip over to that when it's ready, and drop the old one once nobody's referring to it anymore. That way we don't pay the DELETE scan penalty and don't leave dead rows, and also don't lock reads out while we repopulate.
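
When we do get to it, the swap would look roughly like this (sketch only; f_foo_new and f_foo_old are placeholder names, and LIKE doesn't copy the foreign keys, so those would need recreating):

-- build the replacement off to the side while readers keep using f_foo
CREATE TABLE f_foo_new (LIKE f_foo INCLUDING ALL);
INSERT INTO f_foo_new
SELECT fk_d1, fk_d2, fk_d3, fk_d4, fk_d5, fk_d6, value
FROM t_foo;
ANALYZE f_foo_new;

-- brief exclusive locks only for the renames
BEGIN;
ALTER TABLE f_foo RENAME TO f_foo_old;
ALTER TABLE f_foo_new RENAME TO f_foo;
COMMIT;

DROP TABLE f_foo_old;  -- once nothing refers to it any more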

Gulli