Обсуждение: Performance impact of updating target columns with unchanged valuesON CONFLICT

Поиск
Список
Период
Сортировка

Performance impact of updating target columns with unchanged valuesON CONFLICT

От
Abi Noda
Дата:
Given a table, `github_repos`, with a multi-column unique index on `org_id` and `github_id` columns, is there any performance difference (or other issues to be aware of) between the two bulk upsert operations below? The difference is that in the first query, the `org_id` and `github_id` columns are included in the UPDATE, whereas in the second query they are not. Since the UPDATE runs ON CONFLICT, the updated values of `org_id` and `github_id` will be the same as the old values, but those columns are included in the UPDATE because the underlying library I am using is designed that way. I'm wondering if its safe to use as-is or whether I should be explicitly excluding those columns in the UPDATE.

Query #1:

    INSERT INTO "github_repos" ("org_id","github_id","name")
    VALUES (1,1,'foo')
    ON CONFLICT (org_id, github_id)
    DO UPDATE SET "org_id"=EXCLUDED."org_id","github_id"=EXCLUDED."github_id","name"=EXCLUDED."name"
    RETURNING "id"

Query #2:

    INSERT INTO "github_repos" ("org_id","github_id","name")
    VALUES (1,1,'foo')
    ON CONFLICT (org_id, github_id)
    DO UPDATE SET "name"=EXCLUDED."name"
    RETURNING "id"

`github_repos` table:

          Column       |       Type        | Collation | Nullable
    -------------------+-------------------+-----------+----------+
     id                | bigint            |           | not null |
     org_id            | bigint            |           | not null |
     github_id         | bigint            |           | not null |
     name              | character varying |           | not null |

    Indexes:
        "github_repos_pkey" PRIMARY KEY, btree (id)
        "unique_repos" UNIQUE, btree (org_id, github_id)

Re: Performance impact of updating target columns with unchangedvalues ON CONFLICT

От
Abi Noda
Дата:
In other words, is Postgres smart enough to not actually write to disk any columns that haven’t changed value or update indexes based on those columns?

On Thu, Nov 22, 2018 at 11:32 AM Abi Noda <a@abinoda.com> wrote:
Given a table, `github_repos`, with a multi-column unique index on `org_id` and `github_id` columns, is there any performance difference (or other issues to be aware of) between the two bulk upsert operations below? The difference is that in the first query, the `org_id` and `github_id` columns are included in the UPDATE, whereas in the second query they are not. Since the UPDATE runs ON CONFLICT, the updated values of `org_id` and `github_id` will be the same as the old values, but those columns are included in the UPDATE because the underlying library I am using is designed that way. I'm wondering if its safe to use as-is or whether I should be explicitly excluding those columns in the UPDATE.

Query #1:

    INSERT INTO "github_repos" ("org_id","github_id","name")
    VALUES (1,1,'foo')
    ON CONFLICT (org_id, github_id)
    DO UPDATE SET "org_id"=EXCLUDED."org_id","github_id"=EXCLUDED."github_id","name"=EXCLUDED."name"
    RETURNING "id"

Query #2:

    INSERT INTO "github_repos" ("org_id","github_id","name")
    VALUES (1,1,'foo')
    ON CONFLICT (org_id, github_id)
    DO UPDATE SET "name"=EXCLUDED."name"
    RETURNING "id"

`github_repos` table:

          Column       |       Type        | Collation | Nullable
    -------------------+-------------------+-----------+----------+
     id                | bigint            |           | not null |
     org_id            | bigint            |           | not null |
     github_id         | bigint            |           | not null |
     name              | character varying |           | not null |

    Indexes:
        "github_repos_pkey" PRIMARY KEY, btree (id)
        "unique_repos" UNIQUE, btree (org_id, github_id)

Re: Performance impact of updating target columns with unchangedvalues ON CONFLICT

От
Justin Pryzby
Дата:
On Thu, Nov 22, 2018 at 01:31:10PM -0800, Abi Noda wrote:
> In other words, is Postgres smart enough to not actually write to disk any
> columns that haven’t changed value or update indexes based on those columns?

You're asking about what's referred to as Heap only tuples:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT;hb=HEAD
https://wiki.postgresql.org/wiki/Index-only_scans#Interaction_with_HOT

Note, if you're doing alot of updates, you should consider setting a lower the
table fillfactor, since HOT is only possible if the new tuple (row version) is
on the same page as the old tuple.

|With HOT, a new tuple placed on the same page and with all indexed columns the
|same as its parent row version does not get new index entries."

And check pg_stat_user_tables to verify that's working as intended.

Justin


Re: Performance impact of updating target columns with unchangedvalues ON CONFLICT

От
Abi Noda
Дата:
Thanks Justin. Do you know if Postgres treats an UPDATE that sets the indexed columns set to the same previous values as a change? Or does it only count it as "changed" if the values are different. This is ambiguous to me.

> HOT solves this problem for a restricted but useful special case where a tuple is repeatedly updated in ways that do not change its indexed columns.

> With HOT, a new tuple placed on the same page and with all indexed columns the same as its parent row version does not get new index entries.

> [HOT] will create a new physical heap tuple when inserting, and not a new index tuple, if and only if the update did not affect indexed columns.



On Thu, Nov 22, 2018 at 2:40 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Thu, Nov 22, 2018 at 01:31:10PM -0800, Abi Noda wrote:
> In other words, is Postgres smart enough to not actually write to disk any
> columns that haven’t changed value or update indexes based on those columns?

You're asking about what's referred to as Heap only tuples:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT;hb=HEAD
https://wiki.postgresql.org/wiki/Index-only_scans#Interaction_with_HOT

Note, if you're doing alot of updates, you should consider setting a lower the
table fillfactor, since HOT is only possible if the new tuple (row version) is
on the same page as the old tuple.

|With HOT, a new tuple placed on the same page and with all indexed columns the
|same as its parent row version does not get new index entries."

And check pg_stat_user_tables to verify that's working as intended.

Justin

Re: Performance impact of updating target columns with unchangedvalues ON CONFLICT

От
Abi Noda
Дата:
I take that question back – someone helped me on StackExchange and addressed it:

> It appears that Postgres is smart enough to identify cases where indexed columns are not changed , and perform HOT updates; thus , there is no difference between having or not having key columns in update statement from performance point of view. The only thing that matters it whether actual value changed. Surely, this behaviour is limited to B-Tree indexes.


On Fri, Nov 23, 2018 at 7:44 PM Abi Noda <a@abinoda.com> wrote:
Thanks Justin. Do you know if Postgres treats an UPDATE that sets the indexed columns set to the same previous values as a change? Or does it only count it as "changed" if the values are different. This is ambiguous to me.

> HOT solves this problem for a restricted but useful special case where a tuple is repeatedly updated in ways that do not change its indexed columns.

> With HOT, a new tuple placed on the same page and with all indexed columns the same as its parent row version does not get new index entries.

> [HOT] will create a new physical heap tuple when inserting, and not a new index tuple, if and only if the update did not affect indexed columns.



On Thu, Nov 22, 2018 at 2:40 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Thu, Nov 22, 2018 at 01:31:10PM -0800, Abi Noda wrote:
> In other words, is Postgres smart enough to not actually write to disk any
> columns that haven’t changed value or update indexes based on those columns?

You're asking about what's referred to as Heap only tuples:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT;hb=HEAD
https://wiki.postgresql.org/wiki/Index-only_scans#Interaction_with_HOT

Note, if you're doing alot of updates, you should consider setting a lower the
table fillfactor, since HOT is only possible if the new tuple (row version) is
on the same page as the old tuple.

|With HOT, a new tuple placed on the same page and with all indexed columns the
|same as its parent row version does not get new index entries."

And check pg_stat_user_tables to verify that's working as intended.

Justin