Pathological performance when inserting many NULLs into a unique index

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Pathological performance when inserting many NULLs into a unique index
Дата
Msg-id CAH2-Wzm08nr+JPx4jMOa9CGqxWYDQ-_D4wtPBiKghXAUiUy-nQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: Pathological performance when inserting many NULLs into a unique index  (Peter Geoghegan <pg@bowt.ie>)
Re: Pathological performance when inserting many NULLs into a uniqueindex  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
I thought of a case that results in pathological performance due to a
behavior of my nbtree patch series:

regression=# create table uniquenulls(nully int4, constraint pp unique(nully));
CREATE TABLE
Time: 10.694 ms
regression=# insert into uniquenulls select i from generate_series(1, 1e6) i;
INSERT 0 1000000
Time: 1356.025 ms (00:01.356)
regression=# insert into uniquenulls select null from generate_series(1, 1e6) i;
INSERT 0 1000000
Time: 270834.196 ms (04:30.834)

The issue here is that the duration of the second INSERT statement is
wildly excessive, because _bt_stepright() needs to step right many
many times for each tuple inserted. I would expect the second insert
to take approximately as long as the first, but it takes ~200x longer
here. It could take much longer still if we pushed this example
further. What we see here is a limited case in which the O(n ^ 2)
performance that "getting tired" used to prevent can occur. Clearly
that's totally unacceptable. Mea culpa.

Sure enough, the problem goes away when the index isn't a unique index
(i.e. in the UNIQUE_CHECK_NO case):

regression=# alter table uniquenulls drop constraint pp;
ALTER TABLE
Time: 28.968 ms
regression=# create index on uniquenulls (nully);
CREATE INDEX
Time: 1159.958 ms (00:01.160)
regression=# insert into uniquenulls select null from generate_series(1, 1e6) i;
INSERT 0 1000000
Time: 1155.708 ms (00:01.156)

Tentatively, I think that the fix here is to not "itup_key->scantid =
NULL" when a NULL value is involved (i.e. don't "itup_key->scantid =
NULL" when we see IndexTupleHasNulls(itup) within _bt_doinsert()). We
may also want to avoid calling _bt_check_unique() entirely in this
situation. That way, the performance should be the same as the
UNIQUE_CHECK_NO case: we descend to the appropriate leaf page
directly, and then we're done. We don't have to find the appropriate
leaf page by groveling through indefinitely-many existing leaf pages
that are full of NULL values, because we know that there cannot ever
be a unique violation for us to detect.

I'll create an open item for this, and begin work on a patch tomorrow.
-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Kato, Sho"
Дата:
Сообщение: RE: Speeding up creating UPDATE/DELETE generic plan for partitionedtable into a lot
Следующее
От: Tom Lane
Дата:
Сообщение: Re: ALTER TABLE with ADD COLUMN and ADD PRIMARY KEY USING INDEX throws spurious "column contains null values"