Re: How bad is using queries with thousands of values for operators IN or ANY?

Поиск

Список

Период

Сортировка

От	Pavel Stehule
Тема	Re: How bad is using queries with thousands of values for operators IN or ANY?
Дата	1 сентября 2020 г. 19:46:01
Msg-id	CAFj8pRAVJZypehmFiskRk=AOFbupWMCZRLRp7ocTnXoM8rVFCQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: How bad is using queries with thousands of values for operators IN or ANY? (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы	Re: How bad is using queries with thousands of values for operators IN or ANY? (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Список	pgsql-general

Дерево обсуждения

út 1. 9. 2020 v 9:22 odesílatel Kyotaro Horiguchi <horikyota.ntt@gmail.com> napsal:

At Mon, 31 Aug 2020 16:04:43 +0200, Pavel Stehule <pavel.stehule@gmail.com> wrote in
> po 31. 8. 2020 v 13:29 odesílatel Thomas Kellerer <shammat@gmx.net> napsal:
>
> > Thorsten Schöning schrieb am 31.08.2020 um 12:37:
> > > So for what query size or number of IDs to compare in IN would you
> > > consider a different approach at all?
> >
> >
> > In my experience "hundreds" of IDs tend to be quite slow if used with an
> > IN clause.
> >
> > Rewriting the IN to a JOIN against a VALUES clause is very often faster:
> >
> > So instead of:
> >
> > select *
> > from t
> > where id in (1,2,3, .... ,500);
> >
> > using this:
> >
> > select *
> > from t
> > join (
> > values (1),(2),(3),...(500)
> > ) as x(id) on x.id = t.id
> >
> > produces more often than not a more efficient execution plan (assuming no
> > values are duplicated in the IN list)
> >
> > Obviously I don't know if such a re-write is even feasible though.
> >
>
> yes - this query probably will have a slow start, but the execution will be
> fast. Unfortunately, there are not available statistics.

FWIW, the attached is the dusted-off version of a part of a stalled
development of mine, which unconditionally(!) creates on-the-fly
statistics on VALUES list. It seems to work for certain cases,
although the planning time increases significantly.

=$ CREATE TABLE t1 AS SELECT a, a * 2 AS b FROM generate_series(0, 99999) a;
=$ CREATE INDEX ON t1 (a);
> perl q.pl(*) | psql

*: q.pl:
> print "explain analyze select b from t1 join (values ";
> foreach $i (0..10000) {
> print ", " if ($i > 0);
> printf("(%d)", $i/10 + 1000);
> }
> print ") as v(v) on (v.v = t1.a);";

patched:

Merge Join (cost=824.25..1005.19 rows=10001 width=4) (actual time=13.513..24.285 rows=10001 loops=1)
Merge Cond: (t1.a = "*VALUES*".column1)
-> Index Scan using t1_a_idx on t1 (cost=0.29..3050.29 rows=100000 width=8) (actual time=0.033..1.629 rows=2002 loops=1)
-> Sort (cost=789.47..814.47 rows=10001 width=4) (actual time=12.557..14.546 rows=10001 loops=1)
Sort Key: "*VALUES*".column1
Sort Method: quicksort Memory: 931kB
-> Values Scan on "*VALUES*" (cost=0.00..125.01 rows=10001 width=4) (actual time=0.002..8.271 rows=10001 loops=1)
Planning Time: 17.290 ms
Execution Time: 26.344 ms
(9 rows)

master:
Hash Join (cost=250.03..2168.03 rows=10001 width=4) (actual time=14.482..77.205 rows=10001 loops=1)
Hash Cond: (t1.a = "*VALUES*".column1)
-> Seq Scan on t1 (cost=0.00..1443.00 rows=100000 width=8) (actual time=0.017..23.540 rows=100000 loops=1)
-> Hash (cost=125.01..125.01 rows=10001 width=4) (actual time=13.786..13.788 rows=10001 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 480kB
-> Values Scan on "*VALUES*" (cost=0.00..125.01 rows=10001 width=4) (actual time=0.002..8.503 rows=10001 loops=1)
Planning Time: 12.365 ms
Execution Time: 78.567 ms
(8 rows)

regards.

nice :)

Pavel

--
Kyotaro Horiguchi
NTT Open Source Software Center

В списке pgsql-general по дате отправления:

Предыдущее

От: Michael Lewis
Дата: 01 сентября 2020 г., 19:35:51
Сообщение: Re: How bad is using queries with thousands of values for operators IN or ANY?

Следующее

От: Maksim Milyutin
Дата: 01 сентября 2020 г., 20:10:41
Сообщение: Re: High Availability, guarantee to use sync nodes

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: How bad is using queries with thousands of values for operators IN or ANY?

Предыдущее

Следующее