Join with an array

From: Markus Schiltknecht
Subject: Join with an array
Msg-id: 1140694595.11865.8.camel@fotomarburg
Replies: Re: Join with an array  (Martijn van Oosterhout <kleptog@svana.org>)
         Re: Join with an array  (Oleg Bartunov <oleg@sai.msu.su>)
         Re: Join with an array  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Hi,

I'm trying to speed up a query with a lookup table. This lookup table
gets very big and should still fit into memory. It does not change very
often. Given these facts I decided to use an array, as follows:

CREATE TABLE lookup_table (id INT PRIMARY KEY, items INT[] NOT NULL);

I know this is not considered good database design, but it saves a lot
of overhead for tuple visibility compared to a 1:1 table.

To fetch an item via the lookup_table I tried to use the following
query:

SELECT i.id, i.title
FROM item i
JOIN lookup_table lut ON i.id = ANY(lut.items)
WHERE lut.id = $LOOKUP_ID;

Unfortunately, that query always seems to use a sequential scan over the
item table. As the items array in the lookup table often has only 3 to 10
entries (compared to about 1 million rows in the item table), this is a
very expensive operation.
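
The plan choice described above can be checked with EXPLAIN. The following is only a sketch: the definition of item is an assumption (the original query shows just the id and title columns), lookup_table is the table created above, and the literal 1 stands in for $LOOKUP_ID.

```sql
-- Assumed schema for item; only id and title are known from the query.
CREATE TABLE item (id INT PRIMARY KEY, title TEXT);

-- Show the plan the planner picks for the array join.
-- The literal 1 is a placeholder for $LOOKUP_ID.
EXPLAIN
SELECT i.id, i.title
FROM item i
JOIN lookup_table lut ON i.id = ANY(lut.items)
WHERE lut.id = 1;
```

A "Seq Scan on item" node in the output would correspond to the behaviour described here; an "Index Scan" using the primary key would not.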

I tried to circumvent the problem with generate_series:

SELECT i.id, i.title
FROM generate_series(0, $MAX) s
JOIN lookup_table lut ON s = ANY(lut.items)
JOIN item i ON s = i.id
WHERE lut.id = $LOOKUP_ID;

That query uses the index to look up the item, but as soon as $MAX gets
bigger than 10000, generate_series takes too long and too many
comparisons s = ANY(lut.items) need to be done.

I think it would be possible to write a function generate_series(INT[])
which returns all the elements of the array. The query would then look
something like:

SELECT i.id, i.title
FROM generate_series((SELECT lut.items FROM lookup_table lut
                      WHERE lut.id = $LOOKUP_ID)) s
JOIN item i ON s = i.id;

Do you see any problem in implementing such a function? Does something
similar already exist?
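
For illustration, a function of this kind could be sketched in plain SQL on top of the existing generate_series(int, int), iterating over the array subscripts instead of over all possible ids. This is a minimal sketch, not a tested implementation: the name array_elements is hypothetical, it assumes one-dimensional arrays, and the literal 1 stands in for $LOOKUP_ID. PostgreSQL 8.4 and later ship a built-in unnest(anyarray) that serves the same purpose.

```sql
-- Hypothetical helper: return every element of a one-dimensional INT[]
-- as a separate row. For an empty array, array_lower/array_upper return
-- NULL and the strict generate_series call yields no rows.
CREATE OR REPLACE FUNCTION array_elements(INT[])
RETURNS SETOF INT AS $$
    SELECT $1[s.i]
    FROM generate_series(array_lower($1, 1), array_upper($1, 1)) AS s(i);
$$ LANGUAGE sql IMMUTABLE STRICT;

-- Usage: expand the small array first, then look up each id in item.
-- The literal 1 is a placeholder for $LOOKUP_ID.
SELECT i.id, i.title
FROM item i
WHERE i.id IN (
    SELECT array_elements(lut.items)
    FROM lookup_table lut
    WHERE lut.id = 1
);
```

Because the subquery produces only as many rows as the array has elements, the number of comparisons no longer depends on $MAX. Whether the planner then uses the index on item still has to be confirmed with EXPLAIN.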

Why does the first query use a seqscan instead of the index on item? Am
I missing something? What problems would I face if I wanted to teach the
planner to use the index in the first query [1]?

Regards

Markus

[1]: more generally, in most cases like "JOIN .. ON x = ANY($ARRAY)" where
$ARRAY is reasonably small.



