Re: How to get good performance for very large lists/sets?

Поиск

Список

Период

Сортировка

От	François Beausoleil
Тема	Re: How to get good performance for very large lists/sets?
Дата	6 октября 2014 г. 17:52:45
Msg-id	7A8CBA27-59BF-4EEA-81D9-F1A851A51933@teksol.info обсуждение исходный текст
Ответ на	Re: How to get good performance for very large lists/sets? (Andy Colson <andy@squeakycode.net>)
Список	pgsql-general

Дерево обсуждения

Le 2014-10-06 à 13:22, Andy Colson <andy@squeakycode.net> a écrit :

> On 10/6/2014 3:02 AM, Richard Frith-Macdonald wrote:
>> I'm wondering if anyone can help with advice on how to manage large lists/sets of items in a postgresql database.
>>
>> I have a database which uses multiple  lists of items roughly like this:
>>
>> CREATE TABLE List (
>>   ID SERIAL,
>>   Name VARCHAR ....
>> );
>>
>> and a table containing individual entries in the lists:
>>
>> CREATE TABLE ListEntry (
>>   ListID INT, /* Reference the List table */
>>   ItemID INT /* References an Item table */
>> ) ;
>> CREATE UNIQUE INDEX ListEntryIDX ON ListEntry(ListID, ItemID);
>>
>> Now, there are thousands of lists, many with millions of entries, and items are added to and removed from lists in
anunpredictable way (in response to our customer's actions, not under our control).  Lists are also created by customer
actions.
>>
>> Finding whether a particular item is in a particular list is reasonably fast, but when we need to do things like
findall the items in list A but not list B things can get very slow (particularly when both lists contain millions of
commonitems). 
>>
>> I think that server won't use index-only scans because, even in cases where a particular list has not had any recent
changes,the ListEntry table will almost always have had some change (for one of the other lists) since its last vacuum. 
>> Perhaps creating multiple ListEntry tables (one for each list) would allow better performance; but that would be
thousands(possibly tens of thousands) of tables, and allowing new tables to be created by our clients might conflict
withthings like nightly backups. 
>>
>> Is there a better way to manage list/set membership for many thousands of sets and many millions of items?
>
>
> I seem to recall something about NOT IN() and nulls, but I dont recall the details.
>
> are you using:
>
> select * where exists(select ...) and not exists(select ..)
>
> or
>
> select * where id in (select...) and id not in (select …)

Would

select * from …
except
select * from …

work better? The plan I get for SELECT EXCEPT SELECT ends up with a SetOp Except, while the SELECT WHERE exists() AND
NOTexists() plan gives me a Nested Loop Semi Join. The in / not in case gives me a simple Hash Join. 

My dataset is a parent table partitioned by market and week, so not exactly the same as Richard’s original request, and
it’sdebatable how much my data set would compare. 

Hope that helps!
François Beausoleil

В списке pgsql-general по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: How to get good performance for very large lists/sets?