Re: How to get good performance for very large lists/sets?

From: Alban Hertroys
Subject: Re: How to get good performance for very large lists/sets?
Date:
Msg-id: 397F595E-DC1D-4E5F-BEF5-476CA50F63FA@gmail.com
In response to: How to get good performance for very large lists/sets? (Richard Frith-Macdonald <richard.frith-macdonald@brainstorm.co.uk>)
List: pgsql-general
On 06 Oct 2014, at 10:02, Richard Frith-Macdonald <richard.frith-macdonald@brainstorm.co.uk> wrote:

> I'm wondering if anyone can help with advice on how to manage large lists/sets of items in a postgresql database.
>
> I have a database which uses multiple  lists of items roughly like this:
>
> CREATE TABLE List (
>  ID SERIAL,
>  Name VARCHAR ....
> );
>
> and a table containing individual entries in the lists:
>
> CREATE TABLE ListEntry (
>  ListID INT, /* Reference the List table */
>  ItemID INT /* References an Item table */
> ) ;
> CREATE UNIQUE INDEX ListEntryIDX ON ListEntry(ListID, ItemID);

Don’t you have any PKs? A UNIQUE INDEX is not the same as a PK; a PK does not allow NULLs, for example.
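Something along these lines would do it (just a sketch, not tested against your actual schema):

ALTER TABLE ListEntry
    ADD CONSTRAINT ListEntry_pkey PRIMARY KEY (ListID, ItemID);

-- or, reusing the unique index you already have (9.1 and later):
-- ALTER TABLE ListEntry
--     ADD CONSTRAINT ListEntry_pkey PRIMARY KEY USING INDEX ListEntryIDX;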

For that matter, I’d ditch the serial column in List - it contributes to a larger index size, which decreases the chances
that the index will fit in memory, making it less attractive to the query planner. IMHO, natural keys are to be preferred
here over surrogate keys. That is assuming that List.Name is supposed to be unique.
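Roughly like this, just to illustrate the idea (the ListName column name is my invention):

CREATE TABLE List (
    Name VARCHAR PRIMARY KEY
    -- plus whatever detail columns a list has
);

CREATE TABLE ListEntry (
    ListName VARCHAR REFERENCES List (Name),
    ItemID   INT,   -- references the Item table
    PRIMARY KEY (ListName, ItemID)
);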

> Now, there are thousands of lists, many with millions of entries, and items are added to and removed from lists in an
> unpredictable way (in response to our customer's actions, not under our control).  Lists are also created by customer
> actions.


> I think that server won't use index-only scans because, even in cases where a particular list has not had any recent
> changes, the ListEntry table will almost always have had some change (for one of the other lists) since its last vacuum.
> Perhaps creating multiple ListEntry tables (one for each list) would allow better performance; but that would be
> thousands (possibly tens of thousands) of tables, and allowing new tables to be created by our clients might conflict
> with things like nightly backups.
>
> Is there a better way to manage list/set membership for many thousands of sets and many millions of items?

Another benefit of using natural keys is that you don’t need to fetch the actual List entries - the Names are right
there in your ListEntry table. You only need to look records up in the List table when you want their details
(columns other than Name).
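For example, assuming the natural-key layout sketched above:

-- Membership lookups never touch the List table:
SELECT ItemID FROM ListEntry WHERE ListName = 'some list';

-- Join List only when the extra columns are actually needed:
SELECT l.*, e.ItemID
  FROM ListEntry e
  JOIN List l ON l.Name = e.ListName
 WHERE e.ListName = 'some list';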

A possible drawback in this case is that the PK index on ListEntry would probably be larger.
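As for the index-only scan worry in your first paragraph: that comes down to how up to date the visibility map is, which VACUUM maintains. A quick way to see what is going on for a given list would be something like this (sketch only, list id made up):

EXPLAIN (ANALYZE, BUFFERS)
    SELECT ItemID FROM ListEntry WHERE ListID = 42;
-- An "Index Only Scan" node with a high "Heap Fetches" count means the
-- visibility map is stale; a manual vacuum refreshes it:
VACUUM (ANALYZE) ListEntry;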

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


