Re: Best design for performance

Поиск
Список
Период
Сортировка
От Riaan Stander
Тема Re: Best design for performance
Дата
Msg-id 052b01d2a761$06e8c4b0$14ba4e10$@exa.co.za
обсуждение исходный текст
Ответ на Re: Best design for performance  (Claudio Freire <klaussfreire@gmail.com>)
Ответы Re: Best design for performance
Список pgsql-performance
I'm using the first query from here.
https://wiki.postgresql.org/wiki/Disk_Usage
It does seem to include toast data.

The plan is to do the rights checking in the application. The join solution gets used for reports to filter data &
clientadhoc queries. 

-----Original Message-----
From: Claudio Freire [mailto:klaussfreire@gmail.com]
Sent: 28 March 2017 02:42 AM
To: Riaan Stander <rstander@exa.co.za>
Cc: postgres performance list <pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] Best design for performance

On Mon, Mar 27, 2017 at 8:43 PM, Riaan Stander <rstander@exa.co.za> wrote:
> In Postgres I was thinking of going with a design like this
>
> CREATE TABLE security.user_right_site
> (
>     user_id bigint NOT NULL,
>     right_id bigint NOT NULL,
>     sites bigint[]
> );
> create index on security.user_right_site(user_id, right_id);
>
>
>
> This drastically cut down on the number of records in the table. It
> also seems to make a massive change to the storage requirements.
>
> The old design requires 61GB vs 2.6GB.

How did you query the table's size? You're probably failing to account for TOAST tables.

I'd suggest using pg_total_relation_size.

> My one concern is regarding the limitations of the array type in Postgres.
> Is there a point at which one should not use it? Currently our largest
> client has 6000+ sites, meaning that the array would contain that many
> items. What would the upper feasible limit be in Postgres?

In that design, rows with a large number of sites would probably end up TOASTing the sites array.

That will make access to that array a tad slower, but it would probably be OK for your use case, since you'll ever just
readone such row per query. You'll have to test to be sure. 

The limit on that design is about 128M items on sites, IIRC (ie: the maximum size of values is 1GB, so an array of 128M
bigintsis above that limit). You'll probably have issues much earlier than that. For instance, a filter of the form
"site_id= ANY(sites)" with that many entries would probably be unusably slow. 

Personally, I would go for fetching the sites array on the application side, and using site_id = ANY(ARRAY[...]) if
smallenough, and a subselect if the array is too big. That would let the planner be smarter, since it'll have the
literalarray list at planning time and will be able to fetch accurate stats, and choose an optimal plan based on data
skew.




В списке pgsql-performance по дате отправления:

Предыдущее
От: Claudio Freire
Дата:
Сообщение: Re: Best design for performance
Следующее
От: Claudio Freire
Дата:
Сообщение: Re: Best design for performance