Re: Best practices to manage custom statistics
From | Adrian Klaver
Subject | Re: Best practices to manage custom statistics
Date |
Msg-id | 0cc5b1e2-ebd9-2bde-253a-701a9ba571be@aklaver.com
In reply to | Best practices to manage custom statistics (Moreno Andreo <moreno.andreo@evolu-s.it>)
Responses | Re: Best practices to manage custom statistics
List | pgsql-general
On 11/08/2016 12:13 PM, Moreno Andreo wrote:
> Hi,
> I'm checking whether there's a best way to obtain statistics based on my
> database tables.
>
> Here's the scenario.
> First of all, technical details:
> - PostgreSQL 9.1, Ubuntu 12 on a 4-core, 32 GB machine with a 600 GB disk,
> migrating to PostgreSQL 9.5.3, Debian 8 on an 8-core, 52 GB machine with a
> 2 TB disk.
> - 350 databases, 350 users; every user connects to his own database and
> his teammates' (max 10 in total), so each user can connect to at most 10
> databases at a time.
>
> My application needs to compute a certain number of statistics (how many
> records are in a certain state, how many are in another state) to send
> back to the user.
> This is obtained, at the moment, with a select count(*) from ..... (that
> involves 4 joins on 4 tables) run every 20 secs by each client connected
> to the cluster (ATM about 650 clients configured, about 200 concurrent)
> against each database it has rights to connect to.
>
> I noticed that in some cases, especially when working with not-so-small
> datasets (200k rows x 95 cols), and sometimes returning not-so-small
> result sets (10k rows), the query does not perform well, but the worst
> thing is that it raises overall server load (I/O) and bandwidth usage.
> While bandwidth is not a problem (I have spikes at 20 Mbps while "normal"
> traffic is at about 3 Mbps, and I have 1 Gbps available), server load *is*
> the main problem, because in high-access periods (summer/holidays) I see
> my server load go up to 22-25 on a 4-core machine, and users call
> complaining about timeouts and slowness.
>
> Even though I'm migrating to a better instance, I'm still trying to
> "normalize" this feature.
> I can start looking at indexes (I'm not quite sure that all the fields in
> the WHERE clause are indexed), but I don't think that alone would boost
> performance.
>
> I thought about having a table, say, 'tbl_counters', like this:
>
> CREATE TABLE tbl_counters (
>     coduser  uuid,
>     counter1 int,
>     counter2 int,
>     ....
>     counterx int
> );
>
> updated by trigger: when a value in a table is INSERTed/UPDATEd/DELETEd,
> it fires a function that increments/decrements the value of counter x for
> user y.
> Just in case the trigger ever misses an update, once a day (say at 2 am)
> a cron job runs the above "monster query" for every database and adjusts,
> if necessary, the counter values for each user to reflect the real values.
>
> In your experience, would this approach help me lower server load?
> Is there any other approach I can try?

Instead of pushing, why not pull? In other words, do the users really
check/need the statistics every 20 secs? Given that you say exactness is
not important over the course of the day, why not create a mechanism for
the user to poll the database when they need the information?

>
> If more details are needed, just ask.
>
> Thanks in advance, and sorry for the long message (but I had to explain
> such a complex thing).
> Moreno.

--
Adrian Klaver
adrian.klaver@aklaver.com
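
A minimal sketch of the trigger-maintained counter approach described in the
post. The source table tbl_records, its state column, and the two counters
are illustrative assumptions, not from the original message; it also assumes
a row per user already exists in tbl_counters (seeded, for example, by the
nightly reconciliation cron):

-- Counters kept per user; the nightly cron can overwrite these with the
-- results of the full "monster query" to correct any drift.
CREATE TABLE tbl_counters (
    coduser  uuid PRIMARY KEY,
    counter1 int NOT NULL DEFAULT 0,
    counter2 int NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION trg_update_counters() RETURNS trigger AS $$
BEGIN
    -- Add the new row's contribution on INSERT and UPDATE.
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        UPDATE tbl_counters
           SET counter1 = counter1 + (NEW.state = 1)::int,
               counter2 = counter2 + (NEW.state = 2)::int
         WHERE coduser = NEW.coduser;
    END IF;
    -- Remove the old row's contribution on UPDATE and DELETE.
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        UPDATE tbl_counters
           SET counter1 = counter1 - (OLD.state = 1)::int,
               counter2 = counter2 - (OLD.state = 2)::int
         WHERE coduser = OLD.coduser;
    END IF;
    RETURN NULL;  -- AFTER trigger: the return value is ignored.
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER records_counters
    AFTER INSERT OR UPDATE OR DELETE ON tbl_records
    FOR EACH ROW EXECUTE PROCEDURE trg_update_counters();

Note that every write to tbl_records now also updates tbl_counters, so heavy
write traffic on a single user's row can become a contention point; the
nightly cron remains the safety net for any missed updates.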
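
And a sketch of the pull model suggested in the reply, under the same
assumptions: the client runs a cheap lookup against tbl_counters only when
the user actually opens the statistics view, instead of every 20 seconds:

-- Run on demand from the client, not on a 20-second timer.
-- $1 is the connected user's uuid (illustrative parameter).
SELECT counter1, counter2
  FROM tbl_counters
 WHERE coduser = $1;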