Re: Best practices to manage custom statistics
From | Adrian Klaver
Subject | Re: Best practices to manage custom statistics
Date |
Msg-id | 0cc5b1e2-ebd9-2bde-253a-701a9ba571be@aklaver.com
In reply to | Best practices to manage custom statistics (Moreno Andreo <moreno.andreo@evolu-s.it>)
Responses | Re: Best practices to manage custom statistics
List | pgsql-general
On 11/08/2016 12:13 PM, Moreno Andreo wrote:
> Hi,
> I'm checking whether there's a best way to obtain statistics based on my
> database tables.
>
> Here's the scenario.
> First of all, technical details:
> - PostgreSQL 9.1, Ubuntu 12 on a 4-core, 32 GB machine with a 600 GB disk,
> migrating to PostgreSQL 9.5.3, Debian 8 on an 8-core, 52 GB machine with a
> 2 TB disk.
> - 350 databases, 350 users; every user connects to his own database and
> his teammates' (max 10 in total), so each user can connect to at most 10
> databases at a time.
>
> My application needs to compute a certain number of statistics (how many
> records are in a certain state, how many are in another state) to send
> back to the user.
> This is obtained, at the moment, with a select count(*) from ..... (that
> involves 4 joins on 4 tables) run every 20 secs by each client connected
> to the cluster (ATM about 650 clients configured, about 200 concurrent)
> against each database it has rights to connect to.
>
> I noticed that in some cases, especially when working with not-so-small
> datasets (200k rows x 95 cols), and sometimes returning not-so-small
> result sets (10k rows), the query does not perform well, but the worst
> thing is that it raises overall server load (I/O) and bandwidth usage.
> While bandwidth is not a problem (I have spikes at 20 Mbps while "normal"
> traffic is at about 3 Mbps, and I have 1 Gbps available), server load *is*
> the main problem, because in high-access periods (summer/holidays) I see
> my server load go up to 22-25 on a 4-core machine, and users call
> complaining about timeouts and slowness.
>
> Even though I'm migrating to a better instance, I'm still trying to
> "normalize" this feature.
> I can start looking at indexes (I'm not quite sure that all the fields in
> the WHERE clause are indexed), but I don't think that alone would boost
> performance.
>
> I thought about having a table, say, 'tbl_counters', like this:
>
> CREATE TABLE tbl_counters (
>     coduser  uuid,
>     counter1 int,
>     counter2 int,
>     ....
>     counterx int
> );
>
> updated by trigger: when a value in a table is INSERTed/UPDATEd/DELETEd,
> it fires a function that increments/decrements the value of counter x for
> user y.
> Just in case the trigger ever misses an update, once a day (say at 2 am)
> a cron job runs the above "monster query" for every database and adjusts,
> if necessary, the counter values for each user to reflect the real values.
>
> In your experience, would this approach help me lower server load?
> Is there any other approach I can try?

Instead of pushing, why not pull? In other words, do the users really
check/need the statistics every 20 secs? Given that you say exactness is
not important over the course of the day, why not create a mechanism for
the user to poll the database when they need the information?

>
> If more details are needed, just ask.
>
> Thanks in advance, and sorry for the long message (but I had to explain
> such a complex thing).
> Moreno.

--
Adrian Klaver
adrian.klaver@aklaver.com
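
A minimal sketch of the trigger-maintained counter approach described in the
post. The source table tbl_records, its state column, and the two counters
are illustrative assumptions, not from the original message; it also assumes
a row per user already exists in tbl_counters (seeded, for example, by the
nightly reconciliation cron):

-- Counters kept per user; the nightly cron can overwrite these with the
-- results of the full "monster query" to correct any drift.
CREATE TABLE tbl_counters (
    coduser  uuid PRIMARY KEY,
    counter1 int NOT NULL DEFAULT 0,
    counter2 int NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION trg_update_counters() RETURNS trigger AS $$
BEGIN
    -- Add the new row's contribution on INSERT and UPDATE.
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        UPDATE tbl_counters
           SET counter1 = counter1 + (NEW.state = 1)::int,
               counter2 = counter2 + (NEW.state = 2)::int
         WHERE coduser = NEW.coduser;
    END IF;
    -- Remove the old row's contribution on UPDATE and DELETE.
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        UPDATE tbl_counters
           SET counter1 = counter1 - (OLD.state = 1)::int,
               counter2 = counter2 - (OLD.state = 2)::int
         WHERE coduser = OLD.coduser;
    END IF;
    RETURN NULL;  -- AFTER trigger: the return value is ignored.
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER records_counters
    AFTER INSERT OR UPDATE OR DELETE ON tbl_records
    FOR EACH ROW EXECUTE PROCEDURE trg_update_counters();

Note that every write to tbl_records now also updates tbl_counters, so heavy
write traffic on a single user's row can become a contention point; the
nightly cron remains the safety net for any missed updates.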
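
And a sketch of the pull model suggested in the reply, under the same
assumptions: the client runs a cheap lookup against tbl_counters only when
the user actually opens the statistics view, instead of every 20 seconds:

-- Run on demand from the client, not on a 20-second timer.
-- $1 is the connected user's uuid (illustrative parameter).
SELECT counter1, counter2
  FROM tbl_counters
 WHERE coduser = $1;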