Discussion: Configuration tips for very large database
Hello,
I've been away from Postgres for several years, so please forgive me if I forgot nearly everything :-)

I've just inherited a database collecting environmental data. There's a background process continually inserting records (not very often, to tell the truth) and a web interface to query the data. At the moment the record count of the db is 250M and growing all the time. The 3 main tables have just 3 columns.

Queries get executed very, very slowly, say 20 minutes. The most evident problem I see is that I/O wait load is almost always 90+% while querying data, and 30-40% when "idle" (so to say). Obviously disk access is to blame, but I'm a bit surprised because the cluster where this db is running is not at all old iron: it's a VMware VM with 16GB RAM, 4 CPUs at 2.2GHz, and a 128GB disk (half of which is used). The disk system underlying VMware is quite powerful, and this Postgres instance is the only system that runs slowly in this cluster. I can increase resources if necessary, but..

Even before analyzing queries (which I did) I'd like to know if someone has already succeeded in running Postgres with 200-300M records with queries running much faster than this. I'd like to compare the current configuration with a super-optimized one to identify the parameters that need to be changed. Any link to a working configuration would be very much appreciated.

Thanks for any help,
Nico
Nico Sabbi <nicola.sabbi@poste.it> wrote:

> Queries get executed very very slowly, say 20 minutes.
>
> I'd like to know if someone has already succeeded in running
> postgres with 200-300M records with queries running much faster
> than this.

If you go to the http://wcca.wicourts.gov/ web site, bring up any case, and click the "Court Record Events" button, it will search a table with hundreds of millions of rows. The table is not partitioned, but it has several indexes on it which are useful for queries such as the one that is used when you click the button.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
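The pattern Kevin describes — fast point lookups on a huge, unpartitioned table — only works if the frequent query predicates are covered by an index. A minimal sketch of the idea (the table and column names here are hypothetical, modeled on the OP's 3-column environmental tables, not the actual WCCA schema):

```sql
-- Hypothetical 3-column measurement table, similar in shape to the OP's.
CREATE TABLE measurement (
    station_id  integer     NOT NULL,
    measured_at timestamptz NOT NULL,
    value       real        NOT NULL
);

-- A composite index matching the dominant query pattern turns a
-- multi-minute sequential scan over 250M rows into an index scan
-- that touches only the requested slice.
CREATE INDEX measurement_station_time_idx
    ON measurement (station_id, measured_at);

-- Served entirely by the index's ordering: the planner can land
-- directly on one station's time range.
SELECT measured_at, value
FROM   measurement
WHERE  station_id = 42
  AND  measured_at >= '2015-01-01'
  AND  measured_at <  '2015-02-01'
ORDER  BY measured_at;
```

The point is not this particular index but that it must match the WHERE clauses the web interface actually issues; an index on the wrong leading column helps nothing.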
On 02/12/2015 11:38 PM, Kevin Grittner wrote:
> If you go to the http://wcca.wicourts.gov/ web site, bring up any
> case, and click the "Court Record Events" button, it will search a
> table with hundreds of millions of rows. The table is not
> partitioned, but has several indexes on it which are useful for
> queries such as the one that is used when you click the button.

Impressive. Can you give any hint about the configuration and the underlying hardware?
Nico Sabbi <nicola.sabbi@poste.it> wrote:

> Can you give any hint on the configuration and on the underlying
> hardware?

Well, this particular web site has millions of hits per day (running up to about 20 queries per hit) from thousands of concurrent web users, while accepting logical replication from thousands of OLTP users, so you probably don't need equivalent hardware. If I recall correctly, it is running 32 cores with 512GB RAM, running two PostgreSQL clusters, each multiple TB, each with a RAID 5 array of 40 drives, plus separate controllers and RAID for OS and WAL.

For server configuration, see these Wiki pages for the general tuning techniques used:

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
https://wiki.postgresql.org/wiki/Number_Of_Database_Connections

The best course to solve your problem would probably be to review those and see what might apply, and if you still have a problem, pick a specific slow-running query and use the process described here:

https://wiki.postgresql.org/wiki/SlowQueryQuestions

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
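For reference, the kind of settings those wiki pages walk through, scaled to the OP's 16GB VM. These are illustrative starting points only, not tuned values — verify each one against your own workload:

```ini
# postgresql.conf -- illustrative starting points for a 16GB machine
shared_buffers = 4GB            # ~25% of RAM is the usual starting point
effective_cache_size = 12GB     # what the OS page cache can realistically hold
work_mem = 64MB                 # per sort/hash node; keep modest with many connections
maintenance_work_mem = 1GB      # speeds up index builds and vacuum
random_page_cost = 2.0          # below the default 4.0 when storage is fast or well cached
```

The defaults shipped with PostgreSQL are deliberately tiny, so a stock configuration on a 16GB box will hit disk far more than it needs to.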
I can't speak to the numbers PostgreSQL can or cannot do, but the numbers above sound very, very doable. If you can get hold of Greg Smith's "PostgreSQL High Performance", I always liked his method of tuning buffers and checkpoints using the background writer stats. All of which can help with the I/O load and caching.

Good luck!
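The background-writer method mentioned above boils down to watching pg_stat_bgwriter and adjusting checkpoint and bgwriter settings based on what it shows. A sketch of the kind of query involved (run it on a live server, twice with some interval between, and compare the deltas):

```sql
-- If checkpoints_req dominates checkpoints_timed, checkpoints are being
-- forced by WAL volume: raise checkpoint_segments (max_wal_size on 9.5+).
-- If buffers_backend is large relative to buffers_checkpoint + buffers_clean,
-- backends are doing their own writes: make the bgwriter more aggressive.
SELECT checkpoints_timed,
       checkpoints_req,
       buffers_checkpoint,
       buffers_clean,
       buffers_backend,
       maxwritten_clean
FROM   pg_stat_bgwriter;
```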
On Thu, Feb 12, 2015 at 7:38 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Nico Sabbi <nicola.sabbi@poste.it> wrote:
>
>> Queries get executed very very slowly, say 20 minutes.
>
>> I'd like to know if someone has already succeeded in running
>> postgres with 200-300M records with queries running much faster
>> than this.
>
> If you go to the http://wcca.wicourts.gov/ web site, bring up any
> case, and click the "Court Record Events" button, it will search a
> table with hundreds of millions of rows. The table is not
> partitioned, but has several indexes on it which are useful for
> queries such as the one that is used when you click the button.

I have a table with ~800M rows, wide ones, that runs reporting queries quite efficiently (usually seconds). Of course, the queries don't traverse the whole table; that wouldn't be efficient. That's probably the key there: don't make your database process the whole thing every time if you expect it to be scalable.

What kind of queries are you running that have slowed down? Post an EXPLAIN ANALYZE so people can diagnose. Possibly it's a query/indexing issue rather than a hardware one.
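When posting, the EXPLAIN (ANALYZE, BUFFERS) form is the most useful, since the buffer counts show directly which plan node is responsible for the 90% I/O wait. A sketch, with a hypothetical query standing in for one of the actual slow ones:

```sql
-- Substitute one of the real slow queries; "measurement" is a placeholder.
-- "shared read" in the output is real disk I/O; "shared hit" is cache.
-- A Seq Scan over the whole table here points at a missing index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM   measurement
WHERE  measured_at >= now() - interval '30 days';
```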
On Thu, Feb 12, 2015 at 11:25:54PM +0100, Nico Sabbi wrote:
> Hello,
> I've been away from postgres for several years, so please forgive
> me if I forgot nearly everything:-)
>
> I've just inherited a database collecting environmental data.
> There's a background process continually inserting records (not so
> often, to say the truth) and a web interface to query data.
> At the moment the record count of the db is 250M and growing all the
> time. The 3 main tables have just 3 columns.
>
> Queries get executed very very slowly, say 20 minutes. The most
> evident problem I see is that io wait load is almost always 90+%
> while querying data, 30-40% when "idle" (so to say).
> Obviously disk access is to blame, but I'm a bit surprised because
> the cluster where this db is running is not at all old iron: it's a
> vmware VM with 16GB ram, 4cpu 2.2Ghz, 128GB disk (half of which
> used). The disk system underlying vmware is quite powerful, this
> postgres is the only system that runs slowly in this cluster.
> I can increase resources if necessary, but..

Hi Nico,

No one has mentioned the elephant in the room, but a database can be very I/O intensive, and you may not be getting the performance you need from your virtual disk running on your VMware disk subsystem. What do IOmeter or other disk performance evaluation tools report?

Regards,
Ken
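Lacking IOmeter on the VM, even a crude dd-based check gives a first number to compare against what the storage is supposed to deliver (fio or bonnie++ would be more representative; conv=fdatasync is Linux-specific):

```shell
# Write ~80MB in 8kB blocks (PostgreSQL's page size) and force it to disk,
# so the write-back cache doesn't flatter the result; dd reports MB/s at the end.
dd if=/dev/zero of=/tmp/pg_io_test.bin bs=8k count=10000 conv=fdatasync
# Clean up the scratch file.
rm -f /tmp/pg_io_test.bin
```

Run it against the same filesystem the data directory lives on; single-digit MB/s with fdatasync would explain a 90% I/O wait on its own.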
> No one has mentioned the elephant in the room, but a database can
> be very I/O intensive and you may not be getting the performance
> you need from your virtual disk running on your VMware disk subsystem.
> What do IOmeter or other disk performance evaluation software report?
>
> Regards,
> Ken

Anecdatum: moving from a contended VMware hard-disk-based filesystem running over the network to a bare-metal RAID10 SSD resulted in many DB operations running 20-30x faster. Table sizes circa 10-20GB, millions of rows.

Graeme.
On 02/13/2015 12:19 AM, Claudio Freire wrote:
> I have a table with ~800M rows, wide ones, that runs reporting queries
> quite efficiently (usually seconds).
>
> Of course, the queries don't traverse the whole table. That wouldn't
> be efficient. That's probably the key there, don't make you database
> process the whole thing every time if you expect it to be scalable.
>
> What kind of queries are you running that have slowed down?
>
> Post an explain analyze so people can diagnose. Possibly it's a
> query/indexing issue rather than a hardware one.

Thanks everybody for the answers. At the moment I don't have the queries at hand (Saturday :-)). I'll post them next week. I'd really like to avoid data partitioning if possible; it's a thing that gives me a strong stomach ache.