Discussion: how to fix the problem where, when two queries run at the same time, they take longer to complete than if run in sequence

I have a PostgreSQL database on a very low-resource Xen virtual machine,
48 MB RAM. When two queries run at the same time, it takes longer to
complete than if run in sequence. Is there perhaps a way to install
something like a query sequencer, which would process queries in a FIFO
manner, one at a time, so that even if a new query comes in before the
last one has finished, the new query is not given to the server until
the running one completes? That would greatly improve performance.

Any tips in general for running PostgreSQL on such a low-resource machine?

I have:

shared_buffers = 5MB
work_mem = 1024kB

Are these good values, or could perhaps changing something improve it a
bit? Any other parameters to look at?

--
Miernik
http://miernik.name/

On Wed, Jul 23, 2008 at 9:21 AM, Miernik <public@public.miernik.name> wrote:
> I have a PostgreSQL database on a very low-resource Xen virtual machine,
> 48 MB RAM. When two queries run at the same time, it takes longer to
> complete than if run in sequence. Is there perhaps a way to install
> something like a query sequencer, which would process queries in a FIFO
> manner, one at a time, so that even if a new query comes in before the
> last one has finished, the new query is not given to the server until
> the running one completes? That would greatly improve performance.
>
> Any tips in general for running PostgreSQL on such a low-resource machine?
>
> I have:
>
> shared_buffers = 5MB
> work_mem = 1024kB
>
> Are these good values, or could perhaps changing something improve it a
> bit? Any other parameters to look at?

Well, you're basically working on a really limited machine there.  I'd
set shared buffers up by 1 meg at a time and see if that helps.  But
you're basically looking at a very narrow problem that most people
won't ever run into.  Why such an incredibly limited virtual machine?
Even my cell phone came with 256 meg built in two years ago.
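
If you do bump it, the idea looks something like this in postgresql.conf,
stepping the first number up between tests (these values are just
starting points to experiment with, not recommendations):

shared_buffers = 6MB            # was 5MB; raise 1MB at a time, re-test each step
work_mem = 1MB                  # per-sort memory; keep small with only 48 MB total
effective_cache_size = 24MB     # roughly what the OS reports as file cache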

Miernik wrote:
> I have a PostgreSQL database on a very low-resource Xen virtual machine,
> 48 MB RAM. When two queries run at the same time, it takes longer to
> complete than if run in sequence. Is there perhaps a way to install
> something like a query sequencer, which would process queries in a FIFO
> manner, one at a time, so that even if a new query comes in before the
> last one has finished, the new query is not given to the server until
> the running one completes? That would greatly improve performance.

One idea I just had was to have a connection pooler (say pgpool) and
allow only one connection slot.  If the pooler can be configured to
block new connections until the slot is free, this would do what you
want.  (I don't know whether poolers allow you to do this.)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Scott Marlowe <scott.marlowe@gmail.com> wrote:
> won't ever run into.  Why such an incredibly limited virtual machine?
> Even my cell phone came with 256 meg built in two years ago.

Because I don't want to spend too much money on the machine rent, and a
48 MB RAM Xen is about all I can get with a budget of $100 per year.
Well, there are a few providers which will give me a 128 MB Xen for that
money, but will the difference in performance be worth the hassle of
switching providers? My current provider gives me almost perfect
reliability for that $100, and I don't know how the providers which
give more RAM for the same money perform; maybe they are often down or
something. And spending more than $100 yearly on this would be really
overkill. My thing runs fine, only a bit slow, but reasonably so. I just
want to find out if I could maybe make it better with a little tweaking.
Can I expect it to work at least three times faster on 128 MB RAM?
Getting 256 MB would certainly cost too much. Or maybe there are some
providers which can give me a much higher-performance PostgreSQL server
with at least several GB of storage for, well... not more than $50 per year
(because I must still rent another server to run an SMTP server and a few
other small things).

My DB has several tables with like 100000 to 1 million rows each,
running sorts, joins, updates etc. on them several times per hour.
About 10000 inserts and selects each hour; the whole DB takes 1.5 GB on
disk now, 500 MB dumped.

If I could shorten the time it takes to run each query by a factor of 3
that's something worth going for.

--
Miernik
http://miernik.name/

On Wed, Jul 23, 2008 at 2:32 PM, Miernik <public@public.miernik.name> wrote:
> Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> won't ever run into.  Why such an incredibly limited virtual machine?
>> Even my cell phone came with 256 meg built in two years ago.
>
> Because I don't want to spend too much money on the machine rent, and a
> 48 MB RAM Xen is about all I can get with a budget of $100 per year.
> Well, there are a few providers which will give me a 128 MB Xen for that
> money, but will the difference in performance be worth the hassle of
> switching providers? My current provider gives me almost perfect
> reliability for that $100, and I don't know how the providers which
> give more RAM for the same money perform; maybe they are often down or
> something. And spending more than $100 yearly on this would be really
> overkill. My thing runs fine, only a bit slow, but reasonably so. I just
> want to find out if I could maybe make it better with a little tweaking.
> Can I expect it to work at least three times faster on 128 MB RAM?
> Getting 256 MB would certainly cost too much. Or maybe there are some
> providers which can give me a much higher-performance PostgreSQL server
> with at least several GB of storage for, well... not more than $50 per year
> (because I must still rent another server to run an SMTP server and a few
> other small things).
>
> My DB has several tables with like 100000 to 1 million rows each,
> running sorts, joins, updates etc. on them several times per hour.
> About 10000 inserts and selects each hour; the whole DB takes 1.5 GB on
> disk now, 500 MB dumped.
>
> If I could shorten the time it takes to run each query by a factor of 3
> that's something worth going for.

Well, my guess is that by running under Xen you're already sacrificing
quite a bit of performance, and running it with only 48 Megs of ram is
making it even worse.  But if your budget is $100 a year, I guess
you're probably stuck with such a setup.  I would see if you can get a
trial Xen at the other hosts with 128M or more of memory and compare.

Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Well, my guess is that by running under Xen you're already sacrificing
> quite a bit of performance, and running it with only 48 Megs of ram is
> making it even worse.  But if your budget is $100 a year, I guess
> you're probably stuck with such a setup.  I would see if you can get a
> trial Xen at the other hosts with 128M or more of memory and compare.

Is running in a 48 MB Xen any different in terms of performance from
running on a hardware machine with 48 MB RAM?

I see that I am not the only one with such requirements; one guy even
set up a site listing all VPS providers which host for under $7 per
month: http://www.lowendbox.com/virtual-server-comparison/
You can see that the most you can get for that money is a 128 MB OpenVZ.
I wonder if running PostgreSQL on OpenVZ is any different from running it
on Xen in terms of performance.

Oh, here is something more:
http://vpsempire.com/action.php?do=vpslite
256 MB for $7.45 per month
512 MB for $11.95 per month
However, it doesn't say what the virtualization software is, so I don't
really know what it uses.

--
Miernik
http://miernik.name/

On Wednesday 23 July 2008, Miernik wrote:
> I have a PostgreSQL database on a very low-resource Xen virtual machine,
> 48 MB RAM. When two queries run at the same time, it takes longer to
> complete than if run in sequence. Is there perhaps a way to install
> something like a query sequencer, which would process queries in a FIFO
> manner, one at a time, so that even if a new query comes in before the
> last one has finished, the new query is not given to the server until
> the running one completes? That would greatly improve performance.

You didn't mention your PostgreSQL version. Since 8.3 there's a
"synchronized scans" feature which greatly improves performance if
concurrent queries have to do a sequential scan on the same table. Of
course, if the queries don't hit the same table there'll be no
improvement in performance...

>
> Any tips in general for running PostgreSQL on such a low-resource machine?
>
> I have:
>
> shared_buffers = 5MB
> work_mem = 1024kB
>
> Are these good values, or could perhaps changing something improve it a
> bit? Any other parameters to look at?
>
> --
> Miernik
> http://miernik.name/

Miernik wrote:
> Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> won't ever run into.  Why such an incredibly limited virtual machine?
>> Even my cell phone came with 256 meg built in two years ago.
>
> Because I don't want to spend too much money on the machine rent, and a
> 48 MB RAM Xen is about all I can get with a budget of $100 per year.
[snip]
> My DB has several tables with like 100000 to 1 million rows each,
> running sorts, joins, updates etc on them several times per hour.
> About 10000 inserts and selects each hour, the whole DB takes 1.5 GB on
> disk now, 500 MB dumped.
>
> If I could shorten the time it takes to run each query by a factor of 3
> that's something worth going for.

Firstly, congratulations on providing quite a large database on such a
limited system. I think most people on such plans have tables with a few
hundred to a thousand rows in them, not a million. Many of the people
here are used to budgets a hundred or a thousand times yours, so bear
in mind you're as much an expert as they are :-)

If you're going to get the most out of this, you'll want to set up your
own Xen virtual machine on a local system so you can test changes.
You'll be trading your time against the budget, so bear that in mind.

If you know other small organisations locally in a similar position
perhaps consider sharing a physical machine and managing Xen yourselves
- that can be cheaper.

Changes

First step is to make sure you're running version 8.3 - there are some
useful improvements there that reduce the size of shorter text fields,
as well as the synchronised scans Albert mentioned.

Second step is to turn off any other processes you don't need. Tune
down the number of consoles, apache processes, mail processes etc.
Normally not worth the trouble, but getting another couple of MB is
worthwhile in your case. Might be worth turning off autovacuum and
running a manual vacuum full overnight if your database is mostly reads.
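
If you go the manual route, a single cron entry is all it takes - a
sketch (the 3:30am window and database name are assumptions):

30 3 * * *    psql -q -d mydb -c 'VACUUM FULL ANALYZE;'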

Finally, I think it's worth looking at pgpool or pgbouncer (as Alvaro
said) and set them to allow only one connection in the pool. I know that
pgbouncer offers per-transaction connection sharing which will make this
more practical. Even so, it will help if your application can co-operate
by closing the connection as soon as possible.
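
With pgbouncer that would look something like this - a minimal sketch,
where the database name, port and auth settings are assumptions:

[databases]
mydb = host=127.0.0.1 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = trust
pool_mode = transaction   ; hand the server connection back after each transaction
default_pool_size = 1     ; a single backend, so queries effectively queue FIFO
max_client_conn = 20      ; further clients wait instead of adding backends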

--
   Richard Huxton
   Archonet Ltd

Richard Huxton <dev@archonet.com> wrote:
> Firstly, congratulations on providing quite a large database on such a
> limited system. I think most people on such plans have tables with a
> few hundred to a thousand rows in them, not a million. Many of the
> people here are used to budgets a hundred or a thousand times
> yours, so bear in mind you're as much an expert as they are :-)

Well, I proved that it can work reasonably well, and I am fine-tuning
the system step by step, so it can work better.

> If you're going to get the most out of this, you'll want to set up
> your own Xen virtual machine on a local system so you can test
> changes.

Good idea.

> If you know other small organisations locally in a similar position
> perhaps consider sharing a physical machine and managing Xen
> yourselves - that can be cheaper.

Well, maybe, but it's also a lot of hassle; I'm not sure it's worth it.
I'm just looking to get the most out of the existing system.

> First step is to make sure you're running version 8.3 - there are some
> useful improvements there that reduce the size of shorter text fields,
> as well as the synchronised scans Albert mentioned.

I am running 8.3.3

> Second step is to turn off any other processes you don't need.
> Tune down the number of consoles, apache processes, mail processes
> etc.  Normally not worth the trouble, but getting another couple of MB
> is worthwhile in your case.

There is no apache, only lighttpd. Right now:

root@polica:~# free
             total     used     free   shared  buffers   cached
Mem:         49344    47840     1504        0        4    23924
-/+ buffers/cache:    23912    25432
Swap:       257000     9028   247972
root@polica:~#

> Might be worth turning off autovacuum and running a manual vacuum full
> overnight if your database is mostly reads.

I run autovacuum, and the database has a lot of updates all the time,
also TRUNCATING tables and refilling them, usually one or two
INSERTS/UPDATES per second.

> Finally, I think it's worth looking at pgpool or pgbouncer (as Alvaro
> said) and set them to allow only one connection in the pool. I know
> that pgbouncer offers per-transaction connection sharing which will
> make this more practical. Even so, it will help if your application
> can co-operate by closing the connection as soon as possible.

I just installed pgpool2 and whoaaa! Everything is like about 3 times
faster! My application is bash scripts using psql -c "UPDATE ...".
I plan to rewrite it in Python; not sure if it would improve
performance, but it will at least be a "cleaner" implementation.

In /etc/pgpool.conf I used:

# number of pre-forked child process
num_init_children = 1

# Number of connection pools allowed for a child process
max_pool = 1

I wanted to install pgbouncer, but it is currently broken in Debian. And
why is it in contrib and not in main (speaking of its Debian location)?

--
Miernik
http://miernik.name/

Miernik wrote:
>> Might be worth turning off autovacuum and running a manual vacuum full
>> overnight if your database is mostly reads.
>
> I run autovacuum, and the database has a lot of updates all the time,
> also TRUNCATING tables and refilling them, usually one or two
> INSERTS/UPDATES per second.

OK

>> Finally, I think it's worth looking at pgpool or pgbouncer (as Alvaro
>> said) and set them to allow only one connection in the pool. I know
>> that pgbouncer offers per-transaction connection sharing which will
>> make this more practical. Even so, it will help if your application
>> can co-operate by closing the connection as soon as possible.
>
> I just installed pgpool2 and whoaaa! Everything is like about 3 times
> faster! My application is bash scripts using psql -c "UPDATE ...".

Probably spending most of their time setting up a new connection, then
clearing it down again.
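
Easy to check: time a trivial statement against the raw server and then
through the pooler (9999 below is pgpool-II's default port; the database
name is an assumption):

time psql -q -d mydb -c 'SELECT 1;'            # forks a fresh backend each run
time psql -q -p 9999 -d mydb -c 'SELECT 1;'    # reuses the pooled backend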

> I plan to rewrite it in Python; not sure if it would improve
> performance, but it will at least be a "cleaner" implementation.

Careful of introducing any more overheads though. If libraries end up
using another 2.5MB of RAM then that's 10% of your disk-cache gone.

> In /etc/pgpool.conf I used:
>
> # number of pre-forked child process
> num_init_children = 1
>
> # Number of connection pools allowed for a child process
> max_pool = 1

Might need to increase that to 2 or 3.

> I wanted to install pgbouncer, but it is currently broken in Debian. And
> why is it in contrib and not in main (speaking of its Debian location)?

Not well known enough on the Debian side of the fence? It's simple
enough to install from source though. Takes about one minute.
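The usual routine, assuming a downloaded tarball (the version number and
prefix here are just examples):

tar xzf pgbouncer-1.x.tar.gz
cd pgbouncer-1.x
./configure --prefix=/usr/local
make
make install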

--
   Richard Huxton
   Archonet Ltd

Richard Huxton <dev@archonet.com> wrote:
>> I just installed pgpool2 and whoaaa! Everything is like about 3 times
>> faster! My application is bash scripts using psql -c "UPDATE ...".
>
> Probably spending most of their time setting up a new connection, then
> clearing it down again.

If I do it in Python it could do all queries in the same connection, so
it should be faster? Besides that, 'psql' is written in perl, so it's also
heavy; by not using psql I get rid of the perl library in RAM. Also the
script uses wget to poll some external data sources a lot, needlessly
opening a new connection to the webserver each time, so I want to make
the script keep the HTTP connection alive, which means I must get rid of wget.
Maybe I should write some parts in C?
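
In the meantime, even plain bash can reuse a single connection: keep one
psql reading from a named pipe and feed it statements as they come. A
sketch (the database and table names are made up):

mkfifo /tmp/sqlpipe
psql --quiet mydb < /tmp/sqlpipe &    # one backend for the whole script
exec 3>/tmp/sqlpipe                   # hold the pipe's write end open

echo "UPDATE items SET hits = hits + 1 WHERE id = 7;" >&3
# ... other work here (wget polls etc.); the connection stays open ...
echo "DELETE FROM fetch_log WHERE stamp < now() - interval '1 day';" >&3

exec 3>&-                             # close the pipe; psql sees EOF and exits
wait
rm /tmp/sqlpipe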

BTW, doesn't there exist any tool that does what "psql -c" does, but is
written in plain C, not perl? I was looking for such a psql replacement,
but couldn't find any.

>> # Number of connection pools allowed for a child process
>> max_pool = 1
>
> Might need to increase that to 2 or 3.

Why? The website says:

max_pool

    The maximum number of cached connections in pgpool-II children
processes. pgpool-II reuses the cached connection if an incoming
connection is connecting to the same database by the same username.

But all my connections are to the same database and the same username,
and I only ever want my application to make one connection to the database
at a time, so why would I want/need 2 or 3 in max_pool?

> Not well known enough on the Debian side of the fence? It's simple
> enough to install from source though. Takes about one minute.

But is there any advantage for me compared to pgpool2, which works
really nicely? In some parts, like doing some count(*) stuff, it now does
things in about one second, which took a few minutes to finish before (if
the other parts of the scripts were doing something else on the database
at the same time).

--
Miernik
http://miernik.name/

On 31 Jul 2008, at 10:29AM, Miernik wrote:

> Richard Huxton <dev@archonet.com> wrote:
>>> I just installed pgpool2 and whoaaa! Everything is like about 3 times
>>> faster! My application is bash scripts using psql -c "UPDATE ...".
>>
>> Probably spending most of their time setting up a new connection, then
>> clearing it down again.
>
> If I do it in Python it could do all queries in the same connection, so
> it should be faster? Besides that, 'psql' is written in perl, so it's also
> heavy; by not using psql I get rid of the perl library in RAM. Also the
> script uses wget to poll some external data sources a lot, needlessly
> opening a new connection to the webserver each time, so I want to make
> the script keep the HTTP connection alive, which means I must get rid of wget.
> Maybe I should write some parts in C?
>
> BTW, doesn't there exist any tool that does what "psql -c" does, but is
> written in plain C, not perl? I was looking for such a psql replacement,
> but couldn't find any.


?

file `which psql`
/usr/bin/psql: ELF 32-bit LSB executable, Intel 80386, version 1
(SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9,
stripped

--
Regards
Theo


Miernik wrote:
> Richard Huxton <dev@archonet.com> wrote:
>>> I just installed pgpool2 and whoaaa! Everything is like about 3 times
>>> faster! My application is bash scripts using psql -c "UPDATE ...".
>> Probably spending most of their time setting up a new connection, then
>> clearing it down again.
>
> If I do it in Python it could do all queries in the same connection, so
> it should be faster? Besides that, 'psql' is written in perl, so it's also
> heavy; by not using psql I get rid of the perl library in RAM.

Nope - "C" all through.

> Also the
> script uses wget to poll some external data sources a lot, needlessly
> opening a new connection to the webserver each time, so I want to make
> the script keep the HTTP connection alive, which means I must get rid of wget.
> Maybe I should write some parts in C?
>
> BTW, doesn't there exist any tool that does what "psql -c" does, but is
> written in plain C, not perl? I was looking for such a psql replacement,
> but couldn't find any.

Well, ECPG lets you embed SQL directly in your "C".

>>> # Number of connection pools allowed for a child process
>>> max_pool = 1
>> Might need to increase that to 2 or 3.
>
> Why? The website says:
>
> max_pool
>
>     The maximum number of cached connections in pgpool-II children
> processes. pgpool-II reuses the cached connection if an incoming
> connection is connecting to the same database by the same username.
>
> But all my connections are to the same database and the same username,
> and I only ever want my application to make one connection to the database
> at a time, so why would I want/need 2 or 3 in max_pool?

From the subject line of your question: "how to fix the problem where, when
two queries run at the same time..."

Of course if you don't actually want to run two simultaneous queries,
then max_pool=1 is what you want.

>> Not well known enough on the Debian side of the fence? It's simple
>> enough to install from source though. Takes about one minute.
>
> But is there any advantage for me compared to pgpool2, which works
> really nicely?

Can't say. Given your limited RAM, it's probably worth looking at both
and seeing which leaves you more memory. Your main concern has got to be
to reduce wasted RAM.

> In some parts, like doing some count(*) stuff, it now does
> things in about one second, which took a few minutes to finish before (if
> the other parts of the scripts were doing something else on the database
> at the same time).

That will be because you're only running one query, I'd have thought.
Two queries might be sending you into swap.

--
   Richard Huxton
   Archonet Ltd

Theo Kramer <theo@flame.co.za> wrote:
> file `which psql`
> /usr/bin/psql: ELF 32-bit LSB executable, Intel 80386, version 1
> (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9,
> stripped

miernik@tarnica:~$ file `which psql`
/usr/bin/psql: symbolic link to `../share/postgresql-common/pg_wrapper'
miernik@tarnica:~$ file /usr/share/postgresql-common/pg_wrapper
/usr/share/postgresql-common/pg_wrapper: a /usr/bin/perl -w script text executable
miernik@tarnica:~$

--
Miernik
http://miernik.name/

Miernik wrote:
> Theo Kramer <theo@flame.co.za> wrote:
>> file `which psql`
>> /usr/bin/psql: ELF 32-bit LSB executable, Intel 80386, version 1
>> (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9,
>> stripped
>
> miernik@tarnica:~$ file `which psql`
> /usr/bin/psql: symbolic link to `../share/postgresql-common/pg_wrapper'
> miernik@tarnica:~$ file /usr/share/postgresql-common/pg_wrapper
> /usr/share/postgresql-common/pg_wrapper: a /usr/bin/perl -w script text executable

That's not psql though, that's Debian's wrapper around it which lets you
install multiple versions of PostgreSQL on the same machine. Might be
worth bypassing it and calling it directly.

--
   Richard Huxton
   Archonet Ltd

On 31 Jul 2008, at 11:17AM, Miernik wrote:

> Theo Kramer <theo@flame.co.za> wrote:
>> file `which psql`
>> /usr/bin/psql: ELF 32-bit LSB executable, Intel 80386, version 1
>> (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9,
>> stripped
>
> miernik@tarnica:~$ file `which psql`
> /usr/bin/psql: symbolic link to `../share/postgresql-common/pg_wrapper'
> miernik@tarnica:~$ file /usr/share/postgresql-common/pg_wrapper
> /usr/share/postgresql-common/pg_wrapper: a /usr/bin/perl -w script
> text executable
> miernik@tarnica:~$


Hmm - looks like you are on Debian or a Debian derivative. However,
pg_wrapper is not psql. It invokes psql, which is written in C. Once
psql is invoked, pg_wrapper drops away.
--
Regards
Theo


Miernik wrote:

> BTW, doesn't there exist any tool that does what "psql -c" does, but is
> written in plain C, not perl? I was looking for such a psql replacement,
> but couldn't find any.

As others have noted, psql is written in C, and you're using a wrapper.

Assuming you're on Debian or similar, you should be able to invoke the
real psql with:

/usr/lib/postgresql/8.3/bin/psql

psql is a C executable that uses libpq directly, and is really rather
low-overhead and fast.

As for how to issue multiple commands in one psql invocation from a shell
script: one way is to use a here document instead of "-c". Eg:


psql <<__END__
UPDATE blah SET thingy = 7 WHERE otherthingy = 4;
DELETE FROM sometable WHERE criterion = -1;
__END__

You can of course wrap statements in explicit transaction BEGIN/COMMIT,
etc, as appropriate.
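For instance, to make the batch atomic and stop at the first error
(ON_ERROR_STOP is a standard psql variable):

psql -v ON_ERROR_STOP=1 <<__END__
BEGIN;
UPDATE blah SET thingy = 7 WHERE otherthingy = 4;
DELETE FROM sometable WHERE criterion = -1;
COMMIT;
__END__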

As you start doing more complex things, and especially once you start
wanting to have both good performance *and* good error handling, you'll
want to move away from sql scripts with psql and toward using perl,
python, or similar so you can use their native interfaces to libpq.

--
Craig Ringer

> I wanted to install pgbouncer, but it is currently broken in Debian. And
> why is it in contrib and not in main (speaking of its Debian location)?

Pgbouncer has worked very well for us. It wasn't available in the default
repos for Ubuntu server when I did my original setup, but installing from
source is quite easy. Running heavy-load benchmarks on a web app
(high-rate simple queries) I saw about a 10-fold improvement just by
using pgbouncer.

Cheers,
Steve