Thread: How to reduce impact of a query.


How to reduce impact of a query.

From:
Howard Cole
Date:
Hi,

I am running multiple 8.2 databases on a not-so-powerful W2K3 server -
and it runs great the majority of the time. However, I have some
monster tsearch queries which take a lot of processing and hog system
resources - especially disk.

I am not concerned with the amount of time or speed of the tsearch2
query, which is doing a lot of work; all I need to do is make sure that
the query does not impact the other databases and queries running on the
same machine.

The machine in question has 1 GB of RAM and an AMD64 CPU, with RAID 1
SATA disks. Non-standard parts of my postgresql.conf are as follows:
max_connections=100
shared_buffers=128MB
work_mem=4MB
maintenance_work_mem=256MB
max_fsm_pages=204800
max_fsm_relations=1500

Any tips appreciated.

Howard.
www.selestial.com



Re: How to reduce impact of a query.

From:
"Isak Hansen"
Date:
On Mon, Nov 17, 2008 at 1:15 PM, Howard Cole <howardnews@selestial.com> wrote:
> Hi,
>
> I am running multiple 8.2 databases on a not-so-powerful W2K3 server - and
> it runs great - for the majority of time. However I have some monster
> tsearch queries which take a lot of processing and hog system resources -
> especially disk.
>
> I am not concerned with the amount of time or speed of the tsearch2 query,
> which is doing a lot of work, all I need to do is make sure that the query
> does not impact all other databases and queries running on the same machine.
>
> The machine in question is a 1GB Ram, AMD 64 with Raid 1 Sata disks. Non
> standard parts of my postgresql.conf are as follows:
> max_connections=100
> shared_buffers=128MB
> work_mem=4MB
> maintenance_work_mem=256MB
> max_fsm_pages=204800
> max_fsm_relations=1500
>
> Any tips appreciated.

Not directly related to your question, but if effective_cache_size
hasn't been changed from the default, adjusting it could improve
overall performance a fair bit.
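
For example, a minimal untested sketch - the 512MB value is just an
illustrative guess for a 1GB box (roughly RAM minus shared_buffers
minus OS overhead), and since it's only a planner hint it's safe to
experiment with:

  # in postgresql.conf, then reload:
  effective_cache_size = 512MB

  -- check the active value from psql:
  SHOW effective_cache_size;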


Regards,
Isak

Re: How to reduce impact of a query.

From:
Teodor Sigaev
Date:
> The machine in question is a 1GB Ram, AMD 64 with Raid 1 Sata disks. Non
> standard parts of my postgresql.conf are as follows:
> max_connections=100
> shared_buffers=128MB
> work_mem=4MB
> maintenance_work_mem=256MB
> max_fsm_pages=204800
> max_fsm_relations=1500
>
> Any tips appreciated.

Please show:
1) effective_cache_size
2) The query
3) Output of EXPLAIN ANALYZE for the query
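
For example, from psql (with the real query substituted):

  SHOW effective_cache_size;
  EXPLAIN ANALYZE SELECT ...;  -- the tsearch query in question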

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: How to reduce impact of a query.

From:
Howard Cole
Date:
Teodor Sigaev wrote:
>> The machine in question is a 1GB Ram, AMD 64 with Raid 1 Sata disks.
>> Non standard parts of my postgresql.conf are as follows:
>> max_connections=100
>> shared_buffers=128MB
>> work_mem=4MB
>> maintenance_work_mem=256MB
>> max_fsm_pages=204800
>> max_fsm_relations=1500
>>
>> Any tips appreciated.
>
> Pls, show
> 1) effective_cache_size
> 2) The query
> 3) Output of EXPLAIN ANALYZE of query
>
effective_cache_size is set at 128MB (the default).

A simplified version of the query is as follows:

select email_id from email where to_tsquery('default','hannah') @@ fts;

Bitmap Heap Scan on email  (cost=12.50..80.25 rows=18 width=8) (actual
time=9073.878..39371.665 rows=6535 loops=1)
  Filter: ('''hannah'''::tsquery @@ fts)
  ->  Bitmap Index Scan on email_fts_index  (cost=0.00..12.49 rows=18
width=0) (actual time=9023.036..9023.036 rows=6696 loops=1)
        Index Cond: ('''hannah'''::tsquery @@ fts)
Total runtime: 39375.892 ms

The time that this query takes is not the issue; rather, it is the impact
that it has on the server - effectively killing it for 40 seconds
due to the heavy disk access.



Re: How to reduce impact of a query.

From:
"Scott Marlowe"
Date:
On Mon, Nov 17, 2008 at 8:17 AM, Howard Cole <howardnews@selestial.com> wrote:
> Teodor Sigaev wrote:
>>>
>>> The machine in question is a 1GB Ram, AMD 64 with Raid 1 Sata disks. Non

Your entire disk I/O subsystem is a pair of hard drives.  I'm assuming
software RAID.

> The time that this query takes is not the issue, rather it is the impact
> that it has on the server - effectively killing it for the 40 seconds due to
> the heavy disk access.

You either need to invest more into your drive subsystem so it can
handle parallel load better, or you need to create a slave db with
slony or londiste so that the ugly queries hit the slave.

Re: How to reduce impact of a query.

From:
Howard Cole
Date:
Scott Marlowe wrote:
> Your entire disk io subsystem is a pair of hard drives.  I'm assuming
> software RAID.
Correct.
>> The time that this query takes is not the issue, rather it is the impact
>> that it has on the server - effectively killing it for the 40 seconds due to
>> the heavy disk access.
>>
>
> You either need to invest more into your drive subsystem so it can
> handle parallel load better, or you need to create a slave db with
> slony or londiste so that the ugly queries hit the slave.
>
I take your point, Scott. But short of the hardware upgrade, is there a
way to effectively throttle this query? If I reduce maintenance_work_mem
then the database dump/restore is slower but there is less overall
impact on the server. Is there some equivalent parameter on the server
to throttle general queries? It would be unfortunate if all queries
slowed down a bit, but that would be a better outcome than having the
entire server hang for 40 seconds.

Re: How to reduce impact of a query.

From:
"Scott Marlowe"
Date:
On Mon, Nov 17, 2008 at 8:42 AM, Howard Cole <howardnews@selestial.com> wrote:
> Scott Marlowe wrote:
>>
>> Your entire disk io subsystem is a pair of hard drives.  I'm assuming
>> software RAID.
>
> Correct.
>>>
>>> The time that this query takes is not the issue, rather it is the impact
>>> that it has on the server - effectively killing it for the 40 seconds due
>>> to
>>> the heavy disk access.
>>>
>>
>> You either need to invest more into your drive subsystem so it can
>> handle parallel load better, or you need to create a slave db with
>> slony or londiste so that the ugly queries hit the slave.
>>
>
> I take your point Scott. But short of the hardware upgrade, is there a way
> to effectively throttle this query? If I reduce maintenance_work_mem then
> the database dump/restore is slower but there is less overall impact on the
> server. Is there some equivalent parameter on the server to throttle general
> queries? It would be unfortunate if all queries slowed down a bit, but a
> better outcome than having the entire server hang for 40 seconds.

The problem is most likely that you're I/O bound.  If one query is hitting
a table it can pull in data (sequentially) at 40 to 80 megabytes per
second.  Since most of your queries are small, they don't run into
each other a lot, so to speak. As soon as your big reporting query
hits, it's likely hitting the drives for much longer and getting in the
way of all the other queries.

You could add more RAM; that should help, since the kernel could then
fit much more of your data into RAM and not be as dependent on your
drive subsystem.  Memory is cheap; even FBDIMMs are pretty inexpensive
nowadays for 4 to 8 gigs of RAM in a server.

While you can nice the backend process with some kind of script, it
doesn't generally affect I/O priority.  Some OSes do allow process
priority to dictate I/O priority, but I'm pretty sure Linux doesn't.
It still might help a little, but right now you need to either add
enough RAM for the kernel to cache the majority of your data set or
get more drives.

Re: How to reduce impact of a query.

From:
Howard Cole
Date:
Scott Marlowe wrote:
> The problem is most likely you're I/O bound.  If one query is hitting
> a table it can pull in data (sequentially) at 40 to 80 megabytes per
> second.  Since most of your queries are small, they don't run into
> each other a lot, so to speak. As soon as your big reporting query
> hits it's likely hitting the drives much longer and getting in the way
> of all the other queries.
>
>
Thanks for the input, Scott. You are correct - I am I/O bound, but only
for the query described. 99% of the time my I/O runs at 3% or less, even
during peak times; only this one query, which happens approximately 10
times a day, grinds the system to a halt.

I am considering running this query in the background every couple of
minutes or so to force the tables/indexes into cache. Once the tables
are cached the query runs very quickly and there is no significant I/O
impact. This is a bodge but hopefully should work.





Re: How to reduce impact of a query.

From:
"Scott Marlowe"
Date:
On Mon, Nov 17, 2008 at 9:36 AM, Howard Cole <howardnews@selestial.com> wrote:
> Scott Marlowe wrote:
>>
>> The problem is most likely you're I/O bound.  If one query is hitting
>> a table it can pull in data (sequentially) at 40 to 80 megabytes per
>> second.  Since most of your queries are small, they don't run into
>> each other a lot, so to speak. As soon as your big reporting query
>> hits it's likely hitting the drives much longer and getting in the way
>> of all the other queries.
>>
>>
>
> Thanks for the input Scott. You are correct - I am IO bound, but only for
> the query described. 99% of the time, my IO runs at 3% or less, even during
> peak times, only this one query, which happens approximately 10 times a day
> grinds the system to a halt.
>
> I am considering running this query in the background every couple of
> minutes or so to force the tables/indexes into cache. Once the tables are
> cached the query runs very quickly and there is no significant IO
> impact.This is a bodge but hopefully should work.

It's probably simpler to just do:

select count(*) from sometable;

to force it into cache.  Buy some more memory and it'll likely stay in
memory without such tricks.

Best of luck on this.

Re: How to reduce impact of a query.

From:
Howard Cole
Date:
Scott Marlowe wrote:
> Best of luck on this.
>
Thanks Scott.

Re: How to reduce impact of a query.

From:
Craig Ringer
Date:
Howard Cole wrote:

> If I reduce maintenance_work_mem
> then the database dump/restore is slower but there is less overall
> impact on the server.

There could be more impact, rather than less, if it forces a sort that'd
be done in memory out to disk instead. If you have dedicated storage on
separate spindles for disk sorts etc that might be OK, but it doesn't
sound like you do.

> Is there some equivalent parameter on the server
> to throttle general queries?

As far as I know there is no facility for this within PostgreSQL.

On a Linux (or maybe other UNIX too) machine you can use ionice to tell
the OS I/O scheduler to give that process lower priority for disk access
or rate limit its disk access. Note that setting the CPU access
priority (`nice' level) will NOT help unless the server is CPU-limited,
and even then probably not much.
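
Something along these lines should work as a rough, untested sketch
(in 8.2 the pg_stat_activity columns are procpid and current_query;
later releases rename them to pid and query):

  -- find the offending backend's process ID:
  SELECT procpid, current_query
  FROM pg_stat_activity
  WHERE current_query LIKE '%to_tsquery%';

  -- then, from a shell on the server:
  --   ionice -c 3 -p <procpid>     (puts it in the "idle" I/O class)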

Maybe there is a similar facility to ionice for Windows, or the generic
process priority setting also affects disk I/O? You'll probably have to
do some research to find out.

I'm not sure there's any way to stop it pushing other useful data out of
shared_buffers, though. Anyone?

> It would be unfortunate if all queries
> slowed down a bit, but a better outcome than having the entire server
> hang for 40 seconds.

Are you sure there isn't a table locking issue involved - something your
batch query is doing that's causing other queries to block until that
transaction commits/rolls back? Check pg_locks:

  SELECT * FROM pg_locks;
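
Or, as a rough untested sketch limited to relation-level locks, pair
each ungranted request with a session holding a granted lock on the
same relation:

  SELECT w.pid AS waiting_pid, h.pid AS holding_pid,
         w.relation::regclass AS locked_relation, h.mode AS held_mode
  FROM pg_locks w
  JOIN pg_locks h ON h.relation = w.relation
                 AND h.database = w.database
                 AND h.granted
  WHERE NOT w.granted;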

Also: Try setting the transaction to readonly before running it, and see
if it succeeds.

  SET transaction_read_only = true;

This is probably a good thing to do anyway, as it *might* help the
database make better decisions.
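
For example, wrapping the query from earlier in the thread:

  BEGIN;
  SET TRANSACTION READ ONLY;
  SELECT email_id FROM email WHERE to_tsquery('default', 'hannah') @@ fts;
  COMMIT;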

--
Craig Ringer

Re: How to reduce impact of a query.

From:
Craig Ringer
Date:
Howard Cole wrote:

> Thanks for the input Scott. You are correct - I am IO bound, but only
> for the query described. 99% of the time, my IO runs at 3% or less, even
> during peak times, only this one query, which happens approximately 10
> times a day grinds the system to a halt.

If your I/O is normally that idle, surely one big query shouldn't stop
everything? Slow it down, maybe, but stop it?

Perhaps your RAID drivers, hardware (if any) or OS are deeply queuing
requests and/or doing lots of readahead, probably in order to look
better on sequential I/O and random-I/O throughput benchmarks that
don't consider request latency.

Consider turning down I/O queue depths if possible, and/or tuning
readahead to something suitable for your I/O loads. The latter will
require some work to find the right balance between random request
latency and sequential I/O throughput.

I had HUGE problems with a 3Ware 8500-8 RAID controller queuing requests
very deeply inside the Linux driver itself, rather than in the OS's I/O
scheduler, causing high-priority small I/O to be stuck behind long
series of low-priority bulk reads and writes, since the driver wasn't
aware of the OS's I/O priority mechanisms. I ended up modifying the
driver a little to reduce the queue depth, since there wasn't a runtime
param for it, and the result was VASTLY improved I/O latency with only a
very small cost to throughput.

It'd be nice if the controller had the brains to be told "fetch this
block, and read this one too if it happens to be on the way" ... but if
there's anything like that out there, with hardware or driver level I/O
priority awareness, I haven't run into it yet. Alas.


By the way, it'd be REALLY nice if the postmaster had the facility to
set CPU and I/O nice levels for the backends individually (on Linux and
other supporting platforms). I'm currently using a user C function
linked into the backend to set the nice level, but it'd be a great thing
to have built-in. Would patches for this be considered, with the
functions being no-ops (with warnings?) on non-supporting platforms?

I also think it's a wee bit of a pity that there's no way to tell Pg
that a job isn't important, so data shouldn't be permitted to push much
else out of shared_buffers or the OS's cache. The latter can be ensured
to an extent, at least on Linux, with posix_fadvise(...,
POSIX_FADV_NOREUSE) or with madvise(...). The former is presumably
possible with proper work_mem (etc) settings, but I find it's the OS's
habit of filling the cache with gigabytes of data I won't need again
that's the real problem. I don't know how this'd work when interacting
with other backends doing other work with the same tables, though.

--
Craig Ringer

Re: How to reduce impact of a query.

From:
"Scott Marlowe"
Date:
On Mon, Nov 17, 2008 at 11:10 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:

> I also think it's a wee bit of a pity that there's no way to tell Pg
> that a job isn't important, so data shouldn't be permitted to push much
> else out of shared_buffers or the OS's cache. The latter can be ensured
> to an extent, at least on Linux, with posix_fadvise(...,
> POSIX_FADV_NOREUSE) or with madvise(...). The former is presumably
> possible with proper work_mem (etc) settings, but I find it's the OS's
> habit of filling the cache with gigabytes of data I won't need again
> that's the real problem. I don't know how this'd work when interacting
> with other backends doing other work with the same tables, though.

Agreed.

It could be that in the OP's case the data set for the big query is so
big it blows out shared_buffers completely / most of the way, and then
I/O for the other data has to hit the drives instead of memory, and
that's why they're so slow.

Re: How to reduce impact of a query.

From:
Craig Ringer
Date:
Scott Marlowe wrote:
> On Mon, Nov 17, 2008 at 11:10 PM, Craig Ringer
> <craig@postnewspapers.com.au> wrote:
>
>> I also think it's a wee bit of a pity that there's no way to tell Pg
>> that a job isn't important, so data shouldn't be permitted to push much
>> else out of shared_buffers or the OS's cache. The latter can be ensured
>> to an extent, at least on Linux, with posix_fadvise(...,
>> POSIX_FADV_NOREUSE) or with madvise(...).

Unfortunately, this isn't as useful as I'd initially hoped:

http://kerneltrap.org/node/7563 :

Quoting Torvalds:

"So O_DIRECT not only is a total disaster from a design standpoint (just
look at all the crap it results in), it also indirectly has hurt better
interfaces. For example, POSIX_FADV_NOREUSE (which _could_ be a useful
and clean interface to make sure we don't pollute memory unnecessarily
with cached pages after they are all done) ends up being a no-op ;/"

Darn.

> It could be that in the OP's case the data set for the big query is so
> big it blows out shared_buffers completely / most of the way, and then
> I/O for the other data has to hit the drives instead of memory and
> that's why they're so slow.

shared_buffers alone shouldn't be *too* bad (right?), but if it's
pushing data out of the OS's cache as well (or the OS does a bad job of
caching disk reads) then that'd really hurt, yeah.

--
Craig Ringer

Re: How to reduce impact of a query.

From:
Howard Cole
Date:
Craig Ringer wrote:
>> If I reduce maintenance_work_mem
>> then the database dump/restore is slower but there is less overall
>> impact on the server.
>>
>
> There could be more impact, rather than less, if it forces a sort that'd
> be done in memory out to disk instead. If you have dedicated storage on
> separate spindles for disk sorts etc that might be OK, but it doesn't
> sound like you do.
>
>
>> Is there some equivalent parameter on the server
>> to throttle general queries?
>>
>
> As far as I know there is no facility for this within PostgreSQL.
>
> On a Linux (or maybe other UNIX too) machine you can use ionice to tell
> the OS I/O scheduler to give that process lower priority for disk access
> or rate limit its disk access. Note that setting the CPU access
> priority (`nice' level) will NOT help unless the server is CPU-limited,
> and even then probably not much.
>
>
>
Unfortunately I am on a Windows platform. Plus, I am running Windows
software RAID, so there is little tweaking allowed.
>> It would be unfortunate if all queries
>> slowed down a bit, but a better outcome than having the entire server
>> hang for 40 seconds.
>>
>
> Are you sure there isn't a table locking issue involved - something your
> batch query is doing that's causing other queries to block until that
> transaction commits/rolls back? Check pg_locks:
>
>   SELECT * FROM pg_locks;
>
> Also: Try setting the transaction to readonly before running it, and see
> if it succeeds.
>
>   SET transaction_read_only = true;
>
> This is probably a good thing to do anyway, as it *might* help the
> database make better decisions.
>
>
>
I didn't even know you could do that! I can do this on a system-wide
basis for all of my read-only queries (something like the sketch
below), so I shall see if it makes a difference. I'll check the locking
issues, but I was under the impression that Postgres was excellent for
this? One of the regular, smaller queries does, however, use the same
table, so I shall check if this is having a major impact. If I set them
both to read-only then that might have the desired effect? Perhaps this
is something arising from MVCC? If so, is that something that can be
switched off?
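
A minimal untested sketch of the system-wide idea, where app_readonly
is a hypothetical role that my read-only connections would use:

  ALTER ROLE app_readonly SET default_transaction_read_only = on;

Every transaction for that role then starts read-only by default.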

On another point, I tried setting up a scheduled query to force the
tables into cache, and this had some strange effects... As I mentioned in
an earlier post, I have multiple databases running on the same server, so
I ran select queries for all of them. This sped up the queries as
expected with the cached data. However, two of the databases seemingly
refused to speed up - they always seemed to take 30+ seconds (again
eating up the machine's I/O resources). Even if I ran the query-as-a-job
on only one of these databases, it didn't seem to speed up. Perhaps there
is something wrong with these databases? The EXPLAIN ANALYSE seems to
come back with identical plans on these. Any ideas? (P.S. I am running
autovacuum.)

Howard Cole
www.selestial.com


Re: How to reduce impact of a query.

From:
Craig Ringer
Date:
Howard Cole wrote:

> Unfortunately I am on a windows platform. Plus I am running windows
> software raid so there is little tweaking allowed.

Don't write the possibility off too quickly. The driver may well accept
parameters for readahead settings etc, either through a custom
configuration applet (might be a separate download), the driver
properties interface in Device Manager, or through manual registry settings.

Obviously you should be really careful when messing with anything like
this, especially the latter. Your backups are up to date and well tested
anyway, though, RIGHT?

I do suspect that your tuning options will be severely limited by your
I/O system. It's quite likely that the software RAID implementation
you're using is optimized to benchmark well on measurements of
throughput (I/Os per second and bytes per second) at the cost of request
latency. This won't help with your problems with queries interfering
with each other.

That doesn't mean it's not worth some research into tuning Windows
systems for I/O request latency. I don't know much about this, but I'd
be surprised if there wasn't info out there.

Personally I'd also consider getting a database server with a bit more
I/O grunt and room for tuning - even a Linux box with four fast-ish
NCQ-capable disks in RAID 10 on a modern AHCI SATA controller would
probably help. It'd give you room to use tools like `ionice', for one
thing, plus you'd be using the `md' or `dmraid' drivers for software
RAID and would be able to tweak their behaviour significantly more than
you're likely to be able to with your RAID system under Windows. Then
again, I have lots more experience with Linux in a server role, so I'm
naturally inclined toward using it in preference to Windows. PostgreSQL
is much better tested, much more mature, and has much more community
support on UNIX/Linux platforms, though, which is not a trivial factor
to consider either.

> I didn't even know you could do that! I can do this on a system wide
> basis for all of my read only queries so I shall see if it makes a
> difference. I'll check the locking issues but I was under the impression
> that postgres was excellent for this?

It generally is. You still need to think about how your concurrent
queries interact, though, as there are some operations that really must
be ordered. For example, if a query tries to UPDATE a record that a
concurrent query has already UPDATEd, PostgreSQL will make the second
query block until the first one commits or rolls back. Doing otherwise
would cause exciting issues if the second query's UPDATE was in any way
dependent on the results of the first.

If you're only INSERTing and SELECTing then there's not much that'll
make queries interfere with each other with locks, unless you have
trigger-maintained materialized views, summary tables or the like
issuing UPDATEs behind the scenes.
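
For example, an innocuous-looking setup like this (untested, with
hypothetical names for the summary table and trigger) means every
INSERT into email also UPDATEs a single row in a summary table, so
concurrent writers serialize on that row:

  CREATE TABLE email_counts (n bigint NOT NULL);
  INSERT INTO email_counts VALUES (0);

  CREATE FUNCTION bump_email_count() RETURNS trigger AS $$
  BEGIN
    UPDATE email_counts SET n = n + 1;  -- hidden write behind an INSERT
    RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER email_count_trg AFTER INSERT ON email
    FOR EACH ROW EXECUTE PROCEDURE bump_email_count();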

> One of the regular, smaller
> queries does however use the same table so I shall check if this is
> having a major impact. - If I set them both to read-only then that might
> have the desired impact?

I don't think it'll change anything much, but it might tell you (by
causing a query to fail) if it's trying to write anything. This might
help you detect a point at which the queries are interacting that you
might not have expected, such as an UPDATE being issued within a trigger
or by a rule.

> On another point, I tried setting up a scheduled query to force the
> tables into cache and this had some strange effects... As I mentioned in
> an earlier post I have multiple databases running on the same server so
> I ran select queries for all of them. This sped up the queries as
> expected with the cached data. However, two of the databases seemingly
> refused to speed up - They always seemed to take 30+ seconds (again
> eating up the machine IO resource). Even if I ran the query-as-a-job on
> only one of these databases, it didn't seem to speed up. Perhaps there
> is something wrong with these databases? The explain analyse seems to
> come back with identical plans on these. Any ideas? (p.s. I am running
> autovacuum)

I can't really help you on this one. I'd start by comparing table sizes
across the various databases, but beyond that ...

--
Craig Ringer

Re: How to reduce impact of a query.

From:
Howard Cole
Date:
Craig Ringer wrote:
> Howard Cole wrote:
>
>
>> Unfortunately I am on a windows platform. Plus I am running windows
>> software raid so there is little tweaking allowed.
>>
> Don't write the possibility off too quickly. The driver may well accept
> parameters for readahead settings etc, either through a custom
> configuration applet (might be a separate download), the driver
> properties interface in Device Manager, or through manual registry settings.
>
>
>
Hmm. It would probably be more economical to buy a bigger server!
>> I didn't even know you could do that! I can do this on a system wide
>> basis for all of my read only queries so I shall see if it makes a
>> difference. I'll check the locking issues but I was under the impression
>> that postgres was excellent for this?
>>
>
> It generally is. You still need to think about how your concurrent
> queries interact, though, as there are some operations that really must
> be ordered. For example, if a query tries to UPDATE a record that a
> concurrent query has already UPDATEd PostgreSQL will make the second
> query block until the first one commits or rolls back. Doing otherwise
> would cause exciting issues if the second query's UPDATE was in any way
> dependent on the results of the first.
>
> If you're only INSERTing and SELECTing then there's not much that'll
> make queries interfere with each other with locks, unless you have
> trigger-maintained materialized views, summary tables or the like
> issuing UPDATEs behind the scenes.
>
>
You'll be pleased to know that changing the transaction to read-only
(SET TRANSACTION READ ONLY), as you suggested, seemed to prevent the
server from slowing to a halt. The query still takes an age, but not to
the detriment of all else. Many thanks for that tip.

Is it not possible for the query optimisation process to determine
that a query is read-only without explicitly stating this?
>> One of the reqular, smaller
>> queries does however use the same table so I shall check if this is
>> having a major impact. - If I set them both to read-only then that might
>> have the desired impact?
>>
>
> I don't think it'll change anything much, but it might tell you (by
> causing a query to fail) if it's trying to write anything. This might
> help you detect a point at which the queries are interacting that you
> might not have expected, such as an UPDATE being issued within a trigger
> or by a rule.
>
>
I do have a trigger-on-update on this table, and it is possible that an
insert is being done in parallel.

However, my immediate problem appears to be solved. Special thanks to
Craig and Scott.

Howard Cole
www.selestial.com



Re: How to reduce impact of a query.

From:
Craig Ringer
Date:
Howard Cole wrote:

> You'll be pleased to know that changing the transaction to read only
> (SET TRANSACTION READ ONLY)
> as you suggested seemed to prevent the server from slowing to a halt.

Actually, I'm mostly surprised by that. I primarily suggested issuing
the command to ensure that if your transaction was doing UPDATEs or
similar via triggers or function side-effects you weren't aware of, the
transaction would fail and help you pinpoint the problem area.

I'm quite curious about why setting the transaction to readonly helped
its performance. Could it be to do with setting hint bits or something
along those lines, perhaps? Anyone?

> Is it not possible that the query optimisation process should determine
> that the query should be read only without
> explicitly stating this?

I don't think it can. You can issue a series of statements that make no
changes to the database, followed by an UPDATE/INSERT/DELETE/ALTER
TABLE/whatever. You could also issue a SELECT that invokes a function
(directly or via a rule) that modifies the database.

The database doesn't know what statements you will issue next.
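
For example (untested, with hypothetical names), a perfectly
ordinary-looking SELECT can write:

  CREATE TABLE audit_log (at timestamptz NOT NULL DEFAULT now());

  CREATE FUNCTION log_access() RETURNS integer AS $$
    INSERT INTO audit_log DEFAULT VALUES;
    SELECT 1;
  $$ LANGUAGE sql;

  -- looks like a pure read, but inserts an audit row on every call:
  SELECT log_access();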

That said, I thought the transaction_read_only flag had to be set before
any operations were issued, but it seems to be permitted even after
queries have been issued. In fact, it seems to be permitted after DML
has been issued in a transaction, which really confuses me. Is
transaction_read_only not actually applied to the transaction as a whole
like, say, transaction_isolation, but rather on a statement-by-statement
basis? If so, is it not somewhat misleadingly named?

I would not have expected the following to work:


CREATE TABLE a ( i serial primary key, j text );

BEGIN;

INSERT INTO a (j) VALUES ('oats'), ('wheat'), ('beans');

SET transaction_read_only = 1;

SELECT * FROM a;

COMMIT;



but it does.
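
Whereas this (untested) should fail at the INSERT with "cannot execute
INSERT in a read-only transaction", since the flag seems to be checked
statement by statement:

  BEGIN;
  SET transaction_read_only = 1;
  INSERT INTO a (j) VALUES ('rye');
  COMMIT;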

--
Craig Ringer

Re: transaction_read_only effects on performance, actual meaning (was: How to reduce impact of a query)

From:
Ivan Sergio Borgonovo
Date:
On Thu, 20 Nov 2008 20:24:42 +0900
Craig Ringer <craig@postnewspapers.com.au> wrote:

> Howard Cole wrote:

> > You'll be pleased to know that changing the transaction to read
> > only (SET TRANSACTION READ ONLY)
> > as you suggested seemed to prevent the server from slowing to a
> > halt.

> Actually, I'm mostly surprised by that. I primarily suggested
> issuing the command to ensure that if your transaction was doing
> UPDATEs or similar via triggers or function side-effects you
> weren't aware of, the transaction would fail and help you
> pin-point the problem area.

> I'm quite curious about why setting the transaction to readonly
> helped its performance. Could it be to do with setting hint bits
> or something along those lines, perhaps? Anyone?

Functions run inside transactions. I'd be curious to know whether
issuing SET TRANSACTION READ ONLY inside a function has any
meaning/effect once you've declared that function stable or immutable.

> I would not have expected the following to work:

> CREATE TABLE a ( i serial primary key, j text );
>
> BEGIN;
>
> INSERT INTO a (j) VALUES ('oats'), ('wheat'), ('beans');
>
> SET transaction_read_only = 1;
>
> SELECT * FROM a;
>
> COMMIT;
>
>
>
> but it does.

Interesting. Thank you for pointing it out.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


From:
Tom Lane
Date:
Craig Ringer <craig@postnewspapers.com.au> writes:
> I'm quite curious about why setting the transaction to readonly helped
> its performance. Could it be to do with setting hint bits or something
> along those lines, perhaps? Anyone?

AFAIK that's actually a pessimization.  Declaring the transaction READ
ONLY does not activate any optimizations that wouldn't be there
otherwise, and it causes a few more cycles to be expended to check that
each statement is allowed under READ ONLY rules.  So I think that this
report is mistaken, and the performance change came from something else.

            regards, tom lane