Обсуждение: how delete/insert/update affects select performace?

Поиск
Список
Период
Сортировка

how delete/insert/update affects select performace?

От
"Anibal David Acosta"
Дата:

Hi!

 

I have a table not too big but with aprox. 5 millions of rows, this table must have 300 to 400 select per second. But also must have 10~20 delete/insert/update per second.

 

So, I need to know if the insert/delete/update really affect the select performance and how to deal with it.

 

The table structure is very simple:

 

account_id integer (PK)

service_id integer (PK)

enabled char(1)

 

The index created on this has the same 3 columns.

 

Most of time the table has more insert or delete than update, when update occur the column changed is enabled;

 

Thanks!

 

 

Re: how delete/insert/update affects select performace?

От
"Tomas Vondra"
Дата:
On 8 Září 2011, 14:51, Anibal David Acosta wrote:
> Hi!
>
>
>
> I have a table not too big but with aprox. 5 millions of rows, this table
> must have 300 to 400 select per second. But also must have 10~20
> delete/insert/update per second.
>
> So, I need to know if the insert/delete/update really affect the select
> performance and how to deal with it.

Yes, insert/update do affect query performance, because whenever a row is
modified a new copy is created. So the table might grow over time, and
bigger tables mean more data to read.

There are two ways to prevent this:

1) autovacuum - has to be configured properly (watch the table size and
number of rows, and if it grows then make it a bit more aggressive)

2) HOT

> The table structure is very simple:
>
> account_id integer (PK)
>
> service_id integer (PK)
>
> enabled char(1)
>
> The index created on this has the same 3 columns.
>
> Most of time the table has more insert or delete than update, when update
> occur the column changed is enabled;

So there's one index on all three columns? I'd remove the "enabled" from
the index, it's not going to help much I guess and it makes HOT possible
(the modified column must not be indexed). Plus there will be one less
index (the other two columns are already a PK, so there's a unique index).

Tomas


Re: how delete/insert/update affects select performace?

От
"Kevin Grittner"
Дата:
"Anibal David Acosta" <aa@devshock.com> wrote:

> I have a table not too big but with aprox. 5 millions of rows,
> this table must have 300 to 400 select per second. But also must
> have 10~20 delete/insert/update per second.
>
> So, I need to know if the insert/delete/update really affect the
> select performance and how to deal with it.

In addition to the advice from Tomas (which was all good) you should
be aware that depending on the version of PostgreSQL (which you
didn't mention), your hardware (which you didn't describe), and your
configuration (which you didn't show) the data modification can make
you vulnerable to a phenomenon where a checkpoint can cause a
blockage of all disk I/O for a matter of minutes, causing even
simple SELECT statements which normally run in under a millisecond
to run for minutes.  This is more likely to occur in a system which
has been aggressively tuned for maximum throughput -- you may need
to balance throughput needs against response time needs.

Every one of the last several major releases of PostgreSQL has
gotten better at preventing this problem, so your best protection
from it is to use a recent version.

There's a good chance that you won't run into this, but if you do,
you can generally correct it by reducing your shared_buffers setting
or making your background writer more aggressive.

-Kevin

Re: how delete/insert/update affects select performace?

От
"Anibal David Acosta"
Дата:
Postgres 9.0 on windows server 2008 r2
HW is a dell dual processor with 16gb of ram .

Tthe reason I add the enabled column to index is because a select won't need
to read the table to get this value

My select is : exists(select * from table where account_id=X and
service_id=Y and enabled='T')

So, do you think I must remove the enabled from index?

Thanks




-----Mensaje original-----
De: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Enviado el: jueves, 08 de septiembre de 2011 10:51 a.m.
Para: Anibal David Acosta; pgsql-performance@postgresql.org
Asunto: Re: [PERFORM] how delete/insert/update affects select performace?

"Anibal David Acosta" <aa@devshock.com> wrote:

> I have a table not too big but with aprox. 5 millions of rows, this
> table must have 300 to 400 select per second. But also must have 10~20
> delete/insert/update per second.
>
> So, I need to know if the insert/delete/update really affect the
> select performance and how to deal with it.

In addition to the advice from Tomas (which was all good) you should be
aware that depending on the version of PostgreSQL (which you didn't
mention), your hardware (which you didn't describe), and your configuration
(which you didn't show) the data modification can make you vulnerable to a
phenomenon where a checkpoint can cause a blockage of all disk I/O for a
matter of minutes, causing even simple SELECT statements which normally run
in under a millisecond to run for minutes.  This is more likely to occur in
a system which has been aggressively tuned for maximum throughput -- you may
need to balance throughput needs against response time needs.

Every one of the last several major releases of PostgreSQL has gotten better
at preventing this problem, so your best protection from it is to use a
recent version.

There's a good chance that you won't run into this, but if you do, you can
generally correct it by reducing your shared_buffers setting or making your
background writer more aggressive.

-Kevin


Re: how delete/insert/update affects select performace?

От
"Kevin Grittner"
Дата:
"Anibal David Acosta" <aa@devshock.com> wrote:

> Tthe reason I add the enabled column to index is because a select
> won't need to read the table to get this value

That's not true in PostgreSQL, although there is an effort to
support that optimization, at least to some degree.  In all current
versions of PostgreSQL, it will always need to read the heap to
determine whether the index entry is pointing at a version of the
row which is visible to your transaction.  Adding the enabled column
to an index will prevent faster HOT updates to that column.

> My select is : exists(select * from table where account_id=X and
> service_id=Y and enabled='T')

On the other hand, if you have very many rows where enabled is not
'T', and you are generally searching for where enabled = 'T', you
might want a partial index (an index with a WHERE clause in its
definition).  If enabled only has two states, you will probably get
better performance using a boolean column.

-Kevin

Re: how delete/insert/update affects select performace?

От
"Anibal David Acosta"
Дата:
>On the other hand, if you have very many rows where enabled is not 'T', and
you are generally searching for where enabled = 'T', you might want a
partial index (an index with a WHERE clause in its definition).  If >enabled
only has two states, you will probably get better performance using a
boolean column.

Maybe 1% or 2% are enabled='F' all others are 'T'

Another question Kevin (thanks for your time)

When an insert/update occur, the index is "reindexed" how index deals with
new or deleted rows.

Whay happened with select, it wait that index "reindex" or rebuild or
something? Or just select view another "version" of the table?

Thanks




-----Mensaje original-----
De: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Enviado el: jueves, 08 de septiembre de 2011 01:01 p.m.
Para: Anibal David Acosta; pgsql-performance@postgresql.org
CC: 'Tomas Vondra'
Asunto: RE: [PERFORM] how delete/insert/update affects select performace?

"Anibal David Acosta" <aa@devshock.com> wrote:

> Tthe reason I add the enabled column to index is because a select
> won't need to read the table to get this value

That's not true in PostgreSQL, although there is an effort to support that
optimization, at least to some degree.  In all current versions of
PostgreSQL, it will always need to read the heap to determine whether the
index entry is pointing at a version of the row which is visible to your
transaction.  Adding the enabled column to an index will prevent faster HOT
updates to that column.

> My select is : exists(select * from table where account_id=X and
> service_id=Y and enabled='T')

On the other hand, if you have very many rows where enabled is not 'T', and
you are generally searching for where enabled = 'T', you might want a
partial index (an index with a WHERE clause in its definition).  If enabled
only has two states, you will probably get better performance using a
boolean column.

-Kevin


Re: how delete/insert/update affects select performace?

От
"Kevin Grittner"
Дата:
"Anibal David Acosta" <aa@devshock.com> wrote:

> Maybe 1% or 2% are enabled='F' all others are 'T'

Then an index on this column is almost certainly going to be
counter-productive.  The only index on this column which *might*
make sense is WHERE enabled = 'F', and only if you run queries for
that often enough to outweigh the added maintenance cost.  If it's
always one of those two values, I would use boolean (with NOT NULL
if appropriate).

> When an insert/update occur, the index is "reindexed" how index
> deals with new or deleted rows.

Ignoring details of HOT updates, where less work is done if no
indexed column is updated and there is room for the new version of
the row (tuple) on the same page, an UPDATE is almost exactly like a
DELETE and an INSERT in the same transaction.  A new tuple (from an
INSERT or UPDATE) is added to the index(es), and if you query
through the index, it will see entries for both the old and new
versions of the row; this is why it must visit both versions -- to
check tuple visibility.  Eventually the old tuples and their index
entries are cleaned up through a "vacuum" process (autovacuum or an
explicit VACUUM command).  Until then queries do extra work visiting
and ignoring the old tuples.  (That is why people who turn off
autovacuum almost always regret it later.)

> Whay happened with select, it wait that index "reindex" or rebuild
> or something? Or just select view another "version" of the table?

The new information is immediately *added*, but there may be other
transactions which should still see the old state of the table, so
cleanup of old tuples and their index entries must wait for those
transactions to complete.

See this for more information:

http://www.postgresql.org/docs/9.0/interactive/mvcc.html

-Kevin

Re: how delete/insert/update affects select performace?

От
Greg Smith
Дата:
On 09/08/2011 12:40 PM, Anibal David Acosta wrote:
> Postgres 9.0 on windows server 2008 r2
> HW is a dell dual processor with 16gb of ram .
>

The general guidelines for Windows servers such as
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server recommend
a fairly small setting for the shared_buffers parameters on Windows--no
more than 512MB.  That makes your server a bit less likely to run in the
nasty checkpoint spike issues Kevin was alluding to.  I don't think
we've seen any reports of that on Windows.  The problem is worst on Linux.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


Re: how delete/insert/update affects select performace?

От
"Anibal David Acosta"
Дата:
Even if I have a server with 16GB of ram, I must set the shared_buffer to
512MB on windows?

In the wiki page they talk about 1/4 of ram, in my case that represent a
shared_buffer = 4GB, that is incorrect?

I have 8 GB of ram for each processor, each processor is a quad core with
hyperthreading, that means 16 "processors" o something like that. Windows
show 16 in task manager.

If I can't configure more than 512MB of shared_buffer all other RAM is
unnecessary?

Thanks for your time.



-----Mensaje original-----
De: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] En nombre de Greg Smith
Enviado el: jueves, 08 de septiembre de 2011 09:29 p.m.
Para: pgsql-performance@postgresql.org
Asunto: Re: [PERFORM] how delete/insert/update affects select performace?

On 09/08/2011 12:40 PM, Anibal David Acosta wrote:
> Postgres 9.0 on windows server 2008 r2 HW is a dell dual processor
> with 16gb of ram .
>

The general guidelines for Windows servers such as
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server recommend a
fairly small setting for the shared_buffers parameters on Windows--no more
than 512MB.  That makes your server a bit less likely to run in the nasty
checkpoint spike issues Kevin was alluding to.  I don't think we've seen any
reports of that on Windows.  The problem is worst on Linux.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance