Discussion: Tweaking PG (again)

Tweaking PG (again)

From:
"Phoenix Kiula"
Date:
Hi.

I had tweaked my PG 8.2.6 with the very kind help of this list a
couple years ago. It has been working fine, until recently. Not sure
if it is after the update to 8.3 or because my DB has been growing,
but the db is very slow now and the cache doesn't seem enough.

~ > free -m
             total       used       free     shared    buffers     cached
Mem:          4051       4033         18          0          6       2576
-/+ buffers/cache:       1450       2601
Swap:         2047         43       2003

Some of the SQL queries that were super fast (indexed with LIMIT 1!)
are now slow too.

What is a good starting point for me apart from checking the slow SQL?
Because almost every SQL is now slow.

I can restart the PG server and it is fast for a little while after
that but then the buffer fills up I think.

It's a CentOS server, Pentium Core2Duo dual processor, 4GB RAM. Same
server runs Apache (low mem consumption), MySQL (for really small web
stuff, not much load) and PGSQL (huge load). I can devote a lot of
memory to PG, no problem.

Autovacuum is on, but I also manually vacuum the big tables via crontab
-- once per hour.

This one is not a huge DB, about 5GB right now. The tables are as such:


    relname     | rowcnt  | inserted | updated | deleted
----------------+---------+----------+---------+---------
 books          | 8622136 |  1852965 |  938229 |   16304
 checkout_count |  261317 |     9834 |  116664 |    1225
 subscribers    |   10180 |     1267 |   79623 |       0
 interesting    |    4196 |       53 |   54774 |       0
 pg_statistic   |     411 |        0 |   43104 |       0
 books_deleted  |     896 |    16350 |       0 |   11473
 users          |   62865 |     2428 |    2493 |       0
 pg_attribute   |    1844 |     1322 |     575 |    1321
(8 rows)




Below are my CONF settings:



listen_addresses = 'localhost,*'
max_connections              = 300
shared_buffers               = 330MB
effective_cache_size         = 512000
max_fsm_relations            = 100
max_fsm_pages                = 300000

work_mem                     = 20MB
temp_buffers                 = 4096
authentication_timeout       = 10s
ssl                          = off
checkpoint_warning           = 3600
random_page_cost             = 1

autovacuum                   = on
autovacuum_vacuum_cost_delay = 20
vacuum_cost_delay            = 20
autovacuum_naptime           = 10
stats_start_collector        = on
stats_row_level              = on
autovacuum_vacuum_threshold  = 75
autovacuum_analyze_threshold = 25
autovacuum_analyze_scale_factor  = 0.02
autovacuum_vacuum_scale_factor   = 0.01



Any pointers or advice MUCH appreciated! THANKS.

Re: Tweaking PG (again)

From:
"Scott Marlowe"
Date:
On Thu, Nov 13, 2008 at 11:59 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Hi.
>
> I had tweaked my PG 8.2.6 with the very kind help of this list a
> couple years ago. It has been working fine, until recently. Not sure
> if it is after the update to 8.3 or because my DB has been growing,
> but the db is very slow now and the cache doesn't seem enough.

Everything you posted looks pretty normal.  I'd find the slowest
queries and post explain analyze to see what's happening.
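
For example, slow statements can be surfaced by raising statement
logging (a sketch; the 500 ms threshold is just an example value):

SET log_min_duration_statement = 500;  -- milliseconds; superuser only, or set it in postgresql.conf and reload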

Re: Tweaking PG (again)

From:
"Phoenix Kiula"
Date:
On Fri, Nov 14, 2008 at 3:10 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Nov 13, 2008 at 11:59 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>> Hi.
>>
>> I had tweaked my PG 8.2.6 with the very kind help of this list a
>> couple years ago. It has been working fine, until recently. Not sure
>> if it is after the update to 8.3 or because my DB has been growing,
>> but the db is very slow now and the cache doesn't seem enough.
>
> Everything you posted looks pretty normal.  I'd find the slowest
> queries and post explain analyze to see what's happening.



Thanks Scott. That is a relief.

My logs are full of

(1) One SELECT sql
(2) And INSERT and UPDATE sql to my main table, called "books"

The definition of "books" is as follows --



                                Table "public.books"
        Column         |            Type             |
Modifiers
-----------------------+-----------------------------+------------------------------
 id                    | bigint                      | not null
 book_id               | character varying(10)       | not null
 alias                 | character varying(20)       | not null
 url                   | text                        | not null
 user_known            | smallint                    | not null default 0
 user_id               | character varying(45)       | not null
 url_encrypted         | character(40)               | default ''::bpchar
 title                 | character varying(500)      |
 status                | character(1)                | default 'Y'::bpchar
 modify_date           | timestamp without time zone |
Indexes:
    "books2_pkey" PRIMARY KEY, btree (id)
    "books2_alias_key" UNIQUE, btree (alias) WITH (fillfactor=75)
    "new_idx_books_userid" btree (user_id) WITH (fillfactor=70)
    "new_idx_modify_date" btree (modify_date) WITH (fillfactor=75)
    "new_idx_userknown" btree (user_id) WITH (fillfactor=70) WHERE
user_known = 1
Check constraints:
    "books2_id_check" CHECK (id > 0)
    "books2_url_check" CHECK (url <> ''::text)
    "books2_user_id_check" CHECK (user_id::text <> ''::text)
    "books_alias_check" CHECK (alias::text ~ '[-~a-z0-9_]'::text)




(1) The culprit SELECT sql is (note that "MYUSER" in this example can
be an IP address) --




explain analyze SELECT alias, id, title, private_key, aliasEntered
 FROM books
 WHERE user_id = 'MYUSER'  AND url_encrypted =
'bed59c2f2d92e306d4481a276bd8a78c4b0532d4' ;


                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Index Scan using new_idx_books_userid on books  (cost=0.00..493427.14 rows=2 width=31) (actual time=0.428..8400.299 rows=1 loops=1)
   Index Cond: ((user_id)::text = 'MYUSER'::text)
   Filter: (url_encrypted = 'bed59c2f2d92e306d4481a276bd8a78c4b0532d4'::bpchar)
 Total runtime: 8400.349 ms
(4 rows)




(2) The culprit INSERT sql is as follows


explain analyze
INSERT INTO books (id, book_id, url, user_known, user_id,
url_encrypted, alias, title, private_key, status, modify_date)
            values
            (
              9107579
             ,'5f7gb'
             ,'http://www.google.com'
             ,'0'
             ,'MYUSER'
             ,'73684da5ef05d9589f95d8ba9e4429ea062549c7'
             ,'5f7gb'
             ,''
             ,''
             ,'Y'
             ,now()
            )
;



                                     QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.03 rows=1 width=0) (actual time=0.017..0.022 rows=1 loops=1)
 Total runtime: 106.747 ms
(2 rows)

Time: 3421.424 ms

Re: Tweaking PG (again)

From:
"Scott Marlowe"
Date:
On Thu, Nov 13, 2008 at 2:04 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> On Fri, Nov 14, 2008 at 3:10 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Thu, Nov 13, 2008 at 11:59 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>>> Hi.
>>>
>>> I had tweaked my PG 8.2.6 with the very kind help of this list a
>>> couple years ago. It has been working fine, until recently. Not sure
>>> if it is after the update to 8.3 or because my DB has been growing,
>>> but the db is very slow now and the cache doesn't seem enough.
>>
>> Everything you posted looks pretty normal.  I'd find the slowest
>> queries and post explain analyze to see what's happening.
>
> My logs are full of
>
> (1) One SELECT sql
> (2) And INSERT and UPDATE sql to my main table, called "books"
>
> The definition of "books" is as follows --
>
>
>
>                                Table "public.books"
>        Column         |            Type             |
> Modifiers
> -----------------------+-----------------------------+------------------------------
>  id                    | bigint                      | not null
>  book_id               | character varying(10)       | not null
>  alias                 | character varying(20)       | not null
>  url                   | text                        | not null
>  user_known            | smallint                    | not null default 0
>  user_id               | character varying(45)       | not null
>  url_encrypted         | character(40)               | default ''::bpchar
>  title                 | character varying(500)      |
>  status                | character(1)                | default 'Y'::bpchar
>  modify_date           | timestamp without time zone |
> Indexes:
>    "books2_pkey" PRIMARY KEY, btree (id)
>    "books2_alias_key" UNIQUE, btree (alias) WITH (fillfactor=75)
>    "new_idx_books_userid" btree (user_id) WITH (fillfactor=70)
>    "new_idx_modify_date" btree (modify_date) WITH (fillfactor=75)
>    "new_idx_userknown" btree (user_id) WITH (fillfactor=70) WHERE
> user_known = 1
> Check constraints:
>    "books2_id_check" CHECK (id > 0)
>    "books2_url_check" CHECK (url <> ''::text)
>    "books2_user_id_check" CHECK (user_id::text <> ''::text)
>    "books_alias_check" CHECK (alias::text ~ '[-~a-z0-9_]'::text)
>
>
>
>
> (1) The culprit SELECT sql is (note that "MYUSER" in this example can
> be an IP address) --

So, it can be, but might not be?  Darn.  If it was always an IP I'd
suggest changing types.

> explain analyze SELECT alias, id, title, private_key, aliasEntered
>  FROM books
>  WHERE user_id = 'MYUSER'  AND url_encrypted =
> 'bed59c2f2d92e306d4481a276bd8a78c4b0532d4' ;
>
>                                                        QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
> Index Scan using new_idx_books_userid on books  (cost=0.00..493427.14
> rows=2 width=31) (actual time=0.428..8400.299 rows=1 loops=1)
>   Index Cond: ((user_id)::text = 'MYUSER'::text)
>   Filter: (url_encrypted = 'bed59c2f2d92e306d4481a276bd8a78c4b0532d4'::bpchar)
>  Total runtime: 8400.349 ms
> (4 rows)

8.4 seconds is a very long time to spend looking up a single record.
Is this table bloated?  What does

vacuum verbose books;

say about it?  Look for a line like this:

There were 243 unused item pointers

> (2) The culprit INSERT sql is as follows
>
> explain analyze
> INSERT INTO books (id, book_id, url, user_known, user_id,
> url_encrypted, alias, title, private_key, status, modify_date)
>                values
>                (
>                  9107579
>                 ,'5f7gb'
>                 ,'http://www.google.com'
>                 ,'0'
>                 ,'MYUSER'
>                 ,'73684da5ef05d9589f95d8ba9e4429ea062549c7'
>                 ,'5f7gb'
>                 ,''
>                 ,''
>                 ,'Y'
>                 ,now()
>                )
> ;
>
>                                     QUERY PLAN
> ------------------------------------------------------------------------------------
>  Result  (cost=0.00..0.03 rows=1 width=0) (actual time=0.017..0.022
> rows=1 loops=1)
>  Total runtime: 106.747 ms
> (2 rows)
>
> Time: 3421.424 ms

When the total run time measured by explain analyze is much lower than
the actual run time, this is usually either a trigger firing / fk
issue, or you've got a really expensive (cpu wise) time function on
your OS.  Since there's only one loop here, I'm gonna guess that
you've got some FK stuff going on.  Got a related fk/pk field in
another table that needs an index?  I thought that 8.3 gave some info
on that stuff in explain analyze, but I'm not really sure.
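
For example (a sketch; the referencing column name on VISITCOUNT is a
guess, adjust it to the real schema):

CREATE INDEX idx_visitcount_book_id ON visitcount (book_id);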

Fwd: Tweaking PG (again)

From:
"Phoenix Kiula"
Date:
Thanks Scott. Responses below.



>>
>> (1) The culprit SELECT sql is (note that "MYUSER" in this example can
>> be an IP address) --
>
> So, it can be, but might not be?  Darn,  If it was always an ip I'd
> suggest changing types.
>


Yes, it can either be a registered USER ID or an IP address. I thought
of having two separate fields, where one is null or the other, and
then indexing the concatenation of those two, which I could use for the
SQL. But it's difficult to revamp the whole code. Instead of that, I have
"user_known". If user_known is 1, then it's a user_id; otherwise it's
an IP address. This is quicker than regexping for an IP pattern
every time.



>> explain analyze SELECT alias, id, title, private_key, aliasEntered
>>  FROM books
>>  WHERE user_id = 'MYUSER'  AND url_encrypted =
>> 'bed59c2f2d92e306d4481a276bd8a78c4b0532d4' ;
>>
>>                                                        QUERY PLAN
>> ----------------------------------------------------------------------------------------------------------------
>> Index Scan using new_idx_books_userid on books  (cost=0.00..493427.14
>> rows=2 width=31) (actual time=0.428..8400.299 rows=1 loops=1)
>>   Index Cond: ((user_id)::text = 'MYUSER'::text)
>>   Filter: (url_encrypted = 'bed59c2f2d92e306d4481a276bd8a78c4b0532d4'::bpchar)
>>  Total runtime: 8400.349 ms
>> (4 rows)
>
> 8.4 seconds is a very long time to spend looking up a single record.
> Is this table bloated?  What does
>
> vacuum verbose books;
>
> say about it?  Look for a line like this:
>
> There were 243 unused item pointers



Thanks but this table "books" has autovac on, and it's manually
vacuumed every hour!





>> (2) The culprit INSERT sql is as follows
>>
>> explain analyze
>> INSERT INTO books (id, book_id, url, user_known, user_id,
>> url_encrypted, alias, title, private_key, status, modify_date)
>>                values
>>                (
>>                  9107579
>>                 ,'5f7gb'
>>                 ,'http://www.google.com'
>>                 ,'0'
>>                 ,'MYUSER'
>>                 ,'73684da5ef05d9589f95d8ba9e4429ea062549c7'
>>                 ,'5f7gb'
>>                 ,''
>>                 ,''
>>                 ,'Y'
>>                 ,now()
>>                )
>> ;
>>
>>                                     QUERY PLAN
>> ------------------------------------------------------------------------------------
>>  Result  (cost=0.00..0.03 rows=1 width=0) (actual time=0.017..0.022
>> rows=1 loops=1)
>>  Total runtime: 106.747 ms
>> (2 rows)
>>
>> Time: 3421.424 ms
>
> When the total run time measured by explain analyze is much lower than
> the actual run time, this is usually either a trigger firing / fk
> issue, or you've got a really expensive (cpu wise) time function on
> your OS.  Since there's only one loop here, I'm gonna guess that
> you've got some FK stuff going on.  Got a related fk/pk field in
> another table that needs an index?  I thought that 8.3 gave some info
> on that stuff in explain analyze, but I'm not really sure.



Yes there is a table VISITCOUNT that has a foreign key on books(id).
But why should that be invoked? Shouldn't that fk be called into
question only when a row is being inserted/updated in VISITCOUNT table
and not BOOKS?

Thanks!

Re: Fwd: Tweaking PG (again)

From:
Alvaro Herrera
Date:
Phoenix Kiula wrote:

> >> ----------------------------------------------------------------------------------------------------------------
> >> Index Scan using new_idx_books_userid on books  (cost=0.00..493427.14
> >> rows=2 width=31) (actual time=0.428..8400.299 rows=1 loops=1)
> >>   Index Cond: ((user_id)::text = 'MYUSER'::text)
> >>   Filter: (url_encrypted = 'bed59c2f2d92e306d4481a276bd8a78c4b0532d4'::bpchar)
> >>  Total runtime: 8400.349 ms
> >> (4 rows)
> >
> > 8.4 seconds is a very long time to spend looking up a single record.
> > Is this table bloated?  What does
> >
> > vacuum verbose books;

> Thanks but this table "books" has autovac on, and it's manually
> vacuumed every hour!

Perhaps try reindexing it.  What kind of index is new_idx_books_userid
anyway?  btree?
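
For example (note that plain REINDEX blocks writes to the table while
it runs):

REINDEX INDEX new_idx_books_userid;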

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Tweaking PG (again)

From:
Tom Lane
Date:
"Scott Marlowe" <scott.marlowe@gmail.com> writes:
> On Thu, Nov 13, 2008 at 2:04 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>> explain analyze SELECT alias, id, title, private_key, aliasEntered
>> FROM books
>> WHERE user_id = 'MYUSER'  AND url_encrypted =
>> 'bed59c2f2d92e306d4481a276bd8a78c4b0532d4' ;
>>
>> QUERY PLAN
>> ----------------------------------------------------------------------------------------------------------------
>> Index Scan using new_idx_books_userid on books  (cost=0.00..493427.14
>> rows=2 width=31) (actual time=0.428..8400.299 rows=1 loops=1)
>> Index Cond: ((user_id)::text = 'MYUSER'::text)
>> Filter: (url_encrypted = 'bed59c2f2d92e306d4481a276bd8a78c4b0532d4'::bpchar)
>> Total runtime: 8400.349 ms
>> (4 rows)

> 8.4 seconds is a very long time to spend looking up a single record.

Yeah, but note that the planner knows darn well that this will be an
expensive query --- 493427.14 cost units estimated to fetch 2 rows!

My interpretation is that the condition on user_id is horribly
nonselective (at least for this value of user_id) and the planner knows
it.  The condition on url_encrypted *is* selective, and the planner
knows that too, but there's nothing it can do about it --- the best
available plan is to fetch all the rows matching by user_id and then
filter them on url_encrypted.

Consider creating an index on url_encrypted if you need this type of
query to go fast.
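
For example (a sketch; the index name is illustrative, and the
CONCURRENTLY form -- which avoids blocking writes on a live system --
must be run outside a transaction block):

CREATE INDEX CONCURRENTLY idx_books_url_encrypted ON books (url_encrypted);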

            regards, tom lane

Re: Tweaking PG (again)

From:
"Phoenix Kiula"
Date:
On Fri, Nov 14, 2008 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:



> Yeah, but note that the planner knows darn well that this will be an
> expensive query --- 493427.14 cost units estimated to fetch 2 rows!
>
> My interpretation is that the condition on user_id is horribly
> nonselective (at least for this value of user_id) and the planner knows
> it.  The condition on url_encrypted *is* selective, and the planner
> knows that too, but there's nothing it can do about it --- the best
> available plan is to fetch all the rows matching by user_id and then
> filter them on url_encrypted.
>
> Consider creating an index on url_encrypted if you need this type of
> query to go fast.


Thanks Tom.

Yes, I have considered indexing url_encrypted too. That would be a
very large index, space-wise, but it may improve the speed. You
are right that user_id alone is not very selective.

The two together (user_id, url_encrypted) should be unique in my case,
so I can now think of making a unique index with these two fields.

Questions:

1. If I have a unique index on (user_id, url_encrypted), then will
queries asking only for user_id also use this index? Or should i
simply have separate indexes on user_id and url_encrypted? I vaguely
recall reading somewhere that compound indexes may have been useful in
MySQL but according to PG's more advanced planner having two separate
indexes on the columns works better.

2. Is there a production equivalent of REINDEX? Last time I tried
CREATE INDEX CONCURRENTLY overnight, by the morning it had croaked
with these errors:

---
ERROR:  deadlock detected
DETAIL:  Process 6663 waits for ShareLock on transaction 999189656;
blocked by process 31768.
Process 31768 waits for ShareUpdateExclusiveLock on relation 50002 of
database 41249; blocked by process 6663
---

Naturally, when I see the table now, this attempted index is marked
"INVALID". The manual says I should either drop it and recreate it, or
REINDEX it again.  But this is a production database on a semi-busy
website and cannot take time off. What is my recourse for a kind of
live REINDEX? Can I create a new index without locking the database?

3. Basically, design-wise, I use url_encrypted to check if a user_id
already has a url associated with him. This is a kind of unique
constraint check on (user_id, url_encrypted), used only when INSERTing a
new record -- if the user has it already, then simply update values if
needed and return the current row; otherwise, INSERT a new row. I do
this check+update+insert with three SQLs. Is there a way of doing it
in one SQL statement in PG?

Many thanks for all the help thus far!

Re: Tweaking PG (again)

From:
Martijn van Oosterhout
Date:
On Fri, Nov 14, 2008 at 12:57:32PM +0800, Phoenix Kiula wrote:
> 1. If I have a unique index on (user_id, url_encrypted), then will
> queries asking only for user_id also use this index? Or should I
> simply have separate indexes on user_id and url_encrypted? I vaguely
> recall reading somewhere that compound indexes may have been useful in
> MySQL but according to PG's more advanced planner having two separate
> indexes on the columns works better.

Yes. Maybe. If you build a combined index (user_id, url_encrypted) then
it can't be used in queries that only look for url_encrypted. So it
depends on your queries. If you want to be able to search for
url_encrypted by itself sometimes, it might be an idea to have two
indexes.

> 2. Is there a production equivalent of REINDEX? Last time I tried
> CREATE INDEX CONCURRENTLY overnight, by the morning it had croaked
> with these errors:

Sorry, can't help you here...

> 3. Basically, design-wise, I use url_encrypted to check if a user_id
> already has a url associated with him. This is a kind of unique
> constraint check on (user_id, url_encrypted), used only when INSERTing a
> new record -- if the user has it already, then simply update values if
> needed and return the current row; otherwise, INSERT a new row. I do
> this check+update+insert with three SQLs. Is there a way of doing it
> in one SQL statement in PG?

A stored procedure would do it. Alternatively, you can reduce the number
of queries by one by simply doing the UPDATE and, if nothing is updated,
then doing the INSERT.
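
A minimal sketch of the stored-procedure route, modeled on the merge_db
example in the 8.3 docs (it assumes plpgsql is installed and that the
unique index on (user_id, url_encrypted) discussed above exists; books'
other NOT NULL columns -- id, book_id, alias, url -- are omitted for
brevity and would have to be supplied in a real version):

CREATE FUNCTION merge_book(p_user_id varchar, p_url_enc char) RETURNS void AS $$
BEGIN
    LOOP
        -- first try to update an existing row for this (user_id, url_encrypted)
        UPDATE books SET modify_date = now()
         WHERE user_id = p_user_id AND url_encrypted = p_url_enc;
        IF found THEN
            RETURN;
        END IF;
        -- no row yet: try to insert; a concurrent insert may get there first
        BEGIN
            INSERT INTO books (user_id, url_encrypted, modify_date)
                VALUES (p_user_id, p_url_enc, now());
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- someone else inserted the same key; loop back and retry the UPDATE
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;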

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.


Re: Fwd: Tweaking PG (again)

From:
tv@fuzzy.cz
Date:
>> 8.4 seconds is a very long time to spend looking up a single record.
>> Is this table bloated?  What does
>>
>> vacuum verbose books;
>>
>> say about it?  Look for a line like this:
>>
>> There were 243 unused item pointers
>
> Thanks but this table "books" has autovac on, and it's manually
> vacuumed every hour!

The table may still be bloated - the default autovacuum parameters may not
be aggressive enough for heavily modified tables.

> Yes there is a table VISITCOUNT that has a foreign key on books(id).
> But why should that be invoked? Shouldn't that fk be called into
> question only when a row is being inserted/updated in VISITCOUNT table
> and not BOOKS?

I don't see a reason to check the VISITCOUNT -> books(id) foreign key, as
it is an insert. Are there any foreign keys referencing other tables (from
the books table)? According to the table structure you've sent earlier,
there are no such columns.

Try to determine whether the insert is CPU or I/O bound - run some
monitoring tool (dstat for example), run the insert and observe if there
is a lot of CPU activity, if the CPU waits for I/O operations to complete,
and if the I/O operations are mostly reads or writes. This will give you
an overview of the total I/O activity of the system.

BTW have you checked the postgresql.log? Are there any clues regarding the
insert (i.e. logs at the same time)? Don't forget to enable checkpoint
warnings in the config!

regards
Tomas


Re: Fwd: Tweaking PG (again)

From:
"Phoenix Kiula"
Date:
Thanks Tomas.


> The table may still be bloated - the default autovacuum parameters may not
> be aggressive enough for heavily modified tables.


My autovacuum settings:


autovacuum                   = on
autovacuum_vacuum_cost_delay = 20
vacuum_cost_delay            = 20
autovacuum_naptime           = 10
stats_start_collector        = on
stats_row_level              = on
autovacuum_vacuum_threshold  = 75
autovacuum_analyze_threshold = 25
autovacuum_analyze_scale_factor  = 0.02
autovacuum_vacuum_scale_factor   = 0.01
checkpoint_warning           = 3600
random_page_cost             = 1


Is this not aggressive enough?

And I reindexed all my indexes on the main "books" table, and then ran
a vacuum verbose, but I still see this:

----
INFO:  "links": found 475 removable, 8684150 nonremovable row versions
in 472276 pages
DETAIL:  95 dead row versions cannot be removed yet.
There were 2132065 unused item pointers.
529 pages contain useful free space.
----

95 dead rows are an improvement, but after a fresh reindex shouldn't I
have none? Each reindex took about 600 seconds on average (some
longer), so the table's data may have changed, but how can I have
"2132065 unused item pointers"?




> I don't see a reason to check the VISITCOUNT -> books(id) foreign key, as
> it is an insert. Are there any foreign keys referencing other tables (from
> the books table)? According to the table structure you've sent earlier,
> there are no such columns.


No, no FK from books to elsewhere. I have reindexed all indexes in all
tables anyway.



> BTW have you checked the postgresql.log? Are there any clues regarding the
> insert (i.e. logs at the same time)? Don't forget to enable checkpoint
> warnings in the config!



Currently, with the settings above and a new index on "url_encrypted"
(took a while but seems worth it), the DB is running better and the
postgresql.log has nothing at all! There are no logs. I think the
system is humming. But I am not sure if this is a false sense of
stability, because the vacuum results for "books" seem to suggest so
many unused item pointers. Should I be worried?

Re: Fwd: Tweaking PG (again)

From:
"Phoenix Kiula"
Date:
On Fri, Nov 14, 2008 at 9:22 PM, Hoover, Jeffrey <jhoover@jcvi.org> wrote:
> "There were 2132065 unused item pointers."
>
> Looks to me like a large update or insert failed on this table....



Thanks. So what can I do? I have reindexed all indexes already!

Re: Fwd: Tweaking PG (again)

From:
tv@fuzzy.cz
Date:
> Thanks Tomas.
>
>
>> The table may still be bloated - the default autovacuum parameters may
>> not be aggressive enough for heavily modified tables.
>
>
> My autovacuum settings:
>
>
> autovacuum                   = on
> autovacuum_vacuum_cost_delay = 20
> vacuum_cost_delay            = 20
> autovacuum_naptime           = 10
> stats_start_collector        = on
> stats_row_level              = on
> autovacuum_vacuum_threshold  = 75
> autovacuum_analyze_threshold = 25
> autovacuum_analyze_scale_factor  = 0.02
> autovacuum_vacuum_scale_factor   = 0.01
> checkpoint_warning           = 3600
> random_page_cost             = 1
>
> Is this not aggressive enough?

The settings seem fine to me ...

> And I reindexed all my indexes on the main "books" table, and then ran
> a vacuum verbose, but I still see this:

It's not about indexes in the first place, it's about the table. If
there are dead rows, the table will occupy more space (and so will the
index).

Try clustering the table according to the primary key - that will reclaim
the free space occupied by dead rows, and sort it according to the index.
But be careful, as it is quite expensive and an ACCESS EXCLUSIVE lock is
acquired on the table (both reads and writes are blocked).
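
For example, with the 8.3 syntax and the primary key from the table
definition posted earlier (a sketch):

CLUSTER books2_pkey ON books;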

> ----
> INFO:  "links": found 475 removable, 8684150 nonremovable row versions
> in 472276 pages
> DETAIL:  95 dead row versions cannot be removed yet.
> There were 2132065 unused item pointers.
> 529 pages contain useful free space.

I thought you had problems with the "books" table, right? But this output
suggests it's related to the "links" table, not "books" (see the first
INFO line).

Anyway, the table has about 25% reclaimed space - that's the 'unused
item pointers' value. There are 8.6 million rows in total; 2.1 million
item pointers were reclaimed by VACUUM and marked as free (but the space
is still occupied by the relation). I believe this is related to
max_fsm_pages / max_fsm_relations, but this is beyond my knowledge.

But I don't understand why there is so much free space - why it is not
reused for new rows etc.

> 95 dead rows are an improvement, but after a fresh reindex shouldn't I
> have none? Each reindex took about 600 seconds on average (some
> longer), so the table's data may have changed, but how can I have
> "2132065 unused item pointers"?

The 95 dead rows are not a problem - AFAIK it just means those rows were
modified (updated / deleted) but may not be reclaimable yet (maybe the
transaction that modified them is still running).

regards
Tomas


Re: Tweaking PG (again)

From:
tv@fuzzy.cz
Date:
> On Fri, Nov 14, 2008 at 12:57:32PM +0800, Phoenix Kiula wrote:
>> 1. If I have a unique index on (user_id, url_encrypted), then will
>> queries asking only for user_id also use this index? Or should I
>> simply have separate indexes on user_id and url_encrypted? I vaguely
>> recall reading somewhere that compound indexes may have been useful in
>> MySQL but according to PG's more advanced planner having two separate
>> indexes on the columns works better.
>
> Yes. Maybe. If you build a combined index (user_id, url_encrypted) then
> it can't be used in queries that only look for url_encrypted. So it
> depends on your queries. If you want to be able to search for
> url_encrypted by itself sometimes, it might be an idea to have two
> indexes.

Not true since 8.1 - according to docs:

  A multicolumn B-tree index can be used with query conditions that involve
  any subset of the index's columns, but the index is most efficient when
  there are constraints on the leading (leftmost) columns.

Sure, the index is most efficient for leftmost columns, but it may be used
for any subset.

See http://www.postgresql.org/docs/8.2/static/indexes-multicolumn.html
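
So a single compound index can serve both lookups, e.g. (a sketch; the
index name is illustrative):

CREATE UNIQUE INDEX idx_books_user_url ON books (user_id, url_encrypted);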

regards
Tomas Vondra


Re: Tweaking PG (again)

From:
Tom Lane
Date:
"Phoenix Kiula" <phoenix.kiula@gmail.com> writes:
> 2. Is there a production equivalent of REINDEX? Last time I tried
> CREATE INDEX CONCURRENTLY overnight, by the morning it had croaked
> with these errors:

> ERROR:  deadlock detected
> DETAIL:  Process 6663 waits for ShareLock on transaction 999189656;
> blocked by process 31768.
> Process 31768 waits for ShareUpdateExclusiveLock on relation 50002 of
> database 41249; blocked by process 6663

Can't help you on that unless you can show us what else you were doing.
It's hard to see how the above could have happened unless the CREATE
INDEX CONCURRENTLY were inside a transaction that had already made other
updates that conflicted with what some other transaction wanted to do.
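
As for the INVALID leftover itself, the recovery the manual describes is
to drop it and retry the concurrent build outside any transaction block
(the index name here is illustrative):

DROP INDEX idx_books_url_encrypted;
CREATE INDEX CONCURRENTLY idx_books_url_encrypted ON books (url_encrypted);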

            regards, tom lane