Re: index bloat

Поиск
Список
Период
Сортировка
От David Esposito
Тема Re: index bloat
Дата
Msg-id 200507121434.j6CEY20I018606@relay2.nnco.com
обсуждение исходный текст
Ответ на Re: index bloat  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, July 12, 2005 10:14 AM
>
> "David Esposito" <pgsql-general@esposito.newnetco.com> writes:
> > As promised, here are two runs of VACUUM VERBOSE on the
> problem table ...
> > There was a lot of activity on the campaign_email table on Friday
> > (Saturday's VACUUM) as compared with Monday (Tuesday's VACUUM)
>
> Well, what these numbers show is that you have 5% to 10%
> daily turnover
> of data in this table (maybe more --- are these two days
> representative,
> do you think)?  But anyway, taking that number as gospel, you'd expect
> that the table and indexes would settle at about 10% free space
> immediately after each VACUUM.  That would represent a steady state:
> just enough free space to get eaten up till the next VACUUM.
> The table
> itself seems to have stabilized, and the "email_campaign_idx" index as
> well -- note the latter didn't grow at all, and its internal
> free space
> is in the 10% ballpark:
>
> > INFO:  index "email_campaign_idx" now contains 5881215 row
> versions in 31435
> > pages
> > DETAIL:  501822 index row versions were removed.
> > 2016 index pages have been deleted, 896 are currently reusable.
>
> > INFO:  index "email_campaign_idx" now contains 5583831 row
> versions in 31435
> > pages
> > DETAIL:  280860 index row versions were removed.
> > 3266 index pages have been deleted, 2531 are currently reusable.
>
> I'm not sure why the other indexes don't seem to have reached their
> steady-state 10% free yet.  What can you tell us about the patterns
> of data being inserted into these various indexes?
>

These two days are fairly representative ... Friday's being a bit above
average, Monday's being a bit below ... the 10% turnover number sounds about
right ... and I expected to see what you describe: a rough 10% slack in the
data files which is why I am concerned about the growth of just a few
specific indexes ...

the data in this table is the past 50 days worth of history for an emailing
application ... when we send the email, we create the record then use that
record to track the bounce, opt-out, impression, and click activity for up
to 50 days then we roll up all of the data into a statistics table and purge
it ... As for the type of data that's in each of the indexes:

campaign_email_pkey - the primary key on the table to uniquely identify the
email sent; it's value is generated from a sequence ... obviously, the
values in this column don't change once the record is created ..

campaign_patron_unq - a concatenated index which ensures that a given email
address only receives one copy of a given email ... the values in the two
columns in this index don't change once the record is created

email_patron_idx - an index on the email address (actually an integer FK
which points at a table containing the email addresses) ... very high
cardinality ... i'd estimate that for a given key, there are at most 4 or 5
records ... the values in this column doesn't change once the record is
created

email_campaign_idx - an index on the email content that the person was sent
(an integer FK which points at the content in a seperate table) ... the
cardinality for this is pretty low ... there are probably only a few hundred
unique values in this index (still enough to make the index worthwhile, but
for a given key, thousands of records could come back) ... the values in
this column doesn't change once the record is created

The last detail that I thought of that shouldn't have any impact but I
figured was worth mentioning: all of these fields are NUMERIC(11,0) (it's a
long story involving an automated tool used to generate a schema definition
from an E/R diagram ... they should certainly just be INTEGERs, but that's
water under the bridge at this point .. :-)

Thanks again for your time and help,
Dave


В списке pgsql-general по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Converting MySQL tinyint to PostgreSQL
Следующее
От:
Дата:
Сообщение: utf-8 and cultural sensitive sorting