Re: Heavily modified big table bloat even in auto vacuum is running

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Heavily modified big table bloat even in auto vacuum is running
Дата
Msg-id CAA4eK1KpnqQb4R7BMiVHEyjivbch8wzniYSDQ-YAAFBi77Km6A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Heavily modified big table bloat even in auto vacuum is running  (Haribabu kommi <haribabu.kommi@huawei.com>)
Ответы Re: Heavily modified big table bloat even in auto vacuum is running
Список pgsql-hackers
On Fri, Nov 29, 2013 at 6:55 PM, Haribabu kommi
<haribabu.kommi@huawei.com> wrote:
> On 29 November 2013 12:00 Amit Kapila wrote:
>> On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
>> <haribabu.kommi@huawei.com> wrote:
>> Few questions about your latest patch:
>> a. Is there any reason why you are doing estimation of dead tuples only
>> for Autovacuum and not for Vacuum.
>
> No, changed.
>
>> /* clear and get the new stats for calculating proper dead tuples */
>> pgstat_clear_snapshot(); tabentry =
>> pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
>> b. In the above code, to get latest data you are first clearing
>> snapshot and then calling pgstat function. It will inturn perform I/O
>>     (read of stats file) and send/receive message from stats collector
>> to ensure it can read latest data. I think it will add overhead
>>     to Vacuum, especially if 'nkeep' calculated in function
>> lazy_scan_heap() can serve the purpose. In my simple test[1], I
>> observed
>>     that value of keep can serve the purpose.
>>
>> Can you please once try the test on 'nkeep' approach patch.
>
> Using the nkeep and snapshot approach, I ran the test for 40 mins with a
> high analyze_threshold and results are below.
>
>                         Auto vacuum count               Bloat size
> Master           11                      220MB
> Patched_nkeep      14                      215MB
> Patched_snapshot           18                              198MB
>
> Both the approaches are showing good improvement in the test.
> Updated patches, test script and configuration is attached in the mail.

I think updating dead tuple count using nkeep is good idea as similar
thing is done for Analyze as well in acquire_sample_rows().
One minor point, I think it is better to log dead tuples is below error message:
ereport(LOG,
(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
"pages: %d removed, %d remain\n"
"tuples: %.0f removed, %.0f remain\n"

"tuples: %.0f removed, %.0f remain, %.0f dead\n"


About your test, how to collect the data by running this script, are
you manually stopping it after 40 mins, because I ran it for more than
an hour,
the final result didn't came.
As I mentioned you last time, please simplify your test, for other
person in its current form, it is difficult to make meaning out of it.
Write comments on top of it in steps form to explain what exactly it
is doing and how to take data using it (for example, do I need
to wait, till script ends; how long this test can take to complete).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: [PATCH] Add transforms feature
Следующее
От: Haribabu kommi
Дата:
Сообщение: Re: Performance Improvement by reducing WAL for Update Operation