Re: WAL usage calculation patch

From: Dilip Kumar
Subject: Re: WAL usage calculation patch
Msg-id: CAFiTN-sONFPUHxMpn9m4=Z+m_MubCdvcVD75+QRK5jxVmUce8w@mail.gmail.com
In reply to: Re: WAL usage calculation patch  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses: Re: WAL usage calculation patch
List: pgsql-hackers
On Thu, Apr 2, 2020 at 9:28 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
> > >                   query                       | calls | wal_bytes | wal_records | wal_num_fpw
> > > ----------------------------------------------+-------+-----------+-------------+-------------
> > >  create index t1_idx_parallel_0 ON t1(id)     |     1 |  20389743 |        2762 |        2758
> > >  create index t1_idx_parallel_0_bis ON t1(id) |     1 |  20394391 |        2762 |        2758
> > >  create index t1_idx_parallel_0_ter ON t1(id) |     1 |  20395155 |        2762 |        2758
> > >  create index t1_idx_parallel_1 ON t1(id)     |     1 |  20388335 |        2762 |        2758
> > >  create index t1_idx_parallel_2 ON t1(id)     |     1 |  20389091 |        2762 |        2758
> > >  create index t1_idx_parallel_3 ON t1(id)     |     1 |  20389847 |        2762 |        2758
> > >  create index t1_idx_parallel_4 ON t1(id)     |     1 |  20390603 |        2762 |        2758
> > >  create index t1_idx_parallel_5 ON t1(id)     |     1 |  20391359 |        2762 |        2758
> > >  create index t1_idx_parallel_6 ON t1(id)     |     1 |  20392115 |        2762 |        2758
> > >  create index t1_idx_parallel_7 ON t1(id)     |     1 |  20392871 |        2762 |        2758
> > >  create index t1_idx_parallel_8 ON t1(id)     |     1 |  20393627 |        2762 |        2758
> > > (11 rows)
> > >
> > > =# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%';
> > >       relname          | pg_relation_size
> > > -----------------------+------------------
> > >  t1_idx_parallel_0     |         22487040
> > >  t1_idx_parallel_0_bis |         22487040
> > >  t1_idx_parallel_0_ter |         22487040
> > >  t1_idx_parallel_2     |         22487040
> > >  t1_idx_parallel_1     |         22487040
> > >  t1_idx_parallel_4     |         22487040
> > >  t1_idx_parallel_3     |         22487040
> > >  t1_idx_parallel_5     |         22487040
> > >  t1_idx_parallel_6     |         22487040
> > >  t1_idx_parallel_7     |         22487040
> > >  t1_idx_parallel_8     |         22487040
> > > (11 rows)
> > >
> > >
> > > So while the number of WAL records and full page images stays constant, we can
> > > see some small fluctuations in the total amount of generated WAL data, even for
> > > multiple executions of the sequential create index.  I'm wondering if the
> > > fluctuations are due to some other internal details or if the WalUsage support
> > > is just completely broken (although I don't see any obvious issue ATM).
> > >
> >
> > I think we need to know the reason for this.  Can you try with small
> > size indexes and see if the problem is reproducible? If it is, then it
> > will be easier to debug the same.
>
> I have done some testing to see where this extra WAL size is coming
> from.  First I tried creating a new db before every run, and then the
> size is consistent.  But then, on the same server, I tried what Julien
> showed in his experiment, and I am getting a few extra WAL bytes from
> the next create index onwards.  The waldump (attached in the mail)
> shows that it is the pg_class insert WAL.  I still have to check why
> we need to write these extra WAL bytes.
>
> create extension pg_stat_statements;
> drop table t1;
> create table t1(id integer);
> insert into t1 select * from generate_series(1, 10);
> alter table t1 set (parallel_workers = 0);
> vacuum;checkpoint;
> select * from pg_stat_statements_reset() ;
> create index t1_idx_parallel_0 ON t1(id);
> select query, calls, wal_bytes, wal_records, wal_num_fpw from
> pg_stat_statements where query ilike '%create index%';
>                   query                    | calls | wal_bytes | wal_records | wal_num_fpw
> -------------------------------------------+-------+-----------+-------------+-------------
>  create index t1_idx_parallel_0 ON t1(id)  |     1 |     49320 |          23 |          15
>
>
> drop table t1;
> create table t1(id integer);
> insert into t1 select * from generate_series(1, 10);
> --select * from pg_stat_statements_reset() ;
> alter table t1 set (parallel_workers = 0);
> vacuum;checkpoint;
> create index t1_idx_parallel_1 ON t1(id);
>
> select query, calls, wal_bytes, wal_records, wal_num_fpw from
> pg_stat_statements where query ilike '%create index%';
> postgres[110383]=# select query, calls, wal_bytes, wal_records,
> wal_num_fpw from pg_stat_statements;
>                   query                    | calls | wal_bytes | wal_records | wal_num_fpw
> -------------------------------------------+-------+-----------+-------------+-------------
>  create index t1_idx_parallel_1 ON t1(id)  |     1 |     50040 |          23 |          15
>
> wal_bytes diff = 50040-49320 = 720
>
> The WAL record below is causing the 720-byte difference; all other WAL
> records are of the same size.
> t1_idx_parallel_0:
> rmgr: Heap        len (rec/tot):     54/  7498, tx:        489, lsn: 0/0167B9B0, prev 0/0167B970, desc: INSERT off 30 flags 0x01, blkref #0: rel 1663/13580/1249
>
> t1_idx_parallel_1:
> rmgr: Heap        len (rec/tot):     54/  8218, tx:        494, lsn: 0/016B84F8, prev 0/016B84B8, desc: INSERT off 30 flags 0x01, blkref #0: rel 1663/13580/1249
>
> wal diff: 8218 - 7498 = 720

I think now I have got the reason.  Basically, both of these records
store an FPW, and the FPW size can vary based on the hole size on the
page.  If the hole size is smaller, the image length will be bigger,
since image_len = BLCKSZ - hole_length.  So in subsequent records the
image size is bigger.  You can refer to the code below in
XLogRecordAssemble:
{
....
    bimg.length = BLCKSZ - cbimg.hole_length;

    if (cbimg.hole_length == 0)
    {
....
    }
    else
    {
        /* must skip the hole */
        rdt_datas_last->data = page;
        rdt_datas_last->len = bimg.hole_offset;

        rdt_datas_last->next = &regbuf->bkp_rdatas[1];
        rdt_datas_last = rdt_datas_last->next;

        rdt_datas_last->data =
            page + (bimg.hole_offset + cbimg.hole_length);
        rdt_datas_last->len =
            BLCKSZ - (bimg.hole_offset + cbimg.hole_length);
    }
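
To make the arithmetic concrete, here is a minimal standalone sketch
(not PostgreSQL code; the hole lengths are hypothetical values chosen
only so that the difference matches the waldump output above) showing
how a smaller hole on the pg_class page makes the full-page image
exactly 720 bytes larger:

#include <stdio.h>

#define BLCKSZ 8192

int
main(void)
{
    /*
     * Assumed hole lengths on the pg_class page for each run.  Each
     * CREATE INDEX inserts another pg_class tuple, so the hole is
     * smaller on the second run.  Both values are hypothetical.
     */
    int     hole_run0 = 1200;   /* before first CREATE INDEX */
    int     hole_run1 = 480;    /* before second CREATE INDEX */

    /* image_len = BLCKSZ - hole_length, as in XLogRecordAssemble */
    int     image_run0 = BLCKSZ - hole_run0;    /* 6992 */
    int     image_run1 = BLCKSZ - hole_run1;    /* 7712 */

    printf("image_len run0 = %d\n", image_run0);
    printf("image_len run1 = %d\n", image_run1);
    printf("difference     = %d\n", image_run1 - image_run0);  /* 720 */

    return 0;
}

So every additional row inserted into pg_class shrinks the hole and
grows the next FPW of that page by the same amount, which is consistent
with the small wal_bytes fluctuations in Julien's results.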


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


