On Tuesday, October 13, 2020 10:09 AM, Tsunakawa-san wrote:
> Why do you do it this way? I think the previous patch was more correct
> (while agreeing with Horiguchi-san that nTotalBlocks may be unnecessary).
> What you want to do is "if the size of any fork could be inaccurate, do the
> traditional full buffer scan without performing any optimization for any
> fork," right? But the above code performs optimization for forks until it
> finds a fork with an inaccurate size.
>
> (2)
> + * Get the total number of cached blocks and to-be-invalidated
> blocks
> + * of the relation. The cached value returned by smgrnblocks could
> be
> + * smaller than the actual number of existing buffers of the file.
>
> As you changed the meaning of the smgrnblocks() argument from cached to
> accurate, and you no longer calculate the total blocks, the comment should
> reflect that.
>
>
> (3)
> In smgrnblocks(), accurate is not set to false when mdnblocks() is called.
> The caller doesn't initialize the value either, so it can see a garbage value.
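A minimal sketch of what (3) is asking for, assuming a simplified shape for smgrnblocks(); the names and structure here are illustrative, not the actual patch:

```c
#include <assert.h>
#include <stdbool.h>

#define InvalidBlockNumber ((unsigned int) 0xFFFFFFFF)
typedef unsigned int BlockNumber;

/* Simplified stand-in for the patched smgrnblocks().  The point of
 * comment (3) is that the "accurate" output flag must be assigned on
 * every path, since callers do not initialize it. */
static BlockNumber
smgrnblocks_sketch(BlockNumber cached_nblocks, BlockNumber md_nblocks,
                   bool *accurate)
{
    if (cached_nblocks != InvalidBlockNumber)
    {
        *accurate = true;       /* cached size is known to be exact */
        return cached_nblocks;
    }
    *accurate = false;          /* fallback path must also set the flag */
    return md_nblocks;          /* what mdnblocks() would report */
}
```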
>
>
> (4)
> + if (nForkBlocks[i] != InvalidBlockNumber &&
> + nBlocksToInvalidate <
> BUF_DROP_FULL_SCAN_THRESHOLD)
> + {
> ...
> + }
> + else
> + break;
> + }
>
> In cases like this, it's better to reverse the if and else; that reduces
> the nesting depth.
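A sketch of the inversion suggested in (4), with a guard clause and early break instead of nesting the optimization body inside the if; the helper name and threshold value are illustrative, not the actual patch:

```c
#include <assert.h>

#define InvalidBlockNumber ((unsigned int) 0xFFFFFFFF)
#define BUF_DROP_FULL_SCAN_THRESHOLD 32     /* illustrative value */

typedef unsigned int BlockNumber;

/* Returns how many forks the optimized per-block invalidation would
 * cover before falling back to a full buffer scan.  Reversing the
 * condition lets the loop body sit at one nesting level. */
static int
count_optimizable_forks(const BlockNumber *nForkBlocks, int nforks,
                        BlockNumber nBlocksToInvalidate)
{
    int optimized = 0;

    for (int i = 0; i < nforks; i++)
    {
        /* guard clause: bail out instead of wrapping the body in an if */
        if (nForkBlocks[i] == InvalidBlockNumber ||
            nBlocksToInvalidate >= BUF_DROP_FULL_SCAN_THRESHOLD)
            break;

        /* ... per-fork optimized buffer invalidation would go here ... */
        optimized++;
    }
    return optimized;
}
```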
Thank you for the review!
1. I have revised the patch addressing your comments/feedback. Attached are the latest set of patches.
2. Non-recovery Performance
I also included a debug version of the patch (0004) in which I removed the recovery-related checks
in order to measure non-recovery performance.
However, I still can't find the cause of why the non-recovery performance
does not change compared to master (1 min 15 s for the test case below).
> - if (InRecovery && reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
> + if (reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
Here's how I measured it:
0. postgresql.conf setting
shared_buffers = 100GB
autovacuum = off
full_page_writes = off
checkpoint_timeout = 30min
max_locks_per_transaction = 100
wal_log_hints = on
wal_keep_size = 100
max_wal_size = 20GB
1. createdb test
2. Create tables: SELECT create_tables(1000);
create or replace function create_tables(numtabs int)
returns void as $$
declare
  query_string text;
begin
  for i in 1..numtabs loop
    query_string := 'create table tab_' || i::text || ' (a int);';
    execute query_string;
  end loop;
end;
$$ language plpgsql;
3. Insert rows into the tables (3.5 GB database): SELECT insert_tables(1000);
create or replace function insert_tables(numtabs int)
returns void as $$
declare
  query_string text;
begin
  for i in 1..numtabs loop
    query_string := 'insert into tab_' || i::text || ' SELECT generate_series(1, 100000);';
    execute query_string;
  end loop;
end;
$$ language plpgsql;
4. DELETE FROM tables: SELECT delfrom_tables(1000);
create or replace function delfrom_tables(numtabs int)
returns void as $$
declare
  query_string text;
begin
  for i in 1..numtabs loop
    query_string := 'delete from tab_' || i::text;
    execute query_string;
  end loop;
end;
$$ language plpgsql;
5. Measure VACUUM timing
\timing
VACUUM;
Using the debug version of the patch, I have confirmed that it enters the optimization path
when the conditions are met. Here are some printed logs from 018_wal_optimize_node_replica.log:
> make world -j4 -s && make -C src/test/recovery/ check PROVE_TESTS=t/018_wal_optimize.pl
WARNING: current fork 0, nForkBlocks[i] 1, accurate: 1
CONTEXT: WAL redo at 0/162B4E0 for Storage/TRUNCATE: base/13751/24577 to 0 blocks flags 7
WARNING: Optimization Loop.
buf_id = 41. nforks = 1. current fork = 0. forkNum: 0 == tag's forkNum: 0. curBlock: 0 < nForkBlocks[i] = 1. tag
blockNum:0 >= firstDelBlock[i]: 0. nBlocksToInvalidate = 1 < threshold = 32.
--
3. Recovery Performance (hot standby, failover)
OTOH, when measuring recovery performance (using the 0003 patch), the results were great.
| s_b   | master (s) | patched (s) | %reg   |
|-------|------------|-------------|--------|
| 128MB | 3.043      | 2.977       | -2.22% |
| 1GB   | 3.417      | 3.410       | -0.21% |
| 20GB  | 20.597     | 2.409       | -755%  |
| 100GB | 66.862     | 2.409       | -2676% |
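For clarity, the %reg column appears to be computed relative to the patched time, so large negative values mean the patched run is much faster; a quick sanity check (a sketch, not part of the patch set):

```c
#include <assert.h>
#include <math.h>

/* The %reg column seems to be (patched - master) / patched * 100,
 * i.e. regression relative to the patched time.  For the 20GB row:
 * (2.409 - 20.597) / 2.409 * 100 is about -755%. */
static double
pct_reg(double master_s, double patched_s)
{
    return (patched_s - master_s) / patched_s * 100.0;
}
```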
To execute this on a hot standby setup (after inserting rows into the tables):
1. [Standby] Pause WAL replay
SELECT pg_wal_replay_pause();
2. [Master] Measure VACUUM timing. Then stop the server.
\timing
VACUUM;
\q
pg_ctl stop -mi -w
3. [Standby] Use the attached script to promote standby and measure the performance.
# test.sh recovery
So the current issue I'm still investigating is why the non-recovery performance
does not improve, while OTOH it is good when InRecovery.
Regards,
Kirk Jamison