Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
From | Peter Smith
---|---
Subject | Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date |
Msg-id | CAHut+PvFbWt3zKK7V34m_arbwbtdN0UsoZAg6UTu=ONpNuDM-Q@mail.gmail.com
In reply to | Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication (Melih Mutlu <m.melihmutlu@gmail.com>)
Responses | Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication (Peter Smith <smithpb2250@gmail.com>)
List | pgsql-hackers
On Fri, Aug 11, 2023 at 11:45 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:
>
> Again, I couldn't reproduce the cases where you saw significantly
> degraded performance. I wonder if I'm missing something. Did you do
> anything not included in the test scripts you shared? Do you think
> v26-0001 will perform 84% worse than HEAD, if you try again? I just
> want to be sure that it was not a random thing.
>
> Interestingly, I also don't see an improvement in the above results as
> big as in your results when the inserts/tx ratio is smaller. Even
> though it certainly is improved in such cases.

======
TEST ENVIRONMENTS

I am running the tests on a high-spec machine:

-- NOTE: Nobody else is using this machine during our testing, so there
are no unexpected influences messing up the results.

Linux
Architecture:          x86_64
CPU(s):                120
Thread(s) per core:    2
Core(s) per socket:    15

              total        used        free      shared  buff/cache   available
Mem:           755G        5.7G        737G         49M         12G        748G
Swap:          4.0G          0B        4.0G

~~~

The results I am seeing are not random. HEAD+v26-0001 is consistently
worse than HEAD, but only for some settings. With these settings, I see
bad results (i.e. worse than HEAD) consistently, every time, using the
dedicated test machine.

Hou-san also reproduced bad results using a different high-spec machine.

Vignesh also reproduced bad results using just his laptop, but in his
case it did *not* occur every time. As discussed elsewhere, the problem
is timing-related, so sometimes you may be lucky and sometimes not.

~

I expect you are running everything correctly, but if you are using just
a laptop (like Vignesh) then, like him, you might need to try multiple
times before you can hit the problem in your environment.

Anyway, in case there is some other reason you are not seeing the bad
results, I have re-attached the scripts and re-described every step
below.

======
BUILDING

-- NOTE: I have a very minimal configuration without any
optimization/debug flags etc.
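A quick way to verify that before timing anything is pg_config, which echoes back the configure options of the install on PATH. This helper is a sketch added for illustration, not one of the attached scripts:

```shell
# Sketch: confirm the server on PATH was built with the minimal
# configuration (pg_config ships with every PostgreSQL install).
check_build () {
    if command -v pg_config >/dev/null 2>&1; then
        pg_config --version
        pg_config --configure   # expect only --prefix=... for these tests
    else
        echo "pg_config not found on PATH"
    fi
}

check_build
```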
See config.log

$ ./configure --prefix=/home/peter/pg_oss

-- NOTE: Of course, make sure to be running using the correct Postgres:

echo 'set environment variables for OSS work'
export PATH=/home/peter/pg_oss/bin:$PATH

-- NOTE: Be sure to do git stash (or whatever) so you don't accidentally
build a patched version thinking it is the HEAD version.

-- NOTE: Be sure to do a full clean build, and apply (or don't apply)
v26-0001 according to the test you wish to run.

STEPS
1. sudo make clean
2. make
3. sudo make install

======
SCRIPTS & STEPS

SCRIPTS
testrun.sh
do_one_test_setup.sh
do_one_test_PUB.sh
do_one_test_SUB.sh

---

STEPS

Step-1. Edit the testrun.sh
tables=( 100 )
workers=( 2 4 8 16 )
size="0"
prefix="0816headbusy"   <-- edit to differentiate each test run

~

Step-2. Edit the do_one_test_PUB.sh
IF commit_counter = 1000 THEN   <-- edit this if needed. I wanted 1000
inserts/tx, so there was nothing to do.

~

Step-3: Check nothing else is running. If something is, clean it up.
[peter@localhost testing_busy]$ ps -eaf | grep postgres
peter 111924 100103 0 19:31 pts/0 00:00:00 grep --color=auto postgres

~

Step-4: Run the tests
[peter@localhost testing_busy]$ ./testrun.sh
num_tables=100, size=0, num_workers=2, run #1   <-- check the echo
matches the config you set in Step-1
waiting for server to shut down.... done
server stopped
waiting for server to shut down.... done
server stopped
num_tables=100, size=0, num_workers=2, run #2
waiting for server to shut down.... done
server stopped
waiting for server to shut down.... done
server stopped
num_tables=100, size=0, num_workers=2, run #3
...

~

Step-5: Sanity check
When the test completes, the current folder will be full of .log and
.dat* files.
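Those filenames follow the Step-1 settings. As a rough sketch of the sweep that testrun.sh presumably runs (the real script is in the attachments; only the echo/naming convention is shown here, and the run count is reduced):

```shell
# Sketch of the testrun.sh sweep implied by Step-1 (the real script is
# attached upstream; the loop body is a placeholder).
tables=( 100 )
workers=( 2 4 8 16 )
size="0"
prefix="0816headbusy"
runs=3          # the post uses 10 runs per combination

run_sweep () {
    local t w r
    for t in "${tables[@]}"; do
        for w in "${workers[@]}"; do
            for r in $(seq 1 "$runs"); do
                echo "num_tables=$t, size=$size, num_workers=$w, run #$r"
                # each run writes ${prefix}_${t}t_${size}_${w}w_${r}.dat_SUB
            done
        done
    done
}

run_sweep
```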
Check for sanity that no errors happened:
[peter@localhost testing_busy]$ cat *.log | grep ERROR
[peter@localhost testing_busy]$

~

Step-6: Collect the results
The results are output (by do_one_test_SUB.sh) into the *.dat_SUB files.
Use grep to extract them:

[peter@localhost testing_busy]$ cat 0816headbusy_100t_0_2w_*.dat_SUB | grep RESULT | grep -v duration | awk '{print $3}'
11742.019
12157.355
11773.807
11582.981
12220.962
12546.325
12210.713
12614.892
12015.489
13527.05

Repeat the grep for the other files:
$ cat 0816headbusy_100t_0_4w_*.dat_SUB | grep RESULT | grep -v duration | awk '{print $3}'
$ cat 0816headbusy_100t_0_8w_*.dat_SUB | grep RESULT | grep -v duration | awk '{print $3}'
$ cat 0816headbusy_100t_0_16w_*.dat_SUB | grep RESULT | grep -v duration | awk '{print $3}'

~

Step-7: Summarise the results
Now I just cut/paste the results from Step-6 into a spreadsheet and
report the median of the runs. For example, for the above HEAD run, it
was:

           2w     4w     8w    16w
 1      11742   5996   1919   1582
 2      12157   5960   1871   1469
 3      11774   5926   2101   1571
 4      11583   6155   1883   1671
 5      12221   6310   1895   1707
 6      12546   6166   1900   1470
 7      12211   6114   2477   1587
 8      12615   6173   2610   1715
 9      12015   5869   2110   1673
10      13527   5913   2144   1227

Median  12184   6055   2010   1584

~

Step-8: REPEAT
-- repeat all of the above for different size transactions (editing
do_one_test_PUB.sh)
-- repeat all of the above after rebuilding again with HEAD+v26-0001

------
Kind Regards,
Peter Smith.
Fujitsu Australia
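PS - For anyone re-running the above: Steps 6 and 7 can be combined into one pipeline, avoiding the spreadsheet. The sort|awk median below is an illustrative addition, not part of the attached scripts:

```shell
# Sketch: same extraction pipeline as Step-6, plus a sort|awk median,
# so no spreadsheet is needed for Step-7.
median_of_results () {
    # usage: median_of_results <dat_SUB files...>
    cat "$@" | grep RESULT | grep -v duration | awk '{print $3}' |
    sort -n | awk '
        { a[NR] = $1 }
        END {
            if (NR % 2) print a[(NR + 1) / 2]
            else printf "%.3f\n", (a[NR / 2] + a[NR / 2 + 1]) / 2
        }'
}

# e.g. median_of_results 0816headbusy_100t_0_2w_*.dat_SUB
```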