Thread: TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals
I saw this just now:

TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(safeXid, snap->xmin))", File: "snapbuild.c", Line: 580)

while running 50 cascading instances on a single machine.

select version():
PostgreSQL 11devel_HEAD_20171219_2158_7d3583ad9ae5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18) 6.3.0 20170516, 64-bit

I don't know if the admittedly somewhat crazy 50 instances make that error acceptable/expected but I thought I'd report it anyway.

thanks,

Erik Rijkers
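For readers unfamiliar with the check named in the TRAP message: transaction IDs are 32-bit counters that wrap around, so PostgreSQL compares them modulo 2^32 rather than with a plain `<=`. The following is a minimal Python sketch of that comparison as I understand it from PostgreSQL's transaction ID helpers. The function name `xid_precedes_or_equals` and the constant name `FIRST_NORMAL_XID` are illustrative names chosen for this sketch, and the sketch says nothing about why the assertion fired in this report.

```python
# Sketch of a wraparound-aware "id1 <= id2" test for 32-bit transaction IDs.
# Names here are illustrative; this is not PostgreSQL source code.

FIRST_NORMAL_XID = 3  # assumption: XIDs below this value are special (invalid, bootstrap, frozen)


def xid_precedes_or_equals(id1: int, id2: int) -> bool:
    """Return True if id1 is logically older than or equal to id2."""
    # Special XIDs are compared as plain unsigned integers.
    if id1 < FIRST_NORMAL_XID or id2 < FIRST_NORMAL_XID:
        return id1 <= id2

    # Take the difference modulo 2^32, then reinterpret it as a signed
    # 32-bit value so that "less than half the XID space apart" decides order.
    diff = (id1 - id2) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return diff <= 0


if __name__ == "__main__":
    print(xid_precedes_or_equals(100, 200))
    print(xid_precedes_or_equals(4294967290, 10))
```

With this comparison, an XID just before the wraparound point still counts as older than a small XID just after it, which a plain integer comparison would get wrong.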
On 2017-12-19 23:35, Erik Rijkers wrote:
> I saw this just now:
>
> TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(safeXid, snap->xmin))", File: "snapbuild.c", Line: 580)
>
> while running 50 cascading instances on a single machine.

Sorry, that was probably too terse, I should explain that a little.

After initing 50 instances, I set up and run a pgbench session in the master session; the pgbench lines are:

init: pgbench --port=6515 --quiet --initialize --scale=1 postgres
run: pgbench -M prepared -c 16 -j 8 -T 1 -P 1 -n postgres -- scale 1

the other instances then catch up. The whole takes 5 minutes or so

I vary scale, duration, and number of instances. I haven't had it fail in this way yet but I mostly tried with lower number of instances (up to 25 or so).

> select version():
> PostgreSQL 11devel_HEAD_20171219_2158_7d3583ad9ae5 on
> x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18) 6.3.0 20170516,
> 64-bit
>
> I don't know if the admittedly somewhat crazy 50 instances make that
> error acceptable/expected but I thought I'd report it anyway.
>
> thanks,
>
> Erik Rijkers
On Wed, Dec 20, 2017 at 7:46 AM, Erik Rijkers <er@xs4all.nl> wrote:
> Sorry, that was probably too terse, I should explain that a little.
>
> After initing 50 instances, I set up and run a pgbench session in the master
> session; the pgbench lines are:
>
> init: pgbench --port=6515 --quiet --initialize --scale=1 postgres
> run: pgbench -M prepared -c 16 -j 8 -T 1 -P 1 -n postgres -- scale 1
>
> the other instances then catch up. The whole takes 5 minutes or so
>
> I vary scale, duration, and number of instances. I haven't had it fail in
> this way yet but I mostly tried with lower number of instances (up to 25 or
> so).

Hm. Are you saying that it takes at least 50 cascading instances to see the problem you are seeing? And that you are not seeing any problems with a lower number of cascading instances? Are you enabling hot_standby_feedback?

--
Michael
On 2017-12-20 06:27, Michael Paquier wrote:
> On Wed, Dec 20, 2017 at 7:46 AM, Erik Rijkers <er@xs4all.nl> wrote:

TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(safeXid, snap->xmin))", File: "snapbuild.c", Line: 580)

>> Sorry, that was probably too terse, I should explain that a little.
>>
>> After initing 50 instances, I set up and run a pgbench session in the
>> master session; the pgbench lines are:
>>
>> init: pgbench --port=6515 --quiet --initialize --scale=1 postgres
>> run: pgbench -M prepared -c 16 -j 8 -T 1 -P 1 -n postgres -- scale 1
>>
>> the other instances then catch up. The whole takes 5 minutes or so
>>
>> I vary scale, duration, and number of instances. I haven't had it
>> fail in this way yet but I mostly tried with lower number of instances
>> (up to 25 or so).
>
> Hm. Are you saying that it takes at least 50 cascading instances to
> see the problem you are seeing? And that you are not seeing any
> problems with a lower number of cascading instances? Are you enabling
> hot_standby_feedback?

That sounds more definitive than I meant it, but yes, only now that I tried a higher number of instances did I see this. But it also often succeeds at up to 100 instances (100 is the highest I have tried).

These 50 instances were a logical replication chain, and hot_standby_feedback is off.

Overnight I ran the test that failed yesterday 80 times: this time all 80 runs succeeded. I am not sure what makes it fail rather than succeed.

(logical replication does the initial syncing of the instances one by one (sequentially) so it isn't as busy as expected; it just takes a long time)

I wrote a simple perl program to test logical replication (attached, FWIW), running:

./cascade.pl --instances=50 --scale=1 --clients=16 --threads=8 --duration=1 --repeats=3 --waiting=10

This cascade.pl program uses knowledge of my setup so probably won't run elsewhere as is but it shows how the failing test was done.

Erik
On Wed, Dec 20, 2017 at 3:33 PM, Erik Rijkers <er@xs4all.nl> wrote:
> (logical replication does the initial syncing of the instances one by one
> (sequentially) so it isn't as busy as expected; it just takes a long time)

This is quite different then. I thought that you meant physical replication with a set of cascading standbys!

> I wrote a simple perl program to test logical replication (attached, FWIW),
> running:
>
> ./cascade.pl --instances=50 --scale=1 --clients=16 --threads=8 --duration=1
> --repeats=3 --waiting=10
>
> This cascade.pl program uses knowledge of my setup so probably won't run
> elsewhere as is but it shows how the failing test was done.

I can get that to work easily at quick glance in my environment. Likely that will help.

--
Michael
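The logical replication chain described in this thread can be pictured as each instance subscribing to a publication on the instance before it. The Python sketch below only generates the SQL statements such a chain might use; it is not the attached cascade.pl. Everything except the first port (6515) and the database name (postgres), both taken from the pgbench lines above, is an assumption: consecutive port numbers, the `pubN`/`subN` names, and the choice to publish the four standard pgbench tables.

```python
# Sketch: generate per-instance SQL for a cascading logical replication chain.
# Assumptions (not from the thread): instances listen on consecutive ports,
# objects are named pub<N>/sub<N>, and the pgbench tables already exist on
# every instance before the subscriptions are created.

PGBENCH_TABLES = [
    "pgbench_accounts",
    "pgbench_branches",
    "pgbench_tellers",
    "pgbench_history",
]


def cascade_sql(instances, base_port=6515, dbname="postgres"):
    """Return a list of (port, [sql statements]) pairs, one per instance."""
    plan = []
    for i in range(instances):
        port = base_port + i
        statements = []
        # Every instance except the last publishes to the next one in the chain.
        if i < instances - 1:
            tables = ", ".join(PGBENCH_TABLES)
            statements.append(f"CREATE PUBLICATION pub{i} FOR TABLE {tables};")
        # Every instance except the first subscribes to its predecessor.
        if i > 0:
            statements.append(
                f"CREATE SUBSCRIPTION sub{i} "
                f"CONNECTION 'port={port - 1} dbname={dbname}' "
                f"PUBLICATION pub{i - 1};"
            )
        plan.append((port, statements))
    return plan


if __name__ == "__main__":
    for port, statements in cascade_sql(3):
        print(f"-- instance on port {port}")
        for sql in statements:
            print(sql)
```

Each statement list would be run against its own instance, for example with psql on the matching port; pgbench then runs only against the first instance and the others catch up through the chain.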