Discussion: SQL workflow for crash testing correctness

SQL workflow for crash testing correctness

From: Joseph Hammerman
Date:
Good evening PGSQL admin email distribution list,

I have built an HA cluster setup. I would like to instrument a workflow to test for lost or duplicated writes.

Does anyone know of prior art that does this?

Does anyone have thoughts on how to model this? My initial thoughts were to find the serialization tests in the Postgres project core.

Thanks in advance for any assistance anyone can provide,
Joseph Hammerman

Re: SQL workflow for crash testing correctness

From: Luca Ferrari
Date:
On Wed, Sep 18, 2019 at 3:27 AM Joseph Hammerman
<jhammerman@squarespace.com> wrote:
> Does anyone have thoughts on how to model this? My initial thoughts were to find the serialization tests in the Postgres project core.
 

It's quite hard to make a suggestion without knowing what you are trying to achieve.
I would, however, look for inspiration in the test suite for
PostgreSQL-XC, if available.

Hope it helps.
Luca



Re: SQL workflow for crash testing correctness

From: Joseph Hammerman
Date:
Good afternoon Luca,

Thanks for the response.

My goal is to maintain a write and read stream during a crash and then generate a report showing that there are no lost or duplicated writes, thereby provably keeping my HA-CP promises as I evolve my platform.
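A minimal sketch of such a report, assuming each write carries a client-generated unique ID, the client records which IDs it attempted and which had their COMMIT acknowledged, and the IDs are read back from the new primary after failover (for example with `SELECT write_id FROM writes`, where `writes` and `write_id` are hypothetical names and the column deliberately has no unique constraint so that duplicates stay visible). Writes whose commit was in flight when the crash hit may legitimately be present or absent, so they are reported separately instead of being counted as errors.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class WriteReport:
    """Result of comparing the client's write log with what the cluster returned."""
    lost: set = field(default_factory=set)                # acknowledged, but missing after recovery
    duplicated: dict = field(default_factory=dict)        # write_id -> copies found (> 1)
    phantom: set = field(default_factory=set)             # present, but never attempted by the client
    in_doubt_committed: set = field(default_factory=set)  # unacknowledged, yet present (allowed)

    @property
    def ok(self) -> bool:
        # In-doubt writes that survived are acceptable; everything else is a violation.
        return not (self.lost or self.duplicated or self.phantom)


def check_writes(attempted, acknowledged, observed):
    """
    attempted:    every write_id the client sent, including ones in flight at crash time
    acknowledged: write_ids whose COMMIT returned successfully to the client
    observed:     write_ids read back after recovery, one entry per row
                  (hypothetical query: SELECT write_id FROM writes)
    """
    attempted = set(attempted)
    acknowledged = set(acknowledged)
    counts = Counter(observed)
    present = set(counts)

    return WriteReport(
        lost=acknowledged - present,
        duplicated={wid: n for wid, n in counts.items() if n > 1},
        phantom=present - attempted,
        in_doubt_committed=(present & attempted) - acknowledged,
    )


if __name__ == "__main__":
    # Write 4 was in flight when the primary died; write 2 was acknowledged but vanished.
    report = check_writes(attempted=[1, 2, 3, 4], acknowledged=[1, 2, 3], observed=[1, 3, 3, 4])
    print(report.lost)                # {2}
    print(report.duplicated)          # {3: 2}
    print(report.in_doubt_committed)  # {4}
    print(report.ok)                  # False
```

Keeping the in-doubt set separate matters for an HA setup: after a failover the client cannot know whether its last COMMIT reached the old primary before it died, so treating those writes as "lost" or "extra" would produce false alarms.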

I would then like to have a crash test suite that instruments partial and full network partitions in addition to process and machine crashes.

I'll have a look at that project's code, thank you! Please let me know if you have any other thoughts or links or anything of that nature.

Regards,
Joe Hammerman


Re: SQL workflow for crash testing correctness

From: Jeff Janes
Date:
On Tue, Sep 17, 2019 at 9:27 PM Joseph Hammerman <jhammerman@squarespace.com> wrote:
> Good evening PGSQL admin email distribution list,
>
> I have built an HA cluster setup. I would like to instrument a workflow to test for lost or duplicated writes.
>
> Does anyone know of prior art that does this?


I have a testing framework which injects faults under high load and then checks that automatic recovery happens correctly.  I have used it to find several bugs, but haven't turned up any in the last couple of releases (likely because improved regression tests now catch them before I get a chance to).  I've always just tested this as crash recovery within a single instance, but I think there is no reason the technique couldn't be used for multiple instances as well.  You can search for my name and "count.pl" on the hackers list to find multiple examples of the testing harness.  The nature of the fault injected (torn page writes) is just a function of what I was working on at the time I wrote it; most of the bugs uncovered had nothing to do with the exact thing which caused the crash.
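The core consistency check in a counter-based harness of the kind described above can be sketched as follows. This is not count.pl itself; it is a hypothetical Python illustration of the general technique, assuming the client keeps its own tally of increments it made to each row of a counter table, split into increments whose COMMIT was acknowledged and increments that were sent but still unacknowledged when the crash hit. After recovery, each row's count must lie between the acknowledged tally and the acknowledged tally plus the in-flight increments.

```python
def check_counters(confirmed, in_flight, db_counts):
    """
    confirmed: key -> increments whose COMMIT was acknowledged to the client
    in_flight: key -> increments sent but not acknowledged when the crash hit
    db_counts: key -> counter value read back after recovery
               (hypothetical query against a hypothetical counter table)

    Returns sorted (key, lowest_allowed, highest_allowed, actual) tuples
    for every key whose recovered count is out of bounds.
    """
    problems = []
    for key in set(confirmed) | set(in_flight) | set(db_counts):
        low = confirmed.get(key, 0)               # acknowledged work must survive
        high = low + in_flight.get(key, 0)        # in-doubt work may or may not survive
        actual = db_counts.get(key, 0)
        if not low <= actual <= high:
            problems.append((key, low, high, actual))
    return sorted(problems)


if __name__ == "__main__":
    confirmed = {1: 5, 2: 3}
    in_flight = {2: 1}  # one increment of key 2 was in flight at crash time
    print(check_counters(confirmed, in_flight, {1: 5, 2: 4}))  # [] (in-doubt commit survived)
    print(check_counters(confirmed, in_flight, {1: 4, 2: 3}))  # [(1, 5, 5, 4)] (lost write)
```

A count below the lower bound indicates a lost write; a count above the upper bound indicates a duplicated or phantom increment, so one range check covers both failure modes the original question asks about.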

 
> Does anyone have thoughts on how to model this? My initial thoughts were to find the serialization tests in the Postgres project core.

Looking at the core regression tests may also be a good idea.  Of course, you would then have to ask yourself: if you test the same way they do, will you find bugs different from the ones they find?  So I would view them more as inspiration than as instructions.
 
Cheers,

Jeff

Re: SQL workflow for crash testing correctness

From: Joseph Hammerman
Date:
Thanks, Jeff!
