Discussion: Instability in the postgres_fdw regression test

Instability in the postgres_fdw regression test

From:
Tom Lane
Date:
We've been seeing $subject since commit 1bc0100d2, with little clue
as to the cause.  Previous attempts to fix it by preventing autovacuum
from running on the relevant tables didn't seem to help.

I have now managed to reproduce the issue reliably enough to study it.
(It turns out that on longfin's host, running the buildfarm script *under
cron* produces the failure a reasonable percentage of the time.  The
identical test case, launched from an interactive shell, never fails.
I speculate that the kernel scheduler treats cron jobs differently.)
It is in fact a timing issue, and what triggers it is there being an
active autovacuum task in another database.  The fact that the buildfarm
script uses USE_MODULE_DB in the contrib tests greatly increases the
surface area of the problem, since that creates a lot more databases that
autovacuum could be active in.  Once I realized that, I found that it can
trivially be reproduced by hand, in a "make installcheck" test, simply by
having an open transaction in another DB in the cluster: for instance,
run "select pg_sleep(60);" there and then run make installcheck in
postgres_fdw, as sketched below.

So now, drilling down to the specific problem: what we're seeing is that
the plans for some queries change because the "remote" server reports
a different rowcount estimate than usual for

EXPLAIN SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE

Normally the estimate is one, but in the failure cases it's 12 or so.
This estimate is coming out of ineq_histogram_selectivity, of course.
Furthermore, the highest value actually present in the histogram is
1000, because that was the highest value of "C 1" when the test did
ANALYZE up at the top.  That means we'll invoke get_actual_variable_range
to try to identify the actual current maximum.  Most of the time, it
returns 2010, which is correct, and we end up estimating that only
about one row will exceed 2000.
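For reference, the histogram in question can be inspected directly.  A
sketch, run in the test database after its ANALYZE (the string literals
look odd because the test schema uses names containing spaces):

    SELECT histogram_bounds
    FROM pg_stats
    WHERE schemaname = 'S 1' AND tablename = 'T 1' AND attname = 'C 1';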

However, in the failure cases, what's getting extracted from the index
is 9999.  That's because we'd inserted and deleted that value further
up in the test, and if there's been an open transaction holding back
RecentGlobalXmin, then SnapshotNonVacuumable is going to consider that
entry still good.  This value is enough larger than 2000 to move the
selectivity estimate appreciably, and then kaboom.
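The mechanism is easy to demonstrate in isolation.  A self-contained
sketch (the table and constants are made up, but the shape mirrors the
test; run the pg_sleep from the earlier recipe in another database
before the EXPLAIN):

    CREATE TABLE demo (k int PRIMARY KEY);
    INSERT INTO demo SELECT generate_series(1, 1000);
    ANALYZE demo;                             -- histogram tops out at 1000
    INSERT INTO demo SELECT generate_series(1001, 2010);  -- no re-ANALYZE
    INSERT INTO demo VALUES (9999);
    DELETE FROM demo WHERE k = 9999;          -- leaves a dead index entry

    -- With nothing holding back RecentGlobalXmin, this is estimated at
    -- about one row; with an old transaction open elsewhere, the dead
    -- 9999 entry is still visible to SnapshotNonVacuumable and the
    -- estimate comes out an order of magnitude higher.
    EXPLAIN SELECT * FROM demo WHERE k > 2000;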

This theory successfully explains the observed fact that some buildfarm
failures show differences in two query plans, while others show a
difference just in the first one.  In the latter cases, the external
transaction ended in between the two queries, allowing RecentGlobalXmin
to advance before the second plan was generated.

What I propose to do to fix the instability is to change the test
stanza that uses 9999 as a key-chosen-at-random to use something less
far away from the normal range of "C 1" values, so that whether it's
still visible to get_actual_variable_range has less effect on this
selectivity estimate.  That's a hack, for sure, but I don't see any
other fix that's much less of a hack.
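Concretely, it'd look about like this in the test script (illustrative
only: ft2 and its columns are the test's, but 1200 is merely a stand-in
for whatever nearby key gets picked, and the expected-output files would
need the matching change):

    -- before: throwaway key far above the live range of "C 1"
    INSERT INTO ft2 (c1, c2, c3) VALUES (9999, 999, 'foo');
    DELETE FROM ft2 WHERE c1 = 9999;

    -- after: throwaway key just past the live range, so a surviving
    -- dead index entry barely moves get_actual_variable_range's answer
    INSERT INTO ft2 (c1, c2, c3) VALUES (1200, 999, 'foo');
    DELETE FROM ft2 WHERE c1 = 1200;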

            regards, tom lane


Re: Instability in the postgres_fdw regression test

From:
Robert Haas
Date:
On Thu, Apr 12, 2018 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We've been seeing $subject since commit 1bc0100d2, with little clue
> as to the cause.  [...]
>
> What I propose to do to fix the instability is to change the test
> stanza that uses 9999 as a key-chosen-at-random to use something less
> far away from the normal range of "C 1" values, so that whether it's
> still visible to get_actual_variable_range has less effect on this
> selectivity estimate.  That's a hack, for sure, but I don't see any
> other fix that's much less of a hack.

Thanks for the detective work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Instability in the postgres_fdw regression test

From:
Etsuro Fujita
Date:
(2018/04/13 3:49), Robert Haas wrote:
> On Thu, Apr 12, 2018 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We've been seeing $subject since commit 1bc0100d2, with little clue
>> as to the cause.  [...]
>
> Thanks for the detective work.

Thanks a lot!

Best regards,
Etsuro Fujita