Re: pgsql: Support partition pruning at execution time

From David Rowley
Subject Re: pgsql: Support partition pruning at execution time
Msg-id CAKJS1f8o2Yd=rOP=Et3A0FWgF+gSAOkFSU6eNhnGzTPV7nN8sQ@mail.gmail.com
In response to Re: pgsql: Support partition pruning at execution time  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: pgsql: Support partition pruning at execution time  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-committers
On 11 April 2018 at 18:58, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 10 April 2018 at 08:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>>> David Rowley wrote:
>>>> Okay, I've written and attached a fix for this.  I'm not 100% certain
>>>> that this is the cause of the problem on pademelon, but the code does
>>>> look wrong, so needs to be fixed. Hopefully, it'll make pademelon
>>>> happy, if not I'll think a bit harder about what might be causing that
>>>> instability.
>>
>>> Pushed it just now.  Let's see what happens with pademelon now.
>>
>> I've had pademelon's host running a "make installcheck" loop all day
>> trying to reproduce the problem.  I haven't gotten a bite yet (although
>> at 15+ minutes per cycle, this isn't a huge number of tests).  I think
>> we were remarkably (un)lucky to see the problem so quickly after the
>> initial commit, and I'm afraid pademelon isn't going to help us prove
>> much about whether this was the same issue.
>>
>> This does remind me quite a bit though of the ongoing saga with the
>> postgres_fdw test instability.  Given the frequency with which that's
>> failing in the buildfarm, you would not think it's impossible to
>> reproduce outside the buildfarm, and yet I'm here to tell you that
>> it's pretty damn hard.  I haven't succeeded yet, and that's not for
>> lack of trying.  Could there be something about the buildfarm
>> environment that makes these sorts of things more likely?
>
> coypu just demonstrated that this was not the cause of the problem [1]
>
> I'll study the code a bit more and see if I can think why this might
> be happening.
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=coypu&dt=2018-04-11%2004%3A17%3A38&stg=install-check-C

I've spent a bit of time tonight trying to dig into this problem to
see if I can figure out what's going on.

I ended up running the following script on both a Linux x86_64 machine
and also a power8 machine.

#!/bin/bash
for x in {1..1000}
do
    echo "$x";
    for i in {1..1000}
    do
        psql -d postgres -f test.sql -o test.out
        diff -u test.out test.expect
    done
done

I was unable to recreate this problem after about 700k loops on the
Linux machine and 130k loops on the power8.
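
For anyone repeating this kind of run, here is a minimal sketch of a
variant that stops at the first mismatch and keeps the failing output
instead of overwriting it on the next iteration. It is not the script
that produced the numbers above; the function name run_until_mismatch
and the mismatch.N.out file names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): rerun a command
# until its stdout differs from an expected file, then keep the failing
# output for inspection.
# Usage: run_until_mismatch MAX_RUNS EXPECTED_FILE COMMAND [ARGS...]
run_until_mismatch() {
    local max_runs=$1 expected=$2
    shift 2
    local out i=1
    out=$(mktemp) || return 2
    while [ "$i" -le "$max_runs" ]; do
        # Capture only stdout, as psql -o does for query output.
        "$@" > "$out"
        if ! diff -u "$out" "$expected" > /dev/null; then
            # Preserve the mismatching output before anything overwrites it.
            cp "$out" "mismatch.$i.out"
            rm -f "$out"
            echo "mismatch on run $i, saved to mismatch.$i.out"
            return 1
        fi
        i=$((i + 1))
    done
    rm -f "$out"
    echo "no mismatch in $max_runs runs"
    return 0
}
```

Assuming the same test.sql and test.expect files as above, it could be
invoked as:
run_until_mismatch 1000000 test.expect psql -d postgres -f test.sql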

I've emailed the owner of coypu to ask if it would be possible to get
access to the machine, or have him run the script to see if it does
actually fail. Currently waiting to hear back.

I've made another pass over the nodeAppend.c code and I'm unable to
see what might cause this, although I did discover a bug where
first_partial_plan is set without taking into account that some
subplans may have been pruned away during executor init. The only
effect I think this would have is that parallel workers would not
properly help out with some partial plans if some earlier subplans
were pruned. I can see no reason for this to have caused this
particular issue, since first_partial_plan would be 0 with and
without the attached fix.

Tom, would there be any chance you could run the above script for a
while on pademelon to see if it can in fact reproduce the problem?
coypu did show this problem in the install check, so I don't think it
will need the other concurrent tests to fail.  If you can recreate it,
then after adjusting the expected output, does the problem still exist
in 5c0675215?

I also checked which other tests perform an EXPLAIN ANALYZE of a plan
with a Parallel Append, and I see there are none. So I've not ruled
out that this is an existing bug. git grep "explain.*analyze" does not
show much outside of the partition_prune tests either.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
