Обсуждение: Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

Поиск
Список
Период
Сортировка

Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

От
Alvaro Herrera
Дата:
Moving thread to -hackers, CC'ing Craig.

Michael Paquier wrote:

> hamster complains here:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2016-03-31%2016%3A00%3A06
> [...]
> # Copying slots to replica
> after_basebackup|test_decoding||547|0/5000060|0/5000098
> # Copying slot 'after_basebackup','test_decoding',NULL,'547','0/5000060','0/5000098'
> connection error: 'psql:<stdin>:1: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> psql:<stdin>:1: connection to server was lost'

Ahem.

So, I see this:

DEBUG:  server process (PID 24166) exited with exit code 0
DEBUG:  forked new backend, pid=24168 socket=7
LOG:  statement: SELECT test_slot_timelines_advance_logical_slot('after_basebackup', NULL, '547', '0/5000060',
'0/5000098');
DEBUG:  server process (PID 24168) was terminated by signal 11: Segmentation fault
DETAIL:  Failed process was running: SELECT test_slot_timelines_advance_logical_slot('after_basebackup', NULL, '547',
'0/5000060','0/5000098');
 
LOG:  server process (PID 24168) was terminated by signal 11: Segmentation fault

in pgsql.build/src/test/recovery/tmp_check/log/006_logical_decoding_timelines_replica2.log

Could you have a look at whether you have core dumps from these?  If so,
backtraces would be very useful.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

От
Petr Jelinek
Дата:
On 01/04/16 03:49, Alvaro Herrera wrote:
> Moving thread to -hackers, CC'ing Craig.
>
> Michael Paquier wrote:
>
>> hamster complains here:
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2016-03-31%2016%3A00%3A06
>> [...]
>> # Copying slots to replica
>> after_basebackup|test_decoding||547|0/5000060|0/5000098
>> # Copying slot 'after_basebackup','test_decoding',NULL,'547','0/5000060','0/5000098'
>> connection error: 'psql:<stdin>:1: server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> psql:<stdin>:1: connection to server was lost'
>
> Ahem.
>
> So, I see this:
>
> DEBUG:  server process (PID 24166) exited with exit code 0
> DEBUG:  forked new backend, pid=24168 socket=7
> LOG:  statement: SELECT test_slot_timelines_advance_logical_slot('after_basebackup', NULL, '547', '0/5000060',
'0/5000098');
> DEBUG:  server process (PID 24168) was terminated by signal 11: Segmentation fault
> DETAIL:  Failed process was running: SELECT test_slot_timelines_advance_logical_slot('after_basebackup', NULL, '547',
'0/5000060','0/5000098');
 
> LOG:  server process (PID 24168) was terminated by signal 11: Segmentation fault
>
> in pgsql.build/src/test/recovery/tmp_check/log/006_logical_decoding_timelines_replica2.log
>
> Could you have a look at whether you have core dumps from these?  If so,
> backtraces would be very useful.
>

The function does following:
TransactionId new_xmin = (TransactionId) PG_GETARG_INT64(1);

And we are passing NULL as that parameter, that could explain this.
Also while reading it I wonder if the function should be defined with
xid type rather than bigint and use similar input code as xid.c.

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services



Re: Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

От
Craig Ringer
Дата:
On 1 April 2016 at 11:13, Petr Jelinek <petr@2ndquadrant.com> wrote:
 

The function does following:
TransactionId new_xmin = (TransactionId) PG_GETARG_INT64(1);

This should be reasonable enough though; down-casting it will discard the high bits but that's fine when we know there's nothing interesting there.

TransactionId is a uint32 anyway, but I had a reason for the above. There's no cast from integer to xid, which is why I used bigint here, since we don't have a uint32 native SQL type. What I *should've* done is simply used quoted-literal arguments like

    XID '1234'

or cast via text

    1234::text::xid

so there was no need to use a bigint instead. I'll adjust appropriately so it uses PG_GETARG_TRANSACTIONID(1) and is declared as accepting 'xid'.
 
And we are passing NULL as that parameter, that could explain this.

If we're on an int64-by-value platform that'd be wrong but still work, it'd just be zero. On an int64-by-reference platform it could indeed segfault. So yes, I'd say that's the cause.
 
Also while reading it I wonder if the function should be defined with
xid type rather than bigint and use similar input code as xid.c.

Yes, it should.

I'll prep a follow-up patch. 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

От
Craig Ringer
Дата:
On 1 April 2016 at 12:47, Craig Ringer <craig@2ndquadrant.com> wrote:
 
I'll prep a follow-up patch. 


Done and attached.

Note that I can't use PG_GETARG_TRANSACTIONID directly since it's a macro defined only in xid.c . It didn't seem worth extracting it and moving it to postgres.h (where the other non-ADT-specific PG_GETARG_ macros are) or its own new header just for this, so I've spelled it out each time.

I now remember that that's part of why I used bigint as an argument type. The other part is that txid_current() returns bigint and there's no cast from bigint to xid. So the tests would have to CREATE CAST or cast via 'text'. They now do the latter.

We should probably have a cast from bigint to/from xid, but the type is so little-used that it's not much fuss.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

От
Alvaro Herrera
Дата:
Craig Ringer wrote:

> Note that I can't use PG_GETARG_TRANSACTIONID directly since it's a macro
> defined only in xid.c . It didn't seem worth extracting it and moving it to
> postgres.h (where the other non-ADT-specific PG_GETARG_ macros are) or its
> own new header just for this, so I've spelled it out each time.

Hm, we should probably fix this sometime.  We used to think of Xids are
purely internal objects, but they are becoming more and more visible to
userland.

> I now remember that that's part of why I used bigint as an argument type.
> The other part is that txid_current() returns bigint and there's no cast
> from bigint to xid. So the tests would have to CREATE CAST or cast via
> 'text'. They now do the latter.

I was rather surprised at this -- I thought that it'd break when the Xid
epoch got further than 0 (because casting from a number larger than 2^32
would throw an overflow error, I thought), but as it turns out the cast
from text to XID automatically discards the higher-order bits when a
higher epoch is used.  I imagine this is intentional, but it's not
actually documented in xidin() at all.  Also, I'm not sure what happens
if strtoul() refuses to work on > INT_MAX values ... it's documented to
set errno to ERANGE, so I assume the whole chain simply fails.  Maybe
there's just no platform on which that happens anymore.


With regards to the rest of the patch, I decided not to use your
approach: it seemed a bit odd to accept NULL values there.  I changed
the query to use COALESCE in the value that returned null instead.  With
your change, the function takes a null and interprets it as InvalidXid
anyway, so it seems to work fine.  (I also tested it manually.)

Let's see what hamster has to say about this.

It would be really good to have other buildfarm members run this code.
Currently only Hamster does, and only because Michaël patched the
buildfarm script ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services