SerializeParamList vs machines with strict alignment

From: Tom Lane
Subject: SerializeParamList vs machines with strict alignment
Msg-id: 11629.1536550032@sss.pgh.pa.us
Replies: Re: SerializeParamList vs machines with strict alignment  (Amit Kapila <amit.kapila16@gmail.com>)
Re: SerializeParamList vs machines with strict alignment  (Lou Picciano <LouPicciano@comcast.net>)
List: pgsql-hackers

I wondered why buildfarm member chipmunk has been failing hard
for the last little while.  Fortunately, it's supplying us with
a handy backtrace:

Program terminated with signal 7, Bus error.
#0  EA_flatten_into (allocated_size=<optimized out>, result=0xb55ff30e, eohptr=0x188f440) at array_expanded.c:329
329        aresult->dataoffset = dataoffset;
#0  EA_flatten_into (allocated_size=<optimized out>, result=0xb55ff30e, eohptr=0x188f440) at array_expanded.c:329
#1  EA_flatten_into (eohptr=0x188f440, result=0xb55ff30e, allocated_size=<optimized out>) at array_expanded.c:293
#2  0x003c3dfc in EOH_flatten_into (eohptr=<optimized out>, result=<optimized out>, allocated_size=<optimized out>) at expandeddatum.c:84
#3  0x003c076c in datumSerialize (value=3934060, isnull=<optimized out>, typByVal=<optimized out>, typLen=<optimized out>, start_address=0xbea3bd54) at datum.c:341
#4  0x002a8510 in SerializeParamList (paramLI=0x1889f18, start_address=0xbea3bd54) at params.c:195
#5  0x002342cc in ExecInitParallelPlan (planstate=0xffffffff, estate=0x18863e0, sendParams=0x46e, nworkers=1, tuples_needed=-1) at execParallel.c:700
#6  0x002461dc in ExecGather (pstate=0x18864f0) at nodeGather.c:151
#7  0x00236b20 in ExecProcNodeFirst (node=0x18864f0) at execProcnode.c:445
#8  0x0022fc2c in ExecProcNode (node=0x18864f0) at ../../../src/include/executor/executor.h:237
#9  ExecutePlan (execute_once=<optimized out>, dest=0x188a108, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x18864f0, estate=0x18863e0) at execMain.c:1721
#10 standard_ExecutorRun (queryDesc=0x188a138, direction=<optimized out>, count=0, execute_once=true) at execMain.c:362
#11 0x0023d630 in postquel_getnext (fcache=0x1888408, es=0x1889d68) at functions.c:867
#12 fmgr_sql (fcinfo=0x701c7c) at functions.c:1164

This is remarkably hard to replicate on other machines, but I eventually
managed to duplicate it on gaur's host, after which it became really
obvious that the parallel-query data transfer logic has never been
stressed very hard on machines with strict data alignment rules.

In particular, SerializeParamList does this:

        /* Write flags. */
        memcpy(*start_address, &prm->pflags, sizeof(uint16));
        *start_address += sizeof(uint16);

immediately followed by this:

        datumSerialize(prm->value, prm->isnull, typByVal, typLen,
                       start_address);

and datumSerialize might do this:

            EOH_flatten_into(eoh, (void *) *start_address, header);

Now, I will plead mea culpa that the expanded-object API doesn't
say in large red letters that the target address for EOH_flatten_into
is supposed to be maxaligned.  It only says

 * The flattened representation must be a valid in-line, non-compressed,
 * 4-byte-header varlena object.

Still, one might reasonably suspect from that that *at least* 4-byte
alignment is expected.  This code path isn't providing such alignment,
and machines that require it will crash.  The only reason we've not
noticed, AFAICS, is that nobody has been running with
force_parallel_mode = regress on alignment-picky hardware.

            regards, tom lane

