Discussion: Segmentation fault when max_parallel_degree is very high
When the parallel degree is set very high, say 70000, there is a segmentation fault in the parallel code, because a type cast is missing.
Take a look at the test case below:
create table abd(n int) with (parallel_degree=70000);
insert into abd values (generate_series(1,1000000)); analyze abd; vacuum abd;
set max_parallel_degree=70000;
explain analyze verbose select * from abd where n<=1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: LOG: server process (PID 41906) was terminated by signal 11: Segmentation fault
DETAIL: Failed process was running: explain analyze verbose select * from abd where n<=1;
This is crashing because of this loop in the ExecParallelSetupTupleQueues function:

for (i = 0; i < pcxt->nworkers; ++i)
{
    ...
    mq = shm_mq_create(tqueuespace + i * PARALLEL_TUPLE_QUEUE_SIZE,
                       (Size) PARALLEL_TUPLE_QUEUE_SIZE);
    ...
}

Here i is an int, but the pointer offset needs to be computed as a Size. When the worker index goes beyond 32767, i * 65536 overflows the int range, so the code accesses illegal memory and crashes or corrupts memory. We need to cast the multiplication, i * PARALLEL_TUPLE_QUEUE_SIZE --> (Size) i * PARALLEL_TUPLE_QUEUE_SIZE, and this will fix it.
The attached patch fixes this issue. Apart from this spot, I have also added casts at the other places where they are needed; those casts also fix another issue (ERROR: requested shared memory size overflows size_t) described in the mail thread below.
--
Attachments
Dilip Kumar <dilipbalaut@gmail.com> writes:
> When parallel degree is set to very high say 70000, there is a segmentation
> fault in parallel code,
> and that is because type casting is missing in the code..

I'd say the cause is not having a sane range limit on the GUC.

> or corrupt some memory. Need to typecast
> i * PARALLEL_TUPLE_QUEUE_SIZE --> (Size) i * PARALLEL_TUPLE_QUEUE_SIZE and
> this will fix

That might "fix" it on 64-bit machines, but not 32-bit.

			regards, tom lane
On Wed, May 4, 2016 at 8:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Dilip Kumar <dilipbalaut@gmail.com> writes:
> > When parallel degree is set to very high say 70000, there is a segmentation
> > fault in parallel code,
> > and that is because type casting is missing in the code..
>
> I'd say the cause is not having a sane range limit on the GUC.
I think it might not be advisable to have this value more than the number of CPU cores, so how about limiting it to 512 or 1024?
On Wed, May 4, 2016 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dilip Kumar <dilipbalaut@gmail.com> writes:
>> When parallel degree is set to very high say 70000, there is a segmentation
>> fault in parallel code,
>> and that is because type casting is missing in the code..
>
> I'd say the cause is not having a sane range limit on the GUC.
>
>> or corrupt some memory. Need to typecast
>> i * PARALLEL_TUPLE_QUEUE_SIZE --> (Size) i * PARALLEL_TUPLE_QUEUE_SIZE and
>> this will fix
>
> That might "fix" it on 64-bit machines, but not 32-bit.

Yeah, I think what we should do here is use mul_size(), which will error out instead of crashing. Putting a range limit on the GUC is a good idea, too, but I like having overflow checks built into these code paths as a backstop, in case a value that we think is a safe upper limit turns out to be less safe than we think ... especially on 32-bit platforms. I'll go do that, and also limit the maximum parallel degree to 1024, which ought to be enough for anyone (see what I did there?).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company