Обсуждение: Killing a data modifying transaction

Поиск
Список
Период
Сортировка

Killing a data modifying transaction

От
William Temperley
Дата:
Hi All,

I've got two transactions I tried to kill 3 days ago using "select
pg_cancel_backend(<pid>)", then SIGTERM, and have since then been
using 100% of a cpu core each. They were supposed to insert the
results of large unions with PostGIS and appear to have failed.
Could someone tell me what's the least worst option here please? If I
kill -9 will I corrupt my data directory?  (I've searched for an
answer in the FAQ and the wiki, to no avail). I'm on Ubuntu server
8.04.1 and Postgres 8.3.7.

Thanks and regards,

Will Temperley

Re: Killing a data modifying transaction

От
Tom Lane
Дата:
William Temperley <willtemperley@gmail.com> writes:
> I've got two transactions I tried to kill 3 days ago using "select
> pg_cancel_backend(<pid>)", then SIGTERM, and have since then been
> using 100% of a cpu core each. They were supposed to insert the
> results of large unions with PostGIS and appear to have failed.
> Could someone tell me what's the least worst option here please? If I
> kill -9 will I corrupt my data directory?

No, you'll just lose all your open sessions.

It might be worth trying to identify where they're looping before
you zap them, though.  A stack trace from gdb would help.

            regards, tom lane

Re: Killing a data modifying transaction

От
William Temperley
Дата:
2009/6/22 Tom Lane <tgl@sss.pgh.pa.us>:
> William Temperley <willtemperley@gmail.com> writes:
>> I've got two transactions I tried to kill 3 days ago using "select
>> pg_cancel_backend(<pid>)", then SIGTERM, and have since then been
>> using 100% of a cpu core each. They were supposed to insert the
>> results of large unions with PostGIS and appear to have failed.
>> Could someone tell me what's the least worst option here please? If I
>> kill -9 will I corrupt my data directory?
>
> No, you'll just lose all your open sessions.
>
> It might be worth trying to identify where they're looping before
> you zap them, though.  A stack trace from gdb would help.
>
>                        regards, tom lane
>

Thanks Tom.

I'm wondering if I happened as I'd started the same query twice.
The first had work_mem = 1MB so I tried to kill it and started another
with work_mem = 1000MB, but both were attempting to insert the same id
into a PK:
"insert into world (geom, id) select st_union(geom), 1 from adminunit
where admin_level = '0'".
Just now when I killed the first process, the other terminated.

I'll run the query again and see if it wasn't just my impatience that
caused it - and post the stack trace if it fails.

Thanks,

Will Temperley.

Re: Killing a data modifying transaction

От
Tom Lane
Дата:
William Temperley <willtemperley@gmail.com> writes:
> I'm wondering if I happened as I'd started the same query twice.
> The first had work_mem = 1MB so I tried to kill it and started another
> with work_mem = 1000MB, but both were attempting to insert the same id
> into a PK:
> "insert into world (geom, id) select st_union(geom), 1 from adminunit
> where admin_level = '0'".
> Just now when I killed the first process, the other terminated.

Well, that last is expected --- as soon as you kill -9 one backend, the
postmaster is going to force-quit all the others and perform a database
restart.  So we don't really know anything more than before.

Given that they'd both be trying to insert the same PK values, it'd be
unsurprising for one of the processes to be blocked waiting to see if
the other one commits.  But didn't you say they were both eating CPU?

I'm personally inclined to think some PostGIS oddity here (which means
you might get more help asking about it on the postgis lists).  But
that's mere speculation.  A stack trace showing where it was looping
would've provided something more to go on ...

            regards, tom lane

Re: Killing a data modifying transaction

От
William Temperley
Дата:
2009/6/22 Tom Lane <tgl@sss.pgh.pa.us>:
> William Temperley <willtemperley@gmail.com> writes:
>> I'm wondering if I happened as I'd started the same query twice.
>> The first had work_mem = 1MB so I tried to kill it and started another
>> with work_mem = 1000MB, but both were attempting to insert the same id
>> into a PK:
>> "insert into world (geom, id) select st_union(geom), 1 from adminunit
>> where admin_level = '0'".
>> Just now when I killed the first process, the other terminated.
>
> Well, that last is expected --- as soon as you kill -9 one backend, the
> postmaster is going to force-quit all the others and perform a database
> restart.  So we don't really know anything more than before.
>
> Given that they'd both be trying to insert the same PK values, it'd be
> unsurprising for one of the processes to be blocked waiting to see if
> the other one commits.  But didn't you say they were both eating CPU?
>
> I'm personally inclined to think some PostGIS oddity here (which means
> you might get more help asking about it on the postgis lists).  But
> that's mere speculation.  A stack trace showing where it was looping
> would've provided something more to go on ...
>

Yes they were both eating CPU. I've tried the query again and it seems
to be stuck again - the trace has been the same for an hour or so now.
I guess I'd best post to the PostGIS list.

#0  0x00007fae68ec6a75 in
geos::DefaultCoordinateSequence::~DefaultCoordinateSequence () from
/usr/lib/libgeos.so.2
#1  0x00007fae68ed433e in geos::LineString::~LineString () from
/usr/lib/libgeos.so.2
#2  0x00007fae68ed23c7 in geos::LinearRing::~LinearRing () from
/usr/lib/libgeos.so.2
#3  0x00007fae68f20e68 in geos::PolygonBuilder::findEdgeRingContaining
() from /usr/lib/libgeos.so.2
#4  0x00007fae68f21740 in geos::PolygonBuilder::placeFreeHoles () from
/usr/lib/libgeos.so.2
#5  0x00007fae68f218d6 in geos::PolygonBuilder::add () from
/usr/lib/libgeos.so.2
#6  0x00007fae68f21a73 in geos::PolygonBuilder::add () from
/usr/lib/libgeos.so.2
#7  0x00007fae68f20204 in geos::OverlayOp::computeOverlay () from
/usr/lib/libgeos.so.2
#8  0x00007fae68f203e9 in geos::OverlayOp::getResultGeometry () from
/usr/lib/libgeos.so.2
#9  0x00007fae68f206bb in geos::OverlayOp::overlayOp () from
/usr/lib/libgeos.so.2
#10 0x00007fae68ecbfcc in geos::Geometry::Union () from /usr/lib/libgeos.so.2
#11 0x00007fae693c323c in GEOSUnion () from /usr/lib/libgeos_c.so.1
#12 0x00007fae695f3b4a in unite_garray () from
/usr/lib/postgresql/8.3/lib/liblwgeom.so
#13 0x00000000005387bb in ?? ()
#14 0x0000000000538aa7 in ExecAgg ()
#15 0x000000000052e08d in ExecProcNode ()
#16 0x0000000000542b60 in ?? ()
#17 0x0000000000534916 in ExecScan ()
#18 0x000000000052e11d in ExecProcNode ()
#19 0x000000000052d1e2 in ExecutorRun ()
#20 0x00000000005c73c2 in ?? ()
#21 0x00000000005c75d5 in ?? ()
#22 0x00000000005c7e74 in PortalRun ()
#23 0x00000000005c394a in ?? ()
#24 0x00000000005c47bc in PostgresMain ()
#25 0x0000000000599424 in ?? ()
#26 0x000000000059a1a1 in PostmasterMain ()
#27 0x000000000055123e in main ()

Best regards,

Will