Обсуждение: BUG #14237: Terrible performance after accidentally running 'drop index' for index still being created

Поиск
Список
Период
Сортировка

BUG #14237: Terrible performance after accidentally running 'drop index' for index still being created

От
dwaller@microsoft.com
Дата:
VGhlIGZvbGxvd2luZyBidWcgaGFzIGJlZW4gbG9nZ2VkIG9uIHRoZSB3ZWJz
aXRlOgoKQnVnIHJlZmVyZW5jZTogICAgICAxNDIzNwpMb2dnZWQgYnk6ICAg
ICAgICAgIERhdmlkIFdhbGxlcgpFbWFpbCBhZGRyZXNzOiAgICAgIGR3YWxs
ZXJAbWljcm9zb2Z0LmNvbQpQb3N0Z3JlU1FMIHZlcnNpb246IDkuMy43Ck9w
ZXJhdGluZyBzeXN0ZW06ICAgTGludXggKFVidW50dSkKRGVzY3JpcHRpb246
ICAgICAgICAKClN1bW1hcnk6IFdoaWxlIHJ1bm5pbmcg4oCYY3JlYXRlIGlu
ZGV4IGNvbmN1cnJlbnRseeKAmSBvbiBhIHZlcnkgbGFyZ2UgdGFibGUsCnJ1
bm5pbmcg4oCYZHJvcCBpbmRleOKAmSBmb3IgdGhlIHNhbWUgaW5kZXggY2F1
c2VkIFBvc3RncmVzIHRvIHBlcmZvcm0gdGVycmlibHkKYmFkbHksIHVudGls
IHRoZSDigJhjcmVhdGUgaW5kZXjigJkgc2VydmVyIHByb2Nlc3Mgd2FzIGtp
bGxlZC4gIEkgd291bGQgZXhwZWN0CnRoYXQgdGhlIOKAmGRyb3AgaW5kZXji
gJkgd291bGQgZWl0aGVyIGZhaWwgaW1tZWRpYXRlbHksIG9yIHdhaXQsIHdp
dGhvdXQKcGVyZm9ybWFuY2UgaW1wYWN0LCB1bnRpbCB0aGUg4oCYY3JlYXRl
IGluZGV4IGNvbmN1cnJlbnRseeKAmSBoYWQgY29tcGxldGVkLg0KDQpJbiBk
ZXRhaWw6DQoNCkkgc3RhcnRlZCBhZGRpbmcgYW4gaW5kZXggdG8gYSBsYXJn
ZSB0YWJsZSB1c2luZyAnY3JlYXRlIGluZGV4IGNvbmN1cnJlbnRseQouLi4n
ICAoSSB3YXMgcnVubmluZyB0aGUgY29tbWFuZCB1c2luZyBhIFJhaWxzIG1p
Z3JhdGlvbiwgY29ubmVjdGluZyB0bwpQb3N0Z3JlcyByZW1vdGVseSwgdmlh
IFBnQm91bmNlci4pICBJdCBzZWVtcyB0aGF0IHNvbWV0aGluZyBmYWlsZWQg
aW4gdGhlCmNvbm5lY3Rpb24gdG8gUG9zdGdyZXMgYXMgbXkgbWlncmF0aW9u
IHRlcm1pbmF0ZWQgd2l0aCBhbiBlcnJvciBhZnRlciBhYm91dAoxLjUgaG91
cnMgKGZvciBjb21wYXJpc29uIGNyZWF0aW5nIHRoaXMgaW5kZXggc3VjY2Vz
c2Z1bGx5IHRvb2sgYWJvdXQgMy41CmhvdXJzKS4NCg0KXGQgdGFibGVfbmFt
ZSBzaG93ZWQgdGhlIGluZGV4IG1hcmtlZCBhcyBpbnZhbGlkIHNvIEkgcmFu
ICdkcm9wIGluZGV4JywKaW50ZW5kaW5nIHRvIHJ1biB0aGUgaW5kZXggY3Jl
YXRpb24gYWdhaW4uIFRoZSBhdmVyYWdlIHF1ZXJ5IGR1cmF0aW9uCmFnYWlu
c3QgdGhpcyBkYXRhYmFzZSBpbW1lZGlhdGVseSBpbmNyZWFzZWQgZnJvbSB+
MW1zIHRvIH4yNTBtcywgYW5kIHN0YXllZAppbiB0aGUgcmFuZ2UgNTDigJMz
NTBtcyBmb3IgdGhlIG5leHQgdGVuIG1pbnV0ZXMsIHVudGlsIHNvbWVvbmUg
c3BvdHRlZCB0aGF0CnRoZXJlIHdhcyBzdGlsbCBhIHByb2Nlc3Mgb24gdGhl
IHNlcnZlciB0aGF0IHdhcyBydW5uaW5nIOKAmGNyZWF0ZSBpbmRleOKAmS4g
CktpbGxpbmcgdGhhdCBwcm9jZXNzIGNhdXNlZCBwZXJmb3JtYW5jZSB0byBy
ZXR1cm4gdG8gbm9ybWFsLg0KDQpUaGlzIHdhcyByYXRoZXIgYmFkIGZvciB0
aGUgYXZhaWxhYmlsaXR5IG9mIG91ciBhcHBsaWNhdGlvbiBkdXJpbmcgdGhv
c2UgdGVuCm1pbnV0ZXMhDQoNCkkgd291bGQgZXhwZWN0IHRoYXQgcnVubmlu
ZyDigJhkcm9wIGluZGV44oCZIHdoaWxlIHRoYXQgaW5kZXggaXMgc3RpbGwg
YmVpbmcKY3JlYXRlZCB3b3VsZCBlaXRoZXIgZmFpbCBpbW1lZGlhdGVseSwg
b3Igd2FpdCwgd2l0aG91dCBwZXJmb3JtYW5jZSBpbXBhY3QsCnVudGlsIHRo
ZSDigJhjcmVhdGUgaW5kZXggY29uY3VycmVudGx54oCZIGhhZCBjb21wbGV0
ZWQuCgo=
dwaller@microsoft.com writes:
> Summary: While running ‘create index concurrently’ on a very large table,
> running ‘drop index’ for the same index caused Postgres to perform terribly
> badly, until the ‘create index’ server process was killed.  I would expect
> that the ‘drop index’ would either fail immediately, or wait, without
> performance impact, until the ‘create index concurrently’ had completed.

Hmm, the DROP INDEX should have blocked waiting for an exclusive lock on
the table, and then other queries on the table should have queued up
behind that lock request.  I think the fact that they didn't just stop
dead probably indicates that when the deadlock checker ran, it concluded
it needed to let them jump the queue and go in front of the DROP INDEX to
avoid a deadlock --- likely because the lock manager could see that the
DROP was blocked by the CREATE INDEX CONCURRENTLY while the latter was
waiting for the other queries to finish.  So your slowdown corresponds to
an extra wait of deadlock_timeout ms per query.  This is not a bug, but
designed behavior.

> I would expect that running ‘drop index’ while that index is still being
> created would either fail immediately, or wait, without performance impact,
> until the ‘create index concurrently’ had completed.

If the DROP's lock request is not allowed to block other incoming requests
for a table lock, the DROP could face indefinite lock starvation.  That is
not better.  Reducing deadlock_timeout to zero is not going to improve
your overall performance, either.

            regards, tom lane
T24gMDgvMDcvMjAxNiwgMTY6MDgsICJUb20gTGFuZSIgPHRnbEBzc3MucGdoLnBhLnVzPiB3cm90
ZToNCj4gZHdhbGxlckBtaWNyb3NvZnQuY29tIHdyaXRlczoNCj4gPiBTdW1tYXJ5OiBXaGlsZSBy
dW5uaW5nIOKAmGNyZWF0ZSBpbmRleCBjb25jdXJyZW50bHnigJkgb24gYSB2ZXJ5IGxhcmdlIHRh
YmxlLA0KPiA+IHJ1bm5pbmcg4oCYZHJvcCBpbmRleOKAmSBmb3IgdGhlIHNhbWUgaW5kZXggY2F1
c2VkIFBvc3RncmVzIHRvIHBlcmZvcm0gdGVycmlibHkNCj4gPiBiYWRseSwgdW50aWwgdGhlIOKA
mGNyZWF0ZSBpbmRleOKAmSBzZXJ2ZXIgcHJvY2VzcyB3YXMga2lsbGVkLiAgSSB3b3VsZCBleHBl
Y3QNCj4gPiB0aGF0IHRoZSDigJhkcm9wIGluZGV44oCZIHdvdWxkIGVpdGhlciBmYWlsIGltbWVk
aWF0ZWx5LCBvciB3YWl0LCB3aXRob3V0DQo+ID4gcGVyZm9ybWFuY2UgaW1wYWN0LCB1bnRpbCB0
aGUg4oCYY3JlYXRlIGluZGV4IGNvbmN1cnJlbnRseeKAmSBoYWQgY29tcGxldGVkLg0KPiANCj4g
SG1tLCB0aGUgRFJPUCBJTkRFWCBzaG91bGQgaGF2ZSBibG9ja2VkIHdhaXRpbmcgZm9yIGFuIGV4
Y2x1c2l2ZSBsb2NrIG9uDQo+IHRoZSB0YWJsZSwgYW5kIHRoZW4gb3RoZXIgcXVlcmllcyBvbiB0
aGUgdGFibGUgc2hvdWxkIGhhdmUgcXVldWVkIHVwDQo+IGJlaGluZCB0aGF0IGxvY2sgcmVxdWVz
dC4gIEkgdGhpbmsgdGhlIGZhY3QgdGhhdCB0aGV5IGRpZG4ndCBqdXN0IHN0b3ANCj4gZGVhZCBw
cm9iYWJseSBpbmRpY2F0ZXMgdGhhdCB3aGVuIHRoZSBkZWFkbG9jayBjaGVja2VyIHJhbiwgaXQg
Y29uY2x1ZGVkDQo+IGl0IG5lZWRlZCB0byBsZXQgdGhlbSBqdW1wIHRoZSBxdWV1ZSBhbmQgZ28g
aW4gZnJvbnQgb2YgdGhlIERST1AgSU5ERVggdG8NCj4gYXZvaWQgYSBkZWFkbG9jayAtLS0gbGlr
ZWx5IGJlY2F1c2UgdGhlIGxvY2sgbWFuYWdlciBjb3VsZCBzZWUgdGhhdCB0aGUNCj4gRFJPUCB3
YXMgYmxvY2tlZCBieSB0aGUgQ1JFQVRFIElOREVYIENPTkNVUlJFTlRMWSB3aGlsZSB0aGUgbGF0
dGVyIHdhcw0KPiB3YWl0aW5nIGZvciB0aGUgb3RoZXIgcXVlcmllcyB0byBmaW5pc2guICBTbyB5
b3VyIHNsb3dkb3duIGNvcnJlc3BvbmRzIHRvDQo+IGFuIGV4dHJhIHdhaXQgb2YgZGVhZGxvY2tf
dGltZW91dCBtcyBwZXIgcXVlcnkuICBUaGlzIGlzIG5vdCBhIGJ1ZywgYnV0DQo+IGRlc2lnbmVk
IGJlaGF2aW9yLg0KPg0KPiA+IEkgd291bGQgZXhwZWN0IHRoYXQgcnVubmluZyDigJhkcm9wIGlu
ZGV44oCZIHdoaWxlIHRoYXQgaW5kZXggaXMgc3RpbGwgYmVpbmcNCj4gPiBjcmVhdGVkIHdvdWxk
IGVpdGhlciBmYWlsIGltbWVkaWF0ZWx5LCBvciB3YWl0LCB3aXRob3V0IHBlcmZvcm1hbmNlIGlt
cGFjdCwNCj4gPiB1bnRpbCB0aGUg4oCYY3JlYXRlIGluZGV4IGNvbmN1cnJlbnRseeKAmSBoYWQg
Y29tcGxldGVkLg0KPiANCj4gSWYgdGhlIERST1AncyBsb2NrIHJlcXVlc3QgaXMgbm90IGFsbG93
ZWQgdG8gYmxvY2sgb3RoZXIgaW5jb21pbmcgcmVxdWVzdHMNCj4gZm9yIGEgdGFibGUgbG9jaywg
dGhlIERST1AgY291bGQgZmFjZSBpbmRlZmluaXRlIGxvY2sgc3RhcnZhdGlvbi4gIFRoYXQgaXMN
Cj4gbm90IGJldHRlci4gIFJlZHVjaW5nIGRlYWRsb2NrX3RpbWVvdXQgdG8gemVybyBpcyBub3Qg
Z29pbmcgdG8gaW1wcm92ZQ0KPiB5b3VyIG92ZXJhbGwgcGVyZm9ybWFuY2UsIGVpdGhlci4NCg0K
VGhhbmsgeW91IGZvciB0aGUgZGV0YWlsZWQgZXhwbGFuYXRpb24uICBUaGlzIGFsbCBzZWVtcyB2
ZXJ5IHNlbnNpYmxlLCBhbmQNCnJlYXNvbmFibGUgYmVoYXZpb3VyIGZyb20gUG9zdGdyZXMuICBZ
ZXQuLi4gaXQgc3RpbGwgJ2FsbG93ZWQnIG1lIHRvIHNob290IG15c2VsZg0KcGFpbmZ1bGx5IGlu
IHRoZSBmb290LiAgVXNlciBlcnJvciwgSSBhZ3JlZSwgeWV0IHBlb3BsZSBtYWtlIG1pc3Rha2Vz
IC0gY291bGQNClBvc3RncmVzIGJlaGF2ZSBtb3JlIGdyYWNlZnVsbHk/ICANCg0KRm9yIGV4YW1w
bGUsIHdvdWxkIGl0IGJlIGF0IGFsbCBmZWFzaWJsZSBmb3IgUG9zdGdyZXMgdG8gaGFuZGxlIERE
TCBzdGF0ZW1lbnRzDQpkaWZmZXJlbnRseSBmcm9tIHJlZ3VsYXIgcmVxdWVzdHM/ICBJbiB0aGlz
IGV4YW1wbGUgaXQgd2FzIHBvaW50bGVzcyBmb3IgRFJPUA0KSU5ERVggdG8gdGFrZSBhbnkgbG9j
a3Mgd2hpbGUgdGhlcmUgd2FzIGFscmVhZHkgYW5vdGhlciBEREwgc3RhdGVtZW50IChDUkVBVEUN
CklOREVYKSBydW5uaW5nLiAgQ291bGQgaXQgaGF2ZSBiZWVuIGFkZGVkIHRvIGEgcXVldWUgb2Yg
RERMIHN0YXRlbWVudHMgYWdhaW5zdA0KdGhhdCB0YWJsZSBhbmQgbm90IGF0dGVtcHRlZCB0byB0
YWtlIGEgbG9jayB1bnRpbCBDUkVBVEUgSU5ERVggY29tcGxldGVkIGFuZA0KRFJPUCBJTkRFWCB0
aGVuIHJlYWNoZWQgdGhlIGhlYWQgb2YgdGhlIHF1ZXVlPyAgDQoNClRoYW5rcywNCg0KRGF2aWQN
Cg0K
David Waller <dwaller@yammer-inc.com> writes:
> Thank you for the detailed explanation.  This all seems very sensible, and
> reasonable behaviour from Postgres.  Yet... it still 'allowed' me to shoot myself
> painfully in the foot.  User error, I agree, yet people make mistakes - could
> Postgres behave more gracefully?

Well, there are always tradeoffs.  You could choose to run with a
non-infinite setting of lock_timeout, which would have caused the DROP to
fail after waiting a second or two (or whatever you set the timeout to
be).  That would move the denial of service over to the problematic DDL,
which might be a good tradeoff for your environment.  But not everybody is
going to think that query failure is a "more graceful" solution.

> For example, would it be at all feasible for Postgres to handle DDL statements
> differently from regular requests?  In this example it was pointless for DROP
> INDEX to take any locks while there was already another DDL statement (CREATE
> INDEX) running.  Could it have been added to a queue of DDL statements against
> that table and not attempted to take a lock until CREATE INDEX completed and
> DROP INDEX then reached the head of the queue?

This is handwaving: the DROP already was in a lock queue.  I really doubt
there are any easy fixes that won't create as many problems as they solve.

            regards, tom lane
T24gMTEvMDcvMjAxNiwgMTg6MDgsICJUb20gTGFuZSIgPHRnbEBzc3MucGdoLnBhLnVzPiB3cm90
ZToNCj4gRGF2aWQgV2FsbGVyIDxkd2FsbGVyQHlhbW1lci1pbmMuY29tPiB3cml0ZXM6DQo+ID4g
VGhhbmsgeW91IGZvciB0aGUgZGV0YWlsZWQgZXhwbGFuYXRpb24uICBUaGlzIGFsbCBzZWVtcyB2
ZXJ5IHNlbnNpYmxlLCBhbmQNCj4gPiByZWFzb25hYmxlIGJlaGF2aW91ciBmcm9tIFBvc3RncmVz
LiAgWWV0Li4uIGl0IHN0aWxsICdhbGxvd2VkJyBtZSB0byBzaG9vdCBteXNlbGYNCj4gPiBwYWlu
ZnVsbHkgaW4gdGhlIGZvb3QuICBVc2VyIGVycm9yLCBJIGFncmVlLCB5ZXQgcGVvcGxlIG1ha2Ug
bWlzdGFrZXMgLSBjb3VsZA0KPiA+IFBvc3RncmVzIGJlaGF2ZSBtb3JlIGdyYWNlZnVsbHk/ICAN
Cj4gDQo+IFdlbGwsIHRoZXJlIGFyZSBhbHdheXMgdHJhZGVvZmZzLiAgWW91IGNvdWxkIGNob29z
ZSB0byBydW4gd2l0aCBhDQo+IG5vbi1pbmZpbml0ZSBzZXR0aW5nIG9mIGxvY2tfdGltZW91dCwg
d2hpY2ggd291bGQgaGF2ZSBjYXVzZWQgdGhlIERST1AgdG8NCj4gZmFpbCBhZnRlciB3YWl0aW5n
IGEgc2Vjb25kIG9yIHR3byAob3Igd2hhdGV2ZXIgeW91IHNldCB0aGUgdGltZW91dCB0bw0KPiBi
ZSkuICBUaGF0IHdvdWxkIG1vdmUgdGhlIGRlbmlhbCBvZiBzZXJ2aWNlIG92ZXIgdG8gdGhlIHBy
b2JsZW1hdGljIERETCwNCj4gd2hpY2ggbWlnaHQgYmUgYSBnb29kIHRyYWRlb2ZmIGZvciB5b3Vy
IGVudmlyb25tZW50LiAgQnV0IG5vdCBldmVyeWJvZHkgaXMNCj4gZ29pbmcgdG8gdGhpbmsgdGhh
dCBxdWVyeSBmYWlsdXJlIGlzIGEgIm1vcmUgZ3JhY2VmdWwiIHNvbHV0aW9uLg0KDQpUaGFuayB5
b3UhICBsb2NrX3RpbWVvdXQgc291bmRzIGxpa2UgZXhhY3RseSB3aGF0IEkgbmVlZCB0byBzZXQg
LSB0aGFuayB5b3UgZm9yDQpoZWxwaW5nIG91dC4gIA0KDQpOb3cgSSBrbm93IGFib3V0IGxvY2tf
dGltZW91dCBJIGFncmVlLCB0aGVyZSdzIG5vdGhpbmcgd29ydGggY2hhbmdpbmcgaGVyZS4NClBv
c3RncmVzIGFscmVhZHkgaGFzIHRoZSB0b29scyBidWlsdCBpbiB0byBhbGxvdyBtZSB0byBnZXQg
dGhlIGJlaGF2aW91ciBJDQp3YW50ZWQgaW4gdGhpcyBzaXR1dGlvbi4gIA0KDQpUaGFuayB5b3Ug
Zm9yIHlvdXIgcGF0aWVudCBleHBsYW5hdGlvbnMuDQoNCkRhdmlkDQoNCg0K