Thread: BUG #14237: Terrible performance after accidentally running 'drop index' for index still being created

From: dwaller@microsoft.com
Date:
The following bug has been logged on the website:

Bug reference:      14237
Logged by:          David Waller
Email address:      dwaller@microsoft.com
PostgreSQL version: 9.3.7
Operating system:   Linux (Ubuntu)
Description:

Summary: While running 'create index concurrently' on a very large table,
running 'drop index' for the same index caused Postgres to perform terribly
badly, until the 'create index' server process was killed.  I would expect
that the 'drop index' would either fail immediately, or wait, without
performance impact, until the 'create index concurrently' had completed.

In detail:

I started adding an index to a large table using 'create index concurrently
...'  (I was running the command using a Rails migration, connecting to
Postgres remotely, via PgBouncer.)  It seems that something failed in the
connection to Postgres as my migration terminated with an error after about
1.5 hours (for comparison, creating this index successfully took about 3.5
hours).

\d table_name showed the index marked as invalid, so I ran 'drop index',
intending to run the index creation again.  The average query duration
against this database immediately increased from ~1ms to ~250ms, and stayed
in the range 50-350ms for the next ten minutes, until someone spotted that
there was still a process on the server that was running 'create index'.
Killing that process caused performance to return to normal.

This was rather bad for the availability of our application during those ten
minutes!

I would expect that running 'drop index' while that index is still being
created would either fail immediately, or wait, without performance impact,
until the 'create index concurrently' had completed.
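A failed CREATE INDEX CONCURRENTLY leaves behind an index flagged INVALID, which is what `\d table_name` was showing. A minimal sketch of how to spot such leftovers from the catalogs (the table and index names here are invented for illustration):

```sql
-- List any indexes left INVALID by an interrupted concurrent build.
-- pg_index.indisvalid is false for such indexes.
SELECT indexrelid::regclass AS invalid_index
FROM pg_index
WHERE NOT indisvalid;

-- \d big_table would show the same index marked INVALID; dropping it
-- (once no build is still running) lets the creation be retried:
-- DROP INDEX big_table_col_idx;
```

Note that the invalid index is harmless to queries (the planner ignores it); the trouble described in this report came from the DROP, not from the invalid index itself.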
dwaller@microsoft.com writes:
> Summary: While running 'create index concurrently' on a very large table,
> running 'drop index' for the same index caused Postgres to perform terribly
> badly, until the 'create index' server process was killed.  I would expect
> that the 'drop index' would either fail immediately, or wait, without
> performance impact, until the 'create index concurrently' had completed.

Hmm, the DROP INDEX should have blocked waiting for an exclusive lock on
the table, and then other queries on the table should have queued up
behind that lock request.  I think the fact that they didn't just stop
dead probably indicates that when the deadlock checker ran, it concluded
it needed to let them jump the queue and go in front of the DROP INDEX to
avoid a deadlock --- likely because the lock manager could see that the
DROP was blocked by the CREATE INDEX CONCURRENTLY while the latter was
waiting for the other queries to finish.  So your slowdown corresponds to
an extra wait of deadlock_timeout ms per query.  This is not a bug, but
designed behavior.

> I would expect that running 'drop index' while that index is still being
> created would either fail immediately, or wait, without performance impact,
> until the 'create index concurrently' had completed.

If the DROP's lock request is not allowed to block other incoming requests
for a table lock, the DROP could face indefinite lock starvation.  That is
not better.  Reducing deadlock_timeout to zero is not going to improve
your overall performance, either.

			regards, tom lane
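The queuing situation Tom describes can be sketched as a hypothetical three-session sequence (table, column, and index names are invented); the final query shows how to watch the lock queue on a live system:

```sql
-- Session 1: the stuck concurrent build. It holds a
-- ShareUpdateExclusiveLock on the table and waits for older
-- transactions/snapshots to go away.
CREATE INDEX CONCURRENTLY idx_big ON big_table (col);

-- Session 2: the accidental drop. It requests an
-- AccessExclusiveLock on the table, which must queue behind
-- everything session 1 is entangled with.
DROP INDEX idx_big;

-- Sessions 3..N: ordinary queries now queue behind the DROP's
-- exclusive-lock request; after deadlock_timeout the deadlock
-- checker lets them jump ahead of it, costing each query an
-- extra wait of roughly deadlock_timeout ms.

-- Observing who holds and who waits on the table's locks:
SELECT pid, locktype, mode, granted, query
FROM pg_locks
JOIN pg_stat_activity USING (pid)
WHERE relation = 'big_table'::regclass;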
On 08/07/2016, 16:08, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> dwaller@microsoft.com writes:
> > Summary: While running 'create index concurrently' on a very large table,
> > running 'drop index' for the same index caused Postgres to perform terribly
> > badly, until the 'create index' server process was killed.  I would expect
> > that the 'drop index' would either fail immediately, or wait, without
> > performance impact, until the 'create index concurrently' had completed.
>
> Hmm, the DROP INDEX should have blocked waiting for an exclusive lock on
> the table, and then other queries on the table should have queued up
> behind that lock request.  I think the fact that they didn't just stop
> dead probably indicates that when the deadlock checker ran, it concluded
> it needed to let them jump the queue and go in front of the DROP INDEX to
> avoid a deadlock --- likely because the lock manager could see that the
> DROP was blocked by the CREATE INDEX CONCURRENTLY while the latter was
> waiting for the other queries to finish.  So your slowdown corresponds to
> an extra wait of deadlock_timeout ms per query.  This is not a bug, but
> designed behavior.
>
> > I would expect that running 'drop index' while that index is still being
> > created would either fail immediately, or wait, without performance impact,
> > until the 'create index concurrently' had completed.
>
> If the DROP's lock request is not allowed to block other incoming requests
> for a table lock, the DROP could face indefinite lock starvation.  That is
> not better.  Reducing deadlock_timeout to zero is not going to improve
> your overall performance, either.

Thank you for the detailed explanation.  This all seems very sensible, and
reasonable behaviour from Postgres.  Yet... it still 'allowed' me to shoot
myself painfully in the foot.  User error, I agree, yet people make mistakes -
could Postgres behave more gracefully?

For example, would it be at all feasible for Postgres to handle DDL statements
differently from regular requests?  In this example it was pointless for DROP
INDEX to take any locks while there was already another DDL statement (CREATE
INDEX) running.  Could it have been added to a queue of DDL statements against
that table and not attempted to take a lock until CREATE INDEX completed and
DROP INDEX then reached the head of the queue?

Thanks,

David
David Waller <dwaller@yammer-inc.com> writes:
> Thank you for the detailed explanation.  This all seems very sensible, and
> reasonable behaviour from Postgres.  Yet... it still 'allowed' me to shoot
> myself painfully in the foot.  User error, I agree, yet people make mistakes -
> could Postgres behave more gracefully?

Well, there are always tradeoffs.  You could choose to run with a
non-infinite setting of lock_timeout, which would have caused the DROP to
fail after waiting a second or two (or whatever you set the timeout to
be).  That would move the denial of service over to the problematic DDL,
which might be a good tradeoff for your environment.  But not everybody is
going to think that query failure is a "more graceful" solution.

> For example, would it be at all feasible for Postgres to handle DDL statements
> differently from regular requests?  In this example it was pointless for DROP
> INDEX to take any locks while there was already another DDL statement (CREATE
> INDEX) running.  Could it have been added to a queue of DDL statements against
> that table and not attempted to take a lock until CREATE INDEX completed and
> DROP INDEX then reached the head of the queue?

This is handwaving: the DROP already was in a lock queue.  I really doubt
there are any easy fixes that won't create as many problems as they solve.

			regards, tom lane
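The lock_timeout approach Tom suggests can be sketched as a session-local setting around the DDL (the index name and two-second value are illustrative, not from the thread):

```sql
-- Fail fast instead of queuing behind a stuck lock:
SET lock_timeout = '2s';

DROP INDEX idx_big;
-- If the lock cannot be acquired within 2s, this fails with
-- ERROR:  canceling statement due to lock timeout
-- and other queries never pile up behind the DROP's request.

RESET lock_timeout;
```

Since the setting is per-session (or can be scoped with `SET LOCAL` inside a transaction), it only moves the failure onto the DDL statement itself, which is usually the safer place for it during a migration.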
On 11/07/2016, 18:08, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> David Waller <dwaller@yammer-inc.com> writes:
> > Thank you for the detailed explanation.  This all seems very sensible, and
> > reasonable behaviour from Postgres.  Yet... it still 'allowed' me to shoot
> > myself painfully in the foot.  User error, I agree, yet people make mistakes -
> > could Postgres behave more gracefully?
>
> Well, there are always tradeoffs.  You could choose to run with a
> non-infinite setting of lock_timeout, which would have caused the DROP to
> fail after waiting a second or two (or whatever you set the timeout to
> be).  That would move the denial of service over to the problematic DDL,
> which might be a good tradeoff for your environment.  But not everybody is
> going to think that query failure is a "more graceful" solution.

Thank you!  lock_timeout sounds like exactly what I need to set - thank you
for helping out.

Now I know about lock_timeout, I agree there's nothing worth changing here.
Postgres already has the tools built in to allow me to get the behaviour I
wanted in this situation.

Thank you for your patient explanations.

David