Discussion: Random penalties on GIN index updates?
Hi (running PG 8.4.1)

As far as I have gotten in my test of PostgreSQL full-text search: I have
over 6 million documents indexed so far, and the index has grown to 37 GB.
The system didn't run any autovacuums in the process, but I vacuumed
manually a few times, which stopped the growth for a short period.

 table_name |  index_name  | times_used | table_size | index_size | num_writes |                              definition
------------+--------------+------------+------------+------------+------------+-----------------------------------------------------------------------
 ftstest    | body_tfs_idx |        171 | 5071 MB    | 37 GB      |    6122086 | CREATE INDEX ftstest_tfs_idx ON ftstest USING gin (ftstest_body_fts)
(1 row)

This is more or less what I'd expect; it is no scarier than the Xapian
index it is being compared with, and search speed seems excellent. But I
feel I'm getting a significant drop-off in indexing speed as time goes by,
although I don't have numbers to confirm this.

If I understand the technicalities correctly, INSERTs/UPDATEs to the index
are accumulated in "maintenance_work_mem", and the "user" who is unlucky
enough to fill it up pays the penalty of merging all the changes into the
index? I currently have maintenance_work_mem set to 128 MB, and according
to pg_stat_activity I currently have an INSERT sitting for over an hour.
If I strace the postgres process it is reading and writing a lot on the
filesystem and imposing an I/O-wait load of one CPU.

Can I do something to prevent this from happening? Is it "by design"?

-- Jesper
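The exact query behind the statistics above is not shown in the thread; a
sketch of one way to collect similar columns from the standard statistics
views in 8.4, using the ftstest table name from the output, would be:

-- Sketch only: not necessarily the query Jesper used.
SELECT i.relname                                      AS table_name,
       i.indexrelname                                 AS index_name,
       i.idx_scan                                     AS times_used,
       pg_size_pretty(pg_relation_size(i.relid))      AS table_size,
       pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
       t.n_tup_ins + t.n_tup_upd + t.n_tup_del        AS num_writes,
       pg_get_indexdef(i.indexrelid)                  AS definition
FROM   pg_stat_user_indexes i
JOIN   pg_stat_user_tables  t ON t.relid = i.relid
WHERE  i.relname = 'ftstest';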
jesper@krogh.cc writes:
> If I understand the technicalities correctly, INSERTs/UPDATEs to the index
> are accumulated in "maintenance_work_mem", and the "user" who is unlucky
> enough to fill it up pays the penalty of merging all the changes into the
> index?

You can turn off the "fastupdate" index parameter to disable that, but I
think there may be a penalty in index bloat as well as insertion speed.
It would be better to use a more conservative work_mem (work_mem, not
maintenance_work_mem, is what limits the amount of stuff accumulated
during normal inserts).

        regards, tom lane
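For readers of the archives: in 8.4 the fastupdate behaviour Tom mentions
is a storage parameter of the index, so it can be switched off in place.
A sketch, using the index name from the output above; note that turning it
off only affects new insertions, while entries already sitting in the
pending list are merged by a later VACUUM of the table:

-- Disable the GIN pending list for this index; new entries then go
-- straight into the index tree instead of being batched up.
ALTER INDEX ftstest_tfs_idx SET (fastupdate = off);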
Tom Lane wrote:
> jesper@krogh.cc writes:
>> If I understand the technicalities correctly, INSERTs/UPDATEs to the index
>> are accumulated in "maintenance_work_mem", and the "user" who is unlucky
>> enough to fill it up pays the penalty of merging all the changes into the
>> index?
>
> You can turn off the "fastupdate" index parameter to disable that,
> but I think there may be a penalty in index bloat as well as insertion
> speed. It would be better to use a more conservative work_mem
> (work_mem, not maintenance_work_mem, is what limits the amount of stuff
> accumulated during normal inserts).

OK, I read the manual about that; it seems worth testing. What I'm seeing
is stuff like this:

2009-10-21T16:32:21
2009-10-21T16:32:25
2009-10-21T16:32:30
2009-10-21T16:32:35
2009-10-21T17:10:50
2009-10-21T17:10:59
2009-10-21T17:11:09
...

and then it carried on steadily for another 180,000 documents. Each row is
a printout from the application doing the INSERTs; it prints the time after
each 1,000 rows it gets through. It is the 38 minutes in the middle that
I'm a bit worried about. work_mem is set to 512 MB; does that translate
into roughly 180,000 documents on my system?

What I seem to be missing is a way to make sure some "background"
application is the one paying the penalty, so that a random user doing a
single INSERT won't get stuck. Is that doable? It also seems to lock out
other inserts while it is in this state.

-- Jesper
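A rough sanity check of that 180,000-document guess (back-of-the-envelope
arithmetic only, assuming the 38-minute stall corresponds to a single
pending-list flush):

    512 MB of work_mem / ~180,000 documents  ~=  3 KB of pending-list
    entries per document

which is at least a plausible order of magnitude for full-text documents,
so the two numbers may well be connected as suspected.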
Jesper Krogh <jesper@krogh.cc> writes:
> What I seem to be missing is a way to make sure some "background"
> application is the one paying the penalty, so that a random user doing a
> single INSERT won't get stuck. Is that doable?

You could force a vacuum every so often, but I don't think that will help
the locking situation. You really need to back off work_mem --- 512MB is
probably not a sane global value for that anyway.

        regards, tom lane
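A sketch of the "force a vacuum every so often" idea (the table name is
from the thread; how often to run it is only an illustration):

-- Run from a background job (e.g. cron) every few minutes. In 8.4 a plain
-- VACUUM of the table also merges the GIN pending list into the main
-- index, so the background process pays most of the merge cost instead of
-- a random inserting user; as Tom notes, it does not remove the locking
-- issue during a flush.
VACUUM ftstest;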
On Wed, Oct 21, 2009 at 2:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jesper Krogh <jesper@krogh.cc> writes:
>> What I seem to be missing is a way to make sure some "background"
>> application is the one paying the penalty, so that a random user doing a
>> single INSERT won't get stuck. Is that doable?
>
> You could force a vacuum every so often, but I don't think that will
> help the locking situation. You really need to back off work_mem ---
> 512MB is probably not a sane global value for that anyway.

Yeah, it's hard to imagine a system where that doesn't threaten all kinds
of other bad results. I bet setting this to 4MB will make this problem
largely go away.

Arguably we shouldn't be using work_mem to control this particular
behavior, but...

...Robert
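A sketch of that combination (the values are illustrations, not
recommendations from the thread): keep the global default small so that
pending-list flushes triggered by ordinary inserts stay short, and raise
work_mem per session only where a large sort or hash genuinely needs it.

-- postgresql.conf: small global default
--   work_mem = 4MB

-- A session that really needs a large sort or hash can still raise it:
SET work_mem = '256MB';
-- ... run the heavy reporting query here ...
RESET work_mem;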
Robert Haas wrote:
> On Wed, Oct 21, 2009 at 2:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jesper Krogh <jesper@krogh.cc> writes:
>>> What I seem to be missing is a way to make sure some "background"
>>> application is the one paying the penalty, so that a random user doing
>>> a single INSERT won't get stuck. Is that doable?
>>
>> You could force a vacuum every so often, but I don't think that will
>> help the locking situation. You really need to back off work_mem ---
>> 512MB is probably not a sane global value for that anyway.
>
> Yeah, it's hard to imagine a system where that doesn't threaten all
> kinds of other bad results. I bet setting this to 4MB will make this
> problem largely go away.
>
> Arguably we shouldn't be using work_mem to control this particular
> behavior, but...

I came from Xapian, where you can have only one writer process, but where
batching up several GB of changes improved indexing performance
dramatically.

Lowering work_mem to 16 MB gives "batches" of 11,000 documents and stalls
of between 45 and 90 seconds; roughly 33 docs/s.

-- Jesper