Re: GIN improvements part2: fast scan

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: GIN improvements part2: fast scan
Дата
Msg-id 52E4141E.60609@fuzzy.cz
обсуждение исходный текст
Ответ на Re: GIN improvements part2: fast scan  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: GIN improvements part2: fast scan  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-hackers
On 23.1.2014 17:22, Heikki Linnakangas wrote:
> I measured the time that query takes, and the number of pages hit, using
> "explain (analyze, buffers true) ...".
> 
> patches        time (ms)    buffers
> ---
> unpatched    650        1316
> patch 1        0.52        1316
> patches 1+2    0.50        1316
> patches 1+2+3    0.13        15
> 
> So, the second patch isn't doing much in this particular case. But it's
> trivial, and I think it will make a difference in other queries where
> you have the opportunity skip, but return a lot of tuples overall.
> 
> In summary, these are fairly small patches, and useful on their, so I
> think these should be committed now. But please take a look and see if
> the logic in scanGetItem/keyGetItem looks correct to you. After this, I
> think the main fast scan logic will go into keyGetItem.

Hi,

I've done some testing of the three patches today, and I've ran into an
infinite loop caused by the third patch. I don't know why exactly it
gets stuck, but with patches #1+#2 it works fine, and after applying #3
it runs infinitely.

I can't point to a particular line / condition causing this, but this is
wthat I see in 'perf top'
   54.16%  postgres         [.] gingetbitmap   32.38%  postgres         [.] ginPostingListDecodeAllSegments    3.03%
libc-2.17.so    [.] 0x000000000007fb88
 

I've tracked it down to this loop in ginget.c:840 (I've added the
logging for debugging / demonstration purposes):

=====================================================================
   elog(WARNING, "scanning entries");
   elog(WARNING, "advacepast=(%u,%d)",                 BlockIdGetBlockNumber(&advancePast.ip_blkid),
advancePast.ip_posid);
   while (entry->isFinished == FALSE &&          ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)   {
elog(WARNING,"current=(%u,%d)",                     BlockIdGetBlockNumber(&entry->curItem.ip_blkid),
entry->curItem.ip_posid);       entryGetItem(ginstate, entry, advancePast);   }
 
   elog(WARNING, "entries scanned");

=====================================================================

which is executed repeatedly, but the last invocation gets stuck and
produces this output:
   WARNING:  scanning entries   WARNING:  advacepast=(172058,0)   LOG:  entryLoadMoreItems, 172058/0, skip: 1
WARNING: getting item current=(171493,7)   WARNING:  getting item current=(116833,2)   WARNING:  getting item
current=(116833,3)  WARNING:  getting item current=(116833,4)   WARNING:  getting item current=(116833,5)   WARNING:
gettingitem current=(116838,1)   WARNING:  getting item current=(116838,2)
 
   ... increasing sequence of block IDs ...
   WARNING:  getting item current=(170743,5)   WARNING:  getting item current=(170746,4)   WARNING:  getting item
current=(171493,7)  LOG:  entryLoadMoreItems, 172058/0, skip: 1   WARNING:  getting item current=(116833,2)   WARNING:
gettingitem current=(116833,3)   WARNING:  getting item current=(116833,4)   WARNING:  getting item current=(116833,5)
 
   ... and repeat

=====================================================================

Not sure what went wrong, though - I suspect it does not set the
isFinished flag or something like that, but I don't know where/when
should that happen.

This is rather easy to reproduce - download the dump I already provided
two weeks ago [http://www.fuzzy.cz/tmp/message-b.data.gz] and load it
into a simple table:
  CREATE TABLE msgs (body tsvector);  COPY msgs FROM '/tmp/message-b.data';  CREATE INDEX msgidx ON msgs USING
gin(body); ANALYZE msgs;
 

And then run this query:
 SELECT body FROM msgs WHERE body @@ plainto_tsquery('english','string | x')   AND body @@
plainto_tsquery('english','versions| equivalent')   AND body @@ plainto_tsquery('english','usually | contain');
 

It should run infinitely. I suspect it's not perfectly stable, i.e, the
this query may work fine / another one will block. In that case try to
run this [http://www.fuzzy.cz/tmp/random-queries.sql] - it's a file with
1000 generated queries, at least one of them should block (that's how I
discovered the issue).

regards
Tomas



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Questionable coding in orderedsetaggs.c