Re: GIN improvements part2: fast scan

From: Heikki Linnakangas
Subject: Re: GIN improvements part2: fast scan
Date:
Msg-id: 52E42AD0.5060600@vmware.com
In response to: Re: GIN improvements part2: fast scan  (Tomas Vondra <tv@fuzzy.cz>)
Responses: Re: GIN improvements part2: fast scan  (Tomas Vondra <tv@fuzzy.cz>)
List: pgsql-hackers
On 01/25/2014 09:44 PM, Tomas Vondra wrote:
> This is rather easy to reproduce - download the dump I already provided
> two weeks ago [http://www.fuzzy.cz/tmp/message-b.data.gz] and load it
> into a simple table:
>
>     CREATE TABLE msgs (body tsvector);
>     COPY msgs FROM '/tmp/message-b.data';
>     CREATE INDEX msgidx ON msgs USING gin(body);
>     ANALYZE msgs;
>
> And then run this query:
>
>    SELECT body FROM msgs
>    WHERE body @@ plainto_tsquery('english','string | x')
>      AND body @@ plainto_tsquery('english','versions | equivalent')
>      AND body @@ plainto_tsquery('english','usually | contain');
>
> It should run indefinitely. I suspect it's not perfectly stable, i.e.,
> this query may work fine / another one will block. In that case try to
> run this [http://www.fuzzy.cz/tmp/random-queries.sql] - it's a file with
> 1000 generated queries, at least one of them should block (that's how I
> discovered the issue).

Thanks, that's a great test suite! Indeed, it did get stuck for me as well.

I tracked it down to a logic bug in entryGetItem; an && should've been
||. Also, it tickled an assertion in the debug LOG statement that
bleated "entryLoadMoreItems, %u/%u, skip: %d" (I was using
ItemPointerGetBlockNumber, which contains a check that the argument is a
valid item pointer, which it isn't always in this case). Fixed that too.

Attached is a new version of the patch set, with those bugs fixed.

One interesting detail that I noticed while testing that query:

Using EXPLAIN (BUFFERS) shows that we're actually accessing *more* pages
with the patches than without them. The culprit is patch #3, which makes
us re-descend the posting tree from the root, rather than just stepping
right from the current page. I was very surprised by that at first - the
patch was supposed to *reduce* the number of pages accessed, by not
having to walk through all the leaf pages. But in this case, even when
you're skipping some items, the next item you're interested in is almost
always on the next posting tree page, so re-descending the tree is a
waste and you land on the right sibling of the original page anyway.
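
To compare, the per-query buffer counts can be checked with something
like this against the test table above (BUFFERS needs ANALYZE to report
the actual buffer usage):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT body FROM msgs
    WHERE body @@ plainto_tsquery('english','string | x')
      AND body @@ plainto_tsquery('english','versions | equivalent')
      AND body @@ plainto_tsquery('english','usually | contain');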

It's not a big waste, though. The upper levels of the tree are almost
certainly in cache, so it's just a matter of some extra lw-locking and
binary searching, which is cheap compared to actually decoding and
copying all the items from the correct page. Indeed, I couldn't see any
meaningful difference in query time with or without the patch. (I'm sure
a different query that allows more skipping would show the patch to be a
win - this was a worst-case scenario.)
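
(As a sketch only: pairing a rare lexeme with frequent ones should let
the fast scan skip most of the frequent lexemes' posting lists;
'somerareword' below is just a placeholder, not a term from the test
data.)

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT body FROM msgs
    WHERE body @@ to_tsquery('english', 'somerareword')
      AND body @@ plainto_tsquery('english','usually | contain');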

Alexander's patch contained a more complicated method for re-finding the
right leaf page. It ascended the tree back up along the same path it had
originally descended, which is potentially faster if the tree is many
levels deep and you only skip forward a little. In practice, posting
trees are very compact: for a tree to be taller than two levels, it must
contain millions of items, and a 4-level tree would be humongous. So I
don't think it's worth the complexity to try anything smarter than just
re-descending from the root. However, that difference in implementation
shows up in the EXPLAIN (BUFFERS) output; since Alexander's patch kept
the stack of upper pages pinned, ascending and descending the tree did
not increase the counters of pages accessed (I believe; I didn't
actually test it), while descending from the root does.

- Heikki

Attachments
