Re: GIN improvements part2: fast scan

From: Rod Taylor
Subject: Re: GIN improvements part2: fast scan
Msg-id: CAKddOFBAp39whKbbYLvyK8sYOFXO_gkGGeFtD0rUVu=6pY18GQ@mail.gmail.com
In reply to: Re: GIN improvements part2: fast scan  (Alexander Korotkov <aekorotkov@gmail.com>)
List: pgsql-hackers

I checked out master and put together a test case using a small percentage of production data for a known problem we have with Pg 9.2 and text search scans.

A small percentage in this case means 10 million randomly selected records; production has a few billion records.


The tests ran successfully on master, and I recorded timings.



I then applied the patch included here to master, along with gin-packed-postinglists-14.patch.
Ran make clean; ./configure; make; make install.
Ran make check (all 141 tests passed).

Ran initdb and imported the dump.


The GIN index fails to build with a segfault.

DETAIL:  Failed process was running: CREATE INDEX textsearch_gin_idx ON kp USING gin (to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL);


#0  XLogCheckBuffer (holdsExclusiveLock=1 '\001', lsn=lsn@entry=0x7fffcf341920, bkpb=bkpb@entry=0x7fffcf341960, rdata=0x468f11 <ginFindLeafPage+529>,
    rdata=0x468f11 <ginFindLeafPage+529>) at xlog.c:2339
#1  0x00000000004b9ddd in XLogInsert (rmid=rmid@entry=13 '\r', info=info@entry=16 '\020', rdata=rdata@entry=0x7fffcf341bf0) at xlog.c:936
#2  0x0000000000468a9e in createPostingTree (index=0x7fa4e8d31030, items=items@entry=0xfb55680, nitems=nitems@entry=762,
    buildStats=buildStats@entry=0x7fffcf343dd0) at gindatapage.c:1324
#3  0x00000000004630c0 in buildFreshLeafTuple (buildStats=0x7fffcf343dd0, nitem=762, items=0xfb55680, category=<optimized out>, key=34078256,
    attnum=<optimized out>, ginstate=0x7fffcf341df0) at gininsert.c:281
#4  ginEntryInsert (ginstate=ginstate@entry=0x7fffcf341df0, attnum=<optimized out>, key=34078256, category=<optimized out>, items=0xfb55680, nitem=762,
    buildStats=buildStats@entry=0x7fffcf343dd0) at gininsert.c:351
#5  0x00000000004635b0 in ginbuild (fcinfo=<optimized out>) at gininsert.c:531
#6  0x0000000000718637 in OidFunctionCall3Coll (functionId=functionId@entry=2738, collation=collation@entry=0, arg1=arg1@entry=140346257507968,
    arg2=arg2@entry=140346257510448, arg3=arg3@entry=32826432) at fmgr.c:1649
#7  0x00000000004ce1da in index_build (heapRelation=heapRelation@entry=0x7fa4e8d30680, indexRelation=indexRelation@entry=0x7fa4e8d31030,
    indexInfo=indexInfo@entry=0x1f4e440, isprimary=isprimary@entry=0 '\000', isreindex=isreindex@entry=0 '\000') at index.c:1963
#8  0x00000000004ceeaa in index_create (heapRelation=heapRelation@entry=0x7fa4e8d30680,
    indexRelationName=indexRelationName@entry=0x1f4e660 "textsearch_gin_knn_idx", indexRelationId=16395, indexRelationId@entry=0,
    relFileNode=<optimized out>, indexInfo=indexInfo@entry=0x1f4e440, indexColNames=indexColNames@entry=0x1f4f728,
    accessMethodObjectId=accessMethodObjectId@entry=2742, tableSpaceId=tableSpaceId@entry=0, collationObjectId=collationObjectId@entry=0x1f4fcc8,
    classObjectId=classObjectId@entry=0x1f4fce0, coloptions=coloptions@entry=0x1f4fcf8, reloptions=reloptions@entry=0, isprimary=0 '\000',
    isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000', allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000',
    is_internal=0 '\000') at index.c:1082
#9  0x0000000000546a78 in DefineIndex (stmt=<optimized out>, indexRelationId=indexRelationId@entry=0, is_alter_table=is_alter_table@entry=0 '\000',
    check_rights=check_rights@entry=1 '\001', skip_build=skip_build@entry=0 '\000', quiet=quiet@entry=0 '\000') at indexcmds.c:594
#10 0x000000000065147e in ProcessUtilitySlow (parsetree=parsetree@entry=0x1f7fb68,
    queryString=0x1f7eb10 "CREATE INDEX textsearch_gin_idx ON kp USING gin (to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL);", context=<optimized out>, params=params@entry=0x0, completionTag=completionTag@entry=0x7fffcf344c10 "", dest=<optimized out>) at utility.c:1163
#11 0x000000000065079e in standard_ProcessUtility (parsetree=0x1f7fb68, queryString=<optimized out>, context=<optimized out>, params=0x0,
    dest=<optimized out>, completionTag=0x7fffcf344c10 "") at utility.c:873
#12 0x000000000064de61 in PortalRunUtility (portal=portal@entry=0x1f4c350, utilityStmt=utilityStmt@entry=0x1f7fb68, isTopLevel=isTopLevel@entry=1 '\001',
    dest=dest@entry=0x1f7ff08, completionTag=completionTag@entry=0x7fffcf344c10 "") at pquery.c:1187
#13 0x000000000064e9e5 in PortalRunMulti (portal=portal@entry=0x1f4c350, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x1f7ff08,
    altdest=altdest@entry=0x1f7ff08, completionTag=completionTag@entry=0x7fffcf344c10 "") at pquery.c:1318
#14 0x000000000064f459 in PortalRun (portal=portal@entry=0x1f4c350, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001',
    dest=dest@entry=0x1f7ff08, altdest=altdest@entry=0x1f7ff08, completionTag=completionTag@entry=0x7fffcf344c10 "") at pquery.c:816
#15 0x000000000064d2d5 in exec_simple_query (
    query_string=0x1f7eb10 "CREATE INDEX textsearch_gin_idx ON kp USING gin (to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL);") at postgres.c:1048
#16 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1f2ad40, dbname=0x1f2abf8 "rbt", username=<optimized out>) at postgres.c:3992
#17 0x000000000045b1b4 in BackendRun (port=0x1f47280) at postmaster.c:4085
#18 BackendStartup (port=0x1f47280) at postmaster.c:3774
#19 ServerLoop () at postmaster.c:1585
#20 0x000000000060d031 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1f28b20) at postmaster.c:1240
#21 0x000000000045bb25 in main (argc=3, argv=0x1f28b20) at main.c:196



On Thu, Nov 14, 2013 at 12:26 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 28.06.2013 22:31, Alexander Korotkov wrote:
Now, I got the point of three state consistent: we can keep only one
consistent in opclasses that support new interface. exact true and exact
false values will be passed in the case of current patch consistent; exact
false and unknown will be passed in the case of current patch
preConsistent. That's reasonable.

I'm going to mark this as "returned with feedback". For the next version, I'd like to see the API changed per above. Also, I'd like us to do something about the tidbitmap overhead, as a separate patch before this, so that we can assess the actual benefit of this patch. And a new test case that demonstrates the I/O benefits.

A revised version of the patch is attached.
The changes are:
1) The patch is rebased against packed posting lists and no longer depends on additional information.
2) A new API with tri-state logic is introduced.
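
For readers following the thread, the tri-state idea discussed above can be illustrated with a small standalone sketch. This is not code from the patch: the type `TriValue`, its constants, and the functions `tri_and` and `and_query_consistent` are hypothetical names chosen for illustration, and the query shape (a plain AND over all keys) is an assumption. The point is only that a single function can serve both roles: fed exact true/false inputs it behaves like the current consistent, and fed false/unknown inputs it behaves like the current preConsistent.

```c
#include <stddef.h>

/*
 * Hypothetical tri-state value. The names are illustrative only and are
 * not taken from the patch under discussion.
 */
typedef enum
{
    TRI_FALSE = 0,  /* key is known to be absent from the item */
    TRI_TRUE = 1,   /* key is known to be present in the item */
    TRI_MAYBE = 2   /* presence of the key has not been checked yet */
} TriValue;

/* Three-valued AND: FALSE dominates, then MAYBE, otherwise TRUE. */
static TriValue
tri_and(TriValue a, TriValue b)
{
    if (a == TRI_FALSE || b == TRI_FALSE)
        return TRI_FALSE;
    if (a == TRI_MAYBE || b == TRI_MAYBE)
        return TRI_MAYBE;
    return TRI_TRUE;
}

/*
 * Sketch of a single consistent function for a query of the assumed form
 * "key0 AND key1 AND ...". check[i] holds what is known about key i.
 */
static TriValue
and_query_consistent(const TriValue *check, size_t nkeys)
{
    TriValue    result = TRI_TRUE;
    size_t      i;

    for (i = 0; i < nkeys; i++)
        result = tri_and(result, check[i]);
    return result;
}
```

With only TRI_TRUE and TRI_FALSE inputs the result is always exact. With TRI_MAYBE inputs, a TRI_FALSE result means the item can be rejected without looking at the remaining keys, while TRI_MAYBE means the remaining keys still have to be checked.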

------
With best regards,
Alexander Korotkov.



