Обсуждение: [PERFORM] GIN index not used if created in the same transaction as query

Поиск
Список
Период
Сортировка

[PERFORM] GIN index not used if created in the same transaction as query

От
Adam Brusselback
Дата:
Hey all, first off, i'm running: PostgreSQL 9.6.3 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit

At the high level, I am having an issue with a query not using an index, and in a very hard to reproduce way.

I have a function which builds two temp tables, fills each with data (in multiple steps), creates a gin index on one of the tables, analyzes each table, then runs a query joining the two.

My issue is, I am getting inconsistent results for if the query will use the index or not (with the exact same data each time, and no differences in the stats stored on the table between using the index or not).

If I just run the function, it will never use the index and the query will not finish.

If I pull the queries out of the function and run them manually, it will often use the index, but sometimes it won't, I can't make any sense of why/when it will use it vs not using it.

I tried to create a test case I could attach to this email by just saving the results of the temp tables, pg_dumping them, and creating a script to re-create the temp tables with that data and continue on with the index creation / analyze / query... but when I try that it runs perfectly (using the index) every time.

I've attached the test case, because it contains all the schema and query, regardless of if I can't make it reproducible.

Run the query_help_dump.sql first to populate regular tables, then within the query_help_test_case.sql I was attempting to replicate the same (very simplified) workflow that happens in my function, to no avail.


I cannot run an explain analyze on the query when it doesn't use the index, because it will not finish in a reasonable amount of time (let it run for 12 hours so far).

query without index: 
GroupAggregate  (cost=23622602.94..23622603.80 rows=43 width=48)
  Group Key: r.row_id
  ->  Sort  (cost=23622602.94..23622603.04 rows=43 width=20)
        Sort Key: r.row_id
        ->  Nested Loop  (cost=0.00..23622601.77 rows=43 width=20)
              Join Filter: ((r.delivery_date <@ con.date_range) AND (r.contractee_company_ids && con.contractee_company_id) AND ((r.distributor_company_ids && con.distributor_company_id) OR (con.distributor_company_id IS NULL)) AND (r.product_ids && con.product_id))
              ->  Seq Scan on _import_invoice_product_contract_match r  (cost=0.00..3525.52 rows=86752 width=145)
              ->  Materialize  (cost=0.00..874.50 rows=12100 width=542)
                    ->  Seq Scan on _contract_claim_match con  (cost=0.00..814.00 rows=12100 width=542)


query with index:
GroupAggregate  (cost=137639.13..137639.99 rows=43 width=48) (actual time=3944.309..4093.798 rows=57966 loops=1)
  Group Key: r.row_id
  ->  Sort  (cost=137639.13..137639.24 rows=43 width=20) (actual time=3944.280..3992.348 rows=145312 loops=1)
        Sort Key: r.row_id
        Sort Method: external merge  Disk: 4256kB
        ->  Nested Loop  (cost=0.02..137637.97 rows=43 width=20) (actual time=0.091..3701.039 rows=145312 loops=1)
              ->  Seq Scan on _import_invoice_product_contract_match r  (cost=0.00..3525.52 rows=86752 width=145) (actual time=0.011..46.663 rows=86752 loops=1)
              ->  Bitmap Heap Scan on _contract_claim_match con  (cost=0.02..1.54 rows=1 width=542) (actual time=0.033..0.040 rows=2 loops=86752)
                    Recheck Cond: ((r.contractee_company_ids && contractee_company_id) AND (r.product_ids && product_id))
                    Filter: ((r.delivery_date <@ date_range) AND ((r.distributor_company_ids && distributor_company_id) OR (distributor_company_id IS NULL)))
                    Rows Removed by Filter: 8
                    Heap Blocks: exact=793072
                    ->  Bitmap Index Scan on idx_tmp_contract_claim_match  (cost=0.00..0.02 rows=1 width=0) (actual time=0.023..0.023 rows=10 loops=86752)
                          Index Cond: ((r.contractee_company_ids && contractee_company_id) AND (r.product_ids && product_id))
Planning time: 0.804 ms
Execution time: 4106.043 ms

​​

Re: [PERFORM] GIN index not used if created in the same transaction as query

От
Tom Lane
Дата:
Adam Brusselback <adambrusselback@gmail.com> writes:
> I have a function which builds two temp tables, fills each with data (in
> multiple steps), creates a gin index on one of the tables, analyzes each
> table, then runs a query joining the two.
> My issue is, I am getting inconsistent results for if the query will use
> the index or not (with the exact same data each time, and no differences in
> the stats stored on the table between using the index or not).

Does the "multiple steps" part involve UPDATEs on pre-existing rows?
Do the updates change the column(s) used in the gin index?

What this sounds like is that you're getting "broken HOT chains" in which
there's not a unique indexable value among the updated versions of a given
row, so there's an interval in which the new index isn't usable for
queries.  If that's the correct diagnosis, what you need to do is create
the gin index before you start populating the table.  Fortunately, that
shouldn't create a really horrid performance penalty, because gin index
build isn't optimized all that much anyway compared to just inserting
the data serially.

            regards, tom lane


Re: [PERFORM] GIN index not used if created in the same transactionas query

От
Adam Brusselback
Дата:
Does the "multiple steps" part involve UPDATEs on pre-existing rows?
Do the updates change the column(s) used in the gin index?
 
 Yes they do, however the updates happen prior to the index creation.
 
I just tried, and that looks like the solution.  I really appreciate your help on this.

Is there any easy way I can know if an index is usable or not? Are there any catalog views or anything I could check that in?

Thanks,
-Adam

Re: [PERFORM] GIN index not used if created in the same transaction as query

От
Tom Lane
Дата:
Adam Brusselback <adambrusselback@gmail.com> writes:
> Is there any easy way I can know if an index is usable or not? Are there
> any catalog views or anything I could check that in?

IIRC, you can look at pg_index.indcheckxmin --- if that's set, then
the index had broken HOT chains during creation and may not be usable
right away.  Telling whether your own transaction can use it is harder
from SQL level, but if you're in the same transaction that made the
index then the answer is probably always "no" :-(

            regards, tom lane