Обсуждение: PostgreSQL does not choose my indexes well

Поиск
Список
Период
Сортировка

PostgreSQL does not choose my indexes well

От
Arcadio Ortega Reinoso
Дата:
Hi,

I don't understand why postgresql doesn't use clearly the most optimal 
index for a query in PLAN.

Can you help me?


create table public.tabla
(
     cod_tabla bigint not null,
     tabla varchar(31) not null,
     constraint pk_tabla primary key (cod_tabla)
);


create table public.entidad
(
     cod_entidad bigint not null,
     cod_tabla bigint not null,
     cod_entidad_tabla bigint not null,
     constraint pk_entidad primary key (cod_entidad),
     constraint fk_tabla_entidad foreign key (cod_tabla)
         references public.tabla (cod_tabla) match simple
         on update cascade
         on delete cascade
);


CREATE INDEX idx_tabla_entidad
     ON public.entidad USING btree
     (cod_tabla ASC NULLS LAST);

CREATE INDEX idx_entidad_tabla_4
     ON public.entidad USING btree
     (cod_entidad_tabla ASC NULLS LAST)
     INCLUDE(cod_entidad, cod_tabla, cod_entidad_tabla)
     WHERE cod_tabla::bigint = 4;




SELECT count(*) from entidad;
34.413.354

SELECT count(*) from entidad where cod_tabla = 4;
1.409.985



explain (analyze, buffers, format text) select * from entidad where 
cod_tabla = 4


Index Scan using idx_tabla_entidad on entidad (cost=0.56..51121.41 
rows=1405216 width=20) (actual time=0.037..242.609 rows=1409985 loops=1)
   Index Cond: ((cod_tabla)::bigint = 4)
   Buffers: shared hit=12839
Planning Time: 0.158 ms
Execution Time: 311.828 ms



Why postgresql doesnt use the index idx_entidad_tabla_4?????

Thanks in advance





Re: PostgreSQL does not choose my indexes well

От
Thomas Kellerer
Дата:
> CREATE INDEX idx_tabla_entidad
>     ON public.entidad USING btree
>     (cod_tabla ASC NULLS LAST);
>
> CREATE INDEX idx_entidad_tabla_4
>     ON public.entidad USING btree
>     (cod_entidad_tabla ASC NULLS LAST)
>     INCLUDE(cod_entidad, cod_tabla, cod_entidad_tabla)
>     WHERE cod_tabla::bigint = 4;
>
>
> SELECT count(*) from entidad;
> 34.413.354
>
> SELECT count(*) from entidad where cod_tabla = 4;
> 1.409.985
>
>
> explain (analyze, buffers, format text) select * from entidad where cod_tabla = 4
> Index Scan using idx_tabla_entidad on entidad (cost=0.56..51121.41 rows=1405216 width=20) (actual time=0.037..242.609
rows=1409985loops=1) 
>   Index Cond: ((cod_tabla)::bigint = 4)
>   Buffers: shared hit=12839
> Planning Time: 0.158 ms
> Execution Time: 311.828 ms
>
>
> Why postgresql doesnt use the index idx_entidad_tabla_4?????

Because that index does not contain the column from the WHERE clause as an "indexed" column (only as an included
column).
Plus: scanning idx_tabla_entidad is more efficient because that index is smaller.

What do you think that idx_entidad_tabla_4 would be the better choice?

Thomas




Re: PostgreSQL does not choose my indexes well

От
"David G. Johnston"
Дата:
On Thursday, April 23, 2020, Thomas Kellerer <shammat@gmx.net> wrote:
> CREATE INDEX idx_tabla_entidad
>     ON public.entidad USING btree
>     (cod_tabla ASC NULLS LAST);
>
> CREATE INDEX idx_entidad_tabla_4
>     ON public.entidad USING btree
>     (cod_entidad_tabla ASC NULLS LAST)
>     INCLUDE(cod_entidad, cod_tabla, cod_entidad_tabla)
>     WHERE cod_tabla::bigint = 4;
>
>
> SELECT count(*) from entidad;
> 34.413.354
>
> SELECT count(*) from entidad where cod_tabla = 4;
> 1.409.985
>
>
> explain (analyze, buffers, format text) select * from entidad where cod_tabla = 4
> Index Scan using idx_tabla_entidad on entidad (cost=0.56..51121.41 rows=1405216 width=20) (actual time=0.037..242.609 rows=1409985 loops=1)
>   Index Cond: ((cod_tabla)::bigint = 4)
>   Buffers: shared hit=12839
> Planning Time: 0.158 ms
> Execution Time: 311.828 ms
>
>
> Why postgresql doesnt use the index idx_entidad_tabla_4?????

Because that index does not contain the column from the WHERE clause as an "indexed" column (only as an included column).

But it does match the partials index’s predicate
 
Plus: scanning idx_tabla_entidad is more efficient because that index is smaller.

Really?  The absence of 33 million rows in the partial index seems like it would compensate fully and then some for the extra included columns.
 
David J.

Re: PostgreSQL does not choose my indexes well

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Thursday, April 23, 2020, Thomas Kellerer <shammat@gmx.net> wrote:
>> Plus: scanning idx_tabla_entidad is more efficient because that index is
>> smaller.

> Really?  The absence of 33 million rows in the partial index seems like it
> would compensate fully and then some for the extra included columns.

On the other hand, an indexscan is likely to end up being effectively
random-access rather than the purely sequential access involved in
a seqscan.  (If the index was built recently, then it might not be
so bad --- but the planner doesn't know that, so it assumes that the
index leaf pages are laid out pretty randomly.)  Moreover, unless the
table is mostly marked all-visible, there will be another pile of
randomized accesses into the heap to validate visibility of the index
entries.

Bottom line is that this choice is not nearly as open-and-shut as
the OP seems to think.  In fact, it's fairly likely that this is a
badly designed index, not a well-designed one that the planner is
unaccountably failing to use.  Both covering indexes and partial
indexes are easily-misused features that can make performance worse
not better.

            regards, tom lane



Re: PostgreSQL does not choose my indexes well

От
Michael Lewis
Дата:
"unless the table is mostly marked all-visible"

Is that taken into account during planning when evaluating index scan vs sequential scan?

Re: PostgreSQL does not choose my indexes well

От
Tom Lane
Дата:
Michael Lewis <mlewis@entrata.com> writes:
>> "unless the table is mostly marked all-visible"

> Is that taken into account during planning when evaluating index scan vs
> sequential scan?

It is, although the planner's estimate is based on what the last ANALYZE
saw, which might be out-of-date.

            regards, tom lane



Re: PostgreSQL does not choose my indexes well

От
"David G. Johnston"
Дата:
On Thu, Apr 23, 2020 at 8:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Thursday, April 23, 2020, Thomas Kellerer <shammat@gmx.net> wrote:
>> Plus: scanning idx_tabla_entidad is more efficient because that index is
>> smaller.

> Really?  The absence of 33 million rows in the partial index seems like it
> would compensate fully and then some for the extra included columns.

On the other hand, an indexscan is likely to end up being effectively
random-access rather than the purely sequential access involved in
a seqscan.

I feel like I'm missing something as the OP's query is choosing indexscan - just it is choosing to scan the full index containing the searched upon field instead of a partial index that doesn't contain the field but whose predicate matches the where condition - in furtherance of a count(*) computation where the columns don't really matter.

I do get "its going to perform 1.4 million random index entries and heap lookup anyway - so it doesn't really matter" - but the first answer was "the full index is smaller than the partial" which goes against my intuition.

The sequential scan that isn't being used would have to touch 25x the number of records - so its non-preference seems reasonable.

David J.

Re: PostgreSQL does not choose my indexes well

От
Stephen Frost
Дата:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > On Thursday, April 23, 2020, Thomas Kellerer <shammat@gmx.net> wrote:
> >> Plus: scanning idx_tabla_entidad is more efficient because that index is
> >> smaller.
>
> > Really?  The absence of 33 million rows in the partial index seems like it
> > would compensate fully and then some for the extra included columns.
>
> On the other hand, an indexscan is likely to end up being effectively
> random-access rather than the purely sequential access involved in
> a seqscan.

An indexscan is what was chosen though, so this doesn't really seem to
be a question of index scan vs. seq scan, it's a question of why one
index vs. another, though it seems a bit odd that we'd pick a regular
index scan instead of a BitmapHeap/Index scan.

> (If the index was built recently, then it might not be
> so bad --- but the planner doesn't know that, so it assumes that the
> index leaf pages are laid out pretty randomly.)  Moreover, unless the
> table is mostly marked all-visible, there will be another pile of
> randomized accesses into the heap to validate visibility of the index
> entries.

If the table *is* marked all visible, though, then certainly that index
will be better, and I think that's what a lot of this is coming down to
in this particular case.

Populating the tables provided based on the minimal info we got,
minimizing the numbers of pages that 'cod_tabla=4' is on:

insert into tabla select generate_series, 'abcdef' from generate_series(1,20);
insert into entidad select generate_series, 4, generate_series+1 from generate_series(1,1409985);
insert into entidad select generate_series+1409985, generate_series % 20 + 1, generate_series+1 from
generate_series(1,34413354)where generate_series % 20 + 1 <> 4; 
vacuum analyze entidad;

With this, the table is 1.7GB, idx_tabla_entidad is about 700MB, while
idx_entidad_tabla_4 is only 81MB.

With this, on v12-HEAD, PG will happily use the partial index:

=# explain (analyze, buffers) select * from entidad where cod_tabla = 4;
                                                                     QUERY PLAN
                             

-----------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_entidad_tabla_4 on entidad  (cost=0.43..55375.20 rows=1422085 width=24) (actual
time=0.050..144.745rows=1409985 loops=1) 
   Heap Fetches: 0
   Buffers: shared hit=8497
 Planning Time: 0.338 ms
 Execution Time: 183.081 ms
(5 rows)

Dropping that index and then running it again shows:

=# explain (analyze, buffers) select * from entidad where cod_tabla = 4;
                                                                  QUERY PLAN
                      

----------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on entidad  (cost=26641.72..608515.59 rows=1422085 width=24) (actual time=102.844..242.522
rows=1409985loops=1) 
   Recheck Cond: (cod_tabla = 4)
   Heap Blocks: exact=8981
   Buffers: shared read=12838
   ->  Bitmap Index Scan on idx_tabla_entidad  (cost=0.00..26286.20 rows=1422085 width=0) (actual time=101.969..101.969
rows=1409985loops=1) 
         Index Cond: (cod_tabla = 4)
         Buffers: shared read=3857
 Planning Time: 0.264 ms
 Execution Time: 277.854 ms
(9 rows)

If we spread out where the 'cod_tabla=4' tuples are, the partial index
is still used (note that we end up with more like 1.7M tuples instead of
1.4M, but I don't think that's terribly relevant):

truncate entidad;
insert into entidad select generate_series, generate_series % 20 + 1, generate_series+1 from
generate_series(1,34413354);

=# explain (analyze, buffers) select * from entidad where cod_tabla = 4;
                                                                     QUERY PLAN
                             

-----------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_entidad_tabla_4 on entidad  (cost=0.43..65231.31 rows=1664459 width=24) (actual
time=0.036..185.171rows=1720668 loops=1) 
   Heap Fetches: 0
   Buffers: shared hit=10375
 Planning Time: 0.247 ms
 Execution Time: 233.205 ms
(5 rows)

Things get a lot worse when we drop that partial index:

drop index idx_entidad_tabla_4;

=# explain (analyze, buffers) select * from entidad where cod_tabla = 4;
                                                                  QUERY PLAN
                      

----------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on entidad  (cost=30088.12..270087.86 rows=1664459 width=24) (actual time=163.418..1465.733
rows=1720668loops=1) 
   Recheck Cond: (cod_tabla = 4)
   Heap Blocks: exact=219194
   Buffers: shared read=223609
   ->  Bitmap Index Scan on idx_tabla_entidad  (cost=0.00..29672.01 rows=1664459 width=0) (actual time=128.544..128.544
rows=1720668loops=1) 
         Index Cond: (cod_tabla = 4)
         Buffers: shared read=4415
 Planning Time: 0.094 ms
 Execution Time: 1515.066 ms
(9 rows)

To get the kind of plan that the OP got, I dropped random_page_cost to 1.0:

=# explain (analyze, buffers) select * from entidad where cod_tabla = 4;
                                                                   QUERY PLAN
                        

------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_tabla_entidad on entidad  (cost=0.56..251946.06 rows=1664459 width=24) (actual
time=0.216..1236.371rows=1720668 loops=1) 
   Index Cond: (cod_tabla = 4)
   Buffers: shared read=223609
 Planning Time: 0.192 ms
 Execution Time: 1283.460 ms
(5 rows)

Even in that case though, when I recreate the partial index:

=# explain (analyze, buffers) select * from entidad where cod_tabla = 4;
                                                                     QUERY PLAN
                             

-----------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_entidad_tabla_4 on entidad  (cost=0.43..35033.31 rows=1664459 width=24) (actual
time=0.039..211.644rows=1720668 loops=1) 
   Heap Fetches: 0
   Buffers: shared hit=7 read=10368
 Planning Time: 0.144 ms
 Execution Time: 256.304 ms
(5 rows)

So it's not really clear what's happening in the OP's case, we'd really
need more information to figure it out, it seems to me.

> Bottom line is that this choice is not nearly as open-and-shut as
> the OP seems to think.  In fact, it's fairly likely that this is a
> badly designed index, not a well-designed one that the planner is
> unaccountably failing to use.  Both covering indexes and partial
> indexes are easily-misused features that can make performance worse
> not better.

While I agree they can be mis-used, and that it's not open-and-shut,
it's not really clear to me what's going on that's causing us to avoid
that partial index in the OP's case when we'll certainly use it in
general.  The partial index in this particular case seems like it'd be
perfectly well suited to this query and that we should be using it (as
we are in the tests I did above).

I do wonder if we are maybe missing a bet at times though, considering
that I'm pretty sure we'll always go through the index in order, and
therefore randomly, even when we don't actually need the results in
order..?  Has there been much consideration for just opening an index
and sequentially scanning it in cases like this where we have to go
through all of the index anyway and don't need the results in order?  I
get that we'd still have to consider random access costs if the VM is
out of date, but if it's not, I would think we could give such an
approach a lower cost as we'd be going through the index sequentially
instead of the normal random access that we do.

Thanks,

Stephen

Вложения

Re: PostgreSQL does not choose my indexes well

От
Stephen Frost
Дата:
Greetings,

* David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Thu, Apr 23, 2020 at 8:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > > On Thursday, April 23, 2020, Thomas Kellerer <shammat@gmx.net> wrote:
> > >> Plus: scanning idx_tabla_entidad is more efficient because that index is
> > >> smaller.
> >
> > > Really?  The absence of 33 million rows in the partial index seems like
> > it
> > > would compensate fully and then some for the extra included columns.
> >
> > On the other hand, an indexscan is likely to end up being effectively
> > random-access rather than the purely sequential access involved in
> > a seqscan.
>
> I feel like I'm missing something as the OP's query is choosing indexscan -
> just it is choosing to scan the full index containing the searched upon
> field instead of a partial index that doesn't contain the field but whose
> predicate matches the where condition - in furtherance of a count(*)
> computation where the columns don't really matter.

The actual query isn't a count(*) though, it's a 'select *'.

> I do get "its going to perform 1.4 million random index entries and heap
> lookup anyway - so it doesn't really matter" - but the first answer was
> "the full index is smaller than the partial" which goes against my
> intuition.

Yeah, I'm pretty sure the full index is quite a bit bigger than the
partial index- see my note from just a moment ago.

> The sequential scan that isn't being used would have to touch 25x the
> number of records - so its non-preference seems reasonable.

Agreed on that.

Thanks,

Stephen

Вложения

Re: PostgreSQL does not choose my indexes well

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> I do wonder if we are maybe missing a bet at times though, considering
> that I'm pretty sure we'll always go through the index in order, and
> therefore randomly, even when we don't actually need the results in
> order..?  Has there been much consideration for just opening an index
> and sequentially scanning it in cases like this where we have to go
> through all of the index anyway and don't need the results in order?

As I recall, it's unsafe to do so because of consistency considerations,
specifically there's a risk of missing or double-visiting some entries due
to concurrent index page splits.  VACUUM has some way around that, but it
doesn't work for regular data-fetching cases.  (nbtree/README has more
about this, but I don't feel like looking it up for you.)

My guess based on your results is that the OP's table *isn't* all-visible,
or at least the planner doesn't know it is.

            regards, tom lane



Re: PostgreSQL does not choose my indexes well

От
Stephen Frost
Дата:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > I do wonder if we are maybe missing a bet at times though, considering
> > that I'm pretty sure we'll always go through the index in order, and
> > therefore randomly, even when we don't actually need the results in
> > order..?  Has there been much consideration for just opening an index
> > and sequentially scanning it in cases like this where we have to go
> > through all of the index anyway and don't need the results in order?
>
> As I recall, it's unsafe to do so because of consistency considerations,
> specifically there's a risk of missing or double-visiting some entries due
> to concurrent index page splits.  VACUUM has some way around that, but it
> doesn't work for regular data-fetching cases.  (nbtree/README has more
> about this, but I don't feel like looking it up for you.)

That README isn't exactly small, but the mention of VACUUM having a
trick there helped me find this:

-------
The tricky part of this is to avoid missing any deletable tuples in the
presence of concurrent page splits: a page split could easily move some
tuples from a page not yet passed over by the sequential scan to a
lower-numbered page already passed over.  (This wasn't a concern for the
index-order scan, because splits always split right.)  To implement this,
we provide a "vacuum cycle ID" mechanism that makes it possible to
determine whether a page has been split since the current btbulkdelete
cycle started.  If btbulkdelete finds a page that has been split since
it started, and has a right-link pointing to a lower page number, then
it temporarily suspends its sequential scan and visits that page instead.
It must continue to follow right-links and vacuum dead tuples until
reaching a page that either hasn't been split since btbulkdelete started,
or is above the location of the outer sequential scan.  Then it can resume
the sequential scan.  This ensures that all tuples are visited.
-------

So the issue is with a page split happening and a tuple being moved to
an earlier leaf page, resulting in us potentially not seeing it even
though we should have during a sequential scan.  The trick that VACUUM
does seems pretty involved and would be more complicated for use for
this as it's not ok to return the same tuples multiple times (though
perhaps in a BitmapIndexScan we could handle that..).  Then again, maybe
the skipping scan mechanism that's been talked about recently would let
us avoid having to scan the entire index even in cases where the
conditional doesn't include the initial index columns, since it looks
like that might be what we're doing now.

> My guess based on your results is that the OP's table *isn't* all-visible,
> or at least the planner doesn't know it is.

Hrmpf, even then I seem to end up with an IndexOnlyScan-

=# select * from pg_visibility_map('entidad') where all_visible;
blkno | all_visible | all_frozen
-------+-------------+------------
(0 rows)

analyze entidad;

=# select relallvisible from pg_class where relname = 'entidad';
 relallvisible
---------------
             0
(1 row)

=# explain (analyze, buffers) select * from entidad where cod_tabla = 4;
                                                                      QUERY PLAN
                               

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_entidad_tabla_4 on entidad  (cost=0.43..170908.14 rows=1657114 width=24) (actual
time=0.312..3511.629rows=1720668 loops=1) 
   Heap Fetches: 3441336
   Buffers: shared hit=6444271 read=469499
 Planning Time: 2.831 ms
 Execution Time: 3563.413 ms
(5 rows)

I'm pretty suspicious that they've made some odd planner configuration
changes or something along those lines to end up with the plan they got,
or there's some reason we don't think we can use the partial index.

Thanks,

Stephen

Вложения

Re: PostgreSQL does not choose my indexes well

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> I'm pretty suspicious that they've made some odd planner configuration
> changes or something along those lines to end up with the plan they got,
> or there's some reason we don't think we can use the partial index.

Yeah, the latter is definitely a possibility.  I find the apparently
unnecessary cast in the partial-index predicate to be suspicious ---
maybe that's blocking matching to the WHERE clause?  In principle
the useless cast should have gotten thrown away, but maybe what we
were shown isn't quite exactly the real DDL.

            regards, tom lane



Re: PostgreSQL does not choose my indexes well

От
"David G. Johnston"
Дата:
On Thu, Apr 23, 2020 at 1:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I find the apparently
unnecessary cast in the partial-index predicate to be suspicious ---
maybe that's blocking matching to the WHERE clause?

I noticed that too...I suspect its related to the ANALYZE result:

Index Scan using idx_tabla_entidad on entidad (cost=0.56..51121.41
rows=1405216 width=20) (actual time=0.037..242.609 rows=1409985 loops=1)
   Index Cond: ((cod_tabla)::bigint = 4)

Since the index condition ended up cast to bigint the OP probably wrote the predicate to match.

David J.

Re: PostgreSQL does not choose my indexes well

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> I noticed that too...I suspect its related to the ANALYZE result:

> Index Scan using idx_tabla_entidad on entidad (cost=0.56..51121.41
> rows=1405216 width=20) (actual time=0.037..242.609 rows=1409985 loops=1)
>    Index Cond: ((cod_tabla)::bigint = 4)

Yeah, that *strongly* suggests that cod_tabla isn't really bigint.
I'm wondering about domains, for instance.

            regards, tom lane



Re: PostgreSQL does not choose my indexes well

От
Arcadio Ortega Reinoso
Дата:
El 23/4/20 a las 22:45, Tom Lane escribió:
> "David G. Johnston" <david.g.johnston@gmail.com> writes:
>> I noticed that too...I suspect its related to the ANALYZE result:
>> Index Scan using idx_tabla_entidad on entidad (cost=0.56..51121.41
>> rows=1405216 width=20) (actual time=0.037..242.609 rows=1409985 loops=1)
>>     Index Cond: ((cod_tabla)::bigint = 4)
> Yeah, that *strongly* suggests that cod_tabla isn't really bigint.
> I'm wondering about domains, for instance.
>
>             regards, tom lane
>
>
Actually

CREATE DOMAIN cod_pk AS bigint;

create table public.tabla
(
     cod_tabla cod_pk not null,
     tabla varchar(31) not null,
     constraint pk_tabla primary key (cod_tabla)
);


Do you think is important?


Thank you very much to all






Re: PostgreSQL does not choose my indexes well

От
Jeff Janes
Дата:
On Thu, Apr 23, 2020 at 7:36 AM Arcadio Ortega Reinoso <arcadio.ortega@gmail.com> wrote:
explain (analyze, buffers, format text) select * from entidad where
cod_tabla = 4


Index Scan using idx_tabla_entidad on entidad (cost=0.56..51121.41
rows=1405216 width=20) (actual time=0.037..242.609 rows=1409985 loops=1)
   Index Cond: ((cod_tabla)::bigint = 4)
   Buffers: shared hit=12839
Planning Time: 0.158 ms
Execution Time: 311.828 ms

In order to read 1409985 / 12839 = 109 rows per buffer page, the table must be extraordinarily well clustered on this index.  That degree of clustering is going to steal much of the thunder from the index-only scan.  But in my hands, it does still prefer the partial index with index-only scan by a cost estimate ratio of 3 to 1 (despite it actually being slightly slower) so I don't know why you don't get it being used.

This was how I populated the table:

insert into entidad select id, floor(random()*25)::int, floor(random()*10000000)::int from generate_series(1,34000000) f(id);
cluster entidad USING idx_tabla_entidad ;

0.3 seconds for 1.4 million rows is pretty good.  How much better are you hoping to get by micro-managing the planner?

To figure it out, it might help to see the  explain (analyze, buffers, format text) of the plan you want it to use.  But the only way I see to do that is to drop the other index.

If you don't want to "really" drop the index, you can drop it in a transaction, run the "explain (analyze, buffers, format text)" query, and rollback the transaction.  (Note this will lock the table for the entire duration of the transaction, so it is not something to do cavalierly in production)

Cheers,

Jeff

Re: PostgreSQL does not choose my indexes well

От
Stephen Frost
Дата:
Greetings,

* Jeff Janes (jeff.janes@gmail.com) wrote:
> In order to read 1409985 / 12839 = 109 rows per buffer page, the table must
> be extraordinarily well clustered on this index.  That degree of clustering
> is going to steal much of the thunder from the index-only scan.  But in my
> hands, it does still prefer the partial index with index-only scan by a
> cost estimate ratio of 3 to 1 (despite it actually being slightly slower)
> so I don't know why you don't get it being used.

Turns out to be because what was provided wasn't actually what was being
used- there's a domain in there and that seems to gum up the works and
make it so we don't consider the partial index as being something we can
use (see the discussion at the end of the other sub-thread).

Thanks,

Stephen

Вложения

Re: PostgreSQL does not choose my indexes well

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> Turns out to be because what was provided wasn't actually what was being
> used- there's a domain in there and that seems to gum up the works and
> make it so we don't consider the partial index as being something we can
> use (see the discussion at the end of the other sub-thread).

Some simple experiments here don't find that a domain-type column prevents
use of the partial index.  So it's still not entirely clear what's
happening for the OP.  I concur with Jeff's suggestion to try forcing
use of the desired index, and see whether it happens at all and what
the cost estimate is.

I'm also wondering exactly which Postgres version this is.

            regards, tom lane



Re: PostgreSQL does not choose my indexes well

От
Jeff Janes
Дата:


On Fri, Apr 24, 2020 at 2:33 PM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

* Jeff Janes (jeff.janes@gmail.com) wrote:
> In order to read 1409985 / 12839 = 109 rows per buffer page, the table must
> be extraordinarily well clustered on this index.  That degree of clustering
> is going to steal much of the thunder from the index-only scan.  But in my
> hands, it does still prefer the partial index with index-only scan by a
> cost estimate ratio of 3 to 1 (despite it actually being slightly slower)
> so I don't know why you don't get it being used.

Turns out to be because what was provided wasn't actually what was being
used- there's a domain in there and that seems to gum up the works and
make it so we don't consider the partial index as being something we can
use (see the discussion at the end of the other sub-thread).


Thanks.  I somehow managed to overlook the existence of the entire last 24 hours of discussion.  But if I change the type of entidad.cod_tabla to
match the domain now shown in table.cod_table, I can still get the index only scan over the partial index.  Now the cost estimate has changed so it slightly prefers the other index instead (in agreement with the original report) but usage of the partial index-only can is still possible (e.g. if I drop the single column full-table index).  I don't understand why the domain changes the estimate without changing the execution, but it isn't something that is very important to me.  I'm more interested in the index only scan is not actually much if any faster.  Even if there is no IO benefit due to the clustering, I'd still expect there to be some CPU benefit of not jumping back and forth between index pages and heap pages, but iI don't know how much effort it is worth to put into that either.

Cheers,

Jeff

Re: PostgreSQL does not choose my indexes well

От
Stephen Frost
Дата:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Turns out to be because what was provided wasn't actually what was being
> > used- there's a domain in there and that seems to gum up the works and
> > make it so we don't consider the partial index as being something we can
> > use (see the discussion at the end of the other sub-thread).
>
> Some simple experiments here don't find that a domain-type column prevents
> use of the partial index.  So it's still not entirely clear what's
> happening for the OP.  I concur with Jeff's suggestion to try forcing
> use of the desired index, and see whether it happens at all and what
> the cost estimate is.

Once burned, twice shy, I suppose- considering we weren't given the
actual DDL the first round, I'm guessing there's other differences.

> I'm also wondering exactly which Postgres version this is.

Also a good question.

Thanks,

Stephen

Вложения

Re: PostgreSQL does not choose my indexes well

От
Arcadio Ortega Reinoso
Дата:
I'm also wondering exactly which Postgres version this is.

Also a good question.

Thanks,

Stephen

postgresql-12/bionic-pgdg,now 12.2-2.pgdg18.04+1 amd64 [instalado]
postgresql-client-12/bionic-pgdg,now 12.2-2.pgdg18.04+1 amd64 
[instalado, automático]
postgresql-client-common/bionic-pgdg,bionic-pgdg,now 213.pgdg18.04+1 all 
[instalado, automático]
postgresql-common/bionic-pgdg,bionic-pgdg,now 213.pgdg18.04+1 all 
[instalado, automático]
postgresql-doc-11/bionic-pgdg,bionic-pgdg,now 11.7-2.pgdg18.04+1 all 
[instalado]



Вложения