Re: Why does the query planner use two full indexes, when a dedicated partial index exists?

Поиск
Список
Период
Сортировка
От Richard Neill
Тема Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
Дата
Msg-id 50D5EDDF.6010702@richardneill.org
обсуждение исходный текст
Ответ на Re: Why does the query planner use two full indexes, when a dedicated partial index exists?  (Jeff Janes <jeff.janes@gmail.com>)
Ответы Re: Why does the query planner use two full indexes, when a dedicated partial index exists?  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-performance
>     I've now installed 9.2. As you said, thanks to the change in 9.2 it
>     initially prefers the partial index.
>
>     BUT, after 1 cycle of inserting 500k rows, then deleting them all,
>     then starting to insert again, I find that the planner has reverted
>     to the former bad behaviour.
>
>
> Presumably the real work load has this type of turn over happen one row
> at a time, rather than all in one giant mass update transaction, right?
>   That makes a big difference in the way space is re-used.

Sorry - I meant a "real" workload here. I replayed a whole day's worth
of real data into the DB, and that's what I meant by a cycle. Everything
was row-at-a-time.
(It currently takes about an hour to do this)

>
>     Reindexing only takes a couple of seconds, and restores correctness.
>
>
> Even your slow query is pretty fast.  If you can't afford that, can you
> afford to take an exclusive lock for a couple of seconds every few minutes?

Yes, I can. If that's the root cause, I'll do that. But it seems to me
that I've stumbled upon some rather awkward behaviour that I need to
understand fully, and if the index is bloating that badly and that
quickly, then perhaps it's a PG bug (or at least cause for a logfile
warning).

BTW, The index has gone from 16kB to 4.5MB in 6 hours of runtime today.
It still only has 252 matching rows.


>     What's going on? Do I need to run reindex in a cron-job? I thought
>     that reindex wasn't "normally" needed, and that index bloat happened
>     only after every row had changed value hundreds of times.
>
>
> The partial index is highly leveraged.  If every tuple in the table is
> updated once, that amounts to every tuple in the index being updated
> 25,000 times.

How so? That sounds like O(n_2) behaviour.


>
> For the same reason, it is probably not getting vacuum often enough.
>   The default settings have the table vacuumed once 20% of its rows
> turns over, but that means the partial index has been turned over many
> many times.  You could crank down the auto-vacuum settings for that
> table, or run manual vacuum with a cron job.
>
> Vacuum will not unbloat the index, but if you run it often enough it
> will keep the bloat from getting too bad in the first place.

Thanks. I've reduced  autovacuum_vacuum_scale_factor from 0.2 to 0.05
(and set autovacuum_analyze_scale_factor = 0.05 for good measure)

As I understand it, both of these can run in parallel, and I have 7
cores usually idle, while the other is maxed out.

> But what I think I'd do is change one of your full indexes to contain
> the other column as well, and get rid of the partial index.  It might
> not be quite as efficient as the partial index might theoretically be,
> but it should be pretty good and also be less fragile.

I'll try that.

Thanks,

Richard


В списке pgsql-performance по дате отправления:

Предыдущее
От: "Kevin Grittner"
Дата:
Сообщение: Re: Slow queries after vacuum analyze
Следующее
От: Richard Neill
Дата:
Сообщение: Re: Why does the query planner use two full indexes, when a dedicated partial index exists?