Re: [PERFORM] query of partitioned object doesnt use index in qa

Поиск
Список
Период
Сортировка
От Mike Broers
Тема Re: [PERFORM] query of partitioned object doesnt use index in qa
Дата
Msg-id CAB9893gy7GC3S-_4raZ6b=Mpxn+j37X8j9hacwiicENsEOTojw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PERFORM] query of partitioned object doesnt use index in qa  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-performance
Very helpful thank you for the additional insight - I'd never checked into pg_stats and that does reveal a difference in the distribution of the validation_status_code between qa and production:

prod:
│ most_common_vals       │ {P,F}                  │
│ most_common_freqs      │ {0.925967,0.000933333} │
│ histogram_bounds       │ ❏                      │
│ correlation            │ 0.995533               │

qa:
│ most_common_vals │ {P} │ 
│ most_common_freqs │ {0.861633} │ 
│ histogram_bounds │ ❏ │ 
│ correlation │ 0.999961 │ 

so the way I am reading this is that there is likely no sensible way to avoid postgres thinking it will just have to scan the whole table because of these statistics.  I can force it by setting session parameters for this particular query but I probably shouldnt be looking at system settings to brutally force random fetches.

thanks again for the assistance!



On Wed, Sep 20, 2017 at 6:05 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 21 September 2017 at 04:15, Mike Broers <mbroers@gmail.com> wrote:
> Ultimately I think this is just highlighting the need in my environment to
> set random_page_cost lower (we are on an SSD SAN anyway..), but I dont think
> I have a satisfactory reason by the row estimates are so bad in the QA
> planner and why it doesnt use that partition index there.

Without the index there are no stats to allow the planner to perform a
good estimate on "e.body->>'SID' is not null", so it applies a default
of 99.5%. So, as a simple example, if you have a partition with 1
million rows. If you apply 99.5% to that you get 995000 rows. Now if
you add the selectivity for "e.validation_status_code = 'P' ", let's
say that's 50%, the row estimate for the entire WHERE clause would be
497500 (1000000 * 0.995 * 0.5). Since the 99.5% is applied in both
cases, then the only variable part is validation_status_code. Perhaps
validation_status_code  = 'P' is much more common in QA than in
production.

You can look at the stats as gathered by ANALYZE with:

\x on
select * from pg_stats where tablename = 'event__99999999' and attname
= 'validation_status_code';
\x off

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tobias Gierke
Дата:
Сообщение: [PERFORM] Parallel sequential scan not supported for stored procedure withRETURN QUERY EXECUTE ?
Следующее
От: Subramaniam C
Дата:
Сообщение: [PERFORM] Slow query in JDBC