Re: [PERFORM] query of partitioned object doesnt use index in qa
| От | Mike Broers | 
|---|---|
| Тема | Re: [PERFORM] query of partitioned object doesnt use index in qa | 
| Дата | |
| Msg-id | CAB9893gy7GC3S-_4raZ6b=Mpxn+j37X8j9hacwiicENsEOTojw@mail.gmail.com обсуждение исходный текст  | 
		
| Ответ на | Re: [PERFORM] query of partitioned object doesnt use index in qa (David Rowley <david.rowley@2ndquadrant.com>) | 
| Список | pgsql-performance | 
Very helpful thank you for the additional insight - I'd never checked into pg_stats and that does reveal a difference in the distribution of the validation_status_code between qa and production:
prod:
│ most_common_vals       │ {P,F}                  │
│ most_common_freqs      │ {0.925967,0.000933333} │
│ histogram_bounds       │ ❏                      │
│ correlation            │ 0.995533               │
qa:
│ most_common_vals │ {P} │ 
│ most_common_freqs │ {0.861633} │
│ histogram_bounds │ ❏ │
│ correlation │ 0.999961 │
│ most_common_freqs │ {0.861633} │
│ histogram_bounds │ ❏ │
│ correlation │ 0.999961 │
so the way I am reading this is that there is likely no sensible way to avoid postgres thinking it will just have to scan the whole table because of these statistics.  I can force it by setting session parameters for this particular query but I probably shouldnt be looking at system settings to brutally force random fetches.
thanks again for the assistance!
On Wed, Sep 20, 2017 at 6:05 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 21 September 2017 at 04:15, Mike Broers <mbroers@gmail.com> wrote:
> Ultimately I think this is just highlighting the need in my environment to
> set random_page_cost lower (we are on an SSD SAN anyway..), but I dont think
> I have a satisfactory reason by the row estimates are so bad in the QA
> planner and why it doesnt use that partition index there.
Without the index there are no stats to allow the planner to perform a
good estimate on "e.body->>'SID' is not null", so it applies a default
of 99.5%. So, as a simple example, if you have a partition with 1
million rows. If you apply 99.5% to that you get 995000 rows. Now if
you add the selectivity for "e.validation_status_code = 'P' ", let's
say that's 50%, the row estimate for the entire WHERE clause would be
497500 (1000000 * 0.995 * 0.5). Since the 99.5% is applied in both
cases, then the only variable part is validation_status_code. Perhaps
validation_status_code = 'P' is much more common in QA than in
production.
You can look at the stats as gathered by ANALYZE with:
\x on
select * from pg_stats where tablename = 'event__99999999' and attname
= 'validation_status_code';
\x off
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
В списке pgsql-performance по дате отправления: