Re: Accounting for between table correlation

From: Alexander Stoddard
Subject: Re: Accounting for between table correlation
Msg-id: CADDNc-B7fHboEt8NZk4PPcXL6KmEpY9qgXywPs656_h_FThBWQ@mail.gmail.com
In reply to: Re: Accounting for between table correlation  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses: Re: Accounting for between table correlation  (Adrian Klaver <adrian.klaver@aklaver.com>)
           Re: Accounting for between table correlation  (Michael Lewis <mlewis@entrata.com>)
List: pgsql-general
Unfortunately I'm not free to share the specific schema or the query plans. They derive from an upstream vendor that is 'protective' of its data model. To get to a proper example I'll need to recreate the behavior with generic data in a genericized schema.
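
In the meantime, here is a minimal sketch of the kind of genericized reproduction I have in mind (all table and column names here are invented, not the vendor's):

  CREATE TABLE parent AS
    SELECT g AS id, g % 100 AS region
    FROM generate_series(1, 1000000) AS g;

  CREATE TABLE child AS
    SELECT g AS parent_id, g % 100 AS region
    FROM generate_series(1, 1000000) AS g;

  ANALYZE parent;
  ANALYZE child;

  -- Each region filter alone is ~1% selective and each table's own
  -- statistics are accurate, but the planner multiplies the two
  -- selectivities as if they were independent. Because region is
  -- perfectly correlated across the join, the join output is
  -- underestimated by roughly 100x (estimated ~100 rows, actual 10000).
  EXPLAIN
  SELECT *
  FROM parent p
  JOIN child c ON c.parent_id = p.id
  WHERE p.region = 3 AND c.region = 3;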

I apologize for being frustratingly vague, and I feel like an idiot for not at least saying this is with PG 11.10.
It has been valuable to me that the replies so far appear to confirm my understanding that no version of Postgres has cross-table correlation statistics in the planner.
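
The closest built-in facility I'm aware of is extended statistics (CREATE STATISTICS, available since PG 10), but those only relate columns within a single table; nothing spans a join. A sketch against the toy tables above:

  -- Captures dependencies between one table's columns, which helps
  -- when several predicates hit the same table, but there is no
  -- equivalent object that relates columns across two joined tables.
  CREATE STATISTICS parent_dep_stats (dependencies, ndistinct)
    ON id, region FROM parent;
  ANALYZE parent;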

ANALYZE is done frequently, and I think at a sufficient sample size. The context is a large data-analysis setting where data is changed via bulk ETL, not OLTP. The effect of re-analyzing is just instability: the plan can flip in either direction (without the underlying data changing) between giving an answer in minutes and timing out after tens of hours. That could be indicative of too small a sample, but I think in this case the statistics simply aren't useful, because the problem is cross-table correlation that they cannot account for.
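
For reference, raising the per-column sample size is straightforward, which is part of why I doubt sample size is the problem here (the target of 1000 is just an example; the default is 100):

  ALTER TABLE parent ALTER COLUMN region SET STATISTICS 1000;
  ANALYZE parent;  -- re-sample with the larger target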

The 'fast' plans use parallel seq scans; the 'slow' plans use index scans. It appears a good query plan correctly predicts that it should be bulk-processing the tables, while a bad one gets fooled into thrashing (hard disk, not SSD) by predicting too few rows to join between the tables.
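
For what it's worth, comparing estimated to actual row counts is how I've been spotting the misprediction, and pinning planner settings per session is the bluntest workaround I know of (sketched against the toy tables above):

  -- On a slow run, look for nodes where estimated rows << actual rows:
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT count(*)
  FROM parent p
  JOIN child c ON c.parent_id = p.id
  WHERE p.region = 3 AND c.region = 3;

  -- Blunt per-session workaround while the estimates are wrong:
  SET enable_nestloop = off;  -- steer away from the index-driven
                              -- nested-loop plan
  -- ... run the problem query here ...
  RESET enable_nestloop;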

If anyone has similar experiences and is generous enough with their time to share possible solutions or workarounds, I'm most grateful. If my description is too vague to be worth answering then I quite understand, and I apologize for the time wasted in reading.

Thank you.

On Fri, Jan 15, 2021 at 10:26 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Fri, Jan 15, 2021 at 9:10 AM Thomas Kellerer <shammat@gmx.net> wrote:
Atul Kumar wrote on 15.01.2021 at 16:29:
> As per Ron, you are not supposed to ask your questions here.
>
> According to him, we should keep on doing research on the internet
> rather than asking for support directly, even when you have done
> enough research, and unless “Ron” is satisfied you have to keep
> on researching.

Ron's question was perfectly valid.

Missing and wrong statistics are one reason for the planner to choose a bad execution plan.

Yeah, at first blush I didn't think analyze really mattered (and it mostly doesn't: while you can keep the statistics up to date, the multi-table nature of the problem means they are only marginally helpful here), but that just points out the under-specified nature of the original posting. Taken as the simple question "is there a way to work around the lack of multi-table statistics?", the analyze behavior, and even the specific queries, don't matter all that much.

But it would be much more useful if the OP chose a single problematic query and showed the schema, the query, and the explain results, hopefully both good and bad, and commented on how analyze seems to affect the plan choice. For the general question about overcoming our statistics limitations, though, the analyze point is not relevant.

David J.
