Re: Accounting for between table correlation

Поиск
Список
Период
Сортировка
От Adrian Klaver
Тема Re: Accounting for between table correlation
Дата
Msg-id 65a6db71-9182-1f0a-02c7-39cd011b34dd@aklaver.com
обсуждение исходный текст
Ответ на Re: Accounting for between table correlation  (Alexander Stoddard <alexander.stoddard@gmail.com>)
Список pgsql-general
On 1/15/21 9:21 AM, Alexander Stoddard wrote:
> Unfortunately I'm not free to share the specific schema or the query 
> plans. They derive from an upstream vendor that is 'protective' of their 
> data model. To get to a proper example I'll need to recreate the 
> behavior with generic data in a generified schema.
> 
> I apologize for being frustratingly vague. I do feel like an idiot for 
> not at least saying this was with version PG 11.10.
> It has been beneficial to me that the replies so far appear to validate 
> my understanding that no version of postgres has cross table correlation 
> statistics in the planner.
> 
> Analyze is done frequently, and I think at sufficient sample size. The 
> context is in a large data analysis setting and the data is changed via 
> bulk ETL not OLTP. The effect on analyzing is just instability - the 
> plan can flip in either direction (without underlying data changing) 
> between giving an answer in minutes and timing out after 10s of hours. 
> That could be indicative of too small a sample but I think in this case 
> it is more the statistics aren't useful because it is cross table 
> correlation that can't be accounted for.

So to be clear, the process imports the data, then you run a query and 
it completes in x time, you then ANALYZE the same data and it runs in y 
time. Is that correct?

> 
> The 'fast plans' use parallel seq scans. The 'slow plans' is using index 
> scans. It appears a good query plan correctly predicts it should be bulk 
> processing the tables but bad ones get fooled into trashing (hard disk, 
> not SSD) by mispredicting too few rows to join between the tables.
> 
> If anyone has similar experiences and is generous enough with their time 
> to share possible solutions/work arounds then I'm most grateful. If my 
> description is too vague to be worthwhile answering then I quite 
> understand and apologize for the time wasted in reading.
> 
> Thank you.
> 
> 
> 
> 
> 
> On Fri, Jan 15, 2021 at 10:26 AM David G. Johnston 
> <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:
> 
>     On Fri, Jan 15, 2021 at 9:10 AM Thomas Kellerer <shammat@gmx.net
>     <mailto:shammat@gmx.net>> wrote:
> 
>         Atul Kumar schrieb am 15.01.2021 um 16:29:
>          > As per Ron, you are not supposed to ask your questions here.
>          >
>          > As According to him, we should keep on doing research on internet
>          > rather than asking for support directly even you have done enough
>          > research and until unless “Ron” won’t be satisfied you have to do
>          > keep on researching.
> 
>         Ron's question was perfectly valid.
> 
>         Missing and wrong statistics are one reason for the planner to
>         choose a bad execution plan.
> 
> 
>     Yeah, at first blush I didn't think analyze really mattered (and it
>     mostly doesn't because while you can keep the statistics up-to-date
>     the multi-table nature of the problem means they are only marginally
>     helpful here), but that just points out the under-specified nature
>     of the original posting.  Taken as a simple question of "is there a
>     way to work around the lack of multi-table statistics" the analyze,
>     and even the specific queries, don't matter all that much.  But it
>     also would be much more useful if the OP would choose a single
>     problematic query and show the schema, query, and explain results,
>     hopefully both good and bad, and comment on how analyze seems to
>     affect the plan choice.  But for the general question about
>     overcoming our statistics limitations the analyze point is not relevant.
> 
>     David J.
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



В списке pgsql-general по дате отправления:

Предыдущее
От: Alexander Stoddard
Дата:
Сообщение: Re: Accounting for between table correlation
Следующее
От: Adrian Klaver
Дата:
Сообщение: Re: Best tools to monitor and fine tune postgres