Re: Analyzing foreign tables & memory problems
От | Albe Laurenz |
---|---|
Тема | Re: Analyzing foreign tables & memory problems |
Дата | |
Msg-id | D960CB61B694CF459DCFB4B0128514C2049FCE85@exadv11.host.magwien.gv.at обсуждение исходный текст |
Ответ на | Analyzing foreign tables & memory problems ("Albe Laurenz" <laurenz.albe@wien.gv.at>) |
Ответы |
Re: Analyzing foreign tables & memory problems
("Albe Laurenz" <laurenz.albe@wien.gv.at>)
|
Список | pgsql-hackers |
Noah Misch wrote: >> During ANALYZE, in analyze.c, functions compute_minimal_stats >> and compute_scalar_stats, values whose length exceed >> WIDTH_THRESHOLD (= 1024) are not used for calculating statistics >> other than that they are counted as "too wide rows" and assumed >> to be all different. >> >> This works fine with regular tables; >> With foreign tables the situation is different. Even though >> values exceeding WIDTH_THRESHOLD won't get used, the complete >> rows will be fetched from the foreign table. This can easily >> exhaust maintenance_work_mem. >> I can think of two remedies: >> 1) Expose WIDTH_THRESHOLD in commands/vacuum.h and add documentation >> so that the authors of foreign data wrappers are aware of the >> problem and can avoid it on their side. >> This would be quite simple. > Seems reasonable. How would the FDW return an indication that a value was > non-NULL but removed due to excess width? The FDW would return a value of length WIDTH_THRESHOLD+1 that is long enough to be recognized as too long, but not long enough to cause a problem. > Not all databases can cheaply filter out wide column values; by the time the > remote side has an exact width, the remote I/O damage may already be done. To > dodge that problem, when a column has "SET STATISTICS 0", the FDW should be > able to completely omit reading it. (I haven't studied the API needs, if any, > to make that possible.) Depending on the capabilities of the remote side, a FDW can do more or less intelligent things to avoid the problem. But it must know WIDTH_THRESHOLD. Disabling statistics for a column as a workaround is an interesting idea, but would be more work for the FDW writer and the user. >> 2) Instead of one callback that returns all sample rows, have >> a callback that just returns the next table row (or the next >> table row from a subset of the pages of the table in the >> internal case). This function could be called in a short-lived >> memory context. Vitter's algorithm for selecting a sample >> and the truncation of excessively long values would then be >> handled in analyze.c. >> This would avoid the problem completely and make it easier >> to write a foreign data wrapper. > This solves the (in your downthread example) 30 GiB of memory consumption, but > you'll still read 30 GiB on the remote side and ship it all over the network. > To call this fixed, we'll need something like (1) that lets the FDW limit > volume at the remote side. You are right. I guess the first idea is the more promising one. Yours, Laurenz Albe
В списке pgsql-hackers по дате отправления: