Re: Statistics Import and Export
| От | Andres Freund | 
|---|---|
| Тема | Re: Statistics Import and Export | 
| Дата | |
| Msg-id | xe3yjkcthy7f377zjayuckxyd62z5cekgjau3j4vp6elbicson@cx2uiilwmxyy обсуждение исходный текст | 
| Ответ на | Re: Statistics Import and Export (Corey Huinker <corey.huinker@gmail.com>) | 
| Ответы | Re: Statistics Import and Export | 
| Список | pgsql-hackers | 
Hi, On 2025-03-05 22:00:42 -0500, Corey Huinker wrote: > On Wed, Mar 5, 2025 at 9:18 PM Andres Freund <andres@anarazel.de> wrote: > > On 2025-03-05 20:54:35 -0500, Corey Huinker wrote: > > > It's been considered and not ruled out, with a "let's see how the simple > > > thing works, first" approach. Considerations are: > > > > > > * pg_stats is keyed on schemaname + tablename (which can also be indexes) > > > and we need to use that because of the security barrier > > > > I don't think that has to be a big issue, you can just make the the query > > query multiple tables at once using an = ANY(ARRAY[]) expression or such. > > > > I'm uncertain how we'd do that with (schemaname,tablename) pairs. Are you > suggesting we back the joins from pg_stats to pg_namespace and pg_class and > then filter by oids? I was thinking of one query per schema or something like that. But yea, a query to pg_namespace and pg_class wouldn't be a problem if we did it far fewer times than before. Or you could put the list of catalogs / tables to be queried into an unnest() with two arrays or such. Not sure how good the query plan for that would be, but it may be worth looking at. > > > * The stats data is kinda heavy (most common value lists, most common > > > elements lists, esp for high stattargets), which would be a considerable > > > memory impact and some of those stats might not even be needed (example, > > > index stats for a table that is filtered out) > > > > Doesn't the code currently have this problem already? Afaict the stats are > > currently all stored in memory inside pg_dump. > > > > Each call to getAttributeStats() fetches the pg_stats for one and only one > relation and then writes the SQL call to fout, then discards the result set > once all the attributes of the relation are done. I don't think that's true. For one my example demonstrated that it increases the peak memory usage substantially. That'd not be the case if the data was just written out to stdout or such. Looking at the code confirms that. The ArchiveEntry() in dumpRelationStats() is never freed, afaict. And ArchiveEntry() strdups ->createStmt, which contains the "SELECT pg_restore_attribute_stats(...)". > I don't think the query itself would be a problem, a query querying all the > > required stats should probably use PQsetSingleRowMode() or > > PQsetChunkedRowsMode(). > > > That makes sense if we get the attribute stats from the result set in the > order that we need them, and I don't know how we could possibly do that. > We'd still need a table to bsearch() and that would be huge. I'm not following - what would be the problem with a bsearch()? Compared to the stats data an array to map from oid to an index in an array of stats data data would be very small. But with the unnest() idea from above it wouldn't even be needed, you could use SELECT ... FROM unnest(schema_array, table_array) WITH ORDINALITY AS src(schemaname, tablename) ... ORDER BY ordinality or something along those lines. Greetings, Andres Freund
В списке pgsql-hackers по дате отправления: