Re: Statistics Import and Export

From: Corey Huinker
Subject: Re: Statistics Import and Export
Date:
Msg-id: CADkLM=eo4rmN8fFnxNycQtzTnsKwBxdXbAJMTvMC_A1_7HzRdA@mail.gmail.com
In reply to: Re: Statistics Import and Export (Stephen Frost <sfrost@snowman.net>)
Responses: Re: Statistics Import and Export (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers
> In which case we're failing nearly silently, yes, there is a null returned,
> but we have no idea why there is a null returned. If I were using this
> function manually I'd want to know what I did wrong, what parameter I
> skipped, etc.

I can see it both ways and don't feel super strongly about it ... I just
know that I've had some cases where we returned an ERROR or otherwise
were a bit noisy on NULL values getting passed into a function and it
was much more on the annoying side than on the helpful side; to the
point where we've gone back and pulled out ereport(ERROR) calls from
functions before because they were causing issues in otherwise pretty
reasonable queries (consider things like functions getting pushed down
to below WHERE clauses and such...).

I don't have strong feelings either; I think we should get more input on this. Regardless, it's easy to change... for now.
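
One way that bites, as a sketch (vals is a toy table, not anything from the thread): PostgreSQL doesn't guarantee the evaluation order of ANDed quals, so an expression that raises an ERROR can run against rows a neighboring qual looked like it would screen out:

    -- assumes a toy table vals(n int) that contains a zero;
    -- the planner may evaluate the quals in either order, so the
    -- division can fail despite the n <> 0 "guard"
    SELECT * FROM vals WHERE n <> 0 AND 100 / n > 5;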

Sure.  Not a huge deal either way, was just pointing out the difference.
I do think it'd be good to match what ANALYZE does here, so checking if
the values in pg_class are different and only updating if they are,
while keeping the code for pg_statistic where it'll just always update.

I agree that mirroring ANALYZE wherever possible is the ideal.
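
For concreteness, these are the relation-level fields at issue (real pg_class columns; the skip-when-unchanged behavior is Stephen's suggestion, not what v8 currently does):

    -- the fields ANALYZE maintains in pg_class; under the suggestion,
    -- pg_set_relation_stats would skip the in-place update when the
    -- incoming values already match these
    SELECT relpages, reltuples, relallvisible
    FROM pg_class
    WHERE oid = 'foo.bar'::regclass;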

> I like the symmetry of a consistent interface, but we've already got an
> asymmetry in that the pg_class update is done non-transactionally (like
> ANALYZE does).

Don't know that I really consider that to be the same kind of thing when
it comes to talking about the interface as the other aspects we're
discussing ...

Fair.

> One persistent problem is that there is no _safe equivalent to ARRAY_IN, so
> that can always fail on us, though it should only do so if the string
> passed in wasn't a valid array input format, or the values in the array
> can't coerce to the attribute's basetype.

That would happen before we even get to being called and there's not
much to do about it anyway.

Not sure I follow you here. The ARRAY_IN function calls happen once for every non-null stavaluesN parameter, and they're done inside the function because the result type could be the base type for a domain/array type, or could be the type itself. I suppose we could move that determination to the caller, but then we'd need to call get_base_element_type() inside a client, and that seems wrong, if it's even possible.
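
To illustrate the two ways it can fail, a sketch using plain casts, which route through ARRAY_IN the same way the stavaluesN text does:

    -- the string isn't valid array input format
    SELECT '{1,2,3'::integer[];   -- ERROR: malformed array literal
    -- valid array syntax, but the elements can't be coerced to the base type
    SELECT '{a,b,c}'::integer[];  -- ERROR: invalid input syntax for type integer
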
> I should also point out that we've lost the ability to check if the export
> values were of a type, and if the destination column is also of that type.
> That's a non-issue in binary upgrades, but of course if a field changed
> from integers to text the histograms would now be highly misleading.
> Thoughts on adding a typname parameter that the function uses as a cheap
> validity check?

Seems reasonable to me.

I'd like to hear what Tomas thinks about this, as he was the initial advocate for it.
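
A rough sketch of the shape that could take (the typname parameter and its position are hypothetical, not part of v8):

    -- caller supplies the type the stats were exported against ...
    SELECT pg_set_attribute_stats('foo.bar'::regclass, 'id'::name,
                                  'integer'::name, ...);
    -- ... and the function compares it against the destination column,
    -- roughly the equivalent of:
    SELECT atttypid::regtype
    FROM pg_attribute
    WHERE attrelid = 'foo.bar'::regclass AND attname = 'id';
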
> As for pg_dump, I'm currently leading toward the TOC entry having either a
> series of commands:
>
>     SELECT pg_set_relation_stats('foo.bar'::regclass, ...);
> pg_set_attribute_stats('foo.bar'::regclass, 'id'::name, ...); ...

I'm guessing the above was intended to be SELECT ..; SELECT ..;

Yes.
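
That is, the corrected form, with each call its own statement:

    SELECT pg_set_relation_stats('foo.bar'::regclass, ...);
    SELECT pg_set_attribute_stats('foo.bar'::regclass, 'id'::name, ...);
    ...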

> Or one compound command
>
>     SELECT pg_set_relation_stats(t.oid, ...),
>          pg_set_attribute_stats(t.oid, 'id'::name, ...),
>          pg_set_attribute_stats(t.oid, 'last_name'::name, ...),
>          ...
>     FROM (VALUES('foo.bar'::regclass)) AS t(oid);
>
> The second one has the feature that if any one attribute fails, then the
> whole update fails, except, of course, for the in-place update of pg_class.
> This avoids having an explicit transaction block, but we could get that
> back by having restore wrap the list of commands in a transaction block
> (and adding the explicit lock commands) when it is safe to do so.

Hm, I like this approach as it should essentially give us the
transaction block we had been talking about wanting but without needing
to explicitly do a begin/commit, which would add in some annoying
complications.  This would hopefully also reduce the locking concern
mentioned previously, since we'd get the lock needed in the first
function call and then the others would be able to just see that we've
already got the lock pretty quickly.

True, we'd get the lock needed in the first function call, but wouldn't we also release that very lock before the subsequent call? Obviously we'd be shrinking the window in which another process could get in line and take a superior lock. And the universe of other processes that would even want a lock that blocks us is nil in the case of an upgrade, identical to existing behavior in the case of an FDW ANALYZE, and perfectly fine in the case of someone tinkering with stats.
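
For comparison, the explicit-transaction form that restore could emit when safe might look like this (the lock level is an assumption, mirroring the ShareUpdateExclusiveLock that ANALYZE takes):

    BEGIN;
    LOCK TABLE foo.bar IN SHARE UPDATE EXCLUSIVE MODE;
    SELECT pg_set_relation_stats('foo.bar'::regclass, ...);
    SELECT pg_set_attribute_stats('foo.bar'::regclass, 'id'::name, ...);
    SELECT pg_set_attribute_stats('foo.bar'::regclass, 'last_name'::name, ...);
    COMMIT;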

> Subject: [PATCH v8] Create pg_set_relation_stats, pg_set_attribute_stats.

[...]

> +Datum
> +pg_set_relation_stats(PG_FUNCTION_ARGS)

[...]

> +     ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
> +     if (!HeapTupleIsValid(ctup))
> +             elog(ERROR, "pg_class entry for relid %u vanished during statistics import",
> +                      relid);

Maybe drop the 'during statistics import' part of this message?  Also
wonder if maybe we should make it a regular ereport() instead, since it
might be possible for a user to end up seeing this?

Agreed and agreed. It was copypasta from ANALYZE.

This comment doesn't seem quite right.  Maybe it would be better if it
was in the positive, eg: Only update pg_class if there is a meaningful
change.

+1
