RE: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Поиск
Список
Период
Сортировка
От tsunakawa.takay@fujitsu.com
Тема RE: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
Дата
Msg-id OSAPR01MB29778635FA73F653E1A82DE0FE639@OSAPR01MB2977.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-committers
From: Robert Haas <robertmhaas@gmail.com>
> Yeah, exactly. I don't think it's super-easy to understand exactly how
> to make that work well for something like this. It would be easy
> enough to set a flag in the relcache whose value is computed the first
> time we need it and is then consulted every time after that, and you
> just invalidate it based on sinval messages. But, if you go with that
> design, you've got a big problem: now an insert has to lock all the
> tables in the partitioning hierarchy to decide whether it can run in
> parallel or not, and we do not want that. We want to be able to just
> lock the partitions we need, so really, we want to be able to test for
> parallel-safety without requiring a relation lock, or only requiring
> it on the partitioned table itself and not all the partitions.
> However, that introduces a race condition, because if you ever check
> the value of the flag without a lock on the relation, then you can't
> rely on sinval to blow away the cached state. I don't have a good
> solution to that problem in mind right now, because I haven't really
> devoted much time to thinking about it, but I think it might be
> possible to solve it with more thought.

One problem with caching the result is that the first access in each session has to experience the slow processing.
Somesevere customers of our proprietary database, which is not based on Postgres, have requested to eliminate even the
overheadassociated with the first access, and we have provided features for them.  As for the data file, users can use
pg_prewam. But what can we recommend users to do in this case?  Maybe the logon trigger feature, which is ready for
committerin PG 14, can be used to allow users to execute possible queries at session start (or establishing a
connectionpool), but I feel it's inconvenient.
 


> But if I had thought about it and had not come up with anything better
> than what you committed here, I would have committed nothing, and I
> think that's what you should have done, too. This patch is full of
> grotty hacks. Just to take one example, consider the code that forces
> a transaction ID assignment prior to the operation. You *could* have
> solved this problem by introducing new machinery to make it safe to
> assign an XID in parallel mode. Then, we'd have a fundamental new
> capability that we currently lack. Instead, you just force-assigned an
> XID before entering parallel mode. That's not building any new
> capability; that's just hacking around the lack of a capability to
> make something work, while ignoring the disadvantages of doing it that
> way, namely that sometimes an XID will be assigned for no purpose.

Regarding the picked xid assignment, I didn't think it's so grotty.  Yes, in fact, I felt it's a bit unclean.  But it's
onlya single line of code.  With a single line of code, we can provide great value to users.  Why don't we go for it?
Asdiscussed in the thread, the xid is wasted only when the source data is empty, which is impractical provided that the
userwants to load much data probably for ETL.
 

(I'm afraid "grotty" may be too strong a word considering the CoC statement "We encourage thoughtful, constructive
discussionof the software and this community, their current state, and possible directions for development. The focus
ofour discussions should be the code and related technology, community projects, and infrastructure.")
 


> Likewise, the XXX comment you added to max_parallel_hazard_walker
> claims that some of the code introduced there is to compensate for an
> unspecified bug in the rewriter. I'm a bit skeptical that the comment
> is correct, and there's no way to find out because the comment doesn't
> say what the bug supposedly is, but let's just say for the sake of
> argument that it's true. Well, you *could* have fixed the bug, but
> instead you hacked around it, and in a relatively expensive way that
> affects every query with a CTE in it whether it can benefit from this
> patch or not. That's not a responsible way of maintaining the core
> PostgreSQL code.

It'd be too sad if we have to be bothered by an existing bug and give up an attractive feature.  Adding more
explanationin the comment is OK?  Anyway, I think we can separate this issue.
 


> I also agree with Andres's criticism of the code in
> target_rel_index_max_parallel_hazard entirely. It's completely
> unacceptable to be doing index_open() here. If you don't understand
> the design of the planner well enough to know why that's not OK, then
> you shouldn't be committing patches like this.

This sounds like something to address.  I have to learn...


    Regards
Takayuki     Tsunakawa
                        


В списке pgsql-committers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode
Следующее
От: Greg Nancarrow
Дата:
Сообщение: Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode