Re: bg worker: patch 1 of 6 - permanent process

From Markus Wanner
Subject Re: bg worker: patch 1 of 6 - permanent process
Date
Msg-id 4C91D9A2.9060304@bluegap.ch
In response to Re: bg worker: patch 1 of 6 - permanent process  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: bg worker: patch 1 of 6 - permanent process  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 09/15/2010 08:54 PM, Robert Haas wrote:
> I think that the bar for committing to another in-core replication
> solution right now is probably fairly high.

I'm not trying to convince you to accept the Postgres-R patch... at
least not now.

<showing-off>
BTW, that'd be what I call a huge patch:

bgworkers, excluding dynshmem and imessages: 34 files changed, 2910 insertions(+), 1421 deletions(-)

from there to Postgres-R: 98 files changed, 14856 insertions(+), 230 deletions(-)
</showing-off>

> I am pretty doubtful that
> our current architecture is going to get us to the full feature set
> we'd eventually like to have - multi-master, partial replication, etc.

Yes, that would be hard to do because of the (physical) format of WAL.
That's why Postgres-R uses its own (logical) wire format.

>   But we're not ever going to have ten replication solutions in core,
> so we need to think pretty carefully about what we accept.

That's very understandable.

> That
> conversation probably needs to start from the other end - is the
> overall architecture correct for us? - before we get down to specific
> patches.  On the other hand, I'm very interested in laying the
> groundwork for parallel query

Cool. Maybe we should take another look at bgworkers, as soon as a 
parallel querying feature gets planned?

> and I think there are probably a number
> of bits of architecture both from this project and Postgres-XC, that
> could be valuable contributions to PostgreSQL;

(...note that Postgres-R is license compatible, as opposed to the GPL'ed 
Postgres-XC project...)

> however, in neither
> case do I expect them to be accepted without significant modification.

Sure, that's understandable as well. I've published this part of the 
infrastructure to get some feedback as early as possible on that part of 
Postgres-R.

As you can certainly imagine, it's important for me that any
modification to such a patch from Postgres-R remains compatible with
what I use it for in Postgres-R and doesn't cripple any functionality
there, because that would probably create more work for me than not
getting the patch accepted upstream at all.

> I'm saying it's hard to think about committing any of them because
> they aren't really independent of each other or of other parts of
> Postgres-R.

As long as you don't consider imessages and dynshmem a part of 
Postgres-R, they are independent of the rest of Postgres-R in the 
technical sense.

And for any kind of parallel querying feature, imessages and dynshmem 
might be of help as well. So I currently don't see where I could 
de-couple these patches any further.

If you have a specific requirement, please don't hesitate to ask.

> I feel like there is an antagonistic thread to this conversation, and
> some others that we've had.  I hope I'm misreading that, because it's
> not my intent to piss you off.  I'm just offering my honest feedback.
> Your mileage may vary; others may feel differently; none of it is
> personal.

That's absolutely fine. I'm thankful for your feedback.

Also note that I initially didn't even want to add the bgworker patches
to the commit fest. I've de-coupled and published these separately from
Postgres-R a) in the hope of getting feedback (more than for the overall
Postgres-R patch) and b) to show others that such a facility exists and
is ready to be reused.

I didn't really expect them to get accepted to Postgres core at the 
moment. But the Postgres team normally asks for sharing concepts and 
ideas as early as possible...

> OK, I think I understand what you're trying to say now.  I guess I
> feel like the ideal architecture for any sort of solution that needs a
> pool of workers would be to keep around the workers that most recently
> proved to be useful.  Upon needing a new worker, you look for one
> that's available and already bound to the correct database.  If you
> find one, you assign him to the new task.

That's mostly how bgworkers are designed, yes. The min/max idle 
background worker GUCs allow a loose control over how many spare 
processes you want to allow hanging around doing nothing.

> If not, you find the one
> that's been idle longest and either (a) kill him off and start a new
> one that is bound to the correct database or, even better, (b) tell
> him to flush his caches and rebind to the correct database.

Hm... sorry if I didn't express this clearly. What I'm trying to say is
that (b) isn't worth implementing, because it doesn't offer enough of an
improvement over (a). The only saving would be the fork() and some
basic process initialization.

Being able to re-use a bgworker connected to the correct database 
already gives you most of the benefit, namely not having to fork() *and* 
re-connect to the database for every job.
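
As a rough illustration of the assignment policy discussed above, here is a
minimal Python sketch. It is not code from the bgworker patches: the names
`Worker`, `Pool`, `assign`, `release` and `max_workers` are hypothetical, and
fork() plus the database connect are only simulated by a counter. It reuses an
idle worker already bound to the requested database and otherwise falls back
to option (a), replacing the longest-idle worker.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Worker:
    """Hypothetical stand-in for one background worker process."""
    worker_id: int
    database: str
    idle_since: Optional[float] = None  # None means the worker is busy


@dataclass
class Pool:
    """Hypothetical coordinator-side view of the worker pool."""
    max_workers: int
    workers: List[Worker] = field(default_factory=list)
    forks: int = 0  # counts simulated fork() + connect cycles

    def _fork(self, database: str) -> Worker:
        # Stand-in for the expensive part: fork() and connect to the database.
        self.forks += 1
        worker = Worker(worker_id=self.forks, database=database)
        self.workers.append(worker)
        return worker

    def assign(self, database: str) -> Optional[Worker]:
        idle = [w for w in self.workers if w.idle_since is not None]
        # Preferred case: reuse an idle worker already bound to the database.
        for w in idle:
            if w.database == database:
                w.idle_since = None
                return w
        # Room left in the pool: start an additional worker.
        if len(self.workers) < self.max_workers:
            return self._fork(database)
        # Option (a): stop the longest-idle worker and start a replacement.
        if idle:
            victim = min(idle, key=lambda w: w.idle_since)
            self.workers.remove(victim)
            return self._fork(database)
        return None  # every worker is busy; the caller has to queue the job

    def release(self, worker: Worker, now: float) -> None:
        # Mark the worker idle, remembering when it became idle.
        worker.idle_since = now
```

In this sketch, a job for a database that already has an idle worker costs no
fork at all, which is the case that matters most. Option (b) would only save
the simulated `_fork` call in the replacement branch.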


Back to the technical issues: let me try to summarize the feedback and
what I'm doing with it.

In general, bgworkers are of little use if autovacuum is the only
background job. I agree.

Tom raised the 'lots of databases' issue. I agree that the bgworker
infrastructure isn't optimized for such a workload, but argue that it
can be configured so that it doesn't hurt. If bgworkers ever get
accepted upstream, we'd certainly need to discuss reasonable defaults
for the relevant GUCs. Additionally, more cleverness about when to start
or stop (spare) workers from the coordinator couldn't hurt.

I had a lengthy discussion with Dimitri about whether or not bgworkers
could help him with some kind of PgQ daemon. I think we now agree that
bgworkers aren't the right tool for that job.

You are questioning whether the min_idle_bgworkers GUC is really
necessary. I'm arguing that it is necessary in Postgres-R to cover load
spikes, because starting bgworkers is slow.
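
To make the role of the idle limits concrete, here is a small Python sketch
of how a coordinator might decide when to start or stop spare workers. It is
an assumption for illustration, not the logic of the actual patch: the
function name and the `max_idle` and `max_total` parameters are hypothetical,
and only `min_idle` corresponds to the min_idle_bgworkers GUC mentioned above.

```python
def spare_worker_adjustment(idle: int, total: int, min_idle: int,
                            max_idle: int, max_total: int) -> int:
    """Return how many workers to start (positive) or stop (negative)."""
    if idle < min_idle:
        # Pre-start workers so a load spike doesn't have to wait for slow
        # worker startup, but never exceed the overall cap.
        return max(0, min(min_idle - idle, max_total - total))
    if idle > max_idle:
        # Too many processes hanging around doing nothing: stop the excess.
        return -(idle - max_idle)
    return 0  # within the allowed band, leave the pool alone
```

For example, with `min_idle=4` and one idle worker, the sketch asks for three
more workers as long as the overall cap allows it. Setting `min_idle` to 0
disables pre-starting entirely, which is one way to keep a 'lots of
databases' installation from paying for spare processes.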


So, overall, I've now received quite a bit of feedback. There doesn't
seem to be any stumbling block in the general design of bgworkers. So
I'll happily continue to use (and refine) bgworkers for Postgres-R. And
I'm looking forward to more discussions once parallel querying gets more
serious attention.

Regards

Markus Wanner

