Re: Hash join not finding which collation to use for string hashing
От | Tom Lane |
---|---|
Тема | Re: Hash join not finding which collation to use for string hashing |
Дата | |
Msg-id | 13291.1580416177@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: Hash join not finding which collation to use for string hashing (Mark Dilger <mark.dilger@enterprisedb.com>) |
Ответы |
Re: Hash join not finding which collation to use for string hashing
(Mark Dilger <mark.dilger@enterprisedb.com>)
|
Список | pgsql-hackers |
Mark Dilger <mark.dilger@enterprisedb.com> writes: > Would it make sense to catch a qual with unassigned collation > somewhere in the planner, where the qual's operator family is > estatblished, by checking if the operator family behavior is sensitive > to collations? > I’m not sure how to do that. pg_opfamily doesn’t seem to have a field for that. Can you recommend how I would proceedthere? There's no such information attached to opfamilies, which is more or less forced by the fact that individual operators don't expose it either. There's not much hope of making that better without incompatible changes in the requirements for extensions to define operators and/or operator families. > So, for = and !=, I’m looking at the definition of texteq, and it calls check_collation_set as one of the very first thingsit does. That’s where the error that annoys me comes out. But I don’t think it really needs to be doing this. Itshould first be determining if collation *matters*. But of course it matters. How do you know whether the operation is deterministic if you don't know the collation? > 3) Extend the concept of collations to collation sets. Right now, I’m only thinking about a collation set as having twovalues, the lefthand and the righthand side, but maybe there are other cases like (Left, (Left,Right)) that get builtup and need to work. Anyway, at the point in the executor that the collations don’t match, instead of passing NULLdown the line, pass in a collation set (Left, Right), and functions like texteq can see that they’re dealing with twodifferent collations and decide if they can deal with that or if they need to throw an error. Maybe this could work. I think it would get messy when bubbling up collations, but as long as you're talking about "sets" not "pairs" it might be possible to postpone collation resolution. To me, though, the main advantage of this is that we could throw a more explicit error like "collations "ja_JP" and "tr_TR" cannot be unified", since that information would still be there at runtime. I'm still pretty dubious that having texteq special-case the situation where the collations are different but all deterministic is a reasonable thing to do. One practical problem is that postponing that work to runtime could be a huge performance hit, because you'd have to do it over again on each call of the operator. I suppose some caching might be possible. Another issue is that you're still putting far too much emphasis on the fact that a hash-join plan manages to avoid this error, and ignoring the problem that a lot of other plans for the same query will not avoid it. What if the planner had chosen a merge-join, for instance? How useful is it to allow the join if things still break the moment you add an ORDER BY? regards, tom lane
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Mark DilgerДата:
Сообщение: Re: Hash join not finding which collation to use for string hashing