Обсуждение: WIP: Access method extendability
Hackers,
Postgres was initially designed to support access methods extendability. This extendability lives to present day. However, this is mostly internal in-core extendability. One can quite easily add new access method into PostgreSQL core. But if one try to implement access method as external module, he will be faced with following difficulties:
- Need to directly insert into pg_am, because of no "CREATE ACCESS METHOD" command. And no support of dependencies between am and opclasses etc.
- Module can't define xlog records. So, new am would be not WAL-logged.
The first problem is purely mechanical. Nothing prevents us to implement "CREATE ACCESS METHOD" and "DROP ACCESS METHOD" commands and support all required dependencies.
Problem of WAL is a bit more complex. According to previous discussions, we don't want to let extensions declare their own xlog records. If we let them then recovery process will depend on extensions. That is much violates reliability. Solution is to implement some generic xlog record which is able to represent difference between blocks in some general manner.
3 patches are attached:
- CREATE/DROP ACCESS METHOD commands. With support of dependencies.
- New generic xlog record type.
- Bloom contrib module as example of usage of previous two features. This module was posted few years ago by Teodor Sigaev. Now, it's wrapped as an extension and WAL-logged.
Patches are in WIP state. No documentation and very little of comments. However, it believe that it's enough to do some general concept review.
Some notes about generic xlog format. Generic xlog represent difference between pages as operations set of two kinds:
- Move memory inside the page. Optional flag is to zero gap on a previous memory location.
- Copy memory fragment of memory from xlog record to page. As an option bitwise logical operations are supported as well as plain copy.
Generic xlog can open page in two modes:
- Create mode: page is zeroed independent on its lsn.
- Update mode: page is updated only if it's lsn is lower than record lsn
Usually, xlog record is filled in critical sections when memory allocations is prohibited. Thus, user have to previously initialize it with knowledge of pages count, total operations count and total length of data.
------
With best regards,
Alexander Korotkov.
With best regards,
Alexander Korotkov.
Вложения
On 15 October 2014 13:08, Alexander Korotkov <aekorotkov@gmail.com> wrote: > Postgres was initially designed to support access methods extendability. > This extendability lives to present day. However, this is mostly internal > in-core extendability. One can quite easily add new access method into > PostgreSQL core. But if one try to implement access method as external > module, he will be faced with following difficulties: ... > Problem of WAL is a bit more complex. According to previous discussions, we > don't want to let extensions declare their own xlog records. If we let them > then recovery process will depend on extensions. That is much violates > reliability. Solution is to implement some generic xlog record which is able > to represent difference between blocks in some general manner. Thank you for progressing with these thoughts. I'm still a little uncertain about the approach, now my eyes are open to the problems of extendability. The main problem we had in the past was that GiST and GIN indexes both had faulty implementations for redo, which in some cases caused severe issues. Adding new indexes will also suffer the same problems, so I see a different starting place. The faults there raised the need for us to be able to mark specific indexes as corrupt, so that they could be avoided during Hot Standby and in normal running after promotion. Here's the order of features I think we need 1. A mechanism to mark an index as corrupt so that it won't be usable by queries. That needs to work during recovery, so we can persist a data structure which tells us which indexes are corrupt. Then something that checks whether an index is known corrupt during relcache access. So if we decide an index is bad, we record the index as corrupt and then fire a relcache invalidation. 2. Some additional code in Autovacuum to rebuild corrupt indexes at startup, using AV worker processes to perform a REINDEX CONCURRENTLY. This will give us what we need to allow an AM to behave sensibly, even in the face of its own bugs. It also gives us UNLOGGED indexes for free. Unlogged indexes means we can change the way unlogged tables behave to allow them to truncate down to the highest unchanged data at recovery, so we don't lose all the data when we crash. 3. That then allows us to move towards having indexes that are marked "changed" when we perform first DML on the table in any checkpoint cycle. Which allows us to rebuild indexes which were in the middle of being changed when we crashed. (The way we'd do that is to have an LSN on the metapage and then only write WAL for the metapage). The difference here is that they are UNLOGGED but do not get trashed on recovery unless they were in the process of changing. If we do those things, then we won't even need to worry about needing AMs to write their own WAL records. Recovery will be safe AND we won't need to go through problems of buggy persistence implementations in new types of index. Or put it another way, it will be easier to write new index AMs because we'll be able to skip the WAL part until we know we want it. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 28, 2014 at 10:22 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Or put it another way, it will be easier to write new index AMs > because we'll be able to skip the WAL part until we know we want it. I like the feature you are proposing, but I don't think that we should block Alexander from moving forward with a more-extensible WAL format. I believe that's a general need even if we get the features you're proposing, which would reduce the need for it. After all, if somebody builds an out-of-core index AM, ignoring WAL-logging, and then decides that it works well enough that they want to add WAL-logging, I think we should make that possible without requiring them to move the whole thing in-core. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 28 October 2014 14:53, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Oct 28, 2014 at 10:22 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Or put it another way, it will be easier to write new index AMs >> because we'll be able to skip the WAL part until we know we want it. > > I like the feature you are proposing, but I don't think that we should > block Alexander from moving forward with a more-extensible WAL format. > I believe that's a general need even if we get the features you're > proposing, which would reduce the need for it. After all, if somebody > builds an out-of-core index AM, ignoring WAL-logging, and then decides > that it works well enough that they want to add WAL-logging, I think > we should make that possible without requiring them to move the whole > thing in-core. I'm not proposing an alternate or additional feature. I'm saying that the first essential step in adding WAL support to new AMs is to realise that they *will* have bugs (since with the greatest respect, the last two AMs from our friends did have multiple bugs) and so we must have a mechanism that prevents such bugs from screwing everything else up. Which is the mark-corrupt-index and rebuild requirement. We skip straight to the add-buggy-AMs part at our extreme peril. We've got about 10x as many users now since the 8.x bugs and all the new users like the reputation Postgres has for resilience. I think we should put the safety net in place first before we start to climb. The patch as submitted doesn't have any safety checks for whether the WAL records refer to persistent objects defined by the AM. At the very least we need to be able to isolate an AM to only screw up their own objects. Such checks look like they'd require some careful thought and refactoring first. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hi, On 2014-10-15 16:08:38 +0400, Alexander Korotkov wrote: > Postgres was initially designed to support access methods extendability. > This extendability lives to present day. However, this is mostly internal > in-core extendability. One can quite easily add new access method into > PostgreSQL core. But if one try to implement access method as external > module, he will be faced with following difficulties: > > 1. Need to directly insert into pg_am, because of no "CREATE ACCESS > METHOD" command. And no support of dependencies between am and opclasses > etc. > 2. Module can't define xlog records. So, new am would be not WAL-logged. > > The first problem is purely mechanical. Nothing prevents us to implement > "CREATE ACCESS METHOD" and "DROP ACCESS METHOD" commands and support all > required dependencies. > > Problem of WAL is a bit more complex. According to previous discussions, we > don't want to let extensions declare their own xlog records. If we let them > then recovery process will depend on extensions. That is much violates > reliability. Solution is to implement some generic xlog record which is > able to represent difference between blocks in some general manner. I think this is a somewhat elegant way to attack this problem. But I'm not so sure it's actually sufficient. Consider e.g. how to deal with hot standby conflicts? How would you transport the knowledge that there's a xid conflict to the client? I guess my question essentially is whether it's actually sufficient for real world AMs. The other thing I'm not sure about is that I'm unconvinced that we really want external AMs... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 28, 2014 at 7:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
 
 
On 28 October 2014 14:53, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 28, 2014 at 10:22 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Or put it another way, it will be easier to write new index AMs
>> because we'll be able to skip the WAL part until we know we want it.
>
> I like the feature you are proposing, but I don't think that we should
> block Alexander from moving forward with a more-extensible WAL format.
> I believe that's a general need even if we get the features you're
> proposing, which would reduce the need for it. After all, if somebody
> builds an out-of-core index AM, ignoring WAL-logging, and then decides
> that it works well enough that they want to add WAL-logging, I think
> we should make that possible without requiring them to move the whole
> thing in-core.
I'm not proposing an alternate or additional feature.
I'm saying that the first essential step in adding WAL support to new
AMs is to realise that they *will* have bugs (since with the greatest
respect, the last two AMs from our friends did have multiple bugs) and
so we must have a mechanism that prevents such bugs from screwing
everything else up. Which is the mark-corrupt-index and rebuild
requirement.
We skip straight to the add-buggy-AMs part at our extreme peril. We've
got about 10x as many users now since the 8.x bugs and all the new
users like the reputation Postgres has for resilience. I think we
should put the safety net in place first before we start to climb.
agree and we thought about this
The patch as submitted doesn't have any safety checks for whether the
WAL records refer to persistent objects defined by the AM. At the very
least we need to be able to isolate an AM to only screw up their own
objects. Such checks look like they'd require some careful thought and
refactoring first.
the patch Alexander submitted is the PoC, we wanted to hear developers opinion and
I see no principal objection to work in this direction and we'll continue to work on all
possible issues. 
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
* Andres Freund (andres@2ndquadrant.com) wrote:
> The other thing I'm not sure about is that I'm unconvinced that we
> really want external AMs...
I was wondering about this also and curious as to if there's been any
prior on-list discussion about this proposal that I've simply missed..?
Would be happy to go back and review earlier discussions, of course, but
I don't recall there being any.
Thanks,
    Stephen
			
		On 28 October 2014 16:19, Stephen Frost <sfrost@snowman.net> wrote: > Would be happy to go back and review earlier discussions, of course, but > I don't recall there being any. It depends how far back you go. I think I've had at least 2 tries at writing something, but not in last 5 years. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 28 October 2014 14:22, Simon Riggs <simon@2ndquadrant.com> wrote: > Or put it another way, it will be easier to write new index AMs > because we'll be able to skip the WAL part until we know we want it. To be clear: I am suggesting you do *less* work, not more. By allowing AMs to avoid writing WAL we get * higher performance unlogged indexes * we get fewer bugs in early days of new AMs * writers of new AMs are OK to avoid majority of hard work and hard testing So overall, we get new AMs working faster because we can skip writing the WAL code until we are certain the new AM code is useful and bug free. For example, if GIN had avoided implementing WAL it would have been easier to change on-disk representation. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Stephen Frost <sfrost@snowman.net> writes:
> * Andres Freund (andres@2ndquadrant.com) wrote:
>> The other thing I'm not sure about is that I'm unconvinced that we
>> really want external AMs...
> I was wondering about this also and curious as to if there's been any
> prior on-list discussion about this proposal that I've simply missed..?
We've touched on the issue a few times, but I don't think there's been
any attempt to define a project policy about it.
My own thought is that allowing external AMs is simply a natural
consequence of PG's general approach to extensibility, and it would
be surprising if we were to decide we didn't want to allow that.
But having said that, it's quite unclear to me that we need the
CREATE/DROP ACCESS METHOD infrastructure proposed here.  The traditional
theory about that is that if you're competent to develop an AM at all,
you can certainly manage to insert a row into pg_am manually.  I'm afraid
that we'd be adopting and maintaining thousands of lines of code that
won't ever come close to pulling their weight in usefulness, or probably
ever be fully debugged.  (The submitted patch is about 1K lines in itself,
and it doesn't appear to address any of the consequences of establishing
an expectation that AMs are something that can be dropped or modified.
Can you say "cache flush"?)
So I'd be inclined to put that part of the patch on the back burner until
there are actually multiple externally maintained AMs that could use it.
Even then, I'm not sure we want to buy into DROP ACCESS METHOD.
I think we *do* need some credible method for extensions to emit WAL
records, though.  I've not taken any close look at the code proposed
for that, but the two-sentence design proposal in the original post
sounded plausible as far as it went.
So my vote is to pursue the WAL extensibility part of this, but not the
additional SQL commands.
As for the proposed contrib module, we don't need it to test the WAL
extensibility stuff: we could just rewrite some existing core code to emit
the "extensible" WAL records instead of whatever it's emitting now.
        regards, tom lane
			
		On 2014-10-28 13:06:52 -0400, Tom Lane wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Andres Freund (andres@2ndquadrant.com) wrote: > >> The other thing I'm not sure about is that I'm unconvinced that we > >> really want external AMs... > > > I was wondering about this also and curious as to if there's been any > > prior on-list discussion about this proposal that I've simply missed..? > > We've touched on the issue a few times, but I don't think there's been > any attempt to define a project policy about it. > > My own thought is that allowing external AMs is simply a natural > consequence of PG's general approach to extensibility, and it would > be surprising if we were to decide we didn't want to allow that. It'd be entirely politicial. I agree. I'm pretty unhappy with the thought that we end up with several 'for pay' index ams out there. But then, PG is BSD style licensed. What I think we need to make absolutely sure is that we preserve the freedom to tinker with the AM functions. I think we'll very heavily curse ourselves if we can't as easily add new features there anymore. > But having said that, it's quite unclear to me that we need the > CREATE/DROP ACCESS METHOD infrastructure proposed here. The traditional > theory about that is that if you're competent to develop an AM at all, > you can certainly manage to insert a row into pg_am manually. The problem with doing that is that you not only need to add a row in pg_am, but also pg_depend. And a way to remove that row when the respective extension is dropped. Especially the latter imo changed the landscape a fair bit. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-10-28 13:06:52 -0400, Tom Lane wrote:
>> But having said that, it's quite unclear to me that we need the
>> CREATE/DROP ACCESS METHOD infrastructure proposed here.  The traditional
>> theory about that is that if you're competent to develop an AM at all,
>> you can certainly manage to insert a row into pg_am manually.
> The problem with doing that is that you not only need to add a row in
> pg_am, but also pg_depend.
(1) That's not that hard; initdb makes pg_depend entries from SQL.
(2) You only need a row in pg_depend if you provide a DROP command
that would need to pay attention to it.
> And a way to remove that row when the
> respective extension is dropped.
I'm not at all sold on the idea that we need to support dropping AMs.
I think it'd be fine to consider that installing an AM into a given
database is a one-way operation.  Then you just need to insert some
pg_depend entries that "pin" the AM's individual functions, and you're
done.
Yeah, sure, CREATE/DROP ACCESS METHOD would be cleaner.  But in this
case I'm not buying the "if you build it they will come" argument.
External AMs *can* be built without any such SQL-level support, and if
there were really much demand for them, there would be some out there.
Let's build the essential WAL support first, and leave the syntactic
sugar till we see some demand.
        regards, tom lane
			
		On 28 October 2014 17:06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > My own thought is that allowing external AMs is simply a natural > consequence of PG's general approach to extensibility, and it would > be surprising if we were to decide we didn't want to allow that. If it wasn't clear from my two earlier attempts, yes, +1 to that. I'd like to avoid all of the pain by making persistent AMs that are recoverable after a crash, rather than during crash recovery. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-10-28 17:45:36 +0000, Simon Riggs wrote: > I'd like to avoid all of the pain by making persistent AMs that are > recoverable after a crash, rather than during crash recovery. Besides the actual difficulities of supporting this, imo not being available on HS and directly after a failover essentially makes them next to useless. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 28 October 2014 17:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My own thought is that allowing external AMs is simply a natural
>> consequence of PG's general approach to extensibility, and it would
>> be surprising if we were to decide we didn't want to allow that.
> If it wasn't clear from my two earlier attempts, yes, +1 to that.
> I'd like to avoid all of the pain by making persistent AMs that are
> recoverable after a crash, rather than during crash recovery.
I think the notion of having AMs that explicitly don't have WAL support
is quite orthogonal to what's being discussed in this thread.  It might
be worth doing that just to get the hash AM into a less-weird state
(given that nobody is stepping up to the plate to fix it properly).
        regards, tom lane
			
		On 2014-10-28 13:37:33 -0400, Tom Lane wrote: > I'm not at all sold on the idea that we need to support dropping AMs. > I think it'd be fine to consider that installing an AM into a given > database is a one-way operation. Then you just need to insert some > pg_depend entries that "pin" the AM's individual functions, and you're > done. I think that'd be somewhat ugly. An extension adding such a AM would then either actively need to block dropping (e.g. by pinned entries, as you mention) or do rather odd things on recreation. I think that'd be dropping our own standards. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 10/28/14, 9:22 AM, Simon Riggs wrote: > 2. Some additional code in Autovacuum to rebuild corrupt indexes at > startup, using AV worker processes to perform a REINDEX CONCURRENTLY. I don't think loading more functionality into autovac is the right way to do that. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On Tue, Oct 28, 2014 at 8:04 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
------
With best regards,
Alexander Korotkov.
On 28 October 2014 14:22, Simon Riggs <simon@2ndquadrant.com> wrote:
> Or put it another way, it will be easier to write new index AMs
> because we'll be able to skip the WAL part until we know we want it.
To be clear: I am suggesting you do *less* work, not more.
By allowing AMs to avoid writing WAL we get
* higher performance unlogged indexes
* we get fewer bugs in early days of new AMs
* writers of new AMs are OK to avoid majority of hard work and hard testing
So overall, we get new AMs working faster because we can skip writing
the WAL code until we are certain the new AM code is useful and bug
free.
For example, if GIN had avoided implementing WAL it would have been
easier to change on-disk representation.
Major problem of changing on-disk representation is that we have to support both binary formats because of pg_upgrade. This problem is even burdened with WAL, because WAL record redo function also have to support both formats. However, it's also quite independent of WAL.
Having access methods as extensions can significantly improves situations here. Imagine, GIN was an extension. One day we decide to change its binary format. Then we can issue new extension, GIN2 for instance. User can upgrade from GIN to GIN2 in following steps:
- CREATE EXTENSION gin2;
- CREATE INDEX CONCURRENTLY [new_index] USING gin2 ([index_def]);
- DROP INDEX  CONCURRENTLY [old_index];
- DROP EXTENSION gin;
No need to write and debug the code which reads both old and new binary format. For sure, we need to support old GIN extension for some time. But, we have to support old in-core versions too. 
Unfortunately, I didn't mention this in the first post because I consider this as a serious argument for extensible AMs.
Also, I'm not sure that many users have enough of courage to use unlogged AMs. Absence of durability is only half of trouble, another half is lack of streaming replication. I think if we have unlogged GIN then external indexing engines would be used by majority of users instead of GIN.
------
With best regards,
Alexander Korotkov.
On Tue, Oct 28, 2014 at 01:51:21PM -0400, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > On 28 October 2014 17:06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> My own thought is that allowing external AMs is simply a natural > >> consequence of PG's general approach to extensibility, and it would > >> be surprising if we were to decide we didn't want to allow that. > > > If it wasn't clear from my two earlier attempts, yes, +1 to that. > > > I'd like to avoid all of the pain by making persistent AMs that are > > recoverable after a crash, rather than during crash recovery. > > I think the notion of having AMs that explicitly don't have WAL support > is quite orthogonal to what's being discussed in this thread. It might > be worth doing that just to get the hash AM into a less-weird state > (given that nobody is stepping up to the plate to fix it properly). > > regards, tom lane > Hi, I think that someone is working on the hash index WAL problem, but are coming up to speed on the whole system, which takes time. I know that I have not had a large enough block of time to spend on it either. :( Regards, Ken
* Alexander Korotkov (aekorotkov@gmail.com) wrote:
> Having access methods as extensions can significantly improves situations
> here. Imagine, GIN was an extension. One day we decide to change its binary
> format. Then we can issue new extension, GIN2 for instance. User can
> upgrade from GIN to GIN2 in following steps:
We could support this without having GIN be an extension by simply
having a GIN2 in core also, so I don't buy off on this being a good
reason for extensions to provide AMs.  For my 2c, I'm pretty happy with
the general idea of "read-old, write-new" to deal with transistions.
It's more complicated, certainly, but I don't think trying to force
users off of an old version is actually going to work all that well and
we'd just end up having to support both the old and new extensions
indefinitely anyway.
Thanks,
    Stephen
			
		On 28 October 2014 17:47, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-10-28 17:45:36 +0000, Simon Riggs wrote: >> I'd like to avoid all of the pain by making persistent AMs that are >> recoverable after a crash, rather than during crash recovery. > > Besides the actual difficulities of supporting this, imo not being > available on HS and directly after a failover essentially makes them > next to useless. Broken WAL implementations are worse than useless. I'm saying we should work on how to fix broken indexes first, before we allow a crop of new code that might cause them. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 28 October 2014 17:58, Alexander Korotkov <aekorotkov@gmail.com> wrote: > Also, I'm not sure that many users have enough of courage to use unlogged > AMs. Absence of durability is only half of trouble, another half is lack of > streaming replication. I think if we have unlogged GIN then external > indexing engines would be used by majority of users instead of GIN. Please answer the problem I have raised. How will you act when the new AM that you write breaks? How will you advise your users that the durability they sensibly desire is actually causing them data loss? I haven't opposed your ideas in this patch; I have only observed the necessary surrounding infrastructure is incomplete. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 28 October 2014 17:50, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > On 10/28/14, 9:22 AM, Simon Riggs wrote: >> >> 2. Some additional code in Autovacuum to rebuild corrupt indexes at >> startup, using AV worker processes to perform a REINDEX CONCURRENTLY. > > > I don't think loading more functionality into autovac is the right way to do > that. You'd need to explain why and/or suggest your right way. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-10-28 20:17:57 +0000, Simon Riggs wrote: > On 28 October 2014 17:47, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2014-10-28 17:45:36 +0000, Simon Riggs wrote: > >> I'd like to avoid all of the pain by making persistent AMs that are > >> recoverable after a crash, rather than during crash recovery. > > > > Besides the actual difficulities of supporting this, imo not being > > available on HS and directly after a failover essentially makes them > > next to useless. > > Broken WAL implementations are worse than useless. > > I'm saying we should work on how to fix broken indexes first, before > we allow a crop of new code that might cause them. Why do we presume all of them will be that buggy? And why is that different for nbtree, gin, gist? And how is any form of automated invalidation changing anything fundamentally? To me this is a pretty independent issue. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 28 October 2014 23:25, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-10-28 20:17:57 +0000, Simon Riggs wrote: >> On 28 October 2014 17:47, Andres Freund <andres@2ndquadrant.com> wrote: >> > On 2014-10-28 17:45:36 +0000, Simon Riggs wrote: >> >> I'd like to avoid all of the pain by making persistent AMs that are >> >> recoverable after a crash, rather than during crash recovery. >> > >> > Besides the actual difficulities of supporting this, imo not being >> > available on HS and directly after a failover essentially makes them >> > next to useless. >> >> Broken WAL implementations are worse than useless. ...because the indexes are also unavailable during HS. >> I'm saying we should work on how to fix broken indexes first, before >> we allow a crop of new code that might cause them. > > Why do we presume all of them will be that buggy? And why is that > different for nbtree, gin, gist? And how is any form of automated > invalidation changing anything fundamentally? The current system does not allow for the possibility of a corruption bug. If one occurs, the only thing an AM can do is PANIC. It has no mechanism to isolate the problem and deal with it, which affects the whole server. So the issue is one of risk of PANIC or data loss - things we have always taken strong measures against. That is all I have requested as a first step. And I request it because I remember and dealt with many bugs and user problems in earlier times of 6-9 years ago. You are also right: btree, GIN and GIST will benefit from this also. > To me this is a pretty independent issue. Initial users of GiST and GIN were rare. The clear target here is indexing of JSONB data. I don't expect the users of that to be rare. I expect adoption to be rapid and the effect of bugs to be widespread. So I see the ability to report bugs and prevent data loss as essential in the context of new AMs. Automatic fixing of the problem could be fairly easy, but might be regarded as nice to have, as long as manual fixing of the problem is easily possible. I don't regard any of this as an insult or comment that certain people write buggy code, while others are better people. Everybody has bugs and WAL code in complex new index types is complex enough that it is more likely than other places. I salute those who write innovative new code for Postgres. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 28, 2014 at 7:25 PM, Andres Freund <andres@2ndquadrant.com> wrote: > To me this is a pretty independent issue. I quite agree. As Stephen was at pains to remind me last night on another thread, we cannot force people to write the patches we think they should write. They get to pursue what they think the priorities are, not what anyone else thinks they are. Of course we can and should block patches that we think are a bad idea, or that are badly-designed or badly-implemented for what they are, but we cannot and should not block someone who feels that the first priority is A just because we think it is B or C. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 29 October 2014 09:27, Simon Riggs <simon@2ndquadrant.com> wrote: > The current system does not allow for the possibility of a corruption > bug. If one occurs, the only thing an AM can do is PANIC. It has no > mechanism to isolate the problem and deal with it, which affects the > whole server. > > So the issue is one of risk of PANIC or data loss - things we have > always taken strong measures against. That is all I have requested as > a first step. And I request it because I remember and dealt with many > bugs and user problems in earlier times of 6-9 years ago. > > You are also right: btree, GIN and GIST will benefit from this also. Since I feel this is a real concern, I will contribute this feature, outside of the current patch. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 10/28/14, 3:27 PM, Simon Riggs wrote: > On 28 October 2014 17:50, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: >> On 10/28/14, 9:22 AM, Simon Riggs wrote: >>> >>> 2. Some additional code in Autovacuum to rebuild corrupt indexes at >>> startup, using AV worker processes to perform a REINDEX CONCURRENTLY. >> >> >> I don't think loading more functionality into autovac is the right way to do >> that. > > You'd need to explain why and/or suggest your right way. Why wouldn't we register it as a background worker? Not only doesn't this have anything to do with vacuum, but it should operate differently as well: once we've rebuilt everythingthat needs to be rebuilt the process should go away until the next startup. That's the opposite of what autovacdoes. The one potential commonality I see is having a launcher process that's responsible for launching multiple workers (if wewant to be rebuilding multiple indexes at once), but AFAICT that capability is also provided by bgworker.c. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On 2014-10-29 14:33:00 -0500, Jim Nasby wrote: > On 10/28/14, 3:27 PM, Simon Riggs wrote: > >On 28 October 2014 17:50, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > >>On 10/28/14, 9:22 AM, Simon Riggs wrote: > >>> > >>>2. Some additional code in Autovacuum to rebuild corrupt indexes at > >>>startup, using AV worker processes to perform a REINDEX CONCURRENTLY. > >> > >> > >>I don't think loading more functionality into autovac is the right way to do > >>that. > > > >You'd need to explain why and/or suggest your right way. > > Why wouldn't we register it as a background worker? > > Not only doesn't this have anything to do with vacuum, but it should operate differently as well: once we've rebuilt everythingthat needs to be rebuilt the process should go away until the next startup. That's the opposite of what autovacdoes. That's pretty much how autovac workers work. Do stuff until not needed anymore. The difference is that you have a process that starts them. It'd not be a good idea to throw this together with user defined bgworkers because there's a finite number of slots for them. So at the very least we'd need a separate pool for system bgworkers. Which would persistently take up resources (PGPROC entries et al). So it seems better to use the existing pool of autovac workers. > The one potential commonality I see is having a launcher process that's responsible for launching multiple workers (ifwe want to be rebuilding multiple indexes at once), but AFAICT that capability is also provided by bgworker.c. There really is no need to use bgworkers for builtin things. They are useful because they allow extensions to do what in core already could do for a long time. Greetings, Andres Freund PS: You mails would be easier to read if htey had sane line lenghts... -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
<div dir="ltr"><div class="gmail_extra"><div class="gmail_extra">Hi!</div><div class="gmail_extra"><br /></div><div class="gmail_extra">Thankseverybody for discussion. Sorry for delay. I'll try to address most of questions arised in thisdiscussion.</div><div class="gmail_extra"><br /></div><div class="gmail_extra">In general, I'm agree with concept ofmarking index as invalid in certain cases.</div><div class="gmail_extra">I see following possible consequences of buggyWAL-logged custom AM:</div><div class="gmail_extra">1) Server crash during insert/update.</div><div class="gmail_extra">2)Server crash during select. </div><div class="gmail_extra">3) Invalid query answers.</div><div class="gmail_extra">4)Server crash during vacuum.</div><div class="gmail_extra">5) Server crash in recovery.</div><div class="gmail_extra"><br/></div><div class="gmail_extra">#5 is totally unacceptable. I've tried to design generic WAL recordso that it should be always possible to purely mechanically apply the record. It's always possible to move piece ofmemory inside the page. It's always possible to copy piece of memory from WAL-record to the page. Buggy WAL for sure couldcause an index corruption as well as any other bug in AM. WAL support doesn't bring nothing special to this expect thefact that WAL is quite hard to debug.</div><div class="gmail_extra"><br /></div><div class="gmail_extra">However, in currentimplementation I missed some hidden asserts about page structure. Page could be so corrupted that we can't, for instance,safely call XLogReadBufferForRedo(). All this cases must be worked out. If we can't update page during recovery,index must be marked as invalid. It's some amount of work, but it seems to be very feasible.</div><div class="gmail_extra"><br/></div><div class="gmail_extra">#4 seems problematic too. If autovacuum crashes on some index, thenpostgres can enter a crash loop. So, if autovacuum crashes on extension provided AM, that index should be marked as invalid.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">Consequences #1, #3 and #3 could be easily causedby buggy opclass as well. The reason why we're not knee-deep in extension provied bugs in GiST/GIN opclasses is noteasyness of writing correct GiST/GIN opclasses. Actual reason is that we have only few GiST/GIN opclasses which weren'twritten by GiST/GIN authors. </div><div class="gmail_extra"><br /></div><div class="gmail_extra">So, I don't expectPostgreSQL to be flooded with buggy AMs once we add AM extendability. Our motivation behind this proposal is that wewant to be able to develop custom extensions with AMs. We want to be able to provide our AMs to our customers whithouthaving to push that into PostgreSQL core or fork PostgreSQL. Bugs in that AMs in our responsibility to out customers.Some commercial companies could implement patented AMs (for instance, fractal tree), and that is their responsibilityto their customers. </div><div class="gmail_extra"><br /></div><div class="gmail_extra">Also, I think it'sOK to put adopting custom AMs to changes of AM interface to authors of those custom AMs. AM extensions anyway shouldbe adopted to each new major release. AFAIR, interface of RelationOpen() function has been changed not too long ago.Custom AM would use many functions which we use to access relations. Their interface can be changed in the next release.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">PostGIS GiST opclass has bugs which are reproducable,known and not fixed. This is responsibility of PostGIS to their customers. We can feel sorry for PostGIS thatthey are so slow on fixing this. But we shouldn't feel sorry for GiST extendability, that woulde be redicilous.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">Some recearches could write their own extensions.We can hope that they are smart enough to not recommend it for production use. We can back our hope with warningduring installing extension provided AM. That warning could say that all corruption caused by extension provided AMis up to AM developer. This warning could make users to beware of using extension provided AMs in production </div><divclass="gmail_extra">unless they are fully trust extension developer (have support subscription if it'scommercial).</div><div class="gmail_extra"><br /></div><div class="gmail_extra">PostgreSQL doesn't have any kind of safeextensions. Every extension must be trusted. Every extension can break (almost) everything.When one types CREATE EXTENSIONhe must completely trust extension author. This applies to every extension. </div><div class="gmail_extra"><br /></div><divclass="gmail_extra">I would be very careful with discouraging commercial AM extensions. We should always keenin the mind how many of PostgreSQL developers are working for companies which own their commercial PostgreSQL forks andhow big their contribution is. Patented extensions looks scary for sure. But it's up to software patents not up to PostgreSQLextendability...</div><div class="gmail_extra"><br /></div><div class="gmail_extra">Particular steps I'm goingto do on these patches:</div><div class="gmail_extra">1) Make generic_redo never crash on broken pages.</div><div class="gmail_extra">2)Make autovacuum launcher mark index as invalid if vacuum process crashed on custom AM index. Since,we can't write something into postgres cluster when one process has crushed, ITSM autovacuum should have some separateplace to put this information. Thus, after restart postgres can read it and mark index as invalid.</div><div class="gmail_extra"><br/></div><div class="gmail_extra">Don't allowing CREATE ACCESS METHOD command seems problematic forme. How could it work with pg_upgrade? pg_dump wouldn't dump extra pg_am records. So, pg_upgrade would break at creatingoperator classes on new cluster. So, I agree with dropping create am command only if we let pg_dump to dump extrapg_am records...</div><div class="gmail_extra"><br /></div>------<br />With best regards,<br />Alexander Korotkov.</div></div>
On 11/10/2014 10:30 PM, Alexander Korotkov wrote: > Don't allowing CREATE ACCESS METHOD command seems problematic for me. How > could it work with pg_upgrade? pg_dump wouldn't dump extra pg_am records. > So, pg_upgrade would break at creating operator classes on new cluster. So, > I agree with dropping create am command only if we let pg_dump to dump > extra pg_am records... pg_dump would dump the CREATE EXTENSION command, and the extension's installation script inserts the row to pg_am. pg_dump doesn't dump objects that are part of an extension, so that's how this would work with the CREATE ACCESS METHOD command, too. Backtracking a bit, to summarize the discussion so far: * It would be nice to have indexes that are not WAL-logged, but are automatically re-created after a crash. But it's a completely different and orthogonal feature, so there's no need to discuss that further in this thread. * If an extension is buggy, it can crash and corrupt the whole database. There isn't really anything we can do about that, and this patch doesn't make that any better or worse. * CREATE ACCESS METHOD command isn't worth it. Looking at the patch itself: * It has bitrotted, thanks to my WAL format changes. * The memcpy/memmove based API seems difficult to use correctly. Looking at the bloom filter example, it seems that you need a lot more code to implement WAL-logging with this, than you would with a rmgr-specific redo function. I was hoping this to make it simpler. I think the API has to be more high-level. Or at least also provide a higher-level API, in addition to the low level one. Looking at the indexams we have, almost all pages use the standard page layout, with PageAddItem etc., plus a metapage, plus some indexam-specific metadata in the special area. The proposed API makes it pretty complicated to deal with that common case. After calling PageAddItem, you need intimate knowledge of what PageAddItem did, to create a correct WAL record for it. For PageIndexMultiDelete, it's next to impossible. I think we'll have to accept that the WAL records with this generic API are going to be much bulkier than ones tailored for the AM. We could provide a compact representation for common operations like PageAddItem, but for any more complicated operations, just WAL-log a full page image, as it's too fiddly to track the changes at a finer level. For any serious performance critical stuff, you'll just have to write an old-fashioned rmgr. (marking this as "returned with feedback" in the commitfest) - Heikki
Hi, Heikki!
Thank you for summarizing. In general, I agree with your notes with some exceptions.
On Mon, Nov 24, 2014 at 1:52 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
 
------
With best regards,
Alexander Korotkov.
On 11/10/2014 10:30 PM, Alexander Korotkov wrote:Don't allowing CREATE ACCESS METHOD command seems problematic for me. How
could it work with pg_upgrade? pg_dump wouldn't dump extra pg_am records.
So, pg_upgrade would break at creating operator classes on new cluster. So,
I agree with dropping create am command only if we let pg_dump to dump
extra pg_am records...
pg_dump would dump the CREATE EXTENSION command, and the extension's installation script inserts the row to pg_am. pg_dump doesn't dump objects that are part of an extension, so that's how this would work with the CREATE ACCESS METHOD command, too.
In binary upgrade mode pg_dump have to guarantee that all database objects will have same oids. That's why in binary upgrade mode pg_dump dumps extension contents instead of just CREATE EXTENSION command.
Backtracking a bit, to summarize the discussion so far:
* It would be nice to have indexes that are not WAL-logged, but are automatically re-created after a crash. But it's a completely different and orthogonal feature, so there's no need to discuss that further in this thread.
* If an extension is buggy, it can crash and corrupt the whole database. There isn't really anything we can do about that, and this patch doesn't make that any better or worse.
* CREATE ACCESS METHOD command isn't worth it.
Taking into account my previous note, how can custom extensions survive pg_upgrade without CREATE ACCESS METHOD command?
With best regards,
Alexander Korotkov.
Hackers,
			
		there is next revision of patches providing access method extendability.
Now it's based on another patch which reworks access method interface.
http://www.postgresql.org/message-id/CAPpHfdsXwZmojm6Dx+TJnpYk27kT4o7Ri6X_4OSWcByu1Rm+VA@mail.gmail.com
Besides access method interface, major change is generic xlog interface. Now, generic xlog interface is more user friendly. Generic xlog compares initial and changed versions of page by itself. The only thing it can't do is to find data moves inside page, because it would be too high overhead. So in order to get compact WAL records one should use GenericXLogMemmove(dst, src, size) in order to move data inside page. If this function wasn't used then WAL records would just not so compact.
In general pattern of generic WAL usage is following.
1) Start using generic WAL: specify relation
GenericXLogStart(index);
2) Register buffers
GenericXLogRegister(0, buffer1, false);
GenericXLogRegister(1, buffer2, true);
first argument is a slot number, second is the buffer, third is flag indicating new buffer
3) Do changes in the pages. Use GenericXLogMemmove() if needed.
4) Finish using GenericXLogFinish(), or abort using GenericXLogAbort(). In the case of abort initial state of pages will be reverted.
Generic xlog takes care about critical section, unlogged relation, setting lsn, making buffer dirty. User code is just simple and clear.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения
> In general pattern of generic WAL usage is following. > > 1) Start using generic WAL: specify relation M-m, what about extensions which wants to use WAL but WAL record doesn't connected to any relation? For example, transaction manager or kind of FDW. > > GenericXLogStart(index); > > 2) Register buffers > > GenericXLogRegister(0, buffer1, false); > GenericXLogRegister(1, buffer2, true); > > first argument is a slot number, second is the buffer, third is flag indicating > new buffer Why do we need a slot number? to replace already registered buffer? -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Tue, Sep 1, 2015 at 6:50 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
In general pattern of generic WAL usage is following.
1) Start using generic WAL: specify relation
M-m, what about extensions which wants to use WAL but WAL record doesn't connected to any relation? For example, transaction manager or kind of FDW.
GenericXLogStart(index);
2) Register buffers
GenericXLogRegister(0, buffer1, false);
GenericXLogRegister(1, buffer2, true);
first argument is a slot number, second is the buffer, third is flag indicating
new buffer
Why do we need a slot number? to replace already registered buffer?
For further simplification slot number could be omitted. In the attached patches, generic xlog replay applies changes in the same order GenericXLogRegister was called.
Patches was rebased against latest version of am interface rework patch.
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения
Some notices: 1) create-am.3.patch.gz As I understand, you didn't add schema name to access method. Why? Suppose, if we implement SQL-like interface for AM screation/dropping then we should provide a schema qualification for them 2) create-am.3.patch.gz get_object_address_am() + switch (list_length(objname)) ... + case 2: + catalogname = strVal(linitial(objname)); + amname = strVal(lthird(objname)); ^^^^^^ seems, it shouldbe lsecond() 3) create-am.3.patch.gz Suppose, RM_GENERIC_ID is part of generic-xlog.3.patch.gz 4) Makefile(s) As I can see, object files are lexicographically ordered 5) gindesc.c -> genericdesc.c in file header 6) generic-xlog.3.patch.gz I don't like an idea to split START_CRIT_SECTION and END_CRIT_SECTION to GenericXLogStart and GenericXLogFinish. User's code could throw a error or allocate memory between them and error will become a panic. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On 2015-09-07 17:41, Teodor Sigaev wrote: > Some notices: > > 1) create-am.3.patch.gz > As I understand, you didn't add schema name to access method. Why? > Suppose, if we implement SQL-like interface for AM screation/dropping > then we should provide a schema qualification for them Why would access methods need to be schema qualified? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Petr Jelinek wrote: > On 2015-09-07 17:41, Teodor Sigaev wrote: >> Some notices: >> >> 1) create-am.3.patch.gz >> As I understand, you didn't add schema name to access method. Why? >> Suppose, if we implement SQL-like interface for AM screation/dropping >> then we should provide a schema qualification for them > > Why would access methods need to be schema qualified? Opfamily and opclass could be schema-qualified. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Mon, Sep 7, 2015 at 3:32 PM, Teodor Sigaev <teodor@sigaev.ru> wrote: > Petr Jelinek wrote: >> >> On 2015-09-07 17:41, Teodor Sigaev wrote: >>> >>> Some notices: >>> >>> 1) create-am.3.patch.gz >>> As I understand, you didn't add schema name to access method. Why? >>> Suppose, if we implement SQL-like interface for AM screation/dropping >>> then we should provide a schema qualification for them >> >> >> Why would access methods need to be schema qualified? > > > Opfamily and opclass could be schema-qualified. So what? Making access methods schema-qualified seems like a completely separate project, and an unnecessary one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Some notices:
1) create-am.3.patch.gz
As I understand, you didn't add schema name to access method. Why? Suppose, if we implement SQL-like interface for AM screation/dropping then we should provide a schema qualification for them
Per subsequent discussion it's unclear why we need to make access methods schema qualified.
2) create-am.3.patch.gz get_object_address_am()
+ switch (list_length(objname))
...
+ case 2:
+ catalogname = strVal(linitial(objname));
+ amname = strVal(lthird(objname));
^^^^^^ seems, it should be lsecond()
Fixed.
3) create-am.3.patch.gz
Suppose, RM_GENERIC_ID is part of generic-xlog.3.patch.gz
Fixed.
4) Makefile(s)
As I can see, object files are lexicographically ordered
Fixed. 
5) gindesc.c -> genericdesc.c in file header
Fixed.
6) generic-xlog.3.patch.gz
I don't like an idea to split START_CRIT_SECTION and END_CRIT_SECTION to GenericXLogStart and GenericXLogFinish. User's code could throw a error or allocate memory between them and error will become a panic.
Fixed. Now, critical section is only inside GenericXLogFinish. GenericXLogRegister returns pointer to the page copies which could be modified. GenericXLogFinish swaps the data between page copy and page itself. That allow us to avoid critical section in used code.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения
Hi!
In the attached version of patch access methods get support of pg_dump.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения
Alexander Korotkov wrote: > Hi! > > Patches was rebased against master. > > In the attached version of patch access methods get support of pg_dump. Thanks. This patch came in just as the commitfest was ending, so I'm moving it to the next one. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, failed
Spec compliant:           not tested
Documentation:            not tested
There are currently no docs or unit tests. I suspect this patch is still WIP?
create-am.5.patch:
General notes:
Needs catversion bump.
IndexQualInfo and GenericCosts have been moved to src/include/utils/selfuncs.h.
METHOD becomes an unreserved keyword.
generic-xlog.5.patch:
generic_xlog.c: At least needs a bunch of explanatory comments, if not info in a README. Since I don't really
understandwhat the design here is my review is just cursory.
 
static memoryMove() - seems like an overly-generic function name to me...
writeCopyFlagment(), writeMoveFlagment(): s/Fl/Fr/?
bloom-control.5:
README:
+ CREATE INDEX bloomidx ON tbloom(i1,i2,i3) 
+        WITH (length=5, col1=2, col2=2, col3=4);
+ 
+ Here, we create bloom index with signature length 80 bits and attributes
+ i1, i2  mapped to 2 bits, attribute i3 - to 4 bits.
It's not clear to me where 80 bits is coming from...
bloom.h:
+ #define BITBYTE     (8)
ISTR seeing this defined somewhere in the Postgres headers; dunno if it's worth using that definition instead.
Testing:
I ran the SELECT INTO from the README, as well as CREATE INDEX bloomidx. I then ran
insert into tbloom select       (generate_series(1,1000)*random())::int as i1,
(generate_series(1,1000)*random())::intas i2,       (generate_series(1,1000)*random())::int as i3,
(generate_series(1,1000)*random())::intas i4,       (generate_series(1,1000)*random())::int as i5,
(generate_series(1,1000)*random())::intas i6,       (generate_series(1,1000)*random())::int as i7,
(generate_series(1,1000)*random())::intas i8,       (generate_series(1,1000)*random())::int as i9,
(generate_series(1,1000)*random())::intas i10,       (generate_series(1,1000)*random())::int as i11,
(generate_series(1,1000)*random())::intas i12,       (generate_series(1,1000)*random())::int as i13from
generate_series(1,999);
and kill -9'd the backend. After restart I did VACUUM VERBOSE without issue. I ran the INSERT INTO, kill -9 and VACUUM
VERBOSEsequence again. This time I got an assert:
 
TRAP: FailedAssertion("!(((bool) (((const void*)((ItemPointer) left) != ((void*)0)) && (((ItemPointer) left)->ip_posid
!=0))))", File: "vacuumlazy.c", Line: 1823)
 
That line is
lblk = ItemPointerGetBlockNumber((ItemPointer) left);
which does
AssertMacro(ItemPointerIsValid(pointer)), \
which is the assert that's failing.
I've squirreled this install away for now, in case you can't repro this failure.
			
		Hi!
Next version of patch is attached.
On Tue, Feb 2, 2016 at 4:09 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
 
 
 
 
 
 
 
 
 
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, failed
Spec compliant: not tested
Documentation: not tested
There are currently no docs or unit tests. I suspect this patch is still WIP?
I hope to exit WIP state soon...
create-am.5.patch:
General notes:
Needs catversion bump.
Yes. I think catversion bump is committer responsibility and shouldn't be included into patch.
IndexQualInfo and GenericCosts have been moved to src/include/utils/selfuncs.h.
Yes, they have been moved in order to be accessed by custom index access method.
METHOD becomes an unreserved keyword.
Seems to be inevitable if we want CREATE ACCESS METHOD command.
generic-xlog.5.patch:
generic_xlog.c: At least needs a bunch of explanatory comments, if not info in a README. Since I don't really understand what the design here is my review is just cursory.
Make detailed explanation of API in generic_xlog.h. Could be moved into README if needed.
static memoryMove() - seems like an overly-generic function name to me...
I've simplified generic xlog to just per byte comparison without support of memory move. Now it would be much easier to commit. Support of memory move could be added in the separate patch though. 
writeCopyFlagment(), writeMoveFlagment(): s/Fl/Fr/?
Fixed. 
bloom-control.5:
README:
+ CREATE INDEX bloomidx ON tbloom(i1,i2,i3)
+ WITH (length=5, col1=2, col2=2, col3=4);
+
+ Here, we create bloom index with signature length 80 bits and attributes
+ i1, i2 mapped to 2 bits, attribute i3 - to 4 bits.
It's not clear to me where 80 bits is coming from...
length is measure in uint16...
For now, it's documented.
bloom.h:
+ #define BITBYTE (8)
ISTR seeing this defined somewhere in the Postgres headers; dunno if it's worth using that definition instead.
Got rid of this. Using BITS_PER_BYTE now.
Testing:
I ran the SELECT INTO from the README, as well as CREATE INDEX bloomidx. I then ran
insert into tbloom select
(generate_series(1,1000)*random())::int as i1,
(generate_series(1,1000)*random())::int as i2,
(generate_series(1,1000)*random())::int as i3,
(generate_series(1,1000)*random())::int as i4,
(generate_series(1,1000)*random())::int as i5,
(generate_series(1,1000)*random())::int as i6,
(generate_series(1,1000)*random())::int as i7,
(generate_series(1,1000)*random())::int as i8,
(generate_series(1,1000)*random())::int as i9,
(generate_series(1,1000)*random())::int as i10,
(generate_series(1,1000)*random())::int as i11,
(generate_series(1,1000)*random())::int as i12,
(generate_series(1,1000)*random())::int as i13
from generate_series(1,999);
and kill -9'd the backend. After restart I did VACUUM VERBOSE without issue. I ran the INSERT INTO, kill -9 and VACUUM VERBOSE sequence again. This time I got an assert:
TRAP: FailedAssertion("!(((bool) (((const void*)((ItemPointer) left) != ((void*)0)) && (((ItemPointer) left)->ip_posid != 0))))", File: "vacuumlazy.c", Line: 1823)
That line is
lblk = ItemPointerGetBlockNumber((ItemPointer) left);
which does
AssertMacro(ItemPointerIsValid(pointer)), \
which is the assert that's failing.
I've squirreled this install away for now, in case you can't repro this failure.
Should be fixed.
General notes about current version of patch:
* A lot of comments added.
* bloom documentation is moved from README to SGML with a lot of addons and cleanup.
* Memory move support in generic xlog is removed. Now it's much more simple and clean.
* Tests for CREATE ACCESS METHOD added. For now, it creates a mirror of GiST access method.
* Syntax for CREATE ACCESS METHOD is changed. For now, it's "CREATE ACCESS METHOD amname TYPE INDEX HANDLER handler;" in respect of parallel work on sequential access methods. New access method attribute added: amtype.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения
On Mon, Feb 15, 2016 at 5:30 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
General notes about current version of patch:* A lot of comments added.* bloom documentation is moved from README to SGML with a lot of addons and cleanup.* Memory move support in generic xlog is removed. Now it's much more simple and clean.* Tests for CREATE ACCESS METHOD added. For now, it creates a mirror of GiST access method.* Syntax for CREATE ACCESS METHOD is changed. For now, it's "CREATE ACCESS METHOD amname TYPE INDEX HANDLER handler;" in respect of parallel work on sequential access methods. New access method attribute added: amtype.
Next version of patch is attached:
 * Documentation for CREATE ACCESS METHOD command is added.
 * Refactoring and comments for bloom contrib.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения
> Next version of patch is attached: > * Documentation for CREATE ACCESS METHOD command is added. > * Refactoring and comments for bloom contrib. Cool work, nice to see. Some comments 1 create-am.7.patch: function lookup_index_am_handler_func() why doesn't it emit error in case of emtpy input? If it checks signature of function and empty handler is not allowed then, seems, all checks about handler have to be moved in lookup_index_am_handler_func(). 2 create-am.7.patch: Comment near get_am_name() incorrectly mentions get_am_oid function 3 create-am.7.patch: get_am_name(Oid amOid, char amtype). Seems, amtype argument is overengineering. get_am_name() is used only in error messages and additional check in it will do nothing or even confuse user. 4 create-am.7.patch: Inconsistentcy with psql help. As I can see other code, it's forbidden to create access method without handler postgres=# \h create access method Command: CREATE ACCESS METHOD Description: define a new access method Syntax: CREATE ACCESS METHOD name TYPE INDEX [ HANDLER handler_function | NO HANDLER ] postgres=# create access method yyy type index no handler; ERROR: syntax error at or near "no" LINE 1: create access method yyy type index no handler; 5 create-am.7.patch: file src/bin/pg_dump/pg_dump.h. Extra comma near DO_POLICY: *** 77,83 **** DO_POST_DATA_BOUNDARY, DO_EVENT_TRIGGER, DO_REFRESH_MATVIEW, ! DO_POLICY } DumpableObjectType; typedef struct _dumpableObject --- 78,84 ---- DO_POST_DATA_BOUNDARY, DO_EVENT_TRIGGER, DO_REFRESH_MATVIEW, ! DO_POLICY, } DumpableObjectType; 6 generic-xlog.7.patch: writeDelta() Seems, even under USE_ASSERT_CHECKING define checking diff by its applying is to expensive. May be, let we use here GENERIC_WAL_DEBUG macros? 7 generic-xlog.7.patch: src/backend/access/transam/generic_xlog.c It's unclear for me, what does MATCH_THRESHOLD do or intended for? Pls, add comments here. 8 generic-xlog.7.patch: src/backend/access/transam/generic_xlog.c. Again, it's unclear for me, what is motivation of size of PageData.data? 9 generic-xlog.7.patch: GenericXLogRegister(), Seems, emitting an error if caller tries to register buffer which is already registered isn't good idea. In practice, say, SP-GIST, buffer variable is used and page could be easily get from buffer. Suppose, GenericXLogRegister() should not emit an error and ignore isnew flag if case of second registration of the same buffer. 10 bloom-contrib.7.patch: blutils.c: In function 'initBloomState': blutils.c:128:20: warning: variable 'opaque' set but not used [-Wunused-but-set-variable] BloomPageOpaque opaque; 11 I'd really like to see regression tests (TAP-tests?) for replication with generic xlog. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Wed, Feb 17, 2016 at 11:56 PM, Teodor Sigaev <teodor@sigaev.ru> wrote: > 11 I'd really like to see regression tests (TAP-tests?) for replication with > generic xlog. The recovery test suite can help to accomplish that. -- Michael
On Wed, Feb 17, 2016 at 5:56 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
Next version of patch is attached:
* Documentation for CREATE ACCESS METHOD command is added.
* Refactoring and comments for bloom contrib.
Cool work, nice to see.
Some comments
1 create-am.7.patch: function lookup_index_am_handler_func() why doesn't it emit error in case of emtpy input? If it checks signature of function and
empty handler is not allowed then, seems, all checks about handler have to be moved in lookup_index_am_handler_func().
Fixed. Now it's assumed that lookup_index_am_handler_func() returns valid function Oid.
2 create-am.7.patch: Comment near get_am_name() incorrectly mentions get_am_oid function
Fixed.
3 create-am.7.patch: get_am_name(Oid amOid, char amtype). Seems, amtype argument is overengineering. get_am_name() is used only in error messages and additional check in it will do nothing or even confuse user.
Fixed.
4 create-am.7.patch: Inconsistentcy with psql help. As I can see other code, it's forbidden to create access method without handler
postgres=# \h create access method
Command: CREATE ACCESS METHOD
Description: define a new access method
Syntax:
CREATE ACCESS METHOD name
TYPE INDEX
[ HANDLER handler_function | NO HANDLER ]
postgres=# create access method yyy type index no handler;
ERROR: syntax error at or near "no"
LINE 1: create access method yyy type index no handler;
It comes from inconsistency in docs. Fixed.
5 create-am.7.patch: file src/bin/pg_dump/pg_dump.h. Extra comma near DO_POLICY:
*** 77,83 ****
DO_POST_DATA_BOUNDARY,
DO_EVENT_TRIGGER,
DO_REFRESH_MATVIEW,
! DO_POLICY
} DumpableObjectType;
typedef struct _dumpableObject
--- 78,84 ----
DO_POST_DATA_BOUNDARY,
DO_EVENT_TRIGGER,
DO_REFRESH_MATVIEW,
! DO_POLICY,
} DumpableObjectType;
Fixed.
6 generic-xlog.7.patch: writeDelta() Seems, even under USE_ASSERT_CHECKING define checking diff by its applying is to expensive. May be, let we use here GENERIC_WAL_DEBUG macros?
I decided not to introduce special macros for this. Now, this check is enabled on WAL_DEBUG.
7 generic-xlog.7.patch: src/backend/access/transam/generic_xlog.c It's unclear for me, what does MATCH_THRESHOLD do or intended for? Pls, add comments here.
Explicit comments are added.
8 generic-xlog.7.patch: src/backend/access/transam/generic_xlog.c. Again, it's unclear for me, what is motivation of size of PageData.data?
Explicit comments are added.
9 generic-xlog.7.patch: GenericXLogRegister(), Seems, emitting an error if caller tries to register buffer which is already registered isn't good idea. In practice, say, SP-GIST, buffer variable is used and page could be easily get from buffer. Suppose, GenericXLogRegister() should not emit an error and ignore isnew flag if case of second registration of the same buffer.
Changed.
10 bloom-contrib.7.patch:
blutils.c: In function 'initBloomState':
blutils.c:128:20: warning: variable 'opaque' set but not used [-Wunused-but-set-variable]
BloomPageOpaque opaque;
Fixed.
11 I'd really like to see regression tests (TAP-tests?) for replication with generic xlog.
TAP test for replication added to bloom contrib. This test run on additional make target wal-check.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения
On Fri, Feb 19, 2016 at 12:51 AM, Alexander Korotkov wrote: >> 11 I'd really like to see regression tests (TAP-tests?) for replication >> with generic xlog. > > > TAP test for replication added to bloom contrib. This test run on additional > make target wal-check. Just putting my eyes on that... diff --git a/contrib/bloom/WALTest.pm b/contrib/bloom/WALTest.pm new file mode 100644 index ...b2daf8b *** a/contrib/bloom/WALTest.pm --- b/contrib/bloom/WALTest.pm This is basically a copy of RewindTest.pm. This is far from generic. If this patch gets committed first I would suggest to wait for the more-generic routines that would be added in PostgresNode.pm and then come back to it. -- Michael
On Fri, Feb 19, 2016 at 4:08 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Feb 19, 2016 at 12:51 AM, Alexander Korotkov wrote:
>> 11 I'd really like to see regression tests (TAP-tests?) for replication
>> with generic xlog.
>
>
> TAP test for replication added to bloom contrib. This test run on additional
> make target wal-check.
Just putting my eyes on that...
diff --git a/contrib/bloom/WALTest.pm b/contrib/bloom/WALTest.pm
new file mode 100644
index ...b2daf8b
*** a/contrib/bloom/WALTest.pm
--- b/contrib/bloom/WALTest.pm
This is basically a copy of RewindTest.pm. This is far from generic.
If this patch gets committed first I would suggest to wait for the
more-generic routines that would be added in PostgresNode.pm and then
come back to it.
Yes, that's it. Now, with committed changes to PostgresNode.pm, I get rid of separate WALTest.pm.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения
On Mon, Feb 29, 2016 at 7:42 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Fri, Feb 19, 2016 at 4:08 AM, Michael Paquier <michael.paquier@gmail.com> > wrote: >> This is basically a copy of RewindTest.pm. This is far from generic. >> If this patch gets committed first I would suggest to wait for the >> more-generic routines that would be added in PostgresNode.pm and then >> come back to it. > > > Yes, that's it. Now, with committed changes to PostgresNode.pm, I get rid of > separate WALTest.pm. The test is cleaner in this shape, thanks. + # Run few queries on both master and stanbdy and check their results match. s/stanbdy/standby -- Michael
Hi, I went over the latest version of this, here are my notes. create-am.9: + case DO_ACCESS_METHOD: + snprintf(buf, bufsize, + ""); + return; Missing the actual description. + appendPQExpBuffer(q, "CREATE ACCESS METHOD %s HANDLER %s;\n", + qamname, aminfo->amhandler); Generates invalid syntax (missing TYPE). I don't like that you hardcode 'i' everywhere in the code as am type index. It would be better to define it in pg_am.h and use that. I was also debating in my head if amtype is correct name as it maps to relkind so amkind might be better name for the column, otoh then it won't map to the syntax we have agreed on for CREATE ACCESS METHOD so I guess there is no ideal name here. object_type_map record is missing, as is getObjectTypeDescription and getObjectIdentityParts support (you can check what I sent as part of sequence access methods where I fixed these things, otherwise it reuses most of your code) generic-xlog.9: Not much to say here, the api makes sense to me, it will have performance impact but don't see how we can do this generically without callbacks to extension in any better way. One thing I don't understand is why there is support for unlogged relations. Shouldn't that be handled by whoever is using this to actually not call this at all? It's just call to RelationNeedsWAL() nothing to hard and hidden magic like not doing anything with WAL for the unlogged tables is seldomly good idea. Another small thing is that we put the API explanation comments into .c file not .h file. Didn't look at the bloom index too deeply yet. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi!
Next revision of patches is attached.
On Wed, Mar 9, 2016 at 4:56 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
I went over the latest version of this, here are my notes.
create-am.9:
+ case DO_ACCESS_METHOD:
+ snprintf(buf, bufsize,
+ "");
+ return;
Fixed. 
Missing the actual description.
+ appendPQExpBuffer(q, "CREATE ACCESS METHOD %s HANDLER %s;\n",
+ qamname, aminfo->amhandler);
Generates invalid syntax (missing TYPE).
Fixed. 
I don't like that you hardcode 'i' everywhere in the code as am type index. It would be better to define it in pg_am.h and use that.
Fixed. 
I was also debating in my head if amtype is correct name as it maps to relkind so amkind might be better name for the column, otoh then it won't map to the syntax we have agreed on for CREATE ACCESS METHOD so I guess there is no ideal name here.
I leave it amtype for now. 
object_type_map record is missing, as is getObjectTypeDescription and getObjectIdentityParts support
Fixed.
(you can check what I sent as part of sequence access methods where I fixed these things, otherwise it reuses most of your code)
Thank you! I used it as cheat sheet.
generic-xlog.9:
Not much to say here, the api makes sense to me, it will have performance impact but don't see how we can do this generically without callbacks to extension in any better way.
Yep, performance might be much less than with other resource managers. But that seems to be the only way to do this in extension since we don't want extensions to have redo callbacks. Also, this is the basic version. There could be more advances in future. In the previous version I has more effective handling of data shift inside page. But it was removed for the sake of simplicity to increase chance to commit.
One thing I don't understand is why there is support for unlogged relations. Shouldn't that be handled by whoever is using this to actually not call this at all? It's just call to RelationNeedsWAL() nothing to hard and hidden magic like not doing anything with WAL for the unlogged tables is seldomly good idea.
Besides formation of xlog record generic xlog also does copying page images and atomic replacement of them inside critical section. For sure, it could be done without generic xlog. But I found it useful for generic xlog to support this since code of extension becomes much more simple and elegant. But I can remove it if you insist.
Another small thing is that we put the API explanation comments into .c file not .h file.
Explanation was moved from .h to .c.
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения
Hi. As I just said to Tomas Vondra: since your patch creates a new object type, please make sure to add a case to cover it in the object_address.sql test. That verifies some things such as pg_identify_object and related. Thanks, -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi. As I just said to Tomas Vondra: since your patch creates a new
object type, please make sure to add a case to cover it in the
object_address.sql test. That verifies some things such as
pg_identify_object and related.
Good catch, thanks! Tests were added.
I also introduced numbering into patch names to make evident the order to their application.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения
On Wed, Mar 9, 2016 at 8:31 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
 
Hi!On Wed, Mar 9, 2016 at 3:27 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:Hi. As I just said to Tomas Vondra: since your patch creates a new
object type, please make sure to add a case to cover it in the
object_address.sql test. That verifies some things such as
pg_identify_object and related.Good catch, thanks! Tests were added.I also introduced numbering into patch names to make evident the order to their application.
Nice to see progress ! I hope to see Alexander' work in 9.6.
I and Teodor will show at PGCon new GIN AM as an extension, optimized for full text search (millisecond FTS) , which we gave up to push into core.
I and Teodor will show at PGCon new GIN AM as an extension, optimized for full text search (millisecond FTS) , which we gave up to push into core.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
> Good catch, thanks! Tests were added. I don't see any objection, is consensus reached? I'm close to comiit that... -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On 16/03/16 15:31, Teodor Sigaev wrote: >> Good catch, thanks! Tests were added. > > I don't see any objection, is consensus reached? I'm close to comiit > that... > I did only cursory review on the bloom contrib module and don't really have complaints there, but I know you can review that one. I'd like the English of the generic_xlog.c description improved but I won't get to it before weekend. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
>> I don't see any objection, is consensus reached? I'm close to comiit >> that... >> > > I did only cursory review on the bloom contrib module and don't really have > complaints there, but I know you can review that one. I'd like the English of > the generic_xlog.c description improved but I won't get to it before weekend. So, per patch status: create-amready generic-xlogneed to fix comments/docs bloom-contribneed review. I'm an original author of this module and of course I can do some review of it but, seems,it's needed more eyes. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Teodor Sigaev wrote: > > >>I don't see any objection, is consensus reached? I'm close to comiit > >>that... > >> > > > >I did only cursory review on the bloom contrib module and don't really have > >complaints there, but I know you can review that one. I'd like the English of > >the generic_xlog.c description improved but I won't get to it before weekend. > > So, per patch status: > create-am > ready Please wait a bit on this one -- I think more review is warranted. I will try to review it early next week. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alexander Korotkov wrote: > Hi! > > Thank you for review! So. Before this version of the patch was posted in Nov 4th 2015, both Tom and Heikki had said essentially "CREATE ACCESS METHOD is worthless, let's pursue this stuff without those commands". http://www.postgresql.org/message-id/54730DFD.2060703@vmware.com (Nov 2014) http://www.postgresql.org/message-id/26822.1414516012@sss.pgh.pa.us (Oct 2014) And then Alexander posted this version, without any discussion that evidenced that those old objections were overridden. What happened here? Did somebody discuss this in unarchived meetings? If so, it would be good to know about that in this thread. I noticed this state of affairs because I started reading the complete thread in order to verify whether a consensus had been reached regarding the move of IndexQualInfo and GenericCosts to selfuncs.h. But I only see that Jim Nasby mentioned it and Alexander said "yes they move"; nothing else. The reason for raising this is that, according to Alexander, new index AMs will want to use those structs; but current index AMs only include index_selfuncs.h, and none of them includes selfuncs.h, so it seems completely wrong to be importing all the planner knowledge embodied in selfuncs.h into extension index AMs. One observation is that the bloom AM patch (0003 of this series) uses GenericCosts but not IndexQualInfo. But btcostestimate() and gincostestimate() both use IndexQualInfo, so other AMs might well want to use it too. We could move GenericCosts and the prototype for genericcostestimate() to index_selfuncs.h; that much is pretty reasonable. But I'm not sure what to do about IndexQualInfo. We could just add forward struct declarations for RestrictInfo and Node to index_selfuncs.h. But then the extension code is going to have to import the includes were those are defined anyway. Certainly it seems that deconstruct_indexquals() (which returns a list of IndexQualInfo) is going to be needed by extension index AMs, at least ... I've made a few edits to the patch already, but I need to do some more testing. Particularly I would like to understand the relcache issues to figure out whether the current one is right. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 21/03/16 23:26, Alvaro Herrera wrote: > Alexander Korotkov wrote: >> Hi! >> >> Thank you for review! > > So. Before this version of the patch was posted in Nov 4th 2015, both > Tom and Heikki had said essentially "CREATE ACCESS METHOD is worthless, > let's pursue this stuff without those commands". > http://www.postgresql.org/message-id/54730DFD.2060703@vmware.com (Nov 2014) > http://www.postgresql.org/message-id/26822.1414516012@sss.pgh.pa.us (Oct 2014) > And in sequence am thread Robert said the opposite. > And then Alexander posted this version, without any discussion that > evidenced that those old objections were overridden. What happened > here? Did somebody discuss this in unarchived meetings? If so, it > would be good to know about that in this thread. Well there are two main reasons for having DDL, one is dependency tracking and the other is binary upgrade. We can solve part of dependency tracking by recoding dependency between opclass and amhandler instead of opclass and access method, that would work fine. I don't know how to clean pg_am on DROP EXTENSION though without the dependency support. I also am not sure what is good way of solving binary upgrade without any kind of DDL. Adding another pg_catalog.binary_upgrade_<something> function would be potential solution if we really think DDL is bad idea for access methods in general. Actually thinking of this, we might actually need function like in any case if we are recoding dependencies on access methods (which means it would probably be better to record dependency of opclass on amhandler as mentioned before, since this is already solved for functions and if the oid of am is not referenced anywhere it does not need special handling for binary upgrade). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 22, 2016 at 1:26 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
So. Before this version of the patch was posted in Nov 4th 2015, both
Tom and Heikki had said essentially "CREATE ACCESS METHOD is worthless,
let's pursue this stuff without those commands".
http://www.postgresql.org/message-id/54730DFD.2060703@vmware.com (Nov 2014)
Yes, that's it. In this email Heikki asserts that INSERTs into pg_am tables in extensions would survive pg_upgrade, because pg_dump will dump CREATE EXTENSION command which execute extension script.
In my response I noticed that it's not correct.  pg_dump in binary upgrade mode works so that we will miss records in pg_am.  I haven't any answer here.
http://www.postgresql.org/message-id/26822.1414516012@sss.pgh.pa.us (Oct 2014)
And then Alexander posted this version, without any discussion that
evidenced that those old objections were overridden. What happened
here? Did somebody discuss this in unarchived meetings? If so, it
would be good to know about that in this thread.
After that Tom Lane committed very huge patch about rework AM API.  One of aims of this patch was support for CREATE ACCESS METHOD command.
I've initially positioned this patch as refactoring for support CREATE ACCESS METHOD command.
During discussing Tom mentioned future CREATE ACCESS METHOD command.
I don't think Tom would put so much efforts in this direction if he would remain opposed to this command.
I noticed this state of affairs because I started reading the complete
thread in order to verify whether a consensus had been reached regarding
the move of IndexQualInfo and GenericCosts to selfuncs.h. But I only
see that Jim Nasby mentioned it and Alexander said "yes they move";
nothing else. The reason for raising this is that, according to
Alexander, new index AMs will want to use those structs; but current
index AMs only include index_selfuncs.h, and none of them includes
selfuncs.h, so it seems completely wrong to be importing all the planner
knowledge embodied in selfuncs.h into extension index AMs.
One observation is that the bloom AM patch (0003 of this series) uses
GenericCosts but not IndexQualInfo. But btcostestimate() and
gincostestimate() both use IndexQualInfo, so other AMs might well want
to use it too.
We could move GenericCosts and the prototype for genericcostestimate()
to index_selfuncs.h; that much is pretty reasonable. But I'm not sure
what to do about IndexQualInfo. We could just add forward struct
declarations for RestrictInfo and Node to index_selfuncs.h. But then
the extension code is going to have to import the includes were those
are defined anyway. Certainly it seems that deconstruct_indexquals()
(which returns a list of IndexQualInfo) is going to be needed by
extension index AMs, at least ...
Thank you for highlighting these issues.
I've made a few edits to the patch already, but I need to do some more
testing.
Did you already fix the issues above or do you need me to fix them?
Particularly I would like to understand the relcache issues to
figure out whether the current one is right.
Good point.  I'll also try to find relcache issues.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov wrote: > On Tue, Mar 22, 2016 at 1:26 AM, Alvaro Herrera <alvherre@2ndquadrant.com> > wrote: > > I noticed this state of affairs because I started reading the complete > > thread in order to verify whether a consensus had been reached regarding > > the move of IndexQualInfo and GenericCosts to selfuncs.h. But I only > > see that Jim Nasby mentioned it and Alexander said "yes they move"; > > nothing else. The reason for raising this is that, according to > > Alexander, new index AMs will want to use those structs; but current > > index AMs only include index_selfuncs.h, and none of them includes > > selfuncs.h, so it seems completely wrong to be importing all the planner > > knowledge embodied in selfuncs.h into extension index AMs. > > > > One observation is that the bloom AM patch (0003 of this series) uses > > GenericCosts but not IndexQualInfo. But btcostestimate() and > > gincostestimate() both use IndexQualInfo, so other AMs might well want > > to use it too. > > > > We could move GenericCosts and the prototype for genericcostestimate() > > to index_selfuncs.h; that much is pretty reasonable. But I'm not sure > > what to do about IndexQualInfo. We could just add forward struct > > declarations for RestrictInfo and Node to index_selfuncs.h. But then > > the extension code is going to have to import the includes were those > > are defined anyway. Certainly it seems that deconstruct_indexquals() > > (which returns a list of IndexQualInfo) is going to be needed by > > extension index AMs, at least ... > > Thank you for highlighting these issues. After thinking some more about this, I think it's all right to move both IndexQualInfo and GenericCosts to selfuncs.h. The separation that we want is this: the selectivity functions themselves need access to a lot of planner knowledge, and so selfuncs.h knows a lot about planner. But the code that _calls_ the selectivity function doesn't; and index_selfuncs.h is about the latter. Properly architected extension index AMs are going to have their selectivity functions in a separate .c file which knows a lot about planner, and which includes selfuncs.h (and maybe index_selfuncs.h if it wants to piggyback on the existing selecticity functions, but that's unlikely); only the prototype for the selfunc is going to be needed elsewhere, and the rest of the index code is not going to know anything about planner stuff so it will not need to include selfuncs.h nor index_selfuncs.h. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Mar 22, 2016 at 5:50 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote: >> And then Alexander posted this version, without any discussion that >> evidenced that those old objections were overridden. What happened >> here? Did somebody discuss this in unarchived meetings? If so, it >> would be good to know about that in this thread. > > After that Tom Lane committed very huge patch about rework AM API. One of > aims of this patch was support for CREATE ACCESS METHOD command. > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=65c5fcd353a859da9e61bfb2b92a99f12937de3b > I've initially positioned this patch as refactoring for support CREATE > ACCESS METHOD command. > http://www.postgresql.org/message-id/CAPpHfdtLiSXmXk2b4tW+4+No_1-T0raO5fOYszhO6+Sn2Om+xw@mail.gmail.com > During discussing Tom mentioned future CREATE ACCESS METHOD command. > http://www.postgresql.org/message-id/16164.1439222900@sss.pgh.pa.us > I don't think Tom would put so much efforts in this direction if he would > remain opposed to this command. Yeah, I rather thought we had consensus on that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Tue, Mar 22, 2016 at 5:50 AM, Alexander Korotkov > <aekorotkov@gmail.com> wrote: > >> And then Alexander posted this version, without any discussion that > >> evidenced that those old objections were overridden. What happened > >> here? Did somebody discuss this in unarchived meetings? If so, it > >> would be good to know about that in this thread. > > > > After that Tom Lane committed very huge patch about rework AM API. One of > > aims of this patch was support for CREATE ACCESS METHOD command. > > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=65c5fcd353a859da9e61bfb2b92a99f12937de3b > > I've initially positioned this patch as refactoring for support CREATE > > ACCESS METHOD command. > > http://www.postgresql.org/message-id/CAPpHfdtLiSXmXk2b4tW+4+No_1-T0raO5fOYszhO6+Sn2Om+xw@mail.gmail.com > > During discussing Tom mentioned future CREATE ACCESS METHOD command. > > http://www.postgresql.org/message-id/16164.1439222900@sss.pgh.pa.us > > I don't think Tom would put so much efforts in this direction if he would > > remain opposed to this command. > > Yeah, I rather thought we had consensus on that. I vaguely remembered so too, but was startled to see that this thread didn't have any evidence of it -- I thought I was misremembering. I'm glad we're all in the same page. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Mar 22, 2016 at 11:53 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Alexander Korotkov wrote:
> On Tue, Mar 22, 2016 at 1:26 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
> > I noticed this state of affairs because I started reading the complete
> > thread in order to verify whether a consensus had been reached regarding
> > the move of IndexQualInfo and GenericCosts to selfuncs.h. But I only
> > see that Jim Nasby mentioned it and Alexander said "yes they move";
> > nothing else. The reason for raising this is that, according to
> > Alexander, new index AMs will want to use those structs; but current
> > index AMs only include index_selfuncs.h, and none of them includes
> > selfuncs.h, so it seems completely wrong to be importing all the planner
> > knowledge embodied in selfuncs.h into extension index AMs.
> >
> > One observation is that the bloom AM patch (0003 of this series) uses
> > GenericCosts but not IndexQualInfo. But btcostestimate() and
> > gincostestimate() both use IndexQualInfo, so other AMs might well want
> > to use it too.
> >
> > We could move GenericCosts and the prototype for genericcostestimate()
> > to index_selfuncs.h; that much is pretty reasonable. But I'm not sure
> > what to do about IndexQualInfo. We could just add forward struct
> > declarations for RestrictInfo and Node to index_selfuncs.h. But then
> > the extension code is going to have to import the includes were those
> > are defined anyway. Certainly it seems that deconstruct_indexquals()
> > (which returns a list of IndexQualInfo) is going to be needed by
> > extension index AMs, at least ...
>
> Thank you for highlighting these issues.
After thinking some more about this, I think it's all right to move both
IndexQualInfo and GenericCosts to selfuncs.h. The separation that we
want is this: the selectivity functions themselves need access to a lot
of planner knowledge, and so selfuncs.h knows a lot about planner. But
the code that _calls_ the selectivity function doesn't; and
index_selfuncs.h is about the latter. Properly architected extension
index AMs are going to have their selectivity functions in a separate .c
file which knows a lot about planner, and which includes selfuncs.h (and
maybe index_selfuncs.h if it wants to piggyback on the existing
selecticity functions, but that's unlikely); only the prototype for the
selfunc is going to be needed elsewhere, and the rest of the index code
is not going to know anything about planner stuff so it will not need to
include selfuncs.h nor index_selfuncs.h.
Right, the purposes of headers are:
 * selfuncs.h – for those who going to estimate selectivity,
 * index_selfuncs.h – for those who going to call selectivity estimation functions of built-in AMs.
From this point for view we should keep both IndexQualInfo and GenericCosts in selfuncs.h.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
I don't quite see how this is supposed to work:
+ #ifdef WAL_DEBUG
+   /*
+    * If xlog debug is enabled then check produced delta.  Result of delta
+    * application to saved image should be the same as current page state.
+    */
+   if (XLOG_DEBUG)
+   {
+       char    tmp[BLCKSZ];
+       memcpy(tmp, image, BLCKSZ);
+       applyPageRedo(tmp, pageData->data, pageData->dataLen);
+       elog(ERROR, "result of generic xlog apply doesn't match");
+   }
+ #endif
I suppose the elog(ERROR) call should be conditional ...
-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		Hi!
Thank you for committing CREATE ACCESS METHOD command!
On Thu, Mar 24, 2016 at 4:06 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
I don't quite see how this is supposed to work:
+ #ifdef WAL_DEBUG
+ /*
+ * If xlog debug is enabled then check produced delta. Result of delta
+ * application to saved image should be the same as current page state.
+ */
+ if (XLOG_DEBUG)
+ {
+ char tmp[BLCKSZ];
+ memcpy(tmp, image, BLCKSZ);
+ applyPageRedo(tmp, pageData->data, pageData->dataLen);
+ elog(ERROR, "result of generic xlog apply doesn't match");
+ }
+ #endif
I suppose the elog(ERROR) call should be conditional ...
Good catch.  Check condition was lost between versions.
Attached patches are rebased to master.  Now, it checks that page images match except area between pd_lower and pd_upper. I've tested it with WAL debug and it works.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения
Teodor Sigaev wrote: > So, per patch status: > create-am > ready Teodor agreed to me committing this one instead of him; thanks. I just pushed it after some mostly cosmetic adjustments. I pass the baton back to Teodor for the remaining patches in this series. Thanks, -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote:
> Teodor Sigaev wrote:
> 
> > So, per patch status:
> > create-am
> >     ready
> 
> Teodor agreed to me committing this one instead of him; thanks.  I just
> pushed it after some mostly cosmetic adjustments.
This was maybe too laconic.  I actually changed the code a good bit.
Some of the changes:
* pg_dump support got changed to use selectDumpableAccessMethod to
compare the OID to FirstNormalOid rather than embedding "10000" in the
SQL query.  This is in line with what we do for other
no-schema-qualified object types.
* I changed DROP ACCESS METHOD to use the generic DropStmt instead of
creating a new production.  I find that most of the DropFooStmt
productions are useless -- we should remove them and instead merge
everything into DropStmt.
* I changed get_object_address to use get_object_address_unqualified,
just like all other object types which use no-schema-qualified object
names.  This removes a screenful of code.  I had to revert get_am_oid to
its original state instead of adding the "amtype" argument.  I added
get_index_am_oid.
* In SGML docs (and psql help) I changed the "TYPE INDEX" literal with
"TYPE <replaceable>access_method_type</>".
* I moved OCLASS_AM to a more sensible place rather than just stuffing
it at the end of the list
* I removed more than half the includes in the new file amcmds; there
were not necessary.
* I changed this:
+                        errmsg("function %s must return type \"index_am_handler\"",
+                               NameListToString(handler_name)));
to this:
+                        errmsg("function %s must return type \"%s\"",
+                               NameListToString(handler_name),
+                               "index_am_handler")));
This eases the job of translators: 1) there's no point in presenting the
type name to translate, since it cannot be translated, and 2) all the
equivalent sentences share a single translation instead of needing a
dozen separate translations that only differ in a word that cannot be
translated anyway.  In doing this I noticed that most other uses do not
do this, so I'm going to change them too.
.. Oh crap.  I just noticed I forgot to update a comment in pg_dump's
getAccessMethods.  And we're missing psql tab-complete support for the
new commands.
-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		On Thu, Mar 24, 2016 at 6:19 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
.. Oh crap. I just noticed I forgot to update a comment in pg_dump's
getAccessMethods. And we're missing psql tab-complete support for the
new commands.
Attached patches fix both these issues.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения
On 16/03/16 15:31, Teodor Sigaev wrote:Good catch, thanks! Tests were added.
I don't see any objection, is consensus reached? I'm close to comiit
that...
I did only cursory review on the bloom contrib module and don't really have complaints there, but I know you can review that one. I'd like the English of the generic_xlog.c description improved but I won't get to it before weekend.
What is your plans about English of the generic_xlog.c?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov wrote: > On Thu, Mar 24, 2016 at 6:19 PM, Alvaro Herrera <alvherre@2ndquadrant.com> > wrote: > > > .. Oh crap. I just noticed I forgot to update a comment in pg_dump's > > getAccessMethods. And we're missing psql tab-complete support for the > > new commands. > > Attached patches fix both these issues. I see Teodor just pushed these two fixes. Thanks! -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/28/16 1:26 PM, Alvaro Herrera wrote: > Alexander Korotkov wrote: >> On Thu, Mar 24, 2016 at 6:19 PM, Alvaro Herrera <alvherre@2ndquadrant.com> >> wrote: >> >>> .. Oh crap. I just noticed I forgot to update a comment in pg_dump's >>> getAccessMethods. And we're missing psql tab-complete support for the >>> new commands. >> >> Attached patches fix both these issues. > > I see Teodor just pushed these two fixes. Thanks! Does that mean this patch should be closed or is there more remaining to commit? Thanks, -- -David david@pgmasters.net
On Tue, Mar 29, 2016 at 6:20 PM, David Steele <david@pgmasters.net> wrote:
 
On 3/28/16 1:26 PM, Alvaro Herrera wrote:Alexander Korotkov wrote:On Thu, Mar 24, 2016 at 6:19 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:.. Oh crap. I just noticed I forgot to update a comment in pg_dump's
getAccessMethods. And we're missing psql tab-complete support for the
new commands.
Attached patches fix both these issues.
I see Teodor just pushed these two fixes. Thanks!
Does that mean this patch should be closed or is there more remaining to commit?
There are still two pending patches:
 * Generic WAL records
 * Bloom filter contrib
Teodor is going to commit them.  But he is waiting Petr Jelinek to review the English of the first one.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
> Does that mean this patch should be closed or is there more remaining to commit? Petr promises to check english in comments/docs in generic-wal patch at this week, bloom patch depends on it. Bloom patch is ready from my point of view. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On 29/03/16 17:33, Teodor Sigaev wrote: >> Does that mean this patch should be closed or is there more remaining >> to commit? > > Petr promises to check english in comments/docs in generic-wal patch at > this week, bloom patch depends on it. Bloom patch is ready from my point > of view. > And here it is. It's not perfect but it's better (I am not native speaker either). It's same as v12, just changed comments somewhat. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
> And here it is. It's not perfect but it's better (I am not native speaker > either). It's same as v12, just changed comments somewhat. Thank you, but I have a problems with applying: % patch -p1 < ~/Downloads/0002-generic-xlog.13.patch Hmm... Looks like a new-style context diff to me... ... |diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c |new file mode 100644 |index ...7ca03bf |*** a/src/backend/access/transam/generic_xlog.c |--- b/src/backend/access/transam/generic_xlog.c -------------------------- (Creating file src/backend/access/transam/generic_xlog.c...) Patching file src/backend/access/transam/generic_xlog.c using Plan A... patch: **** malformed patch at line 634: diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Petr Jelinek wrote:
> And here it is. It's not perfect but it's better (I am not native speaker
> either). It's same as v12, just changed comments somewhat.
I think this can still be improved a bit more, in particular this large
comment, which I edit inline for expediency.
> + /*-------------------------------------------------------------------------
> +  * API for construction of generic xlog records
> +  *
> +  * This API allows user to construct generic xlog records which describe
> +  * difference between pages in a generic way.  This is useful for
> +  * extensions which provide custom access methods because they can't
> +  * register their own WAL redo routines.
> +  *
> +  * Each record must be constructed by following these steps:
> +  * 1) GenericXLogStart(relation) - start construction of a generic xlog
> +  *      record for the given relation.
> +  * 2) GenericXLogRegister(buffer, isNew) - register one or more buffers
> +  *      for the record.  This function returns a copy of the page
> +  *      image where modifications can be performed.  The second argument
> +  *      indicates if the block is new (i.e. a full page image should be taken).
> +  * 3) Apply modification of page images obtained in the previous step.
> +  * 4) GenericXLogFinish() - finish construction of generic xlog record.
> +  *
> +  * The xlog record construction can be canceled at any step by calling
> +  * GenericXLogAbort().  All changes made to page images copies will be
> +  * discarded.
> +  *
> +  * Please, note the following points when constructing generic xlog records.
> +  * - No direct modifications of page images are allowed!  All modifications
> +  *     must be done in the copies returned by GenericXLogRegister().  In other
> +  *     words the code which makes generic xlog records must never call
> +  *     BufferGetPage().
> +  * - Registrations of buffers (step 2) and modifications of page images
> +  *     (step 3) can be mixed in any sequence.  The only restriction is that
> +  *     you can only modify page image after registration of corresponding
> +  *     buffer.
> +  * - After registration, the buffer also can be unregistered by calling
> +  *     GenericXLogUnregister(buffer).  In this case the changes made in
> +  *     that particular page image copy will be discarded.
> +  * - Generic xlog assumes that pages are using standard layout, i.e., all
> +  *     data between pd_lower and pd_upper will be discarded.
> +  * - Maximum number of buffers simultaneously registered for a generic xlog
> +  *     record is MAX_GENERIC_XLOG_PAGES.  An error will be thrown if this limit
> +  *     is exceeded.
> +  * - Since you modify copies of page images, GenericXLogStart() doesn't
> +  *     start a critical section.  Thus, you can do memory allocation, error
> +  *     throwing etc between GenericXLogStart() and GenericXLogFinish().
> +  *     The actual critical section is present inside GenericXLogFinish().
> +  * - GenericXLogFinish() takes care of marking buffers dirty and setting their
> +  *     LSNs.  You don't need to do this explicitly.
> +  * - For unlogged relations, everything works the same except there is no
> +  *     WAL record produced.  Thus, you typically don't need to do any explicit
> +  *     checks for unlogged relations.
> +  * - If registered buffer isn't new, generic xlog record contains delta
> +  *     between old and new page images.  This delta is produced by per byte
> +  *     comparison.  This current delta mechanism is not effective for data shifts
> +  *     inside the page and may be improved in the future.
> +  * - Generic xlog redo function will acquire exclusive locks on buffers
> +  *     in the same order they were registered.  After redo of all changes,
> +  *     the locks will be released in the same order.
> +  *
> +  *
> +  * Internally, delta between pages consists of set of fragments.  Each
> +  * fragment represents changes made in given region of page.  A fragment is
> +  * described as follows:
> +  *
> +  * - offset of page region (OffsetNumber)
> +  * - length of page region (OffsetNumber)
> +  * - data - the data to place into described region ('length' number of bytes)
> +  *
> +  * Unchanged regions of page are not represented in the delta.  As a result,
> +  * the delta can be more compact than full page image.  But if the unchanged region
> +  * of the page is less than fragment header (offset and length) the delta
> +  * would be bigger than the full page image. For this reason we break into fragments
> +  * only if the unchanged region is bigger than MATCH_THRESHOLD.
> +  *
> +  * The worst case for delta size is when we didn't find any unchanged region
> +  * in the page. Then size of delta would be size of page plus size of fragment
> +  * header.
> +  */
> + #define FRAGMENT_HEADER_SIZE    (2 * sizeof(OffsetNumber))
> + #define MATCH_THRESHOLD            FRAGMENT_HEADER_SIZE
> + #define MAX_DELTA_SIZE            BLCKSZ + FRAGMENT_HEADER_SIZE
> +     ereport(ERROR,
> +             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +              errmsg("GenericXLogUnregister: registered buffer not found")));
> + }
These error messages do not conform to our style guideline.  This
particular seems like an internal error, perhaps it should be reported
with elog()?  Not really sure about this.  As for most of the others I
saw, they all have the calling function name in the errmsg() which we
don't do.
> diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
> new file mode 100644
> index 13af485..262deb2
> *** a/src/backend/replication/logical/decode.c
> --- b/src/backend/replication/logical/decode.c
> *************** LogicalDecodingProcessRecord(LogicalDeco
> *** 143,148 ****
> --- 143,149 ----
>           case RM_BRIN_ID:
>           case RM_COMMIT_TS_ID:
>           case RM_REPLORIGIN_ID:
> +         case RM_GENERIC_ID:
>               /* just deal with xid, and done */
>               ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
>                                       buf.origptr);
I'm really unsure about this one.  Here we're saying that logical
decoding won't deal at all with stuff that was done using generic WAL
records.  I think that's fine for index AMs, since logical decoding
doesn't deal with indexes anyway, but if somebody tries to implement
something that's not an index using generic WAL records, it will fail
miserably.  Now maybe that's a restriction we're okay with, but if
that's so we should say that explicitely.
I think this patch should be documented in the WAL chapter.
-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		On 29/03/16 18:25, Alvaro Herrera wrote:
>> + /*-------------------------------------------------------------------------
>> >+  * API for construction of generic xlog records
>> >+  *
>> >+  * This API allows user to construct generic xlog records which describe
>> >+  * difference between pages in a generic way.  This is useful for
>> >+  * extensions which provide custom access methods because they can't
>> >+  * register their own WAL redo routines.
>> >+  *
>> >+  * Each record must be constructed by following these steps:
>> >+  * 1) GenericXLogStart(relation) - start construction of a generic xlog
>> >+  *      record for the given relation.
>> >+  * 2) GenericXLogRegister(buffer, isNew) - register one or more buffers
>> >+  *      for the record.  This function returns a copy of the page
>> >+  *      image where modifications can be performed.  The second argument
>> >+  *      indicates if the block is new (i.e. a full page image should be taken).
>> >+  * 3) Apply modification of page images obtained in the previous step.
>> >+  * 4) GenericXLogFinish() - finish construction of generic xlog record.
>> >+  *
>> >+  * The xlog record construction can be canceled at any step by calling
>> >+  * GenericXLogAbort().  All changes made to page images copies will be
>> >+  * discarded.
>> >+  *
>> >+  * Please, note the following points when constructing generic xlog records.
>> >+  * - No direct modifications of page images are allowed!  All modifications
>> >+  *     must be done in the copies returned by GenericXLogRegister().  In other
>> >+  *     words the code which makes generic xlog records must never call
>> >+  *     BufferGetPage().
>> >+  * - Registrations of buffers (step 2) and modifications of page images
>> >+  *     (step 3) can be mixed in any sequence.  The only restriction is that
>> >+  *     you can only modify page image after registration of corresponding
>> >+  *     buffer.
>> >+  * - After registration, the buffer also can be unregistered by calling
>> >+  *     GenericXLogUnregister(buffer).  In this case the changes made in
>> >+  *     that particular page image copy will be discarded.
>> >+  * - Generic xlog assumes that pages are using standard layout, i.e., all
>> >+  *     data between pd_lower and pd_upper will be discarded.
>> >+  * - Maximum number of buffers simultaneously registered for a generic xlog
>> >+  *     record is MAX_GENERIC_XLOG_PAGES.  An error will be thrown if this limit
>> >+  *     is exceeded.
>> >+  * - Since you modify copies of page images, GenericXLogStart() doesn't
>> >+  *     start a critical section.  Thus, you can do memory allocation, error
>> >+  *     throwing etc between GenericXLogStart() and GenericXLogFinish().
>> >+  *     The actual critical section is present inside GenericXLogFinish().
>> >+  * - GenericXLogFinish() takes care of marking buffers dirty and setting their
>> >+  *     LSNs.  You don't need to do this explicitly.
>> >+  * - For unlogged relations, everything works the same except there is no
>> >+  *     WAL record produced.  Thus, you typically don't need to do any explicit
>> >+  *     checks for unlogged relations.
>> >+  * - If registered buffer isn't new, generic xlog record contains delta
>> >+  *     between old and new page images.  This delta is produced by per byte
>> >+  *     comparison.  This current delta mechanism is not effective for data shifts
>> >+  *     inside the page and may be improved in the future.
>> >+  * - Generic xlog redo function will acquire exclusive locks on buffers
>> >+  *     in the same order they were registered.  After redo of all changes,
>> >+  *     the locks will be released in the same order.
>> >+  *
>> >+  *
>> >+  * Internally, delta between pages consists of set of fragments.  Each
>> >+  * fragment represents changes made in given region of page.  A fragment is
>> >+  * described as follows:
>> >+  *
>> >+  * - offset of page region (OffsetNumber)
>> >+  * - length of page region (OffsetNumber)
>> >+  * - data - the data to place into described region ('length' number of bytes)
>> >+  *
>> >+  * Unchanged regions of page are not represented in the delta.  As a result,
>> >+  * the delta can be more compact than full page image.  But if the unchanged region
>> >+  * of the page is less than fragment header (offset and length) the delta
>> >+  * would be bigger than the full page image. For this reason we break into fragments
>> >+  * only if the unchanged region is bigger than MATCH_THRESHOLD.
>> >+  *
>> >+  * The worst case for delta size is when we didn't find any unchanged region
>> >+  * in the page. Then size of delta would be size of page plus size of fragment
>> >+  * header.
>> >+  */
>> >+ #define FRAGMENT_HEADER_SIZE    (2 * sizeof(OffsetNumber))
>> >+ #define MATCH_THRESHOLD            FRAGMENT_HEADER_SIZE
>> >+ #define MAX_DELTA_SIZE            BLCKSZ + FRAGMENT_HEADER_SIZE
>
I incorporated your changes and did some additional refinements on top
of them still.
Attached is delta against v12, that should cause less issues when
merging for Teodor.
--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services
			
		Вложения
> I incorporated your changes and did some additional refinements on top of them > still. > > Attached is delta against v12, that should cause less issues when merging for > Teodor. But last version is 13th? BTW, it would be cool to add odcs in VII. Internals chapter, description should be similar to index's interface description, i.e. how to use generic WAL interface. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On 29/03/16 19:29, Teodor Sigaev wrote: >> I incorporated your changes and did some additional refinements on top >> of them >> still. >> >> Attached is delta against v12, that should cause less issues when >> merging for >> Teodor. > > But last version is 13th? No, 12 is last version from Alexander afaics, I named my merge 13, but somehow the diff was broken so now I just sent diff against Alexander's work with mine + Alvaro's changes, sorry for confusion. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 29, 2016 at 8:29 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
I incorporated your changes and did some additional refinements on top of them
still.
Attached is delta against v12, that should cause less issues when merging for
Teodor.
But last version is 13th?
BTW, it would be cool to add odcs in VII. Internals chapter, description should be similar to index's interface description, i.e. how to use generic WAL interface.
We already have generic WAL interface description in comments to generic_xlog.c.  I'm not fan of duplicating things.  What about moving interface description from comments to docs completely?
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
On Tue, Mar 29, 2016 at 8:34 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
 
On Tue, Mar 29, 2016 at 8:29 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:I incorporated your changes and did some additional refinements on top of them
still.
Attached is delta against v12, that should cause less issues when merging for
Teodor.
But last version is 13th?
BTW, it would be cool to add odcs in VII. Internals chapter, description should be similar to index's interface description, i.e. how to use generic WAL interface.We already have generic WAL interface description in comments to generic_xlog.c. I'm not fan of duplicating things. What about moving interface description from comments to docs completely?
I heard no objections.  There is revision of patch where generic WAL interface description was moved to documentation.  This description contains improvements by Petr Jelinek, Alvaro Herrera and Markus Nullmeier (who sent it to me privately).
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения
Hello
I did a brief review of bloom contrib and I don't think I like it much.
Here are some issues I believe should be fixed before committing it to
PostgreSQL.
1) Most of the code is not commented. Every procedure should at least
have a breif description of what it does, what arguments it receives
and what it returns. Same for structures and enums.
2) There are only 3 Asserts. For sure there are much more invariants in
this contrib.
3) I don't like this code (blinsert.c):
```
typedef struct
{   BloomState      blstate;   MemoryContext   tmpCtx;   char            data[BLCKSZ];   int64           count;
} BloomBuildState;
/* ... */
/** (Re)initialize cached page in BloomBuildState.*/
static void
initCachedPage(BloomBuildState *buildstate)
{   memset(buildstate->data, 0, BLCKSZ);   BloomInitPage(buildstate->data, 0);    buildstate->count = 0;
}
```
It looks wrong because 1) blkstate and tmpCtx are left uninitialized 2)
as we discussed recently [1] you should avoid leaving "holes" with
uninitialized data in structures. Please fix this or provide a comment
that describes why things are done here the way they are done.
Perhaps there are also other places like this that I missed.
4) I don't think I like such a code either:
```
/* blutuls.c */
static BloomOptions *
makeDefaultBloomOptions(BloomOptions *opts)
{   int i;
   if (!opts)       opts = palloc0(sizeof(BloomOptions));
/* ... */
/* see also blvacuum.c and other places I could miss */
``` 
Sometimes we create a new zero-initialized structure and sometimes we
use a provided one filled with some data. If I'll use this procedure
multiple times in a loop memory will leak. Thus sometimes we need
pfree() returned value manually and sometimes we don't. Such a code is
hard to reason about. You could do it much simpler choosing only one
thing to do --- either allocate a new structure or use a provided one.
5) Code is not pgindent'ed and has many trailing spaces.
[1] http://www.postgresql.org/message-id/56EFF347.20500@anastigmatix.net
-- 
Best regards,
Aleksander Alekseev
http://eax.me/
			
		Referenced by commit commit 473b93287040b20017cc25a157cffdc5b978c254 ("Support CREATE ACCESS METHOD"), commited by
alvherre
			
		GenericXLogStart(Relation relation)
{
...    if (genericXlogStatus != GXLOG_NOT_STARTED)        ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),                errmsg("GenericXLogStart: generic xlog is already
started")));
Hmm, seems, generic wal whiil be in incorrect state if exception occurs between 
GenericXLogStart() and GenericXLogFinish() calls because static variable 
genericXlogStatus will contain GXLOG_LOGGED/GXLOG_UNLOGGED status.
Suppose, it could be solved by different ways
- remove all static variable, so, GenericXLogStart() will return an struct  (object) which incapsulated all data needed
togeneric wal work. As I can  see, in case of exception there isn't ane needing to extra cleanup. Also,  it would allow
touse generic wal for two or more relations at the same time,  although I don't know any useful example for such
feature.
- add callback via RegisterResourceReleaseCallback() which will cleanup state  of genericXlogStatus variable
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 
			
		> as we discussed recently [1] you should avoid leaving "holes" with > uninitialized data in structures. Please fix this or provide a comment > that describes why things are done here the way they are done. > [1] http://www.postgresql.org/message-id/56EFF347.20500@anastigmatix.net That discussion is about SQL-level types which could be stored on disk, not about in-memory structs -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> > http://www.postgresql.org/message-id/56EFF347.20500@anastigmatix.net > That discussion is about SQL-level types which could be stored on > disk, not about in-memory structs I must respectfully disagree. That discussion is also about memory sanitizers and using them on buildfarms. Lets say you initialize a structure like this: st->f1 = 111; st->f2 = 222; ... without using memset, so there could be a "hole" with uninitialized data somewhere in between of f1 and f2. Than some code calculates a hash of this structure or does memcpy - and 1) You get unreproducible behavior - hash is always different for the same structure, thus it is stored in different hash buckets, etc, and as a result you got bugs that sometimes reproduce and sometimes do not 2) There is one more place where sanitizers could report accesses to uninitialized values and thus they still can't be used on buildfarms where they could find a lot of serious bugs automatically. I believe MemorySanitizer is smart enough to recognize trivial memcpy case, but it could be confused in more complicated cases. Anyway I suggest continue discussion of whether we should make PostgreSQL sanitizers-friendly or not in a corresponding thread. So far no one spoke against this idea. Thus I don't think that new patches should complicate implementing it. Especially considering that it's very simple to do and even is considered a good practice according to PostgreSQL documentation. -- Best regards, Aleksander Alekseev http://eax.me/
Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> wrote: > I heard no objections. There is revision of patch where generic WAL > interface description was moved to documentation. This description > contains improvements by Petr Jelinek, Alvaro Herrera and Markus Nullmeier Attached are a few more small fixes as an incremental patch (typos / etc.). -- Markus Nullmeier http://www.g-vo.org German Astrophysical Virtual Observatory (GAVO)
Вложения
Hi!
New revision of patches is attached.
Changes are following:
1) API of generic xlog was changed: now there is no static variables, GenericXLogStart() returns palloc'd struct.
2) Generic xlog use elog instead ereport since it reports internal errors which shouldn't happen normally.
3) Error messages doesn't contains name of the function.
4) Bloom contrib was pgindented.
5) More comments for bloomb.
6) One more assert was added to bloom.
7) makeDefaultBloomOptions was renamed to adjustBloomOptions.  Now it only modifies its parameter. palloc is done outside of this function.
8) BloomBuildState is explicitly zeroed.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения
On 03/31/16 17:29, Alexander Korotkov wrote: > New revision of patches is attached. > 1) API of generic xlog was changed: now there is no static variables, > GenericXLogStart() returns palloc'd struct. Attached are two trivial documentation editing fixes for this, as an incremental patch. -- Markus Nullmeier http://www.g-vo.org German Astrophysical Virtual Observatory (GAVO)
Вложения
Hello, Alexander > Hi! > > New revision of patches is attached. Code looks much better now, thanks. Still I believe it could be improved. I don't think that using srand() / rand() in signValue procedure the way you did is such a good idea. You create a side affect (changing current randseed) which could cause problems in some cases. And there is no real need for that. For instance you could use following formula instead: hash(attno || hashVal || j) And a few more things. > + memset(opaque, 0, sizeof(BloomPageOpaqueData)); > + opaque->maxoff = 0; This looks a bit redundant. > + for (my $i = 1; $i <= 10; $i++) More idiomatic Perl would be `for my $i (1..10)`. > + UnlockReleaseBuffer(buffer); > + ReleaseBuffer(metaBuffer); > + goto away; In general I don't have anything against goto. But are you sure that using it here is really justified? -- Best regards, Aleksander Alekseev http://eax.me/
Code looks much better now, thanks. Still I believe it could be improved.
I don't think that using srand() / rand() in signValue procedure the
way you did is such a good idea. You create a side affect (changing
current randseed) which could cause problems in some cases. And there
is no real need for that. For instance you could use following formula
instead:
hash(attno || hashVal || j)
I've discussed this with Teodor privately.  Extra hash calculation could cause performance regression.  He proposed to use own random generator instead.  Implemented in attached version of patch.
And a few more things.
> + memset(opaque, 0, sizeof(BloomPageOpaqueData));
> + opaque->maxoff = 0;
This looks a bit redundant.
Fixed.
> + for (my $i = 1; $i <= 10; $i++)
More idiomatic Perl would be `for my $i (1..10)`.
Fixed.
> + UnlockReleaseBuffer(buffer);
> + ReleaseBuffer(metaBuffer);
> + goto away;
In general I don't have anything against goto. But are you sure that
using it here is really justified?
Fixed with small code duplication which seems to be better than goto.
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения
Hello > Fixed. Thanks. I don't have any other suggestions at the moment. Let see whats committers opinion on this. -- Best regards, Aleksander Alekseev http://eax.me/