Re: UNDO and in-place update

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: UNDO and in-place update
Дата
Msg-id CA+TgmoZxqc+dLcHZLj=U7stvX_jfZxn+qaKnSz0gr=6251MjVg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: UNDO and in-place update  ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
Список pgsql-hackers
On Sun, Nov 27, 2016 at 8:26 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> I see.  autovacuum is certainly almost unpredictable, at least for those who are not aware of its existence and
tuning. Recently, one of our customers faced the inability to perform INSERTs because of xid wraparound. 
> Their workload is INSERT-heavy, and (inefficiently) used autocommit to insert each row, which resulted in the xid
consumptionfaster than the slow xid wraparound autovacuum. 

Yes, that can happen.  The new freeze map stuff in 9.6 should help
with it, even without doing anything like what I am proposing here.
But with this, there is no deferred work at all: once the transaction
is all-visible, the UNDO can be discarded and there's nothing further
to do.

>> > Furthermore, it maybe the best to be able to switch the method for
>> > each table and/or tablespace.  For example, in pgbench, history table
>> uses the current method, and other tables use the UNDO method.  Is it time
>> to introduce a pluggable storage system?
>>
>> IMHO, it's past time for that.
>
> Do you mean by "past time" that the community decided not to introduce pluggable storage manager?  If it's true,
that'sa pity.  But I remember that there was a discussion about pluggable storage manager at PGConf or some other event
thisyear.  Or, do you mean that the current approach should be abandoned and the UNDO approach replace it? 

I mean that we should most definitely introduce a pluggable storage
manager and that I think we should even have done it sooner, before
now.

>> I agree up to a point.  I think we need to design our own system as well
>> as we can, not just copy what others have done.  For example, the design
>> I sketched will work with all of PostgreSQL's existing index types.  You
>> need to modify each AM in order to support in-place updates when a column
>> indexed by that AM has been modified, and that's probably highly desirable,
>> but it's not a hard requirement.  I believe that's a better approach for
>> us than insisting that we have to do it in exactly the same way as some
>> other system.  Now, that doesn't mean we shouldn't learn from what works
>> well and poorly in other systems, but I think our goal here should be to
>> chart the best way forward given PostgreSQL's existing architecture and
>> its existing strengths and weaknesses, rather than to make it exactly like
>> Oracle or MySQL or anything else.  Few people on this mailing list would
>> say that either of those systems are categorically better than PostgreSQL;
>> most, I suspect, would disagree somewhat vigorously.
>
> Yes, agreed.  I didn't intend to just imitate Oracle/MySQL design.  I meant that it will be better to study in
advancewhat trouble Oracle/MySQL design has caused their users, and avoid pitfalls as much as possible.  For example,
whenI ran TPC-B benchmark against Oracle and PostgreSQL, I was embarrassed by frequent deadlocks in Oracle.  It took
sometime for me to find out that INITRANS needs to be tuned with ALTER TABLE.  PostgreSQL ran smoothly without any
tuning.

I agree that we want to stick with our principal of requiring as
little tuning as possible -- and where possible reduce tuning that is
required today.  I have only limited experience with it, but some of
those experiences were frustrating.

> I find your UNDO approach attractive.  On the other hand, I sometimes wonder where PostgreSQL is headed for.  I'm
sometimesasked by database users "How different is PostgreSQL from MySQL?"  If the UNDO approach is taken, PostgreSQL
wouldappear more similar to MySQL.  I don't say that's bad, but I wonder whether we can appeal the new feature in a big
picture. For example, the current (VACUUM) approach would prevent PostgreSQL from becoming a database for
OLTP/analyticsmixed workload, because long-running analytics queries cause tale and index bloat regardless of whether
thosequeries access the same data as the OLTP workload, wouldn't it?  Can we appeal the future of PostgreSQL and the
differencefrom MySQL as "PostgreSQL is pursuing to handle multiple workloads in the same database to better utilize
dataand IT resources.  The new UNDO approach is one step toward that." 

I think there are a lot of differences between PostgreSQL and MySQL
other than whether UNDO is used.  For example, just in the area of
storage, MySQL uses index-organized tables, while PostgreSQL's current
heap and this proposed new type of heap are both flat.  There are
differences in the types of indexing that are offered, the datatypes
available, the features available in every area of the system, and of
course the licensing.  If we get to a point where there are no real
differences in licensing or extensibility or features or any other
area between PostgreSQL and MySQL, then of course it makes no
difference which one people use, but I don't think there is any
prospect of such a thing happening.

PostgreSQL is a relatively feature-complete product at this point.  In
my view, the major thing missing is logical replication, and that's
being worked on.  However, there are two things that, IMHO, are really
hurting us.  The first is tooling, for example around backup and
recovery.  Users familiar with other systems expect things to be
easier to set up and administer than they are in PostgreSQL, and they
expect better tools for automating administrative tasks than are in
fact available.  The second is performance.  The work Andres is doing
to speed up the executor, in which Heikki has expressed interest and
shared ideas; the work on parallel query; Amit Langote's work on
partitioning; Kevin's work on materialized views; Amit Kapila's work
on hash indexes; Tomas Vondra's work on multivariate statistics; and
many other patches currently in flight are all attacks, from various
angles, on the problem of performance, and this idea is, too.  The
assumptions underlying every layer of the system need to be questioned
and challenged, and alternatives need to be explored.  Sometimes that
will mean trying things that don't work; often it will mean building
new things that provide users with choices they don't have today;
other times it will mean rewriting the functionality in place just so
it gets faster.  Some people want CTEs or window functions or XML
manipulation inside the server or more functionality in pgbench and
other people don't care about those things, but the one thing that
virtually everybody cares about is whether queries (and utility
commands) run fast.  I don't think that our performance today sucks -
we wouldn't have gotten this far if that were the case - but there's
clearly room for it to be a lot better, sometimes even by surprisingly
simple techniques (e.g. 5dfc198146b49ce7ecc8a1fc9d5e171fb75f6ba5).
Whether the techniques are simple or complicated, we need to put work
into making the necessary things happen; if other products optimize
and we don't, we'll fall behind.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Autovacuum breakage from a734fd5d1
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Autovacuum breakage from a734fd5d1