Re: WIP: Avoid creation of the free space map for small tables

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: WIP: Avoid creation of the free space map for small tables
Дата
Msg-id CAA4eK1LYUtHkVknVaKQDF4OjP_LzrbLfTyAG-=5PC_0xn8JiDQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: WIP: Avoid creation of the free space map for small tables  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Sat, Feb 2, 2019 at 7:30 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jan 28, 2019 at 4:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Jan 28, 2019 at 10:03 AM John Naylor
> > <john.naylor@2ndquadrant.com> wrote:
>
> In the past few days, we have done a further analysis of each problem
> and tried to reproduce it.  We are successful in generating some form
> of reproducer for 3 out of 4 problems in the same way as it was failed
> in the buildfarm.   For the fourth symptom, we have tried a lot (even
> Andrew Dunstan has helped us to run the regression tests with the
> faulty commit on Jacana for many hours, but it didn't got reproduced)
> but not able to regenerate a failure in a similar way.  However, I
> have a theory as mentioned below why the particular test could fail
> and the fix for the same is done in the patch.  I am planning to push
> the latest version of the patch [1] which has fixes for all the
> symptoms.
>

Today, I have spent some more time to generate a test which can
reproduce the failure though it is with the help of breakpoints.  See
below:

> >
> > 4.  Failure on jacana:
> > --- c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/box.out
> > 2018-09-26
> > 17:53:33 -0400
> > +++ c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/regress/results/box.out
> > 2019-01-27 23:14:35
> > -0500
> > @@ -252,332 +252,7 @@
> >      ('(0,100)(0,infinity)'),
> >      ('(-infinity,0)(0,infinity)'),
> >      ('(-infinity,-infinity)(infinity,infinity)');
> > -SET enable_seqscan = false;
> > -SELECT * FROM box_temp WHERE f1 << '(10,20),(30,40)';
> > ..
> > ..
> > TRAP: FailedAssertion("!(!(fsm_local_map.nblocks > 0))", File:
> > "c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/storage/freespace/freespace.c",
> > Line:
> > 1118)
> > ..
> > 2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:4] LOG:  server process
> > (PID 14388) exited with exit code 3
> > 2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:5] DETAIL:  Failed process
> > was running: INSERT INTO box_temp
> > VALUES (NULL),
> >
> > I think the reason for this failure is same as previous (as mentioned
> > in point-3), but this can happen in a different way.  Say, we have
> > searched the local map and then try to extend a relation 'X' and in
> > the meantime, another backend has extended such that it creates FSM.
> > Now, we will reuse that page and won't clear local map.  Now, say we
> > try to insert in relation 'Y' which doesn't have FSM.  It will try to
> > set the local map and will find that it already exists, so will fail.
> > Now, the question is how it can happen in this box.sql test.  I guess
> > that is happening for some system table which is being populated by
> > Create Index statement executed just before the failing Insert.
> >

Based on the above theory, the test is as below:

Session-1
---------------
postgres=# create table test_1(c1 int, c2 char(1500));
CREATE TABLE
postgres=# create table test_2(c1 int, c2 char(1500));
CREATE TABLE
postgres=# insert into test_1 values(generate_series(1,20),'aaaa');
INSERT 0 20
postgres=# insert into test_2 values(1,'aaaa');
INSERT 0 1

Session-2
----------------
postgres=# analyze test_1;
ANALYZE
postgres=# analyze test_2;
ANALYZE
postgres=# select oid, relname, relpages from pg_class where relname
like 'test%';
  oid  |    relname     | relpages
-------+----------------+----------
 41835 | test_1         |        4
 41838 | test_2         |        1

Till here we can see that test_1 has 4 pages and test2_1 has 1 page.
At this stage, even one more record insertion in test_1 will create a
new page.  So, now we have to hit the scenario with the help of
debugger.   For session-1, attach debugger and put breakpoint in
RelationGetBufferForTuple().

Session-1
----------------
postgres=# insert into test_1 values(21,'aaaa');

It will hit the breakpoint in RelationGetBufferForTuple().  Now, add
one more breakpoint on line 542 in hio.c, aka below line:
RelationGetBufferForTuple
{
..
else if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))

Press continue and stop the debugger at line 542.  Attach the debugger
for session-2 and add a breakpoint on line 580 in hio.c, aka below
line:
RelationGetBufferForTuple
{
..
buffer = ReadBufferBI(relation, P_NEW, bistate);

Session-2
---------------
postgres=# insert into test_1 values(22,'aaaa');

It will hit the breakpoint in RelationGetBufferForTuple().   Now
proceed with debugging on session-1 by one step.  This is to ensure
that session-1 doesn't get a conditional lock. Now, continue the
debugger in session-2 and after that run vacuum test_1 from session-2,
this will ensure that FSM is created for relation test_1.  So the
state of session-2 will be as below:

Session-2
----------------
postgres=# insert into test_1 values(22,'aaaa');
INSERT 0 1
postgres=# vacuum test_1;
VACUUM
postgres=# select oid, relname, relpages from pg_class where relname
like 'test%';
  oid  |    relname     | relpages
-------+----------------+----------
 41835 | test_1         |        5
 41838 | test_2         |        1

Continue the debugger in session-1.  Now insert one row in test_2 from
session-1 and kaboom:

Session-1
---------------
postgres=# insert into test_2 values(2,'aaaa');
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Server logs is as below:
TRAP: FailedAssertion("!(!(fsm_local_map.nblocks > 0))", File:
"..\postgresql\src\backend\storage\freespace\freespace.c",
Line: 1118)
2019-02-02 16:48:06.216 IST [4044] LOG:  server process (PID 3540)
exited with exit code 3
2019-02-02 16:48:06.216 IST [4044] DETAIL:  Failed process was
running: insert into test_2 values(2,'aaaa');

This looks exactly the same as the failure in Jacana.

The patch fixes this case.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Graham Leggett
Дата:
Сообщение: Re: DNS SRV support for LDAP authentication
Следующее
От: rajan
Дата:
Сообщение: Able to do ALTER DEFAULT PRIVILEGES from a user who is not theowner